bioconductor / orchestratingsinglecellanalysis

Content for the OSCA Book.

Home Page: http://bioconductor.org/books/devel/OSCA/

Dockerfile 100.00%

book book-base single-cell rna-seq single-cell-rna-seq

orchestratingsinglecellanalysis's Introduction

Orchestrating Single Cell Analysis

All book-related source code has been moved to the OSCA-source organization. This repository is retained only as a signpost... and also to provide a Docker image.

The Docker image itself contains all of the packages required to create the full set of OSCA books. The latest sources of the books themselves are stored in /home/book/.

orchestratingsinglecellanalysis's Issues

Clean up the introduction section

Make it more concise.
Move the quick start section to this part.
Move the overview section to this part?

Interpretation Figure 7.7 in Chapter 7.4.2 Differential Expression

You show a violin plot in Chapter 7.4.2 (Figure 7.7) of the excellent workflow “Orchestrating Single-Cell Analysis with Bioconductor”. However, I am confused about the text accompanying the figure:
“This plot highlights that CD3D is more highly expressed in cluster 1 relative to some of the other clusters, but not all. This can also be seen from our raw output above, where the log fold-change is calculated with respect to each cluster. There, we see that the log fold-change for CD3D is very high only relative to clusters 2 and 3 (meeting our cutoff of 1.5).”

How do you see that CD3D is more highly expressed in cluster 1 relative to some other clusters? Based on the plot, I thought that CD3D is more highly expressed in clusters 3, 4 and 5 compared to clusters 1 and 2 (because the mean expression in clusters 3, 4 and 5 is higher than in clusters 1 and 2).
Moreover, CD3D is not present in the raw output above the figure.

Thanks in advance.

Add the Messmer example for scPCA

Tagging @PhilBoileau. You don't have to do anything, it's just a reminder for myself that you can keep an eye on.

A more relaxed QC procedure

People are occasionally worried about losing a cell type due to filtering. A more relaxed approach to QC would be to retain information about which cells are low-quality without actually discarding them, to facilitate downstream interpretation if weaker filtering is required to recover cell types with low RNA content.

Fixes to CITE-seq QC section

Need to explain why we use a different QC scheme for that dataset, related to the number of markers.

Need to distinguish "suggested" and "optional" approaches.

Aaron's weekly todo list

Replace spike-in example with RichardTCellData().
Add cell cycle assignment chapter.
Discuss the relative nature of marker gene interpretation.

Figure 5.1 broken link

Hello,

Thank you for writing such an excellent guide to single cell analysis.

Figure 5.1 is currently missing:
https://osca.bioconductor.org/overview.html

The image is still in the repository:
https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/blob/master/images/Workflow.png

But the link appears to be incorrect, at line 46 of P2_W01.overview.md, which is then being propagated to line 676 of overview.html:
https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/blob/ed0b0930ab65a157fd5af8e8d4915adfb885cf70/docs/overview.html#L676

Please could you look into this? Thanks!

Operationalize RNA velocity section

@csoneson to make an unspliced dataset available in scRNAseq.
other volunteers to create a dedicated R wrapper around scvelo or velocyto, or both.

Tagged release OSCA docker images

The bioconductor_docker container maintains tagged releases in parallel to the package releases. Is it possible to start pushing inherited tagged builds for bioconductor/orchestratingsinglecellanalysis to allow future reproducibility of the book and any derived works? I believe this should be as simple as:

On RELEASE_3_12, increment the bioconductor_docker tag to RELEASE_3_12.
Build as bioconductor/orchestratingsinglecellanalysis:RELEASE_3_12.
Push this tagged build to DockerHub.

It seems this is the intended release practice for Bioconductor images based on https://github.com/Bioconductor/bioconductor_docker/blob/master/best_practices.md.

A build of the Dockerfile modification above appears to work. Please let me know if there is anything further I could do to help (PR to relevant branches?).
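
The three steps listed above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the maintainers' actual release process: it assumes a git branch named RELEASE_3_12 exists, that the Dockerfile starts from a `FROM bioconductor/bioconductor_docker:<tag>` line, and that you are already logged in to DockerHub. The helper names `image_ref` and `retag_from` and the `DRY_RUN` switch are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
set -eu

# Release to build; RELEASE_3_12 is the example used in this issue.
RELEASE="${RELEASE:-RELEASE_3_12}"
IMAGE="bioconductor/orchestratingsinglecellanalysis"

# Hypothetical helper: print the full image reference for a given tag.
image_ref() {
    printf '%s:%s\n' "$IMAGE" "$1"
}

# Hypothetical helper: print the Dockerfile ($2) with the base image tag
# on its FROM line replaced by the release tag ($1).
retag_from() {
    sed -E "s|^(FROM[[:space:]]+bioconductor/bioconductor_docker):[^[:space:]]+|\1:$1|" "$2"
}

# Nothing is checked out, built or pushed unless DRY_RUN=0 is set explicitly.
if [ "${DRY_RUN:-1}" = "0" ]; then
    # Assumption: the release branch carries the same name as the tag.
    git checkout "$RELEASE"
    retag_from "$RELEASE" Dockerfile > Dockerfile.tmp
    mv Dockerfile.tmp Dockerfile
    docker build -t "$(image_ref "$RELEASE")" .
    docker push "$(image_ref "$RELEASE")"
fi
```

With `DRY_RUN` left unset, the script only defines the helpers, so the tag rewriting can be checked on a copy of the Dockerfile before anything is built or pushed.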

Follow the best practices of Bioconductor containers

Since OSCA is in the Bioconductor organization on Dockerhub, adding https://github.com/Bioconductor/bioconductor_docker/blob/master/best_practices.md will be useful.

Particularly, adding LABELS,

LABEL name="bioconductor/bioconductor_docker_<image-name>" \
      version="0.99.0" \
      url="https://github.com/Bioconductor/bioconductor_docker_<image-name>" \
      maintainer="[email protected]" \
      description="Description of my image" \
      license="Artistic-2.0"

Question about licensing and derivatives

I am building a Snakemake workflow to analyse scRNA-seq data and would like to make it available through the Snakemake workflows repository. However, because the workflow itself is based on the material in the OSCA book I am not sure whether this is allowed under the current license:

"NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material."

If this use-case is not covered by the license, is it recommended that I do not share the workflow and instead make the repository private?

aggregateAcrossCells after fastMNN

Hi,

When I run aggregateAcrossCells on a data set created using fastMNN from multiple batches, an error is returned. Curiously, the object returned by fastMNN does not contain the raw counts, which I believe is the root of the error. How can this be fixed? Could it be due to the fact that I am not running the latest version of R/Bioconductor?

Steven

Problem in downloading LunSpikeInData

Hi
When I run this code:

LunSpikeInData(which="416b")

it gives me the below output:

snapshotDate(): 2022-04-26
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
loading from cache
Error: failed to load resource
  name: EH2674
  title: Lun 416B plus spike-in counts
  reason: error reading from connection

I tried other functions of scRNAseq package, and I could download data, but for LunSpikeInData( ), I couldn't download data.

Fix the custom methods for marker detection section

It's now label, not cluster. And arguably we should be doing true pairwise comparisons within each loop, NAs be damned.

Fixes to iSEE chapter

It still runs but the apps will error out because some of the column metadata is no longer present, specifically related to the QC metrics. If you want the metrics, you will have to call addPerCellQC explicitly upon loading the SCE object, and then manage your own log-transformations.

@kevinrue @federicomarini @csoneson

Single cell analysis

Minor notes to self

Comment on the relationship between the entropy and library size.
Demonstrate how to use rowSubset() for HVG selection.
Add a "comparing clusterings" section, with clusterRand and coassignProb.
Add a demonstration of clusterPurity as another cluster diagnostic.
Truncate the SingleR section and refer people to the other book.

code can't run successfully in 5th and 6th chapter

When reading the 5th chapter (Overview), I ran the code

library(scRNAseq)

sce <- MacoskoRetinaData()

in RStudio, and it errors, indicating that there is no MacoskoRetinaData function.

And the same problem exists in chapter 6, when I run

library(scRNAseq)

sce.416b<-LunSpikeInData(which="416b")

sce.416b$block<-factor(sce.416b$block)

it also errors, saying that there is no LunSpikeInData function. I then referred to the documentation of the scRNAseq package, and the same code appears there. So I'm confused and wonder why I can't run the code successfully.

DEGs between two scRNA-seq datasets

Hey,

Sorry for asking here, don't know if this is the place to ask. First of all thanks a lot for this amazing book! OSCA is really helpful!!

I have a question regarding the section 3.4 OSCA Multisample.

So I have two SingleCellExperiment objects, let's say sce1 and sce2 from different batches and I want to compare the expression of a particular gene between both datasets. So I agree that using the corrected values obtained after applying MNN (or any other integration algorithm) could give results which do not reflect the true biology.

My concern is mostly about your suggestion, which I am not sure I fully understood: "We suggest performing cross-batch comparisons on the original expression values wherever possible". I think that I should apply logNormCounts to sce1 and sce2 individually to normalize the cell-specific effects within each, and then apply multiBatchNorm to correct the "between-batch effect" between sce1 and sce2. After that, I can make violin plots and observe the graphical differences for a particular gene, but how should I statistically model these differences, both for a particular gene and for all annotated genes?

Thanks a lot for any help you could provide about this,

Best,

Kike

An error occurs when removing swapped molecules

When working on section 15.5, I get the following error when trying to identify swapped molecules with swappedDrops().
Running the command

after.mat <- swappedDrops(swap.files, get.swapped=TRUE)

gives this error:

Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1498 in H5F_open(): unable to open file: time = Mon Feb  1 17:06:00 2021
, name = '~/Library/Caches/ExperimentHub/b5c644a7a8a_3761', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = '~/Library/Caches/ExperimentHub/b5c644a7a8a_3761', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibilty
    minor: Unable to open file

my swap.files object is this

> swap.files
                                                A1                                                 B1 
 "~/Library/Caches/ExperimentHub/b5c644a7a8a_3761" "~/Library/Caches/ExperimentHub/b5c635b2fc39_3762" 
                                                C1                                                 D1 
"~/Library/Caches/ExperimentHub/b5c679cd1db9_3763"  "~/Library/Caches/ExperimentHub/b5c695a9b28_3764" 
                                                E1                                                 F1 
"~/Library/Caches/ExperimentHub/b5c61b8063e4_3765"  "~/Library/Caches/ExperimentHub/b5c6a1e1bd7_3766" 
                                                G1                                                 H1 
"~/Library/Caches/ExperimentHub/b5c63eb5c571_3767"  "~/Library/Caches/ExperimentHub/b5c6fb59de1_3768"

Any ideas, where the problem is?

thanks

Assa

package ‘OSCAUtils’ is not available (for R version 3.6.2)

Hello,
I'm working with R v3.6, but it seems that the OSCAUtils package is not supported by this version of R. Whenever I try to install the package, I get the error: package ‘OSCAUtils’ is not available (for R version 3.6.2)

Typo on Chapter 9.2 PCA, runPCA(..., BSPARAM=RandomParam(), name='')

Hi there, according to the documentation of BiocSingularParam classes, BSPARAM=RandomParam() corresponds to a name of rsvd, instead of irlba (IrlbaParam()). This is a minor edit to make :)

Fix the HCA bone marrow chapter

For @robertamezquita:

Less "wall of code"; move comments into text and split up code chunks into sections.
Remove extraneous code in comments.
No tidyverse code.
Update function calls, e.g., trendVar/decomposeVar to modelGeneVar.
Limit your pedagogy. No need for duplicated findMarkers() calls.
Less aggressive downsampling, you want to show that the pipeline is scalable. Suggest only downsampling to 100000 cells, and see if you can coerce to a sparse matrix instead.

Need to add some words about the ARI

Need to mention how to interpret the ARI in the clustering chapter.

Clean up the multi-sample comparisons chapter

Fix the default recommendation for the number of cells per population.
Create diagnostic MA plots for the normalization step.
Streamline extraction of metadata from the friendlier aggregateAcrossCells output.
Mention the importance of diagnostic plots for serious work.
Try switching the convenient loop to use muscat again?

any(seqnames(location)=="MT") question

I just want to double-check this part of the code at OSCA devel - quality control:

is.mito <- any(seqnames(location)=="MT") will give me only a TRUE.

However, I understood this should be a vector or index of features. In that case, this should be
is.mito <- which(seqnames(location)=="MT") ?

Add `## Session Info` at the end of each chapter

(suggestion)
At least for the chapters that do contain references.
Otherwise, the references appear a bit out of nowhere, straight after the sessionInfo.

Guideline to contribute?

Dear all,

Is there any guideline to contribute to this awesome book?

For example, I would like to contribute to Chapter 8, which is on Feature Selection, by writing something about feature selection by deviance using glmpca. Should I just make a pull request or should I communicate my intent with someone beforehand?

Best regards,
Mikhael

does anybody know how to deal with?

Hello, I'd like to ask you a question.

I want to translate your open source single cell book into Chinese, is that ok?

Use scDblFinder

Demo scDblFinder's main function in the doublet section. Also show how to derive hard yes/no doublet calls from scores.
