orchestratingsinglecellanalysis's Introduction

Orchestrating Single Cell Analysis

All book-related source code has been moved to the OSCA-source organization. This repository is retained only as a signpost... and also to provide a Docker image.

The Docker image itself contains all of the packages required to create the full set of OSCA books. The latest sources of the books themselves are stored in /home/book/.

orchestratingsinglecellanalysis's People

Contributors

actions-user, alexnghiem, csoneson, federicomarini, grimbough, kevinrue, ltla, mtmorgan, nturaga, petehaitch, plger, robertamezquita, sjaenick, stephaniehicks

orchestratingsinglecellanalysis's Issues

Interpretation Figure 7.7 in Chapter 7.4.2 Differential Expression

You show a violin plot in Chapter 7.4.2 (Figure 7.7) of the excellent workflow “Orchestrating Single-Cell Analysis with Bioconductor”. However, I am confused about the text accompanying the figure:
“This plot highlights that CD3D is more highly expressed in cluster 1 relative to some of the other clusters, but not all. This can also be seen from our raw output above, where the log fold-change is calculated with respect to each cluster. There, we see that the log fold-change for CD3D is very high only relative to clusters 2 and 3 (meeting our cutoff of 1.5).”

How do you see that CD3D is more highly expressed in cluster 1 relative to some of the other clusters? Based on the plot, I thought that CD3D is more highly expressed in clusters 3, 4, and 5 than in clusters 1 and 2 (because the mean expression in clusters 3, 4, and 5 is higher than in clusters 1 and 2).
Moreover, CD3D is not present in the raw output above the figure.

Thanks in advance.

A more relaxed QC procedure

People are occasionally worried about losing a cell type due to filtering. A more relaxed approach to QC would be to retain information about which cells are low-quality without actually discarding them, to facilitate downstream interpretation if weaker filtering is required to recover cell types with low RNA content.
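
A minimal sketch of this idea, using only base R on simulated per-cell metrics. The simulated `lib_size` values, the 3-MAD lower threshold on the log scale, and the `low_quality` column name are all illustrative assumptions rather than the book's procedure:

```r
set.seed(100)

# Simulated per-cell library sizes; the last five cells are deliberately tiny.
lib_size <- c(round(rlnorm(95, meanlog = 9, sdlog = 0.3)), 20, 35, 50, 60, 80)
cell_info <- data.frame(cell = paste0("cell", seq_along(lib_size)),
                        lib_size = lib_size)

# Flag cells whose log-library size is more than 3 MADs below the median.
log_size <- log10(cell_info$lib_size)
lower <- median(log_size) - 3 * mad(log_size)
cell_info$low_quality <- log_size < lower

# Nothing is discarded: the flag travels with the cells into later steps.
table(cell_info$low_quality)

# A stricter analysis can still subset on the flag when needed.
strict <- cell_info[!cell_info$low_quality, ]
```

Keeping the flag as a column means clusters dominated by flagged cells can be identified later, instead of silently never appearing.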

Fixes to CITE-seq QC section

Need to explain why we use a different QC scheme for that dataset, related to the number of markers.

Need to distinguish "suggested" and "optional" approaches.

Aaron's weekly todo list

  • Replace spike-in example with RichardTCellData().
  • Add cell cycle assignment chapter.
  • Discuss the relative nature of marker gene interpretation.

Figure 5.1 broken link

Hello,

Thank you for writing such an excellent guide to single cell analysis.

Figure 5.1 is currently missing:
https://osca.bioconductor.org/overview.html

The image is still in the repository:
https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/blob/master/images/Workflow.png

But the link appears to be incorrect, at line 46 of P2_W01.overview.md, which is then being propagated to line 676 of overview.html:
https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/blob/ed0b0930ab65a157fd5af8e8d4915adfb885cf70/docs/overview.html#L676

Please could you look into this? Thanks!

Tagged release OSCA docker images

The bioconductor_docker container maintains tagged releases in parallel to the package releases. Is it possible to start pushing inherited tagged builds for bioconductor/orchestratingsinglecellanalysis to allow future reproducibility of the book and any derived works? I believe this should be as simple as:

  1. On RELEASE_3_12, increment the bioconductor_docker tag to RELEASE_3_12.
  2. Build as bioconductor/orchestratingsinglecellanalysis:RELEASE_3_12.
  3. Push this tagged build to DockerHub.

It seems this is the intended release practice for Bioconductor images based on https://github.com/Bioconductor/bioconductor_docker/blob/master/best_practices.md.

A build of the Dockerfile modification above appears to work. Please let me know if there is anything further I could do to help (PR to relevant branches?).

Question about licensing and derivatives

I am building a Snakemake workflow to analyse scRNA-seq data and would like to make it available through the Snakemake workflows repository. However, because the workflow itself is based on the material in the OSCA book I am not sure whether this is allowed under the current license:

"NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material."

If this use-case is not covered by the license, is it recommended that I do not share the workflow and instead make the repository private?

aggregateAcrossCells after fastMNN

Hi,

When I run aggregateAcrossCells on a data set created using fastMNN from multiple batches, an error is returned. Curiously, the object returned by fastMNN does not contain the raw counts, which I believe is the root of the error. How can this be fixed? Could it be because I am not running the latest version of R/Bioconductor?

Steven
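
One possible workaround, sketched here in base R under stated assumptions, is to aggregate the original raw counts while taking the grouping labels from the merged object. The `counts` matrix, the `cluster` and `batch` vectors, and the requirement that cells appear in the same order in both objects are all hypothetical; this is not confirmed as the cause or fix for the error reported above:

```r
set.seed(42)

# Hypothetical raw counts (genes x cells), pooled from the original batches
# in the same cell order as the merged object.
counts <- matrix(rpois(6 * 8, lambda = 5), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8)))

# Hypothetical labels obtained from the corrected data, plus batch of origin.
cluster <- rep(c("A", "B"), each = 4)
batch <- rep(c("b1", "b2"), times = 4)

# Sum raw counts per cluster/batch combination (a pseudo-bulk profile).
group <- paste(cluster, batch, sep = "_")
pseudo_bulk <- t(rowsum(t(counts), group))

pseudo_bulk
```

With Bioconductor objects, the analogous step would presumably be to call aggregateAcrossCells on the uncorrected object that still holds the counts, passing it the labels derived from the fastMNN result.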

Problem in downloading LunSpikeInData

Hi
When I run this code:

LunSpikeInData(which="416b")

it gives me the below output:

snapshotDate(): 2022-04-26
see ?scRNAseq and browseVignettes('scRNAseq') for documentation
loading from cache
Error: failed to load resource
  name: EH2674
  title: Lun 416B plus spike-in counts
  reason: error reading from connection

I tried other functions of the scRNAseq package and could download their data, but LunSpikeInData() fails.

Fixes to iSEE chapter

It still runs but the apps will error out because some of the column metadata is no longer present, specifically related to the QC metrics. If you want the metrics, you will have to call addPerCellQC explicitly upon loading the SCE object, and then manage your own log-transformations.

@kevinrue @federicomarini @csoneson

Minor notes to self

  • Comment on the relationship between the entropy and library size.
  • Demonstrate how to use rowSubset() for HVG selection.
  • Add a "comparing clusterings" section, with clusterRand and coassignProb.
  • Add a demonstration of clusterPurity as another cluster diagnostic.
  • Truncate the SingleR section and refer people to the other book.

Code can't run successfully in Chapters 5 and 6

While reading Chapter 5 (Overview), I ran this code in RStudio:

library(scRNAseq)
sce <- MacoskoRetinaData()

It fails with an error saying there is no MacoskoRetinaData function.

The same problem occurs in Chapter 6 when I run:

library(scRNAseq)
sce.416b <- LunSpikeInData(which="416b")
sce.416b$block <- factor(sce.416b$block)

This also fails, with an error saying there is no LunSpikeInData function. I then checked the documentation of the scRNAseq package, which contains the same code, so I am confused about why I can't run it successfully.
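
One possible explanation, which is an assumption and not confirmed in this thread, is that the installed scRNAseq is older than the version the book was built with. A quick base R diagnostic along these lines reports the installed version and whether the two functions are exported; the `wanted` and `status` names are illustrative:

```r
# Report whether scRNAseq is installed, its version, and whether it
# exports the two dataset functions used in the book.
wanted <- c("MacoskoRetinaData", "LunSpikeInData")

if (requireNamespace("scRNAseq", quietly = TRUE)) {
    status <- list(
        version = as.character(packageVersion("scRNAseq")),
        exported = wanted %in% getNamespaceExports("scRNAseq")
    )
} else {
    # Package not installed at all: nothing can be exported.
    status <- list(version = NA_character_,
                   exported = rep(FALSE, length(wanted)))
}
names(status$exported) <- wanted

status
R.version.string
```

Including this output in a report makes it easier to tell a version mismatch apart from a genuine bug.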

DEGs between two scRNA-seq datasets

Hey,

Sorry for asking here, don't know if this is the place to ask. First of all thanks a lot for this amazing book! OSCA is really helpful!!

I have a question regarding the section 3.4 OSCA Multisample.

So I have two SingleCellExperiment objects, let's say sce1 and sce2 from different batches and I want to compare the expression of a particular gene between both datasets. So I agree that using the corrected values obtained after applying MNN (or any other integration algorithm) could give results which do not reflect the true biology.

My concern is mostly about your suggestion, which I am not sure I fully understood: "We suggest performing cross-batch comparisons on the original expression values wherever possible". I think that I should apply logNormCounts to sce1 and sce2 individually to normalize for cell-specific effects, and then apply multiBatchNorm to correct the "between-batch effect" between sce1 and sce2. After that, I can draw violin plots and inspect the differences for a particular gene visually, but how should I model these differences statistically, both for a particular gene and for all annotated genes?

Thanks a lot for any help you could provide about this,

Best,

Kike

An error occurs when removing swapped molecules

While working through Section 15.5, I get an error when calling swappedDrops() on the samples.
Running the command

after.mat <- swappedDrops(swap.files, get.swapped=TRUE)

gives this error:

Error detected in HDF5 (1.10.6) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1498 in H5F_open(): unable to open file: time = Mon Feb  1 17:06:00 2021
, name = '~/Library/Caches/ExperimentHub/b5c644a7a8a_3761', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = '~/Library/Caches/ExperimentHub/b5c644a7a8a_3761', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibilty
    minor: Unable to open file

My swap.files object is:

> swap.files
                                                A1                                                 B1 
 "~/Library/Caches/ExperimentHub/b5c644a7a8a_3761" "~/Library/Caches/ExperimentHub/b5c635b2fc39_3762" 
                                                C1                                                 D1 
"~/Library/Caches/ExperimentHub/b5c679cd1db9_3763"  "~/Library/Caches/ExperimentHub/b5c695a9b28_3764" 
                                                E1                                                 F1 
"~/Library/Caches/ExperimentHub/b5c61b8063e4_3765"  "~/Library/Caches/ExperimentHub/b5c6a1e1bd7_3766" 
                                                G1                                                 H1 
"~/Library/Caches/ExperimentHub/b5c63eb5c571_3767"  "~/Library/Caches/ExperimentHub/b5c6fb59de1_3768" 

Any ideas where the problem is?

Thanks,

Assa
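
The paths in the error message begin with a literal `~`, and the HDF5 library reports 'No such file or directory' for them. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is that the tilde is never expanded to the home directory before the file is opened. A base R sketch of expanding the paths first (only two of the eight paths are reproduced, and `expanded` is an illustrative name):

```r
# Paths copied from the report above (two of the eight shown).
swap.files <- c(
    A1 = "~/Library/Caches/ExperimentHub/b5c644a7a8a_3761",
    B1 = "~/Library/Caches/ExperimentHub/b5c635b2fc39_3762"
)

# Replace the leading "~" with the home directory, keeping the sample names.
expanded <- swap.files
expanded[] <- path.expand(swap.files)

# Check which files actually exist before handing them to swappedDrops().
data.frame(path = expanded, exists = file.exists(expanded))
```

If every file is reported as existing, the expanded vector can be passed to swappedDrops() in place of the original one. If some are missing, the cache itself may be incomplete and the download would need repeating.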

Fix the HCA bone marrow chapter

For @robertamezquita:

  • Less "wall of code"; move comments into text and split up code chunks into sections.
  • Remove extraneous code in comments.
  • No tidyverse code.
  • Update function calls, e.g., trendVar/decomposeVar to modelGeneVar.
  • Limit your pedagogy. No need for duplicated findMarkers() calls.
  • Less aggressive downsampling, you want to show that the pipeline is scalable. Suggest only downsampling to 100000 cells, and see if you can coerce to a sparse matrix instead.

Clean up the multi-sample comparisons chapter

  • Fix the default recommendation for the number of cells per population.
  • Create diagnostic MA plots for the normalization step.
  • Streamline extraction of metadata from the friendlier aggregateAcrossCells output.
  • Mention the importance of diagnostic plots for serious work.
  • Try switching the convenient loop to use muscat again?

any(seqnames(location)=="MT") question

I just want to double-check this part of the code at OSCA devel - quality control:

is.mito <- any(seqnames(location)=="MT") will give me only a TRUE.

However, I understood this should be a vector or index of features. In that case, this should be
is.mito <- which(seqnames(location)=="MT") ?
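
The result likely depends on the class of `location`. As far as I understand, when it is a list-like object with one element per gene (such as a GRangesList), `any()` is applied within each element and returns one logical per feature, whereas for a flat vector it collapses to a single value. The base R analogy below uses hypothetical `chr_vector` and `chr_list` objects in place of real genomic ranges:

```r
# Case 1: one chromosome name per feature (analogous to a plain GRanges).
chr_vector <- c("1", "MT", "X", "MT")
any(chr_vector == "MT")    # collapses to a single logical
which(chr_vector == "MT")  # indices of the matching features

# Case 2: a list with one element per feature (analogous to a GRangesList,
# where each gene can span several ranges).
chr_list <- list(geneA = c("1", "1"), geneB = "MT", geneC = character(0))
is.mito <- vapply(chr_list, function(x) any(x == "MT"), logical(1))
is.mito
```

Checking `class(location)` would show which of the two cases applies to a given dataset.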

Guideline to contribute?

Dear all,

Is there any guideline to contribute to this awesome book?

For example, I would like to contribute to Chapter 8, which is on Feature Selection, by writing something about feature selection by deviance using glmpca. Should I just make a pull request or should I communicate my intent with someone beforehand?

Best regards,
Mikhael

Cannot download LunSpikeInData

The call returns this error:

Error in value[3L] : failed to connect
reason: Timeout was reached: [bioconductor.org] Operation timed out after 10002 milliseconds with 0 out of 0 bytes received
Consider rerunning with 'localHub=TRUE'

Does anybody know how to deal with this?

Use scDblFinder

Demo scDblFinder's main function in the doublet section. Also show how to derive hard yes/no doublet calls from scores.
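
As a rough illustration of turning scores into hard calls, the base R sketch below works on simulated scores and does not call scDblFinder itself. The score distribution, the 0.5 cutoff, and the 5% expected doublet rate are arbitrary assumptions:

```r
set.seed(1)

# Simulated doublet scores in [0, 1]: mostly low, with a few high values.
score <- c(rbeta(190, 1, 8), rbeta(10, 8, 1))

# Option 1: a fixed cutoff on the score (0.5 is an arbitrary choice).
call_fixed <- ifelse(score > 0.5, "doublet", "singlet")

# Option 2: call the top-scoring cells, given an assumed doublet rate of 5%.
expected_rate <- 0.05
n_doublets <- round(expected_rate * length(score))
call_rate <- rep("singlet", length(score))
call_rate[order(score, decreasing = TRUE)[seq_len(n_doublets)]] <- "doublet"

# Cross-tabulate the two sets of calls to see where they disagree.
table(fixed = call_fixed, rate = call_rate)
```

A fixed cutoff is simple but depends on how the scores are scaled, while the rate-based option ties the number of calls to a prior expectation about the experiment.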
