becavin-lab / checkatlas

One liner tool to check the quality of your single-cell atlases.

Home Page: https://checkatlas.readthedocs.io/en/latest/

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.17% Python 7.17% R 0.03% HTML 92.63%
control multiqc python quality scanpy seurat single-cell

checkatlas's People

Contributors

drbecavin, paolaporracciolo

Forkers

ryan2han

checkatlas's Issues

Integration with nf-core pipelines

Hi @drbecavin,

I am an active contributor to the nf-core project and have worked on the scRNA-seq and spatialtranscriptomics pipelines in the past. For both pipelines, we are considering integrating checkatlas to generate MultiQC reports (see nf-core/scrnaseq#80 and nf-core/spatialvi#40).

From what I understand, the checkatlas architecture is rather complex, consisting of

  • a Python library that takes an h5ad object and computes various QC metrics
  • a Nextflow workflow that executes the different parts of the Python library via CLI wrappers. The Nextflow workflow itself is wrapped in another Python CLI script.
  • a MultiQC module that reads the outputs of this workflow to generate a report
  • an R script to convert Seurat to h5ad.

To integrate checkatlas into one of our pipelines, we need to define a Nextflow module that takes h5ad files as input and generates files that can be ingested by a downstream MultiQC process. In addition, we need a standalone container including all required dependencies (see also #25).

While it would be entirely possible to create a container that holds the Python dependencies, Nextflow + Java, and the R dependencies, it seems convoluted to run a Nextflow workflow that starts a Docker container that runs a Python script that runs a Nextflow workflow that runs another Python script. It is also suboptimal in terms of resource management, because the checkatlas Nextflow workflow running inside the container cannot make use of the cluster/cloud scheduler that the "outer" Nextflow pipeline was configured to run with.

From our perspective, it would be better to separate the Python library from the Nextflow workflow in checkatlas. That way we could have a lightweight container for the Python part and build a "checkatlas" Nextflow (sub)workflow that can be integrated into both pipelines. If necessary, conversion from Seurat to h5ad would run in a separate process with a separate container, avoiding manual installation of R packages (and mitigating issues like #24). In general, I think it is best to have Nextflow as the outermost layer, so that it handles all dependencies and can take advantage of its flexible resource management (local vs. HPC vs. cloud).

Let me know what you think!

Cheers,
Gregor

CC @fasterius @cavenel (nf-core/spatialtranscriptomics), @fmalmeida (nf-core/scrnaseq)

Parse additional cellranger outputs

Thank you for the nice-looking tool. Would you consider parsing additional cellranger output files for the MultiQC reports? I'm thinking of the metrics_summary.csv and (less easily) the web_summary.html for each sample. The .csv files could be aggregated into a table with basic quality scores, mapping metrics, etc. The web_summary.html file is harder to parse, I guess, but ideally it could provide the knee plots of UMI counts for a quick comparison across samples.
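
As a rough illustration of the aggregation step, here is a minimal sketch using only the Python standard library. It assumes each metrics_summary.csv holds one header row and one value row, that numbers may be written with thousands separators or a percent sign (e.g. "1,234" or "97.5%"), and that the parent directory name identifies the sample. The function names are hypothetical and not part of checkatlas.

```python
import csv
from pathlib import Path


def parse_metric_value(raw):
    """Convert strings such as '1,234' or '97.5%' to float; return other text unchanged."""
    text = raw.strip().replace(",", "")
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1]
    try:
        value = float(text)
    except ValueError:
        return raw  # non-numeric field: keep the original string
    # Percentages are stored as fractions (97.5% -> 0.975).
    return value / 100.0 if is_percent else value


def aggregate_metrics(csv_paths):
    """Return {sample_name: {metric_name: value}} from per-sample CSV files.

    Assumption: one header row plus one value row per file, and the
    parent directory name is used as the sample name.
    """
    table = {}
    for path in map(Path, csv_paths):
        with path.open(newline="") as handle:
            row = next(csv.DictReader(handle))
        table[path.parent.name] = {
            name: parse_metric_value(value) for name, value in row.items()
        }
    return table
```

The resulting nested dictionary maps one row per sample to one column per metric, which is the shape a general-statistics style table would need.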

Best wishes, Chris

Add Kruskal stress calculation


Tutorial for adding metrics in checkatlas

Update and fix the protocol for adding metrics in checkatlas.

Create a tutorial that clearly defines the steps for adding a metric:

  • Define which group it belongs to
  • Add the metric to the checkatlas code
  • Add brief documentation (where)
  • Add a wiki page about the metric to the checkatlas doc

Implement metrics object

Implement a metric calculation structure for 4 types of metrics

  • count distribution
  • clustering
  • specificity
  • dimensionality reduction
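
One possible shape for such a structure is sketched below. It is only an illustration: the class names MetricGroup, Metric and MetricRegistry are hypothetical and do not come from the checkatlas code base. Each metric carries its group, a compute function and a short description, so that adding a metric means writing one function and registering it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class MetricGroup(Enum):
    """The four metric types listed above."""
    COUNT_DISTRIBUTION = "count distribution"
    CLUSTERING = "clustering"
    SPECIFICITY = "specificity"
    DIM_REDUCTION = "dimensionality reduction"


@dataclass(frozen=True)
class Metric:
    name: str
    group: MetricGroup
    compute: Callable[..., float]  # function that returns the metric value
    doc: str = ""                  # short description shown in the documentation


class MetricRegistry:
    """Keeps metrics by name and lets callers list them per group."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.name in self._metrics:
            raise ValueError(f"metric already registered: {metric.name}")
        self._metrics[metric.name] = metric

    def get(self, name: str) -> Metric:
        return self._metrics[name]

    def by_group(self, group: MetricGroup) -> List[Metric]:
        return [m for m in self._metrics.values() if m.group is group]
```

A registry like this would also give the tutorial a fixed checklist: pick a MetricGroup, write the compute function, fill in doc, and call register.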

Dimensionality reduction metrics

Add metrics for dimensionality reduction analysis

  • Categorize local and global estimation metrics
  • Choose atlas with "specific" UMAP and t-SNE.
  • Add the mammoth test tool from "The spurious art of ..."
  • Implement a benchmark
  • Add human estimation of good and bad UMAPs
  • Compare metrics to human estimation

Dim reduction metric management for Seurat objects

Describe the bug
Not working because the sparse matrix in Python cannot be converted in R.

To Reproduce
Run the Kruskal stress calculation.

To fix
Implement the distance calculation in R and return the distance matrix, not the count matrix.
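
Once distance matrices are available, the stress itself needs no count matrix. The sketch below is a simplified, pure-Python variant of Kruskal's stress-1 that compares the original distances with the embedding distances directly, without the monotone regression step used in non-metric MDS. It normalises by the squared embedding distances; some implementations normalise by the original distances instead. The function name and the list-of-lists input format are assumptions made for illustration.

```python
import math


def kruskal_stress(dist_high, dist_low):
    """Simplified stress-1 between two square, symmetric distance matrices.

    dist_high: pairwise distances in the original space
    dist_low:  pairwise distances in the embedding (e.g. UMAP)
    Both are lists of lists of equal size; only the upper triangle is used.
    """
    n = len(dist_high)
    if n != len(dist_low):
        raise ValueError("distance matrices must have the same size")
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = dist_low[i][j] - dist_high[i][j]
            numerator += diff * diff
            denominator += dist_low[i][j] ** 2
    if denominator == 0.0:
        raise ValueError("all embedding distances are zero")
    return math.sqrt(numerator / denominator)
```

A value of 0 means the embedding preserves all pairwise distances exactly; larger values indicate stronger distortion.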

UMAP with seurat_clusters does not display clusters

When seurat_clusters is present, it is displayed as a numerical variable instead of a categorical one.


Add SpatialData management

  • Add detection of spatial data
  • Add fast screening of spatialdata QC metrics
  • Display spatial data in MultiQC

Automatic/Semi-automatic search of celltype key

When searching for the feature that describes the cell type, some obs keywords need to be added.
For the moment, everything is in

atlas.OBS_CLUSTERS = [
    "CellType",
    "celltype",
    "seurat_clusters",
    "orig.ident",
]

This needs to be fixed!

Maybe add an argument to the checkatlas software so that the user can tell us which obs key is the right one?
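
A minimal sketch of how both ideas could fit together: a user-supplied key takes priority, and otherwise the obs columns are matched case-insensitively against the keyword list. The function name find_cluster_keys and the --obs-cluster-key flag are hypothetical and are not existing checkatlas options. The default keywords are copied from atlas.OBS_CLUSTERS above.

```python
import argparse

# Copied from atlas.OBS_CLUSTERS as quoted in this issue.
DEFAULT_OBS_CLUSTERS = ("CellType", "celltype", "seurat_clusters", "orig.ident")


def find_cluster_keys(obs_columns, user_key=None, keywords=DEFAULT_OBS_CLUSTERS):
    """Return the obs columns that look like cell type / cluster annotations.

    If user_key is given it is used as-is (and must exist); otherwise a
    column is kept when it contains any keyword, ignoring case.
    """
    columns = list(obs_columns)
    if user_key is not None:
        if user_key not in columns:
            raise KeyError(f"obs key not found: {user_key}")
        return [user_key]
    lowered = [keyword.lower() for keyword in keywords]
    return [c for c in columns if any(k in c.lower() for k in lowered)]


def build_parser():
    """Hypothetical CLI fragment exposing the user-defined obs key."""
    parser = argparse.ArgumentParser(prog="checkatlas-sketch")
    parser.add_argument(
        "--obs-cluster-key",
        default=None,
        help="obs column holding the cell type / cluster annotation",
    )
    return parser
```

With this layout the keyword list remains the fallback, so existing atlases keep working while unusual annotations can be named explicitly.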

Add knee plots

Hi @drbecavin,

this package looks great! We are currently considering adding it to the nf-core scrnaseq workflow (see nf-core/scrnaseq#80).

One feature I'd love to see is knee plots for QC metrics. I find them superior to the current violin plots for finding inflection points, and they would also be easy to render for many samples simultaneously.

In particular, I think the following plots would be useful:

  • cell rank vs. total counts
  • cell rank vs. detected genes
  • cell rank vs. mitochondrial fraction
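
The data behind each of these plots is just the per-cell metric sorted in decreasing order and paired with its rank. A minimal sketch, with hypothetical function names and plain lists as input:

```python
def knee_curve(values):
    """Return (ranks, sorted_values): values in decreasing order, ranks starting at 1."""
    ordered = sorted(values, reverse=True)
    ranks = list(range(1, len(ordered) + 1))
    return ranks, ordered


def knee_curves_by_sample(samples):
    """Apply knee_curve to every sample in a {sample_name: values} mapping."""
    return {name: knee_curve(values) for name, values in samples.items()}
```

Because every sample reduces to one curve, many samples can be overlaid as lines in the same figure.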

Here's an example from the cellranger report

image

Here's another example from some custom python script I usually use for single cell QC with scanpy:

image
(y-axis= cell rank, n_genes_by_counts = number of detected genes, red lines indicate cutoffs I chose)

The knee plots could (as opposed to the violin plots) easily be combined into a single, interactive MultiQC figure. This helps identify low-quality outliers when working with many single-cell samples. Here's an example of such a plot from the nf-core/rnaseq MultiQC report:

image

Add test datasets

Add the dataset list from scPermut to the checkatlas tutorial.
Use cellxgene tools for that!

Multiome Seurat object bad conversion

Multiome Seurat objects are not converted correctly to Scanpy objects.
Should they be converted to two or three objects, depending on the number of assays?

Improve Mito and ribo QC calculation

For some atlases, the code:

# mitochondrial genes
adata.var["mt"] = adata.var_names.str.startswith("MT-")
# ribosomal genes
adata.var["ribo"] = adata.var_names.str.startswith(("RPS", "RPL"))

in atlas.create_qc_tables(adata, atlas_path, atlas_info, args)

does not work because the annotation is in Ensembl format (ENSG0000...) or another format.
This needs to be fixed by adding MT and ribo annotation!

Or should an issue be opened directly in scanpy?
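
One way to handle this, sketched below under stated assumptions: if a mapping from the identifiers in var_names to gene symbols is available (where that mapping comes from is left open here), the prefix test can be run on the symbols instead of the raw identifiers. The function name flag_mt_ribo and the symbols argument are hypothetical. Matching is done on upper-cased symbols, so lower-case names such as "mt-Co1" are also caught.

```python
MT_PREFIXES = ("MT-",)
RIBO_PREFIXES = ("RPS", "RPL")


def flag_mt_ribo(var_names, symbols=None):
    """Return (mt_flags, ribo_flags), two lists of booleans aligned with var_names.

    symbols: optional dict mapping an identifier (e.g. an Ensembl ID) to its
    gene symbol. Identifiers missing from the dict are tested as-is.
    """
    lookup = symbols or {}
    mt_flags, ribo_flags = [], []
    for name in var_names:
        symbol = lookup.get(name, name).upper()
        mt_flags.append(symbol.startswith(MT_PREFIXES))
        ribo_flags.append(symbol.startswith(RIBO_PREFIXES))
    return mt_flags, ribo_flags
```

The two lists could then be assigned to adata.var["mt"] and adata.var["ribo"] in place of the str.startswith calls quoted above.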

Seurat import

Is your feature request related to a problem? Please describe.
For the moment, the user has to install Seurat manually in their environment, or use only conda to install checkatlas.

Describe the solution you'd like
It would be nice to install Seurat with pip (but this seems impossible).

QC figures too big

The QC figures are too big for the MultiQC HTML file.
Reduce their size when they are produced in checkatlas:
atlas.create_qc_plots(adata, atlas_path, atlas_info, fig_path)

Create github with all metrics

Create a separate GitHub repository named
singlecell-metrics to outsource all metrics and document them outside of checkatlas.

Goal: increased visibility!
