czbiohub-sf / tabula-muris

Code and annotations for the Tabula Muris single-cell transcriptomic dataset.

Home Page: https://www.nature.com/articles/s41586-018-0590-4

License: BSD 3-Clause "New" or "Revised" License

R 0.11% Jupyter Notebook 0.37% Shell 0.01% HTML 99.06% Python 0.02% Makefile 0.01% MATLAB 0.02% TeX 0.41%

tabula-muris's Introduction

tabula-muris

The Tabula muris data were generated by the Chan Zuckerberg Biohub. For a detailed description of the project, please refer to our publication, Transcriptomic characterization of 20 organs and tissues from mouse at single cell resolution creates a Tabula Muris. The Tabula muris project is a compendium of single-cell transcriptomic data from the mouse, containing nearly 100,000 cells from 20 organs and tissues. The data allow for direct and controlled comparison of gene expression in cell types shared between tissues, such as immune cells from distinct anatomical locations. The resource also enables contrasting two distinct technical approaches:

  • microfluidic droplet-based 3'-end counting, which provides a survey of thousands of cells per organ at relatively low coverage.
  • FACS-based full-length transcript analysis, which provides higher sensitivity and coverage.

This rich collection of annotated cells will be a useful resource for:

  • Defining gene expression in previously poorly-characterized cell populations.
  • Validating findings in future targeted single-cell studies.
  • Developing methods for integrating datasets (e.g., between the FACS and droplet experiments), characterizing batch effects, and quantifying the variation of gene expression in many cell types between organs and animals.

Since late 2017, Tabula muris data have been made available to all users free of charge. AWS has made the data freely available on Amazon S3 so that anyone can download the resource to perform analysis and advance medical discovery without needing to worry about the cost of storing Tabula muris data or the time required to download it.

Learn more about how Tabula muris data is used in the project vignettes repo.

Installation - Python

To install the Python dependencies, create a tabula-muris-env environment by using the environment.yml file provided:

conda env create -f environment.yml

Activate the environment and register it as a Jupyter kernel with:

source activate tabula-muris-env
python -m ipykernel install --user --name tabula-muris-env --display-name "Python 3.6 (tabula-muris-env)"

Installation - R

Packages:

install.packages(c("here", "Seurat", "useful", "ontologyIndex", "tidyverse"))

Getting started

From "raw" gene-cell counts tables

If you want to start from the raw gene-cell counts tables, then first download the data from figshare. You can download manually from the links (FACS and Droplet) or run a script we've prepared:

bash 00_data_ingest/download_data.sh

This will download two zip files, droplet_raw_data.zip and facs_raw_data.zip, and unzip them into the folder structure described below. You will then have two folders in 00_data_ingest (the location is important: everything here depends on the folder structure).
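Because everything depends on the folder structure, it can help to verify it once the download finishes. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the expected paths are copied from the folder trees shown in the FACS and Droplet sections of this README.

```python
from pathlib import Path

# Paths taken from the folder trees in this README, relative to 00_data_ingest/.
EXPECTED = [
    "00_facs_raw_data/FACS",
    "00_facs_raw_data/annotations_FACS.csv",
    "00_facs_raw_data/metadata_FACS.csv",
    "01_droplet_raw_data/droplet",
    "01_droplet_raw_data/annotations_droplet.csv",
    "01_droplet_raw_data/metadata_droplet.csv",
]


def missing_paths(ingest_dir):
    """Return the expected paths that do not exist under ingest_dir."""
    root = Path(ingest_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root, after download_data.sh has finished.
    missing = missing_paths("00_data_ingest")
    if missing:
        print("Missing after download:", *missing, sep="\n  ")
    else:
        print("Folder structure looks complete.")
```

An empty list from `missing_paths` means the top-level layout matches the trees in this README; it does not check the individual tissue files.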

FACS

The FACS folder should look like this:

00_facs_raw_data
├── FACS
│   ├── Aorta-counts.csv
│   ├── Bladder-counts.csv
│   ├── Brain_Myeloid-counts.csv
│   ├── Brain_Non-Myeloid-counts.csv
│   ├── Diaphragm-counts.csv
│   ├── Fat-counts.csv
│   ├── Heart-counts.csv
│   ├── Kidney-counts.csv
│   ├── Large_Intestine-counts.csv
│   ├── Limb_Muscle-counts.csv
│   ├── Liver-counts.csv
│   ├── Lung-counts.csv
│   ├── Mammary_Gland-counts.csv
│   ├── Marrow-counts.csv
│   ├── Pancreas-counts.csv
│   ├── Skin-counts.csv
│   ├── Spleen-counts.csv
│   ├── Thymus-counts.csv
│   ├── Tongue-counts.csv
│   └── Trachea-counts.csv
├── FACS.zip
├── annotations_FACS.csv
└── metadata_FACS.csv
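As a rough illustration of working with this layout, the sketch below derives the tissue name from a counts file name and reports the size of one table. Both function names are hypothetical. The CSV layout is an assumption (one header row of cell IDs, then one row per gene with the gene name in the first field), so inspect a file before relying on it.

```python
import csv
from pathlib import Path


def tissue_from_counts_file(path):
    """'FACS/Brain_Non-Myeloid-counts.csv' -> 'Brain_Non-Myeloid'."""
    name = Path(path).name
    suffix = "-counts.csv"
    if not name.endswith(suffix):
        raise ValueError(f"not a counts file: {name}")
    return name[: -len(suffix)]


def read_counts_shape(path):
    """Return (n_genes, n_cells) for one counts table.

    ASSUMPTION: the first row holds cell IDs and every later row is one
    gene, with the gene name in the first field.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        n_genes = sum(1 for _ in reader)
    return n_genes, len(header) - 1
```

For example, `tissue_from_counts_file("00_facs_raw_data/FACS/Aorta-counts.csv")` yields `"Aorta"`, which can serve as a key when looping over all twenty files.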

Droplet

Now your droplet folders should look like this:

01_droplet_raw_data
├── annotations_droplet.csv
├── droplet
│   ├── Bladder-10X_P4_3
│   ├── Bladder-10X_P4_4
│   ├── Bladder-10X_P7_7
│   ├── Heart_and_Aorta-10X_P7_4
│   ├── Kidney-10X_P4_5
│   ├── Kidney-10X_P4_6
│   ├── Kidney-10X_P7_5
│   ├── Limb_Muscle-10X_P7_14
│   ├── Limb_Muscle-10X_P7_15
│   ├── Liver-10X_P4_2
│   ├── Liver-10X_P7_0
│   ├── Liver-10X_P7_1
│   ├── Lung-10X_P7_8
│   ├── Lung-10X_P7_9
│   ├── Lung-10X_P8_12
│   ├── Lung-10X_P8_13
│   ├── Mammary_Gland-10X_P7_12
│   ├── Mammary_Gland-10X_P7_13
│   ├── Marrow-10X_P7_2
│   ├── Marrow-10X_P7_3
│   ├── Spleen-10X_P4_7
│   ├── Spleen-10X_P7_6
│   ├── Thymus-10X_P7_11
│   ├── Tongue-10X_P4_0
│   ├── Tongue-10X_P4_1
│   ├── Tongue-10X_P7_10
│   ├── Trachea-10X_P8_14
│   └── Trachea-10X_P8_15
├── droplet.zip
└── metadata_droplet.csv
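Several tissues have more than one 10X channel folder, so grouping folders by tissue is a common first step. This is a minimal sketch with hypothetical helper names; it assumes every folder follows the `<tissue>-10X_<channel>` pattern seen in the listing above.

```python
from collections import defaultdict


def split_channel_name(folder):
    """'Heart_and_Aorta-10X_P7_4' -> ('Heart_and_Aorta', '10X_P7_4')."""
    tissue, sep, channel = folder.partition("-10X_")
    if not sep:
        raise ValueError(f"unexpected folder name: {folder}")
    return tissue, "10X_" + channel


def channels_by_tissue(folders):
    """Group channel IDs under their tissue, preserving input order."""
    grouped = defaultdict(list)
    for folder in folders:
        tissue, channel = split_channel_name(folder)
        grouped[tissue].append(channel)
    return dict(grouped)
```

Passing the names returned by `os.listdir("01_droplet_raw_data/droplet")` to `channels_by_tissue` gives one entry per tissue with its list of channels.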

All of the *-10X_* folders contain a barcodes.tsv, genes.tsv, and matrix.mtx file, as output by Cell Ranger from 10x Genomics.

01_droplet_raw_data/droplet/Bladder-10X_P4_3
├── barcodes.tsv
├── genes.tsv
└── matrix.mtx
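To show how the three files fit together, here is a standard-library sketch that reads one channel folder. The function name `read_10x_folder` is hypothetical. It assumes matrix.mtx is a Matrix Market coordinate file with integer counts, genes as rows and barcodes as columns, and that the last tab-separated field of each genes.tsv row is the gene name. For real analyses a sparse-matrix reader such as `scipy.io.mmread` is the more practical choice.

```python
from pathlib import Path


def read_10x_folder(folder):
    """Return (genes, barcodes, counts) for one *-10X_* folder.

    counts maps (gene_index, barcode_index) -> count, with 0-based indices.
    Only non-zero entries are stored, mirroring the sparse file format.
    """
    folder = Path(folder)
    barcodes = (folder / "barcodes.tsv").read_text().split()
    # ASSUMPTION: the last tab-separated field of each row is the gene name.
    genes = [
        line.split("\t")[-1]
        for line in (folder / "genes.tsv").read_text().splitlines()
        if line
    ]

    counts = {}
    with open(folder / "matrix.mtx") as handle:
        # Skip the '%%MatrixMarket' banner, '%' comments, and blank lines.
        lines = (ln for ln in handle if ln.strip() and not ln.startswith("%"))
        n_rows, n_cols, _n_entries = map(int, next(lines).split())
        if (n_rows, n_cols) != (len(genes), len(barcodes)):
            raise ValueError("matrix shape does not match genes/barcodes")
        for line in lines:
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            counts[int(row) - 1, int(col) - 1] = int(value)
    return genes, barcodes, counts
```

A dictionary of non-zero entries keeps the example dependency-free; it is meant for inspecting a folder, not for holding all channels in memory.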

Folder Organization

  • FACS = SmartSeq2 on FACS-sorted plates
  • Microfluidic = 10x droplet-based unique molecular identifier (UMI)-barcoded transcripts and cells

tabula_muris/
    00_data_ingest/               # How the data was processed from gene-cell tables
        README.md
        download_robj.Rmd         # Download R objects for figures using this script
        02_tissue_analysis_rmd/                  # *Generate* R objects for figures yourself
            Aorta_facs.Rmd
            Brain-Non-microglia_facs.Rmd
            Brain-Microglia_facs.Rmd
            Bladder_facs.Rmd
            Bladder_droplet.Rmd
            Colon_facs.Rmd
            Heart_facs.Rmd
            Heart_droplet.Rmd
            ... more files ...
        03_tissue_annotation_csv/
            Aorta_facs_annotation.csv
            Brain-Non-microglia_facs_annotation.csv
            Brain-Microglia_facs_annotation.csv
            Bladder_facs_annotation.csv
            Bladder_droplet_annotation.csv
            Colon_facs_annotation.csv
            Heart_facs_annotation.csv
            Heart_droplet_annotation.csv
            ... more files ...
        04_tissue_robj_generated/
        10_tissue_robj_downloaded/
        11_global_robj/
        12_extract_number_of_genes_cells/
        13_ngenes_ncells_facs/
        14_ngenes_ncells_droplet/
        15_color_palette/
        16_genes_for_tissue_tsne/
        20_dissociation_genes/
        All_Droplet_Notebook.Rmd
        All_FACS_Notebook.Rmd
        Droplet_Notebook.Rmd
        FACS_Notebook.Rmd
        README.md
        cell_order_FACS.txt
        cell_order_droplets.txt
        download_data.sh
    01_figure1/                   # Overview + #cell barplots + #gene/#reads horizonplots
        README.md
        figure1{b-g}.ipynb
    02_figure2/                   # FACS TSNE plots + annotation barplots
        README.md
        figure2a.Rmd
        figure2b.Rmd
        figure2c.ipynb
    03_figure3/                   # All-cell clustering heatmap with dendrogram
        figure3.Rmd
    04_figure4/                   # Analysis of all T cells sorted by FACS
        figure4{a-d}.Rmd
    05_figure5/                   # Transcription factor expression analysis
        figure5.Rmd
    11_supplementary_figure1/     # Histograms of number of genes detected across tissues
    12_supplementary_figure2/     # FACS vs Microfluidics - # cells expressing a gene
    13_supplementary_figure3/     # FACS vs Microfluidics - # genes detected per cell
    14_supplementary_figure4/     # FACS vs Microfluidics - dynamic range
    15_supplementary_figure5/     # Microfluidics TSNE plots + annotation barplots
    16_supplementary_figure6/     # Analysis of dissociation-induced genes
    17_supplementary_figure7/     # Transcription factor enrichment in cell types

How to cite this dataset

If you find the Tabula muris data useful for your research, please cite our publication.

Contact

If you have questions about the data, you can create an Issue at the project repo on GitHub.

License

There are no restrictions on the use of data received from the Chan Zuckerberg Biohub, unless expressly identified prior to or at the time of receipt.

tabula-muris's People

Contributors

ahmetcansolak, akershner, anabhan123, aopisco, batson, biterbilen, gmstanle, jamestwebber, ktravaglini, melocactus, mnw97001, ndswxy, nschaum, olgabot, pknguyen1, sdarmanis, sikandars, taliram, transcriptomics, wwkongstanford, ytherookie

tabula-muris's Issues

Heart - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Heart (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Mammary - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Mammary (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Lung - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Lung (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Add Windows support for download script

The download script 00_data_ingest/download_data.sh uses command line tools like curl that may not be available on Windows. We'll need to update the instructions or the script to make it compatible with Windows, e.g. Windows 10.

Muscle - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Muscle (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Muscle - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Muscle (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Spleen - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Spleen (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Bladder - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Bladder (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Thymus - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Thymus (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Marrow - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Marrow (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Fat - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Fat (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Colon - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Colon (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Liver - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Liver (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Brain_Microglia - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Brain_Microglia (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Heart - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Heart (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Liver - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Liver (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Marrow - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Marrow (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Trachea - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Trachea (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Skin - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Skin (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Diaphragm - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Diaphragm (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Trachea - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Trachea (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Thymus - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Thymus (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Pancreas - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Pancreas (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Genes per cell comparison between methods

Comparing UMI and reads isn't straightforward. One might pick a cutoff for reads and for UMIs, but that would be arbitrary.

To see how sensitive the comparison is to these cutoffs, we could plot nGenes as a function of the read cutoff and the UMI cutoff in each dataset.
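The proposed sensitivity check could be sketched as follows. This is an illustration only: the function names are hypothetical, each cell is represented as a plain list of per-gene counts (reads for FACS, UMIs for droplet), and the median is just one possible summary across cells.

```python
from statistics import median


def genes_detected(cell_counts, cutoff):
    """Number of genes in one cell with a count of at least `cutoff`."""
    return sum(1 for count in cell_counts if count >= cutoff)


def detection_curve(cells, cutoffs):
    """For each cutoff, the median number of genes detected per cell."""
    return [
        (cutoff, median(genes_detected(cell, cutoff) for cell in cells))
        for cutoff in cutoffs
    ]


if __name__ == "__main__":
    # Toy data: two cells with four genes each (made-up numbers).
    toy_cells = [[0, 3, 12, 40], [1, 0, 7, 25]]
    for cutoff, n_genes in detection_curve(toy_cells, [1, 5, 10]):
        print(cutoff, n_genes)
```

Computing one curve per dataset and plotting them side by side shows how quickly nGenes drops as each cutoff rises, without committing to a single arbitrary threshold.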

Tongue - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Tongue (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Brain_Non-microglia - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Brain_Non-microglia (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Kidney - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Kidney (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Spleen - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Spleen (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Lung - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Lung (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Mammary - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Mammary (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Fix macrophage and endothelial annotations in bladder

Comment from bioRxiv

awesome job and the data is really great quality! one quick issue: in the bladder, the designation of macrophage and endothelial cell classes is reversed. once you fix that, the magnitude of organ specific variability per cell type is much less

Kidney - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Kidney (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Aorta - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Aorta (method=facs)
  • Added all figures that we want to reference in the tissue supplement

Bladder - droplet

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Bladder (method=droplet)
  • Added all figures that we want to reference in the tissue supplement

Tongue - facs

  • Re-ran PCA, clustering, cell ontology annotation and free annotation on Tongue (method=facs)
  • Added all figures that we want to reference in the tissue supplement
