easy-data

Easy access to benchmark datasets. Add your benchmarking desiderata and your datasets below.

benchmark for what?

Problems that benchmark datasets would let us experiment on:

  • cell type annotation and reannotation at various levels of ontological depth
  • building and validating cell type classifiers
  • manifold alignment and batch-effect-aware analyses
  • assessing the variability in gene expression of cell types present in many organs
  • measuring sex differences in gene expression
  • measuring the variability in biological claims (like which genes are differentially expressed between populations) to be expected between different studies of the same cell types

datasets

To add a dataset, just create a section with a description and links to download it.

How easy can you make it for someone to get started?

tabula muris

Tabula Muris contains about 100,000 cells from 20 organs and tissues in mouse. The study is sex-balanced, with four male and four female mice. The organs include skin, fat, mammary gland, heart, bladder, brain, thymus, spleen, kidney, limb muscle, tongue, marrow, trachea, pancreas, lung, large intestine, and liver. Many of these organs were processed using two methods: SMART-seq2 on FACS-sorted cells and microfluidic droplets from 10X Genomics.

Below are instructions for getting four files: metadata (including annotations) and count data for each of the two datasets (FACS and droplet).

metadata

Version-controlled metadata are available on GitHub.

TM_droplet_metadata.csv

TM_facs_metadata.csv

count files for R

You can download complete count files as sparse matrices in .rds format for easy loading into R. After unzipping TabulaMuris.zip into a data directory, load the droplet matrix and its metadata:

library(here)
library(readr)
tm.droplet.matrix = readRDS(here("data", "TM_droplet_mat.rds"))
tm.droplet.metadata = read_csv(here("data", "TM_droplet_metadata.csv"))

count files for Python

You can download complete count files as sparse matrices in AnnData-formatted h5ad files for use in Python here. You can load them using the Scanpy library:

import pandas as pd
import scanpy as sc

# Metadata, including annotations
tm_facs_metadata = pd.read_csv('data/TM_facs_metadata.csv')
# Count data as an AnnData object
tm_facs_data = sc.read_h5ad('data/TM_facs_mat.h5ad')
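Once both files are loaded, it can help to confirm that the metadata rows line up with the cells in the count matrix. The sketch below is one way to do that with pandas alone. The helper name `align_metadata` is hypothetical, and the `cell` column name is an assumption about the metadata CSV, so check the header of your file first. The toy data stands in for `tm_facs_data.obs_names` and `tm_facs_metadata`.

```python
import pandas as pd


def align_metadata(obs_names, metadata, key="cell"):
    """Reorder metadata rows to match the order of cells in a count matrix.

    `key` is the metadata column holding cell identifiers. The default
    "cell" is an assumption; check the CSV header before relying on it.
    """
    indexed = metadata.set_index(key)
    missing = [name for name in obs_names if name not in indexed.index]
    if missing:
        raise KeyError(f"{len(missing)} cells have no metadata, e.g. {missing[0]}")
    # .loc with a list returns rows in the requested order.
    return indexed.loc[list(obs_names)]


# Toy stand-ins for the cell names of a count matrix and a metadata table.
toy_obs_names = ["c2", "c1"]
toy_metadata = pd.DataFrame(
    {"cell": ["c1", "c2", "c3"], "tissue": ["Liver", "Lung", "Heart"]}
)
aligned = align_metadata(toy_obs_names, toy_metadata)
print(aligned["tissue"].tolist())
```

With the real data, the call would look like `align_metadata(tm_facs_data.obs_names, tm_facs_metadata)`, provided the identifiers in the two files use the same format.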

CSV and MTX files

The original data release is on FigShare.
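For the MTX files, a minimal loading sketch in Python follows. It assumes a directory laid out in the common 10x Genomics style (`matrix.mtx`, `genes.tsv`, `barcodes.tsv`, with genes as rows and cells as columns). The file names in the FigShare release may differ, and `load_mtx_dir` is a hypothetical helper, not part of any library. The example round-trips a toy matrix through a temporary directory so it runs without the real data.

```python
import csv
import os
import tempfile

import scipy.io
import scipy.sparse


def read_first_column(path):
    """Return the first tab-separated field of every non-empty line."""
    with open(path, newline="") as handle:
        return [row[0] for row in csv.reader(handle, delimiter="\t") if row]


def load_mtx_dir(path):
    """Load matrix.mtx plus gene and barcode lists from one directory.

    The three file names are assumptions based on the usual 10x layout.
    """
    matrix = scipy.sparse.csr_matrix(
        scipy.io.mmread(os.path.join(path, "matrix.mtx"))
    )
    genes = read_first_column(os.path.join(path, "genes.tsv"))
    barcodes = read_first_column(os.path.join(path, "barcodes.tsv"))
    if matrix.shape != (len(genes), len(barcodes)):
        raise ValueError("matrix shape does not match gene and barcode lists")
    return matrix, genes, barcodes


# Round-trip a toy 3-gene x 2-cell matrix through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    toy = scipy.sparse.coo_matrix([[1, 0], [0, 2], [3, 0]])
    scipy.io.mmwrite(os.path.join(tmp, "matrix.mtx"), toy)
    with open(os.path.join(tmp, "genes.tsv"), "w") as handle:
        handle.write("g1\ng2\ng3\n")
    with open(os.path.join(tmp, "barcodes.tsv"), "w") as handle:
        handle.write("cellA\ncellB\n")
    matrix, genes, barcodes = load_mtx_dir(tmp)

print(matrix.shape, genes, barcodes)
```

Converting to CSR keeps the counts sparse while allowing row slicing. The shape check catches a mismatch between the matrix and its labels early.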
