jdblischak / singlecellseq

Batch effects and the effective design of single-cell gene expression studies

Home Page: http://jdblischak.github.io/singleCellSeq/analysis

License: Other

Shell 7.49% Python 29.62% R 49.53% Makefile 0.84% TeX 12.36% Stan 0.16%

singlecellseq's Introduction

SingleCellSeq

Important links

See the site to view the results.

Read the paper.

Download the raw FASTQ files at GEO record GSE77288.

See the contributing guidelines to add a new analysis.

Project organization

  • analysis/ contains the R Markdown files documenting the results
  • paper/ contains the R Markdown files used to write the paper
  • code/ contains command-line scripts
  • data/ contains various summarized data files
  • tests/ contains some sanity checks

Most of the generated content, e.g. figures and HTML files, is in the analysis/ subdirectory of the gh-pages branch.

Useful data files

  • reads.txt - tab-delimited file containing read counts for the 19,027 Ensembl protein-coding genes with at least one observed read in at least one of the 864 single-cell samples
  • molecules.txt - same as reads.txt, but for the molecule counts
  • reads-filter.txt - tab-delimited file containing read counts for the 13,106 Ensembl protein-coding genes that passed our expression-level filter, for the 564 single cells that passed our quality-control filters
  • molecules-filter.txt - same as reads-filter.txt, but for the molecule counts
  • molecules-final.txt - tab-delimited file containing the final gene expression values after all of our processing steps
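
As a rough illustration of working with these tab-delimited count files, the sketch below parses a small count table and keeps only genes with at least one read in at least one sample, which is the criterion stated for reads.txt. The layout (a header row of sample names, then one row per gene with the gene ID in the first column) is an assumption and is not confirmed by this README; the function names and the toy gene and sample names are hypothetical.

```python
import csv
import io


def read_counts(handle):
    """Parse a tab-delimited count table into (samples, {gene: [counts]}).

    Assumed layout (not confirmed by the README): a header row of sample
    names, then one row per gene with the gene ID in the first column.
    The header may or may not carry a label for the gene ID column.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # If the header is as long as a data row, its first field labels the
    # gene ID column and is dropped; otherwise every field is a sample name.
    if body and len(header) == len(body[0]):
        samples = header[1:]
    else:
        samples = header
    counts = {row[0]: [int(value) for value in row[1:]] for row in body}
    return samples, counts


def drop_unobserved(counts):
    """Keep genes with at least one read in at least one sample."""
    return {gene: values for gene, values in counts.items() if any(values)}


# Toy table with made-up gene and sample names.
toy = io.StringIO(
    "gene\tcell_1\tcell_2\n"
    "gene_a\t0\t3\n"
    "gene_b\t0\t0\n"
    "gene_c\t5\t1\n"
)
samples, counts = read_counts(toy)
observed = drop_unobserved(counts)
print(samples, sorted(observed))
```

To try this on a real file, pass an open file handle instead of the in-memory toy table, after checking that the file matches the assumed layout.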

singlecellseq's People

Contributors

davidaknowles, jdblischak, jhsiao999, pytung

singlecellseq's Issues

Cell cycle score

Hello,

I am trying to use the estimator for cell cycle phase.
Can you give more information on how you estimate the cell cycle phase? The method looks very similar to the CellCycleScoring function from Seurat. Is that the case? Is your estimation based on a published method? I didn't find any information in the Scientific Reports paper.

Thanks,
Aurélie

Publish data on bioconductor?

Dear jdblischak,

I was wondering if you could create a data package on Bioconductor containing the processed data. The reason I am asking is that I wrote an R package for the quantification of batch effects (https://github.com/theislab/kBET), for which I created a vignette using your processed data. The Bioconductor guidelines require experimental data to be stored separately, and I assume you would be unhappy if I were to publish your processed data on Bioconductor myself.
It would certainly help me, and it would give you the opportunity to reach a broader audience.

Best regards

A query about the usage of umitools

Hello!

I am trying to reproduce the read mapping step described in your excellent paper "Batch effects and the effective design of single-cell gene expression studies", and I am looking at the file code/trim.sh.

For the sentence in that paper:

To handle the UMI sequences at the 5′ end of each read, we used umitools [39] to find all reads with a UMI of the pattern NNNNNGGG (reads without UMIs were discarded).

May I ask how to discard reads without UMIs (is it done with a certain option in umitools)? I observed that in some reads the first 8 bp do not fully match "NNNNNGGG", because the last 3 are not all "G". Are those the reads without UMIs that you mentioned in the paper?

By the way, umitools seems to have been updated, and the original code/trim.sh may no longer work. Would you mind showing me how to carry out that step from the paper with the latest version of umitools?

Thanks a lot!
Wenhao
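
For illustration only, the pattern check described in the quoted sentence can be sketched in plain Python. This is not umitools and not the code in code/trim.sh. It shows one way to test whether a read starts with five UMI bases followed by GGG and to discard reads that do not. Restricting UMI bases to A, C, G and T is an assumption of this sketch, and the function name and example reads are hypothetical.

```python
import re

# Five UMI bases followed by three G's at the 5' end of the read.
# Restricting UMI bases to A/C/G/T (no N calls) is an assumption here.
UMI_PATTERN = re.compile(r"^([ACGT]{5})GGG")


def split_umi(sequence):
    """Return (umi, remaining_read) if the read matches NNNNNGGG, else None."""
    match = UMI_PATTERN.match(sequence)
    if match is None:
        return None
    return match.group(1), sequence[match.end():]


# Made-up reads: a match, a read whose last three bases are not all G,
# and a read with an ambiguous base inside the UMI.
reads = ["ACGTAGGGTTTTCCCC", "ACGTAGGATTTTCCCC", "ACNTAGGGTTTTCCCC"]
kept = [result for result in map(split_umi, reads) if result is not None]
print(kept)
```

Under this reading, a read whose first eight bases end in anything other than GGG is treated as lacking a UMI and is dropped; whether that matches the behavior used for the paper is the question posed above.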

A query about dealing with bam files

Hi,

Following the "Read mapping" section in your paper, I have now successfully obtained 2592 BAM files for the single cells.

I am going to do the next step:

To obtain gene-level counts, we assigned reads to protein-coding genes (Ensembl GRCh37 release 82) and the ERCC spike-in genes using featureCounts.

Looking at the file singleCellSeq/code/Snakefile, it seems that I need to merge, sort and index those BAM files.

Before those steps, should I first merge the BAM files so that each BAM file represents one cell? (Currently I have 2592 BAM files. Since there are 864 cells, I guess that every three BAM files correspond to one cell. Does three mean that the library was sequenced in three lanes?)

Thank you very much!

Best wishes,
Wenhao
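
The arithmetic in this question (2592 files for 864 cells, so three files per cell) can be checked with a small grouping sketch. The file naming scheme below, `<cell>.<lane>.bam`, is hypothetical and is not taken from the repository. Whether the three files per cell correspond to sequencing lanes is the open question above and is not settled by this example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_cell(paths):
    """Group BAM paths by cell ID.

    Assumes a hypothetical naming scheme '<cell>.<lane>.bam'; adapt the
    parsing to the real file names before use.
    """
    groups = defaultdict(list)
    for path in paths:
        name = PurePosixPath(path).name
        cell_id, _lane, _ext = name.rsplit(".", 2)
        groups[cell_id].append(path)
    return dict(groups)


# Made-up file names: two cells, three files each.
paths = [
    f"bam/{cell}.{lane}.bam"
    for cell in ("cell_A01", "cell_A02")
    for lane in ("lane1", "lane2", "lane3")
]
groups = group_by_cell(paths)
files_per_cell = {cell: len(files) for cell, files in groups.items()}
print(files_per_cell)

# The counts from the question: 2592 files over 864 cells.
print(2592 // 864, 2592 % 864)
```

Each group could then be handed to a merge step so that one BAM file remains per cell.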

Overexpressed genes

Do we trust OEFinder? Shall we run downstream analysis after removing these genes?
