
sandberg-lab / spreading-correction


Supplementary information to "Computational correction of index switching in multiplexed sequencing libraries" (Larsson et al. 2018).

License: MIT License

Jupyter Notebook 97.52% Python 2.48%
single-cell-sequencing bioinformatics

spreading-correction's Issues

[--i5 STRING] [--i7 STRING]

Hi,

Could you please explain in more detail how to define [--i5 STRING] and [--i7 STRING]? I am using read counts from a 384-well plate, with cell barcode names as the column names.

Thanks in advance!
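
The thread does not state what the two STRING arguments should contain, so the following is only a minimal sketch of a preparatory step: recovering the plate row and column for each column name. It assumes the column names are well IDs such as "A1" or "B07" on a 16 x 24 plate, and that one index varies along rows and the other along columns. Both are assumptions about the plate layout, not documented behaviour of the tool, and `well_to_indices` is a hypothetical helper.

```python
import string

ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
N_COLS = 24                         # columns 1..24

def well_to_indices(well):
    """Split a well ID such as 'B07' into 0-based (row, column) positions."""
    row_letter, col_number = well[0].upper(), int(well[1:])
    if row_letter not in ROWS or not 1 <= col_number <= N_COLS:
        raise ValueError(f"not a 384-well position: {well!r}")
    return ROWS.index(row_letter), col_number - 1

# Hypothetical column names taken from a count table header.
wells = ["A1", "B07", "P24"]
layout = {well: well_to_indices(well) for well in wells}
```

If the column names are barcode sequences rather than well IDs, a lookup table from barcode to well would be needed first.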

Tips for applying "Spreading-Correction" on Single Cell Genome Assemblies?

Hello,
I want to try this tool to rescue our bacterial single-cell data, which showed extremely high levels of cross-contamination when multiplexed on the HiSeq Xten. Since our libraries are based on Multiple Displacement Amplification (MDA) products, they have highly uneven coverage, similar to transcriptome data, and simple coverage cutoffs are not enough to identify cross-contaminants.
It is also not enough to identify contigs occurring in multiple samples and attribute them to the one library where they have the highest coverage, since I may have a few single cells originating from the same species. So your tool seems to be a blessing here.

My plan for applying your workflow to this contamination problem is as follows:
1.) Assemble the data of each library separately.
2.) Cluster the contigs of all assemblies using strict identity cutoffs, to obtain representative, mappable consensus contigs for each (potential) contaminant.
3.) Map reads onto the clustered assemblies to obtain coverage values for each contig cluster in each library, and create input files similar to your transcriptome input.
4.) Use your script to correct the coverage data for cross-contamination, and then filter contaminating contigs from my datasets based on coverage cutoffs.

However, since the contigs differ in size (and are unlikely to be complete in all cross-contaminated samples), I do not want to use simple read counts per contig, but rather average coverage values (e.g. mean read coverage per base position). This would mostly result in decimal values.

Since your example input seems to consist exclusively of integer values: does your script also accept decimal/float coverage values in the input table? Or do I need to round such data? Or would you recommend a completely different approach here?
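
Whether the script accepts floats is not answered in this thread. As a minimal sketch of the quantities discussed above, the following computes mean coverage per base for a contig and, in case integers turn out to be required, converts coverages to integer pseudo-counts. The function names and the scaling factor are assumptions chosen for illustration; scaling before rounding is simply one way to avoid collapsing low coverages to zero.

```python
def mean_coverage(mapped_bases, contig_length):
    """Mean read coverage per base position: mapped bases / contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return mapped_bases / contig_length

def to_pseudo_counts(coverages, scale=100):
    """Scale coverages by a constant, then round to the nearest integer.

    The same scale must be used for every library so that ratios
    between libraries are preserved (up to rounding).
    """
    return [int(round(c * scale)) for c in coverages]

# Hypothetical contig cluster: mapped bases per library, contig length 1000.
coverages = [mean_coverage(b, 1000) for b in (1500, 123, 0)]
counts = to_pseudo_counts(coverages)
```

Rounding unscaled coverages directly would turn a coverage of 0.123 into 0, which is why the sketch scales first.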

Incomplete sequencing plate / incomplete index values

Hello!
We would like to use your tool to check some of our data, as we (probably) had some cross-contamination. Our data comes from 16S amplicon analysis of a microbial community. Within the data, some samples have a very high count of certain ASVs, some have very few, and some should have none. We believe that the unspread.py script might help us solve this issue.

As far as we understand, we can specify the number of samples by giving the number of rows and columns. However, when the samples were sequenced, the sequencing plate was not fully used. Thus, some combinations of the two indices are empty and we get the error message: "number of cells in count file not same as specified". We don’t know how to fix that, as filling up with 0’s could distort the regression. Do you have any suggestions for handling this situation? That would be much appreciated.
Thank you very much in advance!
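
Before deciding how to handle the gaps, it can help to list exactly which index combinations are absent from the count file. The sketch below does only that; it does not address whether zero-filling is statistically appropriate, and `missing_combinations` is a hypothetical helper, not part of unspread.py.

```python
from itertools import product

def missing_combinations(row_indices, col_indices, observed):
    """Return the (row, column) index pairs of a full plate that are not observed."""
    expected = set(product(row_indices, col_indices))
    return sorted(expected - set(observed))

# Hypothetical 2 x 2 plate where one well was left empty.
observed = [("A", 1), ("A", 2), ("B", 1)]
gaps = missing_combinations(["A", "B"], [1, 2], observed)
```

If the gaps form whole rows or columns, specifying a smaller plate may be an option; if they are scattered, the question of how to fill them remains open.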

Data availability

Hi, where can I find the data used for the correction, mHSC_plate1HiSeq_counts_IndexInfo.csv and mHSC_plate1NextSeq_counts_IndexInfo.csv?
Many thanks.

IndexInfo file

Dear all,

In a typical Illumina sequencing run, where do you obtain the equivalent of the "mHSC_plate1HiSeq_counts_IndexInfo_anon.csv" file that you use as input?

Is it a raw file you retrieve at a given step?
Is the file created from the concatenation / parsing of many files?

I did read the Nature Methods article and the Git repo but I am still unsure about this.
My wet lab team is asking for more details to retrieve such a file, thus the questions above.

Best regards
