sandberg-lab / spreading-correction
Supplementary information to "Computational correction of index switching in multiplexed sequencing libraries" (Larsson et al. 2018).
License: MIT License
Hi,
Could you please explain in more detail how to define [--i5 STRING] and [--i7 STRING]? I am using read counts from a 384-well plate, with cell barcode names as the column names.
Thanks in advance!
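A common first step toward answering the --i5/--i7 question is to record, for every cell barcode, which plate row and column it occupies. On many plate-based protocols one index varies along the rows and the other along the columns. The sketch below assumes a standard 384-well layout (rows A to P, columns 1 to 24) and a hypothetical barcode-to-well table. Which of i5/i7 corresponds to rows is an assumption that depends on the library prep, and the exact string format that --i5/--i7 expect is not stated here.

```python
import re

# Standard 384-well layout: 16 rows (A-P) x 24 columns (1-24).
ROWS = "ABCDEFGHIJKLMNOP"
N_COLS = 24


def well_to_indices(well: str) -> tuple[int, int]:
    """Return 0-based (row, column) indices for a well name such as 'B07'."""
    m = re.fullmatch(r"([A-Pa-p])(\d{1,2})", well.strip())
    if m is None:
        raise ValueError(f"not a 384-well name: {well!r}")
    row = ROWS.index(m.group(1).upper())
    col = int(m.group(2)) - 1
    if not 0 <= col < N_COLS:
        raise ValueError(f"column out of range: {well!r}")
    return row, col


# Hypothetical mapping from cell barcode names to wells; in practice this
# comes from the plate layout used during library preparation.
barcode_to_well = {"cell_001": "A1", "cell_002": "A2", "cell_384": "P24"}

for barcode, well in barcode_to_well.items():
    row, col = well_to_indices(well)
    print(barcode, well, row, col)
```

With the row and column of each cell known, the i5 and i7 index of each barcode follows from whichever index set was dispensed along each plate axis.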
Hello,
I want to apply this tool to rescue our bacterial single-cell data, which showed extremely high levels of cross-contamination when multiplexed on the HiSeq X Ten. Since our libraries are based on Multiple Displacement Amplification (MDA) products, they have highly uneven coverage, similar to transcriptome data, and simple coverage cutoffs are not enough to identify cross-contaminants.
It is also not enough to simply identify contigs occurring in multiple samples and attribute each one to the library where it has the highest coverage, since I may have several single cells originating from the same species. So your tool seems to be a blessing here.
My plan for applying your workflow to this contamination problem is as follows:
1.) Assemble the data of each library separately.
2.) Cluster the contigs of all assemblies using strict identity cutoffs, to obtain representative, mappable consensus contigs for each (potential) contaminant.
3.) Map reads onto the clustered assemblies to obtain coverage values for each contig cluster in each library, and create input files similar to your transcriptome input.
4.) Use your script to correct the coverage data for cross-contamination, and then filter contaminating contigs from my datasets based on coverage cutoffs.
However, since the contigs differ in size (and are unlikely to be complete in all cross-contaminated samples), I do not want to use simple read counts per contig, but rather average coverage values (e.g., mean read coverage per base position). These would mostly be decimal values.
Since your example input seems to consist exclusively of integer values: does your script also accept decimal/float coverage values in the input table? Or do I need to round such data? Or would you recommend a completely different approach here?
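Mean per-base coverage for a contig is the sum of per-base read depths divided by the contig length, which is why it is usually non-integer. A minimal sketch, assuming per-base depths in the three-column layout (reference, position, depth) that `samtools depth` writes. Because `samtools depth` omits zero-depth positions unless asked to report them, the sketch divides by the full contig length rather than by the number of listed positions. The function name is hypothetical, and whether unspread.py accepts float values is not established here.

```python
from collections import defaultdict


def mean_coverage(depth_lines, contig_lengths):
    """Mean per-base read depth for each contig.

    depth_lines: iterable of 'contig<TAB>position<TAB>depth' strings.
    contig_lengths: dict of contig name -> length in bases (must be > 0).
    Positions missing from depth_lines count as depth 0; contigs not listed
    in contig_lengths are ignored.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        contig, _pos, depth = line.rstrip("\n").split("\t")
        totals[contig] += int(depth)
    return {c: totals[c] / length for c, length in contig_lengths.items()}


lengths = {"clusterA": 4, "clusterB": 2}
depth = ["clusterA\t1\t3", "clusterA\t2\t5", "clusterA\t4\t2", "clusterB\t1\t1"]
print(mean_coverage(depth, lengths))  # {'clusterA': 2.5, 'clusterB': 0.5}
```

If integer input turns out to be required, rounding is a lossy step; note that Python's built-in `round()` rounds halves to the nearest even integer, so `round(2.5)` gives 2.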
Hello!
We would like to use your tool to check some of our data, as we (probably) had some cross-contamination. Our data come from 16S amplicon analysis to characterize the microbial community. Within the data, some samples have a very high count of certain ASVs, some have very few, and some (should) have none. We believe that the unspread.py script might help us solve this issue.
As far as we understand, we can specify the number of samples by giving the number of rows and columns. However, the whole sequencing plate was not used when the samples were sequenced. Thus, some combinations of the two indices are empty, and we get the error message: "number of cells in count file not same as specified". We don't know how to fix this, since padding the missing combinations with zeros could distort the regression. Do you have any suggestions for handling this situation? That would be very much appreciated.
Thank you very much in advance!
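Before deciding how to handle the gaps, it can help to list exactly which index combinations are empty, since that shows how many cells of the expected rows × columns grid are missing from the count file. A minimal sketch, assuming the sequenced samples are available as (i5, i7) pairs; the index names and the helper function are hypothetical, and the sketch does not settle whether zero-padding is appropriate for the tool's regression.

```python
from itertools import product


def missing_combinations(used_pairs, i5_indices, i7_indices):
    """Return the (i5, i7) pairs of the full grid that have no sample."""
    used = set(used_pairs)
    return [pair for pair in product(i5_indices, i7_indices) if pair not in used]


i5 = ["i5_1", "i5_2"]
i7 = ["i7_1", "i7_2", "i7_3"]
samples = [("i5_1", "i7_1"), ("i5_1", "i7_2"), ("i5_2", "i7_1")]

gaps = missing_combinations(samples, i5, i7)
print(f"{len(gaps)} of {len(i5) * len(i7)} combinations are empty: {gaps}")
```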
Hi, where can I find the data used for the correction, mHSC_plate1HiSeq_counts_IndexInfo.csv and mHSC_plate1NextSeq_counts_IndexInfo.csv?
Many thanks.
Hello, Anton.
We would like to apply this tool to our genome resequencing data obtained on an Illumina NovaSeq 6000. Do you have any suggestions on how to calculate the read counts for each cell in WGS? What about read depth at specific positions along the chromosomes?
Dear all,
In a typical Illumina sequencing run, at which point do you obtain the equivalent of the "mHSC_plate1HiSeq_counts_IndexInfo_anon.csv" file you are using as input?
Is it a raw file you retrieve at a given step?
Is the file created from the concatenation / parsing of many files?
I have read the Nature Methods article and the Git repo, but I am still unsure about this.
My wet-lab team is asking for more details on how to obtain such a file, hence the questions above.
Best regards