Comments (13)
Lowering -x to 100 is likely the cause of the error. If you need a smaller dataset, you might try simulating a larger one with the default LRSIM parameters and then using a random subset of the simulated reads sampled from the whole fastq file. LongRanger checks the distribution and depth evenness of the barcodes very stringently. Changing the default parameters in LRSIM and using -o to disable parameter checking can make the simulated dataset look unreliable to LongRanger.
from lrsim.
Thanks for your answer!
Yeah, I could also do that, but the reason I lowered -x was actually that the size of the generated data was getting pretty big. I'm not exactly sure how far I got through the simulation process, but it grew to a little more than 500 GB. Since I don't have access to lots of disk space, I thought lowering -x was a good compromise.
I'll try running again and leave -x at its default value then. Do you have any idea how much disk space it is going to use in total when running on E. coli? I would just like to be sure it's not going to fill the disk space I have left, since I have other experiments running in parallel that also require a little disk space.
Thanks again.
Pierre
from lrsim.
You could try running the test.sh script in the test folder; it provides an example on E. coli. The E. coli reference is already in the folder, so all you need to do is run the test.sh script.
from lrsim.
I ran a full experiment with default parameters on E. coli last night. It completed successfully in a few hours and needed around 700 GB of disk space. However, I did not use the parameters specified in test.sh because I did not want any SVs to be included in the data (it might sound weird, but I'm interested in seeing how SV-calling tools, especially the one I'm working on, behave on datasets with no SVs). The command I used was the following: perl simulateLinkedReads.pl -r Ecoli.fasta -p /scratch/pmorisse/LRSIM/Ecoli/SimEcoli -n
I then used seqtk to randomly subsample the fastq files, and ran the LongRanger alignment on the subsampled fastq files. Their total size was around 7 GB, which seems like a reasonable coverage for a small test experiment.
However, I still got the same error, and LongRanger reported that an extremely high rate of incorrect barcodes was observed.
Am I forced to perform LongRanger alignment with the whole 700 GB fastq file generated with LRSIM? I'm afraid I won't have enough disk space if I have to do so. Or might it be because I deactivated SV simulation?
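For readers who want to reproduce the subsampling step without seqtk, here is a minimal Python sketch. It assumes uncompressed four-line FASTQ records and that the two mate files list reads in the same order; the function names and file names are hypothetical and not part of LRSIM or seqtk. Drawing one random number per pair keeps both mates together, which is the same property you get from running seqtk sample with an identical seed on both files.

```python
import random


def read_fastq(handle):
    """Yield one FASTQ record at a time as a tuple of its four lines."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # empty string means end of file
            return
        yield record


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, fraction, seed=11):
    """Keep each read pair with probability `fraction`.

    Assumption: both inputs are uncompressed and list mates in the same order.
    Returns the number of pairs written.
    """
    rng = random.Random(seed)
    kept = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
            # One draw per pair, so both mates are kept or dropped together.
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

A call such as subsample_pairs("R1.fastq", "R2.fastq", "sub_R1.fastq", "sub_R2.fastq", fraction=0.01) would keep roughly 1% of the pairs. Note that sampling individual read pairs thins out every barcode, so it does not by itself guarantee that LongRanger's barcode checks will pass.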
from lrsim.
I just ran test.sh and provided the generated data to LongRanger.
It crashed again, and output a different error message:
Log message: stage error:FASTQ parsing error: input fastq not consistent
from lrsim.
It ran well on my side. I uploaded the files generated at http://www.bio8.cs.hku.hk/lrsim/.
from lrsim.
Just downloaded and tested with your data, and got the same error.
Might be something to do with LongRanger I guess? Can you tell me which version you are using?
from lrsim.
That might be why; I'm using LongRanger 2.2.2.
LongRanger 2.0 does not seem to be available for download on the 10x Genomics website, though.
from lrsim.
I managed to pin down the problem.
As mentioned in a previous issue, this was caused by the "/1" and "/2" located at the end of the names of the reads simulated by LRSIM, which seem to be incompatible with LongRanger. Removing them and re-running LongRanger seemed to fix the problem with the data generated by the test.sh script.
I also tried generating more data, using most of the parameters mentioned in test.sh but deactivating SV simulation, and all seems to work well. LongRanger is still running, but has not reported any error.
I believe my initial problem with the high rate of incorrect barcodes was due to the fact that I was using -x 1 without decreasing the -t parameter accordingly.
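The fix described above, removing the trailing "/1" and "/2" from read names, can be sketched in Python as follows. This is an illustration only, not the script used in this thread: it assumes uncompressed four-line FASTQ input, and the function names are hypothetical.

```python
def strip_mate_suffix(header):
    """Remove a trailing "/1" or "/2" from the read name in a FASTQ header line.

    The read name is taken to be everything before the first space; any
    comment that follows it is left untouched.
    """
    name, sep, rest = header.rstrip("\n").partition(" ")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name + sep + rest + "\n"


def clean_fastq(in_path, out_path):
    """Copy a FASTQ file, rewriting only the header line of each record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for index, line in enumerate(src):
            # Headers are every fourth line, starting with the first one.
            # Using the position avoids mistaking a quality line that
            # begins with "@" for a header.
            dst.write(strip_mate_suffix(line) if index % 4 == 0 else line)
```

Running clean_fastq once on each mate file (for example on hypothetical files R1.fastq and R2.fastq) gives both files identical read names for each pair.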
from lrsim.
Closing since the problem is solved.
from lrsim.