aquaskyline / lrsim
10x Genomics Reads Simulator
License: MIT License
If I set -x 10, LRSIM gets stuck at 900000 reads remaining. I think it goes into an infinite loop.
It works well with -x 9 or other values.
Hi,
We are trying to test LRSIM on sample data generated by concatenating the short scaffolds of a fragmented draft genome, so that all scaffolds are now longer than 150 kb.
When we launch it, this error immediately stops the program:
Terminate program as it could not find a non overlapping region
The sequences we have generated have a characteristic which might cause the error: as you can see, there are many runs of N in the middle of the scaffolds.
AAAAAAAATAATAAATAAAATTTTTTTTATGAATTATTTTCCCTAAATTTTACGTGGGATTTTAAAGAGTTTTATCTGCTTATATGTAATTTATGAAACATTTTATCAACTTATATGTAATGTTTGACAAATTTGTTCTAATAAATCAAATAAGTTACCAAAATAATATTAAAATGAAATAGTTTGATCAATATTAATAAACTACAAATGTTACGGGATGGACTCTAGAATCGTTATTAGATTTTCAACAATTGTTTTTTTGAGTGTAATTTGTGTACTAGACTTTTGATTGTATTGTTTTAATGAGTTTTAATAAGTATACTTGCTTTTTTGACTGTACTGGGTTTAATGAATGTAAAAGGTTTAATGTTTTTATTGGCTTAATTAGTGTACTGAGTTTAACGAATGTATTTATTGGGTTTATTCTATGATTTTAATAACTGTATTGTTTTATTGTATGAAACTATGGTGGTTTTCTGTAACAAAAATTCTATTGGCTTTCTGCATAAGACTCATTTGATTTAATGAAATTAGCCGACATTTTATCAACAATGTGGTTTTATGTGAATGACATTACTGTATGATATTAGTACATATTTTTATGTTTTAGTAAACTTATTAAGATTTGTGTATAATTGTATAAGATAAGTGTATATCGTATTGAAATTAATACTTATTGTAATGAAATGAGCAATACAATTATTGAAATTACTATTTACTTATGAAATTTATGTGTATAATTTATTGGAATGATACGTTACTATTAAAATCTATGTATATTTTTAAATATGTATTGAATGTATTGCATTGATAGAACTACATACATTGATTGATTTAAGAGAGCGTATTAAATGAGTGAATTTGATGAATTGGTTGAGTATATTGTGTGAGTAGAGTCAGTGCATTGAGTGAATTTGATGGATTAAATTGGTTGAATGTAATGAATGCATTGTTATATTGAATTCTATTCATTTGATCAAATGCTATGAGATGATTGAATAGATAATTGAATGAAGTAAGTGTATCAAATAAATAATAATAGAATCAAGTTGTTCCATATCAACTTCTAGTAAATATTTGAAATATTATCTTGAATTCTAATAAATATTTGAAAATTTTCATTGACTTCTATGTAATTTTTGAAAACTATTTCCCATGTTTTACGTGGGATTGTGAAATATTTTATCAACTGAAATCTAAAAGCCTAAAACCTTAATGATACTTTAAAAATTTAAAAAGCTCTAAAAATAAAACTTCAAAATTACAATGGGCCATTCAAACAATTTTCAATATTTACATTACTTTTAATTTTGAAAGACTGTTTATTCCTGTTCAAATGTGAATCCATCAATTAATTTAAATTTTAAAACATTGCTTTTACAATTGTATAACAAGCGATAAAACCCTATAAAATCCTATACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAAGCTCTAAAAATAAAACTTCAAAATTAAAATGGGCCATTCAAACAATTTTCAATATTTACATTACTTTTAATTTTGAAAGACTGTTTATTCCTGTTCAAATGTGAATCCATCAATTAATTTAAATTTTAAAACATTGCTTTTACAATTGTATAACAAGCGATAAAACCCTATAAAATCCTATACTAACACTATCATAAACCCAAAAGGGCCCTATAGCACTCCATTAAGACCAGATAAACGTCTAAGAAAACCCATAAAACCTTATTCAACATACACAATCCAACAGTCTAATAAACACTTTCAAAGGATTATATCATGTAAGTGGCAACAAACAAAAGAACCATTAATGAGAGTTTAGGCAAAAATACGCATGAGTCTTAGATAACTTTTAATCGGTTATTGATTATTATCATTGATTATTATAATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATTATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATTGATTATTATCATTGATTATTATCATTGATTATTATAATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATCATTGATTATTAAAATTGATTATTATTGATTATTATTATTATTATTGATTATTATTATTATTATGATTATTATTGATATTGAATATTATTGATTATTGATTATTATCATTGATTATTATCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTATTGTTATTATTATTATTATTTTGCAGAAGATTGGAATTCTGCAAATCTTCCCCTTGAAATAATGTTACAAGTTATTTATATGTCATTCGATCTGATTCTTTCAAAAGAAAGTACACAGATCAATGGAATAATTTGTTTTTTTGACCTAGAAGGCTGTGATAGAAAATCTTTGGAAACATGGTCCGATCCACAACTGTTAAAATCGATAAACAGAATATGGCAAGTAAAAAGTATTATGAAAATTATAATAATTGTTTTGATTAGGATGCTTTTCCAATAAGAATGAAAGGGATAATCTATTACAAAGCTCCAACAATATTTAATGTGGTGTTGAAAATTATCAAGTTTTTTATAGCAGAAAAACACAAACAAAGAATGTTTCAAATTGAGAACCTGGAAAATTTATTTAGTAAGAATCTAGGTTTGGATGAGATCATGCCAATTGAGTATGGAGGAAAAGGTGGAAAACTAGAAGATAAAGTTGGTAAGATTTTTAGATTATTAATCGCATACATAGAGAATCGAGTCTGTGAAGGAATACCAGTCAAAAGCTT
TTTCAAAATCTGATTTGTAAAATCGATATTAAAGTTTATTATTTTTCTGCTTTTCAAACAGATGTAATAGATTTTCTAAAAACAAAATAAAACTAAATTTGTTTGCATTAATATCGATAAAAATACTAAATACGCGCGGCTGACAGTTGTGGTGCTAACTTTAAGAAATAATTTATATGATTCTTTGCATATAATAAATAATATTTGTTATATAAGAGTGTCAGATTATGAAAAAATTGAAACTGAAATAAAACAATCTTTACAGAGTAAGCAGAATTAAGAAATAAGCTCAAAGGTTAAGTATATTTAAATATGTTGAAAATGTTCTGAATGAAATCTTTAATTTTCCAGTAGACCTCATCAACCAAGAGTGGTTAAACTAAAATAATACACAGCTTTATCTGAGGTAATTGGAAATATGATGAGCTGATTTACAGGAAACGACATACTTTACGAAATGATCACATTGATAAATAATAGTTACTAGTCTTGACATTAATTATATTTACAGTAACTACTCTGTTATCCGCCAAATCCCAATAACCGCCATTTTTAATAAAGAAATTAATAAATAATTAAATTTTTTACTGGATTTTTTAATAAATTTTATTATTTTTTATTCTAAAAGCCGCCATTTTCTAGAAAGCACCACCTTTTTTCACAGTCCCGTTTTTGGTAGATATGAGAGTACTAGTCAAATGTCTTTTTCACTATTAGAGATATAATTAATAATATAATTATAATAAATATATTTGTTGAGATACTTTGGGGGATTATTACCAAAGTAAATCAGAAATGCGTGAGTTTAGTGATTGAGTTTAAATATTAGATAATCCCGATATATTAAAGTTATAAATTTTCCTTATATACCTTAATATGTTTAATATAGATATAATGATATTTATAGATATACTATATAGTAATCCTTATAAATCCAATTTAAAATTAACCTTTTAAAAATAAACGTATATTATTCAATAAACACTTTAACAATTCAAATAAATTTAATGCGTTAGAGATTTTTGCTCAGTTGTGAATCCGGAGATGATTCAAACTGTTGTACAGCTTAAAAAATGGTGAAAATAATGAGTTTCAACTCTCTATTTATACTAGATCGTTGCTCATTATTTTCGCGTAATTATAAATATTTACATGAAAAATCAAACACCATTTACCGACATTTGAACATTATTTATTAAAAGTTTTATATATTTACAAAAATATTTATACGAATGTCAAAAAATAATGTACTTACGTTTATGTAAAATTTGACAATTTCCATTAAATAAAAATAAAATCAAAATCCAATAATTTTTTAAATTAAAATCTCTAAAATATCAGCAAAATTCATAATATACCAGTTCAATAATTGATAACAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
@aquaskyline
Meanwhile, I corrected the sequences and tested with input containing no N, but the same error occurred. The complete log is:
(base) mostafa@srv-research:/mnt/hdd2/mostafa/Bio-Mostafa/data/10xSimulation/simulated_linkedReads/draftV01$ simulateLinkedReads.pl -r /mnt/hdd2/mostafa/Bio-Mostafa/data/10xSimulation/simulated_linkedReads/djBaseGenomeV01.fasta -p draftv01_sread
Tue Jul 9 09:58:26 2019: draftv01_sread.status
Tue Jul 9 09:58:26 2019: Variant simulation mode enabled
Tue Jul 9 09:58:26 2019: SURVIVOR start
Tue Jul 9 09:58:26 2019: Running: /mnt/hdd2/mostafa/Apps/LRSIM/SURVIVOR 0 /mnt/hdd2/mostafa/Bio-Mostafa/data/10xSimulation/simulated_linkedReads/djBaseGenomeV01.fasta draftv01_sread.hap.parameter 0 draftv01_sread.hap 1000
Terminate program as it could not find a non overlapping region
Tue Jul 9 09:58:34 2019: SURVIVOR error on missing draftv01_sread.hapA.fasta
No such file or directory at /mnt/hdd2/mostafa/Apps/LRSIM/simulateLinkedReads.pl line 748.
unable to delete draftv01_sread.hapA.fasta at exit
unable to delete draftv01_sread.hap.hetA.insertions.fa at exit
unable to delete draftv01_sread.hap.hetB.insertions.fa at exit
unable to delete draftv01_sread.hapB.fasta at exit
unable to delete draftv01_sread.hap.homAB.insertions.fa at exit
Regards
Just a small bug I noticed.
The LRSIM output read names have the suffix "/1", for both read 1 and read 2 in a pair. This could potentially confuse downstream tools.
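Until this is fixed upstream, one workaround is to rewrite the header lines of the R2 file. Below is a minimal Python sketch, assuming headers of the form @name/1 followed by an optional comment; the function names fix_mate_suffix and fix_fastq_lines are made up for illustration and are not part of LRSIM.

```python
def fix_mate_suffix(header: str, mate: int) -> str:
    """Return a FASTQ header whose read name ends in /<mate>.

    Assumes the read name is the first space-separated token of the header.
    """
    name, sep, comment = header.rstrip("\n").partition(" ")
    # Drop an existing /1 or /2 suffix before appending the correct one.
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return f"{name}/{mate}{sep}{comment}"


def fix_fastq_lines(lines, mate: int):
    """Yield FASTQ lines with every header (every 4th line) rewritten."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield fix_mate_suffix(line, mate) + "\n"
        else:
            yield line
```

Applied to the R2 file with mate=2, this leaves sequences and qualities untouched and only changes the suffix on the header lines.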
Hello,
I have used LRSIM to generate a small set of linked reads for a 60 MB reference:
perl simulateLinkedReads.pl -g ${REF}/selected_scfs_alleles_no_N.fa -p ${OUTDIR}/default_params -n -z 7 -x 1 -m 4 -t 3 -o
After generation, the output folder looks like this:
10X_FASTQ/
├── default_params.0.fp
├── default_params.0.manifest
├── default_params.0.sort.manifest
├── default_params.dwgsim.0.12.fastq
├── default_params.hap.0.clean.fasta
├── default_params.hap.0.clean.fasta.fai
├── default_params.status
├── default_params_S1_L001_R1_001.fastq.gz
└── default_params_S1_L001_R2_001.fastq.gz
0 directories, 9 files
Then I tried to use the Long Ranger align mode to align the simulated reads to my reference:
longranger align --id=default_params --reference=${REF} --fastqs=/project/sweet/evgeny/10x/10X_FASTQ --sample=default_params
After a few steps, the following error occurs:
Running preflight checks (please wait)...
2018-03-21 10:19:40 [runtime] (ready) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:43 [runtime] (split_complete) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:43 [runtime] (run:local) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS.fork0.chnk0.main
2018-03-21 10:19:46 [runtime] (chunks_complete) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:49 [runtime] (join_complete) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:55 [runtime] (ready) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS
2018-03-21 10:19:55 [runtime] (run:local) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS.fork0.split
2018-03-21 10:19:58 [runtime] (split_complete) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS
2018-03-21 10:19:58 [runtime] (run:local) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS.fork0.chnk0.main
2018-03-21 10:20:01 [runtime] (failed) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS
[error] Pipestance failed. Error log at:
default_params/ALIGNER_CS/ALIGNER/_LINKED_READS_ALIGNER/_FASTQ_PREP_NEW/BUCKET_FASTQS/fork0/chnk0-u28b3b223be/_errors
Log message:
stage error:FASTQ parsing error: input fastq not consistent
What could be the reason for this error? Could it be caused by the small size of the read set?
Do you maybe need some additional files / information?
Thank you!
Evgeny
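One thing that can be checked independently of Long Ranger is whether the two FASTQ files contain the same reads in the same order. The sketch below is only a diagnostic under that assumption; it does not claim to reproduce Long Ranger's own consistency check, and the function names are invented for this example.

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the read name of each 4-line FASTQ record.

    The leading '@', any comment after the first space, and a trailing
    /1 or /2 suffix are removed so that mates compare equal.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line.strip().split(" ")[0]
            if name.startswith("@"):
                name = name[1:]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, name_in_r1, name_in_r2) for the first
    disagreement between the two files, or None if they are consistent.
    A missing record on one side shows up as None for that side.
    """
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (a, b) in enumerate(pairs):
        if a != b:
            return idx, a, b
    return None
```

To run it on the files listed above, open default_params_S1_L001_R1_001.fastq.gz and default_params_S1_L001_R2_001.fastq.gz with gzip.open(path, "rt") and pass the two file objects to first_mismatch.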
When I run test.sh I get:
perl: symbol lookup error: ./lib/auto/Math/Random/Random.so: undefined symbol: Perl_Gthr_key_ptr
I am using perl version v5.20.2
Is it possible to allow smaller simulations (i.e. smaller -x)? At the moment, I receive the message
The value of -x should be set between 400 and 800
I have tried using -o (with -x 1 and -x 5), but the program seems to hang:
...
Tue Jan 10 13:06:35 2017: DWGSIM round 0 thread 3 end
Tue Jan 10 13:06:35 2017: cat sim.dwgsim.0.3.12.fastq >> sim.dwgsim.0.12.fastq
[dwgsim_core] 187500
[dwgsim_core] Complete!
Tue Jan 10 13:06:53 2017: DWGSIM round 1 thread 1 end
[dwgsim_core] 187500
[dwgsim_core] Complete!
Tue Jan 10 13:06:53 2017: DWGSIM round 1 thread 2 end
[dwgsim_core] 187500
[dwgsim_core] Complete!
Tue Jan 10 13:06:58 2017: DWGSIM round 1 thread 0 end
Tue Jan 10 13:06:58 2017: cat sim.dwgsim.1.1.12.fastq >> sim.dwgsim.1.12.fastq
Tue Jan 10 13:06:58 2017: cat sim.dwgsim.1.2.12.fastq >> sim.dwgsim.1.12.fastq
Tue Jan 10 13:06:58 2017: cat sim.dwgsim.1.3.12.fastq >> sim.dwgsim.1.12.fastq
Tue Jan 10 13:06:58 2017: Simulate reads start
Tue Jan 10 13:06:58 2017: Load barcodes start
Tue Jan 10 13:07:00 2017: Load barcodes end
Tue Jan 10 13:07:00 2017: readPairsPerMolecule: 0
Tue Jan 10 13:07:00 2017: Simulating on haplotype: 0
Tue Jan 10 13:07:00 2017: Load read positions haplotype 0
Tue Jan 10 13:07:09 2017: 0 reads failed being loaded.
Tue Jan 10 13:07:09 2017: Exporting sim.0.fp
Tue Jan 10 13:08:35 2017: Exported sim.0.fp
Tue Jan 10 13:08:35 2017: readsCountDown: 500000 (stuck here)
My reference is hg19.
I'm getting this error. Which version of perl do I need to install on linux?
perl: symbol lookup error: ../lib/auto/Math/Random/Random.so: undefined symbol: Perl_Gthr_key_ptr
Hi there,
I use two haplotype FASTA sequences as input to simulate 10x linked reads for the human genome. By default, it should output about 3M SNPs because of the -1 parameter (1 SNP per INT base pairs [1000]), but I only got ~10k SNPs. Could you help me figure out the problem? The command I used is below:
./simulateLinkedReads.pl -g hap1_genome.fa,hap2_genome.fa -p HG002_sim -7 0 -0 0
Best,
Peng Xu
Hi,
While following the steps to install LRSIM, I hit the error below when running sh test.sh.
Can't locate Math/Random.pm in @INC (you may need to install the Math::Random module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at ../simulateLinkedReads.pl line 35.
BEGIN failed--compilation aborted at ../simulateLinkedReads.pl line 35.
I uncommented these two lines:
#use lib "./lib";
#use lib dirname($0)."/lib";
but that did not help, and I got this error:
I then deleted the lib folder and followed the steps recommended in #1, but got the same errors.
Hi aquaskyline,
I am trying to use LRSIM to simulate 50X linked reads for a small portion of the human genome. My selected genomic region is around 1 Mbp. To adapt to the small reference, I intend to make the following changes to the linked-read options.
-x = coverage * my_reference_length/(insertion_size + sd of pairs) = 50*1M/(350+35)
-f = default setting
-t = human_genome_length / my_reference_length * default_t = 3,000,000,000 / 1,000,000 * 1,500,000
-m = default setting
Do you have any suggestions on these modifications for getting the best simulated linked-read data?
Thanks a lot.
Lindsay
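For reference, the two formulas above evaluate as follows. This is plain arithmetic on the numbers quoted in the post; the variable names are illustrative, and the units LRSIM expects for -x and -t are not confirmed here, so the results should be compared against the usage message before being passed on the command line.

```python
# Values quoted in the post above.
coverage = 50
region_bp = 1_000_000
insert_size = 350
insert_sd = 35
human_genome_bp = 3_000_000_000
default_t = 1_500_000

# -x as proposed: coverage * reference length / (insert size + sd)
x_read_pairs = coverage * region_bp / (insert_size + insert_sd)

# -t as proposed: human genome length / reference length * default t
t_scaled = human_genome_bp / region_bp * default_t

print(round(x_read_pairs))  # 129870
print(int(t_scaled))        # 4500000000
```

Note that the -t formula as written grows as the reference gets smaller, so it may be worth double-checking whether that direction of scaling is the intended one.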
Hi,
I am attempting to run LRSIM on human chr1, but I'm encountering the aforementioned error.
Here is the command I'm using: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 30 -f 50 -t 500 -m 10 -0 0 -o
And here is LRSIM output:
Tue Mar 16 16:27:09 2021: SapiensChr1.status
Tue Mar 16 16:27:09 2021: Variant simulation mode enabled
Tue Mar 16 16:27:09 2021: SURVIVOR start
Tue Mar 16 16:27:09 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 1000
Tue Mar 16 16:27:22 2021: SURVIVOR end
Tue Mar 16 16:27:22 2021: Build genome index start
Tue Mar 16 16:27:22 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.0.fasta 0 > SapiensChr1.hap.0.clean.fasta
Tue Mar 16 16:27:26 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.1.fasta 0 > SapiensChr1.hap.1.clean.fasta
Tue Mar 16 16:27:30 2021: /home/morispi/StructuralVariants/LRSIM/samtools faidx SapiensChr1.hap.0.clean.fasta
Tue Mar 16 16:27:32 2021: /home/morispi/StructuralVariants/LRSIM/samtools faidx SapiensChr1.hap.1.clean.fasta
Tue Mar 16 16:27:36 2021: Build genome index end
Tue Mar 16 16:27:36 2021: DWGSIM round 0 thread 0 start
Tue Mar 16 16:27:36 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.0
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
0Tue Mar 16 16:27:38 2021: DWGSIM round 0 thread 1 start
Tue Mar 16 16:27:38 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.1
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
0Tue Mar 16 16:27:40 2021: DWGSIM round 0 thread 2 start
Tue Mar 16 16:27:40 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.2
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
[dwgsim_core] 20000Tue Mar 16 16:27:43 2021: DWGSIM round 0 thread 3 start
Tue Mar 16 16:27:43 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.3
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
[dwgsim_core] 280000Tue Mar 16 16:27:46 2021: DWGSIM round 1 thread 0 start
Tue Mar 16 16:27:46 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.0
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
[dwgsim_core] 280000Tue Mar 16 16:27:50 2021: DWGSIM round 1 thread 1 start
Tue Mar 16 16:27:50 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.1
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
[dwgsim_core] 180000Tue Mar 16 16:27:53 2021: DWGSIM round 1 thread 2 start
Tue Mar 16 16:27:53 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.2
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
[dwgsim_core] 450000Tue Mar 16 16:27:56 2021: DWGSIM round 1 thread 3 start
Tue Mar 16 16:27:56 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.3
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on:
[dwgsim_core] 410000Tue Mar 16 16:27:58 2021: DWGSIM round 0 thread 3 end
[dwgsim_core] 510000Tue Mar 16 16:28:02 2021: DWGSIM round 0 thread 1 end
[dwgsim_core] 1290000
[dwgsim_core] Complete!
Tue Mar 16 16:28:38 2021: DWGSIM round 0 thread 0 end
Tue Mar 16 16:28:38 2021: cat SapiensChr1.dwgsim.0.1.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1490000
[dwgsim_core] Complete!
Tue Mar 16 16:28:45 2021: DWGSIM round 0 thread 2 end
Tue Mar 16 16:28:45 2021: cat SapiensChr1.dwgsim.0.2.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1330000Tue Mar 16 16:28:51 2021: cat SapiensChr1.dwgsim.0.3.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1750000
[dwgsim_core] Complete!
Tue Mar 16 16:28:54 2021: DWGSIM round 1 thread 1 end
[dwgsim_core] 1770000
[dwgsim_core] Complete!
[dwgsim_core] 1510000Tue Mar 16 16:28:55 2021: DWGSIM round 1 thread 0 end
Tue Mar 16 16:28:55 2021: cat SapiensChr1.dwgsim.1.1.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
[dwgsim_core] 1875000
[dwgsim_core] Complete!
Tue Mar 16 16:28:57 2021: DWGSIM round 1 thread 2 end
[dwgsim_core] 1700000Tue Mar 16 16:28:59 2021: cat SapiensChr1.dwgsim.1.2.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
[dwgsim_core] 1875000
[dwgsim_core] Complete!
Tue Mar 16 16:29:02 2021: DWGSIM round 1 thread 3 end
Tue Mar 16 16:29:02 2021: cat SapiensChr1.dwgsim.1.3.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
Tue Mar 16 16:29:07 2021: Simulate reads start
Tue Mar 16 16:29:07 2021: Load barcodes start
Tue Mar 16 16:29:09 2021: Load barcodes end
Tue Mar 16 16:29:09 2021: Using fragment sizes from fragmentSizesList instead of Poisson distribution
Tue Mar 16 16:29:09 2021: 10000 sizes loaded
Tue Mar 16 16:29:09 2021: Average fragment size: 50kbp
Tue Mar 16 16:29:09 2021: readPairsPerMolecule: 2
Tue Mar 16 16:29:09 2021: Simulating on haplotype: 0
Tue Mar 16 16:29:09 2021: Load read positions haplotype 0
Tue Mar 16 16:29:21 2021: not defined chr1_182578874_182579@chr1
Inappropriate ioctl for device at ../simulateLinkedReads.pl line 748, <$fh> line 19543360.
Command exited with non-zero status 25
It does not seem to be a memory issue, since it only uses 4 GB.
Moreover, when I slightly modify the parameters (for instance setting -x 10, or -n to skip the variant simulation), the error seems to change randomly.
I once had Cannot find correct chromosome and position in @chr1_82788009_82787815_1_0_0_0_0:0:0_0:0:0_0/2
, and once had Cannot find correct chromosome and position in IFIHIGIIFIGFEBFCHD@DEECDCBEDECCB@BCBABBFBCABCA@DC@BAAAB@?A@?@?>?B?C@?@B<>??:??@@>>?>A@==A@@@@<@A@@>>B=@>?>C>?>?=>??;;>>?=?>==?=>;?;;==<
and other variations.
I'm having a hard time understanding what is going on here.
I already managed to run LRSIM correctly on smaller datasets and never encountered this issue.
Do you have any suggestions?
Thanks,
Pierre
Dear LRSIM team,
Thanks for your great package. I'm using LRSIM to generate linked reads for a small genome, with the following command:
simulateLinkedReads.pl -g hap1.fa,hap2.fa,hap3.fa -p out/sim -o -n -x 1 -f 100 -m 3 -t 1
It works, but I think there is a problem in LRSIM's output. It seems that each haplotype is considered separately: with -m 3, the three molecules in each partition all originate from the same haplotype. In a real 10x device, however, each of the three molecules may come from a different haplotype.
This can be inferred from the manifest files: no barcode is shared between the manifests that correspond to different haplotypes.
Could you please tell me whether I am right? If so, I would appreciate it if you could tell me how to overcome this issue.
I recently discovered an issue in LRSIM execution which caused duplicate read names with different barcodes to end up in the final .bam file, resulting in errors when running the downstream LongRanger alignment.
The problem can be traced back to DWGSIM, which appears to have simulated two reads at the same position and, on top of this, given them the same name (I assumed DWGSIM checks for this type of event, so maybe this is a result of the LRSIM parallelization). Then, during 10X read simulation, two barcodes were simulated whose molecules shared some overlap, so one of the duplicate reads was assigned to one barcode and the other read to the other barcode. A very rare event, but clearly not impossible.
Here is a trace of the duplicate reads throughout the various files:
** DWGSIM fastq **
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/1
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/2
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/1
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/2
** LRSIM .fastq R1 **
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 1:N:0:1
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 1:N:0:1
** LongRanger ALIGN .bam with line numbers prepended **
1207925978:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 163 chr10 119294682 60 151M = 119294844 290 CGGCTGCTCCCAGAGAGAGTTGGGGTCTTCTCAGGGCCCGCGATGGGGGAGTGGTCGTGGTCAGACCCCCGTGAGCCCCTTCGGAAGGTCCCAGTCCCTGTCCATTCTTCTGTCCCGCAGCTCTCTCCGCGCAGGCGGGGCAGAGCCGGGG A?@@???<?@??@<=C?CACB>AB@?A@A@>@?AA@=??@>=>@?@??>@?>>??A>A@@?BA???BB>@><>??@?@<@A<A?B?A@?<A>A??@??>B?>BB?>@B>B?>??AAC??=>>A@=A>@B???A@?===??><A==AC<B?? RX:Z:TCTGCGTAGTCCTGAT QX:Z:AAAFFFKKKKKKKKKK XS:i:-81 AS:i:0 XM:Z:0 AM:Z:1XT:i:0 BX:Z:TCTGCGTAGTCCTGAT-1 DM:Z:0.150000 RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0 OM:i:60
1207925979:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 163 chr10 119294682 60 151M = 119294844 290 CGGCTGCTCCCAGAGAGAGTTGGGGTCTTCTCAGGGCCCGCGATGGGGGAGTGGTCGTGGTCAGACCCCCGTGAGCCCCTTCGGAAGGTCCCAGTCCCTGTCCATTCTTCTGTCCCGCAGCTCTCTCCGCGCAGGCGGGGCAGAGCCGGGG A??A=?@>@=??????D?>?=@??B?A????CC???@??@A@?@?>B@B=@;?DA=B?>=?>??AAA???>@@=B??BAD@@?@?@B?@@CA@C??>=@C???BA?A??@A?BC??=@C?>C?=A@C?C??A?=A@A@@A?>?@??@=@@@ RX:Z:GTTTGTTTCGATGGCC QX:Z:AAAFFFKKKKKKKKKK XS:i:-82 AS:i:0 XM:Z:0 AM:Z:1XT:i:0 BX:Z:GTTTGTTTCGATGGCC-1 DM:Z:0.115385 RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0 OM:i:60
1207926094:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 83 chr10 119294844 60 128M = 119294682 -290 GGACGAGGGGTCTTGGGGCCGCCTCGCTGGCTGCGGTTGGAAGCACCCGTTTTCCCGCCCGCCCGCGCAGGCGCTGCTCTGTGGCCACCAGCAGAGGTTTCCCGGCCGCTGTGAGTCGCCCACGCGAG <>?AA??=??@???@ABB=??@?@>>=@??@>A???;?????????????@>@?A?A=>A@?????A?=B?@@>??A??D@@?>?>????=B???@A??>?<A@>?;@B??=??@@@A>???C>???? RX:Z:TCTGCGTAGTCCTGAT QX:Z:AAAFFFKKKKKKKKKK TR:Z:CACGTCG TQ:Z:A??<CC? XS:i:-70 AS:i:0 XM:Z:0 AM:Z:1 XT:i:0BX:Z:TCTGCGTAGTCCTGAT-1 DM:Z:0.150000 RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0 OM:i:60
1207926095:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 83 chr10 119294844 60 128M = 119294682 -290 GGACGAGGGGTCTTGGGGCCGCCTCGCTGGCTGCGGTTGGAAGCACCCGTTTTCCCGCCCGCCCGCGCAGGCGCTGCTCTGTGGCCACCAGCAGAGGTTTCCCGGCCGCTGTGAGTCGCCCACGCGAG =>=A???B??B@?BB?>;??@?B?A@=?>?AA>AE=<><@?@?AE?CAB@@???A>??@@B>A>@<@A>@<>A?>?<A?>?@@@@?A@?@?>?>?B??????B?AC?@<A???B@?>?BCCB?@A@?@ RX:Z:GTTTGTTTCGATGGCC QX:Z:AAAFFFKKKKKKKKKK TR:Z:CACGTCG TQ:Z:@AA?A?> XS:i:-70 AS:i:0 XM:Z:0 AM:Z:1 XT:i:0BX:Z:GTTTGTTTCGATGGCC-1 DM:Z:0.115385 RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0 OM:i:60
Note that the two read pairs share a name, but have different barcodes.
It would be great if there was some duplicate-detection and subsequent renaming of the files.
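As an illustration of what such a renaming step could look like, here is a minimal Python sketch. It is not LRSIM code: the function name dedupe_names and the _dupN suffix are assumptions made for this example, and it keeps a counter for every distinct name in memory, which may matter for large runs.

```python
from collections import Counter


def dedupe_names(names):
    """Return the names in order, with repeats made unique.

    The first occurrence is kept as-is; later occurrences get a
    _dup1, _dup2, ... suffix (a hypothetical naming scheme).
    """
    seen = Counter()
    out = []
    for name in names:
        seen[name] += 1
        if seen[name] == 1:
            out.append(name)
        else:
            out.append(f"{name}_dup{seen[name] - 1}")
    return out
```

Running the same function over the R1 and R2 names in file order keeps mates matched, since both files list the duplicated pair in the same positions.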
Hello,
I recently found about LRSIM which seems to be super useful for gaining better understanding of SV tools.
However, I'm trying to generate a toy dataset, and then align it to a reference with LongRanger, and LongRanger always stops and reports "stage error:Extremely high rate of incorrect barcodes observed (99.90 %). Check that input is 10x Chromium data, and that there are no missing cycles in the first 16bp of Read 1. Please note Long Ranger 2.0 and above do not support GemCode data.".
I did read from another issue that the "/1" and "/2" have to be removed from the end of the headers in order for LongRanger to work with LRSIM data, but removing them did not seem to help.
Here is the command line I'm using for generating the data: perl simulateLinkedReads.pl -r References/Ecoli.fasta -p Ecoli/SimEcoli -n -x 100 -o
I used a lower -x value because I don't need a lot of reads for now. Can it be the cause of the issue? Leaving it at the default of 600 seems to generate too many reads for the toy tests I want to perform, which is why I lowered it. I also used the -o option as advised in another issue I found after a bit of searching.
Is there anything I'm doing wrong, or could you advise me on how to properly use LongRanger with LRSIM data?
Thanks in advance.
Best,
Pierre
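Since the error message refers to the first 16bp of Read 1, one way to narrow this down is to measure how many R1 reads start with a barcode from the whitelist that was used. The sketch below assumes you have that whitelist available as a set of strings; where it lives on disk is not stated in this thread, so loading it is left to the caller, and the function name is invented for this example.

```python
def barcode_hit_rate(r1_lines, whitelist, barcode_len=16):
    """Fraction of R1 reads whose first barcode_len bases are in whitelist.

    r1_lines is an iterable over the lines of an uncompressed FASTQ
    stream; the sequence is taken to be the 2nd line of each 4-line record.
    """
    total = 0
    hits = 0
    for i, line in enumerate(r1_lines):
        if i % 4 == 1:
            total += 1
            if line[:barcode_len] in whitelist:
                hits += 1
    return hits / total if total else 0.0
```

A rate close to zero on the simulated R1 file would suggest that the barcode is missing or shifted in the reads, whereas a high rate would point towards how the files are being passed to LongRanger.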
The following is what I got when running sh make.sh:
gcc -g -Wall -O3 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o bam_tview_curses.o bam_tview_html.o libbam.a -Lbcftools -lbcf -lcurses -lm -lz -lpthread
/usr/bin/ld: cannot find -lcurses
collect2: error: ld returned 1 exit status
Makefile:57: recipe for target 'samtools' failed
make[1]: *** [samtools] Error 1
make[1]: Leaving directory '/research/LRSIM/DWGSIMSrc/samtools'
Makefile:25: recipe for target 'all-recur' failed
make: *** [all-recur] Error 1
Hi
We are testing LRSIM to generate 10x reads for an organism with a genome size of about 1.5 Gbp. As I understood it, you mentioned that a run on a genome like the human genome normally takes less than 10 hours. However, our LRSIM run has now been going for 283 hours on 16 threads with about 160 GB of memory.
Is this abnormal? Do you recommend stopping it?
I see the following error when compiling msort with both g++-7 and clang++:
clang++ -c -o stdhashc.o stdhashc.cc
In file included from stdhashc.cc:2:
./stdhash.hh:496:13: error: use of undeclared identifier 'direct_insert_aux'
int ret = direct_insert_aux(key, this->n_capacity, this->keys, this->flags, &i);
^
this->
stdhashc.cc:72:34: note: in instantiation of member function 'hash_map_misc<unsigned int, int>::insert' requested here
return ((hashii_cpp_t*)h->ptr)->insert(key, value);
^
./stdhash.hh:295:13: note: must qualify identifier to find this declaration in dependent base class
inline int direct_insert_aux(const keytype_t &key, hashint_t m, keytype_t *K, __lh3_flag_t *F, hashint_t *i) {
^
1 error generated.
See the full log here: https://gist.github.com/sjackman/d4de672ec5f8a44f5276cef9edf9a28f#file-02-make-L12
Based on a reference genome, I simulated haplotypes using a different algorithm. Using these haplotypes as input, I used LRSIM to simulate 10x data using the command:
simulateLinkedReads.pl -g DMsim1.hap.0.clean.fasta,DMsim1.hap.1.clean.fasta,DMsim1.hap.2.clean.fasta,DMsim1.hap.3.clean.fasta -p DMsim1 -x 100 -f 20 -u 3 -z 4 -o
However, when mapping the reads of the simulated 10x dataset to the reference genome, I noticed an increased read depth on the left side of all N-stretches of my reference genome.
For example, one of my simulated reads is
@chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0/1 1:N:0:1
AGAACAAGTCTCTATTGGCCAACATCTGGACAGCTTGTAGTTGAGCTGAATATGCTGCTGTGTGAATTACAAAGGTATGACAAATTTTTTACTCTGTTCTAATTTGGCTCGGCCTGCCTGCCTTCAGCTTTTTTGGCACAGCTTCCCACAT
+
AAAFFFKKKKKKKKKKFIIHGGFHFFFHGEFGEBAEFCACECACCCCCCCBFBBBCBCE?B=B=@?D>ADAAB@@@>@?ABBBCB@@B>?C@??D?@?A@B???A=AA?=><=>??>??>@>@?>>@@=?>?=??;==>?8>><===>??;
which yields the alignments (using the SNAP aligner):
chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0:AGAACAAGTCTCTATT 163 chr1 197999799 70 151= = 198000023 352 GTGTCCTAGAGAAGCAGACTCAAATAACAAATCCCTGTGTAACTGCAAAGGTTTATACAAAGTGGCATTCCATGCAGAGTAGAGAATATGATGTAAAGAGCCATCAAACATTATGAGATCCCTCCCCTGCAGCACATAAACAAAGTGAGGT IIHIFGFFHDFFFDGFCECDCCDDEDBCDBBBDCCBABBBBBEBDBE?A@EAAA@AA@@B?B=A@?@A@@E@??C@@AA@==?A??=??>>?>@A@@=?=>>@B>B>>>>>?<>@C?>@?@=<@=>=?9??=>=<=>>? PG:Z:SNAP NM:i:0 RG:Z:FASTQ PL:Z:Illumina PU:Z:pu LB:Z:lb SM:Z:sm
chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0:AGAACAAGTCTCTATT 83 chr1 198000023 70 128= = 197999799 -352 ATGTGGGAAGCTGTGCCAAAAAAGCTGAAGGCAGGCAGGCCGAGCCAAATTAGAACAGAGTAAAAAATTTGTCATACCTTTGTAATTCACACAGCAGCATATTCAGCTCAACTACAAGCTGTCCAGAT ;??>===<>>8?>==;??=?>?=@@>>?@>@>??>??>=<>=?AA=A???B@A?@?D??@C?>B@@BCBBBA?@>@@@BAADA>D?@=B=B?ECBCBBBFBCCCCCCCACECACFEABEGFEGHFFFH PG:Z:SNAP NM:i:0 RG:Z:FASTQ PL:Z:Illumina PU:Z:pu LB:Z:lb SM:Z:sm
However, these alignments are both offset by 3235 (and I have checked that there are no secondary alignments at the reported positions). All the other reads also seem to be offset by various amounts relative to the reported true positions.
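The offset can be computed directly from the read name and the aligned positions. The sketch below assumes the name layout seen in the example (contig, then two start coordinates, then seven more underscore-separated fields) and pairs the first coordinate with read 1 and the second with read 2, which is the pairing that reproduces the 3235 reported above; dwgsim_offsets is a made-up helper name.

```python
def dwgsim_offsets(read_name, aligned_pos_read1, aligned_pos_read2):
    """Return (offset_read1, offset_read2) = name coordinate - aligned position.

    read_name is the bare name without '@', /1 or /2, or an appended
    barcode. Splitting from the right keeps contig names that contain
    underscores intact.
    """
    fields = read_name.rsplit("_", 9)
    start1 = int(fields[1])
    start2 = int(fields[2])
    return start1 - aligned_pos_read1, start2 - aligned_pos_read2


name = "chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0"
# Read 1 is the record with flag 83, read 2 the record with flag 163.
print(dwgsim_offsets(name, 198000023, 197999799))  # (3235, 3235)
```

Running this over all alignments and tabulating the offsets per region may show whether the shift changes at specific points along the reference.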
I downloaded LRSIM, but when running make.sh, I got some error messages:
gcc -g -Wall -O3 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o bam_tview_curses.o bam_tview_html.o libbam.a -Lbcftools -lbcf -lcurses -lm -lz -lpthread
/usr/bin/ld: bcftools/libbcf.a(bcf.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: bcftools/libbcf.a(bcfutils.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Makefile:57: recipe for target 'samtools' failed
make[1]: *** [samtools] Error 1
make[1]: Leaving directory '/media/bulk_01/users/lavri002/bin/LRSIM/DWGSIMSrc/samtools'
Makefile:25: recipe for target 'all-recur' failed
make: *** [all-recur] Error 1
As a result, the executables needed to run LRSIM are not copied to the LRSIM directory. How can I solve this?
simulateLinkedReads failed with the error Floating point exception. I'm trying to simulate reads with no simulated variations. Perhaps that's related.
❯❯❯ simulateLinkedReads.pl -z 16 -x 524 -d 1 -1 0 -4 0 -7 0 -0 0 -r GRCh38.wrap.fa -p sim.lr
Sat Jan 20 14:19:27 2018: sim.lr.status
Sat Jan 20 14:19:27 2018: Variant simulation mode enabled
Sat Jan 20 14:19:27 2018: SURVIVOR start
Sat Jan 20 14:19:27 2018: Running: /gsc/btl/linuxbrew/Cellar/lrsim/1.0/SURVIVOR 0 GRCh38.wrap.fa sim.lr.hap.parameter 0 sim.lr.hap 0
sh: line 1: 21814 Floating point exception(core dumped) /gsc/btl/linuxbrew/Cellar/lrsim/1.0/SURVIVOR 0 GRCh38.wrap.fa sim.lr.hap.parameter 0 sim.lr.hap 0 > /dev/null
Sat Jan 20 14:21:21 2018: SURVIVOR error on missing sim.lr.hapA.fasta
No such file or directory at /gsc/btl/linuxbrew/Cellar/lrsim/1.0/simulateLinkedReads.pl line 738.
unable to delete sim.lr.hapA.fasta at exit
unable to delete sim.lr.hap.homAB.insertions.fa at exit
unable to delete sim.lr.hap.hetA.insertions.fa at exit
unable to delete sim.lr.hap.hetB.insertions.fa at exit
unable to delete sim.lr.hapB.fasta at exit
Command exited with non-zero status 2
❯❯❯ cat sim.lr.status
Sat Jan 20 14:19:27 2018: sim.lr.status
Sat Jan 20 14:19:27 2018: Variant simulation mode enabled
Sat Jan 20 14:19:27 2018: SURVIVOR start
Sat Jan 20 14:19:27 2018: Running: /gsc/btl/linuxbrew/Cellar/lrsim/1.0/SURVIVOR 0 GRCh38.wrap.fa sim.lr.hap.parameter 0 sim.lr.hap 0
Sat Jan 20 14:21:21 2018: SURVIVOR error on missing sim.lr.hapA.fasta
Hello,
I have simulated some human chromium data using LRSIM, and while I can run longranger basic fine, Supernova is failing.
Here's the error in the main log:
2017-09-13 23:26:37 [runtime] (failed) ID.GRCh38_LRSIM_supernova.ASSEMBLER_CS._ASSEMBLER.ASSEMBLER_DF
[error] An unexpected error has occurred.
Saving pipestance info to GRCh38_LRSIM_supernova/GRCh38_LRSIM_supernova.mri.tgz
And here's the error in the ASSEMBLER_DF stdout log:
Wed Sep 13 11:25:51 2017: reading in paths --> pathsX, mem = 11.98 GB
ForceAssertGe(numReads,0) at src/10X/paths/ReadPathVecX.cc:295 failed in function
void ReadPathVecX::reserve(int64_t, int64_t)
with values arg1 = -2 and arg2 = 0
ForceAssertGe(numReads,0) at src/10X/paths/ReadPathVecX.cc:295 failed in function
void ReadPathVecX::reserve(int64_t, int64_t)
with values arg1 = -2 and arg2 = 0
ForceAssertGe(numReads,0) at src/10X/paths/ReadPathVecX.cc:295 failed in function
void ReadPathVecX::reserve(int64_t, int64_t)
with values arg1 = -6 and arg2 = 0
Just wondering if you have seen this error when running Supernova with the simulated data, and if you know what is causing this failure?
Hello, I'm trying to run LRSIM on a Drosophila genome and finding that it keeps reporting that it's simulating 0 read pairs per molecule. Perhaps I'm passing the wrong parameters, but I can't determine what's going on. The genome is >100 Mb, as recommended.
The runtime parameters are:
NUMREADS=1 # in millions
MOLLEN=80 # in kbp
MOLPER=5
NUMINV=50
NUMINDEL=0
SNPPER=200000
INVMIN=1000
INVMAX=10000
TRANSLOC=0
PARTITIONS=2500 # default: 1500
NUMTHREADS=8
outprefix="./1millionreads/sims"
BARCODES="barcodes-500M.txt"
./simulateLinkedReads.pl \
-r dmel.trunc.noN.fa \
-p $outprefix \
-b $BARCODES \
-x $NUMREADS \
-f $MOLLEN \
-m $MOLPER \
-1 $SNPPER \
-4 $NUMINDEL \
-5 $INVMIN \
-6 $INVMAX \
-7 $NUMINV \
-0 $TRANSLOC \
-z $NUMTHREADS \
-o
[dwgsim_core] 2L length: 23582449
[dwgsim_core] 2R length: 25422852
[dwgsim_core] 3L length: 28239365
[dwgsim_core] 3R length: 32150935
[dwgsim_core] 4 length: 1430978
[dwgsim_core] X length: 23654338
[dwgsim_core] Y length: 3765010
[dwgsim_core] 7 sequences, total length: 138245927
[dwgsim_core] Currently on:
[dwgsim_core] 187500
[dwgsim_core] Complete!
[dwgsim_core] 120000Thu Nov 30 11:50:31 2023: DWGSIM round 1 thread 1 end
[dwgsim_core] 187500
[dwgsim_core] Complete!
Thu Nov 30 11:50:33 2023: DWGSIM round 1 thread 2 end
[dwgsim_core] 70000Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.0.1.12.fastq >> ./1millionreads/sims.dwgsim.0.12.fastq
[dwgsim_core] 80000Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.0.2.12.fastq >> ./1millionreads/sims.dwgsim.0.12.fastq
Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.0.3.12.fastq >> ./1millionreads/sims.dwgsim.0.12.fastq
[dwgsim_core] 90000Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.1.1.12.fastq >> ./1millionreads/sims.dwgsim.1.12.fastq
Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.1.2.12.fastq >> ./1millionreads/sims.dwgsim.1.12.fastq
[dwgsim_core] 187500
[dwgsim_core] Complete!
Thu Nov 30 11:50:37 2023: DWGSIM round 1 thread 3 end
Thu Nov 30 11:50:37 2023: cat ./1millionreads/sims.dwgsim.1.3.12.fastq >> ./1millionreads/sims.dwgsim.1.12.fastq
Thu Nov 30 11:50:37 2023: Simulate reads start
Thu Nov 30 11:50:37 2023: Load barcodes start
Thu Nov 30 11:54:38 2023: Load barcodes end
Thu Nov 30 11:54:38 2023: readPairsPerMolecule: 0 <----- THIS PART
Thu Nov 30 11:54:38 2023: Simulating on haplotype: 0
Thu Nov 30 11:54:38 2023: Load read positions haplotype 0
Thu Nov 30 11:54:41 2023: 0 reads failed being loaded.
Thu Nov 30 11:54:41 2023: Exporting ./1millionreads/sims.0.fp
Thu Nov 30 11:54:42 2023: Exported ./1millionreads/sims.0.fp
Thu Nov 30 11:54:42 2023: readsCountDown: 500000 <------ NEVER MOVES PAST THIS PART
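A back-of-the-envelope check may help explain the `readPairsPerMolecule: 0` line. The sketch below is not taken from the LRSIM source. It simply assumes that the requested read pairs are spread evenly over all molecules, that `-x` is in millions (as the comment in the parameters says), and that `-t` is in thousands of partitions with a default of 1500 (the command shown does not pass `-t`, so the default mentioned in the comment is assumed). The function name is hypothetical.

```python
def read_pairs_per_molecule(million_read_pairs, thousand_partitions,
                            molecules_per_partition):
    """Estimate read pairs per molecule under an even-spread assumption."""
    total_read_pairs = million_read_pairs * 1_000_000
    total_molecules = thousand_partitions * 1_000 * molecules_per_partition
    return total_read_pairs / total_molecules


# -x 1 and -m 5 from the command above; 1500 is the assumed default for -t.
estimate = read_pairs_per_molecule(1, 1500, 5)
print(f"{estimate:.3f}")  # 0.133
```

Under these assumptions there are far more molecules than read pairs, so any integer rounding of the estimate would give 0, which would be consistent with the log. Raising `-x` or lowering `-t` and `-m` moves the estimate above 1.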
Hi,
I am running LRSIM on human chr22 and I am having some trouble with the SURVIVOR step. This is the status file:
humanChr22.status exists
Tue Nov 7 15:19:59 2017: humanChr22.status
Tue Nov 7 15:19:59 2017: Variant simulation mode enabled
Tue Nov 7 15:19:59 2017: SURVIVOR start
Tue Nov 7 15:19:59 2017: Running: /home/myname/path/todir/bin/LRSIM/SURVIVOR 0 chr22.fa humanChr22.hap.parameter 0 humanChr22.hap 1000
I have tried to run the example independently and it hangs at this point, no error:
/home/myname/path/todir/bin/LRSIM/SURVIVOR 0 chr22NoN.fa humanChr22.hap.parameter 0 humanChr22.hap 1000
# Chrs passed size threshold:1
# Chrs passed size threshold:1
First: Genome checking:
First: Genome checking:
generate SV
generate_mutations_diploid function
Any suggestions?
Thank you in advance
Hi,
I am running LRSIM on human chr1 and the SURVIVOR step does not seem to be progressing.
Here is the command I'm using: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 50 -f 50 -t 500 -m 10 -1 10000 -4 1 -7 1 -0 1 -o
And here is the status (the runtime is short because I had to rerun the command, but I've previously let it run for a few hours):
SapiensChr1.status exists
Tue Mar 16 13:24:16 2021: SapiensChr1.status
Tue Mar 16 13:24:16 2021: Variant simulation mode enabled
Tue Mar 16 13:24:16 2021: SURVIVOR start
Tue Mar 16 13:24:16 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 10000
I saw in #12 that the problem could be caused by the lack of newlines in the reference genome file, but my reference genome does contain newlines. Moreover, I'm using the latest version of LRSIM, which includes the updated SURVIVOR code that should solve this issue, as mentioned in the replies prior to the closing of #12.
Moreover, just like in #12, if I use another set of parameters, and set the numbers of variants to simulate to 0, it works just fine, for instance: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 50 -f 50 -t 500 -m 10 -1 10000 -4 0 -7 0 -0 0 -o
Status is:
Tue Mar 16 13:26:28 2021: SapiensChr1.status
Tue Mar 16 13:26:28 2021: Variant simulation mode enabled
Tue Mar 16 13:26:28 2021: SURVIVOR start
Tue Mar 16 13:26:28 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 10000
Tue Mar 16 13:26:42 2021: SURVIVOR end
Tue Mar 16 13:26:42 2021: Build genome index start
Tue Mar 16 13:26:42 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.0.fasta 0 > SapiensChr1.hap.0.clean.fasta
And LRSIM then keeps going normally.
I'm not sure I understand what might be causing the issue. Do you have any suggestions?
Thanks a lot.
Pierre
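Since #12 is said to point at missing line breaks in the reference, a quick check of the longest sequence line in a FASTA file can rule that cause in or out. This is a minimal sketch with a hypothetical helper name. It says nothing about what line length SURVIVOR actually tolerates, which the thread does not state.

```python
import io


def longest_sequence_line(handle):
    """Return the length of the longest non-header line in a FASTA stream."""
    longest = 0
    for line in handle:
        if line.startswith(">"):
            continue  # skip header lines
        longest = max(longest, len(line.rstrip("\r\n")))
    return longest


# Toy inputs: one wrapped record and one with the sequence on a single line.
wrapped = io.StringIO(">chr1\nACGTACGT\nACGT\n")
unwrapped = io.StringIO(">chr1\n" + "ACGT" * 1000 + "\n")
print(longest_sequence_line(wrapped), longest_sequence_line(unwrapped))  # 8 4000
```

On a real file, `with open("Chr1.fasta") as fh: print(longest_sequence_line(fh))` would do the same. A value equal to the full chromosome length means the sequence is stored on one line.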
Hi, I tried to simulate 10X reads from a 20 Mb sequence with:
perl ../simulateLinkedReads.pl -g fastaOriginal.fasta -p OutputLRSIM/default_params -b ../4M-with-alts-february-2016.txt -n -x 1 -f 150 -t 1 -m 3 -o -u 4
and with the test files.
Phases 1, 2, and 3 run well, but phase 4 stops at the end:
Thu Oct 17 14:02:47 2019: Simulate reads start
Thu Oct 17 14:02:47 2019: Load barcodes start
Thu Oct 17 14:02:48 2019: Load barcodes end
Thu Oct 17 14:02:48 2019: Using fragment sizes from fragmentSizesList instead of Poisson distribution
Thu Oct 17 14:02:48 2019: 10000 sizes loaded
Thu Oct 17 14:02:48 2019: Average fragment size: 50kbp
Thu Oct 17 14:02:48 2019: readPairsPerMolecule: 100
Thu Oct 17 14:02:48 2019: Simulating on haplotype: 0
Thu Oct 17 14:02:48 2019: Load read positions haplotype 0
Thu Oct 17 14:02:48 2019: 0 reads failed being loaded.
Thu Oct 17 14:02:48 2019: Exporting ./test1.0.fp
Thu Oct 17 14:02:48 2019: Exported ./test1.0.fp
Thu Oct 17 14:02:48 2019: readsCountDown: 500000
Thu Oct 17 16:46:22 2019: Reached end of barcodes list. No more barcodes. Last read processed: 500000. Exiting.
Inappropriate ioctl for device at ../simulateLinkedReads.pl line 748.
How can I fix this problem, please? The software is installed correctly (sh make.sh ends with "Done, please run 'perl simulateLinkedReads.pl'").
When I run sh make.sh I get this error. Any suggestions?
make[2]: Leaving directory `/mnt/home/stephen/Apps/LRSIM/DWGSIMSrc/samtools/misc'
gcc -g -Wall -O3 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o bam_tview_curses.o bam_tview_html.o libbam.a -Lbcftools -lbcf -lcurses -lm -lz -lpthread
/mnt/home/stephen/.linuxbrew/bin/ld: cannot find -lcurses
collect2: error: ld returned 1 exit status
make[1]: *** [samtools] Error 1
make[1]: Leaving directory `/mnt/home/stephen/Apps/LRSIM/DWGSIMSrc/samtools'
make: *** [all-recur] Error 1
Also, FYI, the first step of the install should be cd LRSIM, not cd 10xReadsSimulator.
I'm trying to simulate reads to test the efficacy of linked reads on low-coverage datasets. This testing will occur for a range of low coverages and sample counts. However, when I try to use LRSIM, I keep getting a message that I've run out of barcodes. I'm using a truncated D. melanogaster genome that's just the 4 largest chromosomes (for simplicity).
invmin=1000
invmax=10000
mollen=80
milreads=1
molper=10
prefix="sims.${milreads}mil.${molper}per"
threads=8
./LRSIM/simulateLinkedReads.pl -r dmel.trunc.fa -p $prefix -0 0 -x $milreads -f $mollen -m $molper -z $threads -o
How can I successfully produce low-coverage data without the warning that I've run out of barcodes? There is also an error of being unable to concatenate a file. LRSIM output:
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.0.1.12.fastq >> sims.1mil.10per.dwgsim.0.12.fastq
cat: sims.1mil.10per.dwgsim.0.1.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.0.2.12.fastq >> sims.1mil.10per.dwgsim.0.12.fastq
cat: sims.1mil.10per.dwgsim.0.2.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.0.3.12.fastq >> sims.1mil.10per.dwgsim.0.12.fastq
cat: sims.1mil.10per.dwgsim.0.3.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.1.1.12.fastq >> sims.1mil.10per.dwgsim.1.12.fastq
cat: sims.1mil.10per.dwgsim.1.1.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.1.2.12.fastq >> sims.1mil.10per.dwgsim.1.12.fastq
cat: sims.1mil.10per.dwgsim.1.2.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.1.3.12.fastq >> sims.1mil.10per.dwgsim.1.12.fastq
cat: sims.1mil.10per.dwgsim.1.3.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: Simulate reads start
Wed Mar 22 12:07:10 2023: Load barcodes start
Wed Mar 22 12:07:10 2023: Load barcodes end
Wed Mar 22 12:07:10 2023: readPairsPerMolecule: 0
Wed Mar 22 12:07:10 2023: Simulating on haplotype: 0
Wed Mar 22 12:07:10 2023: Load read positions haplotype 0
Wed Mar 22 12:07:11 2023: Importing sims.1mil.10per.0.fp
Wed Mar 22 12:07:12 2023: Imported sims.1mil.10per.0.fp
Wed Mar 22 12:07:12 2023: readsCountDown: 500000
Wed Mar 22 12:08:35 2023: Reached end of barcodes list. No more barcodes. Last read processed: 500000. Exiting.
Inappropriate ioctl for device at ./LRSIM/simulateLinkedReads.pl line 748.
Thanks for developing this great tool!
IMO, LRSIM should not generate reads with synthetic structural variants by default -- that is likely a surprising behaviour for end users.
For example, I was doing some evaluation of assembly algorithms with LRSIM reads and eventually found out that some of the disagreements were caused by the LRSIM reads and not by the assembler.
I saw a comment in the README implying that the current settings are not suitable for smaller regions: "Note that the default barcoding parameters do not perform well for small genomes (<100Mbp)."
I tried several options and also searched the 10x website, but as someone who does not work with 10x data, I find it hard to choose parameters that would fit a small test sample.
Could you help me with standard settings for a 1 Mb region (or for the smallest region below 100 Mb that you consider feasible)?
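As a starting point for choosing a read count for a small region, sequencing depth can be related to the number of read pairs by simple arithmetic. The sketch below is only that arithmetic, not a recommended LRSIM setting: the 151 bp read length and the 30x target are illustrative assumptions, the function name is hypothetical, and nothing here says whether `-x` accepts values below one million.

```python
def read_pairs_for_depth(region_bp, depth, read_length_bp=151):
    """Read pairs needed so that 2 * read_length * pairs / region = depth."""
    return round(region_bp * depth / (2 * read_length_bp))


# Hypothetical target: 30x depth over a 1 Mb region with 151 bp paired reads.
pairs = read_pairs_for_depth(1_000_000, 30)
print(pairs)              # 99338
print(pairs / 1_000_000)  # the same figure expressed in millions of pairs
```

For a region this small, the result is roughly a tenth of a million read pairs, far below the scale the default barcoding parameters appear to target. This suggests the partition and molecule counts would need to shrink along with the read count.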
I simulated reads using
perl simulateLinkedReads.pl -r refdata-hg19-2.1.0/fasta/genome.fa -p sim -x 400
and tried running Long Ranger via
longranger align --id=lrsim --reference=refdata-hg19-2.1.0 --fastqs=lrsim_data --fastqprefix=sim --localcores=12
However, the run fails:
...
2017-01-10 12:17:59 [runtime] (run:local) ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk37.main
2017-01-10 12:17:59 [runtime] (run:local) ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk38.main
2017-01-10 12:17:59 [runtime] (run:local) ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk39.main
2017-01-10 12:17:59 [runtime] (run:local) ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk40.main
2017-01-10 12:17:59 [runtime] (run:local) ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk41.main
2017-01-10 12:18:17 [runtime] (failed) ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS
[error] Pipestance failed. Please see log at:
lrsim/ALIGNER_CS/ALIGNER/_ALIGNER/ATTACH_BCS/fork0/chnk0/_errors
The error appears to arise from a failed assertion:
Traceback (most recent call last):
File "/mnt/work/arshajii/10x_aligner/longranger-2.0.1/martian-cs/2.0.1/adapters/python/main.py", line 20, in <module>
martian.run("martian.module.main(args, outs)")
File "/mnt/work/arshajii/10x_aligner/longranger-2.0.1/martian-cs/2.0.1/adapters/python/martian.py", line 417, in run
exec(cmd, __main__.__dict__, __main__.__dict__)
File "<string>", line 1, in <module>
File "/mnt/work/arshajii/10x_aligner/longranger-2.0.1/longranger-cs/2.0.1/mro/stages/reads/attach_bcs/__init__.py", line 179, in main
assert(reads_attached >= 2)
AssertionError
Hello,
I was wondering if it would be possible, and if there are any plans, to extend this simulation software to model stLFR linked reads as well. I think the biggest difference is generally in the number of molecules per barcode; stLFR averages around 1.2 or so. Please let me know if I can get you any more information on stLFR and if this could potentially be an added feature.
Best,
Ellis
I am trying to align LRSIM simulated reads with Long Ranger. I am getting the following error. Any ideas on how to avoid this?
[error] Pipestance failed. Error log at:
Compgen10XSim/ALIGNER_CS/ALIGNER/_ALIGNER/ATTACH_BCS/fork0/chnk0/_errors
Log message:
Traceback (most recent call last):
File "/mnt/compgen/inhouse/src/longranger/longranger-2.1.6/martian-cs/2.2.2/adapters/python/main.py", line 23, in <module>
martian.run("martian.module.main(args, outs)")
File "/mnt/compgen/inhouse/src/longranger/longranger-2.1.6/martian-cs/2.2.2/adapters/python/martian.py", line 544, in run
exec(cmd, __main__.__dict__, __main__.__dict__)
File "<string>", line 1, in <module>
File "/mnt/compgen/inhouse/src/longranger/longranger-2.1.6/longranger-cs/2.1.6/mro/stages/reads/attach_bcs/__init__.py", line 198, in main
assert(reads_attached >= 2)
AssertionError
I have been noticing some strange behavior on some of my LRSIM runs. Namely, LRSIM seems to hang during the manifest generation step. Sometimes the process hangs at the very beginning and will not even proceed past the first number. Other times, it may hang in the middle of the process for a long time without progressing. Is this a known issue?
This is an example from today, where the program stopped in the middle of manifest file generation, when the manifest file was at 6.7 GB.
Mon Jul 16 14:01:02 2018: 245200000 reads remaining
Mon Jul 16 14:01:04 2018: 245100000 reads remaining
Mon Jul 16 14:01:07 2018: 245000000 reads remaining
Mon Jul 16 14:01:09 2018: 244900000 reads remaining
Mon Jul 16 14:01:12 2018: 244800000 reads remaining
Mon Jul 16 14:01:15 2018: 244700000 reads remaining
Mon Jul 16 14:01:19 2018: 244600000 reads remaining
Mon Jul 16 14:01:23 2018: 244500000 reads remaining
Mon Jul 16 14:01:29 2018: 244400000 reads remaining
Mon Jul 16 14:01:40 2018: 244300000 reads remaining
Mon Jul 16 14:35:25 2018: 244200000 reads remaining
Some background on this run:
This is a run with the '-g' option, using a relatively small portion of the human genome (~160 megabases).
-x 400 -m 4 -f 84 -i 340 -t 1500 -o
A run with an identical set of parameters but with -x set to '27' worked without issue, but only after a restart.
I suspect this may be because the parameters are outside the normal range of values, specifically the number of reads (-x) being generated, so it would be great to get some clarity on this issue.
Hi aquaskyline,
When I run LRSIM on a small haplotype FASTA as a test, the programme always gets stuck for a long time at Fri Sep 8 14:08:37 2017: 100000 reads remaining in CHECKPOINT 4, the read simulation step.
Fri Sep 8 14:07:55 2017: Simulate reads start
Fri Sep 8 14:07:55 2017: Load barcodes start
Fri Sep 8 14:07:59 2017: Load barcodes end
Fri Sep 8 14:07:59 2017: readPairsPerMolecule: 35
Fri Sep 8 14:07:59 2017: Simulate reads begin on haplotype 0. Total 1
Fri Sep 8 14:07:59 2017: Simulating on haplotype: 0
Fri Sep 8 14:07:59 2017: Load read positions haplotype 0
Fri Sep 8 14:08:02 2017: 0 reads failed being loaded.
Fri Sep 8 14:08:02 2017: Exporting test.0.fp
Fri Sep 8 14:08:02 2017: Exported test.0.fp
Fri Sep 8 14:08:02 2017: readsCountDown: 562554
Fri Sep 8 14:08:07 2017: 500000 reads remaining
Fri Sep 8 14:08:14 2017: 400000 reads remaining
Fri Sep 8 14:08:22 2017: 300000 reads remaining
Fri Sep 8 14:08:30 2017: 200000 reads remaining
Fri Sep 8 14:08:37 2017: 100000 reads remaining
And when I stop the programme with Ctrl-C and rerun the script, the remaining 100000 reads are processed very quickly.
So why does processing suddenly slow down when 100000 reads remain? And why does a restart make it faster?
Is it okay to stop and restart the process for a shorter running time? Would that affect the simulated result?
Thanks a lot.
Lindsay