aquaskyline / lrsim

10x Genomics Reads Simulator

License: MIT License

Shell 0.12% C++ 12.36% Perl 29.33% C 52.64% Makefile 0.87% TeX 0.20% Java 0.32% Lua 1.13% Python 1.29% R 0.06% Roff 1.35% Perl 6 0.33%
bioinformatics computational-biology human-genomes longranger read-simulation supernova

lrsim's People

Contributors: aquaskyline, codingkaiser, ljyanesm, mschatz

lrsim's Issues

non overlapping region

Hi,
We are trying to test LRSIM on sample data generated by concatenating the short scaffolds of a fragmented draft genome, so that all scaffolds are now longer than 150K.
When we launch it, the program stops immediately with this error:

Terminate program as it could not find a non overlapping region

The reads we generated have a characteristic that might cause the error:
as you can see, there are many Ns in the middle of the scaffolds.

AAAAAAAATAATAAATAAAATTTTTTTTATGAATTATTTTCCCTAAATTTTACGTGGGATTTTAAAGAGTTTTATCTGCTTATATGTAATTTATGAAACATTTTATCAACTTATATGTAATGTTTGACAAATTTGTTCTAATAAATCAAATAAGTTACCAAAATAATATTAAAATGAAATAGTTTGATCAATATTAATAAACTACAAATGTTACGGGATGGACTCTAGAATCGTTATTAGATTTTCAACAATTGTTTTTTTGAGTGTAATTTGTGTACTAGACTTTTGATTGTATTGTTTTAATGAGTTTTAATAAGTATACTTGCTTTTTTGACTGTACTGGGTTTAATGAATGTAAAAGGTTTAATGTTTTTATTGGCTTAATTAGTGTACTGAGTTTAACGAATGTATTTATTGGGTTTATTCTATGATTTTAATAACTGTATTGTTTTATTGTATGAAACTATGGTGGTTTTCTGTAACAAAAATTCTATTGGCTTTCTGCATAAGACTCATTTGATTTAATGAAATTAGCCGACATTTTATCAACAATGTGGTTTTATGTGAATGACATTACTGTATGATATTAGTACATATTTTTATGTTTTAGTAAACTTATTAAGATTTGTGTATAATTGTATAAGATAAGTGTATATCGTATTGAAATTAATACTTATTGTAATGAAATGAGCAATACAATTATTGAAATTACTATTTACTTATGAAATTTATGTGTATAATTTATTGGAATGATACGTTACTATTAAAATCTATGTATATTTTTAAATATGTATTGAATGTATTGCATTGATAGAACTACATACATTGATTGATTTAAGAGAGCGTATTAAATGAGTGAATTTGATGAATTGGTTGAGTATATTGTGTGAGTAGAGTCAGTGCATTGAGTGAATTTGATGGATTAAATTGGTTGAATGTAATGAATGCATTGTTATATTGAATTCTATTCATTTGATCAAATGCTATGAGATGATTGAATAGATAATTGAATGAAGTAAGTGTATCAAATAAATAATAATAGAATCAAGTTGTTCCATATCAACTTCTAGTAAATATTTGAAATATTATCTTGAATTCTAATAAATATTTGAAAATTTTCATTGACTTCTATGTAATTTTTGAAAACTATTTCCCATGTTTTACGTGGGATTGTGAAATATTTTATCAACTGAAATCTAAAAGCCTAAAACCTTAATGATACTTTAAAAATTTAAAAAGCTCTAAAAATAAAACTTCAAAATTACAATGGGCCATTCAAACAATTTTCAATATTTACATTACTTTTAATTTTGAAAGACTGTTTATTCCTGTTCAAATGTGAATCCATCAATTAATTTAAATTTTAAAACATTGCTTTTACAATTGTATAACAAGCGATAAAACCCTATAAAATCCTATACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAAGCTCTAAAAATAAAACTTCAAAATTAAAATGGGCCATTCAAACAATTTTCAATATTTACATTACTTTTAATTTTGAAAGACTGTTTATTCCTGTTCAAATGTGAATCCATCAATTAATTTAAATTTTAAAACATTGCTTTTACAATTGTATAACAAGCGATAAAACCCTATAAAATCCTATACTAACACTATCATAAACCCAAAAGGGCCCTATAGCACTCCATTAAGACCAGATAAACGTCTAAGAAAACCCATAAAACCTTATTCAACATACACAATCCAACAGTCTAATAAACACTTTCAAAGGATTATATCATGTAAGTGGCAACAAACAAAAGAACCATTAATGAGAGTTTAGGCAAAAATACGCATGAGTCTTAGATAACTTTTAATCGGTTATTGATTATTATCATTGATTATTATAATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATTATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATTGATTATTATCATTGATTATTATCATTGATTATTATAATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATTATCATTGATTATCATTGATTATTAAAATTGATTATTATTGATTATTATTATTATTATTGATTATTATTATTATTATGATTATTATTGATATTGAATATTATTGATTATTGATTATTATCATTGATTATTATCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTATTGTTATTATTATTATTATTTTGCAGAAGATTGGAATTCTGCAAATCTTCCCCTTGAAATAATGTTACAAGTTATTTATATGTCATTCGATCTGATTCTTTCAAAAGAAAGTACACAGATCAATGGAATAATTTGTTTTTTTGACCTAGAAGGCTGTGATAGAAAATCTTTGGAAACATGGTCCGATCCACAACTGTTAAAATCGATAAACAGAATATGGCAAGTAAAAAGTATTATGAAAATTATAATAATTGTTTTGATTAGGATGCTTTTCCAATAAGAATGAAAGGGATAATCTATTACAAAGCTCCAACAATATTTAATGTGGTGTTGAAAATTATCAAGTTTTTTATAGCAGAAAAACACAAACAAAGAATGTTTCAAATTGAGAACCTGGAAAATTTATTTAGTAAGAATCTAGGTTTGGATGAGATCATGCCAATTGAGTATGGAGGAAAAGGTGGAAAACTAGAAGATAAAGTTGGTAAGATTTTTAGATTATTAATCGCATACATAGAGAATCGAGTCTGTGAAGGAATACCAGTCAAAAGCTT
TTTCAAAATCTGATTTGTAAAATCGATATTAAAGTTTATTATTTTTCTGCTTTTCAAACAGATGTAATAGATTTTCTAAAAACAAAATAAAACTAAATTTGTTTGCATTAATATCGATAAAAATACTAAATACGCGCGGCTGACAGTTGTGGTGCTAACTTTAAGAAATAATTTATATGATTCTTTGCATATAATAAATAATATTTGTTATATAAGAGTGTCAGATTATGAAAAAATTGAAACTGAAATAAAACAATCTTTACAGAGTAAGCAGAATTAAGAAATAAGCTCAAAGGTTAAGTATATTTAAATATGTTGAAAATGTTCTGAATGAAATCTTTAATTTTCCAGTAGACCTCATCAACCAAGAGTGGTTAAACTAAAATAATACACAGCTTTATCTGAGGTAATTGGAAATATGATGAGCTGATTTACAGGAAACGACATACTTTACGAAATGATCACATTGATAAATAATAGTTACTAGTCTTGACATTAATTATATTTACAGTAACTACTCTGTTATCCGCCAAATCCCAATAACCGCCATTTTTAATAAAGAAATTAATAAATAATTAAATTTTTTACTGGATTTTTTAATAAATTTTATTATTTTTTATTCTAAAAGCCGCCATTTTCTAGAAAGCACCACCTTTTTTCACAGTCCCGTTTTTGGTAGATATGAGAGTACTAGTCAAATGTCTTTTTCACTATTAGAGATATAATTAATAATATAATTATAATAAATATATTTGTTGAGATACTTTGGGGGATTATTACCAAAGTAAATCAGAAATGCGTGAGTTTAGTGATTGAGTTTAAATATTAGATAATCCCGATATATTAAAGTTATAAATTTTCCTTATATACCTTAATATGTTTAATATAGATATAATGATATTTATAGATATACTATATAGTAATCCTTATAAATCCAATTTAAAATTAACCTTTTAAAAATAAACGTATATTATTCAATAAACACTTTAACAATTCAAATAAATTTAATGCGTTAGAGATTTTTGCTCAGTTGTGAATCCGGAGATGATTCAAACTGTTGTACAGCTTAAAAAATGGTGAAAATAATGAGTTTCAACTCTCTATTTATACTAGATCGTTGCTCATTATTTTCGCGTAATTATAAATATTTACATGAAAAATCAAACACCATTTACCGACATTTGAACATTATTTATTAAAAGTTTTATATATTTACAAAAATATTTATACGAATGTCAAAAAATAATGTACTTACGTTTATGTAAAATTTGACAATTTCCATTAAATAAAAATAAAATCAAAATCCAATAATTTTTTAAATTAAAATCTCTAAAATATCAGCAAAATTCATAATATACCAGTTCAATAATTGATAACAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
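
To quantify how much of each scaffold consists of N gaps before running the simulator, a short script along these lines could be used. This is only a sketch: the function names and the 10-base minimum run length are illustrative choices and are not part of LRSIM or SURVIVOR.

```python
import re

def find_n_runs(sequence, min_len=10):
    """Return half-open, 0-based (start, end) intervals of N runs of at least min_len bases."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]

def n_fraction(sequence):
    """Fraction of bases in the sequence that are N (case-insensitive)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)

if __name__ == "__main__":
    # Toy scaffold: one long gap (12 N) and one short gap (3 N).
    example = "ACGT" + "N" * 12 + "ACGT" + "N" * 3 + "AC"
    print(find_n_runs(example))
    print(round(n_fraction(example), 3))
```

Applying `find_n_runs` to each scaffold shows where the long gaps sit and how much usable sequence lies between them.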

@aquaskyline
Meanwhile, I corrected the reads and tested with reads containing no N, but the same error occurred. The complete log is:

(base) mostafa@srv-research:/mnt/hdd2/mostafa/Bio-Mostafa/data/10xSimulation/simulated_linkedReads/draftV01$ simulateLinkedReads.pl -r /mnt/hdd2/mostafa/Bio-Mostafa/data/10xSimulation/simulated_linkedReads/djBaseGenomeV01.fasta -p draftv01_sread
Tue Jul  9 09:58:26 2019: draftv01_sread.status
Tue Jul  9 09:58:26 2019: Variant simulation mode enabled
Tue Jul  9 09:58:26 2019: SURVIVOR start
Tue Jul  9 09:58:26 2019: Running: /mnt/hdd2/mostafa/Apps/LRSIM/SURVIVOR 0 /mnt/hdd2/mostafa/Bio-Mostafa/data/10xSimulation/simulated_linkedReads/djBaseGenomeV01.fasta draftv01_sread.hap.parameter 0 draftv01_sread.hap 1000
Terminate program as it could not find a non overlapping region
Tue Jul  9 09:58:34 2019: SURVIVOR error on missing draftv01_sread.hapA.fasta
No such file or directory at /mnt/hdd2/mostafa/Apps/LRSIM/simulateLinkedReads.pl line 748.
unable to delete draftv01_sread.hapA.fasta at exit
unable to delete draftv01_sread.hap.hetA.insertions.fa at exit
unable to delete draftv01_sread.hap.hetB.insertions.fa at exit
unable to delete draftv01_sread.hapB.fasta at exit
unable to delete draftv01_sread.hap.homAB.insertions.fa at exit

Regards

both reads 1 and 2 have "/1" suffix

Just a small bug I noticed.

The LRSIM output read names have the suffix "/1", for both read 1 and read 2 in a pair. This could potentially confuse downstream tools.
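
As a possible workaround on the user side, the mate suffix can be stripped or rewritten when post-processing the FASTQ headers. The sketch below assumes headers of the form `@name/1`, optionally followed by a space and a comment; the helper names are hypothetical and not part of LRSIM.

```python
def strip_mate_suffix(header):
    """Drop a trailing /1 or /2 from the read name (the first space-delimited token)."""
    name, _, comment = header.partition(" ")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name if not comment else name + " " + comment

def with_mate_suffix(header, mate):
    """Return the header with any existing suffix replaced by /<mate> (mate is 1 or 2)."""
    name, _, comment = strip_mate_suffix(header).partition(" ")
    name = "%s/%d" % (name, mate)
    return name if not comment else name + " " + comment
```

For example, `with_mate_suffix(header, 2)` could be applied to every header line of the R2 file, or `strip_mate_suffix` to both files if the downstream tool expects no suffix at all.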

Long Ranger error: input fastq not consistent

Hello,

I have used LRSIM to generate a small set of linked reads for a 60 MB reference:

perl simulateLinkedReads.pl -g ${REF}/selected_scfs_alleles_no_N.fa -p ${OUTDIR}/default_params -n -z 7 -x 1 -m 4 -t 3 -o

After generation, the folder looks like this:

10X_FASTQ/
├── default_params.0.fp
├── default_params.0.manifest
├── default_params.0.sort.manifest
├── default_params.dwgsim.0.12.fastq
├── default_params.hap.0.clean.fasta
├── default_params.hap.0.clean.fasta.fai
├── default_params.status
├── default_params_S1_L001_R1_001.fastq.gz
└── default_params_S1_L001_R2_001.fastq.gz

0 directories, 9 files

Then I tried to use the Long Ranger align mode to align the simulated reads to my reference:

longranger align --id=default_params --reference=${REF} --fastqs=/project/sweet/evgeny/10x/10X_FASTQ --sample=default_params

After a few steps, the following error occurs:

Running preflight checks (please wait)...
2018-03-21 10:19:40 [runtime] (ready)           ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:43 [runtime] (split_complete)  ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:43 [runtime] (run:local)       ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS.fork0.chnk0.main
2018-03-21 10:19:46 [runtime] (chunks_complete) ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:49 [runtime] (join_complete)   ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.SETUP_CHUNKS
2018-03-21 10:19:55 [runtime] (ready)           ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS
2018-03-21 10:19:55 [runtime] (run:local)       ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS.fork0.split
2018-03-21 10:19:58 [runtime] (split_complete)  ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS
2018-03-21 10:19:58 [runtime] (run:local)       ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS.fork0.chnk0.main
2018-03-21 10:20:01 [runtime] (failed)          ID.default_params.ALIGNER_CS.ALIGNER._LINKED_READS_ALIGNER._FASTQ_PREP_NEW.BUCKET_FASTQS

[error] Pipestance failed. Error log at:
default_params/ALIGNER_CS/ALIGNER/_LINKED_READS_ALIGNER/_FASTQ_PREP_NEW/BUCKET_FASTQS/fork0/chnk0-u28b3b223be/_errors

Log message:
stage error:FASTQ parsing error: input fastq not consistent

Long Ranger log

What could be the reason for this error? Could it be due to the small size of the read set?
Do you need any additional files or information?

Thank you!
Evgeny
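
One sanity check that can be run before Long Ranger is to confirm that the R1 and R2 files contain the same number of records with matching read names. The sketch below is a generic check and not a reproduction of Long Ranger's own validation; it assumes plain four-line FASTQ records, and the function names are made up for illustration.

```python
import gzip
from itertools import zip_longest

def read_names(path):
    """Yield the read name of each 4-line record, without '@' or a /1 or /2 suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line.split()
            name = fields[0] if fields else ""
            if name.startswith("@"):
                name = name[1:]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name

def first_mismatch(r1_path, r2_path):
    """Return (record_index, name1, name2) for the first disagreement, or None if the files agree."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

With the files from the listing above, the call would be `first_mismatch("default_params_S1_L001_R1_001.fastq.gz", "default_params_S1_L001_R2_001.fastq.gz")`. A `None` in one of the two name slots means that file ran out of records first.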

perl: symbol lookup error

When I run test.sh I get:

perl: symbol lookup error: ./lib/auto/Math/Random/Random.so: undefined symbol: Perl_Gthr_key_ptr

I am using perl version v5.20.2

Allow smaller simulations

Is it possible to allow smaller simulations (i.e. smaller -x)? At the moment, I receive the message

The value of -x should be set between 400 and 800

I have tried using -o (with -x 1 and -x 5), but the program seems to hang:

...
Tue Jan 10 13:06:35 2017: DWGSIM round 0 thread 3 end
Tue Jan 10 13:06:35 2017: cat sim.dwgsim.0.3.12.fastq >> sim.dwgsim.0.12.fastq
[dwgsim_core] 187500
[dwgsim_core] Complete!
Tue Jan 10 13:06:53 2017: DWGSIM round 1 thread 1 end
[dwgsim_core] 187500
[dwgsim_core] Complete!
Tue Jan 10 13:06:53 2017: DWGSIM round 1 thread 2 end
[dwgsim_core] 187500
[dwgsim_core] Complete!
Tue Jan 10 13:06:58 2017: DWGSIM round 1 thread 0 end
Tue Jan 10 13:06:58 2017: cat sim.dwgsim.1.1.12.fastq >> sim.dwgsim.1.12.fastq
Tue Jan 10 13:06:58 2017: cat sim.dwgsim.1.2.12.fastq >> sim.dwgsim.1.12.fastq
Tue Jan 10 13:06:58 2017: cat sim.dwgsim.1.3.12.fastq >> sim.dwgsim.1.12.fastq
Tue Jan 10 13:06:58 2017: Simulate reads start
Tue Jan 10 13:06:58 2017: Load barcodes start
Tue Jan 10 13:07:00 2017: Load barcodes end
Tue Jan 10 13:07:00 2017: readPairsPerMolecule: 0
Tue Jan 10 13:07:00 2017: Simulating on haplotype: 0
Tue Jan 10 13:07:00 2017: Load read positions haplotype 0
Tue Jan 10 13:07:09 2017: 0 reads failed being loaded.
Tue Jan 10 13:07:09 2017: Exporting sim.0.fp
Tue Jan 10 13:08:35 2017: Exported sim.0.fp
Tue Jan 10 13:08:35 2017: readsCountDown: 500000   (stuck here)

My reference is hg19.

Compile fails

I'm getting this error. Which version of Perl do I need to install on Linux?
perl: symbol lookup error: ../lib/auto/Math/Random/Random.so: undefined symbol: Perl_Gthr_key_ptr

Few SNPs generated using two haplotype sequence

Hi there,

I used two haplotype FASTA sequences as input to simulate 10X linked reads for the human genome. By default, this should produce about 3M SNPs, given the -1 parameter (1 SNP per INT base pairs [1000]), but I only got ~10k SNPs. Could you help me figure out the problem? The command I used is below:

./simulateLinkedReads.pl -g hap1_genome.fa,hap2_genome.fa -p HG002_sim -7 0 -0 0

Best,
Peng Xu

Can't locate Math/Random.pm

Hi,
While following the steps to install LRSIM, I got the error below when running sh test.sh.

Can't locate Math/Random.pm in @INC (you may need to install the Math::Random module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at ../simulateLinkedReads.pl line 35.
BEGIN failed--compilation aborted at ../simulateLinkedReads.pl line 35.

I uncommented these two lines:

#use lib "./lib";
#use lib dirname($0)."/lib";

but that did not help, and I got this error:

Then I deleted the lib folder and followed the steps recommended in #1, but got the same errors.

linked reads option setting

Hi aquaskyline,

I am trying to use LRSIM to simulate 50X linked reads for a small portion of the human genome. My selected genomic region is around 1 Mbp. To adapt to the small reference, I intend to make the following changes to the linked-read options:
-x = coverage * my_reference_length / (insert_size + sd of pairs) = 50 * 1M / (350 + 35)
-f = default setting
-t = human_genome_length / my_reference_length * default_t = 3,000,000,000 / 1,000,000 * 1,500,000
-m = default setting

To get the best simulated linked-read data, do you have any suggestions on my modifications?

Thanks a lot.
Lindsay
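
Evaluating the proposed expressions makes the resulting magnitudes explicit. The snippet only reproduces the arithmetic as written in this post, with variable names chosen for illustration; it makes no claim about the units LRSIM expects for each option, which should be checked against the program's help text.

```python
coverage = 50
reference_length = 1_000_000           # selected region, in bp
insert_size = 350                      # mean insert size used in the post
insert_sd = 35                         # standard deviation used in the post
human_genome_length = 3_000_000_000
default_t = 1_500_000                  # default -t value as written in the post

# -x as proposed: coverage * reference length / (insert size + sd)
proposed_x = coverage * reference_length / (insert_size + insert_sd)

# -t as proposed: (human genome length / reference length) * default -t
proposed_t = human_genome_length / reference_length * default_t

print(round(proposed_x))   # 129870
print(int(proposed_t))     # 4500000000
```

Note that, as written, the proposed -t grows as the reference gets smaller, which may be worth double-checking against the intent of scaling the simulation down.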

LRSIM crashes and reports "not defined chr1_182578874_182579@chr1"

Hi,

I am attempting to run LRSIM on a human chr1, but I'm encountering the aforementioned error.

Here is the command I'm using: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 30 -f 50 -t 500 -m 10 -0 0 -o

And here is LRSIM output:

Tue Mar 16 16:27:09 2021: SapiensChr1.status
Tue Mar 16 16:27:09 2021: Variant simulation mode enabled
Tue Mar 16 16:27:09 2021: SURVIVOR start
Tue Mar 16 16:27:09 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 1000
Tue Mar 16 16:27:22 2021: SURVIVOR end
Tue Mar 16 16:27:22 2021: Build genome index start
Tue Mar 16 16:27:22 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.0.fasta 0 > SapiensChr1.hap.0.clean.fasta
Tue Mar 16 16:27:26 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.1.fasta 0 > SapiensChr1.hap.1.clean.fasta
Tue Mar 16 16:27:30 2021: /home/morispi/StructuralVariants/LRSIM/samtools faidx SapiensChr1.hap.0.clean.fasta
Tue Mar 16 16:27:32 2021: /home/morispi/StructuralVariants/LRSIM/samtools faidx SapiensChr1.hap.1.clean.fasta
Tue Mar 16 16:27:36 2021: Build genome index end
Tue Mar 16 16:27:36 2021: DWGSIM round 0 thread 0 start
Tue Mar 16 16:27:36 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.0
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
0Tue Mar 16 16:27:38 2021: DWGSIM round 0 thread 1 start
Tue Mar 16 16:27:38 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.1
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
0Tue Mar 16 16:27:40 2021: DWGSIM round 0 thread 2 start
Tue Mar 16 16:27:40 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.2
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 20000Tue Mar 16 16:27:43 2021: DWGSIM round 0 thread 3 start
Tue Mar 16 16:27:43 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.3
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 280000Tue Mar 16 16:27:46 2021: DWGSIM round 1 thread 0 start
Tue Mar 16 16:27:46 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.0
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 280000Tue Mar 16 16:27:50 2021: DWGSIM round 1 thread 1 start
Tue Mar 16 16:27:50 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.1
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 180000Tue Mar 16 16:27:53 2021: DWGSIM round 1 thread 2 start
Tue Mar 16 16:27:53 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.2
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 450000Tue Mar 16 16:27:56 2021: DWGSIM round 1 thread 3 start
Tue Mar 16 16:27:56 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.3
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 410000Tue Mar 16 16:27:58 2021: DWGSIM round 0 thread 3 end
[dwgsim_core] 510000Tue Mar 16 16:28:02 2021: DWGSIM round 0 thread 1 end
[dwgsim_core] 1290000
[dwgsim_core] Complete!
Tue Mar 16 16:28:38 2021: DWGSIM round 0 thread 0 end
Tue Mar 16 16:28:38 2021: cat SapiensChr1.dwgsim.0.1.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1490000
[dwgsim_core] Complete!
Tue Mar 16 16:28:45 2021: DWGSIM round 0 thread 2 end
Tue Mar 16 16:28:45 2021: cat SapiensChr1.dwgsim.0.2.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1330000Tue Mar 16 16:28:51 2021: cat SapiensChr1.dwgsim.0.3.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1750000
[dwgsim_core] Complete!
Tue Mar 16 16:28:54 2021: DWGSIM round 1 thread 1 end
[dwgsim_core] 1770000
[dwgsim_core] Complete!
[dwgsim_core] 1510000Tue Mar 16 16:28:55 2021: DWGSIM round 1 thread 0 end
Tue Mar 16 16:28:55 2021: cat SapiensChr1.dwgsim.1.1.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
[dwgsim_core] 1875000
[dwgsim_core] Complete!
Tue Mar 16 16:28:57 2021: DWGSIM round 1 thread 2 end
[dwgsim_core] 1700000Tue Mar 16 16:28:59 2021: cat SapiensChr1.dwgsim.1.2.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
[dwgsim_core] 1875000
[dwgsim_core] Complete!
Tue Mar 16 16:29:02 2021: DWGSIM round 1 thread 3 end
Tue Mar 16 16:29:02 2021: cat SapiensChr1.dwgsim.1.3.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
Tue Mar 16 16:29:07 2021: Simulate reads start
Tue Mar 16 16:29:07 2021: Load barcodes start
Tue Mar 16 16:29:09 2021: Load barcodes end
Tue Mar 16 16:29:09 2021: Using fragment sizes from fragmentSizesList instead of Poisson distribution
Tue Mar 16 16:29:09 2021: 10000 sizes loaded
Tue Mar 16 16:29:09 2021: Average fragment size: 50kbp
Tue Mar 16 16:29:09 2021: readPairsPerMolecule: 2
Tue Mar 16 16:29:09 2021: Simulating on haplotype: 0
Tue Mar 16 16:29:09 2021: Load read positions haplotype 0
Tue Mar 16 16:29:21 2021: not defined chr1_182578874_182579@chr1
Inappropriate ioctl for device at ../simulateLinkedReads.pl line 748, <$fh> line 19543360.
Command exited with non-zero status 25

It does not seem to be a memory issue, since it only uses 4 GB.

Moreover, when I try to slightly modify the parameters (for instance setting -x 10 or -n to skip the variants simulation), the error seems to change randomly.
I once had Cannot find correct chromosome and position in @chr1_82788009_82787815_1_0_0_0_0:0:0_0:0:0_0/2, and once had Cannot find correct chromosome and position in IFIHIGIIFIGFEBFCHD@DEECDCBEDECCB@BCBABBFBCABCA@DC@BAAAB@?A@?@?>?B?C@?@B<>??:??@@>>?>A@==A@@@@<@A@@>>B=@>?>C>?>?=>??;;>>?=?>==?=>;?;;==<and other variations.

I'm having a hard time understanding what is going on here.
I already managed to run LRSIM correctly on smaller datasets and never encountered this issue.

Do you have any suggestions?

Thanks,
Pierre

Barcode issue in few-molecules case

Dear respected LRSIM team,
Thanks for your great package. I'm using LRSIM to generate linked reads for a small genome. I used the following command:

simulateLinkedReads.pl -g hap1.fa,hap2.fa,hap3.fa -p out/sim -o -n -x 1 -f 100 -m 3 -t 1

It works, but I think there is a problem in LRSIM's output. It seems that each haplotype is considered separately. With -m 3, the three molecules in each partition all originate from one haplotype. But in a real 10x device, each of the three molecules may come from a different haplotype.
This can be inferred from the manifest files: no barcode is shared between the manifests corresponding to different haplotypes.

Would you please tell me whether I am right? If so, I would appreciate it if you could tell me how to overcome this issue.

LRSIM does not filter out duplicate reads with different barcodes

I recently discovered an issue in LRSIM execution that caused duplicate read names with different barcodes to end up in the final .bam file, resulting in errors when running the downstream LongRanger alignment.

The problem can be traced back to DWGSIM, which appears to have simulated two reads at the same position and, on top of this, given them the same name (I assumed DWGSIM checks for this type of event, so maybe this is a result of the LRSIM parallelization). Then, during 10X read simulation, two barcodes were simulated containing molecules that shared some overlap, so one of the duplicate reads was assigned to one barcode and the other read to the other barcode. A very rare event, but clearly not impossible.

Here is a trace of the duplicate reads throughout the various files:

** DWGSIM fastq **
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/1
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/2
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/1
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f/2

** LRSIM .fastq R1 **
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 1:N:0:1
@chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f 1:N:0:1

** LongRanger ALIGN .bam with line numbers prepended **
1207925978:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f	163	chr10	119294682	60	151M	=	119294844	290	CGGCTGCTCCCAGAGAGAGTTGGGGTCTTCTCAGGGCCCGCGATGGGGGAGTGGTCGTGGTCAGACCCCCGTGAGCCCCTTCGGAAGGTCCCAGTCCCTGTCCATTCTTCTGTCCCGCAGCTCTCTCCGCGCAGGCGGGGCAGAGCCGGGG	A?@@???<?@??@<=C?CACB>AB@?A@A@>@?AA@=??@>=>@?@??>@?>>??A>A@@?BA???BB>@><>??@?@<@A<A?B?A@?<A>A??@??>B?>BB?>@B>B?>??AAC??=>>A@=A>@B???A@?===??><A==AC<B??	RX:Z:TCTGCGTAGTCCTGAT	QX:Z:AAAFFFKKKKKKKKKK	XS:i:-81	AS:i:0	XM:Z:0	AM:Z:1XT:i:0 BX:Z:TCTGCGTAGTCCTGAT-1	DM:Z:0.150000	RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0	OM:i:60
1207925979:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f	163	chr10	119294682	60	151M	=	119294844	290	CGGCTGCTCCCAGAGAGAGTTGGGGTCTTCTCAGGGCCCGCGATGGGGGAGTGGTCGTGGTCAGACCCCCGTGAGCCCCTTCGGAAGGTCCCAGTCCCTGTCCATTCTTCTGTCCCGCAGCTCTCTCCGCGCAGGCGGGGCAGAGCCGGGG	A??A=?@>@=??????D?>?=@??B?A????CC???@??@A@?@?>B@B=@;?DA=B?>=?>??AAA???>@@=B??BAD@@?@?@B?@@CA@C??>=@C???BA?A??@A?BC??=@C?>C?=A@C?C??A?=A@A@@A?>?@??@=@@@	RX:Z:GTTTGTTTCGATGGCC	QX:Z:AAAFFFKKKKKKKKKK	XS:i:-82	AS:i:0	XM:Z:0	AM:Z:1XT:i:0	BX:Z:GTTTGTTTCGATGGCC-1	DM:Z:0.115385	RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0	OM:i:60
1207926094:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f	83	chr10	119294844	60	128M	=	119294682	-290	GGACGAGGGGTCTTGGGGCCGCCTCGCTGGCTGCGGTTGGAAGCACCCGTTTTCCCGCCCGCCCGCGCAGGCGCTGCTCTGTGGCCACCAGCAGAGGTTTCCCGGCCGCTGTGAGTCGCCCACGCGAG	<>?AA??=??@???@ABB=??@?@>>=@??@>A???;?????????????@>@?A?A=>A@?????A?=B?@@>??A??D@@?>?>????=B???@A??>?<A@>?;@B??=??@@@A>???C>????	RX:Z:TCTGCGTAGTCCTGAT	QX:Z:AAAFFFKKKKKKKKKK	TR:Z:CACGTCG	TQ:Z:A??<CC?	XS:i:-70	AS:i:0	XM:Z:0	AM:Z:1	XT:i:0BX:Z:TCTGCGTAGTCCTGAT-1	DM:Z:0.150000	RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0	OM:i:60
1207926095:chr10_120544649_120544795_0_1_0_0_0:0:0_0:0:0_6b6f4f	83	chr10	119294844	60	128M	=	119294682	-290	GGACGAGGGGTCTTGGGGCCGCCTCGCTGGCTGCGGTTGGAAGCACCCGTTTTCCCGCCCGCCCGCGCAGGCGCTGCTCTGTGGCCACCAGCAGAGGTTTCCCGGCCGCTGTGAGTCGCCCACGCGAG	=>=A???B??B@?BB?>;??@?B?A@=?>?AA>AE=<><@?@?AE?CAB@@???A>??@@B>A>@<@A>@<>A?>?<A?>?@@@@?A@?@?>?>?B??????B?AC?@<A???B@?>?BCCB?@A@?@	RX:Z:GTTTGTTTCGATGGCC	QX:Z:AAAFFFKKKKKKKKKK	TR:Z:CACGTCG	TQ:Z:@AA?A?>	XS:i:-70	AS:i:0	XM:Z:0	AM:Z:1	XT:i:0BX:Z:GTTTGTTTCGATGGCC-1	DM:Z:0.115385	RG:Z:longranger_align:LibraryNotSpecified:1:unknown_fc:0	OM:i:60

Note that the two read pairs share a name, but have different barcodes.

It would be great if there was some duplicate-detection and subsequent renaming of the files.
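
The kind of duplicate detection and renaming requested here could look roughly like the sketch below. It is not LRSIM code: the function names and the `_dupN` suffix scheme are assumptions made for illustration, and the same renaming would have to be applied to both mates of a pair.

```python
from collections import Counter

def duplicated_names(names):
    """Return the set of read names that occur more than once."""
    counts = Counter(names)
    return {name for name, count in counts.items() if count > 1}

def make_unique(names):
    """Append _dupN to the 2nd, 3rd, ... occurrence of a name so that all names differ.

    Assumes no original name already ends in a _dupN suffix.
    """
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        if seen[name] == 1:
            unique.append(name)
        else:
            unique.append("%s_dup%d" % (name, seen[name] - 1))
    return unique
```

The input would be one name per read pair, for example the R1 headers with the `@` and any mate suffix removed.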

Using LRSIM with LongRanger: Extremely high rate of incorrect barcodes observed (99.90 %)

Hello,

I recently found out about LRSIM, which seems to be super useful for gaining a better understanding of SV tools.

However, I'm trying to generate a toy dataset, and then align it to a reference with LongRanger, and LongRanger always stops and reports "stage error:Extremely high rate of incorrect barcodes observed (99.90 %). Check that input is 10x Chromium data, and that there are no missing cycles in the first 16bp of Read 1. Please note Long Ranger 2.0 and above do not support GemCode data.".

I did read from another issue that the "/1" and "/2" have to be removed from the end of the headers in order for LongRanger to work with LRSIM data, but removing them did not seem to help.

Here is the command line I'm using for generating the data: perl simulateLinkedReads.pl -r References/Ecoli.fasta -p Ecoli/SimEcoli -n -x 100 -o

I used a lower -x value because I don't need a lot of reads for now. Can it be the cause of the issue? Leaving it to the default 600 seems to generate too many reads for the toy tests I want to perform, hence why I lowered it. I also used the -o option as advised in another issue I found after a bit of searching.

Is there anything I'm doing wrong, or could you advise me on how to properly use LongRanger with LRSIM data?

Thanks in advance.

Best,
Pierre
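
Since the error message refers to the first 16 bp of Read 1, one way to investigate independently of LongRanger is to measure how many simulated reads start with a barcode from the expected list. The sketch below assumes the barcode list has already been loaded into a Python set; where that list lives in a given installation is not stated in this thread, and the function name is hypothetical.

```python
def barcode_match_rate(read1_sequences, whitelist, barcode_length=16):
    """Fraction of reads whose first barcode_length bases appear in the whitelist."""
    total = 0
    matched = 0
    for seq in read1_sequences:
        total += 1
        if seq[:barcode_length] in whitelist:
            matched += 1
    return matched / total if total else 0.0
```

A rate near zero on the R1 sequences would point to a barcode problem in the input rather than to the alignment step.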

Error when `sh make.sh` : cannot find -lcurses

The following is what I got when running sh make.sh:

gcc -g -Wall -O3  -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o bam_tview_curses.o bam_tview_html.o  libbam.a -Lbcftools -lbcf  -lcurses  -lm -lz -lpthread
/usr/bin/ld: cannot find -lcurses
collect2: error: ld returned 1 exit status
Makefile:57: recipe for target 'samtools' failed
make[1]: *** [samtools] Error 1
make[1]: Leaving directory '/research/LRSIM/DWGSIMSrc/samtools'
Makefile:25: recipe for target 'all-recur' failed
make: *** [all-recur] Error 1

time

Hi,

We are testing LRSIM to generate 10X reads for an organism with a genome size of about 1.5 Gbp. As I noticed, you mentioned that a run on a genome like the human genome normally takes less than 10 hours. But we have now been running LRSIM for 283 hours on 16 threads with about 160 GB of memory.

Is this abnormal? Do you recommend stopping it?

msort: error: use of undeclared identifier 'direct_insert_aux'

I see the following error when compiling msort with both g++-7 and clang++:

clang++    -c -o stdhashc.o stdhashc.cc
In file included from stdhashc.cc:2:
./stdhash.hh:496:13: error: use of undeclared identifier 'direct_insert_aux'
                int ret = direct_insert_aux(key, this->n_capacity, this->keys, this->flags, &i);
                          ^
                          this->
stdhashc.cc:72:34: note: in instantiation of member function 'hash_map_misc<unsigned int, int>::insert' requested here
        return ((hashii_cpp_t*)h->ptr)->insert(key, value);
                                        ^
./stdhash.hh:295:13: note: must qualify identifier to find this declaration in dependent base class
        inline int direct_insert_aux(const keytype_t &key, hashint_t m, keytype_t *K, __lh3_flag_t *F, hashint_t *i) {
                   ^
1 error generated.

See the full log here: https://gist.github.com/sjackman/d4de672ec5f8a44f5276cef9edf9a28f#file-02-make-L12

Increased read depth flanking N-stretches in reference

Based on a reference genome, I simulated haplotypes using a different algorithm. Using these haplotypes as input, I ran LRSIM to simulate 10x data with the command:

simulateLinkedReads.pl -g DMsim1.hap.0.clean.fasta,DMsim1.hap.1.clean.fasta,DMsim1.hap.2.clean.fasta,DMsim1.hap.3.clean.fasta -p DMsim1 -x 100 -f 20 -u 3 -z 4 -o

However, when mapping the reads of the simulated 10x dataset to the reference genome, I noticed increased read depth on the left side of all N-stretches of my reference genome.

Reported read positions appear to be offset

For example, one of my simulated reads is

@chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0/1 1:N:0:1
AGAACAAGTCTCTATTGGCCAACATCTGGACAGCTTGTAGTTGAGCTGAATATGCTGCTGTGTGAATTACAAAGGTATGACAAATTTTTTACTCTGTTCTAATTTGGCTCGGCCTGCCTGCCTTCAGCTTTTTTGGCACAGCTTCCCACAT
+
AAAFFFKKKKKKKKKKFIIHGGFHFFFHGEFGEBAEFCACECACCCCCCCBFBBBCBCE?B=B=@?D>ADAAB@@@>@?ABBBCB@@B>?C@??D?@?A@B???A=AA?=><=>??>??>@>@?>>@@=?>?=??;==>?8>><===>??;

which yields the alignments (using the SNAP aligner):

chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0:AGAACAAGTCTCTATT	163	chr1	197999799	70	151=	=	198000023	352	GTGTCCTAGAGAAGCAGACTCAAATAACAAATCCCTGTGTAACTGCAAAGGTTTATACAAAGTGGCATTCCATGCAGAGTAGAGAATATGATGTAAAGAGCCATCAAACATTATGAGATCCCTCCCCTGCAGCACATAAACAAAGTGAGGT	IIHIFGFFHDFFFDGFCECDCCDDEDBCDBBBDCCBABBBBBEBDBE?A@EAAA@AA@@B?B=A@?@A@@E@??C@@AA@==?A??=??>>?>@A@@=?=>>@B>B>>>>>?<>@C?>@?@=<@=>=?9??=>=<=>>?	PG:Z:SNAP	NM:i:0	RG:Z:FASTQ	PL:Z:Illumina	PU:Z:pu	LB:Z:lb	SM:Z:sm
chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0:AGAACAAGTCTCTATT	83	chr1	198000023	70	128=	=	197999799	-352	ATGTGGGAAGCTGTGCCAAAAAAGCTGAAGGCAGGCAGGCCGAGCCAAATTAGAACAGAGTAAAAAATTTGTCATACCTTTGTAATTCACACAGCAGCATATTCAGCTCAACTACAAGCTGTCCAGAT	;??>===<>>8?>==;??=?>?=@@>>?@>@>??>??>=<>=?AA=A???B@A?@?D??@C?>B@@BCBBBA?@>@@@BAADA>D?@=B=B?ECBCBBBFBCCCCCCCACECACFEABEGFEGHFFFH	PG:Z:SNAP	NM:i:0	RG:Z:FASTQ	PL:Z:Illumina	PU:Z:pu	LB:Z:lb	SM:Z:sm

However, these alignments are both offset by 3235 (and I have checked that there are no secondary alignments at the reported positions). All the other reads also seem to be offset by various amounts relative to the reported true positions.
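
The offset can be checked directly from the read name. The following is a minimal sketch, assuming the DWGSIM-style name layout `<contig>_<start1>_<start2>_<strand1>_<strand2>_...`, which matches the read shown above. The helper name `true_starts` is hypothetical and not part of LRSIM.

```python
def true_starts(read_name):
    """Return (contig, start1, start2) parsed from a DWGSIM-style read name.

    Assumption: the name ends with nine underscore-separated fields, the
    first two of which are the simulated start positions. Fields are counted
    from the right because contig names may contain underscores themselves.
    """
    name = read_name.lstrip("@").split()[0]
    fields = name.split("_")
    contig = "_".join(fields[:-9])
    return contig, int(fields[-9]), int(fields[-8])


contig, s1, s2 = true_starts(
    "@chr1_198003258_198003034_1_0_0_0_0:0:0_0:0:0_0/1 1:N:0:1"
)
aligned = [197999799, 198000023]  # POS values from the two SAM records above

# Pair the smaller simulated start with the smaller aligned position.
offsets = [t - a for t, a in zip(sorted([s1, s2]), sorted(aligned))]
print(contig, offsets)  # chr1 [3235, 3235]
```

Both ends differ from their aligned positions by the same amount, which is consistent with a coordinate shift rather than a misalignment of one mate.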

Problem with installing LRSIM

I downloaded LRSIM, but when running make.sh, I got some error messages:

gcc -g -Wall -O3 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o bam_tview_curses.o bam_tview_html.o libbam.a -Lbcftools -lbcf -lcurses -lm -lz -lpthread
/usr/bin/ld: bcftools/libbcf.a(bcf.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: bcftools/libbcf.a(bcfutils.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Makefile:57: recipe for target 'samtools' failed
make[1]: *** [samtools] Error 1
make[1]: Leaving directory '/media/bulk_01/users/lavri002/bin/LRSIM/DWGSIMSrc/samtools'
Makefile:25: recipe for target 'all-recur' failed
make: *** [all-recur] Error 1

As a result, the executables needed to run LRSIM are not copied to the LRSIM directory. How can I solve this?

Floating point exception

simulateLinkedReads failed with the error Floating point exception. I'm trying to simulate reads with no simulated variations. Perhaps that's related.

❯❯❯ simulateLinkedReads.pl -z 16 -x 524 -d 1 -1 0 -4 0 -7 0 -0 0 -r GRCh38.wrap.fa -p sim.lr
Sat Jan 20 14:19:27 2018: sim.lr.status
Sat Jan 20 14:19:27 2018: Variant simulation mode enabled
Sat Jan 20 14:19:27 2018: SURVIVOR start
Sat Jan 20 14:19:27 2018: Running: /gsc/btl/linuxbrew/Cellar/lrsim/1.0/SURVIVOR 0 GRCh38.wrap.fa sim.lr.hap.parameter 0 sim.lr.hap 0
sh: line 1: 21814 Floating point exception(core dumped) /gsc/btl/linuxbrew/Cellar/lrsim/1.0/SURVIVOR 0 GRCh38.wrap.fa sim.lr.hap.parameter 0 sim.lr.hap 0 > /dev/null
Sat Jan 20 14:21:21 2018: SURVIVOR error on missing sim.lr.hapA.fasta
No such file or directory at /gsc/btl/linuxbrew/Cellar/lrsim/1.0/simulateLinkedReads.pl line 738.
unable to delete sim.lr.hapA.fasta at exit
unable to delete sim.lr.hap.homAB.insertions.fa at exit
unable to delete sim.lr.hap.hetA.insertions.fa at exit
unable to delete sim.lr.hap.hetB.insertions.fa at exit
unable to delete sim.lr.hapB.fasta at exit
Command exited with non-zero status 2
❯❯❯ cat sim.lr.status
Sat Jan 20 14:19:27 2018: sim.lr.status
Sat Jan 20 14:19:27 2018: Variant simulation mode enabled
Sat Jan 20 14:19:27 2018: SURVIVOR start
Sat Jan 20 14:19:27 2018: Running: /gsc/btl/linuxbrew/Cellar/lrsim/1.0/SURVIVOR 0 GRCh38.wrap.fa sim.lr.hap.parameter 0 sim.lr.hap 0
Sat Jan 20 14:21:21 2018: SURVIVOR error on missing sim.lr.hapA.fasta

Supernova failing with Simulated data

Hello,

I have simulated some human chromium data using LRSIM, and while I can run longranger basic fine, Supernova is failing.

Here's the error in the main log:

2017-09-13 23:26:37 [runtime] (failed)          ID.GRCh38_LRSIM_supernova.ASSEMBLER_CS._ASSEMBLER.ASSEMBLER_DF

[error] An unexpected error has occurred.

Saving pipestance info to GRCh38_LRSIM_supernova/GRCh38_LRSIM_supernova.mri.tgz

And here's the error in the ASSEMBLER_DF stdout log:

Wed Sep 13 11:25:51 2017: reading in paths --> pathsX, mem = 11.98 GB
ForceAssertGe(numReads,0) at src/10X/paths/ReadPathVecX.cc:295 failed in function
void ReadPathVecX::reserve(int64_t, int64_t)
wwith values arg1 = -2 and arg2 = 0
ForceAssertGe(numReads,0) at src/10X/paths/ReadPathVecX.cc:295 failed in function
void ReadPathVecX::reserve(int64_t, int64_t)
wwith values arg1 = -2 and arg2 = 0
ForceAssertGe(numReads,0) at src/10X/paths/ReadPathVecX.cc:295 failed in function
void ReadPathVecX::reserve(int64_t, int64_t)
with values arg1 = -6 and arg2 = 0

Just wondering if you have seen this error when running Supernova with the simulated data, and if you know what is causing this failure?

0 readpairs per molecule

Hello, I'm trying to run LRSIM on a Drosophila genome and finding that it keeps reporting that it is simulating 0 read pairs per molecule. Perhaps I'm putting in the wrong parameters, but I can't seem to determine what's going on. The genome is >100 Mb as recommended.

The runtime parameters are:

NUMREADS=1   # in millions
MOLLEN=80    # in kbp
MOLPER=5
NUMINV=50
NUMINDEL=0
SNPPER=200000
INVMIN=1000
INVMAX=10000
TRANSLOC=0
PARTITIONS=2500    # default: 1500
NUMTHREADS=8
outprefix="./1millionreads/sims"
BARCODES="barcodes-500M.txt"

./simulateLinkedReads.pl \
    -r dmel.trunc.noN.fa \
    -p $outprefix \
    -b $BARCODES \
    -x $NUMREADS \
    -f $MOLLEN \
    -m $MOLPER \
    -1 $SNPPER \
    -4 $NUMINDEL \
    -5 $INVMIN \
    -6 $INVMAX \
    -7 $NUMINV \
    -0 $TRANSLOC \
    -z $NUMTHREADS \
    -o
[dwgsim_core] 2L length: 23582449
[dwgsim_core] 2R length: 25422852
[dwgsim_core] 3L length: 28239365
[dwgsim_core] 3R length: 32150935
[dwgsim_core] 4 length: 1430978
[dwgsim_core] X length: 23654338
[dwgsim_core] Y length: 3765010
[dwgsim_core] 7 sequences, total length: 138245927
[dwgsim_core] Currently on:
[dwgsim_core] 187500
[dwgsim_core] Complete!
[dwgsim_core] 120000Thu Nov 30 11:50:31 2023: DWGSIM round 1 thread 1 end
[dwgsim_core] 187500
[dwgsim_core] Complete!
Thu Nov 30 11:50:33 2023: DWGSIM round 1 thread 2 end
[dwgsim_core] 70000Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.0.1.12.fastq >> ./1millionreads/sims.dwgsim.0.12.fastq
[dwgsim_core] 80000Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.0.2.12.fastq >> ./1millionreads/sims.dwgsim.0.12.fastq
Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.0.3.12.fastq >> ./1millionreads/sims.dwgsim.0.12.fastq
[dwgsim_core] 90000Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.1.1.12.fastq >> ./1millionreads/sims.dwgsim.1.12.fastq
Thu Nov 30 11:50:34 2023: cat ./1millionreads/sims.dwgsim.1.2.12.fastq >> ./1millionreads/sims.dwgsim.1.12.fastq
[dwgsim_core] 187500
[dwgsim_core] Complete!
Thu Nov 30 11:50:37 2023: DWGSIM round 1 thread 3 end
Thu Nov 30 11:50:37 2023: cat ./1millionreads/sims.dwgsim.1.3.12.fastq >> ./1millionreads/sims.dwgsim.1.12.fastq
Thu Nov 30 11:50:37 2023: Simulate reads start
Thu Nov 30 11:50:37 2023: Load barcodes start
Thu Nov 30 11:54:38 2023: Load barcodes end
Thu Nov 30 11:54:38 2023: readPairsPerMolecule: 0                    <----- THIS PART
Thu Nov 30 11:54:38 2023: Simulating on haplotype: 0
Thu Nov 30 11:54:38 2023: Load read positions haplotype 0
Thu Nov 30 11:54:41 2023: 0 reads failed being loaded.
Thu Nov 30 11:54:41 2023: Exporting ./1millionreads/sims.0.fp
Thu Nov 30 11:54:42 2023: Exported ./1millionreads/sims.0.fp
Thu Nov 30 11:54:42 2023: readsCountDown: 500000                 <------ NEVER MOVES PAST THIS PART
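
One possible reading of the `readPairsPerMolecule: 0` line is integer rounding. The sketch below is an assumption, not LRSIM's confirmed formula: it supposes the requested read pairs (`-x`, in millions) are divided evenly over partitions (`-t`, in thousands) times molecules per partition (`-m`). The command above does not pass `-t`, so the 1500 given as the default in the parameter comment is used. The function name is hypothetical.

```python
def read_pairs_per_molecule(x_million, t_thousand, m_per_partition):
    """Assumed estimate: total read pairs / total molecules, truncated."""
    total_pairs = x_million * 1_000_000
    total_molecules = t_thousand * 1_000 * m_per_partition
    return total_pairs // total_molecules


# Values from the post above: -x 1, -m 5, and -t left at 1500.
print(read_pairs_per_molecule(1, 1500, 5))     # 0

# Illustrative larger read count with the same partition layout.
print(read_pairs_per_molecule(600, 1500, 5))   # 80
```

Under this assumption, 1 million read pairs spread over 7.5 million molecules leaves less than one pair per molecule, so raising `-x` or lowering `-t` or `-m` would be the first thing to try.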

SURVIVOR step is not progressing

Hi,
I am running LRSIM on human chr22 and I am having some troubles with the SURVIVOR step. This is the status file:

humanChr22.status exists
Tue Nov  7 15:19:59 2017: humanChr22.status
Tue Nov  7 15:19:59 2017: Variant simulation mode enabled
Tue Nov  7 15:19:59 2017: SURVIVOR start
Tue Nov  7 15:19:59 2017: Running: /home/myname/path/todir/bin/LRSIM/SURVIVOR 0 chr22.fa humanChr22.hap.parameter 0 humanChr22.hap 1000

I have tried to run the example independently and it hangs at this point, no error:

/home/myname/path/todir/bin/LRSIM/SURVIVOR 0 chr22NoN.fa humanChr22.hap.parameter 0 humanChr22.hap 1000
# Chrs passed size threshold:1
# Chrs passed size threshold:1
First: Genome checking:
First: Genome checking:
generate SV
generate_mutations_diploid function

Any suggestions?
Thank you in advance

SURVIVOR step not progressing

Hi,

I am running LRSIM on human chr1 and the SURVIVOR step does not seem to be progressing.

Here is the command I'm using: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 50 -f 50 -t 500 -m 10 -1 10000 -4 1 -7 1 -0 1 -o

And here is the status (the runtime is short because I had to rerun the command, but I've let it run for a few hours previously):

SapiensChr1.status exists
Tue Mar 16 13:24:16 2021: SapiensChr1.status
Tue Mar 16 13:24:16 2021: Variant simulation mode enabled
Tue Mar 16 13:24:16 2021: SURVIVOR start
Tue Mar 16 13:24:16 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 10000

I saw on #12 that the problem could be caused by the lack of new lines in the reference genome file, but my reference genome does contain new lines. Moreover, I'm using the latest version of LRSIM, which includes the updated code for SURVIVOR, which should solve this issue, as mentioned in the replies prior to closing #12.
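
Whether a reference is line-wrapped can be confirmed with a few lines of code. This is a minimal sketch; the function name `longest_sequence_line` is hypothetical and the check is independent of LRSIM and SURVIVOR.

```python
def longest_sequence_line(fasta_path):
    """Return the length of the longest non-header line in a FASTA file.

    A value close to the chromosome length means the sequence is stored on
    a single line; a small value such as 60 or 80 means it is wrapped.
    """
    longest = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                longest = max(longest, len(line.rstrip("\r\n")))
    return longest
```

Calling it on the reference (for example `longest_sequence_line("./Chr1.fasta")`) gives a single number to compare against the expected wrap width.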

Moreover, just like in #12, if I use another set of parameters, and set the numbers of variants to simulate to 0, it works just fine, for instance: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 50 -f 50 -t 500 -m 10 -1 10000 -4 0 -7 0 -0 0 -o

Status is:

Tue Mar 16 13:26:28 2021: SapiensChr1.status
Tue Mar 16 13:26:28 2021: Variant simulation mode enabled
Tue Mar 16 13:26:28 2021: SURVIVOR start
Tue Mar 16 13:26:28 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 10000
Tue Mar 16 13:26:42 2021: SURVIVOR end
Tue Mar 16 13:26:42 2021: Build genome index start
Tue Mar 16 13:26:42 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.0.fasta 0 > SapiensChr1.hap.0.clean.fasta

And LRSIM then keeps going normally.

I'm not sure I understand what might be causing the issue. Do you have any suggestions?

Thanks a lot.
Pierre

LRSIM phase4 problem

Hi, I tried to simulate 10X reads from a 20 MB sequence with:

perl ../simulateLinkedReads.pl -g fastaOriginal.fasta -p OutputLRSIM/default_params -b ../4M-with-alts-february-2016.txt -n -x 1 -f 150 -t 1 -m 3 -o -u 4

and with the test files.
Phases 1, 2 and 3 run well, but phase 4 stops at the end:

Thu Oct 17 14:02:47 2019: Simulate reads start
Thu Oct 17 14:02:47 2019: Load barcodes start
Thu Oct 17 14:02:48 2019: Load barcodes end
Thu Oct 17 14:02:48 2019: Using fragment sizes from fragmentSizesList instead of Poisson distribution
Thu Oct 17 14:02:48 2019: 10000 sizes loaded
Thu Oct 17 14:02:48 2019: Average fragment size: 50kbp
Thu Oct 17 14:02:48 2019: readPairsPerMolecule: 100
Thu Oct 17 14:02:48 2019: Simulating on haplotype: 0
Thu Oct 17 14:02:48 2019: Load read positions haplotype 0
Thu Oct 17 14:02:48 2019: 0 reads failed being loaded.
Thu Oct 17 14:02:48 2019: Exporting ./test1.0.fp
Thu Oct 17 14:02:48 2019: Exported ./test1.0.fp
Thu Oct 17 14:02:48 2019: readsCountDown: 500000
Thu Oct 17 16:46:22 2019: Reached end of barcodes list. No more barcodes. Last read processed: 500000. Exiting.
Inappropriate ioctl for device at ../simulateLinkedReads.pl line 748.

How can I fix this problem, please? The software is installed correctly (sh make.sh ends with "Done, please run 'perl simulateLinkedReads.pl'").

Install error - cannot find -lcurses

When I run sh make.sh I get this error. Any suggestions?

make[2]: Leaving directory `/mnt/home/stephen/Apps/LRSIM/DWGSIMSrc/samtools/misc'
gcc -g -Wall -O3 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o padding.o bedcov.o bamshuf.o bam_tview_curses.o bam_tview_html.o libbam.a -Lbcftools -lbcf -lcurses -lm -lz -lpthread
/mnt/home/stephen/.linuxbrew/bin/ld: cannot find -lcurses
collect2: error: ld returned 1 exit status
make[1]: *** [samtools] Error 1
make[1]: Leaving directory `/mnt/home/stephen/Apps/LRSIM/DWGSIMSrc/samtools'
make: *** [all-recur] Error 1

Also FYI, the first step of the install should be cd LRSIM not cd 10xReadsSimulator

Ran out of barcodes

I'm trying to simulate reads to test the efficacy of linked reads on low-coverage datasets. This testing will occur for a range of low coverages and sample counts. However, when I try to use LRSIM, I keep getting a message that I've run out of barcodes. I'm using a truncated D. melanogaster genome that's just the 4 largest chromosomes (for simplicity).

invmin=1000
invmax=10000
mollen=80
milreads=1
molper=10
prefix="sims.${milreads}mil.${molper}per"
threads=8

./LRSIM/simulateLinkedReads.pl -r dmel.trunc.fa -p $prefix -0 0 -x $milreads -f $mollen -m $molper -z $threads -o

How can I successfully produce low-coverage data without the warning that I've run out of barcodes? There is also an error of being unable to concatenate a file. LRSIM output:

Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.0.1.12.fastq >> sims.1mil.10per.dwgsim.0.12.fastq
cat: sims.1mil.10per.dwgsim.0.1.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.0.2.12.fastq >> sims.1mil.10per.dwgsim.0.12.fastq
cat: sims.1mil.10per.dwgsim.0.2.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.0.3.12.fastq >> sims.1mil.10per.dwgsim.0.12.fastq
cat: sims.1mil.10per.dwgsim.0.3.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.1.1.12.fastq >> sims.1mil.10per.dwgsim.1.12.fastq
cat: sims.1mil.10per.dwgsim.1.1.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.1.2.12.fastq >> sims.1mil.10per.dwgsim.1.12.fastq
cat: sims.1mil.10per.dwgsim.1.2.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: cat sims.1mil.10per.dwgsim.1.3.12.fastq >> sims.1mil.10per.dwgsim.1.12.fastq
cat: sims.1mil.10per.dwgsim.1.3.12.fastq: No such file or directory
Wed Mar 22 12:07:10 2023: Simulate reads start
Wed Mar 22 12:07:10 2023: Load barcodes start
Wed Mar 22 12:07:10 2023: Load barcodes end
Wed Mar 22 12:07:10 2023: readPairsPerMolecule: 0
Wed Mar 22 12:07:10 2023: Simulating on haplotype: 0
Wed Mar 22 12:07:10 2023: Load read positions haplotype 0
Wed Mar 22 12:07:11 2023: Importing sims.1mil.10per.0.fp
Wed Mar 22 12:07:12 2023: Imported sims.1mil.10per.0.fp
Wed Mar 22 12:07:12 2023: readsCountDown: 500000
Wed Mar 22 12:08:35 2023: Reached end of barcodes list. No more barcodes. Last read processed: 500000. Exiting.
Inappropriate ioctl for device at ./LRSIM/simulateLinkedReads.pl line 748.

request: don't generate structural variants by default

Thanks for developing this great tool!

IMO, LRSIM should not generate reads with synthetic structural variants by default -- that is likely a surprising behaviour for end users.

For example, I was doing some evaluation of assembly algorithms with LRSIM reads and eventually found out that some of the disagreements were caused by the LRSIM reads and not by the assembler.

Small test-set sequence

I saw some comments in the readme, that implied that the current settings are not suitable for smaller regions: "Note that the default barcoding parameters do not perform well for small genomes (<100Mbp)."

I tried several options and I also tried to search the 10x website, but for someone not working with 10x myself it is hard to find parameters that could fit a small test sample.

Could you help me with standard settings for a 1Mb region? (or settings for a smaller than 100Mb region, that you consider minimally possible)

Long Ranger fails on simulated data

I simulated reads using

perl simulateLinkedReads.pl -r refdata-hg19-2.1.0/fasta/genome.fa -p sim -x 400

and tried running Long Ranger via

longranger align --id=lrsim --reference=refdata-hg19-2.1.0 --fastqs=lrsim_data --fastqprefix=sim --localcores=12

However, the run fails:

...
2017-01-10 12:17:59 [runtime] (run:local)       ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk37.main
2017-01-10 12:17:59 [runtime] (run:local)       ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk38.main
2017-01-10 12:17:59 [runtime] (run:local)       ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk39.main
2017-01-10 12:17:59 [runtime] (run:local)       ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk40.main
2017-01-10 12:17:59 [runtime] (run:local)       ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS.fork0.chnk41.main
2017-01-10 12:18:17 [runtime] (failed)          ID.lrsim.ALIGNER_CS.ALIGNER._ALIGNER.ATTACH_BCS

[error] Pipestance failed. Please see log at:
lrsim/ALIGNER_CS/ALIGNER/_ALIGNER/ATTACH_BCS/fork0/chnk0/_errors

The error appears to arise from a failed assertion:

Traceback (most recent call last):
  File "/mnt/work/arshajii/10x_aligner/longranger-2.0.1/martian-cs/2.0.1/adapters/python/main.py", line 20, in <module>
    martian.run("martian.module.main(args, outs)")
  File "/mnt/work/arshajii/10x_aligner/longranger-2.0.1/martian-cs/2.0.1/adapters/python/martian.py", line 417, in run
    exec(cmd, __main__.__dict__, __main__.__dict__)
  File "<string>", line 1, in <module>
  File "/mnt/work/arshajii/10x_aligner/longranger-2.0.1/longranger-cs/2.0.1/mro/stages/reads/attach_bcs/__init__.py", line 179, in main
    assert(reads_attached >= 2)
AssertionError

Extension to BGI stLFR

Hello,

I was wondering whether it would be possible, and whether there are any plans, to extend this simulation software to model stLFR linked reads as well. I think the biggest difference is generally in the number of molecules per barcode: stLFR averages around 1.2 or so. Please let me know if I can get you any more information on stLFR and if this could potentially be an added feature.

Best,
Ellis

LongRanger crashes

I am trying to align an LRSIM simulation with LongRanger and I am getting the following error. Any ideas on how to avoid this?

[error] Pipestance failed. Error log at:
Compgen10XSim/ALIGNER_CS/ALIGNER/_ALIGNER/ATTACH_BCS/fork0/chnk0/_errors
Log message:
Traceback (most recent call last):
  File "/mnt/compgen/inhouse/src/longranger/longranger-2.1.6/martian-cs/2.2.2/adapters/python/main.py", line 23, in <module>
    martian.run("martian.module.main(args, outs)")
  File "/mnt/compgen/inhouse/src/longranger/longranger-2.1.6/martian-cs/2.2.2/adapters/python/martian.py", line 544, in run
    exec(cmd, __main__.__dict__, __main__.__dict__)
  File "<string>", line 1, in <module>
  File "/mnt/compgen/inhouse/src/longranger/longranger-2.1.6/longranger-cs/2.1.6/mro/stages/reads/attach_bcs/__init__.py", line 198, in main
    assert(reads_attached >= 2)
AssertionError

LRSIM hangs during manifest generation step

I have been noticing some strange behavior on some of my LRSIM runs. Namely, LRSIM seems to hang during the manifest generation step. Sometimes the process hangs at the very beginning and will not even proceed past the first number. Other times, it may hang in the middle of the process for a long time without progressing. Is this a known issue?

This is an example from today, where the program stopped in the middle of manifest-file-generation, when the manifest file was at 6.7Gb.

Mon Jul 16 14:01:02 2018: 245200000 reads remaining
Mon Jul 16 14:01:04 2018: 245100000 reads remaining
Mon Jul 16 14:01:07 2018: 245000000 reads remaining
Mon Jul 16 14:01:09 2018: 244900000 reads remaining
Mon Jul 16 14:01:12 2018: 244800000 reads remaining
Mon Jul 16 14:01:15 2018: 244700000 reads remaining
Mon Jul 16 14:01:19 2018: 244600000 reads remaining
Mon Jul 16 14:01:23 2018: 244500000 reads remaining
Mon Jul 16 14:01:29 2018: 244400000 reads remaining
Mon Jul 16 14:01:40 2018: 244300000 reads remaining
Mon Jul 16 14:35:25 2018: 244200000 reads remaining

Some background on this run:
This is a run with the '-g' option, using a relatively small portion of the human genome (~160 megabases):
-x 400 -m 4 -f 84 -i 340 -t 1500 -o

A run with an identical set of parameters but with -x set to '27' worked without issue, but only after a restart.

I suspect this may have to do with the fact that the parameters are outside of the normal desired range of values, specifically the number of reads -x that are being generated, so it would be great to get some clarity on this issue.

running slow at CHECKPOINT 4, Simulate Read Part

Hi aquaskyline,

When I run LRSIM on a small haplotype FASTA as a test, the programme always gets stuck for a long time at Fri Sep 8 14:08:37 2017: 100000 reads remaining in CHECKPOINT 4, Simulate Read Part.

Fri Sep  8 14:07:55 2017: Simulate reads start
Fri Sep  8 14:07:55 2017: Load barcodes start
Fri Sep  8 14:07:59 2017: Load barcodes end
Fri Sep  8 14:07:59 2017: readPairsPerMolecule: 35
Fri Sep  8 14:07:59 2017: Simulate reads begin on haplotype 0. Total 1
Fri Sep  8 14:07:59 2017: Simulating on haplotype: 0
Fri Sep  8 14:07:59 2017: Load read positions haplotype 0
Fri Sep  8 14:08:02 2017: 0 reads failed being loaded.
Fri Sep  8 14:08:02 2017: Exporting test.0.fp
Fri Sep  8 14:08:02 2017: Exported test.0.fp
Fri Sep  8 14:08:02 2017: readsCountDown: 562554
Fri Sep  8 14:08:07 2017: 500000 reads remaining
Fri Sep  8 14:08:14 2017: 400000 reads remaining
Fri Sep  8 14:08:22 2017: 300000 reads remaining
Fri Sep  8 14:08:30 2017: 200000 reads remaining
Fri Sep  8 14:08:37 2017: 100000 reads remaining

And when I stop the programme with Ctrl-C and rerun the script, the remaining 100000 reads are processed very quickly.

So why does processing suddenly slow down when 100000 reads remain? And why does a restart make it faster?
Is it okay to stop and restart the process to shorten the running time? Does this have any effect on the simulated result?

Thanks a lot.
Lindsay
