Comments (5)
Hi @moldach! The workflow is failing at a k-mer counting step due to insufficient memory. There are a few ways to address this.
- If memory is abundant on your machine, you could simply increase the amount of memory allocated to counting k-mers in each case/proband and control/parent sample.
- Alternatively, you could use a tool like Lighter to do error correction* on the reads before running Kevlar. The amount of memory required for counting k-mers accurately depends on the number of distinct k-mers in a data set: sequencing errors often account for the majority of k-mers in a sequencing run, so eliminating those errors will bring the false positive rate down significantly.
- Another alternative is to increase the tolerance for error (max_fpr) in some samples. I'd recommend limiting the parents' FPRs to the default 0.05, but I've had decent success while relaxing max_fpr to >0.3 for case/proband samples.
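To make the trade-off between memory, distinct k-mers, and FPR concrete, here is a rough sketch of the usual collision estimate for a Count-Min-style sketch. It is not Kevlar's or khmer's actual code. The function name, the four-table default, and the k-mer counts are assumptions chosen for illustration only.

```python
import math


def expected_fpr(distinct_kmers: int, table_size: int, num_tables: int = 4) -> float:
    """Approximate false positive rate of a Count-Min-style sketch.

    Each table is modelled as `table_size` bins filled by `distinct_kmers`
    uniformly hashed k-mers. A false positive requires a collision in
    every one of the `num_tables` tables.
    """
    occupancy = 1.0 - math.exp(-distinct_kmers / table_size)
    return occupancy ** num_tables


# Hypothetical numbers: same table size, before and after error correction.
raw_reads = expected_fpr(3_000_000_000, 4_000_000_000)
corrected_reads = expected_fpr(1_000_000_000, 4_000_000_000)
# Hypothetical numbers: same reads, twice the table size.
more_memory = expected_fpr(3_000_000_000, 8_000_000_000)

print(f"raw reads:        {raw_reads:.4f}")
print(f"after correction: {corrected_reads:.4f}")
print(f"double memory:    {more_memory:.4f}")
```

Under this model, removing error k-mers and adding memory both lower the estimated FPR, while raising max_fpr simply accepts a higher value for a given sample.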
None of these solutions is mutually exclusive: you could increase memory AND do error correction AND increase the max_fpr for the controls. The first and third options would be the quickest to try, but only if you have access to a machine with sufficient memory. Note that at some steps of the workflow, all case + control + reference k-mer counts are loaded into memory simultaneously. With your current setup, that looks like 16 + (16 + 16) + 12 = 60 GB of memory.
*Error correction for low-coverage reads is challenging, and there were a few instances in which Lighter erroneously "corrected" reads that contained an actual (low coverage) variant rather than a sequencing error. But depending on the constraints of the system to which one has access, missing 1 or 2 variants out of 90-100 is worth the reduction in memory required for k-mer counting.
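The memory total quoted above, 16 + (16 + 16) + 12 GB, can be checked with a few lines of Python. This is only a worked version of that arithmetic; the function name is made up for illustration, and it assumes each table occupies roughly the memory configured for it.

```python
def peak_memory_gb(case_gb, control_gbs, reference_gb):
    """Sum the k-mer count tables that are held in memory at the same time."""
    return case_gb + sum(control_gbs) + reference_gb


# Values from the setup discussed above: one case, two controls, one reference.
total = peak_memory_gb(16, [16, 16], 12)
print(f"{total} GB")  # prints "60 GB"
```

Any per-sample increase is multiplied by the number of samples, so it is worth re-running this sum before raising the settings.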
from kevlar.
Hi @standage, thank you for getting back to me.
I do have abundant memory, so I increased every "memory" entry in the config.json file to 80G, as this seems to be how rules are given memory in the Snakefile.
This time the pipeline ran through 45 of 47 steps before failing with the following error:
[Sun Nov 1 17:19:16 2020]
Finished job 11.
45 of 47 steps (96%) done
[Sun Nov 1 17:19:16 2020]
Job 1: Filter calls, compute likelihood scores, and sort calls by score.
Job counts:
count jobs
1 like_scores
1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
warnings.warn("Spaces are not permitted in the name. Converted to '_'")
kevlar --tee --logfile Logs/simlike.log simlike --mu 30.0 --sigma 10.0 --epsilon 0.001 --case-min 6 --refr Reference/refr-counts.smallcounttable --sample-labels Proband Mother Father --out calls.scored.sorted.vcf.gz --controls Sketches/ctrl0-counts.counttable Sketches/ctrl1-counts.counttable --case Sketches/case-counts.counttable calls.0.prelim.vcf.gz calls.1.prelim.vcf.gz calls.2.prelim.vcf.gz calls.3.prelim.vcf.gz calls.4.prelim.vcf.gz calls.5.prelim.vcf.gz calls.6.prelim.vcf.gz calls.7.prelim.vcf.gz calls.8.prelim.vcf.gz calls.9.prelim.vcf.gz calls.10.prelim.vcf.gz calls.11.prelim.vcf.gz calls.12.prelim.vcf.gz calls.13.prelim.vcf.gz calls.14.prelim.vcf.gz calls.15.prelim.vcf.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::simlike] Loading k-mer counts for each sample
Traceback (most recent call last):
File "/export/home/moldach/kavlar-test/kevlar-env/bin/kevlar", line 33, in <module>
sys.exit(load_entry_point('biokevlar==0.7+15.gebabd62', 'console_scripts', 'kevlar')())
File "/export/home/moldach/kavlar-test/kevlar-env/lib/python3.8/site-packages/kevlar/__main__.py", line 30, in main
mainmethod(args)
File "/export/home/moldach/kavlar-test/kevlar-env/lib/python3.8/site-packages/kevlar/simlike.py", line 363, in main
refr = kevlar.sketch.load(args.refr)
File "/export/home/moldach/kavlar-test/kevlar-env/lib/python3.8/site-packages/kevlar/sketch.py", line 92, in load
return loadfunc(filename)
File "khmer/_oxli/graphs.pyx", line 306, in khmer._oxli.graphs.Hashtable.load
OSError: Error reading from k-mer count file: Reference/refr-counts.smallcounttable Cannot allocate memory
[Sun Nov 1 17:23:40 2020]
Error in rule like_scores:
jobid: 0
output: calls.scored.sorted.vcf.gz, Logs/simlike.log
RuleException:
CalledProcessError in line 375 of /gpfs/home/moldach/projects/CG00018/Snakefile:
Command 'set -euo pipefail; kevlar --tee --logfile Logs/simlike.log simlike --mu 30.0 --sigma 10.0 --epsilon 0.001 --case-min 6 --refr Reference/refr-counts.smallcounttable --sample-labels Proband Mother Father --out calls.scored.sorted.vcf.gz --controls Sketches/ctrl0-counts.counttable Sketches/ctrl1-counts.counttable --case Sketches/case-counts.counttable calls.0.prelim.vcf.gz calls.1.prelim.vcf.gz calls.2.prelim.vcf.gz calls.3.prelim.vcf.gz calls.4.prelim.vcf.gz calls.5.prelim.vcf.gz calls.6.prelim.vcf.gz calls.7.prelim.vcf.gz calls.8.prelim.vcf.gz calls.9.prelim.vcf.gz calls.10.prelim.vcf.gz calls.11.prelim.vcf.gz calls.12.prelim.vcf.gz calls.13.prelim.vcf.gz calls.14.prelim.vcf.gz calls.15.prelim.vcf.gz' returned non-zero exit status 1.
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
File "/gpfs/home/moldach/projects/CG00018/Snakefile", line 375, in __rule_like_scores
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/moldach/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/home/moldach/projects/CG00018/.snakemake/log/2020-10-29T164206.431603.snakemake.log
I'd like to be able to re-submit the job with more memory to finish these last two steps, but I'm not sure which part of the config.json I should be adjusting the memory for.
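One way to see which memory settings exist before changing any of them is to list every key in config.json whose name contains "mem". The sketch below does that for arbitrary JSON. The example configuration is hypothetical and is not the real layout of Kevlar's config.json; only the key names "memory" and "recountmem" come from this thread.

```python
import json


def find_memory_settings(node, path=""):
    """Yield (path, value) for every key whose name contains 'mem'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "mem" in key.lower():
                yield child, value
            # Keep descending so nested settings are reported too.
            yield from find_memory_settings(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_memory_settings(item, f"{path}[{index}]")


# Hypothetical layout and values, for illustration only.
example_config = json.loads("""
{
  "reference": {"memory": "12G"},
  "samples": {"controls": [{"memory": "16G"}, {"memory": "16G"}]},
  "recountmem": "4G"
}
""")

for setting, value in find_memory_settings(example_config):
    print(setting, "=", value)
```

For a real run, replace the inline example with `json.load(open("config.json"))` and review each reported setting individually rather than raising them all at once.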
Also, when I tried to re-submit again I got the following error:
Building DAG of jobs...
ChildIOException:
File/directory is a child to another output:
('/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa', link_reference)
('/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa', link_mask)
from kevlar.
> OSError: Error reading from k-mer count file: Reference/refr-counts.smallcounttable Cannot allocate memory
This means you ran out of memory on the machine: it cannot hold all the k-mer count tables in memory. Using 80GB for the reference and the mask file is overkill. Those can (and according to this error, probably should) be kept at their original values. If you delete the mask and reference counttables/nodetables, you should be able to rebuild them and continue with the workflow without the need to start over again from scratch.
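As a cautious way to do the cleanup described above, the following sketch lists sketch files in a directory and deletes them only when asked to. The ".counttable" and ".smallcounttable" suffixes appear in the log; the ".nodetable" suffix, the function names, and the idea that the mask lives in a directory you pass in yourself are assumptions. Check the dry-run output against your own Snakefile before deleting anything.

```python
from pathlib import Path

# Suffixes seen in the log, plus ".nodetable" as an assumed suffix.
SKETCH_SUFFIXES = {".counttable", ".smallcounttable", ".nodetable"}


def find_sketches(directory):
    """Return all sketch files under `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in SKETCH_SUFFIXES
    )


def remove_sketches(directory, dry_run=True):
    """List sketch files and delete them only when dry_run is False."""
    selected = find_sketches(directory)
    if not dry_run:
        for path in selected:
            path.unlink()
    return selected


# Dry run: report what would be removed from the Reference directory.
for path in remove_sketches("Reference", dry_run=True):
    print("would delete", path)
```

Once the listed files look right, calling `remove_sketches("Reference", dry_run=False)` removes them, and re-running the workflow should rebuild only those outputs.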
from kevlar.
Sorry, I should have asked earlier.
Was it only "recountmem" which needed to be increased?
Seems like the run completed successfully with your suggestion. So, to confirm, I'll be looking at the result in calls.scored.sorted.vcf.gz?
Thanks
from kevlar.
> Was it only recountmem which needed to be increased?
Actually, not much memory is required for recounting. What I would have recommended is increasing the memory for each case and control sample.
> so, to confirm, I'll be looking at the result in calls.scored.sorted.vcf.gz?
Yep, that's the one!
from kevlar.