Comments (5)

standage avatar standage commented on September 8, 2024

Hi @moldach! The workflow is failing at a k-mer counting step due to insufficient memory. There are a few ways to address this.

  1. If memory is abundant on your machine, you could simply increase the amount of memory allocated to counting k-mers in each case/proband and control/parent sample.
  2. Alternatively, you could use a tool like Lighter to do error correction* on the reads before running Kevlar. The amount of memory required for counting k-mers accurately depends on the number of distinct k-mers in a data set: sequencing errors often account for the majority of k-mers in a sequencing run, so eliminating those errors will bring the false positive rate down significantly.
  3. Another alternative is to increase the tolerance for error (max_fpr) in some samples. I'd recommend limiting the parents' FPRs to the default 0.05, but I've had decent success while relaxing max_fpr to >0.3 for case/proband samples.
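
To make the trade-off between memory and max_fpr concrete, here is a rough sketch. It assumes the usual Bloom-filter-style approximation for a sketch made of N tables with m slots each and n distinct k-mers, FPR ≈ (1 − e^(−n/m))^N. The function names, the choice of 4 tables, one byte per slot, and the k-mer count are illustrative assumptions, not values taken from Kevlar.

```python
import math

def estimated_fpr(distinct_kmers, total_bytes, num_tables=4):
    """Approximate false positive rate for a sketch of the given size.

    Assumes one byte per slot and memory split evenly across tables.
    """
    slots_per_table = total_bytes / num_tables
    occupancy = 1.0 - math.exp(-distinct_kmers / slots_per_table)
    return occupancy ** num_tables

def bytes_needed(distinct_kmers, max_fpr, num_tables=4):
    """Invert the approximation: memory required to stay at or below max_fpr."""
    occupancy = max_fpr ** (1.0 / num_tables)
    slots_per_table = -distinct_kmers / math.log(1.0 - occupancy)
    return slots_per_table * num_tables

GB = 1e9
n_distinct = 10e9  # hypothetical number of distinct k-mers in one sample

for fpr in (0.05, 0.3):
    print(f"max_fpr={fpr}: ~{bytes_needed(n_distinct, fpr) / GB:.0f} GB")
```

Under this approximation, the memory needed scales linearly with the number of distinct k-mers, which is why removing error k-mers (option 2) and relaxing max_fpr (option 3) both reduce the requirement.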

These solutions are not mutually exclusive: you could increase memory AND do error correction AND increase the max_fpr for the case sample. 1) and 3) would be the quickest to try, but only if you have access to a machine with sufficient memory. Note that at some steps of the workflow, all case + control + reference k-mer counts are loaded into memory simultaneously. With your current setup, that looks like 16 + (16 + 16) + 12 = 60 GB of memory.
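
The peak-memory arithmetic above can be written out as a quick check. This is only a sketch: the dictionary keys and the `fits` helper are hypothetical names, and the only numbers used are the ones quoted in this comment.

```python
# Memory allocated to each k-mer count table, in GB (values from this thread).
sketch_memory_gb = {
    "case": 16,
    "control_0": 16,
    "control_1": 16,
    "reference": 12,
}

# At some workflow steps all tables are loaded at the same time,
# so the peak requirement is the sum, not the maximum.
peak_gb = sum(sketch_memory_gb.values())

def fits(available_gb, required_gb=peak_gb):
    """Return True if the machine can hold every table simultaneously."""
    return available_gb >= required_gb

print(f"peak memory for loading all tables: {peak_gb} GB")
```

Raising any one allocation raises this total by the same amount, so it is worth redoing the sum before changing the config.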


*Error correction for low-coverage reads is challenging, and there were a few instances in which Lighter erroneously "corrected" reads that contained an actual (low coverage) variant rather than a sequencing error. But depending on the constraints of the system to which one has access, missing 1 or 2 variants out of 90-100 is worth the reduction in memory required for k-mer counting.

moldach avatar moldach commented on September 8, 2024

Hi @standage thank you for getting back to me.

I do have abundant memory, so I set every "memory" entry in the config.json file to 80G, since this seems to be how rules are given memory in the Snakefile.

This time the pipeline ran through 45 of 47 steps before failing with the following error:

[Sun Nov  1 17:19:16 2020]
Finished job 11.
45 of 47 steps (96%) done

[Sun Nov  1 17:19:16 2020]
Job 1: Filter calls, compute likelihood scores, and sort calls by score.

Job counts:
	count	jobs
	1	like_scores
	1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
kevlar --tee --logfile Logs/simlike.log simlike --mu 30.0 --sigma 10.0 --epsilon 0.001 --case-min 6 --refr Reference/refr-counts.smallcounttable --sample-labels Proband Mother Father --out calls.scored.sorted.vcf.gz --controls Sketches/ctrl0-counts.counttable Sketches/ctrl1-counts.counttable --case Sketches/case-counts.counttable calls.0.prelim.vcf.gz calls.1.prelim.vcf.gz calls.2.prelim.vcf.gz calls.3.prelim.vcf.gz calls.4.prelim.vcf.gz calls.5.prelim.vcf.gz calls.6.prelim.vcf.gz calls.7.prelim.vcf.gz calls.8.prelim.vcf.gz calls.9.prelim.vcf.gz calls.10.prelim.vcf.gz calls.11.prelim.vcf.gz calls.12.prelim.vcf.gz calls.13.prelim.vcf.gz calls.14.prelim.vcf.gz calls.15.prelim.vcf.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::simlike] Loading k-mer counts for each sample
Traceback (most recent call last):
  File "/export/home/moldach/kavlar-test/kevlar-env/bin/kevlar", line 33, in <module>
    sys.exit(load_entry_point('biokevlar==0.7+15.gebabd62', 'console_scripts', 'kevlar')())
  File "/export/home/moldach/kavlar-test/kevlar-env/lib/python3.8/site-packages/kevlar/__main__.py", line 30, in main
    mainmethod(args)
  File "/export/home/moldach/kavlar-test/kevlar-env/lib/python3.8/site-packages/kevlar/simlike.py", line 363, in main
    refr = kevlar.sketch.load(args.refr)
  File "/export/home/moldach/kavlar-test/kevlar-env/lib/python3.8/site-packages/kevlar/sketch.py", line 92, in load
    return loadfunc(filename)
  File "khmer/_oxli/graphs.pyx", line 306, in khmer._oxli.graphs.Hashtable.load
OSError: Error reading from k-mer count file: Reference/refr-counts.smallcounttable Cannot allocate memory
[Sun Nov  1 17:23:40 2020]
Error in rule like_scores:
    jobid: 0
    output: calls.scored.sorted.vcf.gz, Logs/simlike.log

RuleException:
CalledProcessError in line 375 of /gpfs/home/moldach/projects/CG00018/Snakefile:
Command 'set -euo pipefail;  kevlar --tee --logfile Logs/simlike.log simlike --mu 30.0 --sigma 10.0 --epsilon 0.001 --case-min 6 --refr Reference/refr-counts.smallcounttable --sample-labels Proband Mother Father --out calls.scored.sorted.vcf.gz --controls Sketches/ctrl0-counts.counttable Sketches/ctrl1-counts.counttable --case Sketches/case-counts.counttable calls.0.prelim.vcf.gz calls.1.prelim.vcf.gz calls.2.prelim.vcf.gz calls.3.prelim.vcf.gz calls.4.prelim.vcf.gz calls.5.prelim.vcf.gz calls.6.prelim.vcf.gz calls.7.prelim.vcf.gz calls.8.prelim.vcf.gz calls.9.prelim.vcf.gz calls.10.prelim.vcf.gz calls.11.prelim.vcf.gz calls.12.prelim.vcf.gz calls.13.prelim.vcf.gz calls.14.prelim.vcf.gz calls.15.prelim.vcf.gz' returned non-zero exit status 1.
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
  File "/gpfs/home/moldach/projects/CG00018/Snakefile", line 375, in __rule_like_scores
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/home/moldach/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/home/moldach/projects/CG00018/.snakemake/log/2020-10-29T164206.431603.snakemake.log

I'd like to re-submit the job with more memory to finish these last two steps, but I'm not sure which memory setting in config.json I should adjust.

Also, when I tried to re-submit again I got the following error:

Building DAG of jobs...
ChildIOException:
File/directory is a child to another output:
('/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa', link_reference)
('/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa', link_mask)

standage avatar standage commented on September 8, 2024

OSError: Error reading from k-mer count file: Reference/refr-counts.smallcounttable Cannot allocate memory

This means you ran out of memory on the machine: it cannot hold all the k-mer count tables in memory. Using 80GB for the reference and the mask file is overkill. Those can (and according to this error, probably should) be kept at their original values. If you delete the mask and reference counttables/nodetables, you should be able to rebuild them and continue with the workflow without the need to start over again from scratch.
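
One cautious way to do that cleanup is sketched below. It defaults to a dry run and only lists what it would delete. The `Reference` directory and the `.smallcounttable` and `.counttable` suffixes come from the log above; the `Mask` directory name and the `.nodetable` suffix are assumptions, so check them against your own working directory before deleting anything.

```python
from pathlib import Path

# File suffixes treated as k-mer sketches (".nodetable" is an assumption).
SKETCH_SUFFIXES = (".smallcounttable", ".counttable", ".nodetable")

def find_sketches(directory, suffixes=SKETCH_SUFFIXES):
    """Return the sketch files directly inside `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )

def remove_sketches(directories, dry_run=True):
    """List (and, if dry_run is False, delete) sketch files in each directory."""
    affected = []
    for directory in directories:
        for path in find_sketches(directory):
            print(("would remove " if dry_run else "removing ") + str(path))
            if not dry_run:
                path.unlink()
            affected.append(path)
    return affected

# Dry run only: "Mask" is a hypothetical directory name.
remove_sketches(["Reference", "Mask"])
```

After restoring the original memory values in the config and removing the files for real (`dry_run=False`), rerunning the workflow should rebuild only the missing sketches and the steps that depend on them.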

moldach avatar moldach commented on September 8, 2024

Sorry I should have asked earlier.

Was it only "recountmem" which needed to be increased?

Seems like the run completed successfully with your suggestion - so, to confirm, I'll be looking at the result in calls.scored.sorted.vcf.gz?

Thanks

standage avatar standage commented on September 8, 2024

Was it only recountmem which needed to be increased?

Actually, not much memory is required for recounting. What I would have recommended is increasing the memory for each case and control sample.

so, to confirm, I'll be looking at the result in calls.scored.sorted.vcf.gz?

Yep, that's the one!
