Comments (9)
Hi @exeter-matthew-wakeling, could you share all of the commands you ran prior to kevlar filter on the tutorial data? That will help me troubleshoot what went wrong and where. Thanks.
These are the commands that I ran:
kevlar count --memory 250M mother.ct mother.fq.gz
kevlar count --memory 250M father.ct father.fq.gz
kevlar count --memory 250M proband.ct proband.fq.gz
kevlar novel --case proband.fq.gz --case-counts proband.ct --control-counts father.ct mother.ct -o novel.output
mv novel.output novel.augfastq
kevlar filter -o novel_filtered.augfastq novel.augfastq
This last command failed. At that point, the directory contained:
-rw-rw-r-- 1 mw501 research 249999800 Feb 14 16:13 father.ct
-rw-rw-r-- 1 mw501 research 32314239 Feb 14 15:22 father.fq.gz
-rw-rw-r-- 1 mw501 research 249999800 Feb 14 16:12 mother.ct
-rw-rw-r-- 1 mw501 research 32314752 Feb 14 15:19 mother.fq.gz
-rw-rw-r-- 1 mw501 research 394075 Feb 14 16:22 novel.augfastq
-rw-rw-r-- 1 mw501 research 249999800 Feb 14 16:16 proband.ct
-rw-rw-r-- 1 mw501 research 32312614 Feb 14 15:25 proband.fq.gz
-rw-rw-r-- 1 mw501 research 717771 Feb 14 15:28 refr.fa.gz
-rw-rw-r-- 1 mw501 research 12 Feb 14 15:29 refr.fa.gz.amb
-rw-rw-r-- 1 mw501 research 39 Feb 14 15:29 refr.fa.gz.ann
-rw-rw-r-- 1 mw501 research 2500088 Feb 14 15:29 refr.fa.gz.bwt
-rw-rw-r-- 1 mw501 research 625002 Feb 14 15:29 refr.fa.gz.pac
-rw-rw-r-- 1 mw501 research 1250056 Feb 14 15:29 refr.fa.gz.sa
I get the following when I run the first command.
$ kevlar count --memory 250M mother.ct mother.fq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "mother.fq.gz"
[kevlar::count] Done loading k-mers;
7500000 reads processed, 86100250 distinct k-mers stored;
estimated false positive rate is 0.399 (FPR too high, bailing out!!!)
Two thoughts.
- The quick start data is different from the tutorial data but is named the same. Did you download the new data, or did you re-use data from the quick start?
- It looks like you're using kevlar version 0.7. Several changes to the software have been made since then. I'm hoping to release a new version soon, but in the meantime you may consider installing the latest version from GitHub with
pip install git+https://github.com/dib-lab/kevlar.git
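The bail-out above comes down to how full the count table is: that run stored about 86 million distinct k-mers from 7.5 million reads in a 250M table. A rough way to see this is the standard occupancy approximation for a CountMin sketch, FPR ≈ (1 − e^(−n/m))^t, where n is the number of distinct k-mers, m the bins per table and t the number of tables. The sketch below assumes the budget is split evenly over four tables of one-byte counters (the table count is an assumption, not taken from kevlar) and that k-mers hash uniformly; it is an illustration, not kevlar's own FPR calculation, which is derived from the table it actually built.

```python
import math

# Assumptions (not taken from kevlar's source): the memory budget is split
# evenly across N_TABLES tables of 8-bit (one-byte) counters, and k-mers hash
# uniformly and independently into each table.
N_TABLES = 4


def estimate_fpr(n_distinct: int, memory_bytes: int, n_tables: int = N_TABLES) -> float:
    """Approximate FPR of a CountMin sketch: (1 - exp(-n/m)) ** t."""
    table_size = memory_bytes // n_tables  # bins per table, one byte each
    # Expected fraction of non-empty bins in each table.
    occupancy = 1.0 - math.exp(-n_distinct / table_size)
    # A never-seen k-mer is a false positive only if it hits an occupied bin in every table.
    return occupancy ** n_tables


def min_memory_for_fpr(n_distinct: int, max_fpr: float, n_tables: int = N_TABLES) -> int:
    """Smallest budget (bytes) whose estimated FPR is at most max_fpr."""
    target = max_fpr ** (1.0 / n_tables)  # allowed occupancy per table
    table_size = n_distinct / -math.log(1.0 - target)
    return math.ceil(table_size) * n_tables


if __name__ == "__main__":
    # Numbers from the failed run: 86,100,250 distinct k-mers, --memory 250M.
    print(f"{estimate_fpr(86_100_250, 250_000_000):.3f}")
    # 0.2 is an example threshold only, not kevlar's configured limit.
    print(min_memory_for_fpr(86_100_250, 0.2))
```

Under these assumptions the first call comes out around 0.3, the same order as kevlar's 0.399 though not identical, while the 2.5 million distinct k-mers of the quick-start files give an estimate that rounds to 0.000. The second call shows the kind of budget needed to bring the larger data set under a chosen threshold.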
You must be right - the files I am using are smaller than that, so they must be the quickstart files. I got the following when I ran the first command:
[mw501@login01 kevlar]$ kevlar count --memory 250M mother.ct mother.fq.gz
[kevlar] running version 0.7
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "mother.fq.gz"
[kevlar::count] Done loading k-mers;
750000 reads processed, 2507236 distinct k-mers stored;
estimated false positive rate is 0.000;
saved to "mother.ct"
[kevlar::count] Total time: 22.01 seconds
I just updated kevlar using the command you quoted (with --user added, as I don't have root access on this box). I then tried running kevlar, but got an error about khmer missing, so I installed that (again?). Now, when I try to run the first command, I get:
[mw501@login01 kevlar]$ kevlar count --memory 250M mother.ct mother.fq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "mother.fq.gz"
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/gpfs/ts0/shared/software/Miniconda3/4.7.10/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/gpfs/ts0/shared/software/Miniconda3/4.7.10/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
TypeError: argument 1 must be str, not _khmer.ReadParser
[kevlar::count] Done loading k-mers;
0 reads processed, 0 distinct k-mers stored;
estimated false positive rate is 0.000;
saved to "mother.ct"
[kevlar::count] Total time: 0.47 seconds
Which version of khmer are you running? (You can check with normalize-by-median.py --version.) That might explain the error in your latest post. The latest version of kevlar also relies on some updates to khmer that haven't yet been published in a stable release, so installing khmer directly from GitHub (https://github.com/dib-lab/khmer.git) would be best. That may require a sysadmin's help if you don't have root privileges, though.
Otherwise, you can try downgrading back to kevlar version 0.7 and increasing the memory you use for k-mer counting. If you're seeing differences between kevlar 0.7 and the latest documentation, you could also consider using the kevlar 0.7 documentation.
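The same version check can also be run from Python for several packages at once, which has the advantage of reporting what the interpreter that runs kevlar actually sees. This is a small sketch using the standard library's importlib.metadata (Python 3.8+; on Python 3.7 the importlib_metadata backport provides the same functions); installed_versions is a hypothetical helper name, not part of kevlar or khmer.

```python
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib_metadata backport


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["kevlar", "khmer", "screed"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same python that launches kevlar; a different interpreter on the PATH can report a different set of packages.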
[mw501@login01 kevlar]$ normalize-by-median.py --version
|| This is the script normalize-by-median.py in khmer.
|| You are running khmer version 2.1.1
|| You are also using screed version 1.0.4
||
|| If you use this script in a publication, please cite EACH of the following:
||
|| * MR Crusoe et al., 2015. http://dx.doi.org/10.12688/f1000research.6924.1
|| * CT Brown et al., arXiv:1203.4802 [q-bio.GN]
||
|| Please see http://khmer.readthedocs.io/en/latest/citations.html for details.
khmer 2.1.1
[mw501@login01 kevlar]$ pip install --user git+https://github.com/dib-lab/khmer.git
Collecting git+https://github.com/dib-lab/khmer.git
Cloning https://github.com/dib-lab/khmer.git to /tmp/pip-req-build-y38jy0gp
Running command git clone -q https://github.com/dib-lab/khmer.git /tmp/pip-req-build-y38jy0gp
Requirement already satisfied: screed>=1.0 in /gpfs/ts0/home/mw501/.local/lib/python3.7/site-packages (from khmer==3.0.0a3) (1.0.4)
Requirement already satisfied: bz2file in /gpfs/ts0/home/mw501/.local/lib/python3.7/site-packages (from khmer==3.0.0a3) (0.98)
Building wheels for collected packages: khmer
Building wheel for khmer (setup.py) ... done
Stored in directory: /tmp/pip-ephem-wheel-cache-0nh1y_ub/wheels/6b/c2/6a/ec82249e368a3b7a8efe8514e946e845451960517d9c50d8e8
Successfully built khmer
Installing collected packages: khmer
Found existing installation: khmer 2.1.1
Uninstalling khmer-2.1.1:
Successfully uninstalled khmer-2.1.1
Successfully installed khmer-3.0.0a3
[mw501@login01 kevlar]$
[mw501@login01 kevlar]$ normalize-by-median.py --version
|| This is the script normalize-by-median.py in khmer.
|| You are running khmer version 3.0.0a3
|| You are also using screed version 1.0.4
||
|| If you use this script in a publication, please cite EACH of the following:
||
|| * MR Crusoe et al., 2015. https://doi.org/10.12688/f1000research.6924.1
|| * CT Brown et al., arXiv:1203.4802 [q-bio.GN]
||
|| Please see http://khmer.readthedocs.io/en/latest/citations.html for details.
khmer 3.0.0a3
Ok, so the latest git version is now installed. I'm going to try the commands again - note that I'm running it on the quickstart files.
[mw501@login01 kevlar]$ kevlar count --memory 250M mother.ct mother.fq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "mother.fq.gz"
[kevlar::count] Done loading k-mers;
750000 reads processed, 2507236 distinct k-mers stored;
estimated false positive rate is 0.000;
saved to "mother.ct"
[kevlar::count] Total time: 20.20 seconds
[mw501@login01 kevlar]$ kevlar count --memory 250M father.ct father.fq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "father.fq.gz"
[kevlar::count] Done loading k-mers;
750000 reads processed, 2507691 distinct k-mers stored;
estimated false positive rate is 0.000;
saved to "father.ct"
[kevlar::count] Total time: 22.25 seconds
[mw501@login01 kevlar]$ kevlar count --memory 250M proband.ct proband.fq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "proband.fq.gz"
[kevlar::count] Done loading k-mers;
750000 reads processed, 2507598 distinct k-mers stored;
estimated false positive rate is 0.000;
saved to "proband.ct"
[kevlar::count] Total time: 20.41 seconds
[mw501@login01 kevlar]$ kevlar novel --case proband.fq.gz --case-counts proband.ct --control-counts father.ct mother.ct -o novel.augfastq
[kevlar] running version 0.7+15.gebabd62
[kevlar::novel] Loading control samples
[kevlar::novel] INFO: counttables for 2 sample(s) provided, any corresponding FASTA/FASTQ input will be ignored for computing k-mer abundances
[kevlar::sketch] loading sketchfile "father.ct"...done! estimated false positive rate is 0.000
[kevlar::sketch] loading sketchfile "mother.ct"...done! estimated false positive rate is 0.000
[kevlar::novel] Control samples loaded in 0.94 sec
[kevlar::novel] Loading case samples
[kevlar::novel] INFO: counttables for 1 sample(s) provided, any corresponding FASTA/FASTQ input will be ignored for computing k-mer abundances
[kevlar::sketch] loading sketchfile "proband.ct"...done! estimated false positive rate is 0.000
[kevlar::novel] Case samples loaded in 0.48 sec
[kevlar::novel] All samples loaded in 1.42 sec
[kevlar::novel] Iterating over reads from 1 case sample(s)
[kevlar::novel] Found 4274 instances of 370 unique novel kmers in 134 reads in 108.68 seconds
[kevlar::novel] Iterated over all case reads in 108.68 seconds
[kevlar::novel] Total time: 110.10 seconds
[mw501@login01 kevlar]$ kevlar filter -o novel_filtered.augfastq novel.augfastq
[kevlar] running version 0.7+15.gebabd62
Traceback (most recent call last):
  File "/gpfs/ts0/home/mw501/.local/bin/kevlar", line 10, in <module>
    sys.exit(main())
  File "/gpfs/ts0/home/mw501/.local/lib/python3.7/site-packages/kevlar/__main__.py", line 30, in main
    mainmethod(args)
  File "/gpfs/ts0/home/mw501/.local/lib/python3.7/site-packages/kevlar/filter.py", line 100, in main
    mask = kevlar.sketch.load(args.mask)
  File "/gpfs/ts0/home/mw501/.local/lib/python3.7/site-packages/kevlar/sketch.py", line 87, in load
    if not filename.endswith(extensions):
AttributeError: 'NoneType' object has no attribute 'endswith'
So it is still failing with the same error message. Any clues?
Wait, why does the error message mention files in ~/.local/lib/python3.7 when I'm using Python 3.6?
[mw501@login01 kevlar]$ module list
Currently Loaded Modulefiles:
1) GCCcore/7.3.0 6) hwloc/1.11.10-GCCcore-7.3.0 11) ScaLAPACK/2.0.2-gompi-2018b-OpenBLAS-0.3.1 16) Tcl/8.6.8-GCCcore-7.3.0 21) Python/3.6.6-foss-2018b
2) binutils/2.30-GCCcore-7.3.0 7) OpenMPI/3.1.1-GCC-7.3.0-2.30 12) foss/2018b 17) SQLite/3.24.0-GCCcore-7.3.0 22) Miniconda3/4.7.10
3) GCC/7.3.0-2.30 8) OpenBLAS/0.3.1-GCC-7.3.0-2.30 13) bzip2/1.0.6-GCCcore-7.3.0 18) XZ/5.2.4-GCCcore-7.3.0
4) zlib/1.2.11-GCCcore-7.3.0 9) gompi/2018b 14) ncurses/6.1-GCCcore-7.3.0 19) GMP/6.1.2-GCCcore-7.3.0
5) numactl/2.0.11-GCCcore-7.3.0 10) FFTW/3.3.8-gompi-2018b 15) libreadline/7.0-GCCcore-7.3.0 20) libffi/3.2.1-GCCcore-7.3.0
[mw501@login01 kevlar]$ which python
/gpfs/ts0/shared/software/Miniconda3/4.7.10/bin/python
I have no idea what this means, but it doesn't look right.
Possibly false alarm:
[mw501@login01 kevlar]$ python --version
Python 3.7.3
[mw501@login01 kevlar]$
[mw501@login01 kevlar]$ pip --version
pip 19.1.1 from /gpfs/ts0/shared/software/Miniconda3/4.7.10/lib/python3.7/site-packages/pip (python 3.7)
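The output above suggests that the python and pip on the PATH both come from the Miniconda3/4.7.10 module (Python 3.7) rather than the Python/3.6.6 module, so pip install --user puts packages in that interpreter's user site directory, ~/.local/lib/python3.7/site-packages. A quick sketch to confirm this from inside whichever interpreter is running (the function name is hypothetical; sys and site are standard library):

```python
import site
import sys


def interpreter_report():
    """Describe the running Python and where `pip install --user` would put packages."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Per-user site-packages directory for this interpreter (may be None
        # on platforms where user site directories are unsupported).
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

For a quicker check, python -m site --user-site prints the same directory. The first line of ~/.local/bin/kevlar usually names the interpreter pip installed the script for.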
Whoops, I'm sorry. I failed to notice something very obvious: you are not providing any sequences with k-mers to mask. In part this is a bug: kevlar should either halt or at least issue a loud warning when this step is run without a mask. But the user is also expected to provide a mask to the kevlar filter command.
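The "halt loudly" behavior is easy to sketch. The helper below is hypothetical and not part of kevlar; it shows the kind of check that would turn the AttributeError on filename.endswith in the reported traceback into a readable message when no mask is supplied.

```python
from typing import Optional


def require_mask(mask_path: Optional[str]) -> str:
    """Hypothetical guard: stop with a clear message when no mask file was given.

    In the reported traceback, a missing mask reached kevlar.sketch.load as
    None and failed on None.endswith; checking first avoids that.
    """
    if mask_path is None:
        # SystemExit with a string prints the message to stderr and exits with status 1.
        raise SystemExit(
            "kevlar filter: no mask provided; build one (for example from the "
            "reference genome with kevlar count) and pass it to kevlar filter"
        )
    return mask_path
```

A warning instead of an exit would also be possible, but since filtering without a mask silently does less than the user expects, failing early seems the safer default.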
In general, the mask should include k-mers that we're not interested in. For example, if a k-mer is high-abundance in the proband but absent from the parents, we're still not interested in that k-mer if it's present in the reference genome. We also want to ignore k-mers from contaminants, such as vector sequences or bacterial genomes. So I often create a mask (using kevlar count) from the sequences in the reference genome, UniVec, and E. coli. For this demo data, though, creating the mask from the reference genome alone should suffice.
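The logic described above (keep k-mers that are abundant in the proband and absent from the parents, then drop any that also occur in the mask) can be illustrated on toy data. This is a minimal sketch, assuming exact Counters and sets in place of kevlar's probabilistic count tables and canonical k-mers (a k-mer and its reverse complement counted together); the function names and the case_min/ctrl_max thresholds are illustrative, not kevlar's API.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set

# Hypothetical illustration: exact Counters/sets stand in for kevlar's
# probabilistic count tables, and k-mers are compared in canonical form.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every canonical k-mer of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all reads of one sample."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def novel_kmers(case: Counter, controls: List[Counter], case_min: int, ctrl_max: int) -> Set[str]:
    """K-mers seen at least case_min times in the case and at most ctrl_max times in every control."""
    return {
        kmer for kmer, n in case.items()
        if n >= case_min and all(ctrl[kmer] <= ctrl_max for ctrl in controls)
    }


def apply_mask(candidates: Set[str], mask: Set[str]) -> Set[str]:
    """Discard candidates that also occur in the mask (reference genome, contaminants)."""
    return candidates - mask


if __name__ == "__main__":
    k = 5
    proband = count_kmers(["AAAAAAAAAA", "CCCCCCCCCC", "ACGTAACGTA"] * 3, k)
    mother = count_kmers(["AAAAAAAAAA"], k)
    father = count_kmers(["AAAAAAAAAA"], k)
    candidates = novel_kmers(proband, [mother, father], case_min=3, ctrl_max=0)
    mask = set(kmers("CCCCCCCCCC", k))  # toy "reference genome"
    print(sorted(apply_mask(candidates, mask)))
    # ['AACGT', 'ACGTA', 'CGTAA', 'CGTTA', 'GTAAC']
```

Here CCCCC is absent from both parents, so it passes the novelty test, but the mask removes it because it is in the toy reference. Exact sets keep the example readable but would not scale to whole genomes, which is why kevlar works with fixed-size sketches instead.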
Fabulous. It works now. Many thanks. I have managed to find the de novo variants in the quickstart trio. Now I'll see if I can get it working on our real WGS data.