csb5 / lofreq Goto Github PK

View Code? Open in Web Editor NEW

97.0 97.0 30.0 22.71 MB

LoFreq Star: Sensitive variant calling from sequencing data

Home Page: http://csb5.github.io/lofreq/

License: Other

Shell 5.33% C 74.19% Python 17.83% Makefile 0.29% M4 2.36%

lofreq's People

Contributors

Stargazers

Watchers

lofreq's Issues

Key SOMATIC in field INFO is not defined in the VCF file header for of indel calls

Hi,

I used picard-tools-1.141 SortVcf to sort VCF file produced by lofreq (version 2.1.2). Then the sorted VCF file will be combined with indel calls from other algorithms (as suggested in somaticseq pipeline). Picard gave me
"Exception in thread "main" java.lang.IllegalStateException: Key SOMATIC found in VariantContext field INFO at 1:11113181 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default."

My VCF file example:
`##fileformat=VCFv4.0

fileDate=20160713

source=lofreq call -d 101000 -f /home/proj/MDW_genomics/xu/galgal5/galgal5.fa --verbose --no-default-filter -b 1 --call-indels -a 0.010000 -C 6 -s -S /scratch/xu/MDV_project/bqsr/S10_normal_stringent.snvs.vcf.gz,/scratch/xu/MDV_project/bqsr/S10_normal_stringent.indels.vcf.gz --no-default-filter -r 1:1-98101272 -o /tmp/3464340.1.lofn.q/lofreq2_call_parallel1xcqSq/0.vcf.gz /scratch/xu/MDV_project/bqsr/918-3_S10.tumor.bam

reference=/home/proj/MDW_genomics/xu/galgal5/galgal5.fa

INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">

INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">

INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">

INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">

INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">

INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">

INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">

FILTER=<ID=min_dp_6,Description="Minimum Coverage 6">

FILTER=<ID=max_dp_100000,Description="Maximum Coverage 100000">

FILTER=<ID=sb_fdr,Description="Strand-Bias Multiple Testing Correction: fdr corr. pvalue > 0.001000">

FILTER=<ID=indelqual_bonf,Description="Indel Quality Multiple Testing Correction: bonf corr. pvalue < 0.010000">

INFO=<ID=UNIQ,Number=0,Type=Flag,Description="Unique, i.e. not detectable in paired sample">

INFO=<ID=UQ,Number=1,Type=Integer,Description="Phred-scaled uniq score at this position">

FILTER=<ID=uq_fdr,Description="Uniq Multiple Testing Correction: fdr corr. pvalue < 0.000100">

CHROM POS ID REF ALT QUAL FILTER INFO

1 5177961 . GA G 110 PASS DP=12;AF=0.416667;SB=0;DP4=2,5,2,3;INDEL;HRUN=9;SOMATIC;UQ=63`

My question is how can I modify my VCF FILE header to solve this small problem?

Best regards,
Hongen

What does the AF column really mean?

With Lofreq, what does the AF column really represent? My assumption is that it is the minority variant allele frequency. It appears to range between 0 and 1, does that mean that a variant at 1.0 frequency is an actual SNP? Is there some documentation on interpreting these outputs? Many thanks and sorry if this is naive.

Calling MNV

Hi,

I wanted to check with you if Lofreq can call multi-nucleotide variants (MNV). e.g. AG>ACT. I created a simulated bam with this MNP and Lofreq wasnt able to call it. I used --call-indels paramter also, thinking it can call it as SNV and indel if not MNV, but it didnt call anything. Please let me know if there are any parameters I can tweak in order to make it work.

Thanks,
Ashini

Parallelize alnqual

To speedup preprocessing. Precomputation also avoids memory problem in on-the-fly computation for high indel error-rate data (keeps data for all piled up reads in memory)

Print the p-values

Is there a way to print the actual p-value that is associated with each variant?

Replace user option Bonferroni factor with Bonferroni threshold

As discussed with NN and IS on 2015-04-24

VCF sample column

I've been using LoFreq for a while and haven't had any issues with it. However, recently, I got an error that my VCF was not valid because it did not have a sample column (it stops at INFO). Is it possible to add the sample column? If that's not already built in, is it possible to add it somehow with an external tool?

dependency on kaln.h

I can't get lofreq to compile against the latest 1.2.x version of samtools. It seems to depend on kaln.h which is no longer in the samtools 1.2.x repo.

Error while running LoFreq somatic

Hi,

I am getting an invalid syntax error, when I try to run lofreq somatic mode.
$lofreq somatic

File "Reference/Apps/LoFreq/lofreq_star-2.1.2/bin/lofreq2_somatic.py", line 378
def rlx_to_str(self, sample_type, (num_snv_tests, num_indel_tests)):
^
SyntaxError: invalid syntax

Can you please help me fix it.

Thanks,
Ashini

Unable to run call-parallel on filenames with spaces

Hi,
I tried to run lofreq like this

lofreq call-parallel --pp-threads 4 -f 'path with spaces and (parenthesis)/ref_sequence.fna' bam_file.bam -o calls.vcf

and I get a bunch of

/bin/sh: -c: line 0: syntax error near unexpected token `('

I know, I know... One should not have path with spaces, but Dropbox recently renamed its folder to Dropbox (Personal). I guess that protecting the file names with shlex.quote should do the trick.

--only-indels should work without --call-indels

When using --only-indels as argument to the call commands --call-indels should be implied.

base quality filtering is actually just masking during variant call but doesnt' remove bases

Raised by gmy on sourceforge's lofreq:discussion mailing list.

Either really implement this as filter (i.e. also remove in DP[4] and DP) or make clear it's just temporary masking (or have both)

Substitution and indel tests

Hi Andreas,

After a Lofreq run, I saw messages like "Number of substitution tests and indel tests are: ... respectively". I am wondering what do these tests do? Are they necessary for Lofreq and for the output? If they can be turned off, how can I do that? I didn't find an input option that could do that.

Thank you very much,
Zheng

pseudo-parallel indel calling might miss indels at borders

Will not matter much in practice (e.g. get roughly 30 pooled commands for a ecoli which makes LoFreq blind to 30-1 positions).

Reported by isovic

Index violation in IDAQ for indel calling in Pacbio

u_within_limits() in bam_md_ext.c handles the problem (which might be a feature) for now. Larger bandwith settings might help, but will slow things down even more.

Add python interpreter option to configure

If Python version 3 as well as Python version 2.7 are installed and a binary or link called python is missing, configure will find and use version 3 (which is not supported by LoFreq). This is easily fixed by creating such link, but it would be even easier if the corresponding interpreter could be set as option to ./configure.

Request: Accept several BAM files in call

There could be at least two cases were this is needed for the final user:

Call variants using information for several samples: this could increment the power if you are looking for low frequency variants when shared between samples. Another important fact of call SNPs for several samples at the same time is that the strand bias could be computed using more information (more coverage).
Call variants using replicates: sometimes two samples from the same population are taken, but calling variants with them together (although they are going to be manage independently) will increase the power.

Of course, the final user could merge the BAM files to obtain a huge one and call the variant on that one whatever is the case; nevertheles, sometimes impossible to handle, because too many samples are used. I think that the htslib pileup function can handle several BAM and this could be easy to implement, just taking all the parameters after the last flag as input files and pass them to it.

lofreq call filter doesn't work

Hi,

I installed LoFreq 2.1.2 and 2.1.3.1 from binary packages on Ubuntu. I don't know if it's necessary but Samtools-1.1 and Htslib-1.1 are installed in the PATH.
Both versions give me an error when I try to run LoFreq call:

./lofreq call -f /genome_hg38.fa -o /sample1.lofreq_variants.vcf -l /SeqCap_EZ.hg38.analysis.variant_calling.target.bed.merged  /sample1.alnqual-lofreq.bam --verbose --no-default-filter
Alive and happily crunching away on pos 17018314 of 1...
Alive and happily crunching away on pos 121526406 of 10...
Alive and happily crunching away on pos 25227485 of 12...
Alive and happily crunching away on pos 31337657 of 17...
Alive and happily crunching away on pos 54290077 of 4...
Executing lofreq filter -i /tmp/lofreq2-call-dyn-bonf.iMIoN6 -o /sample1.lofreq_variants.vcf --no-defaults --snvqual-thresh 76 --indelqual-thresh 20
FATAL(lofreq_filter.c|main_filter:1104): Unrecognized argument found. Exiting...

ERROR(lofreq_call.c|main_call:1529): The following command failed: lofreq filter -i /tmp/lofreq2-call-dyn-bonf.iMIoN6 -o /sample1.lofreq_variants.vcf --no-defaults --snvqual-thresh 76 --indelqual-thresh 20

I already checked the bam-file that I'm using, and it was created using the same reference genome, also the qualities were checked and should be ok.
Even when I use the --no-default-filter there is still the problem with the filter step. Is there something that I'm doing wrong?

Thanks

The variants of Low frequence

How can I call the variants that allele frequence is under 1% ? Is there any parameter can be set?

Handling of overlapping paired-end reads

Ideally overlapping paired-end reads shouldn't be counted twice. This could be achieved by proper pre-processing (e.g. SeqPrep and single-end mapping for merged pairs) or internally in LoFreq by making use of samtools "read-pair overlap detection". Problem with the latter is that it bumps up one quality and sets the other to zero instead of just swallowing the base.

Issue when run make

Hi there,

I am trying to install lofreq on Ubuntu 16.04 and get this following error msg when run make

washington@Linux:~/lofreq$ make
Making all in src/cdflib90
make[1]: Entering directory '/home/washington/lofreq/src/cdflib90'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/washington/lofreq/src/cdflib90'
Making all in src/uthash
make[1]: Entering directory '/home/washington/lofreq/src/uthash'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/washington/lofreq/src/uthash'
Making all in src/lofreq
make[1]: Entering directory '/home/washington/lofreq/src/lofreq'
gcc -DPACKAGE_NAME="LoFreq_Star" -DPACKAGE_TARNAME="lofreq_star" -DPACKAGE_VERSION="2.1.3.1" -DPACKAGE_STRING="LoFreq_Star\ 2.1.3.1" -DPACKAGE_BUGREPORT="[email protected]" -DPACKAGE_URL="" -DGIT_VERSION="v2.1.3.1-dirty" -DPACKAGE="lofreq_star" -DVERSION="2.1.3.1" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -D__EXTENSIONS__=1 -D_ALL_SOURCE=1 -D_GNU_SOURCE=1 -D_POSIX_PTHREAD_SEMANTICS=1 -D_TANDEM_SOURCE=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=".libs/" -DSTDC_HEADERS=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_LIBM=1 -DHAVE_LIBZ=1 -DNDEBUG=/**/ -D_USE_KNETFILE=1 -I. -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -Wall -I../cdflib90/ -I../uthash -I/home/washington/htslib-1.5 -I/home/washington/samtools-1.5 -g -O2 -pthread -MT bam_md_ext.o -MD -MP -MF .deps/bam_md_ext.Tpo -c -o bam_md_ext.o bam_md_ext.c
In file included from bam_md_ext.c:38:0:
/home/washington/samtools-1.5/bam.h:189:28: error: conflicting types for ‘seq_nt16_int’
#define bam_nt16_nt4_table seq_nt16_int
^
bam_md_ext.c:54:12: note: in expansion of macro ‘bam_nt16_nt4_table’
const char bam_nt16_nt4_table[] = { 4, 0, 1, 4, 2, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4 };
^
In file included from /home/washington/htslib-1.5/htslib/sam.h:31:0,
from bam_md_ext.c:37:
/home/washington/htslib-1.5/htslib/hts.h:319:18: note: previous declaration of ‘seq_nt16_int’ was here
extern const int seq_nt16_int[];
^
In file included from bam_md_ext.c:43:0:
samutils.h:47:14: warning: ‘op_cat_str’ defined but not used [-Wunused-variable]
static char *op_cat_str[] = {
^
Makefile:467: recipe for target 'bam_md_ext.o' failed
make[1]: *** [bam_md_ext.o] Error 1
make[1]: Leaving directory '/home/washington/lofreq/src/lofreq'
Makefile:403: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

can anyone provide and help?

Thank you,

WLD

Possibly unconsistent count of forward/reverse reads

Hi Andreas,
I already anticipated by mail this observation I made. I simulated forward and reverse reads in quite equal amount and the alignment seems to agree. Nevertheless, the counts of the INFO tag DP4 are strange. For example:

##fileformat=VCFv4.0
##fileDate=20150429
##source=lofreq call -f cns.fasta --no-default-filter -r sample_cons_Pol:1-188 -o /tmp/lofreq2_call_parallelKepDFM/0.vcf.gz hq_2_cons_sorted.bam
##reference=cns.fasta
...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
sample_cons_Pol 50      .       C       G       85      PASS    DP=1109;AF=0.005410;SB=0;DP4=1103,0,6,0
sample_cons_Pol 328     .       A       G       8400    PASS    DP=3231;AF=0.100898;SB=0;DP4=2886,9,325,1

Any thought? I'm using

version: 2.1.2a
commit: v2.1.2a-92-g6f6e173
build-date: Apr  9 2015

Thanks!

Allow sam or uncompressed BAM output

Variant not called at 12% AF

I've attached a .bam that shows evidence of two variants of significant depth (here's an IGV snapshot):

While I am able to detect the first one with lofreq, the second one fails to appear, with any set of cmdline options I can think of. Could someone have a look at this and explain why the ~12% AF variant at chr7:148506396 A / C is not reported by lofreq? I am using lofreq 2.1.1.

missing_variant.bam.zip

Call variants when the reference is an N

Some flag could be provided for allow calling variants when there is an N in the reference. As we discuss by email, it could be done just replacing the N for a random reference; but it will be very useful for the final user to be allow to output this kind of variants.

Including this in the code could be done by:

When the reference is an N, replace it by a base that have no coverage in the data. Using a reference with 0 coverage will allow to output any variant that is found. Nevertheless, it could be an issue if each base (ATCG) is in at least one read, so it would not be reported if it is a reference SNP.
Perform the calling using the replaced N as a reference.
Change the reference to N again before reporting the variant.

The first point could be change by using the consensus variant approach as in previous versions.

SyntaxError: Non-ASCII character '\xb6' in file lofreq_star-2.1.2/

I had previous success running lofreq adn now am getting a syntax error. Any suggestion? The commad I run:

$python lofreq_star-2.1.2/bin/lofreq call-parallel --pp-threads 12 -f human_g1k_v37.fa -s -S All_20161122.vcf.gz -l SureSelect.bed -o SampleA_lofreq.vcf SampleA_realigned_recal.bam

File "lofreq_star-2.1.2/bin/lofreq", line 1
SyntaxError: Non-ASCII character '\xb6' in file lofreq_star-2.1.2/bin/lofreq on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I reinstalled the package incase something strange happend to original installation since I last used the software. Please advice how to fix the error. THanks.

temporary file creation error

I have many Illumina datasets (> 1000 samples from different individuals within a population) which I will analyze with LoFreq.

If I run "LoFreq call" one a single set of aligned reads, all goes smoothly.

However, if I submit all 1000+ independent jobs to the compute cluster (SGE), then many of them return with an error:

ERROR(lofreq_call.c|main_call:1394): Couldn't open /tmp/lofreq2-call-dyn-bonf.9pEkER

But with a different suffix: e.g. lofreq2-call-dyn-bonf.nydjvL or lofreq2-call-dyn-bonf.uHy7CC , etc.

I noticed that LoFreq uses "mktemp" in lofreq_call.c. But I have read that "mktemp" is unsafe and that "mkstemp" should be used instead.

I tried editing the source file from "mktemp" to "mkstemp" but I got a segfault at run time (I may be lacking the zlib devel version, maybe that's why)

please note that when I run the same exact command alone (i.e. without 1000 other simultaneous jobs), then it executes perfectly fine, no error message.

Maybe there can be an option for the user to give a temporary/working directory to LoFreq, to aovid the "mktemp" problem entirely?

Compilation fails with samtools v.1.3 and htslib v.1.3

When configure with pre-compiled samtools-1.3 and htslib-1.3, make produce the following error:

In file included from bam_md_ext.c:38:0:
samtools-1.3/bam.h:189:28: error: conflicting types for ‘seq_nt16_int’
 #define bam_nt16_nt4_table seq_nt16_int
                            ^
bam_md_ext.c:54:12: note: in expansion of macro ‘bam_nt16_nt4_table’
 const char bam_nt16_nt4_table[] = { 4, 0, 1, 4, 2, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4 };
            ^
In file included from htslib-1.3/htslib/sam.h:30:0,
                 from bam_md_ext.c:37:
htslib-1.3/htslib/hts.h:255:18: note: previous declaration of ‘seq_nt16_int’ was here
 extern const int seq_nt16_int[];
                  ^
In file included from bam_md_ext.c:43:0:
samutils.h:47:14: warning: ‘op_cat_str’ defined but not used [-Wunused-variable]
 static char *op_cat_str[] = {
              ^
make[1]: *** [bam_md_ext.o] Error 1
make: *** [all-recursive] Error 1

Just tag removed variants with a filter instead of fully removing them

Suggested by Daniel Gómez Sánchez

reporting strand bias

Lofreq assigns a Phred-scaled score for strand bias. Why use Phred scores? Is it possible to report the raw probability scores? What does it mean when a site has a Phred score SB=0?

DP and DP4 fields

Hi Andreas,

Can you please explain how DP and DP4 fields differ? I know DP is raw depth and DP4 is counts for ref and alt on forward & reverse bases. So, I believe DP4 is subset of raw depth and does not include reads with strand bias. Is that correct?
We have a variant for which the allele frequency calculated by Lofreq is 3% ( I use -m20 -q 13 and -Q 13) and quality of 290, but when we calculate it from the DP4 counts for ref and alt, it comes out to be 49.19%. IGV also has VAF of 48%, so just wondering how should I go about interpreting these. Should we always calculate the VAF from DP4?
Moreover, when I run Lofreq with default settings, it calculates VAF as 47.7% and DP4 estimated VAF is still 49% with a quality of 3586. So, definitely the settings change the allele frequency calculation by Lofreq but why is DP4 not affected by this. Can you please help me solve this issue.

Thank you.

Parallelize Viterbi

After all reads are aligned independently.

Could not parse threshold

Hi
I am developer and trying to wrap up your lofreq tool for galaxyproject. I have the forms generated and the command executed.

lofreq call -f "/galaxy/database/files/000/dataset_131.dat" -o "/galaxy/database/files/000/dataset_132.dat" --min-bq "6" --min-alt-bq "6" --def-alt-bq "0" --min-jq "0" --min-alt-jq "0" --def-alt-jq "0" --no-baq --del-baq --no-ext-baq --min-mq "0" --max-mq "0" --def-nm-q "0" --min-cov "1" --max-depth "1000000" --src-qual --sig --bon f --illumina-1.3 --use-orphan --plp-summary-only --no-default-filter --force-overwrite --verbose "/galaxy/database/files/000/dataset_129.dat"

dataset_131 is the fasta file (reference_h37rv.fasta)
dataset_131 is the output vcf file
dataset_132 is the bam file (ext1.bam)

This is though the error, I receive when trying to run the tool:
FATAL(lofreq_call.c|main_call:1243): Couldn't parse sign-threshold

What does it mean, in program terms (simple terms)?

Regards
Zipho

add genotyping support

just copy logic from somewhere else

LoFreq v.2.1.2 pre-compiled for OS X does not works in some OS versions

Pre-compiled version does not work in OS X 10.8.5 (12F2560) but it works in OS X 10.9.5 (13F1507), because of the libSystem.B.dylib:

dyld: Symbol not found: ___exp10
  Referenced from: lofreq
  Expected in: /usr/lib/libSystem.B.dylib

I understand that it could not be compiled for every OS, but I would appreciate if it could be specified somewhere for which version could be used, because it does not complain until it needs this function.

It is not a major issue, because when I compiled the source in my computer is working perfectly, but for a final user it could be a problem.

Allow to feed uniform indel and indel alignment qualities to call functions

Useful for debugging and circumventing preprocessing and duplication of BAM where not necessary

Error compiling lofreq

I am compiling lofreq using this

 ./configure SAMTOOLS=/apps/samtools/1.2.1 HTSLIB=/apps/htslib/1.2.1/ --
prefix=/apps/lofreq/2.1.3

And it starts to compile

configure: Configuring LoFreq_Star (version 2.1.3a) for your system...

            |             ____|
            |       _ \   |     __|  _ \   _` |
            |      (   |  __|  |     __/  (   |
           _____| \___/  _|   _|   \___| \__, |
                                             _|

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for style of include used by make... GNU
checking for gcc... /apps/gcc/5.3.0/bin/gcc
checking whether the C compiler works... yes

. .... truncated

But then it lands on this issue.

> checking whether to build with debug information... no
> checking for /apps/samtools/1.2.1/bam.h... no
> configure: error: bam.h not found

I get this error against htslib 1.2.1 and 1.3.1 and the same with samtools 1.2.1 and 1.3.1. I am not sure how to resolve this error. What compilation steps do you recommend.

Strand-bias: only apply if distribution is opposite?

Matters esp. for high coverage data where huge numbers make any diff look significant
Implement and benchmark and first

Allow "--overwrite" option

First, thanks for the very useable and documented package. This is just a feature request/enhancement.

It would be great to have a "--overwrite" option to allow lofreq to overwrite existing output files if the user chooses. As it is, the user must go through and delete files before rerunning lofreq.

Add info about indel processing steps to website

installing optional tools

I was able to download the binary. However, I wanted to use the optional tools as well. It looks like the only way to get them is by downloading the source. I did that. I copied them to a location in my $PATH. However, when I run lofreq vcfplot or lofreq2_vcfplot.py, I get:

FATAL: Couldn't find LoFreq modules. Are you sure your PYTHONPATH is set correctly?

I also tried navigating to lofreq-2.1.2/src/tools and running python setup.py install, but then I get an error:

Traceback (most recent call last):
  File "setup.py", line 8, in <module>
    import setup_conf
ImportError: No module named setup_conf

How do I get setup_conf?

Is there a better way to install optional tools?

Support for FORMAT and Sample fields in LoFreq Somatic

Simply add AF and possibly number of orphans for now. Take from Info in case of tumor and compute in within uniq in case of normal

Unable to call SNVs at positions where >50% of bases get filtered

This was introduced as a hack to avoid calling SNVs on consensus indels (which might not be visible due to missing indel qualities). See call_vars() in lofreq_call.c. One way of dealing with this might be to enforce indel qualities (would avoid trouble with "missing" indels as well).

Intially reported by John Jianwei Che

Quality scores capped at 49314

Hi Andreas,

In my analysis, the quality scores from Lofreq seem to reach a plateau at 49,314. Is this the maximum quality score a variant can have? And if so, is there a way to change this value and set it to something else?

Thank you so much.

Ashini

Filtering can use excessive amounts of memory if vcf file contains many variants

Happens because vcf file is loaded fully into memory to determine multiple testing correction threshold. Easily fixed by loading twice, once to determine thresholds and then a second time to apply them.

Initially reported by Kanika Arora (NYG)

region command flag

Hi,

The command help statements for lofreq call mentions:

Regions
-r | --region STR Region in which pileup should be generated [null]
-l | --bed FILE List of positions (chr pos) or regions (BED) [null]

How do we pass a single tuple of say, chr pos-start pos-end, to lofreq call? Does this have to be in bed format, or can this be done from the command line (without needing a file)?

Not compiling against htslib 1.2

Hi!
I just noticed that compilation fails when passing a new version of htslib.

The error is (shortened)

bam_md_ext.c:54: error: conflicting types for ‘seq_nt16_int’
/home/ozagordi/tmp/htslib-1.2/htslib/hts.h:186: note: previous declaration of     ‘seq_nt16_int’ was here
make[1]: *** [bam_md_ext.o] Error 1
make[1]: Leaving directory `/home/ozagordi/tmp/lofreq/src/lofreq'

I managed to compile successfully by following the steps in .travis.yml

Thanks for lofreq!

Support for Python3

...for all scripts in ./src/scripts and also helpers. Should still be able to use 2.7

Additional way to avoid alignment issues, esp. in noisy data

See FPs in Pacbio data on or next to true (consensus) indels or SNVs: resp. other type is wrongly called on same or neighboring site. This could also happen in pooled data with non-consensus variants. Can't simply disallow multiallelic calls, which is a useful feature in other contexts (e.g. viral quasispecies). Easiest but hacky approach is to remove one of the two overlapping/neighbouring variants as post filter and keep the one with higher Q or AF.

csb5 / lofreq Goto Github PK

lofreq's People

Contributors

Stargazers

Watchers

Forkers

lofreq's Issues

fileDate=20160713

reference=/home/proj/MDW_genomics/xu/galgal5/galgal5.fa

INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">

INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">

INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">

INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">

INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">

INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">

INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">

FILTER=<ID=min_dp_6,Description="Minimum Coverage 6">

FILTER=<ID=max_dp_100000,Description="Maximum Coverage 100000">

FILTER=<ID=sb_fdr,Description="Strand-Bias Multiple Testing Correction: fdr corr. pvalue > 0.001000">

FILTER=<ID=indelqual_bonf,Description="Indel Quality Multiple Testing Correction: bonf corr. pvalue < 0.010000">

INFO=<ID=UNIQ,Number=0,Type=Flag,Description="Unique, i.e. not detectable in paired sample">

INFO=<ID=UQ,Number=1,Type=Integer,Description="Phred-scaled uniq score at this position">

FILTER=<ID=uq_fdr,Description="Uniq Multiple Testing Correction: fdr corr. pvalue < 0.000100">

CHROM POS ID REF ALT QUAL FILTER INFO

Recommend Projects

Recommend Topics

Recommend Org