varscan's People

Contributors

dkoboldt

varscan's Issues

False Positive SNP Calls

I have discovered that under certain conditions, a SNP is called where a single base deletion is actually present. This occurs about 30% of the time, and only for deletions, not for single base additions. An example would be the following:

$ varscan mpileup2snp A62_tmp.pileup --output-vcf 1 > A62_snp.varscan
$ varscan mpileup2indel A62_tmp.pileup --output-vcf 1 > A62_indel.varscan

Consider the following lines from the SNP and indel files, respectively:
UFC4000096 4536607 . A C . PASS ADP=752;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:45:753:752:21:13:38.24%:2.9583E-5:29:25:11:10:12:1

UFC4000096 4536606 . CA C . PASS ADP=729;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:729:729:24:702:96.56%:0E0:45:50:17:7:579:123

The reference sequence (with relevant bases in small case):
4536601 CCTCAcaAAAACAAACTATTAGGTTAAACAAAAGCTAAACAAATTAAGTACTGATAATTG

A typical read covering the affected bases (small case and underscore):

M01380:2107:18862:17614
AAAATGACTATCTATATAAACAAATAGCCATGTTAACATTAAGTTCTGTC
TATACAGACTATCCATAAAGAATGTTCAGATGACATACACACCATTACCG
CATATCATTTAGATATATTGATATACCTCAc_AAAACAAACTATTAGGTT
AAACAAAAGCTAAACAAATTAAGTACTGATAATTGACTTACCGGTGTACA
TCAAAAGAATCATATACATATGTAGAAGTGACAGAAGGGAGTCATTGTTA
TGCAAACTAAGCTTCTGGCAATTATGCTTGCAGCACCTGTGGTATTCAGT
TC

A cursory inspection of the stacked reads in Tablet verifies that the 96.5% frequency reported in the indel file is correct.
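One way to flag such calls after the fact is to cross-check the two VCF files. The following is a minimal sketch, assuming left-anchored deletion records (REF starts with ALT, as in the CA to C record above); the function names are hypothetical and not part of VarScan.

```python
def deleted_positions(indel_records):
    """Return the set of (chrom, pos) removed by deletion records.

    Each record is a (chrom, pos, ref, alt) tuple taken from the indel VCF.
    Assumes left-anchored deletions, i.e. REF starts with ALT.
    """
    removed = set()
    for chrom, pos, ref, alt in indel_records:
        if len(ref) > len(alt) and ref.startswith(alt):
            # Deleted bases start right after the shared anchor prefix.
            first = pos + len(alt)
            last = pos + len(ref) - 1
            for p in range(first, last + 1):
                removed.add((chrom, p))
    return removed


def suspicious_snps(snp_records, indel_records):
    """Return the SNP records whose position falls on a deleted base."""
    removed = deleted_positions(indel_records)
    return [rec for rec in snp_records if (rec[0], rec[1]) in removed]


# The two records quoted in this issue.
indels = [("UFC4000096", 4536606, "CA", "C")]
snps = [("UFC4000096", 4536607, "A", "C")]
print(suspicious_snps(snps, indels))
```

With the records from this issue, the SNP at 4536607 is reported because it sits on the base removed by the deletion anchored at 4536606.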

A bug about version 2.3.8

Exception in thread "main" java.lang.NumberFormatException: For input string: "156051AAA>E;BBEEDDEEEEBEEEDEECEDE'@cd?CEEEEEE@DEDEEEEDEDEEEEE>C@FDE:EC6CEEEEFDEEECD/EBDEEEDDEEEEECADEDDDADDB"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at net.sf.varscan.Somatic.<init>(Somatic.java:1026)
at net.sf.varscan.VarScan.somatic(VarScan.java:298)
at net.sf.varscan.VarScan.main(VarScan.java:199)
Hi, I ran into this error. What happened?

thank you !

varscan copynumber running problem

Hi,
I am running VarScan copynumber with the default settings, but I got this error message:
Normal Pileup: normal.mpileup
Tumor Pileup: tumor.mpileup
Min coverage: 10
Min avg qual: 15
P-value thresh: 0.01
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
at net.sf.varscan.Copynumber.<init>(Copynumber.java:693)
at net.sf.varscan.VarScan.copynumber(VarScan.java:328)
at net.sf.varscan.VarScan.main(VarScan.java:209)

My command line is: java -jar ../../../software/VarScan.v2.3.9.jar copynumber normal.mpileup tumor.mpileup 007_008_glio
I have no idea what has caused the problem.

Thank you in advance.

Commas in the vcf-files_FILTER field

Hi!
This issue concerns the VCF file produced by fpfilter.
Could you please replace the commas in the FILTER field (column) with ';'? Programs that combine VCF files (for example, GATK CombineVariants) do not process filter records separated by commas.

Thank you.
Best wishes,
Liliya
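Until this is changed in VarScan, the output can be patched after the fact. A minimal sketch, assuming a tab-separated VCF data line whose seventh column is FILTER; fix_filter_separators is a hypothetical helper and the example line is made up for illustration.

```python
def fix_filter_separators(vcf_line):
    """Replace commas with semicolons in the FILTER column of one VCF line.

    Header lines (starting with '#') are returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) >= 7:
        # Column 7 (index 6) is FILTER in the VCF column order.
        fields[6] = fields[6].replace(",", ";")
    return "\t".join(fields) + newline


# Illustrative line only; the filter names are ones fpfilter uses.
line = "chr1\t100\t.\tA\tG\t.\tVarCount,VarFreq\tDP=10\n"
print(fix_filter_separators(line), end="")
```

Only the FILTER column is touched, so commas inside INFO annotations are left alone.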

0 had sufficient coverage for comparison

I ran VarScan2 on my paired pileup files, and I do not think there are problems with the input files (please see attached a part of the input files), but I really do not understand why it gave me output like: 0 had sufficient coverage for comparison

java -jar ~/Downloads/varscan-master/VarScan.v2.4.4.jar somatic ~/normal.pileup ~/tumor.pileup --output-snp ~/snp.vcf --output-indel ~/indel.vcf --min-coverage-tumor 8 --output-vcf 1 --min-var-freq 0.075 --strand-filter 1

65821 positions in tumor
65821 positions shared in normal
0 had sufficient coverage for comparison
0 were called Reference
0 were mixed SNP-indel calls and filtered
0 were removed by the strand filter
0 were called Germline
0 were called LOH
0 were called Somatic
0 were called Unknown
0 were called Variant

normal.pileup.txt
tumor.pileup.txt

I tried other versions of VarScan2 and get the same result. Can you please help me check what caused the problem? Thanks very much for your time!

MNV or two SNV

How does VarScan (v2.4.3) determine that two consecutive SNVs are part of the same event and call the variants as an MNV?

I have multiple examples where BAM files (which have gone through differing preprocessing/alignment steps) have resulted in different descriptions of the same variants.

There are two adjacent SNVs on the same strand. In one BAM file VarScan calls this as an MNV, but in another it calls it as two SNVs.
Variants are called as SNVs even with near identical MAFs (40.11% and 40.17%) and similar read counts.

I am using varscan v2.4.3 and the mpileup2cns command:
samtools mpileup -f genome/grch37.fa -B -d 500000 -q 1 input.bam | java -Xmx12030m -jar /usr/bin/VarScan.v2.4.3.jar mpileup2cns --min-coverage 10 --min-reads2 5 --min_avg_qual 15 --min-var-freq 0.01 --min-freq-for-hom 0.75 --p-value 0.05 --strand-filter 0 --output-vcf 1 --variants

Deletion uses "-" in Alt field

Varscan 2.4.2 somatic VCF indel output file uses dashes in the Alt field, which should never appear as they are considered invalid characters in that field. For example, a single deletion might show up as Ref=C, Alt=-T, when actually there should be two Ref bases given and only the T given for Alt, if I understand what is happening.
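For reference, the conversion from the dash notation to VCF-style alleles can be sketched as below. This assumes that the reported Ref base is the anchor base immediately before the inserted or deleted bases, which matches the CA to C form seen in mpileup2indel VCF output; indel_to_vcf_alleles is a hypothetical helper, not VarScan code.

```python
def indel_to_vcf_alleles(ref_base, var):
    """Convert a '+SEQ' / '-SEQ' variant string to a VCF (REF, ALT) pair.

    Assumption: ref_base is the anchor base just before the indel.
    """
    if var.startswith("-"):
        deleted = var[1:]
        # Deletion: REF carries anchor + deleted bases, ALT is the anchor.
        return ref_base + deleted, ref_base
    if var.startswith("+"):
        inserted = var[1:]
        # Insertion: REF is the anchor, ALT carries anchor + inserted bases.
        return ref_base, ref_base + inserted
    # Not an indel; leave the alleles as they are.
    return ref_base, var


print(indel_to_vcf_alleles("C", "-T"))
```

Under that assumption the example Ref=C, Alt=-T becomes REF=CT, ALT=C, with no dash in either field.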

Mergesegments.pl output question

Good afternoon.

First of all, thanks for the software provided.

I have what must be a really dumb question about the output of the mergesegments.pl script, but I have not been able to find it here or in the Biostars corresponding topics.

What is the exact meaning of the seg_mean column (col. 4)? Is it an average of the CN of a specific region when comparing the tumor and the normal sample?

Thanks in advance for the confirmation!

Release tags

Hello,

can you please tag your releases?

Thanks in advance,
Oliver

VCF and tab interconversion

I wanted to separate LOH, high-confidence somatic, and germline calls using processSomatic, but it requires tabular input, whereas I had generated VCF files. I want those VCF files, and I want VCF files with LOH separated. It would be nice if all VarScan commands that read tab-separated variant files could also read VCF files, and all commands that write files could write both kinds. It would also be very nice if there were a "null" command that simply read a file in one format and wrote it in the other.

Somatic Status assignment

image

I am using VarScan v2.3 somatic module to call snps in Normal Tumor pair through the following command:
samtools mpileup -f hg19.fa -B -C50 -q 10 -Q 15 normal.bam tumor.bam | java -jar VarScan.jar somatic NormalTumorPair.varscan --mpileup 1 --strand-filter 1

I have noticed, as the image above indicates, that there are several SNPs that have variant_p_val = 1 and somatic_p_val < 0.05 and are being called "Germline" (red streak in the top left corner of the graph). Is the somatic status call not based on the p-value?

varscan appears to throw out reverse reads

I am calling mutations with varscan2. The samtools mpileup I generate reports 140 reads, half of them on the reverse strand. Varscan appears to ignore these reverse reads - the AD reported is 70 and the ADF:ADR is 2:0 when according to the pileup it should be 2:2.

Redundant Processing To Get SNPs and Indels

I notice that the intermediate results of mpileup2snp and mpileup2indel are the same.

96222 variant positions (87240 SNP, 8982 indel)
5920 were failed by the strand-filter
81814 variant positions reported (81814 SNP, 0 indel)

versus

96222 variant positions (87240 SNP, 8982 indel)
5920 were failed by the strand-filter
8488 variant positions reported (0 SNP, 8488 indel)

Why not just have one function to avoid unnecessarily reprocessing the same data twice? Something such as samtools mpileup -f hg38/all.fa sample.bam | mpileup2variants --snp-vcf aSample.snp.vcf --indel-vcf aSample.indel.vcf would be efficient.

VarScan copynumber outputName

Code:
String outputName = "output";

    if(args.length >= 3 && !args[2].startsWith("-"))
    {
        outputName = args[2];
    }

When VarScan is invoked as "samtools mpileup -q 1 -f ref.fa normal.bam tumor.bam | java -jar VarScan.jar copynumber $outPrefix --mpileup 1", outputName is always "output", because args[2] is "--mpileup". In this case args[2] should be args[1], which refers to "$outPrefix".

With the documented usage, "USAGE: java -jar VarScan.jar copynumber [normal-tumor.mpileup] [Opt: output] OPTIONS\n", args[2] is correct.

And with "java -jar VarScan.jar copynumber normal.pileup tumor.pileup output.basename", a different constructor is used, so that case is fine.

So, if the program is meant to accept piped input on the command line, I think this issue should be fixed.
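The behaviour can be reproduced with a small Python model of the quoted Java check. This is only a sketch: output_name is a hypothetical helper, "myPrefix" stands in for $outPrefix, and it assumes, as the issue does, that args[0] holds the subcommand name.

```python
def output_name(args, index):
    """Mimic the quoted check: use args[index] unless missing or an option."""
    name = "output"
    if len(args) > index and not args[index].startswith("-"):
        name = args[index]
    return name


# Piped usage: the prefix is the first argument after the subcommand.
piped = ["copynumber", "myPrefix", "--mpileup", "1"]
print(output_name(piped, 2))  # index used by the quoted code
print(output_name(piped, 1))  # index proposed in this issue

# File-based usage: the mpileup file comes first, then the prefix.
from_file = ["copynumber", "normal-tumor.mpileup", "myPrefix"]
print(output_name(from_file, 2))
```

In the piped case index 2 lands on "--mpileup", so the default name is kept, while index 1 picks up the prefix.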

VCF output for varscan somatic not working

Hi,
I am trying to run varscan 2.4.2 in somatic mode:

java -jar /opt/varscan/VarScan.v2.4.2.jar somatic \
normal.pileup tumor.pileup \
/opt/varscan/output/normal-tumor. \
--min-coverage 3 --min-var-freq 0.01 \
--p-value 0.10 --somatic-p-value 0.05 --strand-filter 0 \
--output_vcf 1

but the output is still the Varscan output.
Am I missing something?
Thanks in advance for your help

varscan report wrong DP related values, almost 50% off

Hi,

I just found a weird issue with VarScan, and it happens with both 2.4.4 and 2.4.3. Basically, the VarScan VCF files have wrong DP-related values, about 50% off the values reported by IGV or Mutect2 VCF files.

Here is the varscan 2.4.4 vcf for EGFR L858R:

chr7 55259515 . T G . VarBaseQual ADP=1191;WT=0;HET=1;HOM=0;NC=0;ANN=G|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.3|protein_coding|21/28|c.2573T>G|p.L858R|2819/5600|2573/3633|858/1210||,G|sequence_feature|LOW|EGFR|EGFR|helix:combinatorial_evidence_used_in_manual_assertion|NM_005228.3|protein_coding|21/28|c.2573T>G||||||,G|upstream_gene_variant|MODIFIER|EGFR-AS1|EGFR-AS1|transcript|NR_047551.1|pseudogene||n.-2873A>C|||||2873| GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:27:1191:1191:1179:12:1.01%:1.6693E-3:53:54:1013:166:11:1

Here is gatk Mutect2 vcf for the same variant:

chr7 55259515 . T G . PASS AC=1;AF=0.500;AN=2;CONTQ=93;ClippingRankSum=0.125;DP=2318;ECNT=2;FS=0.000;GERMQ=93;LikelihoodRankSum=-1.190;MBQ=20,20;MFRL=172,176;MMQ=60,60;MPOS=37;MQ=60.00;MQRankSum=0.000;POPAF=7.30;ROQ=77;ReadPosRankSum=0.086;SEQQ=28;SOR=0.615;STRANDQ=31;TLOD=6.62;UNIQ_ALT_READ_COUNT=26;ANN=G|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.3|protein_coding|21/28|c.2573T>G|p.L858R|2819/5600|2573/3633|858/1210||,G|sequence_feature|LOW|EGFR|EGFR|helix:combinatorial_evidence_used_in_manual_assertion|NM_005228.3|protein_coding|21/28|c.2573T>G||||||,G|upstream_gene_variant|MODIFIER|EGFR-AS1|EGFR-AS1|transcript|NR_047551.1|pseudogene||n.-2873A>C|||||2873| GT:AD:AF:DP:F1R2:F2R1:SB 0/1:2146,26:9.246e-03:2172:1016,11:1122,14:1029,1117,13,13

When I look at the bam file in IGV, the DP numbers from IGV are almost identical to the values from Mutect2 vcf.

Any explanation?

Thanks,

Ying

Erroneous flags are passing silently.

My students working with VarScan constantly miss variants just because they supply erroneous flags (say, --min-var-frequency instead of --min-var-freq). These pass silently, without any warning. Could you please add a check so that VarScan complains when the user provides nonexistent flags? It would save a lot of time and effort.

Best,
Mike
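In the meantime, a wrapper script can do the check before VarScan is launched. A minimal sketch follows; check_flags is a hypothetical helper, and KNOWN is only an illustrative subset of options that appear in commands on this page, not a complete list for any VarScan subcommand.

```python
import difflib


def check_flags(argv, known_flags):
    """Return (flag, suggestions) pairs for every --flag not in known_flags."""
    problems = []
    for token in argv:
        if token.startswith("--") and token not in known_flags:
            # Offer the closest known spelling, if any is similar enough.
            suggestions = difflib.get_close_matches(token, known_flags, n=1)
            problems.append((token, suggestions))
    return problems


# Assumption: illustrative subset only, to be replaced by the real option list.
KNOWN = [
    "--min-coverage", "--min-reads2", "--min-var-freq", "--min-freq-for-hom",
    "--p-value", "--strand-filter", "--output-vcf", "--variants",
]

for flag, hints in check_flags(["--min-var-frequency", "0.01", "--output-vcf", "1"], KNOWN):
    print("unknown option:", flag, "did you mean:", hints)
```

Option values such as 0.01 or -2 are ignored because only tokens starting with two dashes are checked.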

a small problem of latest version of varscan2

Dear varscan2 developers,

I am using the latest varscan2 (2.4.1). After running the false positive filter, I found that the column names in the output have an extra tab between "mmqs_diff" and "ref_avg_rl", which causes problems in downstream analyses. Could you please fix this? Many thanks :)

Yours,
Ying

min-mmqs-diff does not accept negative values.

Setting min-mmqs-diff to a negative value such as -2 throws the following error:

Input Parameter Threw Exception: For input string: "true"
java.lang.NumberFormatException: For input string: "true"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:76)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:627)
at net.sf.varscan.FpFilter.<init>(FpFilter.java:202)
at net.sf.varscan.VarScan.fpfilter(VarScan.java:335)
at net.sf.varscan.VarScan.main(VarScan.java:174)

Default p-value threshold

When running a calling command like mpileup2cns without specifying the --p-value argument, VarScan outputs

Warning: No p-value threshold provided, so p-values will not be calculated

It appears, though, that VarScan does calculate p-values for variants (they are in VCF output) and filters variants on a default threshold of 0.01. The warning should be changed or removed because it mistakenly suggests that variants will not be filtered on a p-value when the argument is not provided.

Also, the manual states that the default p-value threshold for calling commands is 0.99. This should be updated to 0.01 or the true default threshold should be changed to 0.99.

Suggestions for using Varscan2 on tumor only samples ?

Are there any recommendations on how to run varscan2 to find mutations in tumor-only samples (i.e. without a matched normal available)?
Would running it in 'germline' mode, but with more flexible thresholds, be sufficient?
Would applying filters like processSomatic (or somaticFilter) make sense?
Any other suggestions?
Thanks in advance

typos returning fpfilter counts

A number of lines report 'minimim' instead of 'minimum'.
this is very minor and can probably be fixed easily.
best

Loading readcounts from varscan_somatic/tumour.indel.counts...
Parsing variants from varscan_somatic/varscan_somatic_mpileup_normal-tumor.indel...
10062 variants in input file
4384 had a bam-readcount result
4318 had reads1>=2
1501 passed filters
8561 failed filters
5678 failed because no readcounts were returned
135 failed minimim variant count < 3
13 failed minimum variant freq < 0.05
0 failed minimum strandedness < 0.0
16 failed minimum reference readpos < 0.2
10 failed minimum variant readpos < 0.15
8 failed minimum reference dist3 < 0.2
10 failed minimum variant dist3 < 0.15
411 failed maximum reference MMQS > 50
172 failed maximum variant MMQS > 100
90 failed maximum MMQS diff (var - ref) > 50
260 failed maximum mapqual diff (ref - var) > 10
133 failed minimim ref mapqual < 20
478 failed minimim var mapqual < 30
0 failed minimim ref basequal < 15
0 failed minimim var basequal < 30
90 failed maximum RL diff (ref - var) > 0.05

pileup2snp output

I have a question regarding the output of the tool varscan pileup2snp. The input is the pileup file generated by samtools mpileup without reference. One particular line from the input is

1 16259813 N 80 G$g$aaGAGAaggaAGGaaGaGAgaAGaAgAAGGaAAAAAGAAaaGAaGgGgAGaGAGAgGggGAAAAgAAGgggAGGGAA^]G BBCDigfDIHEHDgBJGDHDlkHcBJEJ]J/H?JiJmJIjJJFkJG^F?<JlFIJmJDJACJIEeJCJEJCDCHHHH>F@
So there are 80 reads covering this position, there is an N, since we don't have a reference. As I understand it, there are basically two nucleotides (G and A) found in this position. In the output there are these corresponding two lines

Chrom Position ... VarAllele
1 16259813 ... A
1 16259813 ... A

This is confusing to me, since I expect a G instead of the second A. Can anybody help me understand this issue? Thx.

The output file was generated by
varscan pileup2snp pileup.tsv --min-coverage 10 --min-base-qual 30 --output-vcf 1 > output.txt
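To see which alleles the pileup line itself supports, the read-bases column can be tallied independently of VarScan. A minimal sketch, assuming the standard samtools pileup encoding ('^' plus one mapping-quality character at a read start, '$' at a read end, '+N'/'-N' followed by N indel bases); count_pileup_bases is a hypothetical helper and the short string below is just the beginning and end of the line quoted above.

```python
from collections import Counter


def count_pileup_bases(bases):
    """Count base calls in a pileup read-bases string, ignoring strand."""
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2  # skip '^' and the mapping-quality character after it
            continue
        if ch == "$":
            i += 1  # end-of-read marker carries no base
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many indel bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            i = j + length
            continue
        counts[ch.upper()] += 1
        i += 1
    return counts


print(count_pileup_bases("G$g$aaGA^]G"))
```

Running the same function on the full column shows how many reads carry each base, which can then be compared with the alleles VarScan reports.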

Slash is incorrect alternate allele separator

In varscan 2.4.2 somatic call VCF output file, a "/" instead of comma is used to separate multiple alternate alleles. I thought I read in the release notes that this had been fixed in 2.4.2, but the problem is still there.

No versioned source repository?

It looks like the repository is only being used to keep snapshots as JAR files. Is there anywhere I can checkout the source to see version history? For example, I want to see when default parameters changed and why.

indel missing when close to an SNV

image
As the picture shows, the sample has an SNV and an indel very close together, but VarScan somatic (version 2.3.9, case-control) reports only the SNV and misses the indel. Could you have a look at this bug?

yuting

Ambiguous allele (IUPAC) not properly handled according to VCF specs 4.3

The human decoy sequence (hs37d5), prepared according to [README], contains ambiguous IUPAC reference bases.
In particular, it has some S (C or G).

The VCF 4.3 spec prescribes that a caller output the alphabetically first concrete base, not the ambiguous IUPAC code:

REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive).
[...]
If the reference sequence contains IUPAC ambiguity codes not allowed by this specification (such as R = A/G), the ambiguous reference base must be reduced to a concrete base by using the one that is first alphabetically (thus R as a reference base is converted to A in VCF.)

For example, I should not be seeing a call such as:

hs37d5  33184489        .       S       C       .       PASS    DP=33;SS=1;SSC=0;GPV=1.3852e-19;SPV=1   GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:10:0:10:100%:0,0,9,1 1/1:.:23:0:23:100%:0,0,23,0

Here the S should have been interpreted as C. Therefore there should have been no call.

This bug causes false positives and, more importantly, causes downstream tools such as igvtools to fail.
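The reduction rule quoted from the spec is small enough to sketch. The table below maps each IUPAC ambiguity code to the alphabetically first base it stands for; reduce_iupac is a hypothetical helper, and upper-casing the input is a simplification made here because the spec treats REF as case insensitive.

```python
# Alphabetically first concrete base for each IUPAC ambiguity code.
IUPAC_FIRST_BASE = {
    "R": "A",  # A/G
    "Y": "C",  # C/T
    "S": "C",  # C/G
    "W": "A",  # A/T
    "K": "G",  # G/T
    "M": "A",  # A/C
    "B": "C",  # C/G/T
    "D": "A",  # A/G/T
    "H": "A",  # A/C/T
    "V": "A",  # A/C/G
}


def reduce_iupac(ref):
    """Replace ambiguity codes in a REF string; A, C, G, T and N pass through."""
    return "".join(IUPAC_FIRST_BASE.get(base, base) for base in ref.upper())


print(reduce_iupac("S"))
```

For the call above, the reference base S becomes C, which equals the reported alternate allele.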

Inferring population parameters from pooled samples

Hello,
I have HiSeq data from some yeast experimental populations and I would like to compare population parameters between populations (samples). For example, I would like to estimate genetic diversity, or be able to build a site frequency spectrum, maybe using something like the analyses of Lynch et al. (https://doi.org/10.1093/gbe/evu085) or the work of Ferretti et al. (http://onlinelibrary.wiley.com/doi/10.1111/mec.12522/abstract).

However, I am wondering how to normalise the samples so that they can be compared (for instance, for diversity). I'm especially concerned about differences in coverage between samples. I see that with VarScan I can set the minimum number of supporting reads, frequency, quality, etc., but the precision with which, for instance, singletons (or low-frequency variants in general) are detected will depend on coverage.
Could someone give me some hints about that?
Thanks a lot in advance,

/Sergio Tusso

State Location of Output

The user manual doesn't explain where the output goes. Is it saved to a file or printed to the terminal?

indel immediately following SNV is not called

Hi,
I have a situation where there is an indel immediately after an SNV and the indel does not get called. This sample has been run a few times, and in some cases the alignment is such that the indel precedes the SNV (due to the sequence in this region, both are equivalent); in that case VarScan calls both the indel and the SNV fine. I have attached the relevant lines from the mpileup file where the indel is not being called.

Thank you.
Natalie

snipet.mpileup.txt

No explicit license statement any more

Hi,

the old website specified a non-profit license (Non-Profit Open Software License 3.0). I cannot find any license statement on the GitHub pages. I have tried several times to get in contact about the licensing, since I would like to distribute varscan in Debian main. Unfortunately, the non-profit restriction conflicts with the Debian Free Software Guidelines and makes the software non-free.

If no license is specified at all, as is now the case, the software is non-free as well from a distribution point of view, since we do not have any explicit permission to distribute the code.

It would be great if you would consider a free license like GPL, BSD or similar.

Thanks for considering

 Andreas.

Missing filters in VarScan fpfilter output VCF header

Hello, these are the VCF header lines from the FpFilter.java file in VarScan.v2.4.4.source.jar, which looks like the fpfilter source.

String vcfHeaderInfo = "";
vcfHeaderInfo = "##FILTER=<ID=VarCount,Description=\"Fewer than " + minVarCount + " variant-supporting reads\">";
vcfHeaderInfo += "\n##FILTER=<ID=VarFreq,Description=\"Variant allele frequency below " + minVarFreq + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=VarReadPos,Description=\"Relative average read position < " + minVarReadPos + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=VarDist3,Description=\"Average distance to effective 3' end < " + minVarDist3 + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=VarMMQS,Description=\"Average mismatch quality sum for variant reads > " + maxVarMMQS + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=VarMapQual,Description=\"Average mapping quality of variant reads < " + minVarMapQual + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=VarBaseQual,Description=\"Average base quality of variant reads < " + minVarBaseQual + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=Strand,Description=\"Strand representation of variant reads < " + minStrandedness + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=RefMapQual,Description=\"Average mapping quality of reference reads < " + minRefMapQual + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=RefBaseQual,Description=\"Average base quality of reference reads < " + minRefBaseQual + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=MMQSdiff,Description=\"Mismatch quality sum difference (ref - var) > " + maxMMQSdiff + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=MapQualDiff,Description=\"Mapping quality difference (ref - var) > " + maxMapQualDiff + "\">";
vcfHeaderInfo += "\n##FILTER=<ID=ReadLenDiff,Description=\"Average supporting read length difference (ref - var) > " + maxReadLenDiff + "\">";

But these are the actual filtering lines:

failReason += "RefReadPos";
failReason += "RefDist3";
failReason += "RefMapQual";
failReason += "RefMMQS";
failReason += "RefAvgRL";
failReason += "SomaticP";
failReason += "VarCount";
failReason += "VarFreq";
failReason += "VarReadPos";
failReason += "VarDist3";
failReason += "VarMMQS";
failReason += "VarMapQual";
failReason += "RefBaseQual";
failReason += "VarBaseQual";
failReason += "VarAvgRL";
failReason += "Strand";
failReason += "MaxBAQdiff";
failReason += "MMQSdiff";
failReason += "MinMMQSdiff";
failReason += "MapQualDiff";
failReason += "ReadLenDiff";
failReason = "NoReadCounts";

It looks like RefReadPos, RefDist3, RefMMQS, RefAvgRL, SomaticP, VarAvgRL, MaxBAQdiff, MinMMQSdiff and NoReadCounts are missing from the header. Some VCF-processing programs require every filter to be listed in the VCF header.
Using commas as separators, as described here, is a problem too; it would be better to change them to semicolons.
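The same comparison can be run on any fpfilter output to list filter IDs that are used but not declared. A minimal sketch, assuming header lines of the form ##FILTER=<ID=...,Description="...">; both function names are hypothetical, and the two-line header is shortened for illustration.

```python
import re


def declared_filter_ids(header_lines):
    """Collect the ID of every ##FILTER header line."""
    ids = set()
    for line in header_lines:
        match = re.match(r"##FILTER=<ID=([^,>]+)", line)
        if match:
            ids.add(match.group(1))
    return ids


def undeclared_filters(used_ids, header_lines):
    """Return, sorted, the filter IDs that are used but not declared."""
    return sorted(set(used_ids) - declared_filter_ids(header_lines))


# Shortened header; the descriptions are placeholders.
header = [
    '##FILTER=<ID=VarCount,Description="(text omitted)">',
    '##FILTER=<ID=MMQSdiff,Description="(text omitted)">',
]
used = ["VarCount", "MMQSdiff", "MinMMQSdiff", "NoReadCounts"]
print(undeclared_filters(used, header))
```

Feeding it the full header and the failReason strings listed above should give the nine missing IDs named in this issue.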

VCF format problem VarScan2.41.0

Hi,

I ran the following command, but the output was in VarScan format.

TuncBook-Pro:blackhole morova$ java -jar ~/varscan/VarScan.v2.4.1.jar somatic normal/normal.mpileup tumor/tumor.mpileup --output-vcf 1

This is the header of the output of that command:
chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value tumor_reads1_plus tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus normal_reads1_plus normal_reads1_minus normal_reads2_plus normal_reads2_minus

Please help me,

Best,

Tunc

Forward and reverse strand counts differ greatly (Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev)

I use the somatic command to call mutations. The numbers of reads on the forward and reverse strands differ greatly: the forward count is usually much higher than the reverse count. The total depth is also much lower than that of other tools (e.g. VarDict), at nearly half of VarDict's total depth (e.g. 1560 vs. 804, 1272 vs. 581).

ref/fwd ref/rev var/fwd var/rev

664 84 510 71
125 22 194 19
232 2 90 1
231 18 91 3
207 21 67 8
153 24 49 7
152 24 49 7
139 18 47 3
307 14 266 16
1068 89 1005 69
892 136 945 127
1 0 1578 292
1129 69 1096 57
796 22 702 19
2 0 1568 152
596 28 574 29
89 2 117 5
752 79 734 88
696 79 730 54
1128 80 1022 72
1183 70 1041 67
1049 63 923 56
559 106 445 86
113 54 129 48
170 3 121 1
512 49 14 0
229 2 197 2
760 67 683 74

thanks a lot for your help

sanity error when running bcftools stats on varscan data

Dear,
I ran varscan mpileup2cns (v2.4.3) on some data and when analysing the vcf with bcftools stats got an error

$ bcftools stats --fasta-ref   Ref.fasta variants.vcf.gz  > vcf.stats
Sanity check failed, the reference sequence differs: NC_030973.1:129900+2 .. G vs Y
samtools faidx R64_SEUB3.0_merged.fasta NC_030973.1:129900-129902

After inspecting this closely (samtools/bcftools#691 (comment)) @pd3 came to the conclusion that varscan is not handling IUPAC ref bases as it should and should have made CAC out of the reference CAY bases instead of the returned CAG.

Could you please confirm and maybe fix this IUPAC replacement case for the ref field?
Thanks in advance

## reference info
samtools faidx R64_SEUB3.0_merged.fasta NC_030973.1:129900-129902
NC_030973.1:129900-129902
CAY

$ tabix variants.vcf.gz NC_030973.1:129900-129902
NC_030973.1	129900	.	CAG	C	.	PASS	ADP=22;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR	0/1:49:33:22:10:13:48.15%:1.1242E-5:23:12:6:4:3:10

typos in doc

Thanks Dan for the great tool,

I just noticed what I think are typos in the accompanying doc
bash variables with $ in front and spaces at either sides of the '=' sign

Best
Stephane

http://dkoboldt.github.io/varscan/somatic-calling.html

Do NOT use any of the variant- or consensus-calling parameters. You just want the raw pileup output. This Perl snippet shows you how to pipe input from SAMtools into VarScan:
$normal_pileup = "samtools mpileup -f $reference $normal_bam";
$tumor_pileup = "samtools mpileup -f $reference $tumor_bam";

To limit the pileup to reads with mapping quality > 0 (recommended), use this variation:
$normal_pileup = "samtools mpileup -q 1 -f $reference $normal_bam";
$tumor_pileup = "samtools mpileup -q 1 -f $reference $tumor_bam";

Next, issue a system call that pipes input from these commands into VarScan :
bash -c \"java -jar VarScan.jar somatic <\($normal_pileup\) <\($tumor_pileup\) output

somaticfilter default --min-reads2 value error

Hi

I've noticed that the --min-reads2 default value in somaticfilter appears to be 4 and not 2 as stated in the manual. Thought it might be helpful to other users to let you know.

Best regards,
Georgette

varscan somatic terminates without error, sometimes with an error

When I run varscan somatic using a reference genome with a large number of contigs (>2000) the program completes without error, but does not output data in the outfile files for all the contigs. All the contigs are present in the two mpileup files.

Sometimes I see a java error in the output:

Somatic p-value: 0.05
Warning:
java.lang.NullPointerException
at net.sf.varscan.Somatic.comparePositions(Somatic.java:1669)
at net.sf.varscan.Somatic.<init>(Somatic.java:1244)
at net.sf.varscan.VarScan.somatic(VarScan.java:281)
at net.sf.varscan.VarScan.main(VarScan.java:199)
16410533 positions in tumor

The maximum reported contigs appears to be ~170.

Thanks.

Varscan trio de novo mutation calling too slow

Hi,

I am using the VarScan trio function to call de novo mutations. However, I found that it runs too slowly. I cannot run it per chromosome in parallel, because each trio run takes a huge amount of virtual memory (approximately 50 GB); doing so would bring down the server.

Is there any better way to run varscan trio function?

Thanks,

Shan

Somatic input ready loop is too aggressive when I/O is slow

We're using varscan's somatic caller in a production pipeline, where both the normal and tumor inputs are provided via bash's <(samtools mpileup ...) syntax, as recommended. samtools' own input is read over the network, and we've noticed that when the network is slow, the somatic caller sometimes aborts the run, reporting "Input file(s) were not ready for parsing after 100 5-second cycles! Pileup output may be invalid or too slow." We're in the "too slow" case.

mpileup will eventually succeed (it does under better network conditions), just not in the time that the somatic caller allows. It would be helpful if this code could check more carefully for invalid input, and otherwise allow for slow inputs, to avoid failing an entire pipeline because of temporary I/O degradation.

MinMMQSdiff

Hi,
After applying fpfilter to my somatic variant calls, some of them are filtered out with MinMMQSdiff tag. I can't find the explanatory header row regarding this filter. The closest one is this one:

##FILTER=<ID=MMQSdiff,Description="Mismatch quality sum difference (ref - var) > 50">

What does MinMMQSdiff mean? Also which fields of the bam-readcount file are used to apply this filter?

Thank you in advance,
Iran

Please use Git repository properly

Hi,
currently this Git repository just stores binary JAR files. The intention of a Git repository is rather to store source code, so that diffs can be inspected. The release process for this source is then handled by release tags in the GitHub interface.
Thanks for considering, Andreas.
