Giter Site home page Giter Site logo

pcrduplicates's People

Contributors

vibansal avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

pcrduplicates's Issues

float error

Hi,

Any idea what might be going on here?

reading input file to obtain list of het variants
#clusters 1:13048 2:6432 3:5408 4:5766 5:6849 6:8297 7:9978 8:11394 9:12177 10:13160 11:12360 12:12177 13:11059 14:9861 15:8792 16:8089 17:7425 18:6889 19:6366 20:6041 21:5767 22:5697 23:5456 24:5074 25:4560 26:4120 27:3799 28:3292 29:2820 30:2589 31:2174 32:2025 33:1858 34:1652 35:1516 36:1407 37:1366 38:1185 39:1151 40:1072 41:1107 42:994 43:964 44:983 45:946 46:923 47:860 48:887 49:790 50:692 51:737 52:664 53:646 54:699 55:584 56:599 57:556 58:484 59:501 60:507 61:489 62:454 63:440 64:373 65:383 66:376 67:328 68:343 69:326 70:300 71:305 72:298 73:274 74:276 75:262 76:256 77:231 78:221 79:238 80:201 81:204 82:181 83:165 84:153 85:151 86:150 87:143 88:136 89:131 90:115 91:113 92:90 93:96 94:96 95:111 96:79 97:86 98:77 99:81 100:81 101:82 102:61 103:57 104:67 105:66 106:58 107:46 108:47 109:54 110:47 111:46 112:51 113:33 114:41 115:31 116:36 117:26 118:22 119:20 120:29 121:24 122:30 123:20 124:21 125:19 126:29 127:10 128:18 129:15 130:18 131:16 132:14 133:7 134:11 135:12 136:12 137:11 138:5 139:10 140:5 141:10 142:11 143:6 144:2 145:8 146:11 147:5 148:7 149:4 150:3 151:10 152:3 153:8 154:5 155:4 156:4 157:5 158:1 159:1 160:2 161:3 162:2 163:1 164:3 165:2 166:1 167:1 168:2 169:2

identified 12 het variants filtered 8151 high-cov 1e-07 0.0 0.0 0.0
processing reads for chrom groupI with 276 reads filteredreads 18646 [276, 9] 0.048913041706
processing reads for chrom groupIII with 118 reads filteredreads 28228 [394, 12] 0.0456852780283
clus2-singlechrom groupIII 1 0 0.0
clusters-2 00: 1 01: 0 f2= 0.0 A0: 1.0 dup-counts 1.0 1.0 1.0

processing reads for chrom groupIV with 25 reads filteredreads 23211 [419, 27] 0.0966587089103
processing reads for chrom groupV with 134 reads filteredreads 9589 [553, 29] 0.0786618430622
clus2-singlechrom groupV 1 0 0.0
clusters-2 00: 2 01: 0 f2= 0.0 A0: 0.5 dup-counts 2.0 2.0 1.0

processing reads for chrom groupVII with 186 reads filteredreads 31439 [739, 36] 0.0730717175498
clus2-singlechrom groupVII 1 0 0.0
clusters-2 00: 3 01: 0 f2= 0.0 A0: 0.666666666667 dup-counts 3.0 3.0 1.0

processing reads for chrom groupIX with 391 reads filteredreads 26748 [1130, 49] 0.065044247212
processing reads for chrom groupXII with 258 reads filteredreads 36167 [1388, 52] 0.056195965013
processing reads for chrom groupXIV with 279 reads filteredreads 25306 [1667, 68] 0.0611877620805
processing reads for chrom groupXVI with 176 reads filteredreads 25152 [1843, 71] 0.0577862178091
processing reads for chrom scaffold_581 with 261 reads filteredreads 75545 [2104, 79] 0.056321292508
clus2-singlechrom scaffold_581 1 0 0.0
clusters-2 00: 4 01: 0 f2= 0.0 A0: 0.5 dup-counts 4.0 4.0 1.0

AB_counts 1 3 0.25 0.75 1.25 0.75 0.5
dup-counts ignored 1 [5, 4.0, 4.0] 13048
dup-counts ignored 2 [4, 4.0, 4.0, 0.0] 6432
dup-counts ignored 3 [0, 0.0, 0.0, 0.0] 5408
dup-counts ignored 4 [1, 0.0, 0.0, 0.0, 0.0] 5766
dup-counts ignored 5 [2, 2.0, 2.0, 0.0, 0.0] 6849
dup-counts ignored 6 [2, 2.0, 2.0, 0.0, 0.0, 0.0] 8297
dup-counts ignored 7 [2, 1.0, 1.0, 0.0, 0.0, 0.0] 9978
dup-counts ignored 8 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 11394
dup-counts ignored 9 [2, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0] 12177
dup-counts ignored 10 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0] 13160
dup-counts ignored 11 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 12360
dup-counts ignored 12 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 12177
dup-counts ignored 13 [3, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 11059
dup-counts ignored 14 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 9861
dup-counts ignored 15 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 8792
dup-counts ignored 16 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 8089
dup-counts ignored 17 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 7425
dup-counts ignored 18 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 6889
dup-counts ignored 19 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 6366
dup-counts ignored 20 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 6041
dup-counts ignored 21 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 5767
dup-counts ignored 22 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 5697
dup-counts ignored 23 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 5456
dup-counts ignored 24 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 5074
dup-counts ignored 25 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 4560
dup-counts ignored 26 [1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 4120
dup-counts ignored 27 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 3799
dup-counts ignored 28 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 3292
dup-counts ignored 29 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 2820
dup-counts ignored 30 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 2589
dup-counts ignored 31 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 2174
dup-counts ignored 32 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 2025
dup-counts ignored 33 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1858
dup-counts ignored 34 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1652
dup-counts ignored 35 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1516
dup-counts ignored 36 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1407
dup-counts ignored 37 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1366
dup-counts ignored 38 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1185
dup-counts ignored 39 [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1151
dup-counts ignored 40 [1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1072
dup-counts ignored 41 [1, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 1107

Traceback (most recent call last):
File "estimate_PCRduprate.py", line 416, in
[Frate,Ftotal,Funique,Funiqlist] = calculate_values(duplicate_counts,cluster_counts);
File "estimate_PCRduprate.py", line 139, in calculate_values
Lambda = [1.0,float(counts_PCR[1])/counts_PCR[0]];
ZeroDivisionError: float division by zero

Thanks,
David

Script extract_duplicates returned NaN duplication rate

I ran ./extract_duplicates to estimate PCR duplication rate on our data:
./extract_duplicates --bam SRR1585519.dedup.merged.sorted.bam --VCF NA19099.het.vcf --mmq 20 > SRR1585519.dedup.hetreads

however it returned:
PCR duplicates marked 0 total-reads 0 frac nan discarded 62152388
#clusters
total reads (PE=1) 0 unique-reads 0 duplicates:0, duplication rate nan

Using samtool rmdup and IGV, we observed a lot of duplicate reads in our data, but the script here seemed to not work on our files. What's the problem here?

Question

Hi,

I tried to estimate PCR duplicates ratio of my samples using it. I did "extract_duplicates" command using vcf file made by "samtools mpileup" using my samples.
The output files from "exact_duplicates" command are strange for me because the contents from multiple samples are almost same and "PCR duplicates marked 0 total-reads 0 frac -nan discarded 49244807" and "#clusters
total reads (PE=1) 0 unique-reads 0 duplicates:0, duplication rate -nan" were written.
My samples seems to contain many PCR duplicates (aprrox. 90% of the total) by judge of "samtools rmdup". So I don't believe the output file from "exact duplicates" (it said no PCR duplicate in my samples, right?)

So, what should I do to resolve this issue? Should I use vcf files from public databse like ENSEMBL?

Help to check the rate of PCR duplicate on RNA-Seq Experiment

It would be great if you help me with the following command,
do you agree with the following upstream step before PCRduplicate
Step1: Alignment
STAR \ --runThreadN 30 \ --genomeDir $GENOME \ --sjdbGTFfile /fdb/igenomes/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf \ --sjdbOverhang 75 \ --readFilesIn AB4315-L1_HNJM2BGXC_S1_R1_001.fastq.gz \ --readFilesCommand zcat \ --outSAMtype BAM SortedByCoordinate \ --outFileNamePrefix $RESULT/AB4315-L1
Step2: Post Alignment
samtools view --threads 30 -h -F 1796 -q 30 AB4315-L1.bam | \ awk 'BEGIN{OFS=FS} {if($3 != "chrUn" && $3 !~ /chrUn/ && $3 !~ /random/) print $0 }' > AB4315-L1.filtered.sam samtools sort --threads 30 AB4315-L1.filtered.sam -O bam -o AB4315-L1.filtered.sorted.bam samtools index -@ 30 AB4315-L1.filtered.sorted.bam
Step3: BCFtools
bcftools mpileup --threads 30 \ -f /fdb/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa \ ./AB4315-L1.filtered.sorted.bam | \ bcftools call --threads 30 --ploidy GRCh38 -mv -Ov -o ./AB4315-L1.filtered.sorted.variant.bcf

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.