statgen / bamutil Goto Github PK
View Code? Open in Web Editor NEWHome Page: http://genome.sph.umich.edu/wiki/BamUtil
Home Page: http://genome.sph.umich.edu/wiki/BamUtil
Dependencies ------------ On debian type systems (including Ubuntu), add the following packages if they are not already installed (or have your admin add them if you do not have permission): sudo apt-get install g++ libssl-dev libcurses-perl zlib1g-dev Building -------- To compile, from the top level directory, type: make To test (after compiling), from the top level directory, type: make test Under the main statgen repository, there are: * lib - the library code ** The library code is compiled into libStatGen.a which, after compiling is located at: statgen/lib/libStatGen.a ** After compiling, library headers can all be found in: statgen/lib/include * src - the tools we developed ** After compling, the executables are found in: statgen/src/bin * scripts - the scripts we developed Makefiles --------- statgen/Makefile.include should contain the definitions that you need for creating software using this library. statgen/lib/Makefile.lib and statgen/lib/Makefile.src can be used as templates for creating Makefiles for new software. If possible, just link to them. They look for a file called Makefile.tool that should be all you need to update for your specific software. (both Makefiles automatically include Makefile.include) A similar setup should be used for test code, linking your Makefile to statgen/Makefile.test and defining Makefile.testtool for your specific test. Other Notes ----------- * Typically the .o files are compiled into their own directory called obj.
Hi, I am wondering if I am using the tool correctly. Will there be a record for every QNAME
in the bams being compared when comparing MAPQ
scores?
I am noticing my diff output includes only a fraction of the QNAME
entries of one of my bams being diff'd.
$ bam diff --in1 ${BAM1} --in2 ${BAM2} --mapQual >> diffs.txt
$ cat diffs.txt | grep -v "<" | grep -v ">" | sort | uniq | wc -l. # get only the read names, no diffs
7199
$ samtools view ${BAM1} | cut -f1 | sort | uniq | wc -l. # qname/read names in my input bam
46367516
I am trying to determine whether one map/align method mapped reads with greater confidence than another. Thank you for any direction, correction, or help. Thanks!
Hello,
I am trying to use the TOPMed Variant Calling Pipeline (latest: Freeze 8) to analyze human genetic data. I am following the instructions on the Readme (https://github.com/statgen/topmed_variant_calling) and have aligned my sequenced reads in BAM format (and indexed them) using bwa to the GRCh38 (full analysis set plus decoy hla) reference genome fasta. (Found here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/).
The TOPMed pipeline readme does not mention the steps of deduplicating and recalibrating the BAM files, but as mentioned in the wiki here (https://genome.sph.umich.edu/wiki/GotCloud) and my own knowledge of variant calling pipeline best practices, I believe these steps are highly recommended.
So I would like to use BamUtil to deduplicate and recalibrate my BAM files. Following the instructions here (https://genome.sph.umich.edu/wiki/BamUtil:_recab), my command is:
bamUtil/bin/bam dedup --recab --in input.bam --out input.bam.recab --force --refFile /ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.fa --dbsnp /ref/dbsnp_142.b38.vcf.gz --oneChrom --storeQualTag OQ --maxBaseQual 40
But it fails with the following error message:
Failed to open reference genome /ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.fa /ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla-bs.umfa: wrong type of file (expected type 460927905 but got 1096302936)
I also tried the same command as above but using a similar(?) reference genome (hs38DH.fa), which I obtained (following the instructions here: https://github.com/statgen/topmed_variant_calling) using the command:
wget ftp://share.sph.umich.edu/1000genomes/fullProject/hg38_resources/topmed_variant_calling_example_resources.tar.gz
This returned a similar error:
GenomeSequence::open: failed to open file /ref/hs38DH-bs.umfa Failed to open reference genome /ref/hs38DH.fa hs38DH-bs.umfa: wrong type of file (expected type 460927905 but got 1415074904)
I am not sure what it means by "expected type 460927905". I would greatly appreciate any help in solving this issue. Thank you!
Hello,
First of all, thanks for the bamUtil suite!
I'd like to understand the following behavior of clipOverlap
. I have a read pair marked as properly paired even if reads face outwards:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
When I clip them (command below) I see that the bottom read is clipped on the right and the top read is fully clipped. Result:
...................................................
>>>>>>>>>>>>>>>>>>>>>....................
What is the reason for doing this? I would expect one of the two reads to be clipped in the overlapping region only. (Granted such pair is odd anyway but I do find these cases in bwa output).
bam clipOverlap --in outies.sam --out out.sam --readName
Input/Output:
cat outies.sam
@SQ SN:chr7 LN:159138663
r1 99 chr7 100 255 40M = 80 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN *
r1 147 chr7 80 255 40M = 100 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN *
cat out.sam
@SQ SN:chr7 LN:159138663
r1 99 chr7 100 255 20M20S = 80 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN *
r1 147 chr7 80 255 40S = 100 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN *
Thank you!
Dario
Dear all,
I've been using bam2FastQ (latest version) and noticed that when the --gzip
option is set, even though the files are gzipped, the file names are missing the .gz
extension. I've tweaked the code to fix this (see attached).
cheers,
Hello,I meet a problem:bash -x testBam2FastQ.sh and bash -x testMergeBam.sh failed on centos8_aarch64
aarch64:
[root@localhost test]# bash -x testMergeBam.sh
+ ERROR=false
+ ../bin/bam mergeBam --out results/mergeBam.bam --list testFiles/mergeBam.list --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBam.bam expected/mergeBam.bam
diff: results/mergeBam.bam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSam.sam -l testFiles/mergeSam.list --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeSam.sam expected/mergeSam.sam
diff: results/mergeSam.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSam1.sam -i testFiles/sortedSam.sam -i testFiles/sortedSam1.sam --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeSam1.sam expected/mergeSam1.sam
diff: results/mergeSam1.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSam2.sam -i testFiles/sortedSam1.sam -i testFiles/sortedSam.sam --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeSam2.sam expected/mergeSam2.sam
diff: results/mergeSam2.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSam3.sam -i testFiles/sortedSam.sam -i testFiles/sortedSam.sam --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeSam3.sam expected/mergeSam3.sam
diff: results/mergeSam3.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSam4.sam -i testFiles/sortedSam.sam -i testFiles/sortedSam2.sam --noph
+ '[' 255 -eq 0 ']'
+ '[' -e results/mergeSam4.sam ']'
+ diff results/mergeSam4.log expected/mergeSam4.log
1c1,11
< Unrecognized option nophonehome
---
> ERROR : Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2
> prev: (@RG ID:myID2 SM:sample2 LB:library2
> )
> new: (@RG ID:myID2 SM:sample24 LB:library2
> )
> Exiting due to ERROR:
> ERROR: Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2
> prev: (@RG ID:myID2 SM:sample2 LB:library2
> )
> new: (@RG ID:myID2 SM:sample24 LB:library2
> )
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSamPI.sam -i testFiles/sortedSam.sam -i testFiles/sortedSamPI.sam --ignorePI --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeSamPI.sam expected/mergeSam3.sam
diff: results/mergeSamPI.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSamPI4.sam -i testFiles/sortedSamPI.sam -i testFiles/sortedSam2.sam -I --noph
+ '[' 255 -eq 0 ']'
+ '[' -e results/mergeSamPI4.sam ']'
+ diff results/mergeSamPI4.log expected/mergeSam4PI.log
1c1,11
< Unrecognized option nophonehome
---
> ERROR : Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2, even when ignoring PI
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample24 LB:library2
> )
> Exiting due to ERROR:
> ERROR: Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2, even when ignoring PI
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample24 LB:library2
> )
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSamPI1.sam -i testFiles/sortedSamPI1.sam -i testFiles/sortedSamPI.sam -I --noph
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeSamPI1.sam expected/mergeSamPI1.sam
diff: results/mergeSamPI1.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSamBeforePIfail.sam -i testFiles/sortedSamPI.sam -i testFiles/sortedSamPI2.sam -I --noph
+ '[' 255 -eq 0 ']'
+ '[' -e results/mergeSamBeforePIfail.sam ']'
+ diff results/mergeSamBeforePIfail.log expected/mergeSamPI_2.log
1c1,11
< Unrecognized option nophonehome
---
> ERROR : Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2, even when ignoring PI
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample22 PI:500 LB:library2
> )
> Exiting due to ERROR:
> ERROR: Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2, even when ignoring PI
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample22 PI:500 LB:library2
> )
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSamAfterPIfail.sam -i testFiles/sortedSamPI.sam -i testFiles/sortedSamPI3.sam -I --noph
+ '[' 255 -eq 0 ']'
+ '[' -e results/mergeSamAfterPIfail.sam ']'
+ diff results/mergeSamAfterPIfail.log expected/mergeSamPI_3.log
1c1,11
< Unrecognized option nophonehome
---
> ERROR : Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2, even when ignoring PI
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample2 PI:500 LB:library22
> )
> Exiting due to ERROR:
> ERROR: Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2, even when ignoring PI
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample2 PI:500 LB:library22
> )
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeSamPIfail.sam -i testFiles/sortedSamPI.sam -i testFiles/sortedSamPI1.sam --noph
+ '[' 255 -eq 0 ']'
+ '[' -e results/mergeSamPIfail.sam ']'
+ diff results/mergeSamPIfail.log expected/mergeSamPIfail.log
1c1,11
< Unrecognized option nophonehome
---
> ERROR : Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample2 PI:501 LB:library2
> )
> Exiting due to ERROR:
> ERROR: Failed to add readgroup to header, duplicate, but non-identical RG ID, myID2
> prev: (@RG ID:myID2 SM:sample2 PI:500 LB:library2
> )
> new: (@RG ID:myID2 SM:sample2 PI:501 LB:library2
> )
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1List.sam -l testFiles/mergeBam.list --noph -r 1
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg1List.sam expected/mergeBamReg1.sam
diff: results/mergeBamReg1List.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1File.sam -l testFiles/mergeBam.list --noph -R testFiles/region1.txt
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg1File.sam expected/mergeBamReg1.sam
diff: results/mergeBamReg1File.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg13List.sam -l testFiles/mergeBam.list --noph -r 1,3
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg13List.sam expected/mergeBamReg13.sam
diff: results/mergeBamReg13List.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg13File.sam -l testFiles/mergeBam.list --noph -R testFiles/region13.txt
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg13File.sam expected/mergeBamReg13.sam
diff: results/mergeBamReg13File.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1P3List.sam -l testFiles/mergeBam.list --noph -r 1:75-1011,1:1750-1750,3
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg1P3List.sam expected/mergeBamReg1P3.sam
diff: results/mergeBamReg1P3List.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1P3File.sam -l testFiles/mergeBam.list --noph -R testFiles/region1P3.txt
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg1P3File.sam expected/mergeBamReg1P3.sam
diff: results/mergeBamReg1P3File.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1PList.sam -l testFiles/mergeBam.list --noph -r 1:1014
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg1PList.sam expected/mergeBamReg1P.sam
diff: results/mergeBamReg1PList.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1PFile.sam -l testFiles/mergeBam.list --noph -R testFiles/region1P.txt
Unrecognized option nophonehome
+ '[' 255 -ne 0 ']'
+ ERROR=true
+ diff results/mergeBamReg1PFile.sam expected/mergeBamReg1P.sam
diff: results/mergeBamReg1PFile.sam: No such file or directory
+ '[' 2 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeErr.sam -l testFiles/mergeBam.list --noph -r 1:1r-5
Unrecognized option nophonehome
+ '[' 255 -eq 0 ']'
+ diff results/mergeErr1.err expected/mergeErr1.err
0a1,4
>
> FATAL ERROR -
> Error: Invalid region string, '1:1r-5', the start position, '1r', is not an integer >= 1.
>
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ ../bin/bam mergeBam -o results/mergeBamReg1File.sam -l testFiles/mergeBam.list --noph -R testFiles/region1err.txt
Unrecognized option nophonehome
+ '[' 255 -eq 0 ']'
+ diff results/mergeErr2.err expected/mergeErr2.err
0a1,4
>
> FATAL ERROR -
> Error: Invalid region string, '1:4-5k', the end position, '5k', is not an integer >= 1.
>
+ '[' 1 -ne 0 ']'
+ ERROR=true
+ true == true
+ echo 'Failed testMergeBam.sh'
Failed testMergeBam.sh
+ exit 1
However, the two test files on ccentos8_x86_64 have passed. Why is this the case? Is aarch64 not supported?
I sometimes bump into this problem with bam diff tool, it seems the comparison does not begin. unfortunately i couldn't upload bam/sam files here.
Hi, I am a newbie in Github and analyzing NGS data using command line, I hope I am writing this question in the right place. i am trying to run clipoverlap on some bam files. I get an error:
FAIL_IO: Failed to Open "THE BAM FILE NAME".
The directory if right and I can not figure out why I am getting this error.
Any chance of adding the option to bam2FastQ to generate an interleaved FASTQ file for paired-end reads? Currently it defaults to 2 FASTQ files. Some tools (e.g. bwa) will handle this form of data, e.g. the "-p" option for the "bwa mem" command (http://bio-bwa.sourceforge.net/bwa.shtml). It also makes it a lot easier to pipe the output from bam2FastQ directly into bwa without writing intermediate files. If it's not a feature that's currently planned, I'd be willing to look into what would be required to create a patch. Thanks!
I am trying to generate 2 bam files that are the differences between 2 bams using bamutil diff.
http://genome.sph.umich.edu/wiki/BamUtil:_diff#Usage
After trying it for a while, I find it's exaclty what I need, but it's taken a long time (more than 10 hours and running) to compare two human 30x builds:
~/src/bamUtil/bin/bam diff --mapQual --onlyDiffs --recPoolSize 100000 --posDiff 100000 --in1 $bam1 --in2 $bam2 --out onlyDiffs.bam
Any ideas what I can do to speed it up?
Hi there,
I was trying to install bamUtil.
However, it shows the following error:
clang: error: the clang compiler does not support -pg option on versions of OS X 10.9 and later
make[1]: *** [../bin/profile/bam] Error 1
make: *** [src] Error 2
I wonder what should I do?
Thank you!
This is super low priority but the code here assumes that the first read is the forward read and the second read is the reverse. Everywhere that this function is called ensures this but any new code that calls this function may not. This also assumes that the reverse read does not extend past the beginning of the forward read. This is dependent on the sequencing technique and may not be future proof.
Build instructions on http://genome.sph.umich.edu/wiki/BamUtil vs GitHub are different.
While compiling bamUtil v1.0.9 using the following command:
make USER_COMPILE_VARS=-mmacosx-version-min=10.8
I ran into this error:
g++ -O4 -pipe -Wall -I../../libStatGen/include -I. -D__ZLIB_AVAILABLE__ -D_FILE_OFFSET_BITS=64 -D__STDC_LIMIT_MACROS -mmacosx-version-min=10.8 -o ../obj/BamExecutable.o -c BamExecutable.cpp -DVERSION="\"1.0.9\""
BamExecutable.cpp: In static member function ‘static void BamExecutable::bamVersion()’:
BamExecutable.cpp:41: error: ‘DATE’ was not declared in this scope
BamExecutable.cpp:41: error: ‘USER’ was not declared in this scope
Using this command to compile allowed me to proceed:
make "USER_COMPILE_VARS=-mmacosx-version-min=10.8 -DUSER=\"\\\"lfo\\\"\" -DDATE=20150215"
Hi there,
I am encountering an error while building this repository using the following codes:
git clone git://github.com/statgen/bamUtil.git
cd bamUtil
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
module load gcc/v6.3.0
make cloneLib
make
## here comes the error
...
VcfRecord.cpp: In member function 'bool VcfRecord::read(InputFile*, bool, VcfRecordDiscardRules&, VcfSubsetSamples*)':
VcfRecord.cpp:160: error: call of overloaded 'to_string(int&)' is ambiguous
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:2604: note: candidates are: std::string std::to_string(long long int)
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:2610: note: std::string std::to_string(long long unsigned int)
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:2616: note: std::string std::to_string(long double)
VcfRecord.cpp: In member function 'bool VcfRecord::write(InputFile*, bool)':
VcfRecord.cpp:199: error: call of overloaded 'to_string(int&)' is ambiguous
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:2604: note: candidates are: std::string std::to_string(long long int)
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:2610: note: std::string std::to_string(long long unsigned int)
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/basic_string.h:2616: note: std::string std::to_string(long double)
make[3]: *** [obj/VcfRecord.o] Error 1
make[2]: *** [vcf] Error 2
make[1]: *** [../../libStatGen/libStatGen.a] Error 2
make[1]: Leaving directory `/proj/data9/samuel/modules/bamUtil/src'
make: *** [src] Error 2
I am working on a HPC cluster that uses linux
Any suggestion to avoid the ambiguities?
Hi,
I would like to compare two bam files ignoring the read ids: if two reads contain the same sequence string but different read ids, I would like bam diff to consider them the same, and skip them in the _only1 and _only2 files when running bam diff --in1 bam1.bam --in2 bam2.bam --out my.bam
I have modified the code below, but I would like to know if this should be enough or should be combined with specific parameters when running bam diff.
Thanks in advance.
cd /home/avilella/src/bamUtil/src
bool Diff::matchingRecs(SamRecord* rec1, SamRecord* rec2)
{
if((rec1 == NULL) || (rec2 == NULL))
{
// one or both of the records is NULL, so return false.
return(false);
}
// // Have 2 records, so compare them.
// if((SamFlag::getFragmentType(rec1->getFlag()) ==
// SamFlag::getFragmentType(rec2->getFlag())) &&
// (strcmp(rec1->getReadName(),
// rec2->getReadName()) == 0))
// Have 2 records, so compare them.
if(SamFlag::getFragmentType(rec1->getFlag()) ==
SamFlag::getFragmentType(rec2->getFlag()))
{
// Same fragment and read name.
return(true);
}
return(false);
}
I want to use bam dedup in a pipe to save disk space and accesses:
$ cat input.sbam | bam dedup --in - --out - --log - > output1.sam
terminate called after throwing an instance of 'std::runtime_error'
what(): FAIL_PARSE: Too few columns (1) in the Record, expected at least 11.
Aborted
The same happens when I use /dev/stdin instead of '-'.
Furthermore, the output of the above command is sam formatted and in unsorted order. If I compare the bam formatted and sorted output of the previous command with the output of the command using an output file instead of stdout, I get more or less the same results:
$ bam dedup --in input.sbam --out output2.sbam --log -
$ samtools view -S -b output1.sam | samtools sort - output1.sbam
$ md5sum <(samtools view outpu1.bam | cut -f -11) <(samtools view output2.sbam | cut -f -11)
f026615da44eee232c9e560635dba161 /dev/fd/63
f026615da44eee232c9e560635dba161 /dev/fd/62
The cut -f 11 is to remove the attributes column because the two invocations produce different attribute order.
It would be desirable if output to standard out would produce sorted bam.
Philip
Hi,
I have two BAM files that I'd like to compare. Each is about 5.6GB. I expect them to be identical (I'm sort of doing a reproducibility test).
When I ran with the following command:
bam diff --in1 a.bam --in2 b.bam --all --onlyDiffs --recPoolSize -1 --out c.bam
It generated three files:
-rw-r--r-- 1 2373574 Jun 13 15:12 c.bam
-rw-r--r-- 1 1803478 Jun 13 15:12 c_only1_a.bam
-rw-r--r-- 1 1803105 Jun 13 15:12 c_only2_b.bam
I tried to see what actually differs between the two, but I think they look identical. My suspicion is maybe something to do with the muti-mapped reads. Do you have any idea how to resolve this?
samtools view c.bam | head -n1
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792 0 1 629088 1 46M1I44M * 0 0 GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT ,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF AS:i:78 HI:i:4 NH:i:4 nM:i:3 ZC:Z:42M1I48M ZT:Z:AS:i:78;HI:i:3;NH:i:4;nM:i:3
$ samtools view a.bam | grep -F ":TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792"
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792 0 1 629088 1 42M1I48M * 0 0 GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT ,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF NH:i:4 HI:i:3 AS:i:78 nM:i:3
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792 0 1 629088 1 46M1I44M * 0 0 GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT ,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF NH:i:4 HI:i:4 AS:i:78 nM:i:3
$ samtools view b.bam | grep -F ":TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792"
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792 0 1 629088 1 42M1I48M * 0 0 GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT ,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF NH:i:4 HI:i:3 AS:i:78 nM:i:3
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792 0 1 629088 1 46M1I44M * 0 0 GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT ,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF NH:i:4 HI:i:4 AS:i:78 nM:i:3
Hello,
I am working on a HPC cluster that uses linux x86_64. I am encountering an error while building this repository using the following commands:
git clone https://github.com/statgen/bamUtil.git
cd bamUtil
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
module load dev/gcc/8.2
module load dev/PGI-compilers/17.5
make cloneLib
make ## here comes the error
[..]
g++ -std=c++0x -I/usr/local/packages/dev/gcc/8.2.0/include -I../../libStatGen/include -I. -D__ZLIB_AVAILABLE__ -D_FILE_OFFSET_BITS=64 -D__STDC_LIMIT_MACROS -DDATE=""Wed 5 Dec 17:31:32 GMT 2018"" -DVERSION=""1.0.14"" -DUSER=""bo1fra"" -o ../obj/BamExecutable.o -c BamExecutable.cpp -DVERSION=""1.0.14""
mkdir -p ../bin
g++ -std=c++0x -I/usr/local/packages/dev/gcc/8.2.0/include -I../../libStatGen/include -I. -D__ZLIB_AVAILABLE__ -D_FILE_OFFSET_BITS=64 -D__STDC_LIMIT_MACROS -DDATE=""Wed 5 Dec 17:31:32 GMT 2018"" -DVERSION=""1.0.14"" -DUSER=""bo1fra"" -o ../bin/bam ../obj/BamExecutable.o ../obj/Validate.o ../obj/Convert.o ../obj/Diff.o ../obj/DumpHeader.o ../obj/SplitChromosome.o ../obj/WriteRegion.o ../obj/DumpIndex.o ../obj/ReadIndexedBam.o ../obj/DumpRefInfo.o ../obj/Filter.o ../obj/ReadReference.o ../obj/Revert.o ../obj/Squeeze.o ../obj/FindCigars.o ../obj/Stats.o ../obj/PileupElementBaseQCStats.o ../obj/ClipOverlap.o ../obj/MateMapByCoord.o ../obj/SplitBam.o ../obj/TrimBam.o ../obj/MergeBam.o ../obj/PolishBam.o ../obj/GapInfo.o ../obj/Logger.o ../obj/Bam2FastQ.o ../obj/Dedup.o ../obj/Dedup_LowMem.o ../obj/Prediction.o ../obj/LogisticRegression.o ../obj/MathCholesky.o ../obj/HashErrorModel.o ../obj/Recab.o ../obj/OverlapHandler.o ../obj/OverlapClipLowerBaseQual.o ../obj/ExplainFlags.o ../obj/Main.o ../../libStatGen/libStatGen.a -lm -lz
../obj/Validate.o: In function operator<<(std::ostream&, SamValidationErrors const&)': Validate.cpp:(.text._ZlsRSoRK19SamValidationErrors[_ZlsRSoRK19SamValidationErrors]+0x4d): undefined reference to
SamValidationErrors::getErrorString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) const'
../obj/DumpHeader.o: In function DumpHeader::dumpHeader(char const*)': DumpHeader.cpp:(.text+0x21f): undefined reference to
SamFileHeader::getHeaderString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) const'
../obj/WriteRegion.o: In function WriteRegion::execute(int, char**)': WriteRegion.cpp:(.text+0xe11): undefined reference to
InputFile::readTilChar(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
../obj/ReadReference.o: In function ReadReference::execute(int, char**)': ReadReference.cpp:(.text+0x4b3): undefined reference to
GenomeSequence::getString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&, unsigned int, int) const'
../obj/MergeBam.o: In function parseOutRG(SamFileHeader&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, SamFileHeader*, bool)': MergeBam.cpp:(.text+0x28aa): undefined reference to
SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
MergeBam.cpp:(.text+0x28cc): undefined reference to SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)' MergeBam.cpp:(.text+0x2c18): undefined reference to
SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
MergeBam.cpp:(.text+0x2c3a): undefined reference to SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)' MergeBam.cpp:(.text+0x2cee): undefined reference to
SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
MergeBam.cpp:(.text+0x2d16): undefined reference to SamFileHeader::appendCommentLines(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)' ../obj/Bam2FastQ.o: In function
Bam2FastQ::writeFastQ(SamRecord&, InputFile*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, char const*)':
Bam2FastQ.cpp:(.text+0x2c0b): undefined reference to SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)' Bam2FastQ.cpp:(.text+0x2f3f): undefined reference to
BaseUtilities::reverseComplement(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
../obj/Dedup.o: In function Dedup::buildReadGroupLibraryMap(SamFileHeader&)': Dedup.cpp:(.text+0x277c): undefined reference to
SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
Dedup.cpp:(.text+0x2868): undefined reference to SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)' ../obj/Dedup_LowMem.o: In function
Dedup_LowMem::buildReadGroupLibraryMap(SamFileHeader&)':
Dedup_LowMem.cpp:(.text+0x25a6): undefined reference to SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)' Dedup_LowMem.cpp:(.text+0x2692): undefined reference to
SamHeaderRecord::appendString(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&)'
collect2: error: ld returned 1 exit status
make[1]: *** [../bin/bam] Error 1
make[1]: Leaving directory `/home/bo1fra/software/bamUtil/src'
make: *** [src] Error 2
Any suggestion?
Thank you in advance
BamUtilLibStatGen.1.0.7.tgz
Compiling this on my host (Gentoo Linux) using the toplevel makefile results in a series of errors. I made it compiling by changing the CFLAGS and CXXFLAGS.
(1) The UINT16_MAX and similar variables were not available by default in my compiler (GCC 4.6.3). I had to add -D__STDC_LIMIT_MACROS to the C[XX]FLAGS.
(2) The declaration of BgzfFileType in libStatGen require -D__ZLIB_AVAILABLE__, otherwise one of the files in the bamUtil directory fails.
(3) I had to add the /general and /include directories to the C[XX]FLAGS, because the defaults in the Makefile didn't work already when compiling libStatGen.
(4) Finally, when compiling BamExecutable.cpp somehow the DATE and USER variable were not defined.
BamExecutable.cpp: In static member function 'static void BamExecutable::bamVersion()':
BamExecutable.cpp:41:33: error: 'DATE' was not declared in this scope
BamExecutable.cpp:41:50: error: 'USER' was not declared in this scope
So I had to add -DDATE='"$(date -I)"' -DUSER='"$(whoami)"'"
I hope that helps.
Philip
Hello there and thanks for bamUtil.
I have an enhancement request: when using bamUtil for massaging of BAM files one real bottleneck seems to be the recalibration step (bam recab
), because it cannot be run on multiple cores. At least conceptually it looks like it should be straightforward to collect the required information per region (running the analysis in parallel for many regions) and then aggregate at the end to give the final table. Applying that table can be a single core process of course.
Are there any plans to implement something along these lines? Or is something similar already implemented somewhere and I missed it in the documentation?
Thanks
Andreas
Hello, I am facing a problem with indexing the clipped bam files using samtools. I am getting errors like bellow. I don't have the same problem when indexing the original bam files (Unclipped). I am not sure if I should ignore this or this is something important and I should try to solve it. Any idea what I should do to solve this problem?
read 'HS24_57:7:2110:1654:80872' mapped to '5' at POS 180043777 to 180043776 has BIN 15670 but should be 1958
In total, there are 9 reads with incorrect BIN fields
Fix this by converting BAM->SAM->BAM to force BIN recalculation
This warning is produced by validate
even if the SEQ is set to *
. The SAM spec explicitly states that it is ok for the SEQ to be *
and that the lengths don't have to match:
If not a ‘*’, the length of the sequence must equal the sum of lengths of M/I/S/=/X operations in CIGAR
It even states that SEQ should be *
for secondary alignments:
SEQ and QUAL of secondary alignments should be set to ‘*’ to reduce the file size.
Or am I misunderstanding something?
Thanks!
Hi, first of all, thanks for making and maintaining bamUtil and it has been very helpful for my research.
For some reason, on the releases page, 1.0.13 is still labeled as the "Latest release", although 1.0.14 is available and this seems confusing for some people. Could you please update the latest release label to 1.0.14? Thank you!
I'm working on a Linux based HPC working with private data hence the environment is restrictive (no internet connection); therefore packages must be built from source.
After downloading the file and transferring it to the cluster I try to build by specifying the PATH for libStatGen
but I get an error:
$ wget https://github.com/statgen/bamUtil/archive/v1.0.14.tar.gz
$ tar -zxvf v1.0.14.tar.gz
$ cd bamUtil-1.0.14
$ make LIB_PATH_GENERAL=/project/M-mtgraovac182840/tools/libStatGen
make[1]: Entering directory '/project/M-mtgraovac182840/tools/bamUtil-1.0.14/src'
make -C /project/M-mtgraovac182840/tools/libStatGen --no-print-directory opt
ar -cru ../libStatGen.a obj/bgzf.o obj/knetfile.o
g++ -O4 -pipe -Wall -Werror -Wno-strict-overflow -I../include -I. -D__ZLIB_AVAILABLE__ -D_FILE_OFFSET_BITS=64 -D__STDC_LIMIT_MACROS -o obj/Chromosome.o -c Chromosome.cpp -DVERSION="\"1.0.0\""
Chromosome.cpp: In constructor 'Chromosome::Chromosome(const string&, unsigned int, bool)':
Chromosome.cpp:42:15: error: '*<unknown>.Chromosome::gs' is used uninitialized in this function [-Werror=uninitialized]
if (this->gs) delete gs;
~~~~~~^~
cc1plus: all warnings being treated as errors
make[3]: *** [../Makefiles/Makefile.common:81: obj/Chromosome.o] Error 1
make[2]: *** [Makefiles/Makefile.base:15: general] Error 2
make[1]: *** [/project/M-mtgraovac182840/tools/libStatGen/Makefiles/Makefile.ext:50: /project/M-mtgraovac182840/tools/libStatGen/libStatGen.a] Error 2
make[1]: Leaving directory '/project/M-mtgraovac182840/tools/bamUtil-1.0.14/src'
make: *** [/project/M-mtgraovac182840/tools/libStatGen/Makefiles/Makefile.base:15: src] Error 2
In a more recent update, the primary way we use clipoverlap was broken by --readname enforcing alphanumeric sorting of readnames. BWA's raw SAM output is readname sorted in that readpairs (with of course the same name) are next to each other in the order read from the fastq files but not in alphanumeric sorted order.
The issue is order of operations and how you pipe commands together. As it stands with the latest versions, we have to align, then coordinate sort, then clipoverlap and mark completely clipped reads as unmapped, with then essentially ruins the proper coordinate sorted order. What we'd like to do, and what make the most sense computationally, would be to align, pipe that raw output in to clipoverlap while marking completely clipped reads as unmapped, then coordinate sort and index.
Would it be possible for the --readname option to drop its alphanumeric enforcement and go back to just readpairs must be adjacent to each other? Or add a --bwasam option for streaming straight from BWA?
Thanks!
Ben
HashErrorModel.cpp: In member function ‘uint8_t HashErrorModel::getQemp(BaseData&)’:
HashErrorModel.cpp:96: error: ‘class std::unordered_map<long unsigned int, HashErrorModel::SMatches, std::hash, std::equal_to, std::allocator<std::pair<const long unsigned int, HashErrorModel::SMatches> > >’ has no member named ‘at’
make[2]: *** [../obj/HashErrorModel.o] Error 1
make[2]: Leaving directory /auto/cmb-01/fs3/lxia/setup/bamUtil_1.0.7/bamUtil/src' make[1]: *** [src] Error 2 make[1]: Leaving directory
/auto/cmb-01/fs3/lxia/setup/bamUtil_1.0.7/bamUtil'
make: *** [bamUtil/] Error 2
Looks like my stdlibc++ is old. But I cann't change it c'z I am not root. Could it be fixed from source code side to use more backward compatible expressions?
Thanks
samtools view ... | bam trimBam -.ubam -.ubam --clip -R
produces this error:
bam: invalid option -- '.'
ERROR: Unrecognized option left
works ok with
samtools view ... |bam trimBam - - --clip -R
does work though
looks like a getopt problem to me.
did as in README (git)
These files provide some programs for working on SAM/BAM files.--------------------------------------------------------------------------------
To use git to clone the required statgen library:
make cloneLib
Next, to build libStatGen & this program:
make
many lines
......
then
ar -cru ../libStatGen.a obj/GlfException.o obj/GlfFile.o obj/GlfHeader.o obj/GlfRecord.o obj/GlfRefSection.o obj/GlfStatus.o
/opt/local/bin/ranlib: file: ../libStatGen.a(LongHash.o) has no symbols
/opt/local/bin/ranlib: file: ../libStatGen.a(TrimSequence.o) has no symbols
/opt/local/bin/ranlib: file: ../libStatGen.a(WindowsHelper.o) has no symbols
/opt/local/bin/ranlib: file: ../libStatGen.a(SamFlag.o) has no symbols
mkdir -p ../obj
g++ -O4 -pipe -Wall -I../../libStatGen/include -I. -D__ZLIB_AVAILABLE__ -D_FILE_OFFSET_BITS=64 -D__STDC_LIMIT_MACROS -std=c++0x -DDATE=""Sat Aug 25 12:23:40 CEST 2012"" -DVERSION=""1.0.2"" -DUSER=""splaisan"" -o ../obj/BamExecutable.o -c BamExecutable.cpp -DVERSION=""1.0.2""
cc1plus: error: unrecognized command line option "-std=c++0x"
make[1]: *** [../obj/BamExecutable.o] Error 1
make: *** [src] Error 2
Any idea what is missing here?
It worked fine on RHEL but not on my laptop
Thanks
Running make cloneLib
is broken with the following error:
Cloning into 'libStatGen'...
fatal: remote error:
The unauthenticated git protocol on port 9418 is no longer supported.
Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
According to the linked blog post:
unencrypted git:// offers no integrity or authentication, making it subject to tampering. We expect very few people are still using this protocol, especially given that you can’t push (it’s read-only on GitHub). We’ll be disabling support for this protocol.
Also in the same blog post:
January 11, 2022 | Final brownout.This is the full brownout period where we’ll temporarily stop accepting the deprecated key and signature types, ciphers, and MACs, and the unencrypted Git protocol. This will help clients discover any lingering use of older keys or old URLs.
Which explains why we're seeing the error now.
Dear bamUtils,
I'm using bam clipOverlap --stats --poolSize 100000000 --in PE.bam --out merge.bam
and get reports like the below for a handful of files. The alignments all look clean before and after, it's only the stats that don't make too much sense. Is there something I might be missing? This is for ~110bp (after cutadapt, cleaning) PE RNAseq - the imbalance between the fw/rev strands can be attributed to rRNA. The numbers I'm concerned with are the 'Average # Reference Bases Overlapped' and 'Variance of Reference Bases overlapped' as they seem abnormally high.
Overlap Statistics:
Number of overlapping pairs: 31917921
Average # Reference Bases Overlapped: 1178.49
Variance of Reference Bases overlapped: 3.93918e+07
Number of times orientation causes additional clipping: 11373
Number of times the forward strand was clipped: 22035804
Number of times the reverse strand was clipped: 9882117
WARNING: did not find expected overlapping mates for 173173 records.
Completed ClipOverlap Successfully.
When I run "make" to compile libStatGen and bamUtil, I get the following errors:
In file included from ClipOverlap.cpp:22:
In file included from ./ClipOverlap.h:27:
./MateMapByCoord.h:70:19: error: no type named 'unordered_multimap' in namespace 'std'
typedef std::unordered_multimap<uint64_t, SamRecord*> MATE_MAP;
~~~~~^
./MateMapByCoord.h:70:37: error: expected member name or ';' after declaration specifiers
typedef std::unordered_multimap<uint64_t, SamRecord*> MATE_MAP;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
./MateMapByCoord.h:73:5: error: unknown type name 'MATE_MAP'
MATE_MAP myMateBuffer;
^
2 warnings and 3 errors generated.
make[1]: *** [../obj/ClipOverlap.o] Error 1
make: *** [src] Error 2
Then, running make install INSTALLDIR=mydirectory
fails to install files into the specified directory and yields the same errors as above.
I've tried fresh installing and updating with no change.
Thanks for your help.
I'm trying to automatically compare BAM files being output by bowtie2 (for a continuous integration system). The input data is identical, but when running on 2 different machines I'm noticing sometimes 2 lines in the BAM files will have the same name/start position, but represent different reads. For whatever reason, bowtie2 sometimes reverses the order of the lines, so bamUtil's algorithm calls a mismatch.
Is there any way to get bam diff to hold off for a few lines to see if there is another record with the same name/start pos that actually matches?
I have a small MiSeq bam file (in.bam) that passes the bamutil validate --in in.bam --so_coord
check:
Number of records read = 44491
Number of valid records = 44491
TotalReads 44491.00
MappedReads 44251.00
PairedReads 44491.00
ProperPair 43249.00
DuplicateReads 0.00
QCFailureReads 0.00
MappingRate(%) 99.46
PairedReads(%) 100.00
ProperPair(%) 97.21
DupRate(%) 0.00
QCFailRate(%) 0.00
TotalBases 6645701.00
BasesInMappedReads 6609701.00
Returning: 0 (SUCCESS)
When I run with direct file input bamutil dedup --in in.bam --out del.bam --verbose
it works as expected
Writing del.bam
Successfully marked 9 unpaired and 46 paired duplicate reads
When I try piping the SAM, it detects no duplicates samtools view -h in.bam | bamutil dedup --in - --out del.bam --verbose
:
Writing del.bam
Successfully marked 0 unpaired and 0 paired duplicate reads
It throws an error when I try to pipe uncompressed bam file samtools view -hub in.bam | bamutil dedup --in -.ubam --out del.bam --verbose
:
Sorting the indices of 101 duplicated records
Exiting due to ERROR:
FAIL_ORDER: Cannot read header since the file pointer is null
My bamutil version:
Set of tools for operating on SAM/BAM files.
Version: 1.0.14; Built: Thu Apr 4 10:14:43 EDT 2019 by slazic
Is there a publication or preprint for us to cite bamUtil? If not, should we just cite the webpage (https://genome.sph.umich.edu/wiki/BamUtil)? Thanks.
Hello,
I noticed that output of Strelka for detecting variants is different when using clipped bams(using ClipOverlap) with using unmodified bams. This is an important issue because I don't know how I can trust the trimmed bams. An specific example is a deletion that is being supported by atleast 4 unpaired reads and this deletion has not been detected with Strelka using the Clipped bams. Do you know what can be the problem here?
Hi,
I would like to introduce a new mapQual Diff option, so that the behaviour is as follows:
~/src/bamUtil/bin/bam diff --mapQualDiff 5 --posDiff 0 --in1 $s1 --in2 $s2 --out out.sam
This would generate an only1 and only2 output files where the reads
that are differently aligned are printed out, including those reads
that are equally aligned but where the mapping quality score is
different by 5 or more points difference. If the difference is only
below 5 points, they will still be in the common part, and not in the
diffs.
I think everything is contained in Diff.cpp, right?
When I run
git clone [email protected]:statgen/bamUtil.git
make cloneLib
I see:
Makefile.inc:17: ../libStatGen/Makefiles/Makefile.tool: No such file or directory
Unable to locate: ./../libStatGen/Makefiles/Makefile.tool
To change the location, set LIB_PATH_GENERAL or LIB_PATH_BAM_UTIL to the appropriate path to libStatGen. Or specify "make LIB_PATH_GENERAL=yourPath" or "make LIB_PATH_BAM_UTIL=yourPath"
Use make cloneLib if you have git and want to clone the current libStatGen at that location.
Cloning into '../libStatGen'...
remote: Counting objects: 4393, done.
remote: Total 4393 (delta 0), reused 0 (delta 0), pack-reused 4393
Receiving objects: 100% (4393/4393), 4.64 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2165/2165), done.
Checking connectivity... done.
Call make to compile libStatGen and this tool.
That looks a lot like an error, and it shouldn't.
The SAM spec details the type "B" array tag, which is not supported here. The error is:
parsing BAM - Unknown custom field of type FZ:B
terminate called after throwing an instance of 'std::runtime_error'
what(): FAIL_PARSE: Unknown tag type.
Aborted
Is there a license? Is this open source, no commercial restrictions?
Compilation log of bamUtil as per "https://github.com/statgen/docker-alignment/blob/master/Dockerfile" linked to by the Functional Equivalence paper "https://www.nature.com/articles/s41467-018-06159-4" is attached
It would be great if bam diff
returns an exit code (0 for identical, 1 for different) just like the diff
command does. This will allow us to do something like:
$ bam diff --in1 sample1.bam --in2 sample2.bam --all --out cmp
$ test $? -eq 0 && echo "Identical" || echo "Different"
Hi There,
I am using bamUtil stats to analyze some whole-genome sequencing data like so
curl url_for_my_large_bam.bam | ./bam stats --in -.ubam --cBaseQC results.txt
and results.txt is really large. I would love to be able to pipe it to awk to do some filtering. It seems like it would be pretty easy to add an option to write to STDOUT. Is this an option I am missing or would it be possible to add?
Thanks,
Dylan
If the compiler toolchain is in a non-standard location, the current Makefile does pick up the correct system includes from CFLAGS, but does not use the corresponding -L
from LDFLAGS.
This produces a broken binary (linked against the wrong version of libstdc++).
Hi - I'd like to use clipoverlap on really bad somatic WGS FFPE data. A lot of my reads overlap completely and nearly all have some overlap. Many have insert size <100bp. (I didn't design the experiment...)
The way I understand it - if I have 2 reads of 100bp each that overlap by 80bp (insert size 120bp) then clipoverlap will give me 1 read of 20bp and 1 of 100bp. Is that right? What I need is 2 reads of 60bp as output. So trim both reads in the middle of the overlap. is that something you would be interested in doing? Just thought I'd check before re-inventing the wheel.
Regards,Liam
Hi
The program ClipOverlap.cpp calls the header file OverlapBaseQualMap.h but the latter does not exist in the bamUtil zip package. Compiling ends up with the following error message:
ClipOverlap.cpp:29:32: error: OverlapBaseQualMap.h: No such file or directory
make[1]: *** [../obj/ClipOverlap.o] Error 1
make: *** [src] Error 2
Commenting line 29 in ClipOverlap.cpp seems to solve the compiling issue.
I assume the OverlapBaseQualMap.h is not needed anyway since it is not used in ClipOverlap.cpp. Could you please confirm?
Cheers
Bastien
make[1]: Entering directory /home/songsy/softwares/bamUtil/src' make -C ../../libStatGen --no-print-directory opt ar -cru ../libStatGen.a obj/bgzf.o obj/knetfile.o g++ -O4 -pipe -Wall -Werror -Wno-strict-overflow -I../include -I. -D__ZLIB_AVAILABLE__ -D_FILE_OFFSET_BITS=64 -D__STDC_LIMIT_MACROS -o obj/GzipHeader.o -c GzipHeader.cpp -DVERSION="\"1.0.0\"" cc1plus: warnings being treated as errors GzipHeader.cpp: In member function ‘bool GzipHeader::readHeader(UncompressedFileType&)’: GzipHeader.cpp:83: error: comparison between signed and unsigned integer expressions make[3]: *** [obj/GzipHeader.o] Error 1 make[2]: *** [general] Error 2 make[1]: *** [../../libStatGen/libStatGen.a] Error 2 make[1]: Leaving directory
/home/songsy/softwares/bamUtil/src'
make: *** [src] Error 2
I have no idea what's the problem! Thanks!!
When I run "make" to compile libStatGen and bamUtil, I get the following errors:
make[1]: *** [obj/VcfRecord.o] Error 1
make[1]: Leaving directory `/home/Software/libStatGen/vcf'
make: *** [vcf] Error 2
Thank for your help.
I tried to compile bamutil (version with libgen included -
https://genome.sph.umich.edu/w/images/7/70/BamUtilLibStatGen.1.0.13.tgz) on linux according to manual. However it didn't work because all warnings was treated as errors. I had to eddit a makefile to fix that issue:
cd bamUtil_1.0.13 \
&& sed -i '/-Werror/d' libStatGen/general/Makefile
and after that bamUtil compiled without any error. That fix was not obvious for me at all, and should be either mentioned in manual or the Makefile should be eddited
Hi, I installed this tool but it's empty with conda, can you give me some suggestions?
Hi, i recently use bamUtil to extract reads to be paried-read fastq files from bam file. They found read pairs but the generated fastq files are empty.
bam bam2FastQ --in in_file.bam --firstOut in_file_r1.fastq --secondOut in_file_r2.fastq
When it is done running, it says:
Found 80109518 read pairs.
Found 0 unpaired reads.
but the two fastq files are empty with size as 0.. Could you please help me out on it? Thank you
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.