pezmaster31 / bamtools Goto Github PK
View Code? Open in Web Editor NEWC++ API & command-line toolkit for working with BAM data
License: MIT License
C++ API & command-line toolkit for working with BAM data
License: MIT License
I need to get the list of all reads at a long list of genomic locations.
This requires me to call BamMultiReader::Rewind() multiple times. Hence I need the method to run as fast as possible.
Is there any way I can speed this up?
Dear Derek:
I used bamtool split function and have two comments:
I guess it could be faster if the 'bamtools filters -IsMapped true' can be avoided in the processing step. May I recommend to tolerate unmapped read in the split function and output the unmapped reads to a separated file as stub.unmapped.bam, if possible.
Thanks a lot.
Fan
Hello,
Could you allow to post on bamtools-user at google?
Thank you in advance
Hello,
I got the following error when trying to install with cmake. What am I missing?
C:\pezmaster31-bamtools-889210e\build>cmake ..
-- Building for: NMake Makefiles
-- The C compiler identification is MSVC
-- The CXX compiler identification is MSVC
-- Check for CL compiler version
-- Check for CL compiler version - 1600
-- Check if this is a free VC compiler
-- Check if this is a free VC compiler - no
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio 10.0/V
C/bin/cl.exe
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio 10.0/V
C/bin/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio 10.0
/VC/bin/cl.exe
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio 10.0
/VC/bin/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CMake Error at src/api/CMakeLists.txt:57 (install):
install Library TARGETS given no DESTINATION!
Thanks for the help
Hi. I use Bamtools as a library in Samscope (thanks for the excellent work, btw!), and a user sent me a BAM file which bamtools (either from inside samscope or as a stand-alone program) cannot seem to parse, and yet samtools can parse it. As a workaround, recreating the BAM file with samtools reheader command ("samtools view -H bad.bam > header.sam ; samtools reheader header.sam bad.bam > good.bam") does create a bamtools readable BAM file, so I'm wondering: what is up with this file? Is this a bamtools bug? Or is this file somehow an odd samtools-specific mutant BAM?
Here's the description from the user as to how that BAM file was created.
In order to create the bam file I run the Trinity program:
alignReads.pl
This eventually calls bowtie like so:
bowtie @argv -q -S --sam-nohead $format -v $bowtie_v -k $num_hits target $target_fa > $target_fa.pre.sam
Note the '--sam-nohead' flag. That is probably significant. The alignReads.pl program then gets rid of aligned reads then does a sort by name. Then captures the top number hits. Eventually it gets to this line:
samtools view -bt target.fa.fai -S $outfile_basename.coordSorted.sam | samtools sort - $outfile_basename.coordSorted"
Also does name sorting and bam indexing.
So there is the answer, at least to some level. Samtools is being used to generate the bam file.
The --sam-nohead thing doesn't seem like it should be a big problem, as the @sq headers are recreated with the "view -t target.fa.fai" line below, but it does seem very weird that the if the BAM file was originally created by samtools that it could be fixed by samtools. I'll try to get the user to check the samtools version in that trinity toolchain and get back to you if I get more information.
I'm using samtools version 0.1.18-dev (r982:313) to read/fix it. I've tried bamtools version 1.0.2 and 2.1.1 (from git 6708a21).
Thanks!
Hi,
I'm trying to extract tags from a BAM file:
#include <iostream>
#include <bamtools/api/BamReader.h>
using namespace std;
int main() {
BamTools::BamReader bam_reader;
if (!bam_reader.Open("test.bam")) {
cerr << "Error opening BAM file" << endl;
return 1;
}
BamTools::BamAlignment read_aln;
while ( bam_reader.GetNextAlignment(read_aln) ) {
cout << "Current read: " << read_aln.Name << endl;
uint32_t x0;
if (read_aln.GetTag("X0", x0)) {
cout << "X0: " << x0 << endl;
}
string md;
if (read_aln.GetTag("MD", md)) {
cout << "MD: " << md << endl;
}
}
return 0;
}
The BAM file I've used for testing looks like this (after running through samtools view):
@SQ SN:1 LN:249250621
@PG ID:bwa PN:bwa VN:0.5.9-r16
EAS51:1:1:1121:15866 99 1 220659190 60 101M = 220659392 303 GAAACCCCGCGTAATGAACAAGCTACTGAGAGCCCAGAGAGGTGCTCCTCCTCTCAGGCATGTTCAATCTTCAGGATTCAGAAATTGTTCTGAGTAGAAGT EEEB=FFFDFFFFDF=C?BDEDEEBE=EDBBB:C=B:@?5B,@CAEE=BEEDBEE@-=D@:???@=?BBE@B@-5?BBBBB=:@DD@@@BB@?@@A@=5=@XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101 AS:i:0
EAS51:1:1:1121:15866 147 1 220659392 60 101M = 220659190 -303 CCAGGACAGGAAGGGCCTGTCCANCTTAATTTACAGGGCCATGCAGCAGAATAATCATGGGTCATTCTTTTAAAATGGGGCTTCCATGTTTCGAAAGGAAA 5?2@=?-@?B?9?@?72&-0)58#?>@DE@D?B5@BB5@BB5:-CB?B@BA@BBB=E=EBE?A</?A,=DEDDEEB=??::-C5=E@>>@55CEADAEEEEXT:A:U NM:i:3 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:2G15T4T77 AS:i:31
The output:
Current read: EAS51:1:1:1121:15866
Current read: EAS51:1:1:1121:15866
That is, no tags have been found and printed. Am I doing something wrong or is this bug?
Regards,
Tobias
It is not a serious problem, but Bamtools is about 20% slower than the Python version aka pysam.
Hi Derek,
What do you think about an 'isSingleton' property that could be added to the filter tool.
Here is my use case:
I have bam files containing reads that were paired in sequencing, but due to filtering/other steps in bam file processing, some reads do not have their mate in the bam file.
More specifically, the bam file contains records where QNAME is unique in the file, but flags indicating the read is paired, the state of the mate, etc. are indicated.
Picard FixMate does not change these flags even though the mate does not exist in the file (and perhaps that is correct behavior)
I have a downstream tool that requires input reads in pairs (and in fastq format in fact...), and would thus like to filter my bam for "isSingleton" : "false" and pipe to the fastq formatter.
~~
After running bamtool stats on one of my files, I see this output:
Stats for BAM file(s):
Total reads: 61808555
Mapped reads: 59523078 (96.3023%)
Forward strand: 30821048 (49.8653%)
Reverse strand: 30987507 (50.1347%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 61808555 (100%)
'Proper-pairs': 34875950 (56.4258%)
Both pairs mapped: 55349948 (89.5506%)
Read 1: 30966136
Read 2: 30842419
Singletons: 4173130 (6.7517%)
Indicating that the tool distinguishes between reads paired in sequencing (Paired-end reads: 61808555 (100%)) and singletons in the bam file (Singletons: 4173130 (6.7517%))
I am hoping your reply will be the Tutorial/Documentation is out of date and the isSingleton filter property exists, or it is easy to implement.
I'll have a poke through the code now to see how difficult it would be to implement if its not in place already.
Thanks much - great project!,
Tim
Hello,
I get the same compile error as the previous post using the latest edition. I'm running RHEL 5 2.6.18-128. gcc version is 4.1.2. Below is the output from the compile. Any guidance would be appreciated.
Thanks.
Jon
[root@hpc01 build]# make
[ 0%] Exporting SharedHeaders
[ 1%] Built target SharedHeaders
[ 1%] Exporting APIHeaders
[ 3%] Built target APIHeaders
[ 4%] Building CXX object src/api/CMakeFiles/BamTools.dir/BamAlignment.cpp.o
[ 6%] Building CXX object src/api/CMakeFiles/BamTools.dir/BamIndex.cpp.o
[ 8%] Building CXX object src/api/CMakeFiles/BamTools.dir/BamMultiReader.cpp.o
[ 9%] Building CXX object src/api/CMakeFiles/BamTools.dir/BamReader.cpp.o
[ 11%] Building CXX object src/api/CMakeFiles/BamTools.dir/BamWriter.cpp.o
[ 12%] Building CXX object src/api/CMakeFiles/BamTools.dir/BGZF.cpp.o
[ 14%] Building CXX object src/api/CMakeFiles/BamTools.dir/SamHeader.cpp.o
[ 16%] Building CXX object src/api/CMakeFiles/BamTools.dir/SamReadGroup.cpp.o
[ 17%] Building CXX object src/api/CMakeFiles/BamTools.dir/SamReadGroupDictionary.cpp.o
[ 19%] Building CXX object src/api/CMakeFiles/BamTools.dir/SamSequence.cpp.o
[ 20%] Building CXX object src/api/CMakeFiles/BamTools.dir/SamSequenceDictionary.cpp.o
[ 22%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamHeader_p.cpp.o
[ 24%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamMultiReader_p.cpp.o
[ 25%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamReader_p.cpp.o
[ 27%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamStandardIndex_p.cpp.o
[ 29%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamToolsIndex_p.cpp.o
[ 30%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamWriter_p.cpp.o
[ 32%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/SamFormatParser_p.cpp.o
[ 33%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/SamFormatPrinter_p.cpp.o
[ 35%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/SamHeaderValidator_p.cpp.o
/home/jgladieux/pezmaster31-bamtools-cbb397f/src/api/internal/SamHeaderVersion_p.h: In member function âvoid BamTools::Internal::SamHeaderVersion::SetVersion(const std::string&)â:
/home/jgladieux/pezmaster31-bamtools-cbb397f/src/api/internal/SamHeaderVersion_p.h:87: error: âmajorâ was not declared in this scope
/home/jgladieux/pezmaster31-bamtools-cbb397f/src/api/internal/SamHeaderVersion_p.h:97: error: âminorâ was not declared in this scope
make[2]: *** [src/api/CMakeFiles/BamTools.dir/internal/SamHeaderValidator_p.cpp.o] Error 1
make[1]: *** [src/api/CMakeFiles/BamTools.dir/all] Error 2
make: *** [all] Error 2
Dear Derek:
Although I am not 100% sure, I highly suspect that there is a bug in the split BAM function, that a read can be duplicated multiple times (> 20000) in the output BAM files when the input BAM is split by reference chromosomes (HG19). The duplication of read seems randomly occur with 1 to 5% of chances when many jobs were submitted simultaneously to a compute cluster running SGE. What I did or what might cause the problem are:
Could you please check the codes to see any possibility of a bug resulting in duplicated reads in rare situations?
Many thanks.
Fan Yang
Hello,
I use bamtools to parse reads mapped to de novo assemblies. For large genomes, the number of reference sequences (=contigs) can be very large - often in the millions. In this situation it takes a very very long time to open the bam file. I dug into this and the problem is when reading the header, after parsing the @sq record it is added to a SamSequenceDictionary. The SamSequenceDictionary::Add function checks that the sequence is not already present in the dictionary - this performs a linear search over the dictionary (Contains() function) which means the time to read the header is quadratic in the number of @sq records. This is obviously a serious problem for my use case.
I have temporarily removed the Contains() call in my build to get around this but I would be very happy to see a proper solution.
Seems to be missing a header file in github master:
In BamAlignment::GetEndPosition(bool usePadded, bool zeroBased) where zeroBased is by default true
you have the following code at the end of the function
// adjust for zero-based coordinates, if requested
if ( zeroBased ) alignEnd -= 1;
I would argue that this is incorrect and the zeroBased argument should be removed altogether. My reasoning follows:
he SAM format is providing alignment positions in terms of coordinates in the half-open interval system as apposed to indexed. Programmatically this system is more accurate for representing genomic intervals as it allows for things like insertions to differ from 1-base long reads (insertions have the same start/stop position). It also allows for the length to be computed as the stop - start position.
To transfer a half-open interval to an indexed interval, you add one to the start position. It seems the intention of the zeroBased
parameter was to emulate that by subtracting one from the end position. But consider an alignment that is 50bp long starting as position 75. It's end position should be 75+50 = 125 so you can compute it's length in genomic space as 125-75. It's half-open interval is (75,125] and it's indexed coordinates it's [76,125], but changing the end position doesn't make sense.
This bit us in the butt when drawing reads in genomic space, but I'm concerned because this parameter is set to true it's used for example in computing BAM indexes and could potentially be placing reads in the incorrect bins (if it's end position was an the end of a bin boundary).
I'd like to hear your thoughts on this. We commented the line out of our code to get the desired behavior.
BTW, great project and your code is extreamly hygentic, easy to read and maintain. We are building a genome browser (intend to make it free) and are really pounding on the code with lots of multi-threaded and network oriented use cases. But I'm happy to be able to contribute back the small things we do run into to your great project.
I am trying to get everything that overlaps with position 11 chromosome x out of my bam file.
I use bamtools-2.1.1 filter -script test.json -in bamfile -out filterbam file
the test.json is
{
"reference":"X",
"position":"11"
}
when i run it it appears to work fine when i look at fast format, bed format etc but pileup format gives with position 12?!? how do I fix this?
Thanks
Hi,
Thanks for alerting the users about the BamTools 2.x transition. I would like to make an automated script to download and install the package, however there is no tag or download available for me to freeze the source from.
Could you please make a tag in the repository denoting the "official" 2.0 release?
Thanks,
-ak
I've just tried to perform a build using a checkout of bamtools as of today and got the following error:
Linking CXX shared library ../../../lib/libbamtools.so
/usr/bin/ld: CMakeFiles/BamTools.dir/BamAlignment.cpp.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
CMakeFiles/BamTools.dir/BamAlignment.cpp.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[2]: *** [../lib/libbamtools.so.1.0.0] Error 1
make[1]: *** [src/api/CMakeFiles/BamTools.dir/all] Error 2
make: *** [all] Error 2
A quick check of the error on Google seems to point to an issue with GCC and linking into 64bit libraries. Version of linux is CentOS 5.4 and GCC is 4.1.2 20080704. I'm guessing -fPIC needs to be added to the CMake file but I do not have a clue where or which one
Hi,
I was running bamtools pileup with a fasta input and it was really slow. I tracked this down to fasta module which retrieved the full reference sequence each time one base was required. I mod'd this so it would calculate the offset of the base req'd and then seek and getc to retrieve just one byte. It improved the performance of pileup dramatically when using Human genome.
Kind Regards, Colin
diff --git a/src/utils/bamtools_fasta.cpp b/src/utils/bamtools_fasta.cpp
index d3ad080..f3d27fe 100644
--- a/src/utils/bamtools_fasta.cpp
+++ b/src/utils/bamtools_fasta.cpp
@@ -251,21 +251,25 @@ bool Fasta::FastaPrivate::GetBase(const int& refId, const int& position, char& b
return false;
}
- // seek to beginning of sequence data
- if ( fseeko(Stream, referenceData.Offset, SEEK_SET) != 0 ) {
- cerr << "FASTA error : could not sek in file" << endl;
- return false;
- }
-
- // retrieve sequence
- string sequence = "";
- if ( !GetNextSequence(sequence) ) {
- cerr << "FASTA error : could not retrieve base from FASTA file" << endl;
+ int64_t lines = position / referenceData.LineLength;
+ int64_t lineOffset = position % referenceData.LineLength;
+ int64_t SeekTo = referenceData.Offset + lines * referenceData.ByteLength + lineOffset;
+
+ // seek to required base
+ if ( fseeko(Stream, SeekTo, SEEK_SET) != 0 ) {
+ cerr << "FASTA error : could not seek in file" << endl;
return false;
}
-
- // set base & return success
- base = sequence.at(position);
+ base = getc(Stream);
return true;
}
Hi,
Apologies, as I'm a bit of a novice with anything linux related, but I have installed bamtools and can see the 'bamtools' in my bin and 'libbamtools...' in my lib directories, but when I try and run bamtools I get the message
bamtools: command not found
I'm not sure what I'm missing, any suggestions would be gratefully received!
-- TL;DR --
I think we could improve the experience of linking against bamtools by making #includes between headers within bamtools relative (or some other better fix that probably exists).
Mainly, when I install bamtools, I don't want to modify my environment variables or write special build code to find that library.
If you think it's worth a shot, please let me know and I'll take a crack at writing a patch.
-- Longer version --
I've run into the following problem, and it's becoming a headache...
It's important that I can install bamtools in a central location, but because of the way bamtools installs, I must add explicitly modify my environment variables to include the bamtools include and lib dirs (or I could write some special build code to include the correct paths).
This means I include bamtools in my code like this...
When she links against my headers, she's going to get build errors like,
"....api/BamAlignment.h: No such file or directory"
-- Solution? --
I took a peek at the code. Many of the header files could refer to each other with relative paths. Stripping the "api/" prefix from most of the includes would solve about 90% of the problem. "shared/foobar.h" from api/blah.h might be slightly different.
In my very humble opinion, when I install a library, I shouldn't have to modify my environment or write special build code to find it.
And, although really a minor nit, I think it's cleaner to write...
than
Hopefully this solution wouldn't disrupt your build style too much, and would make life easier for me and future developers. If I'm missing some glaring detail, please feel free to call me a fool :D
Hey Derek,
Just wanted to document this as a potential new feature; not an issue. The ability to read remote (via FTP or HTTP) BAM files would be especially cool with the new BamMultiReader functionality. Asynchronous reading is non-trivial, but perhaps you could port Heng Li's "knet" libraries that are used in samtools, or use curlpp or Boost:asio? Have you thought about this? It would also benefit Gambit, as one could quickly (and temporarily) pull in public datasets for comparisons to experimental data, etc.
Just some random, caffeinated thoughts for the day.
Aaron
Hi,
I have managed to compile the Bamtools projects with the exception of bamtools_cmd. When I try to compile this project I get eight 'error C2491' errors like this one :
error C2491: 'BamTools::FilterEngine::addFilter' : definition of dllimport function not allowed \pezmaster31-bamtools-8b4c010\src\utils\bamtools_filter_engine.h 188 bamtools_cmd
The methods that fail are addFilter, addProperty, allPropertyNames, buildDefaultRuleString, buildRuleQueue, enabledPropertyNames, filterNames and setDefaultCompareType.
I have tried to resolve this problem by adjusting the __declspec(dllimport) , but since I am not very experienced in C++ I have failed. Has anyone encountered this problem or have any ideas on how to resolve it?
Thanks
This is not so much an issue, but the description in the doxygen files for the BamAlignment attribute QueryBases is a little confusing. The description is the "'original' sequence (as reported from sequencing machine)", however it appears to be the aligned sequence, and therefore the reverse compliment of the original sequence if the 0x10 flag is set. I'm not sure if this is specific to BWAs reporting of the sequence or it's true of most alignment algorithms.
I don't see anywhere in the documentation that ftp and http file locations are supported however the following command works fine:
bamtools count -in ftp://encodeftp.cse.ucsc.edu/pipeline/hg19/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
90772
But the following doesn't:
bamtools count -in http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqGm12878R1x75dSplicesRep1V2.bam -region chr8:128746973-128755020
I'm using cygwin under window 7.
Is HTTP access not yet fully functional? Thank you for your help (and the great tool).
I'm having issues with some publicly available encode BAM files. It appears that perhaps the bai file being created by BamTools is broken. Not sure if the BAM file is bad or there is some bug in BamTools. Here are the steps to reproduce this issue:
I downloaded the following BAM file and BAI file:
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai
Performing a count on these files gives reasonable results:
$ bamtools count -in ./og/wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
90772
However if I create a new index file using the latest version of bamtools (May 31, 2012) then peform the same operation teh count returns 0 results:
$ mv wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai.bak
$ bamtools index -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam
$ bamtools count -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -region chr8:128746973-128755020
0
Indeed if I examine the sizes of the origina and new index files they are not the same.
4036792 May 31 11:55 wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai
4038128 May 21 10:15 wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam.bai.bak
If however, I sort and then index the BAM file (again using bamtools) the count will work.
$ bamtools sort -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2.bam -out wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam
$ bamtools index -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam
$ bamtools count -in wgEncodeCaltechRnaSeqK562R1x75dAlignsRep2V2_bt_sorted.bam -region chr8:128746973-128755020
90772
Perhaps the original bam file was sorted incorrectly resulting in an invalid bai file OR sorting the bam file fixes some irregularity with the encode bam file. Either way the BamTools indexing should at least show an error. I hope this helps.
The tools look very useful, I downloaded and hit the following compile problem however.
[ 4%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BamMultiReader_p.cpp.o
/nas02/renci/sequence_analysis/bin/pezmaster31-bamtools-f63f2fa/src/api/internal/BamMultiMerger_p.h: In member function 'virtual void BamTools::Internal::UnsortedMultiMerger::Clear()':
/nas02/renci/sequence_analysis/bin/pezmaster31-bamtools-f63f2fa/src/api/internal/BamMultiMerger_p.h:183: error: 'class std::queue<std::pair<BamTools::BamReader*, BamTools::BamAlignment*>, std::deque<std::pair<BamTools::BamReader*, BamTools::BamAlignment*>, std::allocator<std::pair<BamTools::BamReader*, BamTools::BamAlignment*> > > >' has no member named 'clear'
make[2]: *** [src/api/CMakeFiles/BamTools.dir/internal/BamMultiReader_p.cpp.o] Error 1
make[1]: *** [src/api/CMakeFiles/BamTools.dir/all] Error 2
Hi,
I was going to build the latest bamtools but I get this compile error :
[ 35%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/SamHeaderValidator_p.cpp.o
/tmp/bamtools/src/api/internal/SamHeaderVersion_p.h: In member function 'void BamTools::Internal::SamHeaderVersion::SetVersion(const std::string&)':
/tmp/bamtools/src/api/internal/SamHeaderVersion_p.h:87: error: 'major' was not declared in this scope
/tmp/bamtools/src/api/internal/SamHeaderVersion_p.h:97: error: 'minor' was not declared in this scope
make[2]: *** [src/api/CMakeFiles/BamTools.dir/internal/SamHeaderValidator_p.cpp.o] Error 1
make[1]: *** [src/api/CMakeFiles/BamTools.dir/all] Error 2
make: *** [all] Error 2
I am using Rocks Cluster 5.4 (Centos 5.x) with
gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
g
Marco
Dear Derek,
The reader function IsIndexLoaded() does not tell whether the index file has been loaded or not, only whether the HasIndex parameter is set to true. This is clear form the function implementation:
bool BamReader::IsIndexLoaded(void) const {
return d->HasIndex();
}
In BamReader_p.cpp, I did not see a variable along the lines of bool IsIndexLoaded . Perhaps such a variable was forgotten and the function IsIndexLoaded() should return the status of this variable?
Thank you,
Matthew
Hi,
Apologies,I met some problems in using bamtools to implement random jumps with new index file.
when there is only (*.bti) index file, i input commands as follows:
bamtools random -in sortedByBamtools.bam -region chr1:0-700 -out result1000.bam -n 1000
bamtools stopped as:
bamtools random ERROR: could not load index data for all input BAM file(s)... Aborting.
thanks for telling me how to fix it.
Adding support to create bam from sam would allow the user to completely replace samtools with bamtools.
Hi,
It's my first time user bamtools so this may be a mistake on my end but I thought it'd be a good idea to post the error I am getting here.
I am trying to filter a bam file to only the genomic regions that map to chr19 but I get an error:
[node1380]35> bamtools filter -in Alignment_Post_Processing_1335.bam -out test.bam -region chr19
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
Aborted (core dumped)
I am not sure if I am doing something wrong, I have the wrong version of bamtools or the installation wasn't successful. I am using a cloud server that I don't control and they already provided this version of bamtools installed...
Here I send info about the version I am using:
[node1380]35> bamtools --version
bamtools 1.0.2
Part of BamTools API and toolkit
Primary authors: Derek Barnett, Erik Garrison, Michael Stromberg
(c) 2009-2011 Marth Lab, Biology Dept., Boston College
Thank you for bamtools - it's awesome! Hopefully this is the right place to post this - I'm assuming issues includes potential bugs, right?
I stumbled on a minor bug when trying to merge multiple files that were sorted by queryname. It took me awhile to track what was happening, but it appears that when the constructor for the 'ByName' struct in Sort.h is called, the default constructor is not used. In other words, no default sort order is established, so when the comparator for sorting is called, it always returns false ensuring that files merged by queryname are no longer sorted. I don't think this is a compiler specific issue, but just in case, I'm using g++ (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13). If you want I can send you some test bam files which produces the error. I would attach them now, but I don't see any easy way to do so.
I found that if I changed the m_order variable from a reference to just a standard Sort::Order object, that fixed it. The old / new code is shown below, along with the specific change:
Old Code in src/api/algorithms/Sort.h
struct ByName : public AlignmentSortBase {
// ctor
ByName(const Sort::Order& order = Sort::AscendingOrder)
: m_order(order)
{ }
// comparison function
bool operator()(const BamTools::BamAlignment& lhs, const BamTools::BamAlignment& rhs) {
return sort_helper(m_order, lhs.Name, rhs.Name);
}
// used by BamMultiReader internals
static inline bool UsesCharData(void) { return true; }
// data members
private:
const Sort::Order& m_order;
};
New Code:
struct ByName : public AlignmentSortBase {
// ctor
ByName(const Sort::Order& order = Sort::AscendingOrder)
: m_order(order)
{ }
// comparison function
bool operator()(const BamTools::BamAlignment& lhs, const BamTools::BamAlignment& rhs) {
return sort_helper(m_order, lhs.Name, rhs.Name);
}
// used by BamMultiReader internals
static inline bool UsesCharData(void) { return true; }
// data members
private:
const Sort::Order m_order; // CHANGED - just removed the '&' and it works
};
bamtools v0.8.xx on X86_64 Linux
I was sanity-checking all the BAM implementations against each other (Picard 1.32, samtools 0.1.8, cl-sam 0.9.5) and noticed that only bamtools was printing 0 for every ISIZE from a read-name sorted BAM file (created by samtools):
Picard 1.32
ENSDART00000000004_1016_1202_4f 99 9 34433277 99 35M * 0 187 TACCACAGGCGCCTGCATGAAGAAATCCATCGATG <<<<<<<<;<<<<<<<<<:1<<<<<<<<<<<<;<5 H0:i:1 H1:i:0 MF:i:18 AM:i:78 NM:i:0 SM:i:78 UQ:i:0
ENSDART00000000004_1016_1202_4f 147 9 34433429 99 35M * 0 -187 GAGCGGCTGTGACGAAACCCAGTGCGACAGAGAAG 8<;<<<;8<<;<<<<<7<<4<:<<<<<<<<<<<<< H0:i:1 H1:i:0 MF:i:18 AM:i:78 NM:i:0 SM:i:78 UQ:i:0
samtools 0.1.8:
ENSDART00000000004_1016_1202_4f 99 9 34433277 99 35M * 0 187 TACCACAGGCGCCTGCATGAAGAAATCCATCGATG <<<<<<<<;<<<<<<<<<:1<<<<<<<<<<<<;<5 MF:i:18 AM:i:78 SM:i:78 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
ENSDART00000000004_1016_1202_4f 147 9 34433429 99 35M * 0 -187 GAGCGGCTGTGACGAAACCCAGTGCGACAGAGAAG 8<;<<<;8<<;<<<<<7<<4<:<<<<<<<<<<<<< MF:i:18 AM:i:78 SM:i:78 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
cl-sam 0.9.5:
ENSDART00000000004_1016_1202_4f 99 9 34433277 99 35M * 0 187 TACCACAGGCGCCTGCATGAAGAAATCCATCGATG <<<<<<<<;<<<<<<<<<:1<<<<<<<<<<<<;<5 MF:i:18 AM:i:78 SM:i:78 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
ENSDART00000000004_1016_1202_4f 147 9 34433429 99 35M * 0 -187 GAGCGGCTGTGACGAAACCCAGTGCGACAGAGAAG 8<;<<<;8<<;<<<<<7<<4<:<<<<<<<<<<<<< MF:i:18 AM:i:78 SM:i:78 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
bamtools v0.8.xx:
ENSDART00000000004_1016_1202_4f 99 9 34433277 99 35M * 0 0 TACCACAGGCGCCTGCATGAAGAAATCCATCGATG <<<<<<<<;<<<<<<<<<:1<<<<<<<<<<<<;<5 MF:i:18 AM:i:78 SM:i:78 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
ENSDART00000000004_1016_1202_4f 147 9 34433429 99 35M * 0 0 GAGCGGCTGTGACGAAACCCAGTGCGACAGAGAAG 8<;<<<;8<<;<<<<<7<<4<:<<<<<<<<<<<<< MF:i:18 AM:i:78 SM:i:78 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
As you can see, bamtools prints 0 where ISIZE is either 187 or -187. It does this for all records in the file.
Not an "Issue"....
i modified the pileup engine to be a "multiple bam" pileup engine (uses a priority queue, it's cute), but then i ran into an issue where i wanted to get the "quality score" for each base in the pileup, and the PositionInAlignment is good for getting the AlignedBase ... but not the QualityString.
Essentially, I'm creating a "PositionInQuery" class member.
I think all I have to do is Subtract any "Deletions" from the "PositionInAlignment" as I pass them to get PositionInQuery... but I'm wondering if I'm working too hard for nothing.
Is that all?
Hi Derek,
I've been playing around with the pileup option in bamtools convert. In testing it with a pos-sorted BAM file from the 1000 Genomes project, I found that it exits with a std::out_of_range() error. I tested it with:
Below is the log. Any ideas?
bamtools convert -format pileup -in ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00739/alignment/HG00739.chrom20.ILLUMINA.bwa.PUR.low_coverage.20111114.bam
20 59993 N 6 ^2G^>G^>G^>G^>G^>G JG>G=I
20 59994 N 6 TTTTTT EC4@BH
20 59995 N 6 GGGGGG JGCIDJ
20 59996 N 6 AAAAAA IEDAAJ
20 59997 N 6 CCCCCC HFDB>H
20 59998 N 6 TTTTTT JIHDHK
20 59999 N 6 CCCCCC IGCFHK
20 60000 N 6 AAAAAA I8EJHI
20 60001 N 6 GGGGGG JGDJEI
20 60002 N 6 AAAAAA HGAIDJ
20 60003 N 6 TTTTTT HIAHHH
20 60004 N 6 CCCCCC JHEIDK
20 60005 N 6 CCCCCC JEEKBK
20 60006 N 6 AAAAAA GGFECJ
20 60007 N 6 GGGGGG LHFLEL
20 60008 N 6 AAAAAA :HFGCJ
20 60009 N 6 GGGGGG GFHEHL
20 60010 N 6 GGGGGG GGGFIJ
20 60011 N 6 TTGTTT <B%;=F
20 60012 N 6 GGGGGG F2EFHJ
20 60013 N 6 GGGGGG HFC;>K
20 60014 N 6 AAAAAA GA7A2I
20 60015 N 7 AAAAAA^]A FECA5L@
20 60016 N 7 GGGGGGG FDEGALA
20 60017 N 7 AAAAAAA E7>C;IC
20 60018 N 7 GGGGGGG GBBFAJG
20 60019 N 7 GGGGGGG JJAFHKG
20 60020 N 8 AAAAAAA^]A EH>F@HD<
20 60021 N 8 AAAAAAAA AFH?6JHC
20 60022 N 8 GGGGGGGG IIHECJHD
20 60023 N 9 GGGGGGGG^]G JFHE?HHD<
20 60024 N 9 AAAA$AAAAA ?B;EGGCCB
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::at
Abort trap
I'm not sure if this is the way you want to do this, or if you would prefer I directly modify the BamAlignment object to do this. The BAM format specification does allow for an alignment with sequence information to have no quality score information - this is indicated by an * in SAM format, and in BAM format is indicated by a char array with length equal to the sequence filled with hex 0xFF . If I try to update the quality string of a BamAlignment object by setting it to an * and use the BamWriter to write the BamAlignment object, however, the program crashes. Listed below is an update to fix this so that the BAM writer checks for * strings, and writes the appropriate information to the Bam file. Tell me if this is good, or if you feel a different solution is better. I just thought this might be simpler than requiring the user to update the quality string appropriately every time they want to do this, especially since an * is compatible for writing to a BAM file (after the necessary conversion of course). Cheers!
Old Code in 'src/api/internal/bam/BamWriter_p.cpp':
char* pBaseQualities = (char*)al.Qualities.data();
for ( size_t i = 0; i < queryLength; ++i )
pBaseQualities[i] -= 33; // FASTQ conversion
m_stream.Write(pBaseQualities, queryLength);
New Code:
char* pBaseQualities;
if (!al.Qualities.compare("*")) { // Qualitiy scores are not stored...
pBaseQualities = (char *)malloc(queryLength);
for ( size_t i =0; i < queryLength; ++i )
pBaseQualities[i] = 0xFF;
m_stream.Write(pBaseQualities, queryLength);
free(pBaseQualities);
} else {
pBaseQualities = (char*)al.Qualities.data();
for ( size_t i = 0; i < queryLength; ++i )
pBaseQualities[i] -= 33; // FASTQ conversion
m_stream.Write(pBaseQualities, queryLength);
}
Hi, dear Derek:
I am using bamtools utility programs to merge multiple files and then sort them by coordinates. I can do it in two separated steps without any problem. However, if I put them in one step as:
bamtools merge -in in_file1 -in in_file2 | bamtools sort -out out_file,
I got many error messages as:
BgzfStream ERROR: unable to open file .sort.temp.0
BamReader ERROR: Could not open BGZF stream for .sort.temp.0
BamMultiReader WARNING: Could not open .sort.temp.0, ignoring file
BgzfStream ERROR: unable to open file .sort.temp.1
BamReader ERROR: Could not open BGZF stream for .sort.temp.1
BamMultiReader WARNING: Could not open .sort.temp.1, ignoring file
....
BgzfStream ERROR: read block failed - invalid block header
BgzfStream ERROR: read block failed - invalid block header
BgzfStream ERROR: read block failed - invalid block header
BgzfStream ERROR: read block failed - invalid block header
BgzfStream ERROR: could not decompress block - zlib::inflate() failed
BgzfStream ERROR: read block failed - could not decompress block data
BgzfStream ERROR: could not decompress block - zlib::inflate() failed
BgzfStream ERROR: read block failed - could not decompress block data
Could you please check it?
Thanks.
Fan
Hi all,
It looks like the RefID integer in the BamAlignment class is always populated as = 0 by the BamReader GetNextAlignment function. This makes it pretty hard to use to the Jump function in BamReader. I'll try to track down why this output is given, but thought it would be good to confirm others have the same issue.
Thanks,
DSH
Hi,
is there a way to retrieve all unaligned sequences using the index? It should be simple to just jump beyond the last index alignment with a coordinate.
Or am I missing something and this is already available in bamtools?
Thanks!
I am getting this error:
[ 4%] Building CXX object src/api/CMakeFiles/BamTools.dir/internal/BgzfStream_p.cpp.o
/project/youngn/zhoup/Source/bamtools/src/api/internal/BgzfStream_p.cpp:20:18: error: zlib.h: No such file or directory
What can I do to tell cmake to look for zlib.h in my custom library (e.g., /usr/local/include) ? thanks a lot!
Current installation procedure installs libraries (in particular shared objects) under /usr/local/lib/bamtools. The installer doesn't update the linker information (i.e. ldconfig under Linux or install_name_tool on OS X). The easiest solution is to install the library in standard path. See the patch attached:
$ git diff
diff --git a/src/api/CMakeLists.txt b/src/api/CMakeLists.txt
index 1af7a7f..77a619c 100644
--- a/src/api/CMakeLists.txt
+++ b/src/api/CMakeLists.txt
@@ -54,8 +54,8 @@ target_link_libraries( BamTools ${APILibs} )
target_link_libraries( BamTools-static ${APILibs} )
-install( TARGETS BamTools LIBRARY DESTINATION "lib/bamtools" RUNTIME DESTINATION "bin")
-install( TARGETS BamTools-static ARCHIVE DESTINATION "lib/bamtools")
+install( TARGETS BamTools LIBRARY DESTINATION "lib" RUNTIME DESTINATION "bin")
+install( TARGETS BamTools-static ARCHIVE DESTINATION "lib")
include(../ExportHeader.cmake)
I noticed that pileup became very slow when it gets into high copy number regions.
I tracked this down to ClearOldData() and the use of the erase method to remove alignments.
The modified routine below does away with erase and instead packs the elements up and then resizes the vector. This has improved performance dramatically but may still not be the optimum solution. Replacing the vector by a linked list collection might be even faster as you could then use erase() and the alignments wouldn't need to be repacked but I didn't have time to check other repercussions of this and so didn't try it.
The modified routine is:
void PileupEngine::PileupEnginePrivate::ClearOldData(void) {
// remove any data that ends before CurrentPosition
size_t i = 0, j = 0;
while ( i < CurrentAlignments.size() ) {
// remove alignment if it ends before CurrentPosition
const int endPosition = CurrentAlignments[i].GetEndPosition();
if ( endPosition <= CurrentPosition ) {
i++;
continue;
}
if(i != j)
CurrentAlignments[j] = CurrentAlignments[i];
i++;
j++;
}
CurrentAlignments.resize(j);
}
Kind Regards, Colin
Hi,
Apologies, I'm a bit of novice with bamtools, i have installed bamtools ,can use it to sort or convert .However,when i want to use bamtools to index my bam files ,i met a problem as follows:
bamtools index -in sorted.bam
terminate called after throwing an instance of 'std::runtime_error'
what(): ILocalIODevice::Write: device not in write-only mode
It confused me .
Thanks
Comparing samtools and bamtools, I notice that the Positions in BamTools are exactly 1 smaller than in samtools.
e.g:
//samtools command line:
samtools view my_data.bam 1:100001000-100002000
//result: ---> SRR011040.2343305 35 1 100001014 99 76M = 100001137
//BamTools C++ implementation
//...
my_BAM_reader.SetRegion(0, 100001000, 0, 100002000);
for (int j=0; j< 10; ++j)
{
my_BAM_reader.GetNextAlignment(Balgmt);
}
// results:
Name: SRR011040.2343305
...
Position = 100001013 // this position was 100001014 in samtools above
MatePosition = 100001136 // this position was 100001137 in samtools above
Compiles fine on a linux box, but BamReader.cpp and BamWriter.cpp fail compilation with "'...BamAlignmentSupportData...' is private within this context" errors.
Details:
% git clone http://github.com/pezmaster31/bamtools.git
% cd bamtools
% make
Building BamTools:
mkdir -p bin
mkdir -p obj
% g++ -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5493)
On the linux box:
% g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
I was able to work-around this issue by adding '-fno-access-control' to the CXXFLAGS in the Makefile.
Hi,
I think I've found a problem writing bam files in a BigEndian architecture. The code in BamWriter_p.cpp (WriteSamHeaderText) looks as follows:
uint32_t samHeaderLen = samHeaderText.size();
if ( m_isBigEndian ) BamTools::SwapEndian_32(samHeaderLen);
m_stream.Write((char*)&samHeaderLen, Constants::BAM_SIZEOF_INT);
// write the SAM header text
if ( samHeaderLen > 0 )
m_stream.Write(samHeaderText.data(), samHeaderLen);
Notice that when "m_stream.Write" is called, "samHeaderLen" has already been swapped, and is not the right value anymore in a big endian machine. It leads to segmentation faults. It would be better to use "samHeaderText.size()" instead.
Something similar happens in WriteReferences with referenceSequenceNameLen. Maybe in other places also... I'm looking into it.
Thanks,
-David
I would really like it if the API could take a FILE * ... in case I'm reading from popen... , or if i just processed some magic etc. I have a tool that reads sam or bam and either can be gzipped... it's hard to do with the current interface.
Code here:
http://code.google.com/p/ea-utils/source/browse/trunk/clipper/sam-stats.cpp
I see that the toolkit supports piping and the API does appear to support it, but there isn't much documentation on it. Is this just a beta feature at this point?
Thanks,
DSH
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.