samtools / tabix
Note: tabix and bgzip binaries are now part of the HTSlib project.
Home Page: https://github.com/samtools/htslib
I would like to index a file in WTCCC haps format so that I can pull out regions of interest. It strikes me that bgzip and tabix would work on this if the file were tab-delimited instead of space-delimited. Before I go off and replace all the spaces with tabs, I was wondering how hard it would be to implement a run-time or even compile-time option for bgzip and tabix that allows delimiters other than tab.
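For the fallback mentioned above (replacing the spaces with tabs), a minimal Python sketch follows. It assumes every run of spaces is a field separator and that no field contains a space; the sample row is made up for illustration and is not real WTCCC data. The converted file could then be compressed with bgzip and indexed with tabix, using the -s, -b and -e options to name the sequence and position columns, which depend on the file.

```python
import io


def spaces_to_tabs(src, dst):
    """Copy lines from src to dst, joining whitespace-separated fields with tabs."""
    for line in src:
        fields = line.rstrip("\n").split()
        dst.write("\t".join(fields) + "\n")


# In-memory demonstration; real use would pass open file handles instead.
# The row below is a hypothetical example, not taken from a real haps file.
sample = io.StringIO("1 rs123 1000 A G 0 1\n")
out = io.StringIO()
spaces_to_tabs(sample, out)
print(out.getvalue(), end="")
```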
The documentation at http://www.htslib.org/doc/tabix.html indicates that the file should be position sorted:
The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface.
However, in many usages I see that the files are in fact sorted first by seqname and THEN by position. The tabix paper also seems to indicate this (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042176/):
Before being indexed, the data file needs to be sorted first by sequence name and then by leftmost coordinate
So does the documentation need to be updated, or has tabix since been updated to allow the seqnames to be out of order?
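One reading that reconciles the two statements is that rows must be grouped by sequence name and position sorted within each group. The Python sketch below checks exactly that and nothing more. Whether tabix also needs the sequence names themselves in a particular order is not settled by the text quoted here, so the function name and the grouping-only rule are assumptions made for illustration.

```python
def is_grouped_and_position_sorted(records):
    """Return True if records are grouped by sequence name and have
    non-decreasing positions inside each group.

    records is an iterable of (seqname, position) pairs in file order.
    """
    seen = set()
    current = None
    last_pos = None
    for name, pos in records:
        if name != current:
            if name in seen:
                return False  # this sequence name reappears after another one
            seen.add(name)
            current, last_pos = name, pos
        elif pos < last_pos:
            return False  # position went backwards within one sequence
        else:
            last_pos = pos
    return True


print(is_grouped_and_position_sorted([("chr1", 1), ("chr1", 7), ("chr2", 3)]))
print(is_grouped_and_position_sorted([("chr1", 1), ("chr2", 3), ("chr1", 7)]))
```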
We are getting the error below on CentOS 6.9 with zlib-devel 1.2.3. I noticed a similar issue in samtools/samtools#493, but it is not clear how to resolve it. Adding the -H flag doesn't produce any extra output. We did not have this issue when compiling samtools 1.9.
[root@pac tabix]# make
make[1]: Entering directory `/usr/local/tabix'
gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE bgzf.c -o bgzf.o
gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE kstring.c -o kstring.o
gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE knetfile.c -o knetfile.o
gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE index.c -o index.o
gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE bedidx.c -o bedidx.o
ar -csru libtabix.a bgzf.o kstring.o knetfile.o index.o bedidx.o
gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE main.c -o main.o
gcc -g -Wall -O2 -fPIC -o tabix main.o -L. -ltabix -lm -lz
./libtabix.a(bedidx.o): In function `bed_read':
/usr/local/tabix/bedidx.c:103: undefined reference to `gzopen64'
collect2: ld returned 1 exit status
make[1]: *** [tabix] Error 1
make[1]: Leaving directory `/usr/local/tabix'
make: *** [all-recur] Error 1
[root@pac tabix]# gcc -g -Wall -O2 -fPIC -o tabix main.o -L. -ltabix -lm -lz -H
./libtabix.a(bedidx.o): In function `bed_read':
/usr/local/tabix/bedidx.c:103: undefined reference to `gzopen64'
collect2: ld returned 1 exit status
[root@pac tabix]#
Hi,
Say I have a BED file, file.bed, and I want to retrieve multiple regions.
file.bed:
chr1 10468 10469
chr1 10470 10471
chr1 10483 10484
chr1 10488 10489
chr1 10492 10493
chr1 10496 10497
Running tabix file.bed chr1:10468-10490 chr1:10478-10490 chr1:10496-10497 gives me all the rows in file.bed that match any of the regions:
chr1 10468 10469
chr1 10470 10471
chr1 10483 10484
chr1 10488 10489
chr1 10483 10484
chr1 10488 10489
chr1 10496 10497
Could there be an option to separate the results, i.e. to indicate which rows came from which region? Something like:
chr1 10468 10469
chr1 10470 10471
chr1 10483 10484
chr1 10488 10489
chr1 10483 10484
chr1 10488 10489
chr1 10496 10497
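A workaround that needs no new option is to handle one region at a time and label each batch. The Python sketch below does the overlap test over in-memory rows as a stand-in for one tabix call per region. The "#region" header line is a made-up label format, not tabix output, and the overlap rule (BED rows 0-based half-open, regions 1-based inclusive) is an assumption, although it reproduces the row groups listed above.

```python
def parse_region(region):
    """Split 'chr:beg-end' (1-based, inclusive) into (chrom, beg, end)."""
    chrom, span = region.rsplit(":", 1)
    beg, end = span.split("-")
    return chrom, int(beg), int(end)


def rows_by_region(rows, regions):
    """Yield (region, matching_rows) for each region, in the order given.

    rows are (chrom, start, end) BED tuples: 0-based, half-open.
    """
    for region in regions:
        chrom, beg, end = parse_region(region)
        # Region [beg, end] in 1-based terms is [beg - 1, end) in BED terms.
        hits = [r for r in rows
                if r[0] == chrom and r[1] < end and r[2] > beg - 1]
        yield region, hits


rows = [
    ("chr1", 10468, 10469), ("chr1", 10470, 10471), ("chr1", 10483, 10484),
    ("chr1", 10488, 10489), ("chr1", 10492, 10493), ("chr1", 10496, 10497),
]
regions = ["chr1:10468-10490", "chr1:10478-10490", "chr1:10496-10497"]

for region, hits in rows_by_region(rows, regions):
    print("#" + region)  # hypothetical separator line naming the region
    for chrom, start, end in hits:
        print(chrom, start, end, sep="\t")
```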
I am trying to create a custom track for the WashU EpiGenome browser (instructions here: http://washugb.blogspot.com/2012/09/generate-tabix-files-from-bigwig-files.html), and I am using the bedGraph example file posted on the UCSC page: http://genome.ucsc.edu/goldenPath/help/bedgraph.html
The file looks like this:
browser position chr19:49302001-49304701
browser hide all
browser pack refGene encodeRegions
browser full altGraph
# 300 base wide bar graph, autoScale is on by default == graphing
# limits will dynamically change to always show full range of data
# in viewing window, priority = 20 positions this as the second graph
# Note, zero-relative, half-open coordinate system in use for bedGraph format
track type=bedGraph name="BedGraph Format" description="BedGraph format" visibility=full color=200,100,0 altColor=0,100,200 priority=20
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
chr19 49302900 49303200 -0.25
chr19 49303200 49303500 0.0
chr19 49303500 49303800 0.25
chr19 49303800 49304100 0.50
chr19 49304100 49304400 0.75
chr19 49304400 49304700 1.00
I run bgzip first:
bgzip input.bedgraph
and then I run tabix:
tabix -p bed input.bedgraph.gz
at which point I get these errors:
[get_intv] the following line cannot be parsed and skipped: browser position chr19:49302001-49304701
[ti_index_core] the indexes overlap or are out of bounds
If bedgraph is not the file format tabix expects, what is the input file format?
Thanks!
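The first message names a "browser" line as the one that could not be parsed, which suggests the UCSC display directives at the top of the file are the problem rather than the data rows. A minimal Python sketch of removing them before running bgzip follows. It assumes that dropping the "browser", "track" and "#" lines is acceptable for the target browser, and the function name is made up for illustration.

```python
def strip_track_headers(lines):
    """Yield only data lines, dropping 'browser', 'track', '#' and blank lines."""
    for line in lines:
        if not line.strip():
            continue
        if line.startswith(("browser ", "track ", "#")):
            continue
        yield line


example = [
    "browser position chr19:49302001-49304701\n",
    "# 300 base wide bar graph\n",
    "track type=bedGraph name=\"BedGraph Format\"\n",
    "chr19\t49302000\t49302300\t-1.0\n",
    "chr19\t49302300\t49302600\t-0.75\n",
]
for line in strip_track_headers(example):
    print(line, end="")
```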
Hi,
There seems to be some issue with the XS bindings. I've found this to be a problem on Perl 5.16 and 5.18; it is okay on 5.14, and I didn't have access to 5.15 to try it.
% perl-5.16.3 -I blib/lib -e 'use Data::Dumper;use Tabix; warn Dumper(\%INC); my $tbi = Tabix->new(-data => q{some.vcf.gz}); print qq{Generated Tabix object\n};'
Subroutine Tabix::tabix_open redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_close redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_query redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_read redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_getnames redefined at blib/lib/Tabix.pm line 17.
Subroutine TabixIterator::tabix_iter_free redefined at blib/lib/Tabix.pm line 17.
$VAR1 = {
...
'Tabix.pm' => 'blib/lib/Tabix.pm',
'TabixIterator.pm' => 'blib/lib/TabixIterator.pm',
...
};
Generated Tabix object
Regards,
Keiran
Hi,
Is there a way to figure out which version of tabix was used to index a vcf.gz file?
Regards,
Prasun
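As far as the published .tbi layout goes, the index header has no field for the program version, so the index alone may not answer this. The Python sketch below decodes the header fields that are there. The field names and order are taken from the tabix format description as I understand it and should be checked against the current specification; the function name is made up for illustration.

```python
import gzip
import struct


def read_tbi_header(raw):
    """Decode the fixed-size header of a .tbi index from its raw (compressed) bytes."""
    data = gzip.decompress(raw)  # BGZF blocks are ordinary gzip members
    if data[:4] != b"TBI\x01":
        raise ValueError("not a tabix index")
    names = ("n_ref", "format", "col_seq", "col_beg",
             "col_end", "meta", "skip", "l_nm")
    values = struct.unpack_from("<8i", data, 4)  # eight little-endian int32
    return dict(zip(names, values))


# Typical use, with a hypothetical file name:
#     with open("some.vcf.gz.tbi", "rb") as fh:
#         print(read_tbi_header(fh.read()))
```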
samtools v1.10 introduced the new -X option (include customized index file):
samtools/samtools#978
Is it possible to request the same option for tabix, bcftools, etc.?
The use case would be passing signed S3 URLs to all of the various tools.
e.g.
tabix <signed_vcf_url> -X <signed_vcf_tbi_url> chr2
bcftools view <signed_vcf_url> -X <signed_vcf_tbi_url> chr2
samtools view <signed_bam_url> -X <signed_bam_bai_url> chr2
What does the following error mean?
The input is a merged and sorted GFF3 file containing SPAdes scaffold names.
[get_intv] the following line cannot be parsed and skipped: >NODE_1000_length_470_cov_0.860058
[ti_index_core] the indexes overlap or are out of bounds
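The rejected line starts with ">", which looks like a FASTA header. GFF3 files may carry sequences after a ##FASTA directive, and those lines are not tab-delimited feature rows. A minimal Python sketch that keeps only the feature part follows, on the assumption that this is what happened here. If sorting has mixed sequence lines in among the features, cutting at the first such line would not be enough.

```python
def drop_fasta_section(lines):
    """Yield GFF3 lines up to, but not including, an embedded FASTA section."""
    for line in lines:
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # everything from here on is sequence data
        yield line


# Hypothetical input: one feature row followed by an embedded sequence.
example = [
    "##gff-version 3\n",
    "NODE_1\tprokka\tgene\t1\t10\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">NODE_1\n",
    "ACGT\n",
]
for line in drop_fasta_section(example):
    print(line, end="")
```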
Hi,
I looked at the code and saw that tabix supports fetching files over http and ftp, but there is no https support. Is something like this on the roadmap, or is it completely out of scope?
Best,
Viktor
It's confusing for people who aren't intimately familiar with the project. For example: is the code here packaged for download via SF? Is the SF project now obsolete? Why are there separate tabix / bgzip SF projects?
Cheers,
Dan.
The htslib branch does not handle ftp:// or http:// file locations.
E.g.
./tabix -h ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz 20
Could not load .tbi index of ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz