samtools / tabix

Note: tabix and bgzip binaries are now part of the HTSlib project.

Home Page: https://github.com/samtools/htslib

tabix's Issues

feature request: tabix with space delimiter

I would like to index a file in WTCCC haps format so that I can pull out regions of interest. It strikes me that bgzip and tabix would work on this if the file were tab-delimited instead of space-delimited. Before I go off and replace all the spaces with tabs, I was wondering how hard it would be to implement a run-time or even compile-time option for bgzip and tabix that allows delimiters other than tab.
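
In the meantime, the space-to-tab conversion mentioned above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes no field in the haps file contains embedded whitespace, and the function names are made up for illustration.

```python
import io

def spaces_to_tabs(line: str) -> str:
    """Join the whitespace-separated fields of one line with tabs."""
    body = line.rstrip("\n")
    newline = line[len(body):]          # keep the original line ending, if any
    return "\t".join(body.split()) + newline

def convert(src, dst) -> None:
    """Copy src to dst line by line, rewriting the delimiter."""
    for line in src:
        dst.write(spaces_to_tabs(line))

# Hypothetical haps-like record, only to show the effect.
out = io.StringIO()
convert(io.StringIO("1 rs123 1000 A G 0 1 1 0\n"), out)
print(repr(out.getvalue()))
```

The converted file would then still need to go through bgzip before tabix can index it.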

What is the actual sorting required for tabix?

On http://www.htslib.org/doc/tabix.html it is indicated that the file should be position sorted.

The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface.

However, in many uses I see that the files are in fact sorted first by seqname and THEN by position. The tabix paper also seems to indicate this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042176/

Before being indexed, the data file needs to be sorted first by sequence name
and then by leftmost coordinate

So does the documentation need to be updated, or has tabix since been changed to allow the seqnames to be out of order?
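
To make the two readings concrete, here is a minimal Python sketch of an ordering check. It tests the relaxed reading: all records for one seqname sit in a single contiguous block, and start positions never decrease within a block, while the blocks themselves may come in any order. Whether tabix accepts that relaxed ordering is an assumption here; neither the quoted documentation nor the paper confirms it. The function name is hypothetical.

```python
def check_order(records):
    """records: iterable of (seqname, start). Return None if ordered, else a message."""
    finished = set()        # seqnames whose block has already ended
    current = None
    last_start = None
    for n, (seq, start) in enumerate(records, 1):
        if seq != current:
            if seq in finished:
                return f"record {n}: block for {seq} is not contiguous"
            if current is not None:
                finished.add(current)
            current, last_start = seq, start
        elif start < last_start:
            return f"record {n}: {seq}:{start} comes after {seq}:{last_start}"
        else:
            last_start = start
    return None

print(check_order([("chr2", 5), ("chr2", 9), ("chr1", 1)]))
print(check_order([("chr1", 1), ("chr2", 1), ("chr1", 2)]))
```

A file that passes this check but is not sorted by seqname would be the interesting test case for the question above.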

undefined reference to `gzopen64'

We are getting this error on CentOS 6.9 with zlib-devel 1.2.3. I noticed a similar issue at samtools/samtools#493, but it is not clear how to resolve it. Adding the -H flag doesn't produce any extra output. We did not have this issue when compiling samtools 1.9.

[root@pac tabix]# make
make[1]: Entering directory `/usr/local/tabix'
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE  bgzf.c -o bgzf.o
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE  kstring.c -o kstring.o
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE  knetfile.c -o knetfile.o
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE  index.c -o index.o
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE  bedidx.c -o bedidx.o
ar -csru libtabix.a bgzf.o kstring.o knetfile.o index.o bedidx.o
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -DBGZF_CACHE  main.c -o main.o
gcc -g -Wall -O2 -fPIC  -o tabix main.o -L. -ltabix -lm  -lz
./libtabix.a(bedidx.o): In function `bed_read':
/usr/local/tabix/bedidx.c:103: undefined reference to `gzopen64'
collect2: ld returned 1 exit status
make[1]: *** [tabix] Error 1
make[1]: Leaving directory `/usr/local/tabix'
make: *** [all-recur] Error 1

[root@pac tabix]# gcc -g -Wall -O2 -fPIC  -o tabix main.o -L. -ltabix -lm  -lz -H
./libtabix.a(bedidx.o): In function `bed_read':
/usr/local/tabix/bedidx.c:103: undefined reference to `gzopen64'
collect2: ld returned 1 exit status

[root@pac tabix]#

tabix multiple regions with a delimiter between output regions

Hi,

Say I have a bed file, file.bed, and I want to retrieve multiple regions.
file.bed:

chr1    10468   10469   
chr1    10470   10471   
chr1    10483   10484   
chr1    10488   10489   
chr1    10492   10493   
chr1    10496   10497

Running tabix file.bed chr1:10468-10490 chr1:10478-10490 chr1:10496-10497 gives me all the rows in file.bed that match any of the regions:

chr1    10468   10469  
chr1    10470   10471  
chr1    10483   10484  
chr1    10488   10489  
chr1    10483   10484  
chr1    10488   10489 
chr1    10496   10497

Can I have an option to separate the results, i.e. to show which rows came from which region? Something like:

chr1    10468   10469  
chr1    10470   10471  
chr1    10483   10484  
chr1    10488   10489
  
chr1    10483   10484  
chr1    10488   10489 

chr1    10496   10497
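
As a sketch of the requested behaviour, the grouping can be emulated by handling one region at a time and joining the per-region blocks with a blank line. The Python below filters an in-memory copy of the rows rather than calling tabix, so it only illustrates the idea; the function names are hypothetical, and the overlap rule (BED rows 0-based and half-open, regions 1-based and inclusive) is an assumption chosen because it reproduces the rows listed above.

```python
def parse_region(region):
    """Split 'chr1:10468-10490' into (name, begin, end), 1-based inclusive."""
    name, span = region.rsplit(":", 1)
    begin, end = span.split("-")
    return name, int(begin), int(end)

def rows_in_region(rows, region):
    """Yield BED rows (chrom, start, stop) that overlap the region."""
    name, begin, end = parse_region(region)
    for chrom, start, stop in rows:
        if chrom == name and start < end and stop >= begin:
            yield chrom, start, stop

def grouped_output(rows, regions):
    """One block of tab-separated rows per region, blocks separated by a blank line."""
    blocks = []
    for region in regions:
        lines = ["\t".join(map(str, r)) for r in rows_in_region(rows, region)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

rows = [("chr1", 10468, 10469), ("chr1", 10470, 10471), ("chr1", 10483, 10484),
        ("chr1", 10488, 10489), ("chr1", 10492, 10493), ("chr1", 10496, 10497)]
regions = ["chr1:10468-10490", "chr1:10478-10490", "chr1:10496-10497"]
print(grouped_output(rows, regions))
```

With the real tool, the same effect could presumably be had by running one query per region and printing a separator between the calls.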

Tabix error using UCSC bedgraph example

I am trying to create a custom track for the WashU EpiGenome browser (instructions here: http://washugb.blogspot.com/2012/09/generate-tabix-files-from-bigwig-files.html), so I am using a bedgraph file example posted at UCSC page: http://genome.ucsc.edu/goldenPath/help/bedgraph.html

The file looks like this:

browser position chr19:49302001-49304701
browser hide all
browser pack refGene encodeRegions
browser full altGraph
#   300 base wide bar graph, autoScale is on by default == graphing
#   limits will dynamically change to always show full range of data
#   in viewing window, priority = 20 positions this as the second graph
#   Note, zero-relative, half-open coordinate system in use for bedGraph format
track type=bedGraph name="BedGraph Format" description="BedGraph format" visibility=full color=200,100,0 altColor=0,100,200 priority=20
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
chr19 49302900 49303200 -0.25
chr19 49303200 49303500 0.0
chr19 49303500 49303800 0.25
chr19 49303800 49304100 0.50
chr19 49304100 49304400 0.75
chr19 49304400 49304700 1.00

I run bgzip first:

bgzip input.bedgraph

and then I run tabix:

tabix -p bed input.bedgraph.gz

at which point I get these errors:

[get_intv] the following line cannot be parsed and skipped: browser position chr19:49302001-49304701
[ti_index_core] the indexes overlap or are out of bounds

If bedgraph is not the file format tabix expects, what is the input file format?

Thanks!
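
The first error message names a `browser` line, so one plausible reading is that the UCSC header lines (and perhaps the space-separated columns) are what the indexer trips over. That is an assumption, not something confirmed in this thread. Under that assumption, a minimal Python sketch for cleaning the file before running bgzip could look like this; the function name is hypothetical.

```python
HEADER_WORDS = ("browser", "track")

def clean_bedgraph(lines):
    """Drop browser/track/comment lines and join the remaining fields with tabs."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if fields[0] in HEADER_WORDS or fields[0].startswith("#"):
            continue                      # skip UCSC header and comment lines
        yield "\t".join(fields)

sample = [
    "browser position chr19:49302001-49304701",
    "#   300 base wide bar graph, autoScale is on by default == graphing",
    'track type=bedGraph name="BedGraph Format"',
    "chr19 49302000 49302300 -1.0",
    "chr19 49302300 49302600 -0.75",
]
for row in clean_bedgraph(sample):
    print(row)
```

Only the data rows survive, each with tab-separated columns.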

Perl library gives warnings on use with perl-5.16+

Hi,

There seems to be some issue with the XS bindings. I've found this to be a problem on 5.16 and 5.18. It is okay on 5.14, but I didn't have access to 5.15 to try that.

% perl-5.16.3 -I blib/lib -e 'use Data::Dumper;use Tabix; warn Dumper(\%INC); my $tbi = Tabix->new(-data => q{some.vcf.gz}); print qq{Generated Tabix object\n};'
Subroutine Tabix::tabix_open redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_close redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_query redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_read redefined at blib/lib/Tabix.pm line 17.
Subroutine Tabix::tabix_getnames redefined at blib/lib/Tabix.pm line 17.
Subroutine TabixIterator::tabix_iter_free redefined at blib/lib/Tabix.pm line 17.
$VAR1 = {
          ...
          'Tabix.pm' => 'blib/lib/Tabix.pm',
          'TabixIterator.pm' => 'blib/lib/TabixIterator.pm',
          ...
        };
Generated Tabix object

Regards,
Keiran

figuring out tabix version

Hi,

Is there a way to find out which version of tabix was used to index a vcf.gz file?

Regards,
Prasun
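
For reference, the header of a .tbi index can be inspected with a few lines of Python. The sketch below assumes the published TBI layout: a BGZF (gzip-readable) stream starting with the magic bytes `TBI\1`, followed by eight little-endian 32-bit integers and the NUL-separated sequence names. As far as that layout goes there is no field for the tool version, so the index alone would not answer the question; treat that as my reading of the format rather than an official statement. The function name is hypothetical.

```python
import gzip
import struct

FIELDS = ("n_ref", "format", "col_seq", "col_beg", "col_end", "meta", "skip", "l_nm")

def read_tbi_header(path):
    """Return the fixed header fields and sequence names of a .tbi file."""
    with gzip.open(path, "rb") as fh:     # BGZF is a series of gzip members
        if fh.read(4) != b"TBI\x01":
            raise ValueError("not a .tbi index")
        header = dict(zip(FIELDS, struct.unpack("<8i", fh.read(32))))
        raw = fh.read(header["l_nm"])
        header["names"] = raw.rstrip(b"\0").decode().split("\0") if raw else []
        header["meta"] = chr(header["meta"])   # comment character, e.g. '#'
        return header
```

Calling `read_tbi_header("some.vcf.gz.tbi")` on an index built with the VCF preset should show the column settings and sequence names, but nothing that identifies the tabix release.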

Please archive this repo!

Since the note

About
Note: tabix and bgzip binaries are now part of the HTSlib project.

github.com/samtools/htslib

This repo should be archived.

@lh3 @pd3

Feat/support passing index files

v1.10 added the new -X option (-X include customized index file) to samtools.
samtools/samtools#978

Is it possible to request the same for tabix, bcftools, etc.?

The use case would be passing signed S3 URLs to all of the various tools.

e.g.

tabix <signed_vcf_url> -X <signed_vcf_tbi_url> chr2
bcftools view <signed_vcf_url> -X <signed_vcf_tbi_url> chr2
samtools view <signed_bam_url> -X <signed_bam_bai_url> chr2

error indexing gff3 file

What does the following error mean?

It is a merged and sorted gff3 file, containing SPADES scaffolds names.

[get_intv] the following line cannot be parsed and skipped: >NODE_1000_length_470_cov_0.860058
[ti_index_core] the indexes overlap or are out of bounds
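
The skipped line starts with `>`, which looks like a FASTA header, so one guess is that the merged file carries embedded sequence records alongside the features. That is an assumption; the thread does not confirm it. Under that assumption, a minimal Python sketch that keeps only comment lines and well-formed nine-column feature lines could look like this (function names are hypothetical).

```python
def is_feature_line(line):
    """True for a tab-separated GFF3 feature line with integer start and end."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) == 9 and fields[3].isdigit() and fields[4].isdigit()

def keep_indexable(lines):
    """Yield comment/directive lines and feature lines; drop everything else."""
    for line in lines:
        if line.startswith("#") or is_feature_line(line):
            yield line

sample = [
    "##gff-version 3\n",
    "NODE_1\tprodigal\tCDS\t10\t200\t.\t+\t0\tID=cds1\n",
    ">NODE_1000_length_470_cov_0.860058\n",
    "ACGTACGTACGT\n",
]
print("".join(keep_indexable(sample)), end="")
```

The filtered output would still need to be sorted and bgzipped before indexing.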

https support

Hi,

I looked at the code and saw that tabix supports fetching files over http and ftp, but there is no https support. Is something like this on the roadmap, or is it completely out of scope?

Best,
Viktor

Can you create a README describing the relationship of this code-base to the SourceForge projects?

It's confusing for people who aren't intimately familiar with the project: is the code here packaged for download via SF? Is the SF project now obsolete? Why are there separate tabix / bzip SF projects?

Cheers,
Dan.

htslib branch tabix does not work for remote files

The htslib branch does not handle ftp:// or http:// file locations.

E.g.

./tabix -h ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz 20
Could not load .tbi index of ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz