open2c / cooler Goto Github PK

View Code? Open in Web Editor NEW

189.0 189.0 53.0 90.22 MB

A cool place to store your Hi-C

Home Page: https://open2c.github.io/cooler

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

3d-genome bioinformatics chromatin contact-matrix cooler file-format genomics hdf5 hi-c ngs python sparse

cooler's People

Contributors

Stargazers

Watchers

Forkers

gitter-badger erezlab scottx611x pkerpedjiev minghao2016 danielsday resurgo-genetics cyang-2014 soolee msauria yixf-self danledinh zhong-lab-ucsd xiaotaowang nemochina2008 sbreiff crerecombinase darlingfuer xtmgah bio-lijs luisdiaz1997 mortegac xjyx mimakaev joachimwolff zhang-jiankun phlya lulizou nvictus deepbody-me zhuakexi yluan91 battyone lijiacd985 hbbrandao gfudenberg ay-lab nservant zhaoxu-gao kononkova zhan4429 robomics kimquililan anybananagame kunfang93 hakanozadam wxbizwnbla74 a-detiste jctourtellotte genostack zm19911213 nithingowda16 imsanko

cooler's Issues

Add more formatting options to cooler dump

Choose column subset
NaN print format
symmetrize (#105)

unhelpful error when file_contigs is empty

INFO:cooler:Creating cooler at "out.cool::/"
INFO:cooler:Writing chroms
INFO:cooler:Writing bins
INFO:cooler:Writing pixels
Traceback (most recent call last):
  File "/usr/local/bin/cooler", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/cooler/cli/cload.py", line 243, in pairix
    create(cool_path, bins, iterator, metadata, assembly)
  File "/usr/local/lib/python3.5/dist-packages/cooler/io.py", line 237, in create
    file_path, target, meta.columns, iterable, h5opts, lock)
  File "/usr/local/lib/python3.5/dist-packages/cooler/_writer.py", line 184, in write_pixels
    for i, chunk in enumerate(iterable):
  File "/usr/local/lib/python3.5/dist-packages/cooler/_binning.py", line 455, in __iter__
    granges = balanced_partition(self.gs, self.n_chunks, self.file_contigs)
  File "/usr/local/lib/python3.5/dist-packages/cooler/_binning.py", line 74, in balanced_partition
    grouped = gs.bins.groupby('chrom', sort=False)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 4271, in groupby
    **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/groupby.py", line 1626, in groupby
    return klass(obj, by, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/groupby.py", line 392, in __init__
    mutated=self.mutated)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/groupby.py", line 2625, in _get_grouper
    if is_categorical_dtype(gpr) and len(gpr) != len(obj):
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/series.py", line 465, in __len__
    return len(self._data)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals.py", line 2987, in __len__
    return len(self.items)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals.py", line 2898, in _get_items
    return self.axes[0]
IndexError: list index out of range

Observed over expected

Hi, is it possible to efficiently create and store a simple observed over expected map in the same (or different) cooler file? I'd like to have one for data at 5 kb resolution, but a whole chromosome to compute it in dense format won't fit into memory...

Add version argument to CLI

Re: #19

Get complete matrix from cooler dump?

I was curious if the output of cooler dump/show can get a text version of the contact matrix? Outputting a triangular matrix as a text file would be cool :)

Revise the meaning mad-max parameter

crash with digested bin file

Are there any known issues with cooler cload when using non-fixed bin sizes?

I am trying to generate a cooler file with a digested bin file:

cooler digest genomes/hg19_chr_only_length.txt genomes/hg19_chr_only.fasta MboI > bins/hg19_mboi_1f.bed

and then

cooler cload tabix bins/hg19_mboi_1f.bed hic_data.pairs.gz hic_data_1f.cool

after a few minutes..

chroms
bins
pixels
Traceback (most recent call last):
  File "/home/ezorita/miniconda2/bin/cooler", line 11, in <module>
    load_entry_point('cooler', 'console_scripts', 'cooler')()
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/mnt/Data/BHIVE/HiC/pipeline/cooler/cooler/cli/cload.py", line 155, in tabix
    create(h5, chroms, lengths, bins, iterator, metadata, assembly)   
  File "/mnt/Data/BHIVE/HiC/pipeline/cooler/cooler/io/__init__.py", line 67, in create
    bin1_offset, nnz = write_pixels(grp, n_bins, reader, h5opts)
  File "/mnt/Data/BHIVE/HiC/pipeline/cooler/cooler/io/_writer.py", line 157, in write_pixels
    for chunk in reader:
  File "/mnt/Data/BHIVE/HiC/pipeline/cooler/cooler/io/_reader.py", line 250, in __iter__
    for df in self.aggregate(map=pool.imap):
  File "/home/ezorita/miniconda2/lib/python2.7/site-packages/multiprocess/pool.py", line 668, in next
    raise value
TypeError: range() integer step argument expected, got NoneType.

I generated several cooler files with the same dataset hic_data.pairs.gz using fixed bin sizes without any problem.

Cheers

cload fails when given a contig missing from the data

Tabix raises a ValueError when trying to extract a window using a non-existing contig. Might be a good idea to convert this into a warning instead of suppressing it completely.

nandankita @nandankita
Also, I got this error today while creating .cool file. Will you please let me know what does it mean?
chr6_ssto_hap7
Traceback (most recent call last):
File "/usr/bin/cooler", line 9, in <module>
load_entry_point('cooler==0.5.2', 'console_scripts', 'cooler')()
File "/usr/lib/python2.7/site-packages/click/core.py", line 716, in call
return self.main(args, kwargs)
File "/usr/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/usr/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/usr/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(args, **kwargs)
File "/home/ankita/4DN/cooler/cooler/cooler/cli/cload.py", line 83, in cload
create(h5, chroms, lengths, bins, reader, metadata, assembly)
File "/home/ankita/4DN/cooler/cooler/cooler/io/init.py", line 66, in create
bin1_offset, nnz = write_pixels(grp, n_bins, reader, h5opts)
File "/home/ankita/4DN/cooler/cooler/cooler/io/_writer.py", line 157, in write_pixels
for chunk in reader:
File "/home/ankita/4DN/cooler/cooler/cooler/io/_reader.py", line 209, in iter
for chunk in self._iterchunks(chrom):
File "/home/ankita/4DN/cooler/cooler/cooler/io/_reader.py", line 187, in _iterchunks
for record in pairsfile.fetch(chrom, bin.start, bin.end, parser=parser):
File "pysam/ctabix.pyx", line 455, in pysam.ctabix.TabixFile.fetch (pysam/ctabix.c:5292)
ValueError: could not create iterator for region 'chr6_ssto_hap7:1-10000'

nvictus @nvictus 12:25
Seems like the scaffold chr6_ssto_hap7 wasn't available in your contact data, but was in your chromsizes file. If that's true, this is a bug. The workaround for now would be to limit your chromsizes file (and the bins file generated from it) to only those contigs that were mapped to in your data.

Update docs and notebook tutorial

Even the load distribution on pairs aggregators

Current text-based contact aggregators have a very uneven load distribution (each chrom1 block is binned on a single worker). This is straightforward to address.

rename HDF5 fields for consistency

scaffolds --> chroms
matrix --> pixels

Class should allow specification of the root path of a matrix

Also provide an intuitive way to list all matrices in a file.

trans-only balancing

Long chromosome names cause silent problems in coarsegrain

Having chromosome names which are longer than 32 characters leads to hard to detect problems in downstream analysis. When running cooler coarsegrain, the chromosome names that are obtained from the bins (via the special_dtype) don't match those in the chroms table, leading to incorrect bin_offsets and no coarse-graining.

The root cause appears to be the fact that chromosome names are truncated to 32 characters in _writer.py (e.g. CHROM_DTYPE = np.dtype('S32')), whereas they're kept in their original form in write_bins (e.g. enum_dtype = ...).

I've submitted a PR which errors out of the cload program when the chromosome names (sequence ids) are longer than 32 characters. Ideally this limit should be increased to 64 or 128 bytes as longer sequence ids are not uncommon, but I'm not sure if this will break backward compatibility with existing cooler files. In either case, there should be at least a warning or even an error because this will lead to downstream issues (as in coarsegrain) which are difficult to diagnose.

Normalization

Could you explain what's the difference between a.k.a balancing and ICE(iterative correction and eigenvector decomposition), how to understand balancing, and whether the balancing used in the cooler is a upstate for Hi-C normalisation.

csort modifies input file

I am trying to use cooler csort (version cooler==0.5.3) without much success. My pairs file pairs.txt has the following format:

chr12   74054834        +       chr17   37101300        -
chr10   107716614       +       chr8    74236985        +
...

and the chromosome size file (hg19_chromsizes.txt):

chr1    249250621
chr10   135534747
...

When I run cooler csort hg19_chromsizes.txt pairs.txt I get the following output:

Enumerating chroms...
chr1	1
chr10	2
chr11	3
chr12	4
chr13	5
chr14	6
chr15	7
chr16	8
chr17	9
chr18	10
chr19	11
chr2	12
chr20	13
chr21	14
chr22	15
chr3	16
chr4	17
chr5	18
chr6	19
chr7	20
chr8	21
chr9	22
chrM	23
chrX	24
chrY	25

and after a few seconds my 14GB contact pairs file is converted into a useless 28 byte binary file, along with a new file pairs.txt.tbi, which I guess it's the failed attempt to index the former with tabix.

Did I misunderstand the usage of cooler csort? Why does it modify the input file?

Thanks!

cooler show raises an Exception when the region overflows the chromosome

Proposed solution: return a trimmed matrix?

recursive_agg_onefile.py thows exception

[peter@DBMI-PKerpedjiev-MacBook-Pro cooler.mirnylab] [origin-develop]$ python cooler/contrib/recursive_agg_onefile.py ~/data/test/su_test.cool
binsize: 1000000
total_length (bp): 3095693981
n_tiles: 12.0925546133
n_zooms: 4
Copying base matrix to level 4 and producing 4 new zoom levels counting down to 0...
ZoomLevel: 4 1000000
ZoomLevel: 3 2000000
Traceback (most recent call last):
  File "cooler/contrib/recursive_agg_onefile.py", line 191, in <module>
    aggregate(infile, outfile, n_zooms, chunksize, n_cpus)
  File "cooler/contrib/recursive_agg_onefile.py", line 72, in aggregate
    map=pool.imap if n_cpus > 1 else map)
TypeError: __init__() got an unexpected keyword argument 'cooler_root'

If I remove that parameter and re-run the script, I get this error:

[peter@DBMI-PKerpedjiev-MacBook-Pro cooler.mirnylab] [origin-develop]$ python cooler/contrib/recursive_agg_onefile.py ~/data/test/su_test.cool
binsize: 1000000
total_length (bp): 3095693981
n_tiles: 12.0925546133
n_zooms: 4
Copying base matrix to level 4 and producing 4 new zoom levels counting down to 0...
ZoomLevel: 4 1000000
ZoomLevel: 3 2000000
Traceback (most recent call last):
  File "cooler/contrib/recursive_agg_onefile.py", line 190, in <module>
    aggregate(infile, outfile, n_zooms, chunksize, n_cpus)
  File "cooler/contrib/recursive_agg_onefile.py", line 71, in aggregate
    map=pool.imap if n_cpus > 1 else map)
  File "/Users/peter/.virtualenvs/stuff/lib/python2.7/site-packages/cooler-0.6.4.dev0-py2.7.egg/cooler/io/_reader.py", line 374, in __init__
    self._size = cool.info['nnz']
AttributeError: 'str' object has no attribute 'info'

Running it on one of the files in the test directory yields yet another error:

[peter@DBMI-PKerpedjiev-MacBook-Pro cooler.mirnylab] [origin-develop]$ python cooler/contrib/recursive_agg_onefile.py tests/data/dixon2012-h1hesc-hindiii-allreps-filtered.1000kb.multires.cool
Traceback (most recent call last):
  File "cooler/contrib/recursive_agg_onefile.py", line 180, in <module>
    binsize = cooler.info(f)['bin-size']
KeyError: 'bin-size'

Need a way to select till the end of the chromosome

.fetch("chr1:1000000-"), or "chr1:1000000-end" or something would be nice.

It is useful for mice, where all chromosomes basically start at 3 MB

Error getting PairixAggregator

Was wondering if the pairix library is needed for running? I did pip install cooler and then just ran cooler and it gave the error

cooler
Traceback (most recent call last):
  File "/home/cdiesh/.linuxbrew/bin/cooler", line 7, in <module>
    from cooler.cli import cli
  File "/home/cdiesh/.linuxbrew/Cellar/python/2.7.13/lib/python2.7/site-packages/cooler/cli/__init__.py", line 22, in <module>
    from . import (
  File "/home/cdiesh/.linuxbrew/Cellar/python/2.7.13/lib/python2.7/site-packages/cooler/cli/cload.py", line 16, in <module>
    from ..io import create, TabixAggregator, HDF5Aggregator, PairixAggregator
ImportError: cannot import name PairixAggregator

Does it need https://github.com/4dn-dcic/pairix ?

Problem using 'balance' with CLI

If a 'weight' column already exists in a cool file (i.e. the file has already been balanced), running the balance command causes an error.

Using /tests/data/GM12878-MboI-matrix.2000kb.cool,
cooler balance -p 10 tests/data/GM12878-MboI-matrix.2000kb.cool
followed by
cooler balance -p 10 tests/data/GM12878-MboI-matrix.2000kb.cool --force
causes the following error:
KeyError: "Couldn't delete link (No write intent on file)".

Ran across this testing CLI tools on a .cool file created by a .hic to .cool converter I wrote and thought you all should know.

Cooler.extent returns a single offset instead of extent

Add utility to validate pixel sort

Efficiently fetch thousands of regions

Hi, is there a way to efficiently fetch thousands of regions from a cooler file (or a few files in parallel?)? I am comparing "density" (sum of Hi-C interactions) along the genome across 3 datasets, and want to do it with 25 or 50 kb sliding windows, and it takes a very long time even with 4 processes... Loading a whole chromosome is not an option, since at 5 kb (Hi-C resolution I use) it doesn't fit into memory. Loading chromosomes as big chunks is an option, but complicates things.
Cheers,
Ilya

Store chromosome names as HDF5 enum

Rather than fixed-length strings, the chroms/name dataset should be stored as HDF5 enums, just as the bins/chrom dataset is.

Store correct scaling factor as metadata for balancing

With rescale_marginals=True, the balancing weights are divided by sqrt(mean(final_marg)) so that the marginals (row sums) become unity when those transformed weights are applied. To obtain the original balanced matrix, one would need to multiply the elements by mean(final_marg).

Currently, the former value is stored as the scale attribute of the weight column, but it is probably more convenient to store the latter, as one is more likely to rescale queried matrices rather than the weights back to the original state.

Convert scripts folder into a CLI

Once the scripts mature and stabilize. Make use of setuptools console_scripts entry points and look into a CLI framework like click or one of these: http://docs.python-guide.org/en/latest/scenarios/cli/.

cli does not print usage of specific commands

"cooler dump" and other commands without arguments could print the page of the manual that describes the usage. It would make CLI much easier.

Crash with large datasets

It seems that large datasets overflow the size flag used by "Pool". I ran cooler cload succesfully with small datasets but with several billion contacts:

cooler cload tabix binfiles/bin_1kb.bed dataset.scf.bgz dataset.cool

This is 1kb binning, with > 1billion contacts in dataset.scf I get the following error:

INFO:cooler:Using 8 cores
/usr/local/lib/python2.7/dist-packages/cooler/_reader.py:232: UserWarning: NOTE: When using the Tabix aggregator, make sure the order of chromosomes in the provided chromsizes agrees with the chromosome ordering of read ends in the contact list file.
  "NOTE: When using the Tabix aggregator, make sure the order of "
INFO:cooler:Creating cooler at "/scratch/dataset.cool::/"
INFO:cooler:Writing chroms
INFO:cooler:Writing bins
INFO:cooler:Writing pixels
INFO:cooler:chrM
INFO:cooler:chrY
INFO:cooler:chr21
INFO:cooler:chr22
INFO:cooler:chrX
INFO:cooler:chr19
INFO:cooler:chr20
INFO:cooler:chr18
INFO:cooler:chr9
INFO:cooler:chr17
INFO:cooler:chr16
INFO:cooler:chr15
INFO:cooler:chr13
INFO:cooler:chr14
INFO:cooler:chr8
INFO:cooler:chr7
INFO:cooler:chr6
INFO:cooler:chr5
INFO:cooler:chr12
INFO:cooler:chr4
INFO:cooler:chr11
Traceback (most recent call last):
  File "/usr/local/bin/cooler", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cooler/cli/cload.py", line 186, in tabix
    create(cool_path, chromsizes, bins, iterator, metadata, assembly)
  File "/usr/local/lib/python2.7/dist-packages/cooler/io.py", line 164, in create
    filepath, target, n_bins, iterator, h5opts, lock)
  File "/usr/local/lib/python2.7/dist-packages/cooler/_writer.py", line 174, in write_pixels
    for chunk in iterator:
  File "/usr/local/lib/python2.7/dist-packages/cooler/_reader.py", line 313, in __iter__
    for df in self._map(self.aggregate, chroms):
  File "/usr/local/lib/python2.7/dist-packages/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/lib/python2.7/dist-packages/multiprocess/pool.py", line 567, in get
    raise self._value
multiprocess.pool.MaybeEncodingError: Error sending result: '[     bin1_id  bin2_id  count
0     384852  1127363      1
0     384858   977744      1
2     384858  1890814      1
1     384858  1977029      1
2     384862  1685662      1
1     384862  2310165      2
3     384862  2603767      1
0     384862  2892389      1
0     384876  2196724      1
0     384887  1889333      1
1     384889  1079339      1
0     384889  2820833      1
0     384890  2308658      1
0     384894  1879589      1
0     384896  2261960      1
20    384899   417637      1
27    384899   528891      1
24    384899   943862      1
3     384899  1228810      1
5     384899  1241744      1
28    384899  1297021      1
26    384899  1485562      1
1     384899  1530749      1
22    384899  1530863      2
2     384899  1531657      1
4     384899  1539596      1
21    384899  1551590      1
6     384899  1662865      1
18    384899  1664090      1
10    384899  1668517      1
..       ...      ...    ...
43    519732  2160311      1
57    519732  2175747      1
74    519732  2186566      1
107   519732  2199037      1
26    519732  2217834      1
64    519732  2220820      1
82    519732  2254178      1
83    519732  2261352      1
44    519732  2272952      1
2     519732  2302728      1
23    519732  2356330      1
0     519732  2373634      1
106   519732  2396659      1
69    519732  2418477      1
6     519732  2435624      1
59    519732  2435848      1
50    519732  2518239      1
98    519732  2550221      1
38    519732  2558119      1
21    519732  2559581      1
66    519732  2596125      1
70    519732  2616625      1
24    519732  2626677      1
37    519732  2724006      1
101   519732  2737631      1
41    519732  2772148      1
13    519732  2838601      1
11    519732  2862142      1
71    519732  2868541      2
72    519732  2922302      1

[68481974 rows x 3 columns]]'. Reason: 'IOError('bad message length',)'

Probably an overflow on the size flag of Pool... ?

bioconda installation on python 3.6 fails

UnsatisfiableError: The following specifications were found to be in conflict:

cooler -> multiprocess -> python 2.7* -> openssl 1.0.1*
python 3.6*

return restriction site indices in cooler digest

add an option to return genome-wide and chromosome-wide genome indices in cooler digest. This allows to calculate the number of (undigested) restriction sites between two fragments

Refactor aggregators and writers to exploit multiprocess aggregation without async issues

region parsing fails if the chromosome name has a dash

allow csort to take multiple pairs files

It would be very useful to allow csort to take multiple pair files at the same time and merge them into the same sorted (and, optionally, de-duped) output file. This would allow to merge multiple lanes of the same replica or multiple replicas into the same condition.

add an `as_table` option for matrix range queries

pixeltable lets you range query the pixel table (upper triangular data) by table row and returns a dataframe.

matrix supports 2-D genomic coordinate range queries and returns a rectangular sparse matrix. However, it should provide the option to return a dataframe of the upper triangular data corresponding to the 2-D range query..

Create a conda recipe

support non-uniform binning

cooler.io.from_fraghdf5 currently only generates coolers using uniform bins.

need for parallel gnu sort implementation

Either document this, or make it optional (with a warning).

Or maybe provide ability to pass options through to sort?

Revise default dtypes of required columns

max uint32 is 4,294,967,295

bin1_id, bin2_id and counts can easily be uint32
start and end could also be uint32
chrom_id?

access HTTP or FTP URI of cool files?

Hi,

What does the cooler URI mean?
I am trying to run cooler info against a HTTP URL of a .cool file, but get OSError error.
Thanks.

$ cooler info -h
Usage: cooler info [OPTIONS] COOL_PATH

  Display file info and metadata.

  COOL_PATH : Path to a COOL file or Cooler URI.

Options:
  -f, --field TEXT  Print the value of a specific info field.
  -m, --metadata    Print the user metadata in JSON format.
  -o, --out TEXT    Output file (defaults to stdout)
  -h, --help        Show this message and exit.

Best way to convert other formats to cooler

Hi, I have several matrices stored in various format like hic format, and output from mirnylib *.hm, and plain text matrix. What's the best way to convert these matrices to cooler? I can easily read these matrices into memory but it seems that the cooler.io.create() is not functioning now.

One other issue is that these matrices may already be balanced. Can the cooler store counts other than int32, like float32?

ImportError: cannot import name 'PairixAggregator'

Tried using cooler after updating it, here is the problem I get:

(Hi-C)[s1529682@login04(eddie) scripts]$ cooler
Traceback (most recent call last):
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/bin/cooler", line 11, in <module>
    load_entry_point('cooler==0.6.7.dev0', 'console_scripts', 'cooler')()
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/lib/python3.5/site-packages/pkg_resources/__init__.py", line 560, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2648, in load_entry_point
    return ep.load()
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2302, in load
    return self.resolve()
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2308, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/lib/python3.5/site-packages/cooler/cli/__init__.py", line 34, in <module>
    from . import (
  File "/exports/igmm/eddie/wendy-lab/ilia/Hi-C/lib/python3.5/site-packages/cooler/cli/cload.py", line 16, in <module>
    from ..io import create, TabixAggregator, HDF5Aggregator, PairixAggregator
ImportError: cannot import name 'PairixAggregator'

Am I missing anything?..

Full support for nested Cooler groups in CLI

roadmap for balancing

implement MAD-max filtering
add maximum iteration criterion for failed convergence
[ ]

lexsort check before lexbisect

io.from_readhdf5 can hang on unsorted data or fail for the wrong reason because before checking for sort order chunks are selected with lexbisect which assumes the data is already sorted.

Add pairix aggregator

Failure to index numpy array with pd.Series on older numpy and newer pandas

This is a known issue: pandas-dev/pandas#8310

It's fixed in numpy > 1.9 or 1.10. But it's better anyways to unbox Series into numpy arrays when used for indexing a numpy array.

Add pypairix to requirements

Re: #29

possible bug generating from file with single sided reads

an optional blacklist of regions for cooler balance

Sometimes a region of a chromosome can be "misbehaving", e.g it might contain a translocation or a duplication. It would be useful to allow the user to exclude it from balancing and downstream analyses by providing a .bed file specifying the locations of bad regions.