deeptools / hicexplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.

Home Page: https://hicexplorer.readthedocs.org

License: GNU General Public License v3.0

Python 89.14% HTML 10.49% Makefile 0.31% Shell 0.06%

hic genomics chromosome-conformation-capture galaxy python bioinformatics

hicexplorer's Introduction

deepTools

User-friendly tools for exploring deep-sequencing data

deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and for functional annotations of the genome.

For support or questions please post to Biostars. For bug reports and feature requests please open an issue on github.

Citation:

Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research. 2016 Apr 13:gkw257.

Documentation:

Our documentation contains more details on the individual tool scopes and usages and an introduction to our deepTools Galaxy web server including step-by-step protocols.

Please see also the FAQ, which we update regularly. Our Gallery may give you some more ideas about the scope of deepTools.

For more specific troubleshooting, feedback, and tool suggestions, please post to Biostars.

Installation

deepTools are available for:

Command line usage (via pip / conda / github)
Integration into Galaxy servers (via toolshed/API/web-browser)

There are many easy ways to install deepTools. More details can be found here.

In Brief:

Install through pypi

$ pip install deeptools

Install via conda

$ conda install -c bioconda deeptools

Install by cloning the repository

$ git clone https://github.com/deeptools/deepTools
$ cd deepTools
$ pip install .

Galaxy Installation

deepTools can be easily integrated into Galaxy. Please see the installation instructions in our documentation for further details.

Note: From version 2.3 onwards, deepTools supports Python 3.

This tool suite is developed by the Bioinformatics Facility at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg.



hicexplorer's Issues

hicFindTADs: Bonferroni correction / FDR?

While updating the readthedocs material, I noticed that hicFindTADs does not find any TADs for the mESC tutorial. I traced this to the Bonferroni correction, which is very strict. For the example data, around 10,000 to 15,000 TADs are considered (depending on how strictly I prune beforehand). With that many tests, a TAD passes at a significance level of 0.05 only if its p-value is at most about 0.000005. None of the considered TADs meets this. Setting the p-value threshold to 0.99 still gives no results; only a value of 1 does.
I would like to ask and discuss:

Is this behaviour expected?
Did I make a major mistake that leads to this situation? My updated readthedocs material is here. All the steps I have done are documented there.
Is the Bonferroni correction perhaps unsuitable for this large number of tests? I implemented the FDR; attached is the plot using different q-values. For a q-value of 0.05, for example, the FDR finds 4027 TADs. What is the expected number of TADs for this dataset?
Should we add the FDR as an additional multiple-testing correction to hicFindTADs? This would let our users decide which test is best for their specific setting.
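
The gap between the two corrections discussed above can be reproduced on a toy example. This is only a minimal sketch, not the hicFindTADs code: the function names and the example p-values are made up for illustration, and the FDR variant shown is the standard Benjamini-Hochberg step-up procedure, which may differ from the one used for the attached plot.

```python
import numpy as np


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of the tests accepted at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order]
    # The k-th smallest p-value is compared against q * k / n.
    thresholds = q * np.arange(1, n + 1) / n
    below = np.nonzero(ranked <= thresholds)[0]
    accepted = np.zeros(n, dtype=bool)
    if below.size:
        # Accept everything up to the largest rank that passes.
        accepted[order[: below[-1] + 1]] = True
    return accepted


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = int((np.asarray(example_p) <= bonferroni_threshold(0.05, len(example_p))).sum())
n_fdr = int(benjamini_hochberg(example_p, q=0.05).sum())
print("Bonferroni:", n_bonferroni, "FDR:", n_fdr)
```

With 10,000 tests, bonferroni_threshold(0.05, 10000) gives the cutoff of about 0.000005 mentioned above, whereas the FDR cutoff grows with the rank of each p-value and is therefore less strict.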

Fractal/melt polymers in plot distance vs interaction

Is there a way to plot (using hicPlotDistVsCounts) the distance vs. interaction lines for fractal and melted polymers? Something like this:

https://www.nature.com/article-assets/npg/nature/journal/v516/n7531/images/nature13833-sf2.jpg

Thank you very much!

Default value for hicCorrectMatrix correct --iterNum

The default value for the parameter --iterNum of the hicCorrectMatrix is set to 500. I am a little bit surprised by this high value because Imakaev et al. wrote in Iterative correction of Hi-C data reveals hallmarks of chromosome organization:

Throughout the paper, we use 50 iterations for iterative correction; this is twice the number required for the worst case encountered.

Why did we choose a value which is 10 times more?

I think a high default value is problematic because it leads to the following error message: ERROR:iterative correction:*Error* matrix correction produced extremely large values. This is often caused by bins of low counts. Use a more stringent filtering of bins.

In my opinion, this error message points the user to the wrong error source. In this case, the cause is too many iterations of the correction, not that the z-score based bin filtering was too tolerant.
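
To make the role of the iteration count concrete, here is a minimal sketch of an iterative correction loop that stops as soon as the bias estimates settle, so the iteration cap only acts as a safety net. It is not the hicCorrectMatrix implementation: the function name, the tolerance parameter and the dense NumPy matrix are assumptions made for this illustration.

```python
import numpy as np


def iterative_correction(matrix, max_iter=50, tolerance=1e-5):
    """Balance a symmetric contact matrix so that all row sums become equal.

    Returns the corrected matrix, the accumulated bias vector and the
    number of iterations that were actually run.
    """
    m = np.array(matrix, dtype=float)
    bias = np.ones(m.shape[0])
    iteration = 0
    for iteration in range(1, max_iter + 1):
        row_sums = m.sum(axis=1)
        nonzero = row_sums > 0
        scale = np.ones_like(row_sums)
        # Normalise the coverage to mean 1; empty rows are left untouched.
        scale[nonzero] = row_sums[nonzero] / row_sums[nonzero].mean()
        m /= np.outer(scale, scale)
        bias *= scale
        if np.abs(scale - 1.0).max() < tolerance:
            break
    return m, bias, iteration


# Toy input: uniform off-diagonal contacts distorted by per-bin biases.
true_bias = np.array([1.0, 2.0, 0.5, 1.5])
observed = (np.ones((4, 4)) - np.eye(4)) * np.outer(true_bias, true_bias)
corrected, estimated_bias, n_iterations = iterative_correction(observed)
print("iterations used:", n_iterations)
```

Reporting the number of iterations actually used makes it easier to tell whether the cap was reached or the correction converged earlier.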

Add testing

Add a test to call significant minima so that we can avoid the delta threshold.

TypeError in TAD_score

Dear all,
I am trying to use HiCExplorer for calling TADs, but I have run into the following issue. I have a Hi-C matrix in Dekker's format, which I first convert into npz format using hicExport

hicExport -in HiC_matrix_colrow.txt.gz -o out_test_50.npz --inputFormat=dekker --outputFormat=npz

this works perfectly, but then I try to calculate the scores (with different parameters, here below just an example)

hicFindTADs TAD_score -m out_test_50.npz --minDepth 150000 --maxDepth 300000 --step 50000 -o out_test_score_50

the function hic_matrix.getRegionBinRange(chrom, left_start, left_start + 1)[0] raises the following error:

TypeError: 'NoneType' object has no attribute 'getitem'

Do you have any idea what could cause this error?

Thank you in advance.

Below is the entire output:

INFO:hicFindTADs:removing diagonal values

binsize: 50000
command: TAD_score
matrix: out_test_50.npz
maxDepth: 300000
minDepth: 150000
numberOfProcessors: 1
outFileName: out_test_score_50
step: 50000
INFO:hicFindTADs:Computing z-score matrix...

processing chromosome chr6
WARNING File already exists out_test_score_50_zscore_matrix.h5
Overwriting ...
INFO:hicFindTADs:Computing TAD-separation scores...

INFO:hicFindTADs:computing spectrum for window sizes between 3 (150000 bp)and 6 (300000 bp) at the following window sizes 1 [150000, 200000, 291421]

Traceback (most recent call last):
File "/home/user/.local/bin/hicFindTADs", line 7, in
main()
File "/home/user/.local/lib/python2.7/site-packages/hicexplorer/hicFindTADs.py", line 1202, in main
chrom, chr_start, chr_end, matrix = compute_spectra_matrix(args)
File "/home/user/.local/lib/python2.7/site-packages/hicexplorer/hicFindTADs.py", line 1013, in compute_spectra_matrix
res = map(func, TASKS)
File "/home/user/.local/lib/python2.7/site-packages/hicexplorer/hicFindTADs.py", line 255, in compute_matrix_wrapper
return compute_matrix(*args)
File "/home/user/.local/lib/python2.7/site-packages/hicexplorer/hicFindTADs.py", line 299, in compute_matrix
mult_matrix = [get_cut_weight(hic_ma, cut, depth, return_mean=True) for depth in incremental_step]
File "/home/user/.local/lib/python2.7/site-packages/hicexplorer/hicFindTADs.py", line 222, in get_cut_weight
left_idx, right_idx = get_idx_of_bins_at_given_distance(hic_matrix, cut, window_len)
File "/home/user/.local/lib/python2.7/site-packages/hicexplorer/hicFindTADs.py", line 204, in get_idx_of_bins_at_given_distance
left_idx = hic_matrix.getRegionBinRange(chrom, left_start, left_start + 1)[0]
TypeError: 'NoneType' object has no attribute 'getitem'

enriched contacts

I am interested in finding enriched contacts, for instance bins with z-scores greater than 3. Specifically, I want to identify enriched contacts both intra- and inter-chromosome. For this purpose, I think I should compute z-scores separately, considering only intra-chromosome bins on the one hand and only inter-chromosome bins on the other, right?

Is HiCExplorer able to do so?

Thank you,
A.

track creation for plots and track example

Links for the Plot TAD section are not available:


We can plot the TADs for a given chromosomal region. For this we need to create a tracks file, which is a file containing instructions to plot. `HERE <>`__ are the instructions on how to build the track file. One of the example is attached here.

Test cases for hicBuildMatrix

There is a bug in hicBuildMatrix concerning the computation of the cut_intervals. I have a fix for it, and the test case test_buildMatrix.test_build_matrix runs as it should. (The fix is not uploaded yet because I am not certain that everything is fixed.) The test case test_buildMatrix.test_build_matrix_rf is still failing. The printout of the first ten intervals of the test data, read from small_test_rf_matrix.h5, looks as if it was created with a version that had the bug. I want to verify what the test data should look like.

To verify I have printed out the first ten elements of test.cut_intervals and new.cut_intervals of the test case test_buildMatrix.test_build_matrix_rf:

('test.cut_intervals', [('chr2L', 0, 455, 43), ('chr2L', 455, 913, 88), ('chr2L', 913, 1371, 133), ('chr2L', 1371, 1829, 178), ('chr2L', 1829, 2287, 223), ('chr2L', 2287, 2745, 268), ('chr2L', 2745, 3203, 313), ('chr2L', 3203, 3661, 358), ('chr2L', 3661, 4119, 403), ('chr2L', 4119, 4577, 448)])

The last element always increases by 45. This behaviour was a bug caused by a wrong initialization of the coverage list: instead of initializing each element with 0, the elements were initialized with 0, 1, 2, 3, ..., n. The maximum element within a coverage is now always i * width_of_coverage for i in n. But here it looks as if it should be this way? The computed output with the fix is now:

('new.cut_intervals', [('chr2L', 0, 455, nan), ('chr2L', 455, 913, nan), ('chr2L', 913, 1371, nan), ('chr2L', 1371, 1829, nan), ('chr2L', 1829, 2287, nan), ('chr2L', 2287, 2745, nan), ('chr2L', 2745, 3203, nan), ('chr2L', 3203, 3661, nan), ('chr2L', 3661, 4119, nan), ('chr2L', 4119, 4577, nan)])

The question now is whether you know what the 4th element of an interval should look like. The given test data looks like the buggy version, and I am not sure what the correct values should be.

Thanks for your help!

hicPlotTADs help

hicPlotMatrix argument list typo

hicPlotMatrix, without arguments, gives this help:
usage: hicPlotMatrix [-h] --matrix MATRIX [--title TITLE]
[--scoreName SCORENAME] --outFileName OUTFILENAME
[--perChromosome] [--clearMaskedBins]
[--whatToShow {heatmap,3D,both}]
[--chromosomeOrder CHROMOSOMEORDER [CHROMOSOMEORDER ...]]
[--region REGION] [--region2 REGION2] [--log1p] [--log]
[--colorMap COLORMAP] [--vMin VMIN] [--vMax VMAX]
[--zMax ZMAX] [--dpi DPI] [--version]
hicPlotMatrix: error: argument --matrix/-m is required

The correct switch for the matrix should be -m, not -h (I think).

AttributeError: 'csamtools.AlignedRead' object has no attribute 'has_tag'

I am facing this problem when running HiCExplorer software:

command:
hicBuildMatrix -s mapping/SRR1956527_1.bam mapping/SRR1956527_2.bam -rs dpnII_positions_GRCm38.bed -seq GATC -b hiCmatrix/SRR1956527_ref.bam -o hiCmatrix/SRR1956527.matrix

reading mapping/SRR1956527_1.bam and mapping/SRR1956527_2.bam to build hic_matrix
Minimum distance considered between restriction sites is 300
Max distance: 800
Matrix size: 2666241
dangling sequences to check are {'pat_forw': 'ATC', 'pat_rev': 'GAT'}
Traceback (most recent call last):
File "/usr/local/bin/hicBuildMatrix", line 7, in
main()
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicBuildMatrix.py", line 644, in main
mate1_supplementary_list = get_supplementary_alignment(mate1, str1)
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicBuildMatrix.py", line 410, in get_supplementary_alignment
if read.has_tag('SA'):
AttributeError: 'csamtools.AlignedRead' object has no attribute 'has_tag'

pysam version:
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
import pkg_resources
pkg_resources.get_distribution("pysam").version
'0.9.1.4'

Could you please help me solve this?
Thanks in advance.

Wrong labels on scale bar for HiC heatmap

The scale bar for hicPlotTADs is wrong.

HicExplorer Version 1.3.
MatplotLib Version 1.4.3

hicFindTADs TAD_score error

Dear HiCExplorer,

I built a matrix with --binSize 10000 and then merged bins with the --nb 10 option, which gives a bin size of 100000.

After that, I ran hicFindTADs TAD_score with the following options:

hicFindTADs TAD_score -m hic_CorrectMatrix/hic_corrected_100kbp.npz.h5 --minDepth 500000 --maxDepth 1000000 --numberOfProcessors 12 -o a

It generated the output below.

 hicFindTADs TAD_score -m hic_CorrectMatrix/hic_corrected_100kbp.npz.h5  --minDepth 500000 --maxDepth 1000000  --numberOfProcessors  12 -o a
INFO:hicFindTADs:removing diagonal values

Number of poor regions to remove: 281
{'Chr08': 37, 'Chr09': 34, 'Chr04': 35, 'Chr05': 37, 'Chr06': 25, 'Chr07': 25, 'Chr01': 45, 'Chr02': 21, 'Chr03': 22}
found existing 281 nan bins that will be included for masking
binsize:        100000
command:        TAD_score
matrix: hic_CorrectMatrix/hic_corrected_100kbp.npz.h5
maxDepth:       1000000
minDepth:       500000
numberOfProcessors:     12
outFileName:    a
step:   200000
INFO:hicFindTADs:Computing z-score matrix...

processing chromosome Chr01
processing chromosome Chr02
processing chromosome Chr03
processing chromosome Chr04
processing chromosome Chr05
processing chromosome Chr06
processing chromosome Chr07
processing chromosome Chr08
processing chromosome Chr09
masked bins were restored
*WARNING* File already exists a_zscore_matrix.h5
 Overwriting ...
INFO:hicFindTADs:Computing TAD-separation scores...

INFO:hicFindTADs:computing spectrum for window sizes between 5 (500000 bp)and 10 (1000000 bp) at the following window sizes 2 [500000, 700000]

INFO:hicFindTADs:Using 12 processors

/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)
/mnt/d/scratch/bin/hicexplorer/hic/local/lib/python2.7/site-packages/scipy/sparse/base.py:470: RuntimeWarning: divide by zero encountered in true_divide
  return self.astype(np.float_)._mul_scalar(1./other)

I don't know how to handle these.

scipy version is below.
scipy==0.19.0

Do you have any idea?

Won

Removing outliers question

Hello,

I started working on Hi-C data available from GEO, and my question is how to overcome the extremely large values produced when I run "hicCorrectMatrix correct".
I found the explanation in the closed issue #113. But even with more stringent filtering of bins (ranging from +-2.5 to +-0.5), I still get the message:
ERROR:iterative correction:*Error* matrix correction produced extremely large values. This is often caused by bins of low counts. Use a more stringent filtering of bins.

I followed the HiCExplorer tutorial and could get results for 10 kb bins, but now I want to compare with 40 kb and 100 kb bins and got stuck at this point. Do you have any suggestions?

Thank you in advance.

Eijy

Enhancements to hicPlotTADs

plot points
do not plot NaN values (i.e. do not assume NaN = zero)

HiCBuildMatrix -bs required

For the bin size (--bs) argument, the code says default = 10000 and required = True. I guess the default has no effect when required = True?
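
The interaction between the two settings can be checked with a small argparse sketch. The parser below is a stand-in written for this illustration (the program name and helper functions are made up); it is not the hicBuildMatrix parser.

```python
import argparse


def make_parser(required):
    """Build a demo parser with a bin size option that has a default."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--binSize", "-bs", type=int, default=10000,
                        required=required)
    return parser


def parse_or_none(parser, argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit:
        # argparse prints a usage message and exits on a missing argument.
        return None


# With required=True the default is never reached: omitting the option fails.
print(parse_or_none(make_parser(required=True), []))
# With required=False the default is used when the option is omitted.
print(parse_or_none(make_parser(required=False), []).binSize)
```

So a default on a required argument is effectively dead: either the option should not be required, or the default should be dropped from the help text.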

Potential bug in hicCompare

@fidelram @bgruening
Please verify the following:
hicCompareMatrices does not recognize the value of the --operation option if the "is" operator is used in line 61 or 63 (Link). If I change it to ==, it does. Unfortunately, the Python console gives me True for both.
Can you confirm that a) it works as it is? If not: b) does it work if we use == ?

Thanks!

Joachim
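
The difference between the two comparisons can be shown without the tool itself. This is a minimal sketch assuming CPython; the variable and function names are made up, and the string is built at run time to mimic a value that arrives from the command line instead of from a literal in the source.

```python
def operation_matches(requested, expected):
    """Compare option values by content, which is what the check needs."""
    return requested == expected


literal = "diff"
# Built at run time, like a value parsed from the command line.
from_runtime = "".join(["di", "ff"])

same_value = literal == from_runtime      # compares the characters
same_object = literal is from_runtime     # compares object identity

print("equal by value:", same_value)
print("same object:", same_object)
```

In an interactive console, two identical short literals often end up as the same object, which would explain why both comparisons appear to give True there. Identity is an implementation detail, so == is the comparison to rely on for strings.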

error because pandas is not installed

A user reports the following error

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/hicBuildMatrix", line 4, in <module>
    from hicexplorer.hicBuildMatrix import main
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hicexplorer/hicBuildMatrix.py", line 12, in <module>
    from hicexplorer.utilities import getUserRegion, genomicRegion
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hicexplorer/utilities.py", line 9, in <module>
    from hicexplorer.HiCMatrix import hiCMatrix
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hicexplorer/HiCMatrix.py", line 20, in <module>
    class hiCMatrix:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/hicexplorer/HiCMatrix.py", line 203, in hiCMatrix
    def getLiebermanBins(self, filenameList, chrnameList, pandas = pandas):
NameError: name 'pandas' is not defined
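
The traceback fails while the class body is being executed, not when getLiebermanBins is called, because default argument values are evaluated when the def statement runs. The sketch below reproduces that behaviour with a made-up class and shows one possible way to treat a module as optional. The helper names are hypothetical and this is not the HiCExplorer code.

```python
import importlib

# A class whose method uses an undefined name as a default value.
SOURCE = """
class Demo:
    def load(self, files, pandas=pandas):
        return files
"""


def class_definition_error():
    """Execute SOURCE in an empty namespace and return the error text."""
    try:
        exec(SOURCE, {})
    except NameError as err:
        return str(err)
    return None


def get_optional_module(name):
    """Import a module lazily; return None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# The NameError is raised at definition time, before any call to load().
print(class_definition_error())
```

Either declaring pandas as an install requirement or importing it lazily inside the one method that needs it would avoid the failure at import time.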

hicPlotMatrix strips | character from identifiers

hicPlotMatrix with --region gi|568336006|gb|CM000680.2|
results in Chromosome name gi568336006gbCM0006802 in --region not in matrix

There seems to be no way to escape the pipe ("|") character in an ID. I suppose there is some internal munging that is removing the pipe character, even escaped, from an ID.

enable pep8 linting with flake8 and Galaxy linting

The code needs to be linted, and the Galaxy wrappers as well. Both are deactivated in Travis for now.

No module named hicPlotTADs_multit

Hello,
I have installed HiCExplorer and ran the demo following the mES-HiC_analysis document.

When I use hicPlotTADs to plot TAD domains, it reports the error No module named hicPlotTADs_multit

[yulj@ln01 yulj]$ hicPlotTADs --tracks tracks.ini --region chr1:1,000,000-2,000,000  -o tads.pdf
Traceback (most recent call last):
  File "/home/yulj/bin/hicPlotTADs", line 4, in <module>
    from hicexplorer.hicPlotTADs_multit import main
ImportError: No module named hicPlotTADs_multit

I didn't find any module named hicPlotTADs_multit in the library lib/python2.7/site-packages/hicexplorer. Does anyone know how to fix this bug?
Should I use another module name (for example hicPlotTADs) in place of hicPlotTADs_multit?

I tried this: I copied hicPlotTADs.py and hicPlotTADs.pyc to hicPlotTADs_multit.py and hicPlotTADs_multit.pyc in the same library.

However, it reports another error:

[yulj@ln01 yulj]$ hicPlotTADs --tracks tracks.ini --BED TADs/marks_et-al_TADs_20kb-Bins_boundaries.bed  -o tads.pdf
{'x-axis': True, 'section_name': '1. [x-axis]'}
{'colormap': 'RdYlBu_r', 'title': 'Hi-C', 'file_type': 'hic_matrix', 'transform': 'log1p', 'depth': 1000000, 'section_name': '2. [hic track]', 'file': 'hiCmatrix/replicateMerged.Corrected-100bins.npz'}
Number of poor regions to remove: 2157
{'chr7': 147, 'chr6': 110, 'chr5': 104, 'chr4': 150, 'chr3': 127, 'chr2': 146, 'chr1': 142, 'chrY': 14, 'chr9': 71, 'chr8': 92, 'chr13': 80, 'chr12': 115, 'chr11': 79, 'chr10': 107, 'chr17': 70, 'chr16': 81, 'chr15': 71, 'chr14': 124, 'chr19': 51, 'chr18': 74, 'chrX': 202}
found existing 2157 nan bins that will be included for masking 
WARNING: bin size is not homogeneous. Median 190000
time initializing tracks
4.28803610802
saving tads.pdf_chr1:9975000-9975001'
Figure size in cm is 40 x 19.72. Dpi is set to 72
setting min, max values for track 2. [hic track] to: 1.0, 1.0
time before saving
1450667345.01
saving tads.pdf_chr1:9975000-9975001
Traceback (most recent call last):
  File "/home/yulj/bin/hicPlotTADs", line 7, in <module>
    main()
  File "/home/yulj/lib/python2.7/site-packages/hicexplorer/hicPlotTADs_multit.py", line 316, in main
    trp.plot(file_name, chrom, start, end, title=args.title)
  File "/home/yulj/lib/python2.7/site-packages/hicexplorer/trackPlot.py", line 186, in plot
    fig.savefig(file_name, dpi=self.dpi, transparent=False)
  File "/home/yulj/anaconda2/lib/python2.7/site-packages/matplotlib/figure.py", line 1565, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/home/yulj/anaconda2/lib/python2.7/site-packages/matplotlib/backend_bases.py", line 2139, in print_figure
    canvas = self._get_output_canvas(format)
  File "/home/yulj/anaconda2/lib/python2.7/site-packages/matplotlib/backend_bases.py", line 2079, in _get_output_canvas
    '%s.' % (format, ', '.join(formats)))
ValueError: Format "pdf_chr1:9975000-9975001" is not supported.

It seems that the program cannot generate the correct output file name.

How can I fix the problems described above?

I hope to receive your feedback.

Thanks,
Lijia
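
Regarding the second error: when no format is given, Matplotlib takes the output format from the text after the last dot of the file name, so appending the region to "tads.pdf" turns the extension into "pdf_chr1:9975000-9975001". One possible way to build a per-region name that keeps the extension last is sketched below. The function name and the naming scheme are assumptions for this illustration, not what hicPlotTADs does.

```python
import os


def region_file_name(out_file, chrom, start, end):
    """Insert the region before the extension so the format stays valid."""
    base, ext = os.path.splitext(out_file)
    return "{}_{}-{}-{}{}".format(base, chrom, start, end, ext)


name = region_file_name("tads.pdf", "chr1", 9975000, 9975001)
print(name)
```

Using "-" instead of ":" between the chromosome and the start position also avoids a character that some file systems do not accept in file names.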

HiCBuildMatrix crash

HiCBuildMatrix is crashing if you restrict it to some chromosomes.

hicFindEnrichedContacts issue

I want to find enriched contacts between matrix1.npz and matrix2.npz. They come from different cell types. For that, I call the following command:

hicFindEnrichedContacts --matrix matrix1.npz --originalMat matrix2.npz --outFileName enrichedcontacts_zscore.npz --method z-score

And what I get is:

Traceback (most recent call last):
  File "/usr/bin/hicFindEnrichedContacts", line 4, in <module>
    __import__('pkg_resources').run_script('HiCExplorer==1.3', 'hicFindEnrichedContacts')
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 738, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1499, in run_script
    exec(code, namespace, namespace)
  File "/DATA/centos/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/EGG-INFO/scripts/hicFindEnrichedContacts", line 7, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicFindEnrichedContacts.py", line 149, in main
    depth_in_bins=max_depth_in_bins)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/utilities.py", line 75, in transformMatrix
    counts_by_distance, cut_intervals = hicma.getCountsByDistance(per_chr=per_chr, depth_in_bins=depth_in_bins)
TypeError: getCountsByDistance() got an unexpected keyword argument 'depth_in_bins'

Recommendation window size

Dear all,

I have a question regarding the window size parameters. If I am correct, in your article "High-resolution TADs reveal [...]" you use the following parameters for a 0.5 kb resolution map: min window length = 10,000, max window length = 40,000 and a step of ~2,000.
What would be your recommendation for a Hi-C matrix of 50 kb or 100 kb resolution?
Can I use the ratio minDepth = 3 x bin size and maxDepth = 6 x bin size as a rule of thumb? (And how should I choose the step size?)

Thank you in advance and best regards,

Marie
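
For reference, the ratio proposed in the question translates into base pairs and bins as follows. This is only arithmetic on the numbers given above, with made-up helper names; it is not a recommendation from the developers and says nothing about the step size.

```python
def depths_from_ratio(bin_size, min_factor=3, max_factor=6):
    """Return (minDepth, maxDepth) in bp for the ratio in the question."""
    return bin_size * min_factor, bin_size * max_factor


def depth_in_bins(depth_bp, bin_size):
    """Number of whole bins covered by a depth given in bp."""
    return depth_bp // bin_size


for bin_size in (50000, 100000):
    min_depth, max_depth = depths_from_ratio(bin_size)
    print(bin_size, min_depth, max_depth,
          depth_in_bins(min_depth, bin_size),
          depth_in_bins(max_depth, bin_size))
```

Whatever the bin size, this ratio always gives windows of 3 to 6 bins, so the number of distinct window sizes that can be tested stays small.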

Correct term for HiCExplorer, Hi-C matrix

We should use one notation for the term 'HiCExplorer' and for similar ones like 'Hi-C Matrix'. So far I have seen 'HiCExplorer', 'Hi-C Explorer', 'Hi-C explorer', and 'hicexplorer' in our documentation. Additionally, there are 'Hi-C Matrix', 'hicmatrix', 'HiC Matrices' and 'Hi-C matrices'.

Which one is correct? The publication uses the terms 'HiCExplorer' and 'Hi-C contact matrix'.

aggregate plot

@fidelram can you push what you have into some branch?

Matrix resolution

How do you choose the best resolution? How do I know that my matrix is too sparse?

Thank you.

a/b compartment module

Initial work: https://github.com/maxplanck-ie/HiCExplorer/tree/pca

Get matrix with masked bins as nans

Hi,

I have .h5 norm matrices but I'd like to see the masked bins as nans in the python console. Is that possible?

Thanks,
A.
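
One generic way to do this with NumPy is sketched below. It assumes the matrix has already been loaded as a dense array and that the indices of the masked bins are known; how to obtain either from an .h5 file with HiCExplorer is not covered here, and the function name is made up for this illustration.

```python
import numpy as np


def mask_bins_as_nan(matrix, masked_bins):
    """Return a float copy with the rows and columns of masked bins as NaN."""
    dense = np.array(matrix, dtype=float)
    idx = np.asarray(masked_bins, dtype=int)
    dense[idx, :] = np.nan
    dense[:, idx] = np.nan
    return dense


# Toy 3 x 3 matrix in which bin 1 is masked.
example = mask_bins_as_nan(np.ones((3, 3)), [1])
print(example)
```

For a large sparse matrix, converting to a dense array may not fit into memory, so this is best applied to a single chromosome or region at a time.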

None values in Y axis of Hi-C data

I upgraded HiCExplorer version and now hicPlotTADs outputs (with transform = log1p) the following axis:

hicPlotTADs visualization track

I was wondering if something like this:

is possible in hicPlotTADs.

Thank you,
A.

interaction arc plots

I'm having trouble generating link/arc plots using hicPlotTADs. My link file has the following format (tab delimited):

############
chrX 11470000 11480000 chr2L 160000 170000 4
chrX 11470000 11480000 chr2L 290000 300000 3
chrX 11470000 11480000 chr2L 480000 490000 4
############

I'm getting the attribute error:

Traceback (most recent call last):
File "//anaconda/envs/snowflakes/bin/hicPlotTADs", line 7, in
main()
File "//anaconda/envs/snowflakes/lib/python2.7/site-packages/hicexplorer/hicPlotTADs.py", line 333, in main
trp.plot(args.outFileName, *region, title=args.title)
File "//anaconda/envs/snowflakes/lib/python2.7/site-packages/hicexplorer/trackPlot.py", line 189, in plot
track.plot(axis, label_axis, chrom, start, end)
File "//anaconda/envs/snowflakes/lib/python2.7/site-packages/hicexplorer/trackPlot.py", line 1701, in plot
if interval.start < region_start and interval.end > region_end:
AttributeError: 'Interval' object has no attribute 'start'

###########

Is this a formatting problem in my links file, or a bug in hicPlotTADs.py? If the former, can you post an example links file in the test folder for testing?

documentation improvements

I think we should adopt the documentation strategy of deeptools as much as possible. Highlight the different modules from QC to HiCdiff.

Warnings when normalizing and error when plotting TADs

When I'm correcting with

hicCorrectMatrix correct -m HepG2_150000.npz -o HepG2_150000.corrected.npz --filterThreshold -1 5 -n 10

I can see the following warnings. Are they normal?

Warning. No bins removed for chromosome chrY using thresholds -1.0 5.0
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/lib64/python2.7/site-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py:3569: RuntimeWarning: Invalid value encountered in median
  RuntimeWarning)
Warning. No bins removed for chromosome chrM using thresholds -1.0 5.0
Bins that are MAD outliers after merge (19.99%) out of: 4129
{'chrX': 196, 'chr13': 216, 'chr12': 196, 'chr11': 185, 'chr10': 148, 'chr17': 50, 'chr16': 130, 'chr15': 184, 'chr14': 163, 'chr19': 28, 'chr18': 81, 'chr22': 113, 'chr20': 62, 'chr21': 114, 'chr7': 222, 'chr6': 171, 'chr5': 225, 'chr4': 226, 'chr3': 275, 'chr2': 326, 'chr1': 350, 'chr9': 279, 'chr8': 189}
/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/iterativeCorrection.py:49: RuntimeWarning: divide by zero encountered in divide
  s = 1.0 / s
masked bins were restored

I see "Bins that are MAD outliers after merge (19.99%) out of: 4129". What does it mean?

Once I have the corrected npz.h5, I want to look for TADs:

hicFindTADs TAD_score -m HepG2_150000.corrected.npz.h5 --minDepth 300000 --maxDepth 900000 --step 150000 -p 20 -o tad_score.txt
hicFindTADs find_TADs --lookahead 4 --outPrefix find_tads --tadScoreFile tad_score.txt

and then plot regions with them:

hicPlotTADs --tracks track.txt --region X:99974316-101359967 --dpi 300 -out plots_tad.png

However, I've got an error:

Number of poor regions to remove: 4129
{'chrX': 196, 'chr13': 216, 'chr12': 196, 'chr11': 185, 'chr10': 148, 'chr17': 50, 'chr16': 130, 'chr15': 184, 'chr14': 163, 'chr19': 28, 'chr18': 81, 'chr22': 113, 'chr20': 62, 'chr21': 114, 'chr7': 222, 'chr6': 171, 'chr5': 225, 'chr4': 226, 'chr3': 275, 'chr2': 326, 'chr1': 350, 'chr9': 279, 'chr8': 189}
found existing 4129 nan bins that will be included for masking 

*ERROR*
Matrix contains negative values.
log1p transformation can not be applied to 
values in matrix: HepG2_150000.corrected.npz.h5

Use npz file I generated

I have a matrix that I generated in npz format. I've realized that, besides the matrix, I also need chrNameList, startList, endList, and extraList. What should they contain?

Thank you.

cooler format support

It would be nice to support more formats in the long run. Cooler is one of them.

hic differential analysis

comparing two hic samples and obtaining contacts that are differential between both

hicFindEnrichedContacts, doubts about method

I see that the --method option of hicFindEnrichedContacts accepts many values:

z-score,t-score,residuals,obs/exp,p-value,pearson,nbinom-p-value,nbinom-expected,nbinom-est-dist,log-norm,chi-squared,none

What is residuals?
For p-value, how is it calculated?

Which method is better? Are some of them better under some conditions or assumptions? What is your experience?

Thank you.

TADs are called for centromeres

Huge TADs that span centromeres in mammalian data are being wrongly called. This is probably caused by bins around centromeres that are removed, yet the algorithm is not aware of this gap.

hicPlotMatrix: direction of the diagonal

What is the reason that the diagonal in hicPlotMatrix goes from the bottom left corner to upper right if the parameter --region is used but from the top left corner to bottom right if the parameter --chromosomeOrder is set? Is this on purpose?

NaN to float failed

Encountered on my work on the develop branch.

|  processing chromosome chrM
|  /home/bag/miniconda3/envs/mulled-v1-9edb84bafdd4afcd296070604e305ff8b4927b823b3b2fdf9db528a123d76d21/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
|  out=out, **kwargs)
|  /home/bag/miniconda3/envs/mulled-v1-9edb84bafdd4afcd296070604e305ff8b4927b823b3b2fdf9db528a123d76d21/lib/python2.7/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
|  ret = ret.dtype.type(ret / rcount)
|  Traceback (most recent call last):
|  File "/home/bag/miniconda3/envs/mulled-v1-9edb84bafdd4afcd296070604e305ff8b4927b823b3b2fdf9db528a123d76d21/bin/hicPlotDistVsCounts", line 7, in <module>
|  main()
|  File "/home/bag/.local/lib/python2.7/site-packages/hicexplorer/hicPlotDistVsCounts.py", line 303, in main
|  mean_dict[matrix_file] = compute_distance_mean(hic_ma, maxdepth=args.maxdepth, perchr=args.perchr)
|  File "/home/bag/.local/lib/python2.7/site-packages/hicexplorer/hicPlotDistVsCounts.py", line 197, in compute_distance_mean
|  HiCMatrix.hiCMatrix.fit_cut_intervals(cut_intervals[chrname]))
|  File "/home/bag/.local/lib/python2.7/site-packages/hicexplorer/HiCMatrix.py", line 458, in fit_cut_intervals
|  median = int(np.median(np.diff(start)))
|  ValueError: cannot convert float NaN to integer
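The last frames of the traceback can be reproduced in isolation: `np.diff` of fewer than two start positions is empty, `np.median` of an empty array is NaN, and `int()` of NaN raises this ValueError. That chrM has too few bins here is an assumption; the traceback only shows that the median was NaN. A minimal sketch with a hypothetical guard function `median_bin_size`:

```python
import warnings

import numpy as np


def median_bin_size(starts):
    """Median distance between consecutive bin starts, or None if undefined."""
    diffs = np.diff(np.asarray(starts))
    if diffs.size == 0:
        # Zero or one bin: there is no distance to take the median of.
        return None
    return int(np.median(diffs))


# Reproduce the failure: a single start position yields an empty diff.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_median = np.median(np.diff(np.array([0])))

try:
    int(empty_median)
    failed = False
except ValueError:
    failed = True

guarded = median_bin_size([0])
regular = median_bin_size([0, 5000, 10000])
```

Skipping chromosomes for which the guard returns None would be one possible way to avoid the crash.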

AttributeError: 'list' object has no attribute 'shape'

I am calling this command:

hicPlotMatrix --log1p --dpi 300 -m HepG2_150000.npz -o heatmap150000.png

and I get

Traceback (most recent call last):
  File "/usr/bin/hicPlotMatrix", line 5, in <module>
    pkg_resources.run_script('HiCExplorer==1.3', 'hicPlotMatrix')
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1455, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/EGG-INFO/scripts/hicPlotMatrix", line 7, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicPlotMatrix.py", line 374, in main
    ma = HiCMatrix.hiCMatrix(args.matrix)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/HiCMatrix.py", line 56, in __init__
    hiCMatrix.load_npz(matrixFile)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/HiCMatrix.py", line 148, in load_npz
    assert len(cut_intervals) == matrix.shape[0], \
AttributeError: 'list' object has no attribute 'shape'

After the check for whether the lower triangle is filled, matrix is indeed a list.

Two times 'color' in galaxy wrapper hicPlotTADs.xml

In line 57 the color of the track is set, and in line 61 the border color is set. Both use the key 'color'. I am wondering whether one could override the value of the other. If both are set, the output file will contain 'color' twice in consecutive lines. This behavior can be seen, for example, in the test cases that are currently failing because of the new pandas version: https://travis-ci.org/maxplanck-ie/HiCExplorer/builds/237228087?utm_source=github_status&utm_medium=notification line 1645 or 1666.
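Whether one value overrides the other depends on how the generated file is parsed. Assuming it is an INI-style file read with a parser like Python 3's `configparser` (an assumption; the section name and color values below are made up), a repeated key is either rejected or silently resolved to the last value:

```python
import configparser

# Hypothetical track section with the key 'color' written twice.
TRACK = """
[tads]
color = red
color = black
"""


def read_color(text, strict):
    """Parse the track text and return the value stored under 'color'."""
    parser = configparser.ConfigParser(strict=strict)
    parser.read_string(text)
    return parser["tads"]["color"]


# Non-strict parsing keeps only the last occurrence.
lenient_color = read_color(TRACK, strict=False)

# Strict parsing (the Python 3 default) refuses the duplicate.
try:
    read_color(TRACK, strict=True)
    strict_error = None
except configparser.DuplicateOptionError as err:
    strict_error = err
```

In both cases the first value is lost, so using distinct keys for track color and border color would avoid the ambiguity.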

Log file for hicBuildMatrix output

It would be nice if the output summary of hicBuildMatrix could be saved to a log file.
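Until such an option exists, the output of any command can be captured from the calling side. The sketch below is a generic workaround, not a HiCExplorer feature; the helper name `run_with_log` is hypothetical, and a stand-in command is used in place of a real hicBuildMatrix call:

```python
import os
import subprocess
import sys
import tempfile


def run_with_log(command, log_path):
    """Run a command, writing its stdout and stderr to log_path."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in command; replace the list with the real command line.
command = [sys.executable, "-c", "print('summary line')"]

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "build_matrix.log")
    return_code = run_with_log(command, log_path)
    with open(log_path) as handle:
        log_text = handle.read()
```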

hicPlotDistVsCounts table of counts

Is it possible to get the data from hicPlotDistVsCounts? In a txt file, for instance.

Thanks.

No TADs predicted

Hi,

I am using the latest version of HiCExplorer, installed via pip install HiCExplorer.
I am using a resolution of 50K over the mouse mm10 genome
I am using the following command to calculate the scores:
hicFindTADs TAD_score --matrix fibroblasts_50K.npz --outFileName fibroblasts_tadscores.txt --minDepth 150000 --maxDepth 500000 --numberOfProcessors 8
I am using the following command to predict the TADs:
hicFindTADs find_TADs --tadScoreFile fibroblasts_tadscores.txt_zscore_matrix.h5 --outPrefix fibroblasts_tads

However, no TADs are predicted in any of my samples.

INFO:hicFindTADs:Number of boundaries for delta 0.01 and pval 0.01: 0

In order to get TADs, I have to set --pvalue 1. I have also plotted the values in fibroblasts_tads_score.bedgraph; this is the result (red lines delimit the different chromosomes):

What should I do?

TADs are there for sure, this is a part of the chromosome 1 (TADs shown -black lines- come from another algorithm):

hicPlotTADs: IndexError: index 0 is out of bounds for axis 0 with size 0

Hi,

When transform = -log in tracks.txt, everything is OK. Command:

hicPlotTADs --tracks tracks.txt --region 1:20000000:22000000 --dpi 300 -out test.png

When transform = log, it throws:

time initializing track(s):
21.2699968815
Figure size in cm is 40 x 14.42832. Dpi is set to 300
setting min, max values for track 2. [hic] to: 323.714421836, 329.513362322
Traceback (most recent call last):
  File "/usr/bin/hicPlotTADs", line 5, in <module>
    pkg_resources.run_script('HiCExplorer==1.3', 'hicPlotTADs')
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1455, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/EGG-INFO/scripts/hicPlotTADs", line 7, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/hicPlotTADs.py", line 320, in main
    trp.plot(args.outFileName, *region, title=args.title)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/trackPlot.py", line 177, in plot
    track.plot(axis, label_axis, chrom, start, end)
  File "/usr/lib/python2.7/site-packages/HiCExplorer-1.3-py2.7.egg/hicexplorer/trackPlot.py", line 949, in plot
    if ticks[0] == 0:
IndexError: index 0 is out of bounds for axis 0 with size 0

hicPlotMatrix None value

Dear HiCExplorer,

I tried to draw a matrix with hicPlotMatrix.

The value in the Y-axis legend shows None.

The versions I used are listed below.
matplotlib==2.0.0
numpy==1.12.0
HiCExplorer===1.7.2-28-g9a5379f

I also tested the pip-installed version, but it produced the same result.

Here is my plot.

TADs and subTADs

I'd like to differentiate clear TADs with strong boundaries from those with weaker boundaries. How can I do that? Which values or thresholds should I set?

Thanks.

Rename .bg output to bedgraph

The .bg files are not automatically recognized by IGV, and it is a bit confusing what file type the .bg extension refers to. I propose changing it to .bedGraph.
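As a stopgap, existing outputs can be renamed after the fact. A minimal sketch, assuming the files only need a different extension and no change to their content; the helper name and the example file names are hypothetical:

```python
import tempfile
from pathlib import Path


def rename_bg_to_bedgraph(directory):
    """Rename every *.bg file in `directory` to *.bedGraph."""
    renamed = []
    for path in sorted(Path(directory).glob("*.bg")):
        target = path.with_suffix(".bedGraph")
        path.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration on a temporary directory with two made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample_a_score.bg", "sample_b_score.bg"):
        (Path(tmp) / name).write_text("chr1\t0\t50000\t0.1\n")
    new_names = rename_bg_to_bedgraph(tmp)
    remaining_bg = list(Path(tmp).glob("*.bg"))
```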
