jernst98 / chromhmm

License: GNU General Public License v3.0

Java 100.00%

chromhmm's Introduction

See https://ernstlab.biolchem.ucla.edu/ChromHMM/ for more information on ChromHMM.

chromhmm's Issues

ChromHmm output segment file of varying segment size

Hi, I have a question regarding the ChromHMM output segment file.
I binarized at 200 bp. My output segments have a minimum length of 200 bp, but the maximum length varies. Is it normal for the software to segment genomic regions based on the enrichment of marks? I was a bit confused that each replicate produces quite different segment sizes with the same set of parameters and the same number of marks.

Thank you
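
For context, ChromHMM assigns a state to each fixed-size bin, and the dense BED output quoted further down this page shows runs of consecutive bins with the same state reported as one interval. A minimal Python sketch of that merging step (the function name and input layout are illustrative assumptions, not ChromHMM's code) shows why segment lengths are multiples of the bin size with no fixed maximum:

```python
from itertools import groupby

def merge_bins(states, bin_size=200, chrom="chr1"):
    """Collapse per-bin state calls into (chrom, start, end, state) segments."""
    segments = []
    start = 0
    for state, run in groupby(states):
        length = sum(1 for _ in run) * bin_size
        segments.append((chrom, start, start + length, state))
        start += length
    return segments

# Five 200-bp bins: one bin in state 3, three in state 1, one in state 2.
print(merge_bins([3, 1, 1, 1, 2]))
```

Because the state path differs between replicates, the run lengths, and therefore the segment sizes, differ as well.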

Exception in thread "main" java.lang.IllegalArgumentException

Hello,

I am new to ChromHMM. I tried to enter the example command and got this:

C:\Program Files (x86)\ChromHMM>java -mx4000M -jar ChromHMM.jar LearnModel SAMPLEDATA_HG18 OUTPUTSAMPLE 10 hg18
Exception in thread "main" java.lang.IllegalArgumentException: OUTPUTSAMPLE does not exist and could not be created!
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:6351)

Any idea of this? Thanks.
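
The message says ChromHMM could not create the output directory. One possible cause, offered as an assumption, is that the working directory (here under `C:\Program Files (x86)`) is not writable for the current user. The Python sketch below is a hypothetical pre-flight check, not part of ChromHMM; it tries to create the directory and reports whether that succeeded:

```python
import os
import tempfile

def ensure_output_dir(path):
    """Return True if `path` exists as a directory or could be created."""
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:
        # Typical reasons: no write permission, or a parent is not a directory.
        return False
    return os.path.isdir(path)

# Demonstration in a scratch location that is normally writable.
with tempfile.TemporaryDirectory() as scratch:
    print(ensure_output_dir(os.path.join(scratch, "OUTPUTSAMPLE")))
```

If the check fails in the install directory, pointing the output argument at a folder under the user's home directory is one thing to try.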

Model fit and uncertainty of the estimates

Hi @jernst98

I would like to ask if there is a way to quantify

how well the model constructed by ChromHMM fits the data, and
the uncertainty of the estimated probabilities (emissions/transitions).

nullpointerexception error while running CompareModels

Hi,

I am getting a nullpointerexception error while running CompareModels and can't quite troubleshoot what is wrong.
This is similar to this issue here (#27) but the difference being I am using the emission file generated directly from ChromHMM.

java -mx4000M -jar ChromHMM.jar CompareModels treatment1/results/emissions_8.txt treatment2/results/ 1v2

Exception in thread "main" java.lang.NullPointerException
	at edu.mit.compbio.ChromHMM.StateAnalysis.makeModelEmissionCompare(StateAnalysis.java:3820)
	at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13585)

head treatment1/results/emissions_8.txt
State (Emission order)	mark1	mark2	mark3
1	2.1727870800772254E-72	2.4236493512549204E-47	0.858399716497905
2	6.697771414533143E-102	0.9995916808920738	0.9982136561038701
3	7.62884882600209E-115	0.017388585226759533	2.9595321706205998E-248
4	0.8868409570279551	0.9546918414337829	1.2736435867177494E-166
5	1.0	0.9999999999997488	0.9985053640320722
6	0.9658642491977975	1.2972980770896698E-153	9.23970285249423E-242
7	0.99994958182997	4.055123944305359E-90	0.9951794781672061
8	1.2784797488695767E-211	9.67225807800232E-179	7.774716490529544E-269

Thanks!

Exception in thread "main" java.lang.IllegalArgumentException: . is an invalid strand!

I have downloaded and unzipped the latest ChromHMM version, and the LearnModel test worked on the supplied test data.

Now I have tried to work with my own bed files.

less ES-Bruce4/cell-markfile.txt
ES-Bruce4       CTCF    CTCF_ES-Bruce4_ENCFF001YAC.bed.gz
ES-Bruce4       EP300   EP300_ES-Bruce4_ENCFF001YAD.bed.gz
ES-Bruce4       H3K27ac H3K27ac_ES-Bruce4_ENCFF001XWR.bed.gz
ES-Bruce4       H3K27me3        H3K27me3_ES-Bruce4_ENCFF001XWQ.bed.gz
ES-Bruce4       H3K36me3        H3K36me3_ES-Bruce4_ENCFF001XWS.bed.gz
ES-Bruce4       H3K4me1 H3K4me1_ES-Bruce4_ENCFF001XWT.bed.gz
ES-Bruce4       H3K4me3 H3K4me3_ES-Bruce4_ENCFF001XWU.bed.gz
ES-Bruce4       H3K9ac  H3K9ac_ES-Bruce4_ENCFF001XWO.bed.gz
ES-Bruce4       H3K9me3 H3K9me3_ES-Bruce4_ENCFF001XWP.bed.gz
ES-Bruce4       POLR2A  POLR2A_ES-Bruce4_ENCFF001YAE.bed.gz

ls ES-Bruce4/
cell-markfile.txt                      H3K4me1_ES-Bruce4_ENCFF001XWT.bed.gz
CTCF_ES-Bruce4_ENCFF001YAC.bed.gz      H3K4me3_ES-Bruce4_ENCFF001XWU.bed.gz
EP300_ES-Bruce4_ENCFF001YAD.bed.gz     H3K9ac_ES-Bruce4_ENCFF001XWO.bed.gz
H3K27ac_ES-Bruce4_ENCFF001XWR.bed.gz   H3K9me3_ES-Bruce4_ENCFF001XWP.bed.gz
H3K27me3_ES-Bruce4_ENCFF001XWQ.bed.gz  POLR2A_ES-Bruce4_ENCFF001YAE.bed.gz
H3K36me3_ES-Bruce4_ENCFF001XWS.bed.gz  Predictions

java -mx1600M -jar ~/ChromHMM/ChromHMM.jar BinarizeBed ~/ChromHMM/CHROMSIZES/mm9.txt ES-Bruce4 ES-Bruce4/cell-markfile.txt ChromHMM_binary
Exception in thread "main" java.lang.IllegalArgumentException: . is an invalid strand!
    at edu.mit.compbio.ChromHMM.Preprocessing.loadGrid(Preprocessing.java:419)
    at edu.mit.compbio.ChromHMM.Preprocessing.makeBinaryDataFromBed(Preprocessing.java:654)
    at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:5336)

The contents of the bed files look as follows:

chr1    4132274 4133274 .       18      .       1.83145 -1      -1
chr1    4286878 4287878 .       21      .       2.16857 -1      -1
chr1    4322211 4323211 .       14      .       1.49586 -1      -1
chr1    4335945 4336945 .       15      .       1.52915 -1      -1
chr1    4406497 4407497 .       72      .       7.25079 -1      -1
chr1    4481706 4482706 .       41      .       4.1287  -1      -1
chr1    4506335 4507335 .       31      .       3.14219 -1      -1
chr1    4758160 4759160 .       34      .       3.49081 -1      -1
chr1    4759678 4760678 .       16      .       1.60139 -1      -1
chr1    4904160 4905160 .       24      .       2.43038 -1      -1
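
The error indicates that BinarizeBed read `.` where it expected a strand of `+` or `-`. A hedged workaround sketch in Python follows; it rewrites the sixth column of a BED line to a default strand. Treating unstranded peak records as `+` is an assumption that may not suit every analysis, and the function name is illustrative:

```python
def set_default_strand(bed_line, default="+"):
    """Replace a '.' in the strand column (6th field) of a tab-delimited BED line."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) >= 6 and fields[5] == ".":
        fields[5] = default
    return "\t".join(fields)

example = "chr1\t4132274\t4133274\t.\t18\t.\t1.83145\t-1\t-1"
print(set_default_strand(example))
```

Only the strand column is touched; the `.` in the name column (4th field) is left as it is.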

ConvertGeneTable "main" java.util.NoSuchElementException

Dear all,

I am trying to create new COORDS and ANCHORFILES from the Ensembl file Homo_sapiens.GRCh38.103.gtf.gz, but I am getting this error message:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at edu.mit.compbio.ChromHMM.ConvertGeneTable.convertGeneTableToAnnotations(ConvertGeneTable.java:145)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:15228)
I have checked the previous thread about this issue, but it didn't help. I checked the chromsizes file (with Ensembl notation), and the problem doesn't seem to come from it; I think it comes from the GTF file. I opened the Ensembl and UCSC annotation files in R and they look similar: both are 9-column, tab-delimited files that provide the same information. Has anyone faced this problem before? Thanks in advance.

A screenshot of the error message from the terminal is attached.

ChromHMM v1.21 errors: java.lang.OutOfMemoryError

Hi there,

I am using ChromHMM conda version which was downloaded like this:

conda create -n chromhmm
conda activate chromhmm
conda install --yes -c bioconda chromhmm

Then I converted all input files from bam to bed

bedtools bamtobed -i input.bam > output.bed

And finally ran ChromHMM.sh BinarizeBed with default options

ChromHMM.sh BinarizeBed -b 200 GRCm38.genome $PWD/bed $PWD/cellmarkfiletable.tsv $PWD/binarized_bed

However, I got this error message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at edu.mit.compbio.ChromHMM.Preprocessing.makeBinaryDataFromBed(Preprocessing.java:1027)
        at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13054)

Versions:
bedtools:
v2.29.2

Java:
openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)

chromHMM:
This is Version 1.21 of ChromHMM (c) Copyright 2008-2012 Massachusetts Institute of Technology

Memory/CPUs available:
MemTotal: 1044135232 kB
MemFree: 214513088 kB
MemAvailable: 970925604 kB
Cached: 742086360 kB
SwapCached: 0 kB
SwapTotal: 6292476 kB
SwapFree: 6292476 kB
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Socket(s): 2

I am not entirely familiar with Java memory management yet. Is there a way to avoid the java.lang.OutOfMemoryError, and wasn't ChromHMM.sh supposed to manage that automatically?

Thanks!

Best regards,
Adrija

Comparing two different results

Hello!

I'm a beginner with ChromHMM, and I'm curious about how to compare two different results.
Many people want to know how their results differ. For example, I would like to know which chromatin state changes occur after a treatment.
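
One way to approach this, assuming both conditions were segmented with the same model so that state numbers are comparable, is to tabulate state pairs bin by bin. The Python sketch below uses hypothetical per-bin state lists rather than ChromHMM output files:

```python
from collections import Counter

def state_changes(before, after):
    """Count (state_before, state_after) pairs over aligned bins."""
    if len(before) != len(after):
        raise ValueError("both segmentations must cover the same bins")
    return Counter(zip(before, after))

control = [1, 1, 2, 3, 3]
treated = [1, 2, 2, 3, 1]
changes = state_changes(control, treated)
# Keep only bins whose state differs between conditions.
switched = {pair: n for pair, n in changes.items() if pair[0] != pair[1]}
print(switched)
```

The resulting table of switched pairs can then be ranked to see which state changes are most frequent after treatment.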

chromHMM overlap enrichments error for input string: "E"

I get the following error when I try to use OverlapEnrichments on a model with 18 states.

Computing Enrichments...
Exception in thread "main" java.lang.NumberFormatException: For input string: "E"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at edu.mit.compbio.ChromHMM.StateAnalysis.enrichmentMax(StateAnalysis.java:691)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13336)

The problem is apparently the first row of the third column because the input string value in the command changes to whatever I put in the model file in that position unless the value I change it to is an integer, in which case it gives me this error:

Computing Enrichments...
Exception in thread "main" java.lang.IllegalArgumentException: Binsize of 200 does not agree with input segment 18 11 200 -5142465.9198876005 200
at edu.mit.compbio.ChromHMM.StateAnalysis.enrichmentMax(StateAnalysis.java:694)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13336)

Is there something wrong with my segmentation file? It's labeled as model_18.txt from the original LearnModel output, and I thought the E was just meant to show that the states were ordered based on emission.

Thank you!

Coordinates in CpGIsland.MSU7.bed not assigned to any state

I ran OverlapEnrichment on my data, but I got this error:
Exception in thread "main" java.lang.IllegalArgumentException: Coordinates in CpGIsland.MSU7.bed not assigned to any state
at edu.mit.compbio.ChromHMM.StateAnalysis.enrichmentMax(StateAnalysis.java:1065)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:10015)
I was wondering how I can fix this error. I made the CpG islands file by running EMBOSS.
Thanks
Maryam

output truncates chromosomes

I ran LearnModel on binarized BEDs to completion without error. HTML and dense_BEDs are generated, but the output coordinates are all randomly truncated halfway down the genome. Any help resolving this issue would be appreciated.

All output BEDs exclude coordinates >100Mb in chr1, e.g.

$ grep chr1 WT_5_dense.bed | sort -k3,3n | tail
chr1 99565000 99565600 3 0 . 99565000 99565600 51,255,153
chr1 99565600 99566000 4 0 . 99565600 99566000 255,255,0
chr1 99566000 99567200 3 0 . 99566000 99567200 51,255,153
chr1 99567200 99567400 5 0 . 99567200 99567400 204,0,51
chr1 99567400 99569400 2 0 . 99567400 99569400 0,102,0
chr1 99569400 99571400 5 0 . 99569400 99571400 204,0,51
chr1 99571400 99576800 2 0 . 99571400 99576800 0,102,0
chr1 99576800 99577400 5 0 . 99576800 99577400 204,0,51
chr1 99577400 99578600 4 0 . 99577400 99578600 255,255,0
chr1 99578600 99582400 2 0 . 99578600 99582400 0,102,0

The chromsizes file looks to be correct

$ cat ../CHROMSIZES/hg38.txt | grep chr1
chr1 248956422

My input BEDs certainly have coordinates >100Mb.

$ grep chr1 ../../bed/WT-K27Ac_Apr09.bed | sort -k3,3n | tail
chr1 248946385 248946422 NB552575:103:HJKCVBGXJ:3:22406:11676:9181/2 0 -
chr1 248946385 248946422 NB552575:103:HJKCVBGXJ:3:22512:15394:19501/1 0 -
chr1 248946385 248946422 NB552575:103:HJKCVBGXJ:4:12510:23537:3827/1 0 -
chr1 248946385 248946422 NB552575:103:HJKCVBGXJ:4:13410:21523:15639/2 0 -
chr1 248946386 248946422 NB552575:103:HJKCVBGXJ:3:13405:6777:19110/2 0 -
chr1 248946388 248946422 NB552575:103:HJKCVBGXJ:1:22312:16836:14295/1 0 -
chr1 248946389 248946422 NB552575:103:HJKCVBGXJ:1:11311:7160:11685/2 0 -
chr1 248946389 248946422 NB552575:103:HJKCVBGXJ:1:12208:18474:1096/2 0 -
chr1 248946389 248946422 NB552575:103:HJKCVBGXJ:1:23312:23278:7231/1 0 -
chr1 248946393 248946422 NB552575:103:HJKCVBGXJ:4:21603:4777:6602/2 18 -

Removing multi-mapping reads when using ChromHMM

Hi,
thank you for your work and your amazing software.

I'm using ChromHMM to build models with some histone marks. In my particular pipeline, as is common in lots of pipelines, I filter out multi-mapping reads. There are some marks (e.g. H3K9me3) which have large proportions of multi-mapping reads, so I was wondering: because of how ChromHMM works, what would be better: to input my BAM files with "all" the information (unfiltered), or to input my filtered BAMs?

Can ChromHMM benefit from inputting multimapping reads? Do you have any experience with this?

thank you for the help!

inputdir with symlinks

I was trying to run BinarizeBam with the inputdir filled with soft symlinks to my bam files. I got this:
WARNING not able to load any data for ...
Is there something that prevents the use of symlinks? It would be handy to be able to use them if your BAM files are all over the place and you don't want to create extra copies.

ChromHMM state transitions

I'm running ChromHMM with cells before and after knockdown of a gene, with 4 histone marks. I was playing around with the state transition plot and am wondering what "From" and "To" mean. I was assuming "From" means the first sample group defined in the config file, but it seems to be the other way round?
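
In a hidden Markov model, a transition probability generally refers to moving from the state at one genomic bin to the state at the next bin, so "From" and "To" would index states rather than sample groups. As an illustration, the Python sketch below parses `transitionprobs` lines of the form that appears in the model files quoted elsewhere on this page (from-state, to-state, probability) into a matrix; the two-state values are made up:

```python
def parse_transitions(lines, num_states):
    """Build matrix[from_state - 1][to_state - 1] from 'transitionprobs' lines."""
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[0] == "transitionprobs":
            src, dst = int(fields[1]), int(fields[2])
            matrix[src - 1][dst - 1] = float(fields[3])
    return matrix

model_lines = [
    "transitionprobs 1 1 0.9",
    "transitionprobs 1 2 0.1",
    "transitionprobs 2 1 0.3",
    "transitionprobs 2 2 0.7",
]
matrix = parse_transitions(model_lines, num_states=2)
# Each row is a "From" state and should sum to 1 over all "To" states.
print([sum(row) for row in matrix])
```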

Push data files into git repository

Hi,
Could you push the data files that are available in the zip file from http://compbio.mit.edu/ChromHMM/ (i.e. the ANCHORFILES, CHROMSIZES, COORDS and SAMPLEDATA_HG18 folders)?
This could be useful to get a clean tarball with all files required for testing ChromHMM.

Thanks

Exception in thread "main" java.lang.UnsatisfiedLinkError:

Hi,

Thanks for your impressive tools! I used it without error some years ago, but with the same conda environment it no longer works; see the error below:

Exception in thread "main" java.lang.UnsatisfiedLinkError: /opt/share/FLOCAD/userspace/cpichot/miniconda3/envs/chromHMM/x86_64-conda_cos6-linux-gnu/sysroot/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.131.x86_64/jre/lib/amd64/libnet.so: libgconf-2.so.4: cannot open shared object file: No such file or directory
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1968)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1893)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1875)
	at java.lang.Runtime.loadLibrary0(Runtime.java:849)
	at java.lang.System.loadLibrary(System.java:1088)
	at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:67)
	at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:47)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.InetAddress.<clinit>(InetAddress.java:291)
	at sun.font.FcFontConfiguration.getFcInfoFile(FcFontConfiguration.java:352)
	at sun.font.FcFontConfiguration.readFcInfo(FcFontConfiguration.java:425)
	at sun.font.FcFontConfiguration.init(FcFontConfiguration.java:94)
	at sun.font.FcFontConfiguration.<init>(FcFontConfiguration.java:76)
	at sun.awt.X11FontManager.createFontConfiguration(X11FontManager.java:747)
	at sun.font.SunFontManager$2.run(SunFontManager.java:431)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.font.SunFontManager.<init>(SunFontManager.java:376)
	at sun.awt.X11FontManager.<init>(X11FontManager.java:32)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at java.lang.Class.newInstance(Class.java:383)
	at sun.font.FontManagerFactory$1.run(FontManagerFactory.java:83)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.font.FontManagerFactory.getInstance(FontManagerFactory.java:74)
	at sun.font.SunFontManager.getInstance(SunFontManager.java:250)
	at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:264)
	at sun.java2d.SunGraphics2D.getFontMetrics(SunGraphics2D.java:819)
	at org.tc33.jheatchart.HeatChart.measureComponents(HeatChart.java:1411)
	at org.tc33.jheatchart.HeatChart.getChartImage(HeatChart.java:1301)
	at org.tc33.jheatchart.HeatChart.getChartImage(HeatChart.java:1354)
	at edu.mit.compbio.ChromHMM.Util.printImageToSVG(Util.java:140)
	at edu.mit.compbio.ChromHMM.ChromHMM.printEmissionImage(ChromHMM.java:1547)
	at edu.mit.compbio.ChromHMM.ChromHMM.trainParametersParallel(ChromHMM.java:11563)
	at edu.mit.compbio.ChromHMM.ChromHMM.buildModel(ChromHMM.java:1039)
	at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13867)

Interestingly, the emission and transition files of the model are still written; see the console output:

Using 12 threads for Baum-Welch training
Writing to file /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometry_oct2018/machine_learning/building_promoter_states_model/chromoHMM/models_based_on_epigenomeLandscape/test_on_all_chIPseq_data/output_leran_model_leaf_melo_chipMedipData_defaultOption/transitions_8.txt
Writing to file /K/FLOCAD/Bioinfo/Workspace/ClementP/DavidL/ATACseq/ATACseq_cytometry_oct2018/machine_learning/building_promoter_states_model/chromoHMM/models_based_on_epigenomeLandscape/test_on_all_chIPseq_data/output_leran_model_leaf_melo_chipMedipData_defaultOption/emissions_8.txt

Do you know how I can fix the Java error so that I can continue the analysis?

Thanks in advance!

Credible interval of fold-enrichment (OverlapEnrichment functionality)

Dear developer,

the function OverlapEnrichment computes fold-enrichment for each combination of state and external annotation.
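
For reference, fold enrichment of this kind is commonly defined as the fraction of a state's bases that fall in the annotation, divided by the fraction of the genome covered by the annotation. The Python sketch below encodes that definition; the argument names are illustrative, and the exact computation inside ChromHMM should be checked against its manual:

```python
def fold_enrichment(overlap_bp, state_bp, annotation_bp, genome_bp):
    """(overlap / state) / (annotation / genome), all in base pairs."""
    if min(state_bp, annotation_bp, genome_bp) <= 0:
        raise ValueError("state, annotation and genome sizes must be positive")
    return (overlap_bp / state_bp) / (annotation_bp / genome_bp)

# A state covering 1 Mb with 100 kb inside an annotation that spans 1% of the genome.
print(fold_enrichment(100_000, 1_000_000, 30_000_000, 3_000_000_000))
```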

Would it be possible to obtain e.g. 95% confidence intervals or credible intervals of the fold-enrichment?

P.S. Any idea how this functionality might be expanded (in case not yet available)?
Given independent and identically distributed observations (here genomic bins with chromatin states), I would have applied bootstrapping with replacement, however, here we have dependent observations.

Best

Coverage as Learnmodel input feature

Hi, @jernst98

Thanks for your great work!

Together with @makc-sel we are working on building models for Apis mellifera based on 4 ChIP-seq data sets and WGBS data. We noticed that WGBS calls processed similarly to this paper result in 0, 1 and lots of 2 (missing data).

But for ChIP-seq input there are only 0s and 1s after binarization. I'd also like to input information about the mappability of regions in the ChIP-seq data. This approach would make it possible to distinguish bins without a peak (0) from bins not covered by sequencing (2).

So the question is the following: is it better to

create a standalone track with coverage by binarizing the BAM of a non-antibody control sample, or

to assess the coverage separately and replace 0s with 2s in the binarized ChIP-seq data in low-coverage regions?

running ChromHMM LearnModel on a headless node (no X11)

Hi,

Is it possible to run ChromHMM LearnModel on a Linux node with no X11?

I am running it like below and get the following error:

java -mx1600M -jar /bi/home/rocks-av/ChromHMM/ChromHMM.jar LearnModel -p 0 -nobrowser /bi/group/cegx/CEGX_Run434/CEGX_Run434-12345678/inputbams/BINARIES /bi/group/cegx/CEGX_Run434/CEGX_Run434-12345678/inputbams/CHROMHMM 2 hg38
Using 8 threads for Baum-Welch training
Writing to file /bi/group/cegx/CEGX_Run434/CEGX_Run434-12345678/inputbams/CHROMHMM/transitions_2.txt
Writing to file /bi/group/cegx/CEGX_Run434/CEGX_Run434-12345678/inputbams/CHROMHMM/emissions_2.txt
connect localhost port 6044: Connection refused
Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using 'localhost:10.0' as the value of the DISPLAY variable.
        at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
        at sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:65)
        at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:115)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:74)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at java.awt.GraphicsEnvironment.createGE(GraphicsEnvironment.java:103)
        at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:82)
        at java.awt.image.BufferedImage.createGraphics(BufferedImage.java:1181)
        at org.tc33.jheatchart.HeatChart.measureComponents(HeatChart.java:1406)
        at org.tc33.jheatchart.HeatChart.getChartImage(HeatChart.java:1301)
        at org.tc33.jheatchart.HeatChart.getChartImage(HeatChart.java:1354)
        at edu.mit.compbio.ChromHMM.Util.printImageToSVG(Util.java:140)
        at edu.mit.compbio.ChromHMM.ChromHMM.printEmissionImage(ChromHMM.java:1215)
        at edu.mit.compbio.ChromHMM.ChromHMM.trainParametersParallel(ChromHMM.java:4817)
        at edu.mit.compbio.ChromHMM.ChromHMM.buildModel(ChromHMM.java:816)
        at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:6412)
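
The stack trace shows the failure inside AWT while the emission heat map is drawn. The JVM has a standard system property, `java.awt.headless`, that asks AWT to work without a display. Whether this is sufficient for every ChromHMM version is an assumption. The Python sketch below only assembles such a command line; the jar and directory names are placeholders:

```python
import shlex

def chromhmm_command(jar, args, heap="1600M", headless=True):
    """Assemble a java command line for ChromHMM as a list of arguments."""
    command = ["java", f"-Xmx{heap}"]
    if headless:
        # Standard JVM property: run AWT without an X11 display.
        command.append("-Djava.awt.headless=true")
    command += ["-jar", jar] + list(args)
    return command

cmd = chromhmm_command(
    "ChromHMM.jar",
    ["LearnModel", "-p", "0", "-nobrowser", "BINARIES", "CHROMHMM", "2", "hg38"],
)
print(shlex.join(cmd))
```

The property has to come before `-jar`, since anything after the jar name is passed to the program rather than to the JVM.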

Sliding window bins vs. separate bins

Dear Developers,

I am currently using ChromHmm using custom input (BED files binarized by myself), with the major difference being that the bins are overlapping, i.e. created using a sliding window approach (e.g. bin size 200bp, slide = 50bp). Could such an input cause any problems/biases in the model learning procedure?

I am mostly interested in the effects of such a binarization on model learning, and not as much in the neighborhood enrichment and the other annotations provided by ChromHMM.

The goal of doing this is just to increase the resolution of the analysis.

Thanks!

Bug: ChromHMM in screen

When I use the screen command and then run ChromHMM in the newly created screen session, it aborts with:
Writing to file OUTPUTSAMPLE/transitions_10.txt
Writing to file OUTPUTSAMPLE/emissions_10.txt
Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using 'localhost:11.0' as the value of the DISPLAY variable.
at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
at sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:65)
at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:115)
at java.security.AccessController.doPrivileged(Native Method)
at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:74)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at java.awt.GraphicsEnvironment.createGE(GraphicsEnvironment.java:103)
at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:82)
at java.awt.image.BufferedImage.createGraphics(BufferedImage.java:1181)
at org.tc33.jheatchart.HeatChart.measureComponents(HeatChart.java:1406)
at org.tc33.jheatchart.HeatChart.getChartImage(HeatChart.java:1301)
at org.tc33.jheatchart.HeatChart.getChartImage(HeatChart.java:1354)
at edu.mit.compbio.ChromHMM.Util.printImageToSVG(Util.java:140)
at edu.mit.compbio.ChromHMM.ChromHMM.printEmissionImage(ChromHMM.java:1285)
at edu.mit.compbio.ChromHMM.ChromHMM.trainParameters(ChromHMM.java:6204)
at edu.mit.compbio.ChromHMM.ChromHMM.buildModel(ChromHMM.java:885)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:10485)

discrepancy between X_Y_overlap.png and X_Y_overlap.txt

Hi,
I wanted to redo the overlap plot based on the .txt file output. However, the values do not match the plot.
E.g. for state 11 the enrichment seems to be higher for TSS2kb than for TSS (see image left). However, according to the txt file, the genome value is 30 and the gene value is 11 (see image middle).

Dividing each column by its base value and scaling it all to one (the genome column is treated differently) creates a fairly similar plot; however, it's not identical (e.g. the difference described above persists).

So it seems I am either misinterpreting the plot or the table, but I am not sure which one.

Chromatin marks for ChIP seq data

Hi @jernst98
I have a question about using chromHMM in a standard ChIP seq pipeline (mostly used for determining binding of multiple transcription factors and DNA/mtDNA binding proteins) for the functional annotation of peaks. This might be a very basic question but I am a bit unsure of what marks I should use in the cellmarkfile.txt

I tried with all the marks mentioned here: https://static-content.springer.com/esm/art%3A10.1038%2Fnbt.1662/MediaObjects/41587_2010_BFnbt1662_MOESM5_ESM.pdf

but firstly it takes very long to run and secondly I'm not quite sure if all the marks would be relevant in such cases.

I would really like to know your opinion on this

Thank you.

ChromHMM with additional genomes

ChromHMM's usability can be improved for users working with genomes that aren't included with the software. If ChromHMM is installed in a user's local directory, they can manually generate files and put them in the ANCHORFILES, COORDS, and CHROMSIZES directories. But for the purpose of running ChromHMM on a computing cluster or as part of a publicly distributed pipeline, users don't always have permission to write to the location that ChromHMM is installed and might not have the skills to set up their own local installation. It would help if the location of these files could be specified, and weren't required to be in the directory where ChromHMM is installed. Going a step further, it would be great if ChromHMM included commands to generate these files from a GTF/GFF file (COORDS and ANCHORFILES) and FASTA file (CHROMSIZES).

error in binarizeSignal

Hi! I am running ChromHMM BinarizeSignal on a signal matrix with the command "java -Xmx16G -jar ChromHMM.jar BinarizeSignal signalFile binaryFile", and occasionally get the error shown below.

Exception in thread "main" java.util.NoSuchElementException
at java.base/java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at edu.mit.compbio.ChromHMM.Preprocessing.makeBinaryDataFromSignalUniform(Preprocessing.java:3207)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13197)

Do you have any suggestions?

Keep reporting error when I try to run CompareModels

Hi ChromHMM community,
I am trying to compare the model learned from my data (15 states) to the 25-state reference model using the command:
java -mx40000M -jar ../Software/ChromHMM/ChromHMM.jar CompareModels ./emissions_15.txt comparedir/ CompareModels_25_vs_15_output &
or
java -mx40000M -jar ../Software/ChromHMM/ChromHMM.jar CompareModels ./emissions_models_25_imputed12marks.txt comparedir/ CompareModels_25_vs_15_output
But Java keeps reporting the same error:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at edu.mit.compbio.ChromHMM.StateAnalysis.makeModelEmissionCompare(StateAnalysis.java:3709)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13364)

My 15 states model file(emissions_15.txt) is like:

15 9 E -7.269922091477281E7 200
probinit 1 5.235622776341956E-17
probinit 2 1.2517148579080288E-143
probinit 3 1.904313880953911E-90
probinit 4 3.503455166677451E-55
probinit 5 0.0
probinit 6 0.0
probinit 7 0.033447000056057066
probinit 8 7.423335149345567E-38
probinit 9 0.00142119621268763
probinit 10 0.0
probinit 11 8.880188451275828E-198
probinit 12 0.5565674946701097
probinit 13 0.13754666026699372
probinit 14 0.1661923584390444
probinit 15 0.10482529035510728
transitionprobs 1 1 0.8372502461768572
transitionprobs 1 2 0.06050386521636649
transitionprobs 1 3 0.0030487339485235484

my 25 states model file(emissions_models_25_imputed12marks.txt) is like:
25 12 E -8.803547773719434E8 200
probinit 1 0.0
probinit 2 0.0
probinit 3 0.0
probinit 4 0.0
probinit 5 0.0
probinit 6 0.0
probinit 7 0.0
probinit 8 0.0
probinit 9 0.0
probinit 10 0.0
probinit 11 0.0
probinit 12 0.0
probinit 13 0.0
probinit 14 0.0
probinit 15 0.0
probinit 16 0.0
probinit 17 2.276661721543571E-275
probinit 18 2.150547469890972E-283
probinit 19 1.2298440620696358E-152
probinit 20 0.0
probinit 21 9.446359175964622E-254
probinit 22 0.0
probinit 23 0.0
probinit 24 0.0
probinit 25 1.0
transitionprobs 1 1 0.5418988538970787
transitionprobs 1 2 0.196631989532827
transitionprobs 1 3 0.19101097372676046
transitionprobs 1 4 6.23309398148488E-4
transitionprobs 1 5 0.0
transitionprobs 1 6 0.0
transitionprobs 1 7 0.0
transitionprobs 1 8 2.1212879554907406E-4

I also have a question: can I compare a model learned from mouse data to a model learned from human data? Thanks!
Best,
Ruitu

overlap enrichment issue

Hi, dear developer.

When I ran OverlapEnrichment, I ran into a problem. I searched the issues, but no one seems to have had a similar question.

My command is "java -jar /xtdisk/renjie_group/chixu/chromHMM/ChromHMM.jar OverlapEnrichment -signal /LearnModel/fetal-liver_13_segments.bed /chromHMM/inputcoorddir/ /chromHMM/OverlapEnrichment-signal/OverlapEnrichment-family"

and the error is
Computing Enrichments...
Exception in thread "main" java.lang.NumberFormatException: For input string: "AluSp"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at edu.mit.compbio.ChromHMM.StateAnalysis.enrichmentMax(StateAnalysis.java:1097)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13926)

Looking forward to your suggestions, and thank you.

Why is 'binarizeBed -peaks' not recommended for broad peaks?

Dear jernst98,

I would like to create a chromatin state annotation using publicly available chip data, e.g. on blueprint or roadmaps. The raw data is only available after some bureaucracy, but the enriched (peak) regions are freely downloadable. I saw you can use binarizeBed with the -peaks flag to have chromHMM handle the input bed files as peak files, however, according to the manual, this is not recommended for broad peaks.
Why is binarizeBed -peaks not recommended for broad peaks? How would it affect the results?

Thank you for your answers,

Best

ChromHMM BinarizeBam error "Permission denied"

Hi,

I'm trying to run the script below to binarise my bam files before feeding them to ChromHMM LearnModel.
However, I keep getting the following error:
/var/spool/torque/mom_priv/jobs/378907.SC: line 18: /lmod/apps/chromhmm/1.19/CHROMSIZES/hg38.txt: Permission denied .

I've tried to copy the chromsizes file to my working directory, which did not help. Then I used a completely unrelated chromsizes file, which I had previously used for a different analysis, and I still got the error "Permission denied". To me, that indicates this is either a bug or an error message that actually means something else is going wrong.

If anyone has any idea why this might be happening, that would be greatly appreciated.

Thank you!


cd $PBS_O_WORKDIR

ml apps/chromhmm/1.19

#mkdir chromHMM_out_binary

java -mx7000M -jar /lmod/apps/chromhmm/1.19/ChromHMM.jar BinarizeBam -paired \
/lmod/apps/chromhmm/1.19/CHROMSIZES/hg38.txt \
$PBS_O_WORKDIR \
chromHMM_table \
./chromHMM_out_binary
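
The message format (`line 18: <path>: Permission denied`) is what a shell prints when it tries to execute that path as a command, which would happen if the preceding line's backslash did not actually continue the line. That is only one possible explanation. The Python sketch below, with a hypothetical function name, flags backslashes followed by trailing spaces, tabs, or a carriage return:

```python
import re

def find_broken_continuations(script_text):
    """Return 1-based line numbers where a backslash is followed by trailing whitespace."""
    broken = []
    for number, line in enumerate(script_text.split("\n"), start=1):
        # A continuation backslash must be the very last character on the line.
        if re.search(r"\\[ \t\r]+$", line):
            broken.append(number)
    return broken

script = "java -jar ChromHMM.jar BinarizeBam -paired \\ \nhg38.txt \\\nout_dir\n"
print(find_broken_continuations(script))
```

A script saved with Windows line endings would trigger the same check, because each backslash is then followed by a carriage return.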

WGBS binarization

Hello!
I'm trying to add WGBS data to ChromHMM model, and I ran into the issue with binarizing WGBS bedgraph file.
From BSBOLT software I got the file containing those columns:

Chromosome
Start Position
End Position
Methylation Percentage, percentage of methylated bases to total observed bases
Methylated Bases, methylated nucleotides observed
Unmethylated Bases, total unmethylated bases
The paper (https://www.nature.com/articles/s42003-021-01756-4) suggests this method to process data:

For WGBS data, BED files were downloaded from the ENCODE portal (Supplementary Data 1), These files contain, among other values, the percent methylation at each CpG dinucleotide in the genome (ranging from 1–100). For each set of two replicates, these values were averaged in 200-bp genomic bins to obtain the mean percent methylation of CpGs in each window. The 200-bp bins were subsequently binarized based on a 50% methylation threshold. Bins that did not contain any CpGs were marked as missing data, as specified by the ChromHMM binarized data format.

But ChromHMM LearnModel does not seem to support missing values. Could you suggest a way to deal with them?
Also, how can we binarize a bedGraph file?
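
The procedure quoted from the paper can be sketched in Python as follows. This is only an illustration under stated assumptions: the missing-data code `2` is what I understand the ChromHMM binarized format to use, the `>=` comparison at 50% is my choice (the paper does not say which side the boundary falls on), the trailing partial bin is dropped, and `binarize_methylation` is a hypothetical helper.

```python
from collections import defaultdict

BIN_SIZE = 200     # bin width used in the quoted method
THRESHOLD = 50.0   # percent methylation cutoff from the quoted method
MISSING = 2        # assumed missing-data code in the binarized format

def binarize_methylation(cpg_records, chrom_length, bin_size=BIN_SIZE):
    """cpg_records: iterable of (start, percent_methylation) for one chromosome."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start, pct in cpg_records:
        index = start // bin_size
        sums[index] += pct
        counts[index] += 1
    n_bins = chrom_length // bin_size  # assumes a trailing partial bin is dropped
    calls = []
    for index in range(n_bins):
        if counts[index] == 0:
            calls.append(MISSING)      # no CpG observed in this bin
        else:
            mean_pct = sums[index] / counts[index]
            calls.append(1 if mean_pct >= THRESHOLD else 0)
    return calls

# Toy input: two CpGs in bin 0, none in bin 1, one in bin 2, none in bin 3.
records = [(10, 80.0), (150, 60.0), (450, 10.0)]
print(binarize_methylation(records, chrom_length=800))
```

Averaging two replicates first, as the paper does, would only change how `percent_methylation` is computed before it reaches this function.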

How do different sequencing depths affect histone marks?

Hi,
My data have no control and include seven histone marks in two cell types, sequenced to different depths. I noticed that BinarizeBed makes its calls from the global average number of reads per bin using a Poisson distribution. Could you explain how read counts are normalized across histone marks with different sequencing depths? Thanks very much!
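
To make the question concrete, here is a small Python sketch of a Poisson cutoff derived from the mean reads per bin. It is an assumption-laden illustration, not ChromHMM's code: the p-value of 1e-4 is what I believe the default to be, the exact inequality ChromHMM uses may differ, and `poisson_threshold` is a hypothetical name.

```python
import math

def poisson_threshold(mean_reads_per_bin, p_value=1e-4):
    """Smallest count t with P(X >= t) < p_value for X ~ Poisson(mean)."""
    term = math.exp(-mean_reads_per_bin)  # P(X = 0)
    cdf = 0.0
    t = 0
    while 1.0 - cdf >= p_value:           # 1 - cdf is P(X >= t) here
        cdf += term                       # cdf becomes P(X <= t)
        t += 1
        term *= mean_reads_per_bin / t    # term becomes P(X = t)
    return t

# A deeper library has a higher mean per bin and therefore a higher cutoff.
for mean in (0.5, 2.0):
    print(mean, poisson_threshold(mean))
```

Under this reading, each mark gets its own cutoff from its own depth, so marks are not compared on raw counts.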

Please use release tags

Hi,
the Debian Med team intends to package ChromHMM for Debian. To make sure we will always spot your latest and greatest release it would help a lot if you would use release tags. There are automatic tools inside Debian that will detect a new tag and we will thus know for sure what you intend to be used on user machines.
Thanks for considering, Andreas.

Understanding of OverlapEnrichment command

Hi,

I'm trying to use your software to build a model from melo epigenomics data. I binarized my BAM files and learned the model with the LearnModel command.
Now I'm trying to use this model, specifically for overlap enrichment, but I don't understand what inputCoordDir is. Coordinates of what? My genes or my peaks? Do you have an example file, please?

Thanks in advance
best regards

bam not found by BinarizeBam command

Hi, I'm trying to use ChromHMM on my ChIP-seq data, but I got an error when binarizing my BAM files with the BinarizeBam command. Could you please help me fix this issue?

Here is my cellMarkFile.tab:

leaf H3K9ac trimmed_11375_Melon_MBDL36_ACTTGA_L00345_001_filtered.bam
leaf H3K27me3 trimmed_11376_Melon_MBDL37_GATCAG_L00345_001_filtered.bam
leaf meDip trimmed_M_15_A781_MBDL18_P-16_ArabidopsisandMelon_CGATGT_L008_001_filtered.bam

Here is the command I ran:

$java -mx1600M -jar "$classPath/ChromHMM.jar" BinarizeBam -b $binSizeOption -t $outputSignalDir $pairedOption $chromSizePath $bamFileDir $cellMarkFileTablePath $outputBinaryDir

Here is the Java error:

Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.io.FileNotFoundException: data/trimmed_11375_Melon_MBDL36_ACTTGA_L00345_001_filtered.bam (No such file or directory)
at htsjdk.samtools.FileInputResource$1.make(SamInputResource.java:171)
at htsjdk.samtools.FileInputResource$1.make(SamInputResource.java:165)
at htsjdk.samtools.util.Lazy.get(Lazy.java:24)
at htsjdk.samtools.FileInputResource.asUnbufferedSeekableStream(SamInputResource.java:194)
at htsjdk.samtools.FileInputResource.asUnbufferedInputStream(SamInputResource.java:199)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:245)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:133)
at edu.mit.compbio.ChromHMM.Preprocessing.loadGrid(Preprocessing.java:195)
at edu.mit.compbio.ChromHMM.Preprocessing.makeBinaryDataFromBed(Preprocessing.java:1016)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:12483)
Caused by: java.io.FileNotFoundException: data/trimmed_11375_Melon_MBDL36_ACTTGA_L00345_001_filtered.bam (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at htsjdk.samtools.seekablestream.SeekableFileStream.<init>(SeekableFileStream.java:47)
at htsjdk.samtools.FileInputResource$1.make(SamInputResource.java:169)
... 9 more

Here is the environment I used:

java==1.7 (tested also with java8 and java11 but failed)
chromHMM==1.19 (also tried with chromHMM==1.15 but failed)
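
The exception shows the file being looked up under `data/`, relative to the directory the command was run from. A quick pre-flight check along the following lines may help narrow it down. This is a sketch assuming a tab-delimited table with file names in the third and optional fourth column; `missing_files` is a hypothetical helper.

```python
import os

def missing_files(cell_mark_table_path, bam_dir):
    """List files named in the table (columns 3 and 4) that are absent from bam_dir."""
    missing = []
    with open(cell_mark_table_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            for name in fields[2:4]:
                if name and not os.path.isfile(os.path.join(bam_dir, name)):
                    missing.append(name)
    return missing
```

Calling `missing_files("cellMarkFile.tab", "data")` from the same working directory as the Java command would list any names that do not resolve.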

Warning did not find data

I ran BinarizeBam on my own data with the following command:
java -jar ~/opt/softs/ChromHMM/ChromHMM.jar BinarizeBam -o ../3ChroHMM/ ../tair.genome ./ cellmarkfiletable ../3ChroHMM/
but it gave the following warnings:
Warning did not find data for leaf H3K27ac treating as missing
Warning did not find data for leaf H3K27me3 treating as missing
Warning did not find data for leaf H3K36ac treating as missing
Warning did not find data for leaf H3K36me3 treating as missing
Warning did not find data for leaf H3K4me1 treating as missing
Warning did not find data for leaf H3K4me2 treating as missing
Warning did not find data for leaf H3K4me3 treating as missing
Warning did not find data for leaf H3K9ac treating as missing
Warning did not find data for leaf H3K9me2 treating as missing
Writing to file ../3ChroHMM//leaf _Pt_binary.txt
Writing to file ../3ChroHMM//leaf _Mt_binary.txt
Writing to file ../3ChroHMM//leaf _4_binary.txt
Writing to file ../3ChroHMM//leaf _2_binary.txt
Writing to file ../3ChroHMM//leaf _3_binary.txt
Writing to file ../3ChroHMM//leaf _5_binary.txt
Writing to file ../3ChroHMM//leaf _1_binary.txt
Warning did not find data for leaf H3K27me1 treating as missing
Warning did not find control data for leaf H3K27ac treating as missing
Warning did not find control data for leaf H3K27me1 treating as missing
Warning did not find control data for leaf H3K36me3 treating as missing
Warning did not find control data for leaf H3K4me1 treating as missing
Writing to file ../3ChroHMM//leaf_Pt_controlsignal.txt
Writing to file ../3ChroHMM//leaf_Mt_controlsignal.txt
Writing to file ../3ChroHMM//leaf_4_controlsignal.txt
Writing to file ../3ChroHMM//leaf_2_controlsignal.txt
Writing to file ../3ChroHMM//leaf_3_controlsignal.txt
Writing to file ../3ChroHMM//leaf_5_controlsignal.txt
Writing to file ../3ChroHMM//leaf_1_controlsignal.txt
Writing to file ../3ChroHMM//leaf_Pt_binary.txt
Writing to file ../3ChroHMM//leaf_Mt_binary.txt
Writing to file ../3ChroHMM//leaf_4_binary.txt
Writing to file ../3ChroHMM//leaf_2_binary.txt
Writing to file ../3ChroHMM//leaf_3_binary.txt
Writing to file ../3ChroHMM//leaf_5_binary.txt
Writing to file ../3ChroHMM//leaf_1_binary.txt

My cell mark file table is as follows (marks without a control file have no fourth column):
$ less cellmarkfiletable
leaf H3K4me1 H3K4me1.bam
leaf H3K4me2 H3K4me2.bam H3K4me3_H3K4me2.input.bam
leaf H3K4me3 H3K4me3.bam H3K4me3_H3K4me2.input.bam
leaf H3K9ac H3K9ac.bam H3K9ac.input.bam
leaf H3K9me2 H3K9me2.bam H3K9me2_H3K27me3.input.bam
leaf H3K27ac H3K27ac.bam
leaf H3K27me1 H3K27me1.bam
leaf H3K27me3 H3K27me3.bam H3K9me2_H3K27me3.input.bam
leaf H3K36ac H3K36ac.bam H3K36ac.input.bam
leaf H3K36me3 H3K36me3.bam

The directory with my mark files is as follows:
$ ls
cellmarkfiletable H3K27me1.bam H3K27me3.ni.bam H3K36ac.input.bam H3K4me1.bam H3K4me3.bam H3K9ac.bam H3K9me2.bam
H3K27ac.bam H3K27me3.bam H3K36ac.bam H3K36me3.bam H3K4me2.bam H3K4me3_H3K4me2.input.bam H3K9ac.input.bam H3K9me2_H3K27me3.input.bam

Is there any suitable way to compare the states of different tissues?

Hi, dear developer

I am wondering whether you can recommend a way to compare states between different tissues. I have seen some papers use unionbedg in bedtools to track states across tissues, but as the number of tissues increases, I find the result becomes difficult to interpret.

Best wishes

Guandong Shang

ChromHMM overlap enrichment

Hello, I've used ChromHMM with ChIP-seq alignments and identified some chromatin states of interest.

I am interested in overlapping the segmentation with other sets of regions, to annotate each region according to the segment/state it falls in. OverlapEnrichment seems to be exactly what I'm looking for, but I think I'm failing to interpret it correctly.

After getting the segmentation I'm using bedtools intersect to get the intersections between my regions and the segments, so that I can say a region or a peak is in a certain chromatin state.
However, the overlaps I'm getting don't make sense compared to the plots from OverlapEnrichment. I'm aware the plot shows fold change computed as specified in the manual, but I'm getting very different results. For example, there's a case where more than half of my regions overlap with a segment which is not enriched according to the figure.

Do you have any advice on how I should annotate my regions? Have I missed an option in ChromHMM to get the segments corresponding to certain regions?

Thanks for your work and your time
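
A small worked example may help reconcile the two views. As I recall, the manual defines fold enrichment as (C/A)/(B/D), where A is the number of bases in the state, B the bases in the external annotation, C the bases in both, and D the bases in the genome; treat that formula as an assumption to check against the manual. The numbers below are made up for illustration.

```python
def fold_enrichment(state_bases, annotation_bases, overlap_bases, genome_bases):
    """(C/A)/(B/D) with A=state, B=annotation, C=overlap, D=genome, all in bases."""
    return (overlap_bases / state_bases) / (annotation_bases / genome_bases)

genome = 1_000_000
state = 600_000       # a state covering 60% of the genome
annotation = 10_000   # total bases in the regions of interest
overlap = 5_500       # bases of the annotation that fall in the state

share_in_state = overlap / annotation
fold = fold_enrichment(state, annotation, overlap, genome)
print(share_in_state)  # more than half of the annotation lies in the state
print(fold)            # yet the fold enrichment is below 1
```

So a state that covers most of the genome can contain most of the regions and still show no enrichment, because the comparison is against what its size alone would predict.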

How are paired-end reads handled?

Does ChromHMM recognize paired-end reads, and if so, how are paired-end reads handled during the "binarization" process? How are the parameters to BinarizeBam affected by paired-end data, in particular the -center, -e, -n, -s parameters?

Is it recommended in this case to first convert PE BAM files into BED/BEDPE format where the bed entry refers to the entire sequenced fragment?

Thanks

Calculation of Overlap/Neighborhood/Enrichment

Dear developers,

I am having trouble understanding how the different enrichment/overlap analyses are performed. Could you please provide the explicit equations, or a description of how this is done, in your wiki or here?

If this is already explained in some publication, could you please name it here.

Best Regards

Error when using ConvertGeneTable function

Hi, I encountered an error when running ConvertGeneTable (ChromHMM 1.24) on a genePred file generated with UCSC tools from the GTF file of a custom genome.

I got the following error:

Exception in thread "main" java.lang.NullPointerException
        at edu.mit.compbio.ChromHMM.ConvertGeneTable.convertGeneTableToAnnotations(ConvertGeneTable.java:216)
        at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:15266)

The code I ran:

java -mx8000M -jar ChromHMM.jar ConvertGeneTable -nobin -noheader \
        -l chromosome_lengths.tsv \
        -v Bar2_p4.Flye.gm/ANCHORFILES \
        -u Bar2_p4.Flye.gm/COORDS \
        Bar2_p4.Flye.gm.mod.genepred Bar2_p4.Flye.gm Bar2_p4

To begin with, I'm not sure how to use a custom genome with ConvertGeneTable, or what to put for the last argument, "assembly".

Any help is greatly appreciated!

Applying segmentation analysis to ChIP-seq data from DNA binding proteins

Hi @jernst98
I would like to ask whether it's possible, at least in principle, to apply ChromHMM to ChIP-seq data of transcription factors or other DNA-binding proteins?

NullPointerException in ConvertGeneTable

I'm trying to create a gene annotation for the Gallus gallus 5.0 assembly, but getting a NullPointerException.

Command:

 java -Xmx8G -jar <path_to>/ChromHMM.jar ConvertGeneTable -gzip
 -l <path_to>/Gallus_gallus.Gallus_gallus-5.0.dna.toplevel.fa.fai <path_to>/gg5_ucsc_genome.txt gg5 gg5

Error returned is:

Exception in thread "main" java.lang.NullPointerException
	at edu.mit.compbio.ChromHMM.ConvertGeneTable.convertGeneTableToAnnotations(ConvertGeneTable.java:199)
	at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:13080)

Line 199 of ConvertGeneTable.java is:

int nchromlength =((Integer) hmlengths.get(szchrom)).intValue();

I don't see hmlengths.get() in any of the ChromHMM source files (but perhaps imported from elsewhere?).
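
If `hmlengths` is a `java.util.HashMap` (an assumption based on the name), `get` returns null for a chromosome name that is not in the length file, and calling `.intValue()` on that would throw exactly this exception. A Python sketch to look for such names; the column index assumes a UCSC-style table with a leading `bin` column, and `chroms_missing_from_lengths` is a hypothetical helper:

```python
def chroms_missing_from_lengths(gene_table_lines, length_lines, chrom_col=2):
    """Return chromosome names used in the gene table but absent from the length file."""
    known = {line.split()[0] for line in length_lines if line.strip()}
    missing = set()
    for line in gene_table_lines:
        if line.startswith("#"):
            continue  # skip a UCSC-style header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) > chrom_col and fields[chrom_col] not in known:
            missing.add(fields[chrom_col])
    return sorted(missing)

# Made-up lines: the gene table says "chr1" while the length file says "1".
lengths = ["1\t1000", "2\t2000"]
genes = ["#bin\tname\tchrom\tstrand", "0\tNM_x\tchr1\t+", "0\tNM_y\t2\t-"]
print(chroms_missing_from_lengths(genes, lengths))
```

A UCSC gene table paired with an Ensembl `.fai` is one way such a naming mismatch could arise.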

Is equal read depth necessary?

Is it recommended to down-sample the input BAM/BED files to the same sequencing depth before running ChromHMM?

how to associate color and FoldEnrichment in OverlapEnrichment result

Hi, Dr Ernst.
I am confused about the color in FoldEnrichment. Below is the result of my OverlapEnrichment.

> chromHMM_overlap
# A tibble: 10 x 8
   `State (Emission order)` `Genome %`   Exon   Gene    TES TES_1k    TSS TSS_1k
   <chr>                         <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 1                            20.4    1.32   1.78  0.257   0.850 0.126   0.451
 2 2                            38.8    0.858  0.514 1.22    0.916 0.624   0.756
 3 3                             0.700  1.67   1.55  0.541   1.31  0.627   1.16 
 4 4                            13.4    0.852  0.953 1.05    1.22  1.10    1.22 
 5 5                             1.13   0.523  0.602 0.870   0.961 1.95    1.19 
 6 6                            12.5    0.531  0.536 2.05    1.23  2.80    1.40 
 7 7                             2.62   1.71   1.61  0.991   1.18  5.63    2.04 
 8 8                             9.66   1.46   1.76  0.348   0.976 0.575   1.93 
 9 9                             0.778  1.78   1.71  0.276   1.14  0.631   1.79 
10 Base                        100     48.2   55.5   0.0277 41.0   0.0277 44.5

But I do not know how the colors relate to the fold enrichment.

My guess is that you scale each column, like this:

chromHMM_overlap %>% 
  mutate_at(c(-1), function(y) (y - min(y)) / (max(y)-min(y)))

# A tibble: 9 x 8
  `State (Emission order)` `Genome %`    Exon   Gene    TES TES_1k    TSS TSS_1k
  <chr>                         <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 1                           0.517   0.638   1      0       0     0       0    
2 2                           1       0.267   0      0.538   0.141 0.0905  0.191
3 3                           0       0.917   0.822  0.158   1     0.0911  0.446
4 4                           0.333   0.262   0.348  0.441   0.798 0.176   0.482
5 5                           0.0113  0       0.0698 0.342   0.238 0.332   0.462
6 6                           0.311   0.00580 0.0174 1       0.808 0.486   0.597
7 7                           0.0503  0.947   0.870  0.409   0.716 1       1    
8 8                           0.235   0.744   0.992  0.0504  0.271 0.0815  0.929
9 9                           0.00204 1       0.950  0.0107  0.624 0.0917  0.844

With this scaling I was able to reproduce the same plot.

But I am not sure whether this is correct, so I opened this issue.

Best wishes

Guandong Shang

Exception in thread "main" java.util.zip.ZipException: invalid distance too far back

Dear all,
I was using the latest version of ChromHMM, but LearnModel did not produce an .html file and I got an error like this:

Exception in thread "main" java.util.zip.ZipException: invalid distance too far back
	at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:165)
	at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
	at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
	at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
	at edu.mit.compbio.ChromHMM.StateAnalysis.enrichmentMax(StateAnalysis.java:962)
	at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:14861)

How can I fix it? Thanks!
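
The trace fails inside `GZIPInputStream` while `StateAnalysis.enrichmentMax` reads a line, which suggests that one of the gzipped files being read is corrupt or was only partly downloaded. Which file that is cannot be told from the trace, so the sketch below simply tests any `.gz` file for readability; `is_readable_gzip` is a hypothetical helper.

```python
import gzip
import zlib

def is_readable_gzip(path, chunk_size=1 << 20):
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass  # read to the end so that late corruption is also detected
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

Running it over the `.gz` files shipped with ChromHMM (for example those under the coordinate and anchor directories, if that is where the assembly's files live) would show whether any need to be downloaded again.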

NoSuchElementException in ConvertGeneTable

I'm trying to use ConvertGeneTable, but I'm getting a NoSuchElementException and I can't find any information to help me troubleshoot this issue.

Command:

java -mx1200M -jar ChromHMM.jar ConvertGeneTable pathTo/RefSeq_EquCab3_UCSC_tableSchema.txt EC3 EquCab3

The chromosome length file is saved into CHROMSIZES/ with the correct default name based on my assembly, and a previous command worked using the file. I downloaded the gene table file from UCSC directly.

Error returned is:

Exception in thread "main" java.util.NoSuchElementException
at java.base/java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at edu.mit.compbio.ChromHMM.ConvertGeneTable.convertGeneTableToAnnotations(ConvertGeneTable.java:60)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:14406)

Help?

ConvertGeneTable: Exception in thread "main" java.lang.IllegalArgumentException: invalid strand

Hi, I have been trying to use the ConvertGeneTable command to generate the ANCHORFILES and COORDS files from my .genepred file, and I get the following error:

> java -Xmx1500m -jar ChromHMM.jar ConvertGeneTable E:/ImmunoMaps_ChromHMM/ASM1334776v1.genePred turbot ASM1334776v1
Exception in thread "main" java.lang.IllegalArgumentException: invalid strand 14322601
at edu.mit.compbio.ChromHMM.ConvertGeneTable.convertGeneTableToAnnotations(ConvertGeneTable.java:208)
at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:15233)

According to the error it looks like for the second row (ENSSMAT00000008045) it skips the value on the third column (-) and takes the value on the fourth (14322601) as the "strand" for that entry. The format of the genePred file is as follows

ENSSMAT00000061270 1 - 51967 54707 52894 54707 2 51967,54230, 53881,54707, 0 ENSSMAG00000030099 cmpl cmpl 0,0,
ENSSMAT00000008045 1 - 14322601 14331939 14322791 14331939 8 14322601,14323719,14324234,14325124,14327855,14329940,14331566,14331785, 14322988,14324074,14324315,14325342,14327992,14330050,14331736,14331939, 0 ENSSMAG00000004910 cmpl cmpl 1,0,0,1,2,0,1,0,
ENSSMAT00000038001 1 + 9276975 9278241 9276975 9278241 1 9276975 9278241 0 ENSSMAG00000022266 incmpl cmpl 0
ENSSMAT00000044920 1 + 14333081 14337821 14337821 14337821 2 14333081,14335838, 14333224,14337821, 0 ENSSMAG00000024536 none none -2
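
One reading consistent with this error is that the parser expects a header line and a leading `bin` column, as in a UCSC table dump: the first row would then be skipped as a header, and the fourth field of the second row would be read as the strand. Another issue on this page passes `-nobin -noheader` to ConvertGeneTable for a genePred file, so those flags may be what is needed here; this is an assumption, not a confirmed diagnosis. The column shift itself is easy to show in Python (`read_strand` is a hypothetical helper):

```python
def read_strand(line, has_bin_column):
    """Return the field that each layout would treat as the strand."""
    fields = line.split()
    strand_index = 3 if has_bin_column else 2  # bin,name,chrom,strand vs name,chrom,strand
    return fields[strand_index]

row = "ENSSMAT00000008045 1 - 14322601 14331939 14322791 14331939 8"
print(read_strand(row, has_bin_column=True))   # 14322601
print(read_strand(row, has_bin_column=False))  # -
```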

Binarized data

Hello, I have a file with chromosomes and their lengths; it also contains contigs and scaffolds, such as NW_****. I'm writing a module that converts bisulfite sequencing data into binarized data with some preprocessing, so that it can be merged with data that does not need such preprocessing. In the process I found that the chromosome sizes do not match. My module returns a number of rows equal to the chromosome length divided by the bin size, rounded up, plus 2 header lines. When I binarized the data with BinarizeBed, I got one binarized file per chromosome, but the number of lines in each file does not match the chromosome size divided by the bin size. For example, chromosome NC_037642.1 is 13896941 nucleotides long. Dividing by the bin size (200 by default) gives 69484.705 bins, or 69485 rounded up, yet the binarized file has 69486 lines, of which the first 2 are the header (the cell and the marks). Is this correct, or is it a bug?
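
The arithmetic can be checked directly. The observed 69486 lines equals the number of complete 200-bp bins plus 2 header lines, which is consistent with the final partial bin being dropped rather than rounded up. This is an inference from the numbers in the post, not something verified against the ChromHMM source.

```python
import math

chrom_length = 13_896_941
bin_size = 200
header_lines = 2

full_bins = chrom_length // bin_size           # complete bins only
all_bins = math.ceil(chrom_length / bin_size)  # includes the trailing partial bin

print(full_bins, full_bins + header_lines)  # 69484 69486
print(all_bins, all_bins + header_lines)    # 69485 69487
```

A module meant to line up with BinarizeBed output would therefore need to round down instead of up.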

Describing input control data when running ChromHMM

Hi, I am experimenting with different options in ChromHMM after reading the Nature Protocols paper from late 2017. In the paper, it says:

"ChromHMM can use input control data, such as those produced by mapping whole-cell extracts or IgG ChIP data. Control data provide information on local background signal levels, which ChromHMM can use to adjust binarization thresholds locally instead of having a uniform threshold genome wide. ChromHMM can also use control data as an additional model feature".

I am running ChromHMM with datasets of different tissue types where there is essentially one type of pulldown, which I label as "pull" below, and then the Input data equivalent to IgG ChIP, which I label "Input". See below:

D0004AD	Input	CEG66-107-4IC_S27_L00.bml.GRCh38.karyo.deduplicated.bam
D0016AP	pull	CEG66-107-8PC_S8_L00.bml.GRCh38.karyo.deduplicated.bam
D0004AD	pull	CEG66-107-4PC_S4_L00.bml.GRCh38.karyo.deduplicated.bam
D0012AL	pull	CEG66-112-16PC_S16_L00.bml.GRCh38.karyo.deduplicated.bam
D0011AK	Input	CEG66-112-15IC_S38_L00.bml.GRCh38.karyo.deduplicated.bam
D0005AE	pull	CEG66-107-5PC_S5_L00.bml.GRCh38.karyo.deduplicated.bam
D0002AB	pull	CEG66-107-2PC_S2_L00.bml.GRCh38.karyo.deduplicated.bam
D0023AW	Input	CEG66-117-23IC_S46_L00.bml.GRCh38.karyo.deduplicated.bam
[...]

I have successfully run this with subsets of my data, and it works well. I use:

java -mx$java_mem -jar $chromhmm BinarizeBam $genome_file $dir $listfile $binariesdir

and later on:

unset DISPLAY && java -mx$java_mem -jar $chromhmm LearnModel -s 1 -p $threads -nobrowser $binariesdir $modelsdir $num_states $assembly

It gives me states that, when inspected via the browser, seem to make sense: e.g. when there is no enrichment in the pulldowns, it associates one state to it, when there is, it associates one or more states to it, depending on the $num_states I use as a parameter.

But I want to confirm that I am letting ChromHMM know that "Input" in my second column corresponds to the IgG ChIP data equivalent, in case I am currently running it suboptimally and there is a better way of handling my datasets (1 mark, 1 input, several tissues).

The controls are relatively flat except for usual overpileups from Illumina sequencing. My understanding is that I should use option A: Treat controls as an additional feature.

Any thoughts? Are my options correct, or should I change them? Should I use the "Multiple cell types with stacking features" labeling described in the Nature Protocols paper? E.g.:

[...]
genome D0002AB_pull file2
genome D0004AD_Input file3
[...]
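
For comparison, another issue on this page shows a four-column table (cell, mark, file, control file). If the aim is for the input to adjust thresholds rather than act as a second mark, the three-column rows above could be paired up as sketched below. This assumes the fourth column is how a control is attached to a mark and that each cell has at most one input file; `pair_with_controls` is a hypothetical helper.

```python
def pair_with_controls(rows, mark_label="pull", control_label="Input"):
    """rows: (cell, label, filename) triples; returns cell/mark/file[/control] rows."""
    controls = {cell: fname for cell, label, fname in rows if label == control_label}
    paired = []
    for cell, label, fname in rows:
        if label != mark_label:
            continue  # input rows are folded into the fourth column instead
        row = [cell, label, fname]
        if cell in controls:
            row.append(controls[cell])  # fourth column: control file for this cell
        paired.append(row)
    return paired

rows = [
    ("D0004AD", "Input", "CEG66-107-4IC_S27_L00.bml.GRCh38.karyo.deduplicated.bam"),
    ("D0016AP", "pull", "CEG66-107-8PC_S8_L00.bml.GRCh38.karyo.deduplicated.bam"),
    ("D0004AD", "pull", "CEG66-107-4PC_S4_L00.bml.GRCh38.karyo.deduplicated.bam"),
]
for row in pair_with_controls(rows):
    print("\t".join(row))
```

With the table as currently written, `Input` sits in the mark column, so it would presumably be modeled as a feature of its own, which matches option A.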

"flankwidthcontrol" in Preprocess

The ChromHMM manual says that flankwidthcontrol is used to compute average control counts over all bins from x-w to x+w for the xth bin. However, the code computes the sum of counts rather than the average (windowSumGrid method in the Preprocessing class, see the following pictures).

Manual:

Code:

So I am wondering which one is actually used. An average seems more reasonable. Maybe the mean is computed somewhere else?

Thanks
Nan
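
For reference, the two quantities being compared can be written out in Python. The sketch clips the window at the array ends, which is my assumption and not taken from the ChromHMM code; both function names are hypothetical.

```python
def window_sum(counts, w):
    """Sum of counts over bins x-w..x+w, clipped at the array ends."""
    n = len(counts)
    return [sum(counts[max(0, x - w):min(n, x + w + 1)]) for x in range(n)]

def window_mean(counts, w):
    """Mean of counts over the same clipped window."""
    n = len(counts)
    sums = window_sum(counts, w)
    return [s / (min(n, x + w + 1) - max(0, x - w)) for x, s in enumerate(sums)]

control = [0, 2, 4, 2, 0, 0]
print(window_sum(control, 1))
print(window_mean(control, 1))
```

For interior bins the sum is just the mean multiplied by 2w+1, so the two differ only by a constant factor there; whether ChromHMM normalizes that factor elsewhere is exactly the open question.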
