aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments

Home Page: http://aidenlab.org

License: MIT License

Shell 50.71% Awk 32.88% Perl 14.61% Java 1.21% Python 0.59%

3d-genome 3d-genome-browser genomics hi-c bioinformatics ngs

juicer's Introduction

Read this first!!

To access Juicer 1.6 (last stable release), please see the Github Release. If you clone the Juicer repo directly from Github, it will clone Juicer 2, which is under active development. If you encounter any bugs, please let us know.

ENCODE's Hi-C uniform processing pipeline based on Juicer can be found here.

About Juicer

Juicer is a platform for analyzing kilobase resolution Hi-C data. In this distribution, we include the pipeline for generating Hi-C maps from fastq raw data files and command line tools for feature annotation on the Hi-C maps.

The beta release for Juicer version 1.6 can be accessed via the Github Release. The main repository on Github is now focused on the Juicer 2.0 release and is under active development. For general questions, please use the Google Group.

If you are interested in running Juicer in the cloud, you may want to check out the dockerized version of Juicer hosted by ENCODE.

If you have any difficulties using Juicer, please do not hesitate to contact us.

If you use Juicer in your research, please cite: Neva C. Durand, Muhammad S. Shamim, Ido Machol, Suhas S. P. Rao, Miriam H. Huntley, Eric S. Lander, and Erez Lieberman Aiden. "Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments." Cell Systems 3(1), 2016.

Documentation

Please see the wiki for extensive documentation.

Questions?

For FAQs, or for asking new questions, please see our forum: aidenlab.org/forum.html.

Distribution

In this repository, we include the scripts for running Juicer on AWS, LSF, Univa Grid Engine, SLURM, and a single CPU.

The SLURM and CPU scripts are the most up-to-date. For cloud computing, we recommend the ENCODE uniform processing pipeline based on Juicer.

/SLURM - scripts for running pipeline and postprocessing on SLURM

/CPU - scripts for running pipeline and postprocessing on a single CPU

/AWS - scripts for running pipeline and postprocessing on AWS (deprecated)

/UGER - scripts for running pipeline and postprocessing on UGER (Univa) (deprecated)

/LSF - scripts for running pipeline and postprocessing on LSF (deprecated)

/misc - miscellaneous helpful scripts

Hardware and Software Requirements

Juicer is a pipeline optimized for parallel computation on a cluster. Juicer consists of two parts: the pipeline that creates Hi-C files from raw data, and the post-processing command line tools.

Cluster requirements:

Juicer requires the use of a cluster or the cloud, with ideally >= 4 cores (min 1 core) and >= 64 GB RAM (min 16 GB RAM)

Juicer currently works with the following resource management software:

OpenLava
LSF
SLURM
GridEngine (Univa, etc. any flavor)

To run in the cloud, we recommend ENCODE's Hi-C processing pipeline, which is based on Juicer; the AWS scripts are out of date.

Juicer tools requirements

The minimum software requirement to run Juicer is a working Java installation (version >= 1.8) on Windows, Linux, and Mac OSX. We recommend using the latest Java version available, but please do not use the Java Beta Version. Minimum system requirements for running Java can be found at https://java.com/en/download/help/sysreq.xml

To download and install the latest Java Runtime Environment (JRE), please go to https://www.java.com/download

GNU CoreUtils

The latest version of GNU coreutils can be downloaded from https://www.gnu.org/software/coreutils/manual/

Burrows-Wheeler Aligner (BWA)

The latest version of BWA should be installed from http://bio-bwa.sourceforge.net/

CUDA (for HiCCUPS peak calling)

You must have an NVIDIA GPU to install CUDA.

Instructions for installing the latest version of CUDA can be found on the NVIDIA Developer site.

The native libraries included with Juicer are compiled for CUDA 7 or CUDA 7.5. See the download page for Juicer Tools.

Other versions of CUDA can be used, but you will need to download the respective native libraries from JCuda.

For best performance, use a dedicated GPU. You may also be able to obtain access to GPU clusters through Amazon Web Services, Google cloud, or a local research institution.

If you cannot access a GPU, you can run the CPU version of HiCCUPS directly using the .hic file and Juicer Tools.

Building new jars

See the Juicebox documentation at https://github.com/theaidenlab/Juicebox for details on building new jars of the juicer_tools.

Quick Start

Run the Juicer pipeline on your cluster of choice with "juicer.sh [options]"

Usage: juicer.sh [-g genomeID] [-d topDir] [-q queue] [-l long queue] [-s site]
                 [-a about] [-R end] [-S stage] [-p chrom.sizes path]
                 [-y restriction site file] [-z reference genome file]
                 [-C chunk size] [-D Juicer scripts directory]
                 [-Q queue time limit] [-L long queue time limit] [-e] [-h] [-x]
* [genomeID] must be defined in the script, e.g. "hg19" or "mm10" (default
  "hg19"); alternatively, it can be defined using the -z command
* [topDir] is the top level directory (default
  "/Users/nchernia/Downloads/neva-muck/UGER")
     [topDir]/fastq must contain the fastq files
     [topDir]/splits will be created to contain the temporary split files
     [topDir]/aligned will be created for the final alignment
* [queue] is the queue for running alignments (default "short")
* [long queue] is the queue for running longer jobs such as the hic file
  creation (default "long")
* [site] must be defined in the script, e.g.  "HindIII" or "MboI"
  (default "none")
* [about]: enter description of experiment, enclosed in single quotes
* [stage]: must be one of "chimeric", "merge", "dedup", "final", "postproc", or "early".
    -Use "chimeric" when alignments are done but chimeric handling has not finished
    -Use "merge" when alignment has finished but the merged_sort file has not
     yet been created.
    -Use "dedup" when the files have been merged into merged_sort but
     merged_nodups has not yet been created.
    -Use "final" when the reads have been deduped into merged_nodups but the
     final stats and hic files have not yet been created.
    -Use "postproc" when the hic files have been created and only
     postprocessing feature annotation remains to be completed.
    -Use "early" for an early exit, before the final creation of the stats and
     hic files
* [chrom.sizes path]: enter path for chrom.sizes file
* [restriction site file]: enter path for restriction site file (locations of
  restriction sites in genome; can be generated with the script
  (misc/generate_site_positions.py) )
* [reference genome file]: enter path for reference sequence file, BWA index
  files must be in same directory
* [chunk size]: number of lines in split files, must be multiple of 4
  (default 90000000, which equals 22.5 million reads)
* [Juicer scripts directory]: set the Juicer directory,
  which should have scripts/ references/ and restriction_sites/ underneath it
  (default /broad/aidenlab)
* [queue time limit]: time limit for queue, i.e. -W 12:00 is 12 hours
  (default 1200)
* [long queue time limit]: time limit for long queue, i.e. -W 168:00 is one week
  (default 3600)
* -f: include fragment-delimited maps from hic file creation
* -e: early exit
* -h: print this help and exit
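
The chunk size passed with -C counts lines, and each FASTQ record occupies four lines, which is why the value must be a multiple of 4 and why the default of 90000000 lines corresponds to 22.5 million reads. The arithmetic can be sketched as follows; `reads_to_chunk_size` and `is_valid_chunk_size` are hypothetical helper names for illustration and are not part of Juicer.

```shell
# Hypothetical helpers (not part of Juicer) for choosing a -C value.

# Convert a number of reads per split file into a line count.
reads_to_chunk_size() {
    echo $(( $1 * 4 ))   # one FASTQ record = 4 lines
}

# Succeed only if the value is a positive multiple of 4.
is_valid_chunk_size() {
    [ "$1" -gt 0 ] && [ $(( $1 % 4 )) -eq 0 ]
}

chunk=$(reads_to_chunk_size 22500000)
echo "$chunk"   # 90000000, the documented default

# The result would then be passed along the lines of:
#   juicer.sh -C "$chunk" [other options]
```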

Juicer Usage

Running Juicer with no arguments will run it with genomeID hg19 and site MboI
Providing a genome ID: if not defined in the script, you can either directly modify the script or provide the script with the files needed. You would provide the script with the files needed via "-z reference_sequence_path" (needs to have the BWA index files in same directory), "-p chrom_sizes_path" (these are the chromosomes you want included in .hic file), and "-y site_file" (this is the listing of all the restriction site locations, one line per chromosome). Note that ligation junction won't be defined in this case. The script (misc/generate_site_positions.py) can help you generate the file.
Providing a restriction enzyme: if not defined in the script, you can either directly modify the script or provide the files needed via the "-y site_file" flag, as above. Alternatively, if you don't want to do any fragment-level analysis (as with a DNase experiment), you should assign the site "none", as in juicer.sh -s none
Directory structure: Juicer expects the fastq files to be stored in a directory underneath the top-level directory. E.g. HIC001/fastq. By default, the top-level directory is the directory where you are when you launch Juicer; you can change this via the -d flag. Fastqs can be zipped. [topDir]/splits will be created to contain the temporary split files and should be deleted once your run is completed. [topDir]/aligned will be created for the final files, including the hic files, the statistics, the valid pairs (merged_nodups), the collisions, and the feature annotations.
Queues are complicated and it's likely that you'll have to modify the script for your system, though we did our best to avoid this. By default there's a short queue and a long queue. We also allow you to pass in wait times for those queues; this is currently ignored by the UGER and SLURM versions. The short queue should be able to complete alignment of one split file. The long queue is for jobs that we expect to take a while, like writing out the merged_sort file.
Chunk size is intimately associated with your queues; a smaller chunk size means more alignment jobs, each of which completes faster. If you have a hard limit on the number of jobs, you don't want too small a chunk size. If your short queue has a very limited runtime ceiling, you don't want too big a chunk size. Run time for alignment will also depend on the particulars of your cluster. We launch ~5 jobs per chunk. Chunk size must be a multiple of 4.
Relaunch via the same script. Type juicer.sh [options] -S stage where "stage" is one of merge, dedup, final, postproc, or early. "merge" is for when alignment has finished but merged_sort hasn't been created; "dedup" is for when merged_sort is there but not merged_nodups (this will relaunch all dedup jobs); "final" is for when merged_nodups is there and you want the stats and hic files; "postproc" is for when you have the hic files and just want feature annotations; and "early" is for early exit, before hic file creation. If your jobs failed at the alignment stage, run relaunch_prep.sh and then run juicer.sh.
Miscellaneous options include -a 'experiment description', which will add the experiment description to the statistics file and the metadata in the hic file; -r, which allows you to use bwa aln instead of bwa mem, useful for shorter reads; -R [end], in case you have one read end that's short and one that's long and you want to align the short end with bwa aln and the long end with bwa mem; and -D [Juicer scripts directory], to set an alternative Juicer directory, which must have scripts/, references/, and restriction_sites/ underneath it.
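
The relaunch stages described above map onto which output files already exist under [topDir]/aligned. A minimal sketch of that decision follows. The helper name `suggest_stage` is hypothetical, and the file names `merged_sort.txt`, `merged_nodups.txt`, and `inter.hic` are assumptions about how the merged_sort, merged_nodups, and hic files are named on disk; check your own aligned directory before relying on it. The fallback to "merge" also assumes that alignment itself has finished.

```shell
# Hypothetical helper: suggest a value for -S from the contents of
# [topDir]/aligned. File names are assumptions; adjust them to your run.
suggest_stage() {
    aligned=$1
    if [ -s "$aligned/inter.hic" ]; then
        echo postproc      # hic file exists, only annotations remain
    elif [ -s "$aligned/merged_nodups.txt" ]; then
        echo final         # deduped, but stats and hic files are missing
    elif [ -s "$aligned/merged_sort.txt" ]; then
        echo dedup         # merged, but not yet deduped
    else
        echo merge         # assumes alignment completed successfully
    fi
}

# Example of how the result might be used:
#   juicer.sh [options] -S "$(suggest_stage /path/to/topDir/aligned)"
```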

Command Line Tools Usage

Detailed documentation about the command line tools can be found on the wiki:

To launch the command line tools, use the shell script “juicer_tools” on Unix/MacOS or type

java -jar juicer_tools.jar (command...) [flags...] <parameters...>

In the command line tools, there are several analysis functions:

apa for conducting aggregate peak analysis
hiccups for annotating loops
motifs for finding CTCF motifs
arrowhead for annotating contact domains
eigenvector for calculating the eigenvector (first PC) of the Pearson's correlation matrix
pearsons for calculating the Pearson's correlation matrix

The juicer_tools (Unix/MacOS) script can be used in place of the unwieldy java -Djava.library.path=path/to/natives/ -jar juicer_tools.jar

juicer's Issues

Can we use juicer to call compartment with interchromosomal matrix?

Hi,

Very powerful software. I just wonder whether we can use juicer to call the 6 subcompartments from the interchromosomal matrix?

Best,
Yu

SLURM issues with scontrol update

Some SLURM users can't use scontrol update. The workaround is to run in two stages. If stage is early exit, the scontrol commands should not happen.

CPU threads flag

I'm using AWS EC2 instances, and I was wondering how I can utilize more than one cpu (which is how I assume the cpu "version" works). Was there something like a --threads flag? I also noticed the AMI for Juicer is for version 1.06, so I decided to install it on a fresh instance instead.

juicer on PBS, first job terminated then remaining jobs are orphaned

After the update to the PBS version of the juicer scripts I am able to run juicer.sh. However now all the jobs are created but the first job for some reason terminates and ends up causing the remaining jobs to become orphans. I am just trying it on the small test data set provided in the wiki.

When I first run juicer.sh it creates 5 jobs seen here:

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
207555.merlot     AlnWrpC18126     stansfieldjc             0 R workq
207556.merlot     MStWrpC18126     stansfieldjc             0 H workq
207557.merlot     RDpWrpC18126     stansfieldjc             0 H workq
207558.merlot     SpWrp1C18126     stansfieldjc             0 H workq
207561.merlot     SpWrp2C18126     stansfieldjc             0 H workq

I then get the following email from the cluster after a minute or two:
PBS Job Id: 207556.merlot.bis.vcu.edu
Job Name: MStWrpC18126
Aborted by PBS Server
Job deleted as result of dependency on job 207555.merlot.bis.vcu.edu

And after that the remaining 3 jobs remain orphaned and on hold.

I then got the next email from the cluster:

PBS Job Id: 207555.merlot.bis.vcu.edu
Job Name: AlnWrpC18126
Post job file processing error; job 207555.merlot.bis.vcu.edu on host node10

Do you know what is going on here or how I can fix it?

Juicer stops in the merging step for some samples

Hi,

I am trying to map Hi-C raw reads downloaded from GEO using juicer.
For some samples (not all), juicer stopped in the merging step with an output file like:

"
_### Sun Apr 30 15:21:29 EDT 2017
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
/ysm-gpfs/pi/gerstein/cy288/RenBing_fires_tissue_cellR_11_15_2016/STL003_Pancreas_Rep3/splits/SRR4272017007.fastq.sam created successfully.
***! No /ysm-gpfs/pi/gerstein/cy288/RenBing_fires_tissue_cellR_11_15_2016/STL003_Pancreas_Rep3/splits/SRR4272017007.fastq_norm.txt file created "

Also see this in the attached file:
merge-1327310.txt

It seems that the problem comes from the script:
chimeric_blacklist.awk

Could you please tell me what caused this problem?

Thank you!
Chengfei Yan
Postdoc Associate from the Gerstein Lab at Yale University

Add -f flag into all Juicer versions

The sort by name should be -k1,1f

issue with dump on multiple .hic

I use the CPU version of juicer to dump data from two .hic files, but the program doesn't seem to recognize the .hic files. BTW, juicer works well on a single .hic file with the same command.
Juicer Tools Version 1.7.6

Resolution=10000
JUICER=/home/software/juicer/CPU/juicer_tools.jar
for j in {1..22}; do java -jar ${JUICER}  dump observed NONE GSM1551601_HIC052_30.hic,GSM1551602_HIC053_30.hic  ${j} ${j}   BP $Resolution  raw_${Resolution}.chr${j}; done

error

Could not read hic file: null
Could not read hic file: null
Could not read hic file: null

juicebox_clt.jar file

Dear professor,

I tried to install juicer on my computer, but I can't find the juicebox_clt.jar file. I am not sure whether it has been replaced by juicer_tools_0.7.5.jar or whether I didn't install it correctly.

Best,
Yu

Reference file availability on AWS mirror

Hello, I'm downloading reference files for use with Juicer. hg19 works fine (Homo_sapiens_assembly19.* at https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references), but I can't access mm9. I tried Mus_musculus_assembly9_norandom.fasta as in the "Installation" wiki, but that does not work; it fails with a 403 Forbidden response. I tried some variants on the name, but none of those was a hit. I can generate the necessary files if needed, but are they available for mm9 on AWS or elsewhere? Thanks!

The location in generate_site_positions.py script

Dear,

I want to use the generate_site_positions.py script to create a restriction sites file for the genome I am studying, but I don't know what the [location] parameter in this Python script is:
generate_site_positions.py <restriction enzyme> <genome> [location]
By the way, can I use either [-s site] or, alternatively, [-y restriction site file] in juicer.sh? I mean, is the restriction site file unnecessary if I set the [-s site] parameter?
What is the [-p chrom.sizes path] parameter, and what is its function in the Juicer software?

Thanks.

about normalization

Hi, sorry to bother you!
Could you please tell me about the three normalization methods (VC, VC_SQRT, KR)?
Are they similar to the distance normalization used when calculating the eigenvector?
Waiting for your reply!
Best wishes

AWS tutorial

Hi,
I cannot seem to find an AMI corresponding to ami-458fc22f. Was the tutorial moved? Is it still available?
thanks

90% chimeric ambiguous reads on more than 10 experiments using standard enzymes & syntax.

Hello!

The title says it all. Is there any way to discover why these reads are registering as chimeric ambiguous? None of the reference sets tend to have such odd stats. I have substituted the names of our conditions and genes in order to protect our ability to publish the results.
Here is the syntax used to run juicer:
module load juicer
cd /scratch/Experiment1/
juicer.sh -p $JUICER/references/hg19.chrom.sizes -s HindIII -y /usr/local/apps/juicer/juicer-1.5/SLURM/restriction_sites/hg19_HindIII.txt
Each folder has two fastq files and they are paired with the _R1.fastq.gz extension.

-bash-4.1$ head .hic -n 14
HI69/usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 51,225,101
Normal Paired: 4,837,169 (9.44%)
Chimeric Paired: 0 (0.00%)
Chimeric Ambiguous: 46,387,931 (90.56%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 17,626,936 (34.41%)
Alignable (Normal+Chimeric Paired): 4,837,169 (9.44%)
Unique Reads: 4,371,746 (8.53%)
PCR Duplicates: 460,436 (0.90%)
Optical Duplicates: 4,987 (0.01%)
Library Complexity Estimate: 23,718,686
Intra-fragment Reads: 41,555 (0.08% / 0.95%)
Below MAPQ Threshold: 832,482 (1.63% / 19.04%)
-bash-4.1$ head .hic -n 14
HI /usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 53,112,216
Normal Paired: 4,531,213 (8.53%)
Chimeric Paired: 0 (0.00%)
Chimeric Ambiguous: 48,581,002 (91.47%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 15,353,175 (28.91%)
Alignable (Normal+Chimeric Paired): 4,531,213 (8.53%)
Unique Reads: 4,165,313 (7.84%)
PCR Duplicates: 361,219 (0.68%)
Optical Duplicates: 4,681 (0.01%)
Library Complexity Estimate: 26,831,778
Intra-fragment Reads: 51,098 (0.10% / 1.23%)
Below MAPQ Threshold: 821,990 (1.55% / 19.73%)
-bash-4.1$ head .hic -n 14
HIn▒/usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 70,885,255
Normal Paired: 4,735,332 (6.68%)
Chimeric Paired: 1 (0.00%)
Chimeric Ambiguous: 66,149,921 (93.32%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 16,650,157 (23.49%)
Alignable (Normal+Chimeric Paired): 4,735,333 (6.68%)
Unique Reads: 4,391,686 (6.20%)
PCR Duplicates: 338,623 (0.48%)
Optical Duplicates: 5,024 (0.01%)
Library Complexity Estimate: 31,443,095
Intra-fragment Reads: 50,678 (0.07% / 1.15%)
Below MAPQ Threshold: 807,014 (1.14% / 18.38%)

Thanks,
James D

awk: /home/ljw/juicer/scripts/common/chimeric_blacklist.awk: line 515: function and never defined

I have run a single CPU version of juicer by the command
bash ~/juicer/scripts/juicer.sh -d ~/juicer/work/DNA -s none -z ~/juicer/references/hg19.fa -p ~/juicer/references/hg19.sizes -D ~/juicer -x
And I got the following error
awk: /home/ljw/juicer/scripts/common/chimeric_blacklist.awk: line 515: function and never defined
It seems that chimeric_blacklist.awk only has 513 lines. How can I fix this? Thank you.

juicebox_tools.jar pre error

Dear professor,
When I use the command below, I also get an error. Could you help me?

java -jar /share/nas30/liufuyan/Project/AT/Interaction/06.TAD/Soft/juicebox-master/out/artifacts/Juicebox_tools_jar/juicebox_tools.jar pre -f ../QC/digest_AT.fa.bed -q 0 tmp/93500_allValidPairs.pre_juicebox_sorted test ../QC/AT.fa.len
Skipping Chr1 30427671
Skipping Chr2 19698289
Skipping Chr3 23459830
Skipping Chr4 18585056
Skipping Chr5 26975502
Warning: Unable to process fragment file. Pre will continue without fragment file.
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1457)
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1228)
at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:642)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:373)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:106)
at juicebox.tools.HiCTools.main(HiCTools.java:83)

Running juicer from pre-aligned R1 and R2 BAMs

Hello,

Thanks very much for juicer!

I am new to juicer, and I'd like to know how I should proceed from pre-aligned R1 and R2 BAMs. Is there a workaround without re-aligning? From the error output, I can see juicer was looking for specific intermediate files to continue, which I don't have.

Any help would be appreciated!

cheers,
Simo

PLEASE POST QUESTIONS TO THE FORUM; GITHUB ISSUES FOR BUGS ONLY

General announcement -
Please use our forum, aidenlab.org/forum.html for asking questions about juicer, including anything related to installation, running the software, interpreting warnings/errors, or general questions related to 3D genomics.

Please use Github issues specifically for reporting bugs in the software or for new feature requests.

WARNING for calculating Pearson's and eigenvector at high resolution

I'm calculating eigenvectors from hic files and when I go below 500,000 bp resolution, I get this warning:
WARNING: Pearson's and eigenvector calculation at high resolution can take a long time
and then it fails.

Is it possible to bypass this warning and forge ahead with higher resolution?

The problem using hiccupsdiff

We ran hiccupsdiff on two .hic maps from the command line and ran into some trouble, as shown in the following figure.

The program returns two folders, each with six files, as follows:

I don't know if I'm running the program correctly and, if so, which file contains the correct differential loops?

function not defined in chimeric_blacklist.awk

Hi, i've updated the CPU / chimeric_blacklist.awk script and now get an error on line 223 when running the example.
223: str[j] = and(tmp[2],16);
it may be that and() is not defined in the OS X version of awk. bwa runs and completes, then after two sorting steps there is an error:

$ ./juicer.sh -s HindIII -g hg38
(-: Looking for fastq files...fastq files exist
Fri 6 Jan 2017 13:32:35 GMT
Juicer version:1.5
....
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: /Users/stuart/NGSTools/Juicer/CPU/splits/HIC003_S2_L001_001.fastq.sam created successfully.
awk: calling undefined function and
input record number 3, file /Users/stuart/NGSTools/Juicer/CPU/splits/HIC003_S2_L001_001.fastq.sam
source line number 223

thanks,

Stuart
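
The `and()` function is a GNU awk (gawk) extension, so an awk that lacks it fails in the way shown above. A minimal sketch for checking which awk on the PATH provides it follows; the helper names `awk_has_and` and `find_bitwise_awk` are hypothetical, and the candidate list (`gawk`, `awk`) is an assumption about what is installed.

```shell
# Succeed if the given awk binary provides the bitwise and() function.
awk_has_and() {
    "$1" 'BEGIN { if (and(5, 4) == 4) exit 0; exit 1 }' 2>/dev/null
}

# Print the first candidate that supports and(); fail if none does.
find_bitwise_awk() {
    for candidate in gawk awk; do
        if command -v "$candidate" >/dev/null 2>&1 && awk_has_and "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if awk_bin=$(find_bitwise_awk); then
    echo "using $awk_bin"
else
    echo "no awk with and() found; installing GNU awk (gawk) is one option" >&2
fi
```

Since the Juicer scripts invoke `awk` by name, the awk that passes this check would presumably need to come first on the PATH under that name for the pipeline to pick it up.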

_msplit_optdups.txt no file or directory

Hi,

I have gotten this error in trying to run Juicer .
The job output suggests it completed successfully, but it looks like the job that runs the scripts to create _msplit_optdups.txt failed, and opt_dups.txt is an empty file.

Does this error matter ?
Any thoughts you have would be really appreciated!
Thanks.
Chengfei

Download problem

Hi,
Why can't I download juicer_tools from https://github.com/theaidenlab/juicer/wiki/Download ?
Thank You!

restriction site file

Hi,
Thanks for sharing the code in this much detail!

just wondering where can I find the restriction site file

$site_file = "/opt/juicer/restriction_sites/hg19_DpnII.txt";

Is it generated by HICUP? I just want to know what the format looks like.

Thanks!
Hurley

What are the general principles of the VC, VC_SQRT and KR normalization methods?

hello,
I used juicer_tools to dump my Hi-C data recently. The dump command provides three normalization methods (VC, VC_SQRT, KR), and I want to know the principles behind them. I searched for them on the internet and in your paper (Rao et al. 2014), but only found KR.
I need to understand the Hi-C normalization method for my study, so could you tell me the general principles of the three normalization methods in your tools? Related materials or references would also be helpful.
Thank you!
Yours,
J.Wan

Relative directory - fix is probably to prepend $(pwd) to input directory

The error message looks like this:

juicer$ ./juicer.sh -g hg19 -d XXX -s HindIII -p references/hg19.chrom.sizes
(-: Looking for fastq files...fastq files exist
Wed Jan 4 11:52:48 EST 2017
Juicer version:1.5
./juicer.sh -g hg19 -d XXX -s HindIII -p references/hg19.chrom.sizes
(-: Aligning files matching XXX/fastq/*_R*.fastq*
in queue to genome hg19 with site file ./restriction_sites/hg19_HindIII.txt
--- Using already created files in XXX/splits
gzip: XXX/splits/XXXHiC-HI-1_S0_R1.fastq.gz: No such file or directory
gzip: XXX/splits/XXXHiC-HI-1_S0_R2.fastq.gz: No such file or directory

The problem has to do with the soft links and relative directories. If you do ls -lh XXX/splits/XXXHiC-HI-1_S0_R1.fastq.gz
it probably points to XXX/fastq/XXXHiC-HI-1_S0_R1.fastq.gz - which from the perspective of that directory, does not exist (would be under the splits directory).

To correct it, either run juicer from your directory (i.e., cd XXX then run juicer instead of sending in the “-d” flag; juicer calls ‘pwd’, which gives the absolute path) or run with the -d flag but with the absolute directory (i.e., -d /path/to/my/folder/XXX).
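
The second option can be sketched as a small wrapper that resolves a relative directory to an absolute path before it is handed to -d. The helper name `abs_dir` is hypothetical and is not part of Juicer.

```shell
# Hypothetical helper: print the absolute path of an existing directory.
# Fails (non-zero status, no output) if the directory does not exist.
abs_dir() {
    (CDPATH= cd -- "$1" 2>/dev/null && pwd)
}

# Demonstration on the current directory:
abs_dir .

# Example of how it might be used with the command from this report:
#   ./juicer.sh -g hg19 -d "$(abs_dir XXX)" -s HindIII -p references/hg19.chrom.sizes
```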

calculate_map_resolution.sh error

I am trying to run calculate_map_resolution.sh on GSM1551620_HIC071_merged_nodups.txt from your 2014 GEO repository. When I run the following command

./calculate_map_resolution.sh GSM1551620_HIC071_merged_nodups.txt 50bp.txt

I get this error:

../calculate_map_resolution.sh: line 104: [: -lt: unary operator expected
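
In shell scripts, this message typically appears when an unquoted variable inside `[ ... ]` expands to an empty string, so the test sees `[ -lt N ]` with no left operand. The code on line 104 is not shown in this thread, so the following is only a general sketch of the failure mode and a defensive pattern; `below_threshold`, `count`, and the limit of 1000 are hypothetical.

```shell
# General shell pattern, not the script's actual code.
# With count="" the unquoted test `[ $count -lt 1000 ]` expands to
# `[ -lt 1000 ]` and fails with "unary operator expected".
# Quoting the variable and supplying a default avoids that.
below_threshold() {
    value=$1
    limit=$2
    [ "${value:-0}" -lt "$limit" ]
}

count=""   # e.g. an upstream command produced no output
if below_threshold "$count" 1000; then
    echo "below threshold"
fi
```

An empty value at that point usually means an earlier step produced no output, so checking that the input file is in the expected format is a reasonable first step.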

generate_site_positions.py fails when location is provided

Slight logic error in the if/else loops at the top.

If genome is one of the listed genomes, AND the location is provided, run still fails because /seq/reference isn't universal.

You need to add an

elif len(sys.argv)==3

(etc) at line 24 or check if the len(sys.argv)==4 before and use filename= instead.

Option to use other Matrix-types

Hi there,

I was wondering if it would be possible to allow a user to perform HICCUPS/APA/Arrowheads-analyses using their own matrices. I can imagine that not everybody has access to the original data and still want to use this excellent tool-kit.

The easiest way of doing this would be to make a conversion tool (from e.g. Hi-C summary files/validpairs) to .hic files. This will lead to more people using the "aiden-lab Hi-C ecosystem".

Thanks for both reading this issue and for developing juicer 👍

Kind regards,
Robin
(happy to help btw)

STDOUT vs STDERR

java -jar ~/tools/juicebox/juicer_tools_0.7.0.jar eigenvector VC K526.links.hic chr11 BP 100000 -p > test.txt

It looks like you're printing the HiC file version to stdout rather than stderr.

Error running on PBS cluster

I am trying to run juicer on a PBS cluster using the new PBS scripts. When I run the juicer.sh script I get the following error:

 Starting job to launch other jobs once splitting is complete
207474.merlot.bis.vcu.edu
below is the jID_alignwrap jobid
207474.
#PBS -W depend=afterok:207474.
qsub: illegal -W value

I think this is because of the period after the job ID number. For reference on our cluster jobs are named like this: 206998.merlot and can be called using only the number.

How can I modify the script to only use the number and drop the period from the job ID being used for the PBS -W command?
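
One way to keep only the numeric part is POSIX parameter expansion, which removes everything from the first period onward. The sketch below uses a hypothetical helper name, `pbs_numeric_id`; where exactly to apply it inside the PBS scripts is not shown here, and whether a bare number is accepted by `-W depend` is cluster-specific (the report above says it is on this cluster).

```shell
# Hypothetical helper: strip everything from the first "." onward.
pbs_numeric_id() {
    echo "${1%%.*}"
}

full_id="207474.merlot.bis.vcu.edu"
jid=$(pbs_numeric_id "$full_id")
echo "#PBS -W depend=afterok:$jid"   # #PBS -W depend=afterok:207474
```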

Path issue for CPU script execution

Hi there,

It seems like there is a bit of a path error for script execution with the "CPU" pipeline. For example,

JUICER_INSTALL_DIR=/lab/solexa_weng/testtube/juicer
SCRIPT_DIR=/lab/solexa_weng/testtube/juicer/CPU/common/
JUICER_WORK_DIR=(absolute path to directory in current directory, fastq files properly setup there)
${JUICER_INSTALL_DIR}/CPU/juicer.sh -D $SCRIPT_DIR -g MyGenome -t 16 -z ../../MyGenome.fasta -p chrom_sizes.txt -y MyGenome.fasta_MboI.txt -d $JUICER_WORK_DIR 1>juicer.stdout.log 2>juicer.stderr.log

Runs for a bit, giving a non-exiting error:

/lab/solexa_weng/testtube/juicer/CPU/juicer.sh: line 363: /lab/solexa_weng/testtube/juicer/CPU/common//scripts/common/countligations.sh: No such file or directory

and the exiting error:

awk: fatal: can't open source file `/lab/solexa_weng/testtube/juicer/CPU/common//scripts/common/chimeric_blacklist.awk' for reading (No such file or directory)

If we look at where the countligations.sh and chimeric_blacklist.awk files are:

tfallon@tak4 /lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version2/analyses/juicer$ find /lab/solexa_weng/testtube/juicer/ -name countligations.sh
/lab/solexa_weng/testtube/juicer/CPU/common/countligations.sh
/lab/solexa_weng/testtube/juicer/SLURM/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/AWS/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/UGER/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/LSF/scripts/countligations.sh

tfallon@tak4 /lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version2/analyses/juicer$ find /lab/solexa_weng/testtube/juicer/ -name chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/CPU/common/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/SLURM/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/AWS/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/UGER/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/LSF/scripts/chimeric_blacklist.awk

It looks like the issue may be that juicer.sh expects "/scripts/common/" as a hardcoded subdirectory beneath the -D directory, whereas the CPU scripts do not follow this layout. Do you agree? Or am I running the Juicer pipeline incorrectly?
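
A quick way to confirm the layout mismatch before launching a run is to test for the two files at the location the error messages point to, i.e. the -D directory followed by scripts/common/. The sketch below assumes only what the messages above show; check_juicer_layout is a hypothetical helper, not part of Juicer.

```shell
#!/usr/bin/env bash
# check_juicer_layout DIR: report helper scripts missing under DIR/scripts/common/,
# the location implied by the error messages above.
# Hypothetical helper for illustration; it is not part of Juicer.
check_juicer_layout() {
    local top_dir="${1%/}"    # strip one trailing slash, if any
    local missing=0 name
    for name in countligations.sh chimeric_blacklist.awk; do
        if [ ! -e "${top_dir}/scripts/common/${name}" ]; then
            echo "missing: ${top_dir}/scripts/common/${name}"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling it with the same value passed to -D (for example, check_juicer_layout "$SCRIPT_DIR") prints one line per missing file and returns a non-zero status if anything is absent.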

Running on a SLURM cluster gives a lot of errors

Hi,

I have the following directory structure:

references:
total 8374528
-rwxr-xr-x+ 1 sm2556 mane 3157608038 Sep 13 12:32 Homo_sapiens_assembly19.fasta  
-rw-r--r--+ 1 sm2556 mane       6663 Sep 13 13:31 Homo_sapiens_assembly19.fasta.amb
-rw-r--r--+ 1 sm2556 mane        939 Sep 13 13:31 Homo_sapiens_assembly19.fasta.ann
-rw-r--r--+ 1 sm2556 mane 3095694072 Sep 13 13:30 Homo_sapiens_assembly19.fasta.bwt
-rw-r--r--+ 1 sm2556 mane  773923497 Sep 13 13:31 Homo_sapiens_assembly19.fasta.pac
-rw-r--r--+ 1 sm2556 mane 1547847040 Sep 13 13:44 Homo_sapiens_assembly19.fasta.sa
-rw-r--r--+ 1 sm2556 mane        377 Sep 13 15:19 Homo_sapiens_assembly19.sizes

restriction_sites:
total 15360
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII_new.txt
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII.txt

scripts:
total 92800
-rwxr-xr-x+ 1 sm2556 mane     3519 Sep 13 11:26 check.sh
-rwxr-xr-x+ 1 sm2556 mane    15349 Sep 13 11:26 chimeric_blacklist.awk
-rwxr-xr-x+ 1 sm2556 mane     1971 Sep 13 11:26 cleanup.sh
-rwxr-xr-x+ 1 sm2556 mane     3584 Sep 13 11:26 collisions.awk
-rwxr-xr-x+ 1 sm2556 mane     1616 Sep 13 11:26 countligations.sh
-rwxr-xr-x+ 1 sm2556 mane    13448 Sep 13 11:26 diploid.pl
-rw-r--r--+ 1 sm2556 mane     2449 Sep 13 11:26 diploid_split.awk
-rwxr-xr-x+ 1 sm2556 mane     5325 Sep 13 11:26 dups.awk
-rw-r--r--+ 1 sm2556 mane     3726 Sep 13 11:26 fragment_4dnpairs.pl
-rwxr-xr-x+ 1 sm2556 mane     3711 Sep 13 11:26 fragment.pl
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:31 juicebox
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:30 Juicebox.jar
-rw-r--r--+ 1 sm2556 mane 30751431 Sep 13 12:30 juicebox_tools.7.0.jar
-rwxr-xr-x+ 1 sm2556 mane     2388 Sep 13 11:26 juicer_arrowhead.sh
-rwxr-xr-x+ 1 sm2556 mane     3269 Sep 13 11:26 juicer_hiccups.sh
-rwxr-xr-x+ 1 sm2556 mane     3651 Sep 13 11:26 juicer_postprocessing.sh
-rwxr-xr-x+ 1 sm2556 mane    41529 Sep 13 11:26 juicer.sh
-rwxr-xr-x+ 1 sm2556 mane     4659 Sep 13 11:26 LibraryComplexity.class
-rwxr-xr-x+ 1 sm2556 mane     7204 Sep 13 11:26 LibraryComplexity.java
-rwxr-xr-x+ 1 sm2556 mane     2354 Sep 13 11:26 makemega_addstats.awk
-rwxr-xr-x+ 1 sm2556 mane    12782 Sep 13 11:26 mega.sh
-rwxr-xr-x+ 1 sm2556 mane     2455 Sep 13 11:26 relaunch_prep.sh
-rwxr-xr-x+ 1 sm2556 mane     5200 Sep 13 11:26 split_rmdups.awk
-rwxr-xr-x+ 1 sm2556 mane    14572 Sep 13 11:26 statistics.pl
-rwxr-xr-x+ 1 sm2556 mane     1751 Sep 13 11:26 stats_sub.awk

fastq:
total 0
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:33 S1_003_HiC_R1.fastq.gz -> ../../analysis-jul052016/S1_003_HiC/Unaligned/S1_003_HiC_1.fastq.gz
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:34 S1_003_HiC_R2.fastq.gz -> ../../analysis-jul052016/S1_003_HiC/Unaligned/S1_003_HiC_2.fastq.gz

My run.sh script for the SLURM batch submission looks as follows:

#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=Juicer
#SBATCH --ntasks=1 --nodes=1
#SBATCH --mem-per-cpu=6000
module load BWA; module load Java;  bash /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/scripts/juicer.sh -g hg19 -d /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -q general -l general -a 'Reference' -S 'early' -p /home/sm2556/project/hic-golden-uconn-feb022216/hic-analysis-sept142017/references/Homo_sapiens_assembly19.sizes -s 'HindIII' -y /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/restriction_sites/hg19_HindIII.txt - D /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -x

The scripts folder was copied from the cloned GitHub repository of the juicer/SLURM/scripts.

I get many error messages about dependencies not being satisfied. The part of the script that splits the fastq.gz files runs correctly, but the job still ends with an error, and the actual bwa mem call never happens on the cluster. When I ran the script in CPU mode, the alignment started, but my files are too big and CPU mode would take a long time. Am I doing something wrong?

WARNING for calculating Pearson's and eigenvector at high resolution

Dear professor

I tried to use eigenvectors to call compartments from .hic files. When I set the resolution to 250,000, it prints only the following message and does not generate any files:
WARNING: Pearson's and eigenvector calculation at high resolution can take a long time
and then it fails.

I checked the issues and found that someone else has reported this before, but I could not find a solution. Should I update any files?

Best,
Yu

Request use such that GPUs are not mandatory

Hi there

I'm running HiCCUPS on a server with ~200 CPUs but no GPUs, using the following command:

java -Xmx2g -jar /path/Juicer/scripts/juicer_tools_linux_0.8.jar hiccups -m 500 -r 5000,10000 -f 0.1,0.1 -p 4,2 -i 7,5 -d 20000,20000 -c 22 --ignore_sparsity /pathpath/HMEC_HiCPro/Flow/HiCPro/HMEC/HMEC_allValidPairs.hic HMEC.hiccups.loops

this outputs the following:

Reading file: /pathpath/HMEC_HiCPro/Flow/HiCPro/HMEC/HMEC_allValidPairs.hic
HiC file version: 8
Using the following configurations for HiCCUPS:
Config res: 5000 peak: 4 window: 7 fdr: 10% radius: 20000
Config res: 10000 peak: 2 window: 5 fdr: 10% radius: 20000
Warning Hi-C map is too sparse to find many loops via HiCCUPS.
Running HiCCUPS for resolution 5000
GPU/CUDA Installation Not Detected
Exiting HiCCUPS

That's a bummer. Oftentimes there's a way to use CPUs instead of GPUs (e.g. with TensorFlow).

Does this exist? Can I use my many CPUs instead of GPUs for this task?

juicer pre error for human

Hi,
I am using pre to produce a .hic file with the following command line:

java -jar juicer_tools.1.7.6_jcuda.0.8.jar pre -r 40000 -q 30 -f ../01.data/hg19_MobI.txt ../01.data/test3.txt.gz ./M3-736.hic hg19

but I ran into this problem:
Start preprocess
Writing header
Writing body
......Error: the chromosome combination 14_15 appears in multiple blocks

Do you know why this error happens? I look forward to your reply. Thank you so much.
min
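
The message suggests that lines for the pair 14_15 are not contiguous in the input. A minimal sketch for testing that reading is shown below; find_split_blocks is a hypothetical helper, and the default column numbers (3 and 7 for the two chromosome fields) are assumptions that should be adjusted to the actual input format. It expects plain text, so a .gz file would need to be decompressed first.

```shell
#!/usr/bin/env bash
# find_split_blocks FILE [CHR1_COL] [CHR2_COL]
# Print every chromosome pair whose lines are NOT contiguous in FILE.
# Hypothetical helper; column defaults (3 and 7) are assumptions.
find_split_blocks() {
    awk -v c1="${2:-3}" -v c2="${3:-7}" '
        {
            pair = $c1 "_" $c2
            if (pair != prev) {
                # A pair that starts a new run after having been seen before
                # occupies more than one block.
                if (pair in seen) split_pair[pair] = 1
                seen[pair] = 1
                prev = pair
            }
        }
        END { for (p in split_pair) print p }
    ' "$1"
}
```

If the function prints nothing, every pair forms a single block under the assumed columns, and the cause would have to be sought elsewhere.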

Line 512 : chimeric_blacklist.awk error

Hi there,

After alignment the pipeline crashes around line 512 in the chimeric_blacklist.awk script.

Syntax error - I haven't tested it yet, but it could be a stray "}".

Nicola

(-: Looking for fastq files...fastq files exist
Tue  3 Jan 2017 23:28:47 GMT
Juicer version:1.5
../juicer.sh -z ../references/genome.fa -p ../references/genome.dict -y ../restriction_sites/ -D ..
(-: Aligning files matching 
opt/juicer/CPU/fastq/*_R*.fastq*
 in queue  to genome hg19 with site file ../restriction_sites/
(-: Created /opt/juicer/CPU/splits and 
/opt/juicer/CPU/aligned.
Running command bwa mem -t 4 ../references/genome.fa opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq > opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 215849 sequences (19773217 bp)...
[M::mem_process_seqs] Processed 215849 reads in 1673.624 CPU sec, 1568.074 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 4 ../references/genome.fa 
/opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq
[main] Real time: 1587.733 sec; CPU: 1684.597 sec
(-:  Align of /opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq.sam done successfully
Running command bwa mem -t 4 ../references/genome.fa /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq > /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 215849 sequences (19766979 bp)...
[M::mem_process_seqs] Processed 215849 reads in 2004.714 CPU sec, 1860.978 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 4 ../references/genome.fa /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq
[main] Real time: 1881.219 sec; CPU: 2015.746 sec
(-: Mem align of /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq.sam done successfully
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: /opt/juicer/CPU/splits/CTCF_S1_L001.fastq.sam created successfully.
awk: syntax error at source line 512 source file ../scripts/common/chimeric_blacklist.awk
 context is
	t_norm, count_abnorm) >> >>>  fname1".res.txt" <<< ;
awk: illegal statement at source line 513 source file ../scripts/common/chimeric_blacklist.awk

useuse

Hi,

I'm using Juicer 1.5.5 and I'm running into an issue right out of the gate with a script called useuse. It's referenced in juicer.sh, but is not included in the 1.5.5 release.

source/juicer-1.5.5/UGER/scripts/juicer.sh: line 337: /broad/software/scripts/useuse: No such file or directory

juicebox pre for maize

Dump issue

Hi,
Sorry to bother you. I am trying to extract a matrix from .hic data downloaded from GEO, but I always get the error below:
HiC file version: 8
Exception in thread "main" java.lang.NullPointerException
at juicebox.tools.clt.old.Dump.extractChromosomeRegionIndices(Dump.java:487)
at juicebox.tools.clt.old.Dump.readArguments(Dump.java:356)
at juicebox.tools.HiCTools.main(HiCTools.java:85)

Could you help me with that issue?

Best,
Yu

How can I call TADs/compartments using juicer_tools?

Thanks for your good software. However, I couldn't find a clear way to call TADs or compartments. Can juicer_tools be used to call TADs or compartments (using Arrowhead, HiCCUPS, etc.)?

Many Thanks!

line 673 juicer/SLURM/scripts/juicer.sh

On line 673 of the file juicer/SLURM/scripts/juicer.sh there is a "fi".

I cannot find the matching if statement. Is this a bug?

Thank you.

Can I use Juicer on a cluster without root?

The instructions note that Juicer is assumed to be located in /opt/juicer. When I run the command as the instructions suggest, I get the error "***! Reference sequence /opt/juicer/references/Homo_sapiens_assembly19.fasta does not exist". I don't have root access to install Juicer in /opt/. Is there any way I can use it without root?
Thanks.

juicer.sh SLURM script

On lines 407 and 452 the code is:

if [ -v shortread ] || [ "$shortreadend" -eq 1 ]

but I am getting this error:

[: -v: unary operator expected

I'm guessing this needs to be changed to:

if [ -v $shortread ] || [ "$shortreadend" -eq 1 ]

so that it references the $shortread variable. Can you please check this?
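
For context, the -v test takes a variable name rather than its value and was added in bash 4.2, so older shells report it as an unknown unary operator. A portable alternative is the ${var+x} parameter expansion, which works in any POSIX shell. The sketch below is one possible rewrite of the condition; the function name is hypothetical, and defaulting an unset shortreadend to 0 is an assumption about the intended behaviour.

```shell
#!/usr/bin/env bash
# shortread_requested: succeed if shortread is set (even to an empty string)
# or if shortreadend equals 1. Hypothetical wrapper for illustration only.
shortread_requested() {
    # ${shortread+x} expands to "x" when shortread is set, and to nothing otherwise.
    # ${shortreadend:-0} is an assumption: treat an unset or empty value as 0.
    [ -n "${shortread+x}" ] || [ "${shortreadend:-0}" -eq 1 ]
}
```

Note that writing [ -v $shortread ] would not help: with the variable unset it collapses to [ -v ], which is always true because it tests a non-empty string.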

hic scaffolding

Thanks for your work on Hi-C scaffolding. I tried to download the code indicated in the Science paper, but it does not exist at the address given: github.com/theaidenlab/HiC-assembly-pipeline-archive. Could you help me?

Mitochondria hardcoded length in chimeric_blacklist

In chimeric_blacklist, the size of the mitochondrial chromosome is hardcoded to the hg19 value. This is to deal with circular chromosomes: when assigning a position based on the CIGAR string, the position sometimes ends up past the end of the chromosome, in which case it is mapped back to the beginning. This is the code:

# Mitochondria loops around
	  if (chr[j] ~ /MT/ && pos[j] >= 16569) {
	    pos[j] = pos[j] - 16569;
	  }

Theoretically, any differently sized MT could go off the end (mouse for example); and any mitochondrial chromosome not named "MT" could also go off the end. In practice this hasn't happened; however, we should keep our eyes on this issue.
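
One way the hardcoded length could be avoided is to look the length up in a chromosome sizes file and to match the circular chromosome by a configurable name pattern. The sketch below illustrates the idea on simplified "chromosome, position" input; wrap_circular is a hypothetical helper, not the code in chimeric_blacklist.awk, and the two-column sizes file (name, length) is an assumption.

```shell
#!/usr/bin/env bash
# wrap_circular SIZES_FILE CHR_REGEX
# Read "chr pos" lines on stdin; for chromosomes whose name matches CHR_REGEX,
# wrap positions that run past the end back to the beginning.
# SIZES_FILE holds "name length" per line. Hypothetical helper, not Juicer code.
wrap_circular() {
    awk -v circ="$2" '
        BEGIN { OFS = "\t" }
        NR == FNR { len[$1] = $2; next }    # first file: chromosome lengths
        ($1 ~ circ) && ($1 in len) && ($2 >= len[$1]) { $2 = $2 - len[$1] }
        { print $1, $2 }
    ' "$1" -
}
```

For example, wrap_circular chrom.sizes '^(MT|chrM)$' < positions.txt would apply the same subtraction as the snippet above, but with the length taken from chrom.sizes (both file names are placeholders).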

Dump Issues

Hello, I'm new to manipulation of hic data and am trying to extract dense matrices from a hic file, and the dump command fails:

java -jar juicer/scripts/juicer_tools.jar dump -d observed KR GSE80701_DpnII_HinfI_combo.hic 25000 arm_2L arm_2L 2L.matrix
HiC file version: 8
Exception in thread "main" java.lang.NullPointerException
at juicebox.tools.clt.old.Dump.extractChromosomeRegionIndices(Dump.java:455)
at juicebox.tools.clt.old.Dump.readArguments(Dump.java:347)
at juicebox.tools.HiCTools.main(HiCTools.java:96)

I successfully extracted sparse matrices using straw.

Could anyone help me understand the errors?
Thanks in advance
Remi - NYUSoM

arrowhead issue

For some reason when I run juicer I don't get any output for arrowhead. I do, however, get a good .hic file that I can visualize in juicebox. When I run arrowhead separately using juicer tools I get very few TAD domains (~100 for the entire genome at best when messing around with the r and m settings).

When I use another program (HiCExplorer) I get a Hi-C matrix that looks exactly the same, but it provides me with thousands of TAD domains. Any suggestions on what the issue might be with Juicer?

Additionally, when visualizing the contact matrix in juicebox I'd like to change the order in which it displays the scaffolds/chromosomes. Is there any way to do this?

One last question: I can't figure out how to run HiCCUPS on my Mac or Linux machine. It requires a GPU and I have no experience using GPUs. Any suggestions on how to get it to work?

How essential is GPU?

Hi,

Is it necessary to have a GPU if I am not immediately interested in running HiCCUPS?

Sameet

Exception thrown: "java.lang.IndexOutOfBoundsException: Index: 6, Size: 4"

Hi,

I generated the input file for pre from a BAM file (generated by Babraham HiCUP) using the solution kindly provided in this forum. I then ran the pre command and generated the .hic file. I am not sure whether it succeeded, as I encountered an exception.

java.lang.IndexOutOfBoundsException: Index: 6, Size: 4
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at java.util.Collections$UnmodifiableList.get(Collections.java:1211)
at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:143)
at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:247)
at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:496)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:374)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:286)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:105)
at juicebox.tools.HiCTools.main(HiCTools.java:97)

My question is how do I know if the hic file generated was complete and did not stop at the point where the exception was thrown?

Thank you.

PS:
Code for converting bam to input file for pre:
samtools view read1_2.hicup.bam | awk 'BEGIN {FS="\t"; OFS="\t"} {name1=$1; str1=and($2,16); chr1=substr($3, 4); pos1=$4; mapq1=$5; getline; name2=$1; str2=and($2,16); chr2=substr($3, 4); pos2=$4; mapq2=$5; if(name1==name2) { if (chr1>chr2){print name1, str2, chr2, pos2,1, str1, chr1, pos1, 0, mapq2, mapq1} else {print name1, str1, chr1, pos1, 0, str2, chr2, pos2 ,1, mapq1, mapq2}}}' | sort -k3,3d -k7,7d > Arrowhead.input
my command for generating .hic file:
java -Xmx2g -jar /mnt/projects/wlwtan/cardiac_epigenetics/george/juicer/juicer_tools.1.6.2_linux_jcuda.0.8.jar pre -f mm9_DpnII.txt -q 30 Arrowhead.input Arrowhead.hic mm9
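
The exception reports a line that was split into 4 tokens while index 6 was requested, which suggests that at least one input line has fewer fields than expected. A small sketch for spotting such lines is to tabulate how many lines have each field count; field_count_summary is a hypothetical helper, and splitting on whitespace (awk's default) is an assumption about how the input is tokenized.

```shell
#!/usr/bin/env bash
# field_count_summary FILE
# Print "<field count> <number of lines>" for each distinct field count in FILE,
# sorted by field count, so that short or malformed lines stand out.
# Hypothetical helper; uses awk's default whitespace splitting.
field_count_summary() {
    awk '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}
```

Since the awk command above prints 11 fields per pair, a clean Arrowhead.input would be expected to yield a single row starting with 11; any other row points to lines worth inspecting.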

Questions about the APA analysis

I would like to perform an APA analysis with your software. First of all, I have both a raw matrix and an ICE corrected matrix (500 ICE iterations) in text format. In order to perform the APA analysis, should I create the .hic file with the raw data or with the normalized one?

Once I know what kind of data to use, I should create the .hic file. I can do it, according to the docs, with the Pre tool. One of the accepted formats is the Short with score format, which has the following columns:

<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>

So, as I already have data binned at 80 kb, do I have to create this file as below, for instance (setting the fragment to 0 and the strand always to +)?

+ chr1 80000 0 + chr1 320000 0 356

Once I have the .hic file, I also need a loops file. I have a list of TADs from HiCExplorer. Is the APA analysis suitable for checking my TADs, or do I need to call loops with Juicer?

Thank you.

