aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments

Home Page: http://aidenlab.org

License: MIT License

Shell 50.65% Awk 32.93% Perl 14.63% Java 1.21% Python 0.59%
3d-genome 3d-genome-browser bioinformatics genomics hi-c ngs

juicer's People

Contributors

adadiehl, cerikson, cy288, dudcha, ecsedi, eernst, jemilianosf, macroscian, mzhibo, nchernia, paulmenzel, photocyte, rmdickson, sa501428, sidiropoulos, soolee, tannerbeck, theaidenlab, zojka

juicer's Issues

Running juicer from pre-aligned R1 and R2 BAMs

Hello,

Thanks very much for juicer!

I am new to juicer, and I'd like to know how to proceed from pre-aligned R1 and R2 BAMs. Is there a workaround that avoids re-aligning? From the error output, I can see that juicer is looking for specific intermediate files to continue, which I don't have.

Any help would be appreciated!

cheers,
Simo

Questions about the APA analysis

I would like to perform an APA analysis with your software. First of all, I have both a raw matrix and an ICE corrected matrix (500 ICE iterations) in text format. In order to perform the APA analysis, should I create the .hic file with the raw data or with the normalized one?

Once I know what kind of data to use, I should create the .hic file. I can do it, according to the docs, with the Pre tool. One of the accepted formats is the Short with score format, which has the following columns:

<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>

So, since my data are already binned at 80 kb, should I create this file as below, for instance (setting the fragment to 0 and the strand always to +)?

+ chr1 80000 0 + chr1 320000 0 356

Once I have the .hic file, I also need a loops file. I do have a list of TADs from HiCExplorer. Is APA suitable for checking my TADs, or do I need to use loops called with Juicer?

Thank you.
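For the short-with-score question above, here is a minimal Python sketch of turning already-binned counts into such lines. It assumes sparse input records of (chrom1, start1, chrom2, start2, count) using bin start coordinates, writes the strand as 0 (which Pre reads as forward) and uses dummy fragments 0 and 1 so the two ends differ; those values, the lexicographic grouping by chromosome pair, and the helper name to_short_with_score are assumptions, not part of juicer.

```python
def to_short_with_score(records):
    """Convert binned (chrom1, start1, chrom2, start2, count) records into
    'short with score' lines:
    <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>
    Assumptions: strand 0 = forward, dummy fragments 0 and 1."""
    rows = []
    for c1, p1, c2, p2, count in records:
        if count == 0:
            continue  # keep the output sparse: empty bins carry no contacts
        if (c1, p1) > (c2, p2):
            # write each contact in one canonical orientation (upper triangle)
            c1, p1, c2, p2 = c2, p2, c1, p1
        rows.append((c1, c2, p1, p2, count))
    # group contacts by chromosome pair (lexicographic), then by position
    rows.sort()
    return [f"0 {c1} {p1} 0 0 {c2} {p2} 1 {count}" for c1, c2, p1, p2, count in rows]


if __name__ == "__main__":
    for line in to_short_with_score([("chr1", 320000, "chr1", 80000, 356)]):
        print(line)  # 0 chr1 80000 0 0 chr1 320000 1 356
```

Placing each contact at its bin start should land it in the same bin once Pre is run at the matching resolution; whether Pre needs any ordering beyond contiguous chromosome-pair groups is not confirmed here.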

Reference file availability on AWS mirror

Hello, I'm downloading reference files for use with Juicer. hg19 works fine (Homo_sapiens_assembly19.* at https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references), but I can't access mm9. I tried Mus_musculus_assembly9_norandom.fasta as in the "Installation" wiki, but that does not work; it fails with a 403 Forbidden response. I tried some variants of the name, but none of them was a hit. I can generate the necessary files if needed, but are they available for mm9 on AWS or elsewhere? Thanks!

The problem using hiccupsdiff

We ran hiccupsdiff on two .hic maps from the command line and ran into the problem shown in the following screenshot:
[Screenshot: problem]
The program returns two folders, each containing six files:
[Screenshot: files]
I don't know whether I'm running the program correctly, and if so, which file contains the differential loops?

AWS tutorial

Hi,
I cannot seem to find an AMI corresponding to ami-458fc22f. Was the tutorial moved? Is it still available?
Thanks

Dump Issues

Hello, I'm new to manipulating Hi-C data and am trying to extract dense matrices from a .hic file, but the dump command fails:

java -jar juicer/scripts/juicer_tools.jar dump -d observed KR GSE80701_DpnII_HinfI_combo.hic 25000 arm_2L arm_2L 2L.matrix
HiC file version: 8
Exception in thread "main" java.lang.NullPointerException
at juicebox.tools.clt.old.Dump.extractChromosomeRegionIndices(Dump.java:455)
at juicebox.tools.clt.old.Dump.readArguments(Dump.java:347)
at juicebox.tools.HiCTools.main(HiCTools.java:96)

I successfully extracted sparse matrices using straw.

Could anyone help me understand the errors?
Thanks in advance
Remi - NYUSoM

Option to use other Matrix-types

Hi there,

I was wondering whether it would be possible to let users run HICCUPS/APA/Arrowhead analyses on their own matrices. I can imagine that not everybody has access to the original data, yet they may still want to use this excellent toolkit.

The easiest way to do this would be a conversion tool (e.g. from Hi-C summary files/validpairs) to .hic files. This would lead to more people using the "aiden-lab Hi-C ecosystem".

Thanks for both reading this issue and for developing juicer 👍

Kind regards,
Robin
(happy to help btw)
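As a rough illustration of the validpairs conversion described above, here is a hedged Python sketch that rewrites one line of a HiC-Pro allValidPairs file into Juicer's medium format (<readname> <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <mapq2>). The allValidPairs column layout assumed here (read, chr1, pos1, strand1, chr2, pos2, strand2, size, frag1, frag2, mapq1, mapq2), the reverse-strand code 16 and the dummy fragments 0 and 1 are all assumptions; validpair_to_medium is a hypothetical name, not a juicer tool.

```python
def validpair_to_medium(line):
    """Convert one tab-separated allValidPairs line (assumed layout, see above)
    into a space-separated Juicer medium-format line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(f)}")
    read, chr1, pos1, s1, chr2, pos2, s2 = f[:7]
    mapq1, mapq2 = f[10], f[11]
    # Juicer strands: 0 = forward, anything else (here 16, like the SAM flag bit) = reverse
    str1 = "0" if s1 == "+" else "16"
    str2 = "0" if s2 == "+" else "16"
    # dummy fragment numbers that differ, so the pair is not treated as intra-fragment
    return " ".join([read, str1, chr1, pos1, "0", str2, chr2, pos2, "1", mapq1, mapq2])
```

The function would be mapped over every line of the allValidPairs file, and the result sorted so that each chromosome pair forms one contiguous block before running pre.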

Juicer stops in the merging step for some samples

Hi,

I am trying to map Hi-C raw reads downloaded from GEO using juicer.
For some samples (not all), juicer stopped at the merging step, with an output file like this:

### Sun Apr 30 15:21:29 EDT 2017
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
/ysm-gpfs/pi/gerstein/cy288/RenBing_fires_tissue_cellR_11_15_2016/STL003_Pancreas_Rep3/splits/SRR4272017007.fastq.sam created successfully.
***! No /ysm-gpfs/pi/gerstein/cy288/RenBing_fires_tissue_cellR_11_15_2016/STL003_Pancreas_Rep3/splits/SRR4272017007.fastq_norm.txt file created

Also see this in the attached file:
merge-1327310.txt

It seems that the problem comes from the script:
chimeric_blacklist.awk

Could you please tell me what caused this problem?

Thank you!
Chengfei Yan
Postdoc Associate from the Gerstein Lab at Yale University

issue with dump on multiple .hic

I use the CPU version of juicer to dump data from two .hic files, but the program does not seem to recognize the .hic files. By the way, juicer works well on a single .hic file with the same command.
Juicer Tools Version 1.7.6

Resolution=10000
JUICER=/home/software/juicer/CPU/juicer_tools.jar
for j in {1..22}; do java -jar ${JUICER}  dump observed NONE GSM1551601_HIC052_30.hic,GSM1551602_HIC053_30.hic  ${j} ${j}   BP $Resolution  raw_${Resolution}.chr${j}; done

error

Could not read hic file: null
Could not read hic file: null
Could not read hic file: null

calculate_map_resolution.sh error

I am trying to run calculate_map_resolution.sh on GSM1551620_HIC071_merged_nodups.txt from your 2014 GEO repository. When I run the following command

./calculate_map_resolution.sh GSM1551620_HIC071_merged_nodups.txt 50bp.txt

I get this error:

../calculate_map_resolution.sh: line 104: [: -lt: unary operator expected

arrowhead issue

For some reason, when I run juicer I don't get any output for arrowhead. I do, however, get a good .hic file that I can visualize in Juicebox. When I run arrowhead separately using juicer tools, I get very few TAD domains (~100 for the entire genome at best, even after adjusting the -r and -m settings).

When I use another program (HiCexplorer) I get a HiC matrix that looks exactly the same, but it provides me with thousands of TAD domains. Any suggestions on what the issue might be with juicer?

Additionally, when visualizing the contact matrix in juicebox I'd like to change the order in which it displays the scaffolds/chromosomes. Is there any way to do this?

One last question: I can't figure out how to run HiCCUPS on my Mac or Linux machine. It requires a GPU, and I have no experience using GPUs. Any suggestions on how to get it to work?

Running on a SLURM cluster gives a lot of errors.

Hi,

I have the following directory structure:

references:
total 8374528
-rwxr-xr-x+ 1 sm2556 mane 3157608038 Sep 13 12:32 Homo_sapiens_assembly19.fasta  
-rw-r--r--+ 1 sm2556 mane       6663 Sep 13 13:31 Homo_sapiens_assembly19.fasta.amb
-rw-r--r--+ 1 sm2556 mane        939 Sep 13 13:31 Homo_sapiens_assembly19.fasta.ann
-rw-r--r--+ 1 sm2556 mane 3095694072 Sep 13 13:30 Homo_sapiens_assembly19.fasta.bwt
-rw-r--r--+ 1 sm2556 mane  773923497 Sep 13 13:31 Homo_sapiens_assembly19.fasta.pac
-rw-r--r--+ 1 sm2556 mane 1547847040 Sep 13 13:44 Homo_sapiens_assembly19.fasta.sa
-rw-r--r--+ 1 sm2556 mane        377 Sep 13 15:19 Homo_sapiens_assembly19.sizes

restriction_sites:
total 15360
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII_new.txt
-rw-r--r--+ 1 sm2556 mane 7762896 Sep 13 11:45 hg19_HindIII.txt

scripts:
total 92800
-rwxr-xr-x+ 1 sm2556 mane     3519 Sep 13 11:26 check.sh
-rwxr-xr-x+ 1 sm2556 mane    15349 Sep 13 11:26 chimeric_blacklist.awk
-rwxr-xr-x+ 1 sm2556 mane     1971 Sep 13 11:26 cleanup.sh
-rwxr-xr-x+ 1 sm2556 mane     3584 Sep 13 11:26 collisions.awk
-rwxr-xr-x+ 1 sm2556 mane     1616 Sep 13 11:26 countligations.sh
-rwxr-xr-x+ 1 sm2556 mane    13448 Sep 13 11:26 diploid.pl
-rw-r--r--+ 1 sm2556 mane     2449 Sep 13 11:26 diploid_split.awk
-rwxr-xr-x+ 1 sm2556 mane     5325 Sep 13 11:26 dups.awk
-rw-r--r--+ 1 sm2556 mane     3726 Sep 13 11:26 fragment_4dnpairs.pl
-rwxr-xr-x+ 1 sm2556 mane     3711 Sep 13 11:26 fragment.pl
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:31 juicebox
-rw-r--r--+ 1 sm2556 mane 30745856 Sep 13 12:30 Juicebox.jar
-rw-r--r--+ 1 sm2556 mane 30751431 Sep 13 12:30 juicebox_tools.7.0.jar
-rwxr-xr-x+ 1 sm2556 mane     2388 Sep 13 11:26 juicer_arrowhead.sh
-rwxr-xr-x+ 1 sm2556 mane     3269 Sep 13 11:26 juicer_hiccups.sh
-rwxr-xr-x+ 1 sm2556 mane     3651 Sep 13 11:26 juicer_postprocessing.sh
-rwxr-xr-x+ 1 sm2556 mane    41529 Sep 13 11:26 juicer.sh
-rwxr-xr-x+ 1 sm2556 mane     4659 Sep 13 11:26 LibraryComplexity.class
-rwxr-xr-x+ 1 sm2556 mane     7204 Sep 13 11:26 LibraryComplexity.java
-rwxr-xr-x+ 1 sm2556 mane     2354 Sep 13 11:26 makemega_addstats.awk
-rwxr-xr-x+ 1 sm2556 mane    12782 Sep 13 11:26 mega.sh
-rwxr-xr-x+ 1 sm2556 mane     2455 Sep 13 11:26 relaunch_prep.sh
-rwxr-xr-x+ 1 sm2556 mane     5200 Sep 13 11:26 split_rmdups.awk
-rwxr-xr-x+ 1 sm2556 mane    14572 Sep 13 11:26 statistics.pl
-rwxr-xr-x+ 1 sm2556 mane     1751 Sep 13 11:26 stats_sub.awk

fastq:
total 0
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:33 S1_003_HiC_R1.fastq.gz -> ../../analysis jul052016/S1_003_HiC/Unaligned/S1_003_HiC_1.fastq.gz
lrwxrwxrwx 1 sm2556 mane 67 Sep 13 15:34 S1_003_HiC_R2.fastq.gz -> ../../analysis-jul052016/S1_003_HiC/Unaligned/S1_003_HiC_2.fastq.gz

My run.sh script for the SLURM batch submission looks as follows:

#!/bin/bash
#SBATCH --partition=general
#SBATCH --job-name=Juicer
#SBATCH --ntasks=1 --nodes=1
#SBATCH --mem-per-cpu=6000
module load BWA; module load Java;  bash /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/scripts/juicer.sh -g hg19 -d /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -q general -l general -a 'Reference' -S 'early' -p /home/sm2556/project/hic-golden-uconn-feb022216/hic-analysis-sept142017/references/Homo_sapiens_assembly19.sizes -s 'HindIII' -y /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017/restriction_sites/hg19_HindIII.txt - D /home/sm2556/project/hic-golden-uconn-feb0222016/hic-analysis-sept142017 -x

The scripts folder was copied from juicer/SLURM/scripts in the cloned GitHub repository.

I get many error messages about unsatisfied dependencies. The part of the script that splits the fastq.gz files still runs correctly, but the run ends with an error, and the actual bwa mem call never happens on the cluster. When I ran the script in CPU mode, it started the alignment, but my files are too big and CPU mode would take a long time. Am I doing something wrong?

restriction site file

Hi,
Thanks for sharing the code in this much detail!

I'm just wondering where I can find the restriction site file

$site_file = "/opt/juicer/restriction_sites/hg19_DpnII.txt";

Is it generated by HiCUP? I just want to know what the format looks like.

Thanks!
Hurley
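As far as I understand the juicer site files (e.g. hg19_DpnII.txt), each line holds one chromosome: its name, then the restriction site positions in increasing order, and finally the chromosome length. A minimal Python sketch of producing such a line for one chromosome and a plain motif follows. It reports the 1-based start of each match, which is an assumption; juicer's own generate_site_positions.py may use a different offset, and site_line is a hypothetical helper.

```python
import re


def site_line(chrom, sequence, motif):
    """Return '<chrom> <site1> <site2> ... <chrom_length>' for one chromosome.
    Positions are 1-based match starts (an assumption; see above)."""
    seq = sequence.upper()
    # a zero-width lookahead also catches overlapping occurrences of the motif
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    positions = [m.start() + 1 for m in pattern.finditer(seq)]
    return " ".join([chrom] + [str(p) for p in positions] + [str(len(seq))])


if __name__ == "__main__":
    print(site_line("chrT", "aaGATCttgatc", "GATC"))  # chrT 3 9 12
```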

hic scaffolding

Thanks for your work on Hi-C scaffolding. I tried to download the code cited in the Science paper, but it does not exist at github.com/theaidenlab/HiC-assembly-pipeline-archive.
Could you help me?

WARNING for calculating Pearson's and eigenvector at high resolution

Dear professor

I tried to use eigenvector to generate compartments from .hic files. When I set the resolution to 250,000, it only prints the following and fails to generate any files:
WARNING: Pearson's and eigenvector calculation at high resolution can take a long time
and then it fails.

I have checked the issues and found that someone else has reported this before, but I'm sorry, I can't find a solution. Should I update any files?

Best,
Yu

Request use such that GPUs are not mandatory

Hi there

I'm running HiCCUPs on a server with ~200 CPUs but no GPUS, using the following command:

java -Xmx2g -jar /path/Juicer/scripts/juicer_tools_linux_0.8.jar hiccups -m 500 -r 5000,10000 -f 0.1,0.1 -p 4,2 -i 7,5 -d 20000,20000 -c 22 --ignore_sparsity /pathpath/HMEC_HiCPro/Flow/HiCPro/HMEC/HMEC_allValidPairs.hic HMEC.hiccups.loops

this outputs the following:

Reading file: /pathpath/HMEC_HiCPro/Flow/HiCPro/HMEC/HMEC_allValidPairs.hic
HiC file version: 8
Using the following configurations for HiCCUPS:
Config res: 5000 peak: 4 window: 7 fdr: 10% radius: 20000
Config res: 10000 peak: 2 window: 5 fdr: 10% radius: 20000
Warning Hi-C map is too sparse to find many loops via HiCCUPS.
Running HiCCUPS for resolution 5000
GPU/CUDA Installation Not Detected
Exiting HiCCUPS

That's a bummer. There is often a way to use CPUs instead of GPUs (e.g. with TensorFlow).

Does this exist? Can I use my many CPUs instead of GPUs for this task?

Line 512 : chimeric_blacklist.awk error

Hi there,

After alignment the pipeline crashes around line 512 in the chimeric_blacklist.awk script.

It's a syntax error. I haven't tested this yet, but it could be a stray "}".

Nicola

(-: Looking for fastq files...fastq files exist
Tue  3 Jan 2017 23:28:47 GMT
Juicer version:1.5
../juicer.sh -z ../references/genome.fa -p ../references/genome.dict -y ../restriction_sites/ -D ..
(-: Aligning files matching 
opt/juicer/CPU/fastq/*_R*.fastq*
 in queue  to genome hg19 with site file ../restriction_sites/
(-: Created /opt/juicer/CPU/splits and 
/opt/juicer/CPU/aligned.
Running command bwa mem -t 4 ../references/genome.fa opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq > opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 215849 sequences (19773217 bp)...
[M::mem_process_seqs] Processed 215849 reads in 1673.624 CPU sec, 1568.074 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 4 ../references/genome.fa 
/opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq
[main] Real time: 1587.733 sec; CPU: 1684.597 sec
(-:  Align of /opt/juicer/CPU/splits/CTCF_S1_L001_R1.fastq.sam done successfully
Running command bwa mem -t 4 ../references/genome.fa /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq > /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 215849 sequences (19766979 bp)...
[M::mem_process_seqs] Processed 215849 reads in 2004.714 CPU sec, 1860.978 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 4 ../references/genome.fa /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq
[main] Real time: 1881.219 sec; CPU: 2015.746 sec
(-: Mem align of /opt/juicer/CPU/splits/CTCF_S1_L001_R2.fastq.sam done successfully
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: /opt/juicer/CPU/splits/CTCF_S1_L001.fastq.sam created successfully.
awk: syntax error at source line 512 source file ../scripts/common/chimeric_blacklist.awk
 context is
	t_norm, count_abnorm) >> >>>  fname1".res.txt" <<< ;
awk: illegal statement at source line 513 source file ../scripts/common/chimeric_blacklist.awk

How essential is GPU?

Hi,

Is a GPU necessary if I am not immediately interested in running HiCCUPS?

Sameet

STDOUT vs STDERR

java -jar ~/tools/juicebox/juicer_tools_0.7.0.jar eigenvector VC K526.links.hic chr11 BP 100000 -p > test.txt

It looks like you're printing the HiC file version to stdout rather than stderr.

Exception thrown: "java.lang.IndexOutOfBoundsException: Index: 6, Size: 4"

Hi,

I generated the input file for pre from a BAM file (produced by Babraham HiCUP) using the solution kindly provided in this forum. I then ran the pre command and generated the .hic file, but I am not sure whether it succeeded, because an exception was thrown.

java.lang.IndexOutOfBoundsException: Index: 6, Size: 4
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at java.util.Collections$UnmodifiableList.get(Collections.java:1211)
at juicebox.tools.utils.original.AsciiPairIterator.advance(AsciiPairIterator.java:143)
at juicebox.tools.utils.original.AsciiPairIterator.next(AsciiPairIterator.java:247)
at juicebox.tools.utils.original.Preprocessor.computeWholeGenomeMatrix(Preprocessor.java:496)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:374)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:286)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:105)
at juicebox.tools.HiCTools.main(HiCTools.java:97)

My question is: how can I tell whether the generated .hic file is complete, rather than truncated at the point where the exception was thrown?

Thank you.

PS:
Code for converting bam to input file for pre:
samtools view read1_2.hicup.bam | awk 'BEGIN {FS="\t"; OFS="\t"} {name1=$1; str1=and($2,16); chr1=substr($3, 4); pos1=$4; mapq1=$5; getline; name2=$1; str2=and($2,16); chr2=substr($3, 4); pos2=$4; mapq2=$5; if(name1==name2) { if (chr1>chr2){print name1, str2, chr2, pos2,1, str1, chr1, pos1, 0, mapq2, mapq1} else {print name1, str1, chr1, pos1, 0, str2, chr2, pos2 ,1, mapq1, mapq2}}}' | sort -k3,3d -k7,7d > Arrowhead.input
my command for generating .hic file:
java -Xmx2g -jar /mnt/projects/wlwtan/cardiac_epigenetics/george/juicer/juicer_tools.1.6.2_linux_jcuda.0.8.jar pre -f mm9_DpnII.txt -q 30 Arrowhead.input Arrowhead.hic mm9
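One way to approach the question above is to check the pre input before building the .hic file. Reading the exception (Index: 6, Size: 4) as "at least one line had only 4 fields where pre expected more" is an inference from the stack trace, not a confirmed diagnosis. The hypothetical Python helper below lists every line whose field count differs from the 11 columns the awk command above is meant to print.

```python
def bad_lines(lines, expected=11):
    """Yield (line_number, field_count) for lines whose whitespace-separated
    field count differs from `expected`. Line numbers are 1-based."""
    for number, line in enumerate(lines, start=1):
        n = len(line.split())
        if n != expected:
            yield number, n


if __name__ == "__main__":
    sample = [
        "r1 0 1 100 0 16 2 500 1 42 30",
        "r2 0 1 200",  # a truncated line like the one the exception hints at
    ]
    print(list(bad_lines(sample)))  # [(2, 4)]
```

Run over Arrowhead.input (e.g. with open("Arrowhead.input") as f: print(list(bad_lines(f)))), an empty result would at least rule out malformed lines as the cause.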

juicer on PBS, first job terminated then remaining jobs are orphaned

After the update to the PBS version of the juicer scripts, I am able to run juicer.sh. However, although all the jobs are created, the first job terminates for some reason, which leaves the remaining jobs orphaned. I am just trying it on the small test data set provided in the wiki.

When I first run juicer.sh it creates 5 jobs seen here:

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
207555.merlot     AlnWrpC18126     stansfieldjc             0 R workq
207556.merlot     MStWrpC18126     stansfieldjc             0 H workq
207557.merlot     RDpWrpC18126     stansfieldjc             0 H workq
207558.merlot     SpWrp1C18126     stansfieldjc             0 H workq
207561.merlot     SpWrp2C18126     stansfieldjc             0 H workq

I then get the following email from the cluster after a minute or two:
PBS Job Id: 207556.merlot.bis.vcu.edu
Job Name: MStWrpC18126
Aborted by PBS Server
Job deleted as result of dependency on job 207555.merlot.bis.vcu.edu

After that, the remaining 3 jobs stay orphaned and on hold.

I then got the next email from the cluster:

PBS Job Id: 207555.merlot.bis.vcu.edu
Job Name: AlnWrpC18126
Post job file processing error; job 207555.merlot.bis.vcu.edu on host node10

Do you know what is going on here or how I can fix it?

awk: /home/ljw/juicer/scripts/common/chimeric_blacklist.awk: line 515: function and never defined

I ran the single-CPU version of juicer with the command
bash ~/juicer/scripts/juicer.sh -d ~/juicer/work/DNA -s none -z ~/juicer/references/hg19.fa -p ~/juicer/references/hg19.sizes -D ~/juicer -x
And I got the following error
awk: /home/ljw/juicer/scripts/common/chimeric_blacklist.awk: line 515: function and never defined
It seems that chimeric_blacklist.awk only has 513 lines. How can I fix this? Thank you.

90% chimeric ambiguous reads on more than 10 experiments using standard enzymes & syntax.

Hello!

The title says it all. Is there any way to find out why these reads are being classified as chimeric ambiguous? None of the reference data sets has such odd stats. I have substituted the names of our conditions and genes to protect our ability to publish the results.
Here is the syntax used to run juicer:
module load juicer
cd /scratch/Experiment1/
juicer.sh -p $JUICER/references/hg19.chrom.sizes -s HindIII -y /usr/local/apps/juicer/juicer-1.5/SLURM/restriction_sites/hg19_HindIII.txt
Each folder has two fastq files and they are paired with the _R1.fastq.gz extension.

-bash-4.1$ head .hic -n 14
HI69/usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 51,225,101
Normal Paired: 4,837,169 (9.44%)
Chimeric Paired: 0 (0.00%)
Chimeric Ambiguous: 46,387,931 (90.56%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 17,626,936 (34.41%)
Alignable (Normal+Chimeric Paired): 4,837,169 (9.44%)
Unique Reads: 4,371,746 (8.53%)
PCR Duplicates: 460,436 (0.90%)
Optical Duplicates: 4,987 (0.01%)
Library Complexity Estimate: 23,718,686
Intra-fragment Reads: 41,555 (0.08% / 0.95%)
Below MAPQ Threshold: 832,482 (1.63% / 19.04%)
-bash-4.1$ head .hic -n 14
HI /usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 53,112,216
Normal Paired: 4,531,213 (8.53%)
Chimeric Paired: 0 (0.00%)
Chimeric Ambiguous: 48,581,002 (91.47%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 15,353,175 (28.91%)
Alignable (Normal+Chimeric Paired): 4,531,213 (8.53%)
Unique Reads: 4,165,313 (7.84%)
PCR Duplicates: 361,219 (0.68%)
Optical Duplicates: 4,681 (0.01%)
Library Complexity Estimate: 26,831,778
Intra-fragment Reads: 51,098 (0.10% / 1.23%)
Below MAPQ Threshold: 821,990 (1.55% / 19.73%)
-bash-4.1$ head .hic -n 14
HIn▒/usr/local/apps/juicer/juicer-1.5/SLURM//references/hg19.chrom.sizesstatisticsExperiment description:
Sequenced Read Pairs: 70,885,255
Normal Paired: 4,735,332 (6.68%)
Chimeric Paired: 1 (0.00%)
Chimeric Ambiguous: 66,149,921 (93.32%)
Unmapped: 0 (0.00%)
Ligation Motif Present: 16,650,157 (23.49%)
Alignable (Normal+Chimeric Paired): 4,735,333 (6.68%)
Unique Reads: 4,391,686 (6.20%)
PCR Duplicates: 338,623 (0.48%)
Optical Duplicates: 5,024 (0.01%)
Library Complexity Estimate: 31,443,095
Intra-fragment Reads: 50,678 (0.07% / 1.15%)
Below MAPQ Threshold: 807,014 (1.14% / 18.38%)

Thanks,
James D

useuse

Hi,

I'm using juicer 1.5.5 and I'm running into an issue right out of the gate with a script called useuse. It's referenced in juicer.sh but is not included in the 1.5.5 release.

source/juicer-1.5.5/UGER/scripts/juicer.sh: line 337: /broad/software/scripts/useuse: No such file or directory

*_msplit*_optdups.txt no file or directory

Hi,

I got this error while trying to run Juicer.
The job output suggests that it completed successfully, but the script step that creates the *_msplit*_optdups.txt file seems to have failed, and opt_dups.txt is empty.

Does this error matter?
Any thoughts you have would be really appreciated!
Thanks.
Chengfei

WARNING for calculating Pearson's and eigenvector at high resolution

I'm calculating eigenvectors from hic files and when I go below 500,000 bp resolution, I get this warning:
WARNING: Pearson's and eigenvector calculation at high resolution can take a long time
and then it fails.

Is it possible to bypass this warning and proceed at a higher resolution?

juicer.sh SLURM script

On lines 407 and 452, the code is:

if [ -v shortread ] || [ "$shortreadend" -eq 1 ]

but I am getting this error:

[: -v: unary operator expected

I'm guessing this needs to be changed to:
if [ -v $shortread ] || [ "$shortreadend" -eq 1 ]

to call the $shortread variable. Can you please check this?

CPU threads flag

I'm using AWS EC2 instances, and I was wondering how I can use more than one CPU (which I assume is how the CPU "version" works). Is there something like a --threads flag? I also noticed that the Juicer AMI is for version 1.06, so I decided to install it on a fresh instance instead.

Mitochondria hardcoded length in chimeric_blacklist

In chimeric_blacklist, the length of the mitochondrial chromosome is hardcoded to its hg19 value. This handles circular chromosomes: when the position is adjusted based on the CIGAR string, it sometimes ends up past the end of the chromosome, in which case it should wrap around to the beginning. This is the code:

# Mitochondria loops around
	  if (chr[j] ~ /MT/ && pos[j] >= 16569) {
	    pos[j] = pos[j] - 16569;
	  }

In theory, a mitochondrial chromosome of a different size (mouse, for example) could run off the end, and so could any mitochondrial chromosome not named "MT". In practice this hasn't happened; however, we should keep an eye on this issue.
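A sketch of the more general fix, written in Python rather than awk purely for illustration: look up each circular chromosome's length from the chrom.sizes data instead of hardcoding 16569, and wrap any position that runs past the end. The set of circular chromosome names and the 1-based wrapping rule are assumptions. Note that the awk snippet wraps at pos >= 16569, which maps the last base to 0, whereas this sketch keeps positions within 1..length.

```python
def wrap_position(chrom, pos, sizes, circular=frozenset({"MT", "chrM", "M"})):
    """Wrap a 1-based position that ran past the end of a circular chromosome.
    `sizes` maps chromosome name -> length (e.g. parsed from chrom.sizes);
    the default set of circular names is an assumption."""
    length = sizes.get(chrom)
    if chrom in circular and length and pos > length:
        return (pos - 1) % length + 1
    return pos


if __name__ == "__main__":
    sizes = {"chrM": 16299}  # mouse (mm10) chrM length, for illustration
    print(wrap_position("chrM", 16305, sizes))  # 6
    print(wrap_position("chr1", 16305, sizes))  # 16305 (not circular)
```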

Relative directory - fix is probably to prepend $(pwd) to input directory

The error message looks like this:

juicer$ ./juicer.sh -g hg19 -d XXX -s HindIII -p references/hg19.chrom.sizes
(-: Looking for fastq files...fastq files exist
Wed Jan 4 11:52:48 EST 2017
Juicer version:1.5
./juicer.sh -g hg19 -d XXX -s HindIII -p references/hg19.chrom.sizes
(-: Aligning files matching XXX/fastq/*_R*.fastq*
in queue to genome hg19 with site file ./restriction_sites/hg19_HindIII.txt
--- Using already created files in XXX/splits
gzip: XXX/splits/XXXHiC-HI-1_S0_R1.fastq.gz: No such file or directory
gzip: XXX/splits/XXXHiC-HI-1_S0_R2.fastq.gz: No such file or directory

The problem has to do with the soft links and relative directories. If you run ls -lh XXX/splits/XXXHiC-HI-1_S0_R1.fastq.gz,
it probably points to XXX/fastq/XXXHiC-HI-1_S0_R1.fastq.gz, which does not exist from the perspective of that directory (it would have to be under the splits directory).

To fix it, either run juicer from your directory (i.e., cd XXX and then run juicer without the -d flag; juicer calls pwd, which gives the absolute path), or keep the -d flag but pass the absolute directory (e.g., -d /path/to/my/folder/XXX).

Error running on PBS cluster

I am trying to run juicer on a PBS cluster using the new PBS scripts. When I run the juicer.sh script I get the following error:

 Starting job to launch other jobs once splitting is complete
207474.merlot.bis.vcu.edu
below is the jID_alignwrap jobid
207474.
#PBS -W depend=afterok:207474.
qsub: illegal -W value

I think this is because of the period after the job ID number. For reference, jobs on our cluster are named like 206998.merlot and can be referenced by the number alone.

How can I modify the script to use only the number, dropping the trailing period from the job ID passed to the PBS -W option?

The location in generate_site_positions.py script

Dear,

  1. I want to use the generate_site_positions.py script to create a restriction site file for my study genome, but I don't know what the [location] parameter in this Python script is:
    generate_site_positions.py <restriction enzyme> <genome> [location]

  2. By the way, can I use either the [-s site] parameter or the [-y restriction site file] parameter in juicer.sh? That is, is the restriction site file unnecessary if I set the [-s site] parameter?

  3. What does the [-p chrom.sizes path] parameter do in the juicer software?

Thanks.
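On question 3: as far as I can tell, a chrom.sizes file is a plain two-column, tab-separated list of chromosome names and lengths, which should match the reference used for alignment. A minimal Python sketch of deriving it from a FASTA file follows; it assumes the chromosome name is the first whitespace-separated token of each header line, and chrom_sizes is a hypothetical helper, not a juicer script.

```python
def chrom_sizes(fasta_lines):
    """Return [(name, length), ...] from FASTA lines, in file order."""
    sizes = []
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))  # close the previous record
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


if __name__ == "__main__":
    demo = [">chr1 some description", "ACGT", "AC", ">chrM", "GATTACA"]
    for n, l in chrom_sizes(demo):
        print(f"{n}\t{l}")  # chr1<TAB>6, then chrM<TAB>7
```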

juicebox_clt.jar file

Dear professor,

I tried to install juicer on my computer, but I can't find the juicebox_clt.jar file. Has it been replaced by juicer_tools_0.7.5.jar, or did I not install it correctly?

Best,
Yu

about normalization

Hi, sorry to bother you!
Could you please tell me about the three normalization methods (VC, VC_SQRT, KR)?
Are they similar to the distance normalization used when calculating the eigenvector?
Looking forward to your reply!
Best wishes

What are the general principles of the VC, VC_SQRT and KR normalization methods?

hello,
I used juicer_tools to dump my Hi-C data recently. Juicer's dump offers three normalization methods, VC, VC_SQRT and KR, and I want to know the principles behind them. I searched the internet and your paper (Rao et al. 2014) but only found KR.
I need to understand the Hi-C normalization used in my study, so could you tell me the general principles of the three normalization methods in your tools? Pointers to related materials and references would also be helpful.
Thank you!
Yours,
J.Wan
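A minimal Python sketch of the idea behind the three options, assuming the common description: VC divides each observed count O[i][j] by the product of the coverage (row sums) of bins i and j, VC_SQRT divides by the square root of that product, and KR rescales rows and columns so that every row has the same sum (matrix balancing). For the balancing step the sketch swaps in simple ICE-style iterative correction, which targets the same equal-row-sum property but is not the Knight-Ruiz algorithm. Juicer's own implementation may also rescale the vectors and treat sparse bins differently. As I understand it, these are per-bin bias corrections, separate from the observed/expected distance normalization applied before computing eigenvectors.

```python
import math


def coverage(matrix):
    """Row sums (coverage) of a square, symmetric contact matrix."""
    return [sum(row) for row in matrix]


def apply_norm(matrix, vec):
    """Return O[i][j] / (vec[i] * vec[j]); pairs involving a zero entry become NaN."""
    n = len(matrix)
    out = [[float("nan")] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = vec[i] * vec[j]
            if denom > 0:
                out[i][j] = matrix[i][j] / denom
    return out


def vc(matrix):
    """VC-style: divide by the product of the two bins' coverage."""
    return apply_norm(matrix, coverage(matrix))


def vc_sqrt(matrix):
    """VC_SQRT-style: divide by the square root of that product."""
    return apply_norm(matrix, [math.sqrt(c) for c in coverage(matrix)])


def iterative_balance(matrix, iterations=200):
    """ICE-style balancing, a stand-in for KR (not the Knight-Ruiz algorithm).
    Assumes every bin has nonzero coverage. Returns (balanced, bias) with
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j])."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    bias = [1.0] * n
    for _ in range(iterations):
        sums = coverage(m)
        mean = sum(sums) / n
        s = [x / mean for x in sums]  # relative deviation of each row sum
        bias = [b * x for b, x in zip(bias, s)]
        m = [[m[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    return m, bias


if __name__ == "__main__":
    observed = [[4.0, 2.0, 1.0], [2.0, 6.0, 3.0], [1.0, 3.0, 8.0]]
    balanced, bias = iterative_balance(observed)
    print([round(sum(row), 6) for row in balanced])  # row sums are now (nearly) equal
```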

Dump issue

Hi,
Sorry to bother you, but when I try to extract the matrix from a .hic file downloaded from GEO, I always get the error below:
HiC file version: 8
Exception in thread "main" java.lang.NullPointerException
at juicebox.tools.clt.old.Dump.extractChromosomeRegionIndices(Dump.java:487)
at juicebox.tools.clt.old.Dump.readArguments(Dump.java:356)
at juicebox.tools.HiCTools.main(HiCTools.java:85)

Could you help me with that issue?

Best,
Yu

juicebox_tools.jar pre error

Dear professor,
When I run the following command, I also get an error. Could you help me?

java -jar /share/nas30/liufuyan/Project/AT/Interaction/06.TAD/Soft/juicebox-master/out/artifacts/Juicebox_tools_jar/juicebox_tools.jar pre -f ../QC/digest_AT.fa.bed -q 0 tmp/93500_allValidPairs.pre_juicebox_sorted test ../QC/AT.fa.len
Skipping Chr1 30427671
Skipping Chr2 19698289
Skipping Chr3 23459830
Skipping Chr4 18585056
Skipping Chr5 26975502
Warning: Unable to process fragment file. Pre will continue without fragment file.
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1457)
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1228)
at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:642)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:373)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:283)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:106)
at juicebox.tools.HiCTools.main(HiCTools.java:83)
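Besides the MAPQ filter that the message mentions, one plausible cause of "No reads in Hi-C contact matrices" is that the chromosome names in the pairs file do not match those in the sizes file given as the last argument (e.g. Chr1 vs chr1 vs 1); that diagnosis is an assumption for this case. A small Python sketch to list such mismatches follows. It assumes the medium format with the chromosomes in the 3rd and 7th columns and a two-column sizes file; missing_chromosomes is a hypothetical name.

```python
def read_sizes(lines):
    """Collect chromosome names from a chrom.sizes-style file: '<name> <length>'."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[0])
    return names


def missing_chromosomes(pair_lines, size_lines, chr_cols=(2, 6)):
    """Return sorted chromosome names used in the pairs file but absent from
    the sizes file. chr_cols are 0-based column indices (medium format assumed)."""
    known = read_sizes(size_lines)
    used = set()
    for line in pair_lines:
        fields = line.split()
        if len(fields) > max(chr_cols):
            used.update(fields[i] for i in chr_cols)
    return sorted(used - known)
```

An empty result means the names agree, and the cause lies elsewhere (for example the -q filter or the fragment file).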

Path issue for CPU script execution

Hi there,

It seems like there is a bit of a path error for script execution with the "CPU" pipeline. For example,

JUICER_INSTALL_DIR=/lab/solexa_weng/testtube/juicer
SCRIPT_DIR=/lab/solexa_weng/testtube/juicer/CPU/common/
JUICER_WORK_DIR=(absolute path to directory in current directory, fastq files properly setup there)
${JUICER_INSTALL_DIR}/CPU/juicer.sh -D $SCRIPT_DIR -g MyGenome -t 16 -z ../../MyGenome.fasta -p chrom_sizes.txt -y MyGenome.fasta_MboI.txt -d $JUICER_WORK_DIR 1>juicer.stdout.log 2>juicer.stderr.log

It runs for a while, giving a non-fatal error:

/lab/solexa_weng/testtube/juicer/CPU/juicer.sh: line 363: /lab/solexa_weng/testtube/juicer/CPU/common//scripts/common/countligations.sh: No such file or directory

and then the fatal error:

awk: fatal: can't open source file `/lab/solexa_weng/testtube/juicer/CPU/common//scripts/common/chimeric_blacklist.awk' for reading (No such file or directory)

If we look at where the countligations.sh and chimeric_blacklist.awk files are:

tfallon@tak4 /lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version2/analyses/juicer$ find /lab/solexa_weng/testtube/juicer/ -name countligations.sh
/lab/solexa_weng/testtube/juicer/CPU/common/countligations.sh
/lab/solexa_weng/testtube/juicer/SLURM/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/AWS/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/UGER/scripts/countligations.sh
/lab/solexa_weng/testtube/juicer/LSF/scripts/countligations.sh

tfallon@tak4 /lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version2/analyses/juicer$ find /lab/solexa_weng/testtube/juicer/ -name chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/CPU/common/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/SLURM/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/AWS/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/UGER/scripts/chimeric_blacklist.awk
/lab/solexa_weng/testtube/juicer/LSF/scripts/chimeric_blacklist.awk

The issue may be that juicer.sh expects "/scripts/common/" as a hard-coded subdirectory of the -D directory, but the CPU scripts don't follow this layout. Do you agree, or am I running the Juicer pipeline incorrectly?
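The following minimal Python sketch implements the check described above. It assumes, based only on the two error messages, that juicer.sh looks for each helper at <-D directory>/scripts/common/<name>. The helper list and the function names are hypothetical:

```python
import os

# Hypothetical helper list; both names come from the error messages above.
HELPERS = ["countligations.sh", "chimeric_blacklist.awk"]


def expected_helper_paths(script_dir):
    """Paths juicer.sh appears to build from the -D directory.

    Assumption inferred from the errors: <-D dir>/scripts/common/<name>.
    """
    return [os.path.join(script_dir, "scripts", "common", name)
            for name in HELPERS]


def missing_helpers(script_dir):
    """Return the expected helper paths that are not regular files."""
    return [p for p in expected_helper_paths(script_dir)
            if not os.path.isfile(p)]
```

If this reading is right, calling missing_helpers() with the failing -D value (…/CPU/common/) reports both helpers as missing, which matches the errors above. A -D directory that contains scripts/common/ would report none.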

generate_site_positions.py fails when location is provided

There is a slight logic error in the if/else block at the top of the script.

If the genome is one of the listed genomes AND a location is provided, the run still fails, because the /seq/reference path isn't available on every system.

You need to add an

elif len(sys.argv)==3

branch (etc.) at line 24, or check whether len(sys.argv)==4 first and use filename= instead.
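The following minimal sketch shows the argument handling suggested above. It checks for an explicit location before falling back to a built-in path. The KNOWN_GENOMES table and its path are hypothetical placeholders, not the script's actual contents. argv is assumed to be [script, enzyme, genome, (location)]:

```python
# Hypothetical built-in genome table; the real script's entries may differ.
KNOWN_GENOMES = {
    "hg19": "/seq/reference/hg19.fasta",  # hypothetical site-specific path
}


def resolve_fasta(argv, known=KNOWN_GENOMES):
    """Choose the FASTA path from argv = [script, enzyme, genome, (location)].

    The explicit-location case (len(argv) == 4) is tested first, so a
    user-supplied path wins even when the genome name is also built in.
    """
    if len(argv) == 4:
        return argv[3]
    if len(argv) == 3 and argv[2] in known:
        return known[argv[2]]
    raise SystemExit(
        "usage: generate_site_positions.py <enzyme> <genome> [location]")
```

In the script itself this would be used as filename = resolve_fasta(sys.argv).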

juicer pre error for human

Hi,
I am using pre to produce a .hic file with the following command:

java -jar juicer_tools.1.7.6_jcuda.0.8.jar pre -r 40000 -q 30 -f ../01.data/hg19_MobI.txt ../01.data/test3.txt.gz ./M3-736.hic hg19

and I run into this error:
Start preprocess
Writing header
Writing body
......Error: the chromosome combination 14_15 appears in multiple blocks

Do you know why this error happens? I look forward to your reply. Thank you so much.
min
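One plausible cause is a contact file that is not sorted by chromosome pair. This assumes pre expects all contacts for a given chromosome pair to appear as a single contiguous run in the input. The sketch below checks for that. It assumes the 8-column short format str1 chr1 pos1 frag1 str2 chr2 pos2 frag2, and the function names are hypothetical:

```python
import gzip


def noncontiguous_pairs(lines):
    """Return chromosome pairs whose lines are split into several runs.

    chr1/chr2 are taken from fields 2 and 6 (short-format assumption).
    Both orientations (14/15 and 15/14) share one key, on the assumption
    that pre merges them into a single block.
    """
    seen, bad, prev = set(), [], None
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        pair = tuple(sorted((fields[1], fields[5])))
        if pair != prev:
            if pair in seen and pair not in bad:
                bad.append(pair)  # this pair already appeared earlier
            seen.add(pair)
            prev = pair
    return bad


def check_file(path):
    """Open a plain or gzip-compressed contact file and run the check."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return noncontiguous_pairs(handle)
```

For example, check_file("../01.data/test3.txt.gz") would list ("14", "15") if that pair's contacts are scattered through the file. In that case, sorting the input by chromosome pair before running pre would be the first thing to try.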

Can I use Juicer on a cluster without root?

The instructions assume Juicer is located in /opt/juicer. When I run the command as the instructions suggest, I get the error "***! Reference sequence /opt/juicer/references/Homo_sapiens_assembly19.fasta does not exist". I don't have root access to install Juicer in /opt/; is there any way I can use it without root?
Thanks.
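The following hedged sketch points Juicer at a user-owned install instead of /opt/juicer. It reuses only the flags from the CPU command quoted earlier in the path issue (-D, -g, -t, -z, -p, -y, -d). Every path is a hypothetical placeholder under the user's home directory, and -D is assumed to name the directory that contains scripts/common/:

```python
import os
import shlex


def juicer_command(juicer_sh, script_parent, genome_id, fasta,
                   chrom_sizes, site_file, work_dir, threads=16):
    """Return a juicer.sh argument list that does not rely on /opt/juicer.

    Flags mirror the CPU command quoted in the path issue; all paths
    passed in are hypothetical placeholders.
    """
    return [
        juicer_sh,
        "-D", script_parent,  # assumed to contain scripts/common/
        "-g", genome_id,
        "-t", str(threads),
        "-z", fasta,          # reference FASTA outside /opt
        "-p", chrom_sizes,
        "-y", site_file,
        "-d", work_dir,
    ]


home = os.path.expanduser("~")  # any directory you can write to
juicer_dir = os.path.join(home, "juicer")  # hypothetical layout
cmd = juicer_command(
    juicer_sh=os.path.join(juicer_dir, "scripts", "juicer.sh"),
    script_parent=juicer_dir,
    genome_id="hg19",
    fasta=os.path.join(juicer_dir, "references", "Homo_sapiens_assembly19.fasta"),
    chrom_sizes=os.path.join(juicer_dir, "references", "hg19.chrom.sizes"),
    site_file=os.path.join(juicer_dir, "restriction_sites", "hg19_MboI.txt"),
    work_dir=os.path.join(home, "hic_run"),
)
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```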

SLURM issues with scontrol update

Some SLURM users are not allowed to run scontrol update. The workaround is to run the pipeline in two stages; if the stage is early exit, the scontrol commands should not be run.

function not defined in chimeric_blacklist.awk

Hi, I've updated the CPU/chimeric_blacklist.awk script and now get an error at line 223 when running the example:
223: str[j] = and(tmp[2],16);
It may be that and() is not defined in the OS X version of awk. BWA runs and completes; then, after two sorting steps, there is an error:

$ ./juicer.sh -s HindIII -g hg38
(-: Looking for fastq files...fastq files exist
Fri 6 Jan 2017 13:32:35 GMT
Juicer version:1.5
....
(-: Sort read 1 aligned file by readname completed.
(-: Sort read 2 aligned file by readname completed.
(-: /Users/stuart/NGSTools/Juicer/CPU/splits/HIC003_S2_L001_001.fastq.sam created successfully.
awk: calling undefined function and
input record number 3, file /Users/stuart/NGSTools/Juicer/CPU/splits/HIC003_S2_L001_001.fastq.sam
source line number 223

thanks,

Stuart
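and() is a gawk extension rather than part of POSIX awk, which would explain the failure with the stock OS X awk. The awk line keeps bit 16 (0x10, the SAM reverse-strand bit, assuming tmp[2] holds the SAM FLAG). The same value can be computed with only division and modulo, e.g. str[j] = 16 * (int(tmp[2]/16) % 2); in awk. This rewrite is an untested suggestion, not the project's fix. The Python sketch below only checks that the two formulas agree for every 16-bit flag value:

```python
def and16_bitwise(flag):
    """What gawk's and(flag, 16) returns: 16 if bit 0x10 is set, else 0."""
    return flag & 16


def and16_portable(flag):
    """Same value using only integer division and modulo,
    mirroring the awk expression 16 * (int(flag/16) % 2)."""
    return 16 * ((flag // 16) % 2)


# Compare the two formulas over every 16-bit flag value.
mismatches = [f for f in range(1 << 16) if and16_bitwise(f) != and16_portable(f)]
print("mismatches:", len(mismatches))  # prints: mismatches: 0
```

Running the script with gawk instead of the system awk would be the other way around the missing function.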
