fl-yu / CUT-RUNTools-2.0
CUT&RUN and CUT&Tag data processing and analysis
License: MIT License
Hi, I was testing the CUT&RUNTools 2.0 bulk pipeline and would like to suggest fixes for a few bugs.
1: The file validate.py is written in Python 2; I suggest converting it to Python 3, since that is what your conda environment is based on. Please replace type=file with type=open; the rest can be fixed by 2to3.
< parser.add_argument("config", type=open)
---
> parser.add_argument("config", type=file)
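For reference, a minimal Python 3 sketch of the fixed argument definition (the throwaway temp file below is only for demonstration, not the real config):

```python
import argparse, os, tempfile

# Python 2's built-in `file` is gone in Python 3; `open` is the drop-in
# factory for an argparse `type` that yields an opened file handle.
parser = argparse.ArgumentParser()
parser.add_argument("config", type=open)

# Demonstrate with a throwaway JSON config file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write("{}")
    path = tmp.name

args = parser.parse_args([path])
print(args.config.read())  # -> {}
args.config.close()
os.unlink(path)
```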
2: The version of MEME installed by your command is 5.3.0, and conda needs 'conda update --all' to avoid missing-library errors when calling meme/5.3.0:
libicui18n.so.58 => not found
libicuuc.so.58 => not found
libicudata.so.58 => not found
But your code is based on 5.1.0, and -dreme-m is replaced by -streme-nmotifs in 5.3.0:
$memebin/meme-chip -oc $meme_outdir -dreme-m $num_motifs -meme-nmotifs $num_motifs $mpaddedfa/$summitfa
I changed the code, but then got an error about being unable to read the motif file meme.xml. Have you tested whether this works with meme/5.3.0? I'm thinking about rolling back to meme/5.1.0 instead with "conda install meme=5.1.0".
3: One extra R package is needed:
R -e 'install.packages("viridis")'
Hi,
I am using CUT&RUNTools and I want to thank you for developing such great software!
But I do have a question about the adapters. How did you choose the adapters for CUT&RUN and CUT&Tag data? Is there any doc/paper that recommends them? I am using cutadapt to mimic your analysis process and am wondering which adapter I should use; I notice there are plenty of adapters in your directory.
If you have a universal adapter specifically for CUT&Tag-seq data, please let me know! Really appreciated!
Thanks for your time and help!
Ryan
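For context (the adapter prefixes below are well-known sequences, but the trimming function is only a toy illustration, not what the pipeline or cutadapt actually does): TruSeq libraries read through into AGATCGGAAGAGC, while Tn5-based libraries such as CUT&Tag read into the Nextera mosaic end CTGTCTCTTATACACATCT; cutadapt's -a/-A options accept these sequences directly.

```python
# Well-known read-through adapter prefixes; the trimming below is a toy sketch.
TRUSEQ = "AGATCGGAAGAGC"         # typical for CUT&RUN libraries
NEXTERA = "CTGTCTCTTATACACATCT"  # Tn5 mosaic end, typical for CUT&Tag

def trim_3prime(read, adapter):
    """Drop everything from the first exact adapter match onward."""
    i = read.find(adapter)
    return read if i == -1 else read[:i]

print(trim_3prime("ACGTACGT" + NEXTERA + "GGG", NEXTERA))  # -> ACGTACGT
print(trim_3prime("ACGTACGT", NEXTERA))                    # -> ACGTACGT (no match)
```

Real adapter trimming also handles partial adapter occurrences at the read end and sequencing errors, which is why cutadapt or Trimmomatic is the right tool in practice.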
Hi
I downloaded the tool and installed it following your instructions, including MEME, which is installed in my home directory. I get the following error when CUT-RUN-Tools runs MEME:
Unknown option: dreme-m
The sequences specified do not exist.
meme-chip [options] [-db <motif database>]* <sequence file>
Options:
-o <output dir>
Indeed, I checked, and meme-chip doesn't have this option (nor does the online manual list it).
What am I doing wrong? Should I also download the MEME databases? I didn't see the databases in the installation.
Thanks
Tsviya
When running validate.py against a nonexistent configuration file, the following output is returned:
./validate.py --ignore-input-output --software bulk-config2.json
Traceback (most recent call last):
File "./validate.py", line 41, in <module>
args = vars(parser.parse_args(sys.argv[1:]))
File "/usr/lib64/python2.7/argparse.py", line 1705, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/usr/lib64/python2.7/argparse.py", line 1737, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/usr/lib64/python2.7/argparse.py", line 1946, in _parse_known_args
stop_index = consume_positionals(start_index)
File "/usr/lib64/python2.7/argparse.py", line 1902, in consume_positionals
take_action(action, args)
File "/usr/lib64/python2.7/argparse.py", line 1795, in take_action
argument_values = self._get_values(action, argument_strings)
File "/usr/lib64/python2.7/argparse.py", line 2235, in _get_values
value = self._get_value(action, arg_string)
File "/usr/lib64/python2.7/argparse.py", line 2264, in _get_value
result = type_func(arg_string)
IOError: [Errno 2] No such file or directory: 'bulk-config2.json'
It would be great if this IOError could be properly trapped so the script returns more descriptive output, e.g.:
./validate.py --ignore-input-output --software bulk-config2.json
***ERROR***
Missing configuration file: bulk-config2.json
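One way to get that behavior is a custom argparse type function that raises ArgumentTypeError for a missing path (a sketch; the function name is illustrative, not from validate.py):

```python
import argparse
import os

def config_file(path):
    """Open the config, turning a missing file into a readable argparse error."""
    if not os.path.isfile(path):
        raise argparse.ArgumentTypeError("Missing configuration file: %s" % path)
    return open(path)

parser = argparse.ArgumentParser(prog="validate.py")
parser.add_argument("config", type=config_file)

try:
    parser.parse_args(["bulk-config2.json"])
except SystemExit:
    # argparse reports "error: argument config: Missing configuration
    # file: bulk-config2.json" on stderr and exits with code 2.
    pass
```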
Hi,
I have a question. Does this pipeline read only one pair of samples at a time? What if I have multiple samples?
Thanks
Hi FLYu,
I am wondering why you use CPM normalization along with spike-in normalization in the following call to bamCoverage:
$path_deeptools/bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
--binSize 10 \
--normalizeUsing CPM \
--effectiveGenomeSize $eGenomeSize \
--scaleFactor $scale_factor
Why do you need to normalize by CPM if you are already normalizing by spike-in control? Don't these two normalization steps conflict?
Thanks for your insight,
David
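As a back-of-the-envelope illustration of the question (all numbers invented, and it assumes bamCoverage simply multiplies --scaleFactor into the CPM factor, which is worth confirming against the deepTools documentation):

```python
# Toy numbers - purely illustrative, not from any real run.
total_reads = 20_000_000        # experimental library size
bin_reads = 150                 # reads falling in one 10-bp bin
spikein_reads = 14_638          # spike-in alignments
spikein_scale = 10_000          # the pipeline's default scale constant

cpm = bin_reads / total_reads * 1e6           # CPM normalization alone
spike_factor = spikein_scale / spikein_reads  # spike-in scale factor
combined = cpm * spike_factor                 # if the two are multiplied

print(cpm, round(spike_factor, 4), round(combined, 4))
```

Under that assumption the spike-in factor rescales the already CPM-normalized track, so the two steps compose rather than cancel; whether that composition is the intended normalization is exactly the question raised above.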
Hi, I am a biologist with a limited background in NGS analysis. Recently I did a batch of bulk CUT&RUN-seq experiments and tried to analyze them myself, so I'm glad to see this pipeline!
In my case, I used fragmentase to generate an input control for each of my samples, but I don't know where to put this input sequence data in the analysis.
Please indicate how to use the input control data. I appreciate your response.
Best,
Chen
Hi @fl-yu,
Quick question;
I keep getting the wrong spikein_normalized files for my samples. I believe the issue comes from the --scaleFactor passed to $path_deeptools/bamCoverage in bulk-pipeline.sh.
scale=$spikein_scale
scale_factor=`printf "%.0f" $(echo "$scale / $spikein_reads"|bc)`
My $scale is 10000 (the default) and $spikein_reads is 14638036. When the pipeline inserts these into the scale_factor expression, it evaluates to zero, and 0 is used for normalization. My normalized .bw file ends up being very small, and it doesn't look correct.
Is this what it is supposed to do?
Thank you so much,
Jesus
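The zero comes from bc: its default scale is 0, so 10000 / 14638036 truncates to 0 before printf ever runs. A hedged sketch of a fix using awk's floating-point division (variable names follow bulk-pipeline.sh; the six-decimal format is a choice, not the pipeline's):

```shell
#!/bin/sh
# Illustrative values from the report above.
scale=10000
spikein_reads=14638036

# bc's default scale is 0, so `echo "$scale / $spikein_reads" | bc` yields 0.
# awk divides in floating point and keeps the small factor.
scale_factor=$(awk -v s="$scale" -v r="$spikein_reads" 'BEGIN { printf "%.6f", s / r }')
echo "scale_factor=$scale_factor"   # prints scale_factor=0.000683
```

An alternative is to keep bc but set its scale explicitly, e.g. `echo "scale=6; $scale / $spikein_reads" | bc`, or use `bc -l`.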
Dear Fulong,
Sorry for bothering you again. Now I have the right adapter sequence FASTA and have added it to the program. However, the logs seem to show that only the first trimming step happened. I got a perl warning:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
I tried to fix it by adding the following to ~/.bashrc
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
After source ~/.bashrc, the warning from perl disappeared, but still nothing shows up in the trim2 directory, so there is no input to continue the analysis. There was also an error about no bowtie2 version being found.
Our bioinformatics facility is also helping to solve this problem, but I think you may know more about this issue.
Thanks for your help in advance!
Best,
Yuling
Hi,
I am having issues running the tool with the single-cell example dataset.
[info] 200 PE fastq files were detected ...
[info] First stage trimming ...
Thu Jun 9 10:00:24 AEST 2022
[info] Second stage trimming ...
Thu Jun 9 10:00:51 AEST 2022
[info] Aligning FASTQ files ...
Thu Jun 9 10:00:55 AEST 2022
[info] Alignment finished ...
Thu Jun 9 10:06:44 AEST 2022
[info] Sorting BAMs...
Thu Jun 9 10:06:44 AEST 2022
[info] Marking duplicates...
Thu Jun 9 10:07:52 AEST 2022
[info] Filtering unmapped, low quality, unproper paired fragments...
Thu Jun 9 10:15:11 AEST 2022
[info] Generating coverage .bed files used for single-cell genome track visualization ...
Thu Jun 9 10:15:13 AEST 2022
[info] Aggregation analysis of individual cells...
Thu Jun 9 10:28:24 AEST 2022
Thu Jun 9 10:28:24 2022 Processing the group [[ groups_aggregation ]] ...
Thu Jun 9 10:28:24 2022 Build the scbam_dir directory /home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells ...
Thu Jun 9 10:28:24 2022 Build the scPS_dir directory /home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/pseudo_bulk_data ...
Thu Jun 9 10:28:24 2022 200 Barcode bam files will be copied and merged ...
[info] single-cell track generating
/home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk # printed after adding pwd to file src/BASHscript/qbed.sh to check what was going on
/home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells # printed after adding pwd to file src/BASHscript/qbed.sh to check what was going on
[info] 1 files will be processed into a single-cell genome track file
awk: fatal: cannot open file `*.bed' for reading (No such file or directory)
It looks like the command bed_file=(`echo *.bed`) in src/BASHscript/qbed.sh is returning '*.bed' (hence the awk error shown above) because there are no bed files in the directory /home/baldoni.p/cutruntools2/scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells, only bam files:
(base) [baldoni.p@med-n14 cutruntools2]$ ll scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/*.bed
ls: cannot access scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/*.bed: No such file or directory
(base) [baldoni.p@med-n14 cutruntools2]$ ll scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/*.bam | head
-rw-r--r-- 1 baldoni.p 1.9M Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204735.bam
-rw-r--r-- 1 baldoni.p 1008K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204736.bam
-rw-r--r-- 1 baldoni.p 1.1M Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204737.bam
-rw-r--r-- 1 baldoni.p 945K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204738.bam
-rw-r--r-- 1 baldoni.p 908K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204739.bam
-rw-r--r-- 1 baldoni.p 832K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204740.bam
-rw-r--r-- 1 baldoni.p 805K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204741.bam
-rw-r--r-- 1 baldoni.p 735K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204742.bam
-rw-r--r-- 1 baldoni.p 767K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204743.bam
-rw-r--r-- 1 baldoni.p 669K Jun 9 10:28 scCUTnTag/sc_pseudoBulk/groups_aggregation/single_cells/SRX5204744.bam
All the bed files are located in another directory:
(base) [baldoni.p@med-n14 cutruntools2]$ find . -name "*.bed" | head
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204735.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204736.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204737.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204738.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204739.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204740.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204741.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204742.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204743.bed
./scCUTnTag/sc_aligned.aug10/dup.marked.clean/SRX5204744.bed
Could you please help?
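A defensive sketch for qbed.sh (the helper name is made up): when a directory has no matching .bed files, the unquoted glob is left as the literal string '*.bed', which is exactly what awk then fails to open. Counting the glob expansion first catches that case:

```shell
#!/bin/sh
# If "$1"/*.bed matches nothing, the pattern is left unexpanded, so the first
# positional parameter becomes a literal path that does not exist on disk.
count_bed_files() {
    set -- "$1"/*.bed
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}
```

qbed.sh could then stop with a clear message when the count is 0 instead of handing the literal '*.bed' to awk.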
It would be great if there were a way to leverage S3 for storing data.
When running KSEQ on paired-end FASTQ files with a read length of 150 that have already been trimmed by Trimmomatic, I am getting output FASTQ files where the majority of reads have a length of 50 bp. Is this right? Apologies, as this is my introduction to any form of RNA sequencing pipeline.
Hi,
After running the example data, I got no errors, but I cannot find the get_cuts_single_locus.sh file. Could you help?
I noticed in your bulk-config.json file that your entry for java was:
"javabin": "/homes6/fulong/miniconda3/envs/dfci1/bin",
Where did the dfci1 conda environment come from? I do not have this environment. Where should my javabin be directed?
Also, the memebin path is "memebin": "/homes6/fulong/miniconda3/envs/py3/bin",
I have a python3env conda environment, so I used that instead of py3.
And finally, for the bowtie2 index and genome.fa files: my guess is that you can put them anywhere you want. Is that true, or is there a particular folder where the bowtie2 index and genome.fa files should go?
"bt2idx": "/gcdata/gcproj/fulong/Data/Genomes/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index",
"genome_sequence": "/gcdata/gcproj/fulong/Data/Genomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa",
Hi - thank you for making your software available to use.
I'm having trouble running the bulk example data, where I receive the error 'samview failed to read header' during the early part of the pipeline. The intermediate file /~output-folder/aligned/GATA1_D7_30min.bam is empty.
Is this an issue you've seen before? I am running the pipeline with your premade conda environment on an SGE computing cluster.
Full code below - thank you
(cutruntools2) [qmy094@rescomp1 CUT-RUNTools-2.0]$ ./run_bulkModule.sh /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/JSONs/testconfig.json GATA1_D7_30min
==================================== Bulk data analysis pipeline will run ==============================================================
## Input FASTQ folder: /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData
## Sample name: GATA1_D7_30min
## Workdir folder: /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min
## Experiment name: GATA1_D7_30min
## Experiment type: CUT&RUN
## Reference genome: hg38
## Spike-in genome: FALSE
## Spike-in normalization: FALSE
## Fragment 120 filtration: TRUE
=================================================================================================================================
[info] Input file is GATA1_D7_30min_R1_001.fastq.gz and GATA1_D7_30min_R2_001.fastq.gz
Wed Jun 22 15:56:37 BST 2022
[info] Trimming file GATA1_D7_30min ...
Wed Jun 22 15:56:37 BST 2022
[info] Use Truseq adaptor as default
[info] Second stage trimming GATA1_D7_30min ...
Wed Jun 22 15:56:38 BST 2022
[info] Aligning file GATA1_D7_30min to reference genome...
Wed Jun 22 15:56:38 BST 2022
[info] Bowtie2 command: --dovetail --phred33
[info] The dovetail mode is enabled [as parameter frag_120 is on]
[main_samview] fail to read the header from "-".
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... GATA1_D7_30min.bam
Wed Jun 22 15:56:39 BST 2022
[main_samview] fail to read the header from "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/aligned/GATA1_D7_30min.bam".
[info] Sorting BAM... GATA1_D7_30min.bam
Wed Jun 22 15:56:39 BST 2022
INFO 2022-06-22 15:56:40 SortSam
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** SortSam -INPUT sorted/GATA1_D7_30min.step1.bam -OUTPUT sorted/GATA1_D7_30min.bam -SORT_ORDER coordinate -VALIDATION_STRINGENCY SILENT
**********
15:56:40.407 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs2/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/picard-2.8.0.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jun 22 15:56:40 BST 2022] SortSam INPUT=sorted/GATA1_D7_30min.step1.bam OUTPUT=sorted/GATA1_D7_30min.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Jun 22 15:56:40 BST 2022] Executing as [email protected] on Linux 3.10.0-1160.66.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_172-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.7-SNAPSHOT
INFO 2022-06-22 15:56:40 SortSam Finished reading inputs, merging and writing to output now.
[Wed Jun 22 15:56:40 BST 2022] picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
[info] Marking duplicates... GATA1_D7_30min.bam
Wed Jun 22 15:56:40 BST 2022
[info] Removing duplicates... GATA1_D7_30min.bam
Wed Jun 22 15:56:49 BST 2022
[info] Filtering to <120bp... dup.marked and dedup BAMs
Wed Jun 22 15:56:49 BST 2022
[info] Creating bam index files... GATA1_D7_30min.bam
Wed Jun 22 15:56:49 BST 2022
[info] Reads shifting
Wed Jun 22 15:56:49 BST 2022
[info] Your data won't be shifted as the experiment_type is specified as CUT&RUN...
[info] Peak calling using MACS2... GATA1_D7_30min.bam
[info] Logs are stored in /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/logs
Wed Jun 22 15:56:49 BST 2022
[info] Peak calling with BAM file with NO duplications
[info] macs2 narrow peak calling
[info] macs2 broad peak calling
[info] Getting broad peak summits
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/get_summits_broadPeak.py", line 14, in <module>
f = open(sys.argv[1])
FileNotFoundError: [Errno 2] No such file or directory: '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.broad/GATA1_D7_30min_peaks.broadPeak'
[info] SEACR stringent peak calling
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/change.bdg.py", line 12, in <module>
f = open(sys.argv[1])
FileNotFoundError: [Errno 2] No such file or directory: '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/seacr/GATA1_D7_30min_treat_pileup.bdg'
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
Using stringent threshold
Creating experimental AUC file: Wed Jun 22 15:57:01 BST 2022
Calculating optimal AUC threshold: Wed Jun 22 15:57:01 BST 2022
Using user-provided threshold: Wed Jun 22 15:57:01 BST 2022
Error in read.table(argsL$exp) : no lines available in input
Execution halted
Unable to access /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/seacr/GATA1_D7_30min_treat.stringent.bed
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/get_summits_seacr.py", line 14, in <module>
f = open(sys.argv[1])
FileNotFoundError: [Errno 2] No such file or directory: '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/seacr/GATA1_D7_30min_treat.stringent.bed'
[info] Generating the normalized signal file with BigWig format...
Wed Jun 22 15:57:03 BST 2022
cp: cannot stat '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.narrow/GATA1_D7_30min.cpm.norm.bw': No such file or directory
cp: cannot stat '/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.narrow/GATA1_D7_30min.cpm.norm.bw': No such file or directory
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output/GATA1_D7_30min/peakcalling/macs2.narrow/GATA1_D7_30min_peaks.narrowPeak
cat: GATA1_D7_30min_peaks.narrowPeak: No such file or directory
cat: GATA1_D7_30min_summits.bed: No such file or directory
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
Warning: the index file is older than the FASTA file.
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Log::Log4perl configuration looks suspicious: No loggers defined at /gpfs2/well/jknight/users/qmy094/software/conda/skylake/envs/meme/lib/site_perl/5.26.2/Log/Log4perl/Config.pm line 325.
Starting getsize: getsize random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa 1> $metrics
getsize ran successfully in 0.043017 seconds
Starting fasta-most: fasta-most -min 50 < random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa 1> $metrics
fasta-most ran successfully in 0.499825 seconds
Starting fasta-center: fasta-center -dna -len 100 < random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa 1> random1000/MEME_GATA1_D7_30min_shuf/seqs-centered
fasta-center ran successfully in 0.142205 seconds
Starting fasta-shuffle-letters: fasta-shuffle-letters random1000/MEME_GATA1_D7_30min_shuf/seqs-centered random1000/MEME_GATA1_D7_30min_shuf/seqs-shuffled -kmer 2 -tag -dinuc -dna -seed 1
fasta-shuffle-letters ran successfully in 0.026512 seconds
Starting fasta-get-markov: fasta-get-markov -nostatus -nosummary -dna -m 1 random1000/MEME_GATA1_D7_30min_shuf/GATA1_D7_30min_summits_padded.fa random1000/MEME_GATA1_D7_30min_shuf/background
fasta-get-markov ran successfully in 0.034755 seconds
Starting meme: meme random1000/MEME_GATA1_D7_30min_shuf/seqs-centered -oc random1000/MEME_GATA1_D7_30min_shuf/meme_out -mod zoops -nmotifs 10 -minw 6 -maxw 30 -bfile random1000/MEME_GATA1_D7_30min_shuf/background -dna -revcomp -nostatus
No sequences found in file `random1000/MEME_GATA1_D7_30min_shuf/seqs-centered'. Check file format.
meme exited with error code 1
Starting dreme: dreme -verbosity 1 -oc random1000/MEME_GATA1_D7_30min_shuf/dreme_out -png -dna -p random1000/MEME_GATA1_D7_30min_shuf/seqs-centered -n random1000/MEME_GATA1_D7_30min_shuf/seqs-shuffled -m 10
File "/well/jknight/users/qmy094/software/conda/skylake/envs/meme/bin/dreme", line 765
print "Finding secondary RE in left flank..."
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Finding secondary RE in left flank...")?
dreme exited with error code 1
Starting meme-chip_html_to_tsv: meme-chip_html_to_tsv random1000/MEME_GATA1_D7_30min_shuf/meme-chip.html random1000/MEME_GATA1_D7_30min_shuf/summary.tsv "meme-chip -oc random1000/MEME_GATA1_D7_30min_shuf -dreme-m 10 -meme-nmotifs 10 random1000/padded.fa/GATA1_D7_30min_summits_padded.fa" 5.0.5 "Mon Mar 18 20\:12\:19 2019 -0700"
meme-chip_html_to_tsv ran successfully in 0.819111 seconds
[info] De Novo motifs can be found: random1000/MEME_GATA1_D7_30min_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/read.meme.py", line 94, in <module>
dreme_matrices = read_dreme(this_dir + "/dreme_out/dreme.txt")
File "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/read.meme.py", line 47, in read_dreme
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_GATA1_D7_30min_shuf/dreme_out/dreme.txt'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_GATA1_D7_30min_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
Warning: the index file is older than the FASTA file.
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_GATA1_D7_30min_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/GATA1_D7_30min
#
# Congrats! The bulk data analysis is complete!
and here is my config.json
{
"software_config": {
"Rscriptbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"pythonbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"perlbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"javabin": "/apps/well/java/jdk1.8.0_latest/bin",
"bowtie2bin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"samtoolsbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"macs2bin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"memebin": "/well/jknight/users/qmy094/software/conda/skylake/envs/meme/bin",
"bedopsbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"bedtoolsbin": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"path_deeptools": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"path_parallel": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"path_tabix": "/well/jknight/users/qmy094/software/conda/skylake/envs/cutruntools2/bin",
"bt2idx": "/gpfs2/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/assemblies/hg38",
"genome_sequence": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install/assemblies/hg38/hg38.fa",
"spike_in_bt2idx": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/assemblies/Escherichia_coli_K_12_DH10B",
"spike_in_sequence": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/assemblies/Escherichia_coli_K_12_DH10B/Escherichia_coli_K_12_DH10B.fna",
"extratoolsbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"extrasettings": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"kseqbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"adapterpath": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/adapters",
"trimmomaticbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"picardbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install",
"picardjarfile": "picard-2.8.0.jar",
"trimmomaticjarfile": "trimmomatic-0.36.jar",
"makecutmatrixbin": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/install"
},
"input_output": {
"fastq_directory": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData",
"workdir": "/well/jknight/users/qmy094/software/CUT-RUNTools-2.0/exampleData/output",
"fastq_sequence_length": "42",
"organism_build": "hg38",
"spike_in": "FALSE",
"spike_in_norm": "FALSE",
"spikein_scale": "10000",
"frag_120": "TRUE",
"peak_caller": "macs2",
"dup_peak_calling": "FALSE",
"cores": "8",
"experiment_type": "CUT&RUN"
},
"motif_finding": {
"num_bp_from_summit": "100",
"num_peaks": "1000",
"total_peaks": "2000",
"motif_scanning_pval": "0.0005",
"num_motifs": "10"
}
}
I am using the Arabidopsis bowtie2 index and the Arabidopsis genome.fa and have the correct paths to these files in my bulk-config.json file, but when I run it, it quickly stops with just the error message: organism_build should be one of hg38, hg19, mm10 or mm9.
Is it possible to run CUT-RUNTools with Arabidopsis ?
I found a blacklist for Arabidopsis, and I can make a chromosome sizes file. If I tweak the run_bulkModule.sh script to use these files, would this work, or are there any other issues that would come up ?
Hi fl-yu,
I have a question about the output in the peak analysis folder. What is the input for generating a heatmap such as Fig. 2e in the original CUT-RUNTools paper? And is src/bulk/haystack_motifs.sh the script that creates it?
Thanks
Hello Fulong,
There are two errors when I run the bulk CUT&RUNTools 2.1 pipeline with the example data.
Could you give me some advice?
Thanks for your kindness.
Yan
Hi,
Thanks for your effort in developing such a comprehensive and useful tool, but I always get fewer peaks when using CUT&RUNTools than with other peak-calling software.
I used Docker to run CUT&RUNTools 2.0 and found that the BAM files generated by Bowtie2 are similar to BAMs I generated manually. However, after the 'dedup', 'dedup.120bp' and 'dedup.120bp.shift' steps, the size of the BAM file decreases a lot. So I tried to set 'frag_120: FALSE' but encountered the following error.
Hi,
I am running this pipeline. MEME is working, thanks to your previous discussion with @qiyubio. However, I have got this error:
/home/shikan/miniconda3/envs/cutruntools2/bin/gff2bed: line 132: convert2bed: command not found
I have checked under my cutruntools2 conda environment, and gff2bed and convert2bed both exist. How do I fix this issue?
Thanks,
Shikan
Hello, while attempting to run the validate.py script in the install directory with Python 3.6.0, I got this TabError:
File "/opt/CUT-RUNTools-2.0/install/validate.py", line 28
return False
^
TabError: inconsistent use of tabs and spaces in indentation
I was able to fix it with this sed command, replacing each tab with 4 spaces:
sed -i 's/\t/    /g' validate.py
Hope this is helpful.
Thanks,
Neil
Hi,
I cloned the repository and hopefully satisfied all the software installation requirements.
I ran ./install/validate.py config/bulk-config.json --ignore-input-output --software without any errors.
But when I run:
./run_bulkModule.sh config/bulk-config.json GATA1_D7_30min_chr11
I get:
./run_bulkModule.sh: line 20: jq: command not found
./run_bulkModule.sh: line 21: jq: command not found
./run_bulkModule.sh: line 22: jq: command not found
organism_build should be one of hg38, hg19, mm10 or mm9
Should I download and install jq separately?
Attached is my bulk-config.json file.
bulk-config.txt
Some tips appreciated, thanks!
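It looks that way: run_bulkModule.sh shells out to jq on those lines to parse the config, so a missing jq leaves the config values empty and the organism_build check then fails. A small preflight sketch (the conda-forge channel hint is an assumption about how you would install jq):

```shell
#!/bin/sh
# Check that an external tool the wrapper script needs is on PATH.
require() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing (e.g. conda install -c conda-forge jq)" >&2
        return 1
    fi
}

require sh
```

Running `require jq` before launching the pipeline would turn the silent empty-string failures into one clear message.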
Hi Fulong,
I am a PhD student of Prof. Liu Nan, one of the co-authors of CUT&RUNTools 1.0. To test whether CUT&RUNTools 2.0 works, I downloaded the paired-end FASTQ files (GSE104550, 25 bp per end) from the H3K27me3 CUT&RUN assays. I disabled the frag_120 parameter ("frag_120": "FALSE"). Then I ran the bulk pipeline and found that no reads in trimmed2 could be aligned to the reference (hg19). Additionally, I aligned the reads in trimmed2 to the reference genome manually, and there were no alignments either. However, I found that if I remove the parameter "--very-sensitive-local", I get the expected results. Could you give me some advice on this problem? I am looking forward to your response.
Yan
Line 84 of the bash script bulk-pipeline.sh seems spurious, and causes the script to output an error:
/cache/home/username/environments/cutruntools/repos/CUT-RUNTools-2.0/src/bulk/bulk-pipeline.sh: line 84: /scratch/username/projects/my_project/workdir/428/logs/428.spikein.bowtie2: Permission denied
I think this line should be removed?
Thank you for all the work you put into this pipeline.
I hope you can help me define some parameters. I am running the pipeline on bulk CUT&Tag samples that were sequenced paired-end with a 150-cycle kit.
What should fastq_sequence_length be?
What does frag_120 do? If set to FALSE, what happens to the BAM file? And if set to TRUE?
In the initial adapter-trimming step, Trimmomatic assumed my FASTQ files had TruSeq adapters even after I specified CUT&Tag as the experiment_type. How can I change Trimmomatic to remove Nextera adapters instead?
Hi, fl
I am creating a new issue here. I was able to bypass the bowtie2 index problem by changing the "hg" to "genome". Unfortunately, I'm still having a little problem with the rest of the code. Please see below, which I copied directly from my terminal. There is no error message; I did reach the end, which told me the analysis was successful, yet there are no files in the fimo folder. My system is macOS Mojave 10.14.6, in case that information is helpful. Thanks a lot!
[info] Generating the normalized signal file with BigWig format...
Sun Feb 7 21:47:45 EST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /Volumes/Backup/CUT-RUNTools-2.0-master/test/GATA1_D7_30min_chr11/peakcalling/macs2.narrow/GATA1_D7_30min_chr11_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Log::Log4perl configuration looks suspicious: No loggers defined at /Users/kunhuaqin/miniconda3/envs/meme/lib/site_perl/5.26.2/Log/Log4perl/Config.pm line 325.
Starting getsize: getsize random1000/MEME_GATA1_D7_30min_chr11_shuf/GATA1_D7_30min_chr11_summits_padded.fa 1> $metrics
dyld: Library not loaded: @rpath/libicui18n.58.dylib
Referenced from: /Users/kq2012/miniconda3/envs/meme/bin/getsize
Reason: image not found
getsize process died with signal 6, without coredump
getsize failed me... at /Users/kq2012/miniconda3/envs/meme/bin//meme-chip line 740.
[info] De Novo motifs can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/Volumes/Backup/CUT-RUNTools-2.0-master/install/read.meme.py", line 92, in <module>
ss = read_summary(this_dir + "/summary.tsv")
File "/Volumes/Backup/CUT-RUNTools-2.0-master/install/read.meme.py", line 7, in read_summary
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_GATA1_D7_30min_chr11_shuf/summary.tsv'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/GATA1_D7_30min_chr11
Dear team of CUT RUN tools 2.0
I'm trying to use your software to analyze my CR data. I'm running on MacOS Big Sur.
I created the conda env, managed the dependencies, downloaded and built the index. However, I'm not able to find the validate.py script.
I did find it in the previous version of the program, but it doesn't seem to work with crtools2.0
I followed strictly the instructions in install.md after I cloned your repository.
I set all the variables in the json file.
Can you add a little detail on how I can validate the installation before proceeding?
Thank you very much,
David
here's my env. my shell is zsh.
(cutruntools2.1) XXX@XXX CUT-RUNTools-2.0 % conda list
_r-mutex 1.0.1 anacondar_1 conda-forge
bedops 2.4.39 h770b8ee_0 bioconda
bedtools 2.30.0 haa7f73a_1 bioconda
bowtie2 2.4.2 py36h6343656_2 bioconda
bwidget 1.9.14 h694c41f_0 conda-forge
bzip2 1.0.8 h0d85af4_4 conda-forge
c-ares 1.17.1 h0d85af4_1 conda-forge
ca-certificates 2020.12.5 h033912b_0 conda-forge
cairo 1.16.0 he43a7df_1008 conda-forge
cctools_osx-64 949.0.1 h6407bdd_21 conda-forge
certifi 2020.12.5 py36h79c6626_1 conda-forge
clang 11.1.0 h694c41f_0 conda-forge
clang-11 11.1.0 default_he082bbe_0 conda-forge
clang_osx-64 11.1.0 hb91bd55_1 conda-forge
clangxx 11.1.0 default_he082bbe_0 conda-forge
clangxx_osx-64 11.1.0 h7e1b574_1 conda-forge
compiler-rt 11.1.0 h654b07c_0 conda-forge
compiler-rt_osx-64 11.1.0 h8c5fa43_0 conda-forge
curl 7.76.1 h06286d4_1 conda-forge
cycler 0.10.0 py_2 conda-forge
deeptools 3.5.1 py_0 bioconda
deeptoolsintervals 0.1.9 py36ha714b87_3 bioconda
expat 2.2.10 h1c7c35f_0 conda-forge
fontconfig 2.13.1 h10f422b_1005 conda-forge
freetype 2.10.4 h4cff582_1 conda-forge
fribidi 1.0.10 hbcb3906_0 conda-forge
gettext 0.19.8.1 h7937167_1005 conda-forge
gfortran_impl_osx-64 9.3.0 h9cc0e5e_22 conda-forge
gfortran_osx-64 9.3.0 h18f7dce_14 conda-forge
ghostscript 9.18 1 bioconda/label/cf201901
gmp 6.2.1 h2e338ed_0 conda-forge
graphite2 1.3.13 h2e338ed_1001 conda-forge
gsl 2.6 h71c5fe9_2 conda-forge
harfbuzz 2.8.0 h159f659_1 conda-forge
htslib 1.12 hc38c3fb_1 bioconda
icu 68.1 h74dc148_0 conda-forge
isl 0.22.1 hb1e8313_2 conda-forge
jpeg 9d hbcb3906_0 conda-forge
kiwisolver 1.3.1 py36h615c93b_1 conda-forge
krb5 1.17.2 h60d9502_0 conda-forge
lcms2 2.12 h577c468_0 conda-forge
ld64_osx-64 530 he8994da_21 conda-forge
ldid 2.1.2 h7660a38_2 conda-forge
libblas 3.9.0 8_openblas conda-forge
libcblas 3.9.0 8_openblas conda-forge
libclang-cpp11.1 11.1.0 default_he082bbe_0 conda-forge
libcurl 7.76.1 h8ef9fac_1 conda-forge
libcxx 11.1.0 habf9029_0 conda-forge
libdeflate 1.7 h35c211d_5 conda-forge
libedit 3.1.20191231 h0678c8f_2 conda-forge
libev 4.33 haf1e3a3_1 conda-forge
libffi 3.3 h046ec9c_2 conda-forge
libgfortran 5.0.0 9_3_0_h6c81a4c_22 conda-forge
libgfortran-devel_osx-64 9.3.0 h6c81a4c_22 conda-forge
libgfortran5 9.3.0 h6c81a4c_22 conda-forge
libglib 2.68.1 hd556434_0 conda-forge
libiconv 1.16 haf1e3a3_0 conda-forge
liblapack 3.9.0 8_openblas conda-forge
libllvm11 11.1.0 hd011deb_2 conda-forge
libnghttp2 1.43.0 h07e645a_0 conda-forge
libopenblas 0.3.12 openmp_h54245bb_1 conda-forge
libpng 1.6.37 h7cec526_2 conda-forge
libssh2 1.9.0 h52ee1ee_6 conda-forge
libtiff 4.2.0 h355d032_0 conda-forge
libwebp-base 1.2.0 h0d85af4_2 conda-forge
libxml2 2.9.10 h93ec3fd_4 conda-forge
libxslt 1.1.33 h5739fc3_2 conda-forge
llvm-openmp 11.1.0 hda6cdc1_1 conda-forge
llvm-tools 11.1.0 hd011deb_2 conda-forge
lz4-c 1.9.3 h046ec9c_0 conda-forge
macs2 2.2.7.1 py36ha714b87_2 bioconda
make 4.3 h22f3db7_1 conda-forge
matplotlib-base 3.3.4 py36h4ea959b_0 conda-forge
meme 4.12.0 py36pl526hd869df4_2 bioconda/label/cf201901
mpc 1.1.0 ha57cd0f_1009 conda-forge
mpfr 4.0.2 h72d8aaf_1 conda-forge
ncurses 6.2 h2e338ed_4 conda-forge
numpy 1.19.5 py36h08dc641_1 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 h6cbf5cd_0 conda-forge
openssl 1.1.1k h0d85af4_0 conda-forge
pango 1.48.4 ha05cd14_0 conda-forge
parallel 20210222 h694c41f_0 conda-forge
pcre 8.44 hb1e8313_0 conda-forge
pcre2 10.36 h5cf9962_1 conda-forge
perl 5.26.2 hbcb3906_1008 conda-forge
perl-app-cpanminus 1.7044 pl526_1 bioconda/label/cf201901
perl-carp 1.38 pl526_1 bioconda/label/cf201901
perl-cgi 4.40 pl526h470a237_0 bioconda/label/cf201901
perl-constant 1.33 pl526_1 bioconda/label/cf201901
perl-exporter 5.72 pl526_1 bioconda/label/cf201901
perl-extutils-makemaker 7.34 pl526_3 bioconda/label/cf201901
perl-file-path 2.15 pl526_0 bioconda/label/cf201901
perl-file-temp 0.2304 pl526_2 bioconda/label/cf201901
perl-html-parser 3.72 pl526h2d50403_4 bioconda/label/cf201901
perl-html-tagset 3.20 pl526_3 bioconda/label/cf201901
perl-html-template 2.97 pl526_1 bioconda/label/cf201901
perl-html-tree 5.07 pl526_0 bioconda/label/cf201901
perl-parent 0.236 pl526_1 bioconda/label/cf201901
perl-scalar-list-utils 1.45 pl526h470a237_3 bioconda/label/cf201901
perl-xml-namespacesupport 1.12 pl526_0 bioconda/label/cf201901
perl-xml-parser 2.44 pl526h3a4f0e9_6 bioconda/label/cf201901
perl-xml-sax 1.00 pl526_0 bioconda/label/cf201901
perl-xml-sax-base 1.09 pl526_0 bioconda/label/cf201901
perl-xml-sax-expat 0.51 pl526_2 bioconda/label/cf201901
perl-xml-simple 2.25 pl526_0 bioconda/label/cf201901
perl-xsloader 0.24 pl526_0 bioconda/label/cf201901
perl-yaml 1.27 pl526_0 bioconda/label/cf201901
pillow 8.1.2 py36h154fef6_1 conda-forge
pip 21.0.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 hbcb3906_0 conda-forge
plotly 4.14.3 pyh44b312d_0 conda-forge
py2bit 0.3.0 py36ha714b87_5 bioconda
pybigwig 0.3.18 py36hc3e6b37_1 bioconda
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pysam 0.16.0.1 py36h71aea8d_3 bioconda
python 3.6.13 h7728216_0_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.6 1_cp36m conda-forge
r-base 4.0.3 hb6e1b8c_8 conda-forge
readline 8.1 h05e3726_0 conda-forge
retrying 1.3.3 py_2 conda-forge
samtools 1.12 hfcfc997_1 bioconda
scipy 1.5.3 py36h04de62b_0 conda-forge
setuptools 49.6.0 py36h79c6626_3 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.35.4 h44b9ce1_0 conda-forge
tabix 0.2.6 ha92aebf_0 bioconda
tapi 1100.0.11 h9ce4665_0 conda-forge
tbb 2020.2 h940c156_4 conda-forge
tk 8.6.10 h0419947_1 conda-forge
tktable 2.10 h49f0cf7_3 conda-forge
tornado 6.1 py36h20b66c6_1 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xz 5.2.5 haf1e3a3_1 conda-forge
yaml 0.2.5 haf1e3a3_0 conda-forge
zlib 1.2.11 h7795811_1010 conda-forge
zstd 1.4.9 h582d3a0_0 conda-forge
Hi, I tried to install the patched version of atactk by following the installation instructions but failed. The following is the error output when running the atactk.install.sh script:
patch -p0 -N --dry-run --silent make_cut_matrix < make_cut_matrix.patch 2> /dev/null
2 out of 2 hunks FAILED
patch -p0 -N --dry-run --silent metrics.py < metrics.py.patch 2> /dev/null
3 out of 3 hunks FAILED
Could anyone please tell me how to fix this problem? Thanks!
Dear all,
I couldn't find the adapters for my samples in the adapter directory. I used NEBNext® Multiplex Oligos for Illumina. Does this require a new fasta file?
Thanks for your help in advance!
Best,
Yuling
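If a kit isn't covered by the shipped fasta files, you can supply your own. A minimal sketch, assuming (please verify against NEB's adapter documentation) that NEBNext adapters are TruSeq-style, so the read-through sequence seen in both mates starts with AGATCGGAAGAGC; the file and record names are illustrative:

```shell
# Sequences below are an assumption (TruSeq-style read-through stem); confirm
# them against NEB's documentation before trimming real data.
cat > NEBNext-PE.fa <<'EOF'
>NEBNext_read1
AGATCGGAAGAGC
>NEBNext_read2
AGATCGGAAGAGC
EOF
grep -c '^>' NEBNext-PE.fa   # two adapter records
```

Pass this file to the ILLUMINACLIP step in place of the default Truseq fasta.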
==================================== Bulk data analysis pipeline will run ==============================================================
=================================================================================================================================
[info] Input file is CLP1_293T_S2_R1_001.fastq.gz and CLP1_293T_S2_R2_001.fastq.gz
Wed Sep 29 23:52:40 CST 2021
[info] Trimming file CLP1_293T_S2 ...
Wed Sep 29 23:52:51 CST 2021
[info] Use Truseq adaptor as default
[info] Second stage trimming CLP1_293T_S2 ...
Thu Sep 30 00:38:43 CST 2021
[info] Aligning file CLP1_293T_S2 to reference genome...
Thu Sep 30 01:09:09 CST 2021
[info] Bowtie2 command: --very-sensitive-local --phred33 -I 10 -X 700
[info] The dovetail mode is off [as parameter frag_120 is off]
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... CLP1_293T_S2.bam
Thu Sep 30 01:25:56 CST 2021
[info] Sorting BAM... CLP1_293T_S2.bam
Thu Sep 30 01:38:37 CST 2021
INFO 2021-09-30 01:39:09 SortSam
********** NOTE: Picard's command line syntax is changing.
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** SortSam -INPUT sorted/CLP1_293T_S2.step1.bam -OUTPUT sorted/CLP1_293T_S2.bam -SORT_ORDER coordinate -VALIDATION_STRINGENCY SILENT
01:39:37.011 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/home/mosta/CUT-RUNTools-2.0/install/picard-2.8.0.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Sep 30 01:39:37 CST 2021] SortSam INPUT=sorted/CLP1_293T_S2.step1.bam OUTPUT=sorted/CLP1_293T_S2.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Sep 30 01:39:37 CST 2021] Executing as mosta@s006 on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.7-SNAPSHOT
INFO 2021-09-30 01:39:37 SortSam Seen many non-increasing record positions. Printing Read-names as well.
INFO 2021-09-30 01:41:35 SortSam Read 10,000,000 records. Elapsed time: 00:01:57s. Time for last 10,000,000: 117s. Last read position: chr6:6,079,001. Last read name: M01057:324:000000000-JD6K3:1:2103:22747:14904
INFO 2021-09-30 01:43:11 SortSam Read 20,000,000 records. Elapsed time: 00:03:34s. Time for last 10,000,000: 96s. Last read position: chr12:27,239,374. Last read name: M01057:324:000000000-JD6K3:1:1106:11883:14869
INFO 2021-09-30 01:43:59 SortSam Finished reading inputs, merging and writing to output now.
INFO 2021-09-30 01:48:17 SortSam Wrote 10,000,000 records from a sorting collection. Elapsed time: 00:08:40s. Time for last 10,000,000: 256s. Last read position: chr1:194,534,006
INFO 2021-09-30 01:51:58 SortSam Wrote 20,000,000 records from a sorting collection. Elapsed time: 00:12:21s. Time for last 10,000,000: 220s. Last read position: chr7:11,711,635
[Thu Sep 30 01:53:24 CST 2021] picard.sam.SortSam done. Elapsed time: 13.80 minutes.
Runtime.totalMemory()=8648654848
[info] Marking duplicates... CLP1_293T_S2.bam
Thu Sep 30 01:53:28 CST 2021
[info] Removing duplicates... CLP1_293T_S2.bam
Thu Sep 30 02:31:46 CST 2021
[info] Using all the qualified fragments NOT filtering <120bp... CLP1_293T_S2.bam
Thu Sep 30 02:39:16 CST 2021
[info] Creating bam index files... CLP1_293T_S2.bam
Thu Sep 30 02:39:16 CST 2021
[info] Reads shifting
Thu Sep 30 02:46:52 CST 2021
[info] Your data won't be shifted as the experiment_type is specified as CUT&RUN...
[info] Peak calling using MACS2... CLP1_293T_S2.bam
[info] Logs are stored in /public/home/mosta/cut_run/HEK293_Nov23_2020/results//logs
Thu Sep 30 02:46:53 CST 2021
[info] Peak calling with BAM file with NO duplications
[info] macs2 narrow peak calling
[info] macs2 broad peak calling
[info] Getting broad peak summits
[info] SEACR stringent peak calling
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
Using stringent threshold
Creating experimental AUC file: Thu Sep 30 03:51:25 CST 2021
Calculating optimal AUC threshold: Thu Sep 30 03:51:27 CST 2021
Using user-provided threshold: Thu Sep 30 03:51:27 CST 2021
Creating thresholded feature file: Thu Sep 30 03:53:25 CST 2021
Empirical false discovery rate = 0.01
Merging nearby features and eliminating control-enriched features: Thu Sep 30 03:53:25 CST 2021
Removing temporary files: Thu Sep 30 03:53:25 CST 2021
Done: Thu Sep 30 03:53:25 CST 2021
[info] Generating the normalized signal file with BigWig format...
Thu Sep 30 03:53:26 CST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /public/home/mosta/cut_run/HEK293_Nov23_2020/results//peakcalling/macs2.narrow/CLP1_293T_S2_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Unknown option: dreme-m
The sequences specified do not exist.
meme-chip [options] [-db <motif database>]*
Options:
-o <dir> : output to the specified directory
MEME Specific Options:
-meme-brief <n> : reduce size of MEME output files if more than <n> primary sequences
-meme-mod [oops|zoops|anr] : sites used in a single sequence
-meme-nmotifs <n> : maximum number of motifs to find; default: 3; if <n>=0, MEME will not be run
-meme-minsites <n> : minimum number of sites per motif
-meme-maxsites <n> : maximum number of sites per motif
-meme-p <np> : use parallel version with <np> processors
-meme-pal : look for palindromes only
-meme-searchsize <n> : the maximum portion of the primary sequences (in characters) used for motif search; MEME's running time increases as roughly the square of <n>
-meme-nrand : MEME should not randomize sequence order
STREME Specific Options:
-streme-pvt <pv> : stop if hold-out set p-value greater than <pv>
-streme-nmotifs <n> : maximum number of motifs to find; overrides -streme-pvt; if <n>=0, STREME will not be run
CentriMo Specific Options:
-centrimo-local : compute enrichment of all regions (not only central)
-centrimo-score <S> : set the minimum allowed match score
-centrimo-maxreg <S> : set the maximum region size to be considered
-centrimo-ethresh <E> : set the E-value threshold for reporting
-centrimo-noseq : don't store sequence IDs in the output
-centrimo-flip : reflect matches on reverse strand around center
SpaMo Specific Options:
-spamo-skip : don't run SpaMo
FIMO Specific Options:
-fimo-skip : don't run FIMO
[info] De Novo motifs can be found: random1000/MEME_CLP1_293T_S2_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/public/home/mosta/CUT-RUNTools-2.0/install/read.meme.py", line 92, in
ss = read_summary(this_dir + "/summary.tsv")
File "/public/home/mosta/CUT-RUNTools-2.0/install/read.meme.py", line 7, in read_summary
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_CLP1_293T_S2_shuf/summary.tsv'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_CLP1_293T_S2_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_CLP1_293T_S2_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/CLP1_293T_S2
Dear author,
Thank you for making this tool. Not really an issue, but want to ask a question here:
How should I interpret the parameter "fastq_sequence_length" in the config file? It seems to be used only by the "kseq_test" step. The default value is 42; what does it mean?
Thanks!
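For anyone else wondering: fastq_sequence_length appears to be the sequencing read length, which the kseq_test adapter-trimming step uses, and 42 simply matches the example data's 42 bp reads. That reading is an assumption from skimming the scripts, but you can measure your own read length directly:

```shell
# Write a tiny one-record FASTQ with a hypothetical 42 bp read, then measure
# the read length, which is what fastq_sequence_length should be set to.
read_seq=$(printf '%42s' '' | tr ' ' 'A')   # 42 bases
qual=$(printf '%42s' '' | tr ' ' 'I')       # 42 quality characters
printf '@read1\n%s\n+\n%s\n' "$read_seq" "$qual" > example.fastq
awk 'NR==2 {print length($0); exit}' example.fastq   # prints 42
```

Run the awk line on your own (decompressed) fastq to find the value for your data.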
Hi,
I'm having issues with the MEME analysis for de novo motif finding.
Could you help me with what the issue is?
Thanks for the amazing tool!
slurm-97220.out.txt
Originally posted by @hchintalapudi in #14 (comment)
any(! pkg %in% rownames(installed.packages())))
should be:
all(pkg %in% rownames(installed.packages()))
On lines 52 and 58 of bulk-pipeline.sh, the index file for bowtie2 alignment is specified as -x $bt2idx/genome. This causes alignment to fail, because there is no index file with the prefix genome. I'm using the mm10 reference genome, and the prefix should be -x $bt2idx/mm10. Should genome be a variable that comes from the config file instead?
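Until that's configurable, you can either symlink your index files to the genome prefix or sanity-check the prefix before launching. A sketch with illustrative paths (bt2_index_demo, mm10):

```shell
# bt2idx and prefix are illustrative; match them to your bulk-config.json.
bt2idx=./bt2_index_demo
prefix=mm10
mkdir -p "$bt2idx"
touch "$bt2idx/$prefix.1.bt2"      # stand-in for a real bowtie2 index file
if ls "$bt2idx/$prefix".*.bt2* >/dev/null 2>&1; then
    echo "index found for prefix '$prefix'"
else
    echo "no bowtie2 index with prefix '$prefix' in $bt2idx" >&2
fi
```

A no-edit workaround is symlinking each of the six index files, e.g. `ln -s mm10.1.bt2 genome.1.bt2`.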
Hi, I've been trying to normalize with spike in, setting "spike_in": "TRUE". This doesn't seem to work however, and there is no spike_in output directory created.
Should I be setting "spike_in" as TRUE? or something else?
Hi,
I have a question about spikein_scale. I have CUT&Tag data with no spike-in genome, and I set both spike_in_align and spike_in_norm to false. I see that the scale factor equation is spikein_scale/spikein_reads. How is the data normalized if I don't have any spike-in reads?
Thanks,
Shikan
Hi, I'm a biologist with no data science background. I was trying to analyze my own CUT&RUN data, but have difficulty installing the CUT&RUNTools 2.0 pipeline. I followed the installation guide and installed Python, R, Java, Perl, GCC, and Anaconda, and tried to run the installation codes in Powershell Prompt from Anaconda (Windows 10). But whenever I run the codes which start with "source" (e.g. source atactk.install.sh), I get an error message like this:
(cutruntools2.1) PS C:\Users\jingnie> source atactk.install.sh
source : The term 'source' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try
again.
At line:1 char:1
+ CategoryInfo : ObjectNotFound: (source:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
My university cluster system did not have the jq command installed. It was easy to install myself, but you should note it as a dependency in the install instructions.
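Seconding this. A small preflight sketch you can run before launching the pipeline; the command list is illustrative, so extend it with whatever your config actually calls:

```shell
# Check each external command the pipeline shells out to; jq in particular is
# not pulled in by the conda environment on some systems.
: > deps.txt
for cmd in jq bedtools bowtie2 samtools macs2 meme-chip; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "ok      $cmd" >> deps.txt
    else
        echo "MISSING $cmd" >> deps.txt
    fi
done
cat deps.txt
```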
Hello Fulong, I am running the pipeline using the example data.
I got a strange error from the computing cluster when I ran the command "run_bulkModule.sh ../CUT-RUNTools-2.0/config/bulk-config.json GATA1_D7_30min_chr11":
......
Starting dreme: dreme -verbosity 1 -oc random1000/MEME_GATA1_D7_30min_chr11_shuf/dreme_out -png -dna -p random1000/MEME_GATA1_D7_30mi
n_chr11_shuf/seqs-centered -n random1000/MEME_GATA1_D7_30min_chr11_shuf/seqs-shuffled -m 10
File "/public/home/liunangroup/liangyan/miniconda3/envs/meme/bin/dreme", line 765
print "Finding secondary RE in left flank...
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Finding secondary RE in left flank...")?
dreme exited with error code 1
Starting centrimo: centrimo -seqlen 201 -verbosity 1 -oc random1000/MEME_GATA1_D7_30min_chr11_shuf/centrimo_out -bfile random1000/MEME_GATA1_D7_30min_chr11_shuf/background random1000/MEME_GATA1_D7_30min_chr11_shuf/GATA1_D7_30min_chr11_summits_padded.fa random1000/MEME_GATA1_D7_30min_chr11_shuf/meme_out/meme.xml
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
Bad file name.
FATAL: Template does not contain data section.
centrimo exited with error code 1
Starting tomtom: tomtom -verbosity 1 -text -thresh 0.1 random1000/MEME_GATA1_D7_30min_chr11_shuf/combined.meme random1000/MEME_GATA1_D7_30min_chr11_shuf/combined.meme 1> random1000/MEME_GATA1_D7_30min_chr11_shuf/motif_alignment.txt
tomtom ran successfully in 0.103372 seconds
......
Additionally, I get this strange output:
......
Fri Dec 24 13:43:02 CST 2021
[info] Your bigwig file won't be normalized with spike-in reads
[info] Input file is /public/home/liunangroup/liangyan/pipeline/CUT-RUNTest/GATA1_D7_30min_chr11/peakcalling/macs2.narrow/GATA1_D7_30min_chr11_peaks.narrowPeak
[info] Get randomized [1000] peaks from the top [2000] peaks...
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Start MEME analysis for de novo motif finding ...
[info] Up to 10 will be output ...
Log::Log4perl configuration looks suspicious: No loggers defined at /public/home/liunangroup/liangyan/miniconda3/envs/meme/lib/site_perl/5.26.2/Log/Log4perl/Config.pm line 325.
[info] De Novo motifs can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf ...
[info] Loading the De Novo motifs ...
Traceback (most recent call last):
File "/public/home/liunangroup/liangyan/pipeline/CUT-RUNTools-2.0/install/read.meme.py", line 94, in
dreme_matrices = read_dreme(this_dir + "/dreme_out/dreme.txt")
File "/public/home/liunangroup/liangyan/pipeline/CUT-RUNTools-2.0/install/read.meme.py", line 47, in read_dreme
f = open(n)
FileNotFoundError: [Errno 2] No such file or directory: 'random1000/MEME_GATA1_D7_30min_chr11_shuf/dreme_out/dreme.txt'
[info] The signficance cutoff of Fimo scaning is 0.0005...
[info] Motif files can be found: random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs
[info] Filtering the blacklist regions for the selected peak files
[info] Getting Fasta sequences
[info] Scaning the De Novo motifs for each peak
ls: cannot access random1000/MEME_GATA1_D7_30min_chr11_shuf/motifs: No such file or directory
[info] Output can be found: fimo.result/GATA1_D7_30min_chr11
......
I think the problem is in the part I marked above, but how should I deal with it?
Thanks, Yan
Hello @fl-yu,
Do you have any suggestions on how to run this pipeline on an HPC using SLURM?
I keep on getting OMPI errors. For example: OPAL ERROR: Not initialized in file pmix2x_client.c at line 112
Any help is greatly appreciated.
Thank you,
Jesus
Hi,
First of all, thank you for sharing the pipeline! I'm new to this analysis and this pipeline helps me a lot.
I've been running this pipeline section by section. While running the motif discovery section in bulk-pipeline.sh, I found that 'num_peaks' and 'total_peaks' don't seem to be assigned in the script. I was wondering if they're missing or are they somewhere in the pipeline that I missed?
Thank you!
Hi, Fulong
As expected, after read shifting of the CUT&Tag mapping results, the fragment length is reduced by 9 bp, so there will be fragments shorter than 120 bp (frag_120=TRUE). I read the code of bulk-pipeline.sh and am curious why CUT&RUNTools2 filters fragment length before read shifting instead of after. Actually, the fragment after shifting is the real fragment, right?
Yan
The "validate.py" script is not executable so this step from the documentation fails:
./validate.py bulk-config.json --ignore-input-output --software
There are a few other scripts that are also set to "0644" within the install directory:
./install/git/atactk/atactk/data.py
./install/git/atactk/atactk/metrics.py
./install/git/atactk/atactk/util.py
./install/git/atactk/atactk/command.py
./install/git/atactk/build/lib/atactk/util.py
./install/git/atactk/build/lib/atactk/command.py
./install/git/atactk/build/lib/atactk/data.py
./install/git/atactk/build/lib/atactk/metrics.py
./install/validate.py
./install/fix_sequence.py
./install/read.meme.py
In contrast, these scripts are properly set to "0755":
./install/git/atactk/docs/conf.py
./install/git/atactk/setup.py
./install/git/atactk/tests/test_atactk.py
./install/get_summits_broadPeak.py
./install/get_summits_seacr.py
./install/check_coordinate.py
./install/change.bdg.py
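Until that's fixed upstream, a workaround is to make the scripts executable yourself (or invoke them as `python validate.py ...`). A sketch using a throwaway file named validate_demo.py (hypothetical, so it doesn't touch your checkout):

```shell
# Reproduce the shipped 0644 mode on a scratch script, then apply the fix.
# The equivalent fix from the repository root would be:
#   find install -name '*.py' -exec chmod 0755 {} +
printf '#!/usr/bin/env python\nprint("ok")\n' > validate_demo.py
chmod 0644 validate_demo.py   # mode as shipped: not executable
chmod 0755 validate_demo.py   # make it executable
[ -x validate_demo.py ] && echo "validate_demo.py is now executable"
```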
Thank you for sharing this great tool @fl-yu!
In the assemblies.install.sh header it states that we need a sample.bed in the current directory. Can you please elaborate a little more? Is this sample.bed file provided by the pipeline, or do we provide a blank file named sample.bed?
I will be using hg38.
Here is the line of code from assemblies.install.sh that needs this file:
$bedtoolsbin/bedtools getfasta -fi ${organism_build}.fa -bed ../../sample.bed
Thank you,
Jesus
Hey, I am using CUT-RUNTools-2.0 to do some tests and I encountered some problems at the beginning. I think I correctly installed all the required software and correctly set them in the bulk-config.json file. And I got an empty line following your guide, which means the validate.py script ran without errors. However, when I try to do the bulk analysis, there is an error. Here are my command and the error:
command:
./run_bulkModule.sh bulk-config.json SRR891269
error:
./run_bulkModule.sh: line 89: /home/ryan/controlsoftware/CUT-RUNTools-2.0-master/install2/src/bulk/bulk-pipeline.sh: No such file or directory
I checked the installation directory, and there was indeed no such file or directory.
Do you have any idea about this issue? I would really appreciate it if you could give me some advice. Thanks very much!
Ryan,
Dear professor,
I tried to debug run_bulkModule using the example data.
My config.json includes:
"bt2idx": "/histor/sun/wangtao/0_user/6_maofb/cutrun/0.workplace/1.ref/Homo_sapiens/index",
"genome_sequence": "/histor/sun/wangtao/0_user/6_maofb/cutrun/0.workplace/1.ref/Homo_sapiens/Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
"spike_in_bt2idx": "/histor/sun/wangtao/0_user/6_maofb/cutrun/CUT-RUNTools-2.0/ensemble/index",
"spike_in_sequence": "/histor/sun/wangtao/0_user/6_maofb/cutrun/CUT-RUNTools-2.0/ensemble/Escherichia_coli_str_k_12_substr_mg1655_gca_000005845.ASM584v2.dna.chromosome.Chromosome.fa",
and my result is:
[info] Input file is GATA1_D7_30min_chr11_R1_001.fastq.gz and GATA1_D7_30min_chr11_R2_001.fastq.gz
Wed May 4 22:28:40 CST 2022
[info] Trimming file GATA1_D7_30min_chr11 ...
Wed May 4 22:28:40 CST 2022
[info] Use Truseq adaptor as default
[info] Second stage trimming GATA1_D7_30min_chr11 ...
Wed May 4 22:28:40 CST 2022
[info] Aligning file GATA1_D7_30min_chr11 to reference genome...
Wed May 4 22:28:40 CST 2022
[info] Bowtie2 command: --dovetail --phred33
[info] The dovetail mode is enabled [as parameter frag_120 is on]
[main_samview] fail to read the header from "-".
[info] FASTQ files won't be aligned to the spike-in genome
[info] Filtering unmapped fragments... GATA1_D7_30min_chr11.bam
Wed May 4 22:28:41 CST 2022
[main_samview] fail to read the header from "/histor/sun/wangtao/0_user/6_maofb/cutrun/CUT-RUNTools-2.0/exampleData/bulk-example-test/GATA1_D7_30min_chr11/aligned/GATA1_D7_30min_chr11.bam".
[info] Sorting BAM... GATA1_D7_30min_chr11.bam
Wed May 4 22:28:41 CST 2022
I really do not know what causes this:
[main_samview] fail to read the header from "-".
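One thing worth checking: genome_sequence points at a gzip-compressed fasta (Homo_sapiens.GRCh38.dna.toplevel.fa.gz). This is only a guess, but several steps expect a plain-text fasta (e.g. samtools faidx, bedtools getfasta), and feeding them a .gz can leave the upstream SAM stream empty, which would produce exactly this "fail to read the header" message. A minimal sketch with illustrative file names:

```shell
# Simulate the gzipped Ensembl fasta, then produce the plain .fa that
# downstream tools expect; file names are illustrative.
printf '>chr_demo\nACGTACGT\n' > genome_demo.fa
gzip -f genome_demo.fa           # now only genome_demo.fa.gz exists
gunzip -k genome_demo.fa.gz      # keep the .gz, recreate plain genome_demo.fa
grep '^>' genome_demo.fa
```

Then point genome_sequence at the decompressed .fa and rerun.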
Hey,
I am trying to run CUT-RUNTools-2.0 on test CUT&RUN data. I have downloaded all the required software and set up the bulk-config.json file. I can actually run the test, but I always get an error at the alignment part: it reports that the bulk data analysis is complete, yet the output file is empty. I put the errors below.
I have tried several times, modifying the config file and adjusting the environment, but nothing worked.
Look forward to your advice,
Ryan
#the errors are listed below
[info] FASTQ files won't be aligned to the spike-in genome
IOError: [Errno 2] No such file or directory: '/home/ryan/controlsoftware/cut_run_tag/fastq_directory_for_tool/output_cutruntools2.0/test1/peakcalling/seacr/test1_treat_pileup.bdg'
Calling enriched regions without control file
Proceeding without normalization of control to experimental bedgraph
IOError: [Errno 2] No such file or directory: '/home/ryan/controlsoftware/cut_run_tag/fastq_directory_for_tool/output_cutruntools2.0/test1/peakcalling/seacr/test1_treat.stringent.bed'
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Finding secondary RE in left flank...")?
dreme exited with error code 1
IOError: [Errno 2] No such file or directory: 'random1000/MEME_test1_shuf/dreme_out/dreme.txt'
original code:
if [ "$spikein_reads" == "0" ] || [ "$spike_in_norm" == "FALSE" ]
then
>&2 echo "[info] Your bigwig file won't be normalized with spike-in reads as you did not specify this parameter or the spike-in reads were 0.."
else
>&2 echo "[info] Your bigwig file will be normalized with spike-in reads"
scale=$spikein_scale
scale_factor=`printf "%.0f" $(echo "$scale / $spikein_reads"|bc)`
>&2 echo scale_factor=$scale_factor
bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
--binSize 10
--normalizeUsing cpm
--effectiveGenomeSize $eGenomeSize
--scaleFactor $scale_factor
cp $outdir/"$base_file".spikein_normalized.bw $outdirbroad
cp $outdir/"$base_file".spikein_normalized.bw $outdirseac
fi
issue: the line-continuation backslashes were missing for the following lines:
bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
--binSize 10
--normalizeUsing cpm
--effectiveGenomeSize $eGenomeSize
--scaleFactor $scale_factor
effect: only the --binSize parameter is passed to bamCoverage; the subsequent option lines are treated as separate shell commands and silently dropped.
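With the continuation backslashes restored, the whole call becomes one command. A sketch of the corrected block, reusing the variable names from the original script (note that recent deepTools versions expect the uppercase spelling CPM for --normalizeUsing):

```shell
bamCoverage --bam $bam_file -o $outdir/"$base_file".spikein_normalized.bw \
    --binSize 10 \
    --normalizeUsing CPM \
    --effectiveGenomeSize $eGenomeSize \
    --scaleFactor $scale_factor
```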