hillerlab / make_lastz_chains
Portable solution to generate genome alignment chains using lastz
License: MIT License
I was running alignments for a pair of Drosophila genomes and for human vs. mouse. Both runs failed at the doChainRun step.
Commands used:
# for Drosophila pairs
make_chains.py Dme Dsi GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna GCF_016746395.2_Prin_Dsim_3.1_genomic.fna --executor_queuesize 16 --project_dir 220722_LastZ_chain_DmeDsi_1stAttempt > 220722_lsz_DmeDsi_1stAttempt.log 2>&1
# for human vs. mouse
make_chains.py h38 m39 GCF_000001405.40_GRCh38.p14_genomic.fna GCF_000001635.27_GRCm39_genomic.fna --executor_queuesize 32 --project_dir 220722_LastZ_chain_h38m39_1stAttempt > 220722_lsz_h38m39_1stAttempt.log 2>&1 #2
Attached are log files for both runs.
220722_lsz_h38m39_1stAttempt.log
220722_lsz_DmeDsi_1stAttempt.log
The complaints were that a certain chrom couldn't be found in the .chrom.sizes files:
# for Drosophila pairs
ERROR: file /export/home/TMP/771717715.1.unified/tmp.oHfdzTtayi/NW_001845981.psl was not bundled as the chrom could not be found in ./220722_LastZ_chain_DmeDsi_1stAttempt/Dme.chrom.sizes
# for human vs. mouse
ERROR: file /export/home/TMP/771851000.1.unified/tmp.apDCgOGofk/NC_000015.psl was not bundled as the chrom could not be found in ./220722_LastZ_chain_h38m39_1stAttempt/h38.chrom.sizes
But I can see both of them (with ".*" suffixes) in the *.chrom.sizes files:
$ grep "NW_001845981" 220722_LastZ_chain_DmeDsi_1stAttempt/Dme.chrom.sizes
NW_001845981.1 1643
$ grep "NC_000015" 220722_LastZ_chain_h38m39_1stAttempt/h38.chrom.sizes
NC_000015.10 101991189
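For what it's worth, the error is consistent with an exact-name lookup missing the versioned IDs. A tiny illustrative sketch (the dictionary lookup is my guess at what the bundling step does; the names come from the grep output above):

```python
# Hypothetical sketch: the PSL file is named after the sequence with its
# ".10" version suffix stripped, so an exact lookup in chrom.sizes misses
# the versioned entry.
chrom_sizes = {"NC_000015.10": 101991189}   # entry from h38.chrom.sizes
psl_chrom = "NC_000015"                     # basename of NC_000015.psl

print(psl_chrom in chrom_sizes)                                 # -> False
print(any(n.startswith(psl_chrom + ".") for n in chrom_sizes))  # -> True
```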
Will having ".*" suffixes in sequence names be a problem? Please help.
Thanks!
Hi again,
I saw the update to deal with the sequence IDs with "." (per issue #3):
Warning If your scaffold names are numbered, such as NC_00000.1, please consider removing scaffold numbers (rename NC_00000.1 to NC_00000 or NC_00000__1, for example). ... The pipeline will try to trim scaffold numbers automatically to process the data correctly. Afterward, it will rename the scaffolds back.
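The rename suggested in that warning can be scripted ahead of time; a minimal sketch (my own helper, not pipeline code):

```python
def rename_fasta_header(line: str) -> str:
    """Rewrite '>NC_00000.1 desc' to '>NC_00000__1' (drops the description)."""
    if line.startswith(">"):
        seq_id = line[1:].split()[0]           # keep only the sequence ID
        return ">" + seq_id.replace(".", "__") + "\n"
    return line                                # sequence lines pass through

print(rename_fasta_header(">NC_00000.1 some scaffold\n"), end="")  # -> >NC_00000__1
```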
Since it appears that you implemented a mechanism to deal with the ".*" in sequence IDs, I downloaded the new version of the pipeline and ran it with the input genomes without modifying sequence IDs.
The pipeline started by creating a renamed .fasta file and a .tsv containing the sequence-ID conversion table. At the end, the log says the pipeline is renaming chromosome names in the chain file.
However, the resulting chain file does not look right. Most importantly, the chain score field is gone, making it impossible to salvage the results:
$ zgrep "chain" h38w5.m39w5.allfilled.chain.gz | head -n5 | column -t
chain NC_000014.9 NC_000014 107043718 + 24687985 NC_000078.7 NC_000078 120092757 + 44683557 114519917 1
chain NC_000004.12 NC_000004 190214555 + 1103677 NC_000071.7 NC_000071 151758149 + 33338229 105126044 2
chain NC_000002.12 NC_000002 242193529 + 96496148 NC_000067.7 NC_000067 195154279 + 36305135 93990572 3
chain NC_000001.11 NC_000001 248956422 + 930892 NC_000070.7 NC_000070 156860686 - 516431 53690008 4
chain NC_000001.11 NC_000001 248956422 + 68121445 NC_000069.7 NC_000069 159745316 - 100601 72842298 5
For comparison, below is the chain file produced by a previous version of make_lastz_chains. Here, I supplied the input genomes with pre-modified sequence IDs:
$ zgrep "chain" h38w4.m39w4.all.chain.gz | head -n5 | column -t
chain 720570302 NC_000014__9 107043718 + 24687985 106824309 NC_000078__7 120092757 + 44683557 114419863 1
chain 598452391 NC_000004__12 190214555 + 1104636 88140396 NC_000071__7 151758149 + 33339160 105126044 2
chain 584782083 NC_000001__11 248956422 + 930892 58547094 NC_000070__7 156860686 - 516431 53690008 3
chain 578674369 NC_000002__12 242193529 + 96496148 241869966 NC_000067__7 195154279 + 36305135 93989195 4
chain 564776354 NC_000001__11 248956422 + 68121445 158184951 NC_000069__7 159745316 - 100601 72842298 5
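The difference shows up in the header format: a valid UCSC chain header reads "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id", so field 2 must be a numeric score. A quick sanity check (a sketch of mine, not part of the pipeline) flags the broken headers:

```python
def chain_header_ok(line: str) -> bool:
    """Loosely validate a UCSC chain header line (13 fields, numeric score)."""
    f = line.split()
    return (
        len(f) == 13
        and f[0] == "chain"
        and f[1].isdigit()      # score must be a number, not a sequence name
        and f[4] in "+-"        # target strand
        and f[9] in "+-"        # query strand
    )

good = "chain 720570302 NC_000014__9 107043718 + 24687985 106824309 NC_000078__7 120092757 + 44683557 114419863 1"
bad  = "chain NC_000014.9 NC_000014 107043718 + 24687985 NC_000078.7 NC_000078 120092757 + 44683557 114519917 1"
print(chain_header_ok(good), chain_header_ok(bad))  # -> True False
```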
Has any other part of the pipeline been updated since July 6th, 2022? If not, we will revert to the older version of make_lastz_chains and use genome files with the modified sequence IDs, since the current version has this issue.
Thanks,
Dong-Ha
Hi,
I am attaching an error I am getting while running make_lastz_chains on a Slurm cluster. Can you please help me fix the issue?
Thanks
Philge
make_lastz_error.txt
On RHEL8 (nextflow 23.10.0, anaconda/3-2023.09) on a shared HPC cluster, running the example below from the "Running the pipeline" section results in the following. Did I miss a step?
./make_chains.py target query test_data/test_reference.fa test_data/test_query.fa --pd test_out -f --chaining_memory 16
# Make Lastz Chains #
Version 2.0.8
Commit: 187e313afc10382fe44c96e47f27c4466d63e114
Branch: main
* found run_lastz.py at /path/to/me/make_lastz_chains/standalone_scripts/run_lastz.py
* found run_lastz_intermediate_layer.py at /path/to/me/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py
* found chain_gap_filler.py at /path/to/me/make_lastz_chains/standalone_scripts/chain_gap_filler.py
* found faToTwoBit at /path/to/me/make_lastz_chains/HL_kent_binaries/faToTwoBit
* found twoBitToFa at /path/to/me/make_lastz_chains/HL_kent_binaries/twoBitToFa
* found pslSortAcc at /path/to/me/make_lastz_chains/HL_kent_binaries/pslSortAcc
* found axtChain at /path/to/me/make_lastz_chains/HL_kent_binaries/axtChain
* found axtToPsl at /path/to/me/make_lastz_chains/HL_kent_binaries/axtToPsl
* found chainAntiRepeat at /path/to/me/make_lastz_chains/HL_kent_binaries/chainAntiRepeat
* found chainMergeSort at /path/to/me/make_lastz_chains/HL_kent_binaries/chainMergeSort
* found chainCleaner at /path/to/me/make_lastz_chains/HL_kent_binaries/chainCleaner
* found chainSort at /path/to/me/make_lastz_chains/HL_kent_binaries/chainSort
* found chainScore at /path/to/me/make_lastz_chains/HL_kent_binaries/chainScore
* found chainNet at /path/to/me/make_lastz_chains/HL_kent_binaries/chainNet
* found chainFilter at /path/to/me/make_lastz_chains/HL_kent_binaries/chainFilter
* found lastz at /path/to/me/make_lastz_chains/HL_kent_binaries/lastz
* found nextflow at /burg/opt/nextflow/23.10.0/nextflow
All necessary executables found.
Making chains for test_data/test_reference.fa and test_data/test_query.fa files, saving results to /path/to/me/make_lastz_chains/test_out
Pipeline started at 2023-12-14 10:35:02.145466
* Setting up genome sequences for target
genomeID: target
input sequence file: test_data/test_reference.fa
is 2bit: False
planned genome dir location: /path/to/me/make_lastz_chains/test_out/target.2bit
Initial fasta file test_data/test_reference.fa saved to /path/to/me/make_lastz_chains/test_out/target.2bit
For target (target) sequence file: /path/to/me/make_lastz_chains/test_out/target.2bit; chrom sizes saved to: /path/to/me/make_lastz_chains/test_out/target.chrom.sizes
* Setting up genome sequences for query
genomeID: query
input sequence file: test_data/test_query.fa
is 2bit: False
planned genome dir location: /path/to/me/make_lastz_chains/test_out/query.2bit
Initial fasta file test_data/test_query.fa saved to /path/to/me/make_lastz_chains/test_out/query.2bit
For query (query) sequence file: /path/to/me/make_lastz_chains/test_out/query.2bit; chrom sizes saved to: /path/to/me/make_lastz_chains/test_out/query.chrom.sizes
### Partition Step ###
# Partitioning for target
Saving partitions and creating 1 buckets for lastz output
In particular, 0 partitions for bigger chromosomes
And 1 buckets for smaller scaffolds
Saving target partitions to: /path/to/me/make_lastz_chains/test_out/target_partitions.txt
# Partitioning for query
Saving partitions and creating 1 buckets for lastz output
In particular, 0 partitions for bigger chromosomes
And 1 buckets for smaller scaffolds
Saving query partitions to: /path/to/me/make_lastz_chains/test_out/query_partitions.txt
Num. target partitions: 0
Num. query partitions: 0
Num. lastz jobs: 0
### Lastz Alignment Step ###
LASTZ: making jobs
LASTZ: saved 1 jobs to /path/to/me/make_lastz_chains/test_out/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /burg/opt/nextflow/23.10.0/nextflow /path/to/me/make_lastz_chains/parallelization/execute_joblist.nf --joblist /path/to/me/make_lastz_chains/test_out/temp_lastz_run/lastz_joblist.txt -c /path/to/me/make_lastz_chains/test_out/temp_lastz_run/lastz_config.nf
N E X T F L O W ~ version 23.10.0
Launching `/path/to/me/make_lastz_chains/parallelization/execute_joblist.nf` [boring_faggin] DSL2 - revision: 0483b29723
executor > local (1)
[c0/1be141] process > execute_jobs (1) [100%] 1 of 1 ✔
### Nextflow process lastz finished successfully
Found 1 output files from the LASTZ step
Please note that lastz_step.py does not produce output in case LASTZ could not find any alignment
### Concatenating Lastz Results (Cat) Step ###
Concatenating LASTZ output from 1 buckets
* skip bucket bucket_ref_bulk_1: nothing to concat
An error occurred while executing cat: Error! No non-empty files found at /path/to/me/make_lastz_chains/test_out/temp_concat_lastz_output. The failed operation label is: cat_step
Traceback (most recent call last):
File "/path/to/me/make_lastz_chains/modules/step_manager.py", line 70, in execute_steps
step_result = step_to_function[step](params, project_paths, step_executables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/me/make_lastz_chains/modules/pipeline_steps.py", line 58, in cat_step
do_cat(params, project_paths, executables)
File "/path/to/me/make_lastz_chains/steps_implementations/cat_step.py", line 51, in do_cat
has_non_empty_file(project_paths.cat_out_dirname, "cat_step")
File "/path/to/me/make_lastz_chains/modules/common.py", line 51, in has_non_empty_file
raise PipelineFileNotFoundError(err_msg)
modules.error_classes.PipelineFileNotFoundError: Error! No non-empty files found at /path/to/me/make_lastz_chains/test_out/temp_concat_lastz_output. The failed operation label is: cat_step
ls -l test_out/temp_concat_lastz_output
total 0
Hi,
I have been facing a problem with make_chains.py. Aligning the target genome against query genome A (186 contigs, genome length 600 Mb) worked perfectly fine. But then, when I align query B (1500 contigs, 700 Mb) against the same target, I always get an error in the fillChains step. Species A and B are closely related. I was wondering what is wrong. I am attaching the .log file.
Thanks in advance,
Diego
make_chains.log
This is a low-importance issue, because the use case is rare and the workaround is easy.
If make_lastz_chains is called with --continue_from_step cat, but the "cat" step has been run before, make_lastz_chains will fail with e.g.:
### Trying to continue from step: cat
Making chains for /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit and /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/210-repeatmasked/GCA_004027375.1_MacSob_v1_BIUU_genomic.2bit files, saving results to /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out
Pipeline started at 2024-02-28 12:56:25.148054
* Setting up genome sequences for target
genomeID: hg38
input sequence file: /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit
is 2bit: True
planned genome dir location: /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out/target.2bit
Traceback (most recent call last):
File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 261, in <module>
main()
File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 257, in main
run_pipeline(args)
File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 233, in run_pipeline
setup_genome_sequences(args.target_genome,
File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/modules/project_setup_procedures.py", line 172, in setup_genome_sequences
os.symlink(arg_input_2bit, seq_dir)
FileExistsError: [Errno 17] File exists: '/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit' -> '/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out/target.2bit'
This happens because the "target.2bit" file was symlinked during the earlier run, and os.symlink fails if the destination already exists.
The user workaround is to delete "target.2bit", which fixes this easily.
The developer fix is to check for and remove the destination if it exists, or to use a symbolic-linking function that tolerates an already existing destination. Checking briefly, the os.symlink function does not have such an option.
Dear Authors,
thank you very much for providing the pipeline! I would like to run the pipeline on different genomes together to compute whole genome alignment of different species to use them afterwards in TOGA.
Assume we have 4 assemblies: hg38, mm10, xs1 and ml2. Can I apply something like this:
./make_chains.py hg38 mm10 xs1 ml2 /path/to/hg38.fasta /path/to/mm10.fasta /path/to/xs1.fasta /path/to/ml2.fasta
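Judging by the usage examples elsewhere in this thread, make_chains.py takes exactly one target and one query per invocation, so four genomes would mean separate pairwise runs. A dry-run sketch (echo just prints the commands that would be launched; the paths are the placeholders from the question, and using hg38 as the common target is an assumption):

```shell
# One pairwise run per query genome against the hg38 target (dry run).
for q in mm10 xs1 ml2; do
    echo ./make_chains.py hg38 "$q" /path/to/hg38.fasta "/path/to/${q}.fasta" \
        --project_dir "hg38_vs_${q}"
done
```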
Best,
Ahmad
Dear Prof @MichaelHiller
Thank you for developing such great software!
I'm having some problems with make_lastz_chain. Everything seems to be working fine at first:
************ HgStepManager: executing step 'lastz' Wed Aug 31 22:44:00 2022.
doLastzClusterRun ....
testCleanState current step: doLastzClusterRun, /gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz, /gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz/lastz.done previous step: doPartition, /gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz/partition.done
# chmod a+x /gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz/doClusterRun.sh
# /gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz/doClusterRun.sh
+ cd /gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz
+ gensub2 homSap.lst macFas.lst gsub jobList
+ parallel_executor.py lastz_homSapmacFas jobList -q day --memoryMb 10000 -e pbs --co None -p mem3T --eq 10
N E X T F L O W ~ version 21.10.6
Launching `/gpfs/home/liunyw/howler_monkey/make_lastz_chain/homSap-macFas/TEMP_run.lastz/lastz_homSapmacFas/script.nf` [peaceful_booth] - revision: 01f1105dc4
But when the program submits the tasks, it keeps giving errors:
executor > pbs (12)
[bb/ba7c51] process > execute_jobs (13) [ 0%] 4 of 2446, failed: 4, retries: 4
[70/224df0] NOTE: Process `execute_jobs (26)` terminated with an error exit status (127) -- Execution is retried (1)
[a4/c418e8] NOTE: Process `execute_jobs (1)` terminated with an error exit status (127) -- Execution is retried (1)
[06/393f6f] NOTE: Process `execute_jobs (2)` terminated with an error exit status (127) -- Execution is retried (1)
[19/e38b9b] NOTE: Process `execute_jobs (8)` terminated with an error exit status (127) -- Execution is retried (1)
executor > pbs (13)
[d1/24b0e5] process > execute_jobs (3) [ 0%] 10 of 2452, failed: 10, retries: 10
[70/224df0] NOTE: Process `execute_jobs (26)` terminated with an error exit status (127) -- Execution is retried (1)
[a4/c418e8] NOTE: Process `execute_jobs (1)` terminated with an error exit status (127) -- Execution is retried (1)
[06/393f6f] NOTE: Process `execute_jobs (2)` terminated with an error exit status (127) -- Execution is retried (1)
[19/e38b9b] NOTE: Process `execute_jobs (8)` terminated with an error exit status (127) -- Execution is retried (1)
[ae/c82b60] NOTE: Process `execute_jobs (10)` terminated with an error exit status (127) -- Execution is retried (1)
[a8/693d83] NOTE: Process `execute_jobs (4)` terminated with an error exit status (127) -- Execution is retried (1)
[88/aff74e] NOTE: Process `execute_jobs (15)` terminated with an error exit status (127) -- Execution is retried (1)
[4c/e9351d] NOTE: Process `execute_jobs (11)` terminated with an error exit status (127) -- Execution is retried (1)
[40/8918e1] NOTE: Process `execute_jobs (16)` terminated with an error exit status (127) -- Execution is retried (1)
[0e/991e02] NOTE: Process `execute_jobs (20)` terminated with an error exit status (127) -- Execution is retried (1)
Could you help me?
Thank you very much
Xiaolin
Hi,
I use this pipeline to create a genome alignment file in local mode. Here is my script:
make_chains.py human macAss hg38.2bit macAss.2bit --project_dir human_macAss --executor_queuesize 40
It ran fine at first with no problems. Recently, however, I've been seeing errors reported as follows:
executor > local (1158)
[cc/0029ca] process > execute_jobs (1225) [ 59%] 1118 of 1875
executor > local (1159)
[3f/96e7bf] process > execute_jobs (1081) [ 59%] 1119 of 1875
executor > local (1160)
[cf/920417] process > execute_jobs (1082) [ 59%] 1119 of 1875
executor > local (1160)
[bb/e4ed9a] process > execute_jobs (26) [ 59%] 1120 of 1876, failed: 1, ret...
[bb/e4ed9a] NOTE: Process `execute_jobs (26)` failed -- Execution is retried (1)
executor > local (1161)
[91/7d94e9] process > execute_jobs (1083) [ 59%] 1121 of 1876, failed: 1, ret...
[bb/e4ed9a] NOTE: Process `execute_jobs (26)` failed -- Execution is retried (1)
executor > local (1161)
[91/7d94e9] process > execute_jobs (1083) [ 59%] 1121 of 1876, failed: 1, ret...
[bb/e4ed9a] NOTE: Process `execute_jobs (26)` failed -- Execution is retried (1)
[78/143ab2] process > execute_jobs (1085) [ 59%] 1123 of 1876, failed: 1, ret...
executor > local (1163)
[78/143ab2] process > execute_jobs (1085) [ 59%] 1123 of 1876, failed: 1, ret...
These errors appear to happen in the lastz step; I noticed they were reported after the hint: ************ HgStepManager: executing step 'lastz' Sun Oct 15 09:08:53 2023.
And there are 4 errors at the moment:
[35/91e12f] process > execute_jobs (1394) [ 72%] 1370 of 1879, failed: 4, ret...
executor > local (1411)
[64/039d77] process > execute_jobs (1395) [ 72%] 1371 of 1879, failed: 4, ret...
executor > local (1411)
[64/039d77] process > execute_jobs (1395) [ 72%] 1371 of 1879, failed: 4, ret...
executor > local (1412)
[c3/9f78eb] process > execute_jobs (1396) [ 73%] 1372 of 1879, failed: 4, ret...
What should I do to fix this?
Thank you in advance for your help.
Yawen
Hi!
I'm trying to run make_chains.py for betta fish and human, but I had errors in the last several jobs (~90% completed) of the 'lastz' step. The error looks like this:
NOTE: Process execute_jobs (3909) terminated with an error exit status (1) -- Execution is retried (3)
As you suggested in other similar issues, I applied 4 rounds of RepeatModeler and RepeatMasker to the betta fish genome and downloaded the human genome from https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/ (which is soft-masked). I also set --seq1_chunk 50000000 --seq2_chunk 10000000 to reduce the chunk size, but the errors still occur.
Could you help me figure out why? In addition, is there a way to rerun only the failed jobs instead of running lastz from the beginning?
Thanks!
Hongbing
I am trying to run this tool to get genome alignment chains (very useful, many thanks). It runs without any complaints, and I can see in the log file that the program completed. But, unfortunately, I don't see the final chain file in the output directory I specify. Am I missing anything here?
Here is what I am doing,
/Tools/make_lastz_chains/make_chains.py target_id query_id target.fa query.fa --project_dir Lastz_run1
It ran for a day and completed without any issue; I just don't see the output file. Any thoughts on what may be wrong?
Hi @kirilenkobm @MichaelHiller,
I fetched https://github.com/hillerlab/make_lastz_chains/archive/refs/heads/main.zip with wget and installed all dependencies. The run's log showed:
LASTZ: making jobs
LASTZ: saved 968 jobs to /home/vlamba/make_genome-chaining-Feb3/test/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /share/apps/bioinformatics/nextflow/20.10.0/nextflow /scrfs/storage/vlamba/home/make_lastz_chains-main/parallelization/execute_joblist.nf --joblist /home/vlamba/make_genome-chaining-Feb3/test/temp_lastz_run/lastz_joblist.txt -c /home/vlamba/make_genome-chaining-Feb3/test/temp_lastz_run/lastz_config.nf
Found 8 output files from the LASTZ step
Please note that lastz_step.py does not produce output in case LASTZ could not find any alignment
Concatenating LASTZ output from 8 buckets
Then I switched to v1 (make_lastz_chains-1.0.0), and with the test data I got the following error:
[d1/4aa5ff] NOTE: Process execute_jobs (334) terminated with an error exit status (127) -- Execution is retried (2)
[b5/4f2f5d] NOTE: Process execute_jobs (332) terminated with an error exit status (127) -- Execution is retried (2)
[9c/4a16b7] NOTE: Process execute_jobs (625) terminated with an error exit status (127) -- Execution is retried (3)
[c7/851fa8] NOTE: Process execute_jobs (678) terminated with an error exit status (127) -- Execution is retried (2)
[ea/b72980] NOTE: Process execute_jobs (719) terminated with an error exit status (127) -- Execution is retried (2)
Error executing process > 'execute_jobs (67)'
Caused by:
Process execute_jobs (67) terminated with an error exit status (127)
Command executed:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/runRepeatFiller.sh /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/jobs/infillChain_158
Command exit status:
127
Command output:
..calling RepeatFiller:
Command error:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/runRepeatFiller.sh: line 12: --workdir: command not found
Work dir:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/fillChain_targetquery/work/21/08e1e7d44488e6d8c672adaff8864f
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
/home/vlamba/python3.14/lib/python3.9/site-packages/py_nf/py_nf.py:404: UserWarning: Nextflow pipeline fillChain_targetquery failed! Execute function returns 1.
warnings.warn(msg)
Uncaught exception from user code:
Command failed:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/doFillChain.sh
HgAutomate::run('/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/...') called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/HgRemoteScript.pm line 117
HgRemoteScript::execute('HgRemoteScript=HASH(0xc3cb78)') called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/doLastzChain.pl line 735
main::doFillChains() called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/HgStepManager.pm line 169
HgStepManager::execute('HgStepManager=HASH(0xc39a18)') called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/doLastzChain.pl line 877
Error!!! Output file /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/target.query.allfilled.chain.gz not found!
The pipeline crashed. Please contact developers by creating an issue at:
https://github.com/hillerlab/make_lastz_chains
I would really appreciate any help/suggestion.
Thank you so much for your time
Hello,
I'm trying to create chain alignments for whole primate genomes (novel sequences) against the vervet reference genome. I removed the spaces and tabs from the fasta file, which allowed the process to begin, but now the pipeline is crashing at doPartition.bash.
dwnloads/make_lastz_chains/make_chains.py --pd chain_files --force_def --DEF chain_files/DEF sabaeus neglectus chlSab1/vervet_nospace.fa GuenH/assembly.megabubbles.ns.fasta
Here's the log file: do partition log file.pdf
Please let me know what suggestions you have to fix this error.
Hi, is there any parameter to set the maximum number of threads used? For example, limiting the maximum to 40 threads.
Hi,
I've been utilizing the make_lastz_chains tool with the following command:
/public/home/zhaohang/soft/make_lastz_chains-2.0.7/make_chains.py target query ${refgenome_softmasked} ${quegenome_softmasked} --pd chain_out -f --chaining_memory 100 --cluster_executor slurm --cluster_queue smp01,amd,low --keep_temp
I'm encountering an issue where some of the lastz subprocesses are timing out and failing to complete.
Could this be due to the large size of the genome or the high proportion of repetitive sequences? What would be the recommended approach to mitigate this issue? Should I reduce the chunk size or extend the runtime of the subprocesses, and if so, could you provide guidance on how to adjust these parameters effectively?
HgAutomate::run("TEMP_run.lastz"...) called at make_lastz_chains/doLastzChains/HgRemoteScript.pm line 117
HgRemoteScript::execute(HgRemoteScript=HASH(0x557801b9a758)) called at make_lastz_chains/doLastzChains/doLastzChain.pl line 370
main::doPartition() called at make_lastz_chains/doLastzChains/HgStepManager.pm line 169
HgStepManager::execute(HgStepManager=HASH(0x557801b45280)) called at make_lastz_chains/doLastzChains/doLastzChain.pl line 877
Hello again,
With your help, I was able to get chain alignments for human-mouse, human-zebrafish, etc.
I used the NCBI soft-masked genomes, and it seems like the repeat-masking was indeed not enough. I can see blocks of many dense and short diagonal lines, i.e. alignments from repeats that escaped the masking. I am re-doing the alignments using UCSC soft-masked (RepeatMasker + TRF?) genomes, plus genomes masked with RepeatModeler + RepeatMasker (as @MichaelHiller suggested).
We are interested in identifying the "best path" alignments, where the two genomes could be more or less 1:1 covered as continuously as possible. (This could become 1:many or many:many if either or both genomes have whole-genome duplications, but I would like to think about that later...)
So here are a few questions:
From the lastz manual, it seems like there are options (e.g. --gfextend --chain --gapped in Fig. 2f) and steps (chaining, interpolation by --inner=, etc.) that could help produce longer and "cleaner" alignments. I wonder whether these steps and options were already applied. For example, is FILL_CHAIN=1 essentially performing interpolation?
Thanks again!
Dong-Ha
Hi!
I get this error if I use 'slurm' to submit the job, but I don't get it if I run the command directly on another server. The program was running normally for some time before this error occurred.
My nextflow version is 23.10.0.5889; I am trying to downgrade nextflow below 22.12.0, as mentioned in #18.
The following is a portion of the run log file:
executor > local (627)
[77/314080] process > execute_jobs (618) [ 25%] 620 of 2414
executor > local (627)
[77/314080] process > execute_jobs (618) [ 25%] 621 of 2414
executor > local (628)
[33/b87af4] process > execute_jobs (620) [ 25%] 621 of 2414
executor > local (629)
[a0/766938] process > execute_jobs (605) [ 25%] 622 of 2415, failed: 1, retri...
[a0/766938] NOTE: Process execute_jobs (605)
terminated with an error exit status (1) -- Execution is retried (1)
executor > local (630)
[37/c0e27f] process > execute_jobs (622) [ 25%] 623 of 2416, failed: 2, retri...
[5f/02fc3c] NOTE: Process execute_jobs (606)
terminated with an error exit status (1) -- Execution is retried (1)
executor > local (631)
[8f/b9b0ef] process > execute_jobs (624) [ 25%] 624 of 2416, failed: 2, retri...
executor > local (632)
[67/ff4d54] process > execute_jobs (623) [ 25%] 625 of 2416, failed: 2, retri...
Hi @kirilenkobm,
I've been trying to run make_lastz_chains on my dataset and I ran the following command:
./make_chains.py homSap gorGor ./raw-genome/homSap.sm.fa ./raw-genome/gorGor.sm.fa --executor pbs --executor_queuesize 100 --project_dir homSap-gorGor
However I have encountered an error message.
************ HgStepManager: executing step 'lastz' Sat May 7 20:48:40 2022.
doLastzClusterRun ....
testCleanState current step: doLastzClusterRun, /gpfs/home/liunyw/howler_monkey/gene_loss/homSap-gorGor/TEMP_run.lastz, /gpfs/home/liunyw/howler_monkey/gene_loss/homSap-gorGor/TEMP_run.lastz/lastz.done previous step: doPartition, /gpfs/home/liunyw/howler_monkey/gene_loss/homSap-gorGor/TEMP_run.lastz/partition.done
# chmod a+x /gpfs/home/liunyw/howler_monkey/gene_loss/homSap-gorGor/TEMP_run.lastz/doClusterRun.sh
# /gpfs/home/liunyw/howler_monkey/gene_loss/homSap-gorGor/TEMP_run.lastz/doClusterRun.sh
+ cd /gpfs/home/liunyw/howler_monkey/gene_loss/homSap-gorGor/TEMP_run.lastz
+ gensub2 homSap.lst gorGor.lst gsub jobList
+ parallel_executor.py lastz_homSapgorGor jobList -q day --memoryMb 10000 -e pbs --co None -p batch --eq 100
Traceback (most recent call last):
File "/gpfs/home/liunyw/biosoft/make_lastz_chains/HL_scripts/parallel_executor.py", line 176, in <module>
main()
File "/gpfs/home/liunyw/biosoft/make_lastz_chains/HL_scripts/parallel_executor.py", line 155, in main
queue=_q_arg,
File "/gpfs/home/liunyw/soft/python3/lib/python3.6/site-packages/py_nf/py_nf.py", line 125, in __init__
self.__check_executor()
File "/gpfs/home/liunyw/soft/python3/lib/python3.6/site-packages/py_nf/py_nf.py", line 216, in __check_executor
raise NotImplementedError(msg)
NotImplementedError: Executor pbs is not supported, abort
This looks like the --executor parameter is not set properly. However, what confuses me is that the executor of the cluster I am using is pbs, and I have used process.executor = 'pbs' in my TOGA test data (nextflow_config_files/call_cesar_config_template.nf), where it worked.
Thank you
Yawen
Hi, Bogdan and Michael @kirilenkobm @MichaelHiller,
This was originally a comment on issue #9; then I realized it's a separate issue.
When I tried human vs. chimpanzee and human vs. Macaca mulatta, the runs stalled at the doChainRun step. All runs started to fail after trying a couple of the 25 doChainRun.csh jobs (which, to my understanding, run axtChain | chainAntiRepeat to convert 25 psl chunks into 25 chained axt chunks). They started to fail, retry, and so on, and did not appear to move.
Please see this log file for an example of the failed runs: 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0__failed.log
Could you suggest a parameter I can change to make the doChainRun step work? More specifically:
1. Would increasing --chaining_memory (default 50000) help the doChainRun step?
2. What exactly is the --chaining_memory parameter? Is it the total RAM (in MB?) assigned to each of the 25 jobs? Can I use the total RAM allowed for the entire node (e.g. 192000 for 192 GB)?
3. Can I split the work into more psl files, if this happens because each chunk is too big?
Thanks!
Dong-Ha
Could you please tell me whether something is wrong with this command, as it is still running after 4 days? It started by running around 500 single-core jobs on the cluster, but around 125 jobs are still running.
make_chains.py --project_dir OUTPUT --executor slurm --cluster_parameters '-A AB-123456' --executor_partition defq --force_def chicken bird chicken.2bit bird.2bit
Wasn't sure if I should ask this here or on the Genome Alignment Tools repo, but how do these repos fit together in a lastz/chain/maf/multiz pipeline?
I've used make_lastz_chains to create a target.query.allfilled.chain.gz file and put that directly into TOGA (not sure if that's correct) to get files like query_annotation.bed.
But now I'd like to get a MAF file, and I'm not sure whether I should use a bunch of tools consecutively:
preNet -> chainNet -> netSyntenic -> netClass -> netToAxt -> axtToMaf
Or if I should use the scripts from the Genome Alignment Tools repo
Create a net file somehow? -> FilterChains_Net_FilterNets.perl -> NetFilterNonNested.perl -> netToAxt -> axtToMaf
Is there a way these tools are supposed to fit together?
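To the best of my knowledge, the classic UCSC route is roughly the first sequence, with chainPreNet (often written as just "preNet") in front. A rough, untested sketch with placeholder file names; consult each tool's usage message before relying on it:

```shell
# Sketch of the chain -> net -> axt -> maf route with UCSC tools
# (placeholder file names; all tools must be on PATH).
chainPreNet in.chain target.chrom.sizes query.chrom.sizes stdout \
  | chainNet stdin target.chrom.sizes query.chrom.sizes target.net /dev/null
netSyntenic target.net target.syn.net
netToAxt target.syn.net in.chain target.2bit query.2bit stdout \
  | axtSort stdin out.axt
axtToMaf out.axt target.chrom.sizes query.chrom.sizes out.maf
```

netClass needs a repeat database, which is why it is often skipped outside UCSC; as I understand it, the Genome Alignment Tools filtering scripts would slot in between the netting and netToAxt steps.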
Hello,
when running make_chains.py on two 700MB genomes, the temporary files were stored in /dev/mapper/cl-root, which caused the storage of /dev/mapper/cl-root to be filled up and eventually resulted in the termination of make_chains.py. Could you please advise on how to modify the temporary file generation path for make_chains.py? Thank you very much for your help.
## CentOS 7
Hi authors,
I ran:
./make_chains.py Cot Dis /home/vlamba/target.fa /storage/vlamba/data/Genomes-noto/NOTO-genome/query.fa --project_dir /home/vlamba/CotDis_chaining -f --chaining_memory 30
and got the following error:
### Nextflow process lastz finished successfully
Found 8 output files from the LASTZ step
Please note that lastz_step.py does not produce output in case LASTZ could not find any alignment
Concatenating LASTZ output from 8 buckets
Could you please suggest if I did something wrong?
Thank you
Some doLastzClusterRun jobs exceed the hardcoded process.time of two days.
make_lastz_chains/constants.py
Line 110 in 187e313
The default behavior for these jobs seems to be to rerun. However, the result appears to be an infinitely rerunning failing job. (Maybe there is an upper limit to the number of reruns, but for two-day-long jobs I haven't seen it reached.)
An easy solution is to make JOB_TIME_REQ customizable: keep 48h as the default and allow user overrides via a command-line option.
I suppose it is also possible to split the genomes into smaller chunks, but I do not know enough about genome-genome alignments to say whether that might affect the resulting alignments. If it might, I would prefer increasing JOB_TIME_REQ.
This issue is related to #43 and perhaps #48.
Personally, a very long JOB_TIME_REQ is fine for me, since I run the Nextflow process with the local executor within a single large cluster job.
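The customizable JOB_TIME_REQ suggested above could look roughly like this (a sketch only; the option name `--job_time_req` and the wiring into constants.py are assumptions, not pipeline code):

```python
# Sketch of exposing JOB_TIME_REQ as a user-tunable option.
# The option name --job_time_req and default "48h" are assumptions.
import argparse

DEFAULT_JOB_TIME_REQ = "48h"

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="job time limit demo")
    parser.add_argument(
        "--job_time_req",
        default=DEFAULT_JOB_TIME_REQ,
        help="Nextflow process.time per cluster job, e.g. 48h or 96h",
    )
    return parser.parse_args(argv)

print(parse_args(["--job_time_req", "96h"]).job_time_req)  # → 96h
```

The parsed value would then replace the hardcoded constant when the Nextflow process block is generated.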
Dear Developers,
Thank you very much for your software.
I tried to make a chains file, and the following message occurred:
/home/popov/make_lastz_chains/make_chains.py hg38 my_gen /home/popov/Human_genome_38/GRCh38.primary_assembly.genome.ordered.fasta /home/popov/primary.genome.scf.fasta.scaffolds.fa --project_dir /home/popov/chains_first_attempt --executor slurm --executor_partition genetics
Project directory: /home/popov/chains_first_attempt
Target path: /home/popov/chains_first_attempt/hg38.2bit | chrom sizes: /home/popov/chains_first_attempt/hg38.chrom.sizes
Query path: /home/popov/chains_first_attempt/my_gen.2bit | query sizes: /home/popov/chains_first_attempt/my_gen.chrom.sizes
Calling: /home/popov/chains_first_attempt/master_script.sh...
BIN: /home/popov/make_lastz_chains/doLastzChains
which: no RepeatFiller.py in (/home/popov/make_lastz_chains/GenomeAlignmentTools/src:/home/popov/make_lastz_chains/kent_binaries:/home/popov/make_lastz_chains/HL_kent_binaries:/home/popov/make_lastz_chains/HL_scripts:/home/popov/make_lastz_chains/doLastzChains:/home/popov/miniconda3/envs/togaenv/bin:/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.3/bin:/opt/ohpc/pub/compiler/gcc/8.3.0/bin:/opt/ohpc/pub/utils/prun/1.3:/opt/ohpc/pub/utils/autotools/bin:/opt/ohpc/pub/bin:/usr/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/popov/.local/bin:/home/popov/bin)
PARAMETERS:
clusterRunDir /home/popov/chains_first_attempt
chainMinScore 1000
chainLinearGap loose
maxNumLastzJobs 6000
numFillJobs 1000
verbose 1
debug 0
/home/popov/chains_first_attempt/DEF looks OK!
tDb=hg38
qDb=my_gen
seq1dir=/home/popov/chains_first_attempt/hg38.2bit
seq2dir=/home/popov/chains_first_attempt/my_gen.2bit
Will run RepeatFiller.
Will run chainCleaner with parameters: -LRfoldThreshold=2.5 -doPairs -LRfoldThresholdPairs=10 -maxPairDistance=10000 -maxSuspectScore=100000 -minBrokenChainScore=75000
max number of lastz cluster jobs: 6000
The /home/popov/make_lastz_chains/doLastzChains/doLastzChain.pl run will be performed in directory /home/popov/chains_first_attempt. All temp files will be written there.
HgStepManager: executing from step 'partition' through step 'cleanChains'.
************ HgStepManager: executing step 'partition' Thu Sep 7 17:58:05 2023.
doPartition ....
doPartition: seq2MaxLength ....
doPartition: seq2MaxLength = 57195878
doPartition: partitionTargetCmd partitionSequence.pl 175000000 0 /home/popov/chains_first_attempt/hg38.2bit /home/popov/chains_first_attempt/hg38.chrom.sizes -xdir xdir.sh -rawDir /home/popov/chains_first_attempt/TEMP_psl 4000 -lstDir tParts > hg38.lst
doPartition: partitionQueryCmd partitionSequence.pl 50000000 10000 /home/popov/chains_first_attempt/my_gen.2bit /home/popov/chains_first_attempt/my_gen.chrom.sizes 10000 -lstDir qParts > my_gen.lst
content of /home/popov/chains_first_attempt/TEMP_run.lastz/doPartition.bash
#chmod a+x /home/popov/chains_first_attempt/TEMP_run.lastz/doPartition.bash
#/home/popov/chains_first_attempt/TEMP_run.lastz/doPartition.bash
Can you please tell me what could be the reason for this problem and how to solve it?
Idea for the future: add a module that accumulates detailed statistics for each pipeline step.
Like, as how many chains we get for each chromosome after axtChain, how many blocks of what length, how many new blocks we get at the fill chain step, and other information that may be relevant for QC.
If the reference has many short chromosomes/scaffolds, it results in a tremendous number of lastz jobs (hundreds of thousands).
The partitioning step must handle such cases.
For example, by merging such chromosomes/scaffolds into "buckets" that are processed as a single unit by run_lastz.py.
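The bucketing idea could be sketched as a simple greedy grouping over the chrom.sizes entries (the function name and bucket-size limit below are assumptions, not pipeline code):

```python
# Sketch: greedily group small scaffolds into size-bounded buckets,
# so each bucket can be aligned as a single lastz job.
def bucket_scaffolds(chrom_sizes, bucket_limit=50_000_000):
    """Group scaffolds into buckets whose total size stays near bucket_limit."""
    buckets, current, current_size = [], [], 0
    # largest-first keeps the number of buckets low
    for name, size in sorted(chrom_sizes.items(), key=lambda kv: -kv[1]):
        if current and current_size + size > bucket_limit:
            buckets.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        buckets.append(current)
    return buckets

sizes = {"scaf1": 40_000_000, "scaf2": 30_000_000, "scaf3": 15_000_000}
print(bucket_scaffolds(sizes))  # → [['scaf1'], ['scaf2', 'scaf3']]
```

A hundred thousand tiny scaffolds would then collapse into a few hundred jobs, at the cost of slightly coarser job-level retry granularity.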
Hi again,
We have been running make_lastz_chains for combinations of species pairs, masking options, and LastZ parameter sets. The pipeline usually runs smoothly, except a couple of times when the comparison was heavy (larger and more closely related genomes).
On one occasion, the primary lastz alignment step (1,825 jobs in total) finished, but at a later step (doChainRun) the pipeline stalled, failing and retrying 25 jobs. Here is the command used (run on a 32-core node with 4 GB RAM per core):
echo -e "FILL_CHAIN=0" > DEF_noFillChain # we were skipping RepeatFiller for this run
make_chains.py h38w5 cPTRv2w5 h38_primaryAssembly.WMt98.0.fa cPTRv2.WMt98.0.fa \
--DEF DEF_noFillChain --executor_queuesize 32 \
--project_dir 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0 \
> 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0.log 2>&1
And the log file: 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0__failed.log
We can see the following steps were done:
$ ls -ltrh 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/TEMP_*/*.done
-rw-r--r-- 1 ohd3 contig 0 Oct 10 09:36 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/TEMP_run.lastz/partition.done
-rw-r--r-- 1 ohd3 contig 0 Oct 11 23:16 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/TEMP_run.lastz/lastz.done
-rw-r--r-- 1 ohd3 contig 0 Oct 11 23:31 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/TEMP_run.cat/cat.done
Hence, we wanted to resume the run on a node with more RAM (8 GB per core). We first tried with --resume and received this complaint (path simplified):
Confusion: ./221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/DEF already exists
Please set --force_def to override
So we tried again with the following:
echo -e "FILL_CHAIN=0" > DEF_noFillChain # we were skipping RepeatFiller for this run
make_chains.py h38w5 cPTRv2w5 h38_primaryAssembly.WMt98.0.fa cPTRv2.WMt98.0.fa \
--resume --force_def \
--DEF DEF_noFillChain --executor_queuesize 32 \
--project_dir 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0 \
> 221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0_resume.log 2>&1
Now the complaint included this line (paths simplified):
doPartition: looks like doPartition was already successful (./221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/TEMP_run.lastz/partition.done exists).
Either -continue {some later step} or move aside/remove ./221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0/TEMP_run.lastz/ and run again.
Questions:
1. Since removing the `TEMP_run.lastz` folder may make the pipeline run the 1,825 `lastz` jobs again, we would like to try `-continue {some later step}`. What should we use as `{some later step}`? Will it be `doChainRun`? Is there a list of steps at which we can resume the pipeline?
2. Would increasing `--chaining_memory CHAINING_MEMORY` to, say, 100000 or 200000 help? The default here seems to be 50000 (MB? per core?).
Thanks a lot!
Dong-Ha
P.S.
In case needed, here are the full resume log files:
221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0_resume__1stAttempt.log
221009_LastZ_chain_h38w5cPTRv2w5_mlcV2F0_resume__2ndAttempt.log
Hello again,
I am now exploring the first chain output I obtained from a successful run :)
I am curious how we can get basic stats for the alignments (.psl or .axt) and the chains (.chain). I can extract and process the headers of the chain file to answer some of these questions, but I wonder if there are already tools to do so.
Plus, especially for item 4, we may need the .psl (or .axt?) alignment, since the .chain format appears to have dropped the similarity scores of the alignments. Is there a way to retrieve a concatenated .psl alignment file?
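For the chain-level stats, the header lines can be summarised with a few lines of Python. A sketch assuming the standard UCSC chain format (the function name is my own):

```python
# Sketch: summarise chain headers from a (possibly gzipped) .chain file.
# Field order follows the UCSC chain format:
#   chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
import gzip

def chain_header_stats(path):
    """Yield (score, target_name, target_span) for each chain in the file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("chain "):
                f = line.split()
                yield int(f[1]), f[2], int(f[6]) - int(f[5])
```

From this you can count chains, histogram scores, or sum aligned target spans per chromosome; per-block similarity, as noted, would still require the .psl/.axt output.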
Here are the files and folders in the output directory after the run:
$ ls -1tr 220725_LastZ_chain_DmeDsi_2ndAttempt
Dme.2bit
Dme.chrom.sizes
Dsi.2bit
Dsi.chrom.sizes
DEF
make_chains_py_params.json
master_script.sh
TEMP_psl
TEMP_run.lastz
TEMP_pslParts
TEMP_run.cat
TEMP_run.fillChain
Dme.Dsi.allfilled.chain.gz
TEMP_axtChain
cleanUp.csh
Thanks!
I think there might be some missing requirements
Hi I am trying to chain the chicken and turkey genomes but I am getting this error :
/nas5/aluzuriaganeira/birds/TOGA/make_lastz_chains/chicken_turkey/temp_lastz_run/work/33/98c1845c4ef249c5f4bae4eb8e5517/.command.sh: line 2: /nas5/aluzuriaganeira/birds/TOGA/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py: Argument list too long
Any ideas on how to fix it? I am running it on a cluster with 50 CPUs and 80 GB of memory, so I don't think it is a resource problem.
Hi, Bogdan and Michael @kirilenkobm @MichaelHiller,
Thanks again for developing this pipeline and also for responding to our requests. We have successfully used the pipeline for species pairs with MASH distances ranging from 0.1 to 0.27, with the default parameters.
In the previous ticket #10, the issue was runtime and RAM usage for close species (e.g., MASH distance <0.1, such as human vs. primates). How about more distant species pairs, e.g., human vs. zebrafish with MASH distance 0.29?
The default make_lastz_chains parameters are already quite sensitive, but I'd like to know if there is a way to tweak them even more for distant species, aiming to retrieve as much synteny information for orthologs as possible.
UCSC has these example parameter sets:
chainNear="-minScore=5000 -linearGap=medium"
chainMedium="-minScore=3000 -linearGap=medium"
chainFar="-minScore=5000 -linearGap=loose"
lastzNear="C=0 E=150 H=0 K=4500 L=3000 M=254 O=600 Q=/scratch/data/blastz/human_chimp.v2.q T=2 Y=15000"
lastzMedium="C=0 E=30 H=0 K=3000 L=3000 M=50 O=400 T=1 Y=9400"
lastzFar="C=0 E=30 H=2000 K=2200 L=6000 M=50 O=400 Q=/scratch/data/blastz/HoxD55.q T=2 Y=3400"
make_lastz_chains can accept different K, L, H, and Y options, and as Michael pointed out in an email with us, K and L might be the most important. I will try K=2200 and L=6000 (make_lastz_chains defaults: K=2400 and L=3000).
Could you also add options to control the chaining parameters as well (make_lastz_chains already uses -linearGap=loose as the default)?
Plus, I would appreciate any other suggestions for distant species, with the aim of retrieving as much synteny information for orthologs as possible.
Thanks!
Dong-Ha
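Since the pipeline accepts a `--DEF` override file (as used elsewhere in these issues for `FILL_CHAIN=0`), one way to experiment might be via that mechanism. A hedged sketch only: the BLASTZ_K/BLASTZ_L variable names follow the UCSC DEF convention, and whether this pipeline reads them is an assumption worth verifying against the documentation.

```shell
# Hypothetical DEF override for a distant species pair.
# Variable names follow UCSC DEF conventions; verify the pipeline reads them.
cat > DEF_far <<'EOF'
BLASTZ_K=2200
BLASTZ_L=6000
EOF
make_chains.py target query target.fa query.fa --DEF DEF_far --project_dir far_run
```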
Hi Authors
I tried the V1 of this pipeline but got the following error.
[d1/4aa5ff] NOTE: Process `execute_jobs (334)` terminated with an error exit status (127) -- Execution is retried (2)
[b5/4f2f5d] NOTE: Process `execute_jobs (332)` terminated with an error exit status (127) -- Execution is retried (2)
[9c/4a16b7] NOTE: Process `execute_jobs (625)` terminated with an error exit status (127) -- Execution is retried (3)
[c7/851fa8] NOTE: Process `execute_jobs (678)` terminated with an error exit status (127) -- Execution is retried (2)
[ea/b72980] NOTE: Process `execute_jobs (719)` terminated with an error exit status (127) -- Execution is retried (2)
Error executing process > 'execute_jobs (67)'
Caused by:
Process `execute_jobs (67)` terminated with an error exit status (127)
Command executed:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/runRepeatFiller.sh /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/jobs/infillChain_158
Command exit status:
127
Command output:
..calling RepeatFiller:
Command error:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/runRepeatFiller.sh: line 12: --workdir: command not found
Work dir:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/fillChain_targetquery/work/21/08e1e7d44488e6d8c672adaff8864f
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
/home/vlamba/python3.14/lib/python3.9/site-packages/py_nf/py_nf.py:404: UserWarning: Nextflow pipeline fillChain_targetquery failed! Execute function returns 1.
warnings.warn(msg)
Uncaught exception from user code:
Command failed:
/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/doFillChain.sh
HgAutomate::run('/scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/...') called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/HgRemoteScript.pm line 117
HgRemoteScript::execute('HgRemoteScript=HASH(0xc3cb78)') called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/doLastzChain.pl line 735
main::doFillChains() called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/HgStepManager.pm line 169
HgStepManager::execute('HgStepManager=HASH(0xc39a18)') called at /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/doLastzChains/doLastzChain.pl line 877
Error!!! Output file /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/target.query.allfilled.chain.gz not found!
The pipeline crashed. Please contact developers by creating an issue at:
https://github.com/hillerlab/make_lastz_chains
Hi
I want to do chainself. Can this pipeline do chainself?
Could you please help with?
N E X T F L O W ~ version 21.10.1
Launching `XXXXX/TEMP_run.lastz/lastz_chickenNB/script.nf` [maniac_yonath] - revision: 20fae9dc0e
[- ] process > execute_jobs -
executor > local (17)
[37/e70339] process > execute_jobs (4) [ 0%] 0 of 190
executor > local (49)
[b9/89be59] process > execute_jobs (19) [ 0%] 0 of 225
executor > local (50)
[2a/0acf5d] process > execute_jobs (46) [ 0%] 1 of 226, failed: 1, retries: 1
[2a/0acf5d] NOTE: Process `execute_jobs (46)` terminated with an error exit status (2) -- Execution is retried (1)
executor > local (51)
[3c/cbb0c0] process > execute_jobs (51) [ 0%] 2 of 227, failed: 2, retries: 2
[d2/aff088] NOTE: Process `execute_jobs (64)` terminated with an error exit status (2) -- Execution is retried (1)
Hi
I ran into an error with Nextflow. I am using Nextflow version 20 on Ubuntu. Could you help me?
N E X T F L O W ~ version 20.10.0
Launching /software/make_lastz_chains/Tele2Ppup/TEMP_run.lastz/lastz_PpupTele/script.nf
[romantic_albattani] - revision: d48f32974e
/tmp/nxf-13131159217981153035
/software/miniconda3/envs/NF/lib/python3.8/site-packages/py_nf/py_nf.py:404: UserWarning: Nextflow pipeline lastz_PpupTele failed! Execute function returns 1.
The pipeline is intended to work with both fasta and twoBit input formats.
However, version 2.0.5 could not handle 2bit input.
To be fixed in 2.0.6.
Hello,
I encountered an issue when running the test data with the following command:
make_chains.py target query test_data/test_reference.fa test_data/test_query.fa --pd test_out -f --chaining_memory 16
The error I received is as follows:
### Nextflow process chain_run finished successfully
An error occurred while executing chain_run: Error! No non-empty files found at /home/zhaohang/soft/make_lastz_chains-2.0.6/test_out/temp_chain_run/chain. The failed operation label is: chain_run
Traceback (most recent call last):
File "/home/zhaohang/soft/make_lastz_chains-2.0.6/modules/step_manager.py", line 70, in execute_steps
step_result = step_to_function[step](params, project_paths, step_executables)
File "/home/zhaohang/soft/make_lastz_chains-2.0.6/modules/pipeline_steps.py", line 64, in chain_run_step
do_chain_run(params, project_paths, executables)
File "/home/zhaohang/soft/make_lastz_chains-2.0.6/steps_implementations/chain_run_step.py", line 113, in do_chain_run
has_non_empty_file(project_paths.chain_output_dir, "chain_run")
File "/home/zhaohang/soft/make_lastz_chains-2.0.6/modules/common.py", line 51, in has_non_empty_file
raise PipelineFileNotFoundError(err_msg)
modules.error_classes.PipelineFileNotFoundError: Error! No non-empty files found at /home/zhaohang/soft/make_lastz_chains-2.0.6/test_out/temp_chain_run/chain. The failed operation label is: chain_run
Any assistance on this would be greatly appreciated.
Thank you!
Hi!
I was running the command: make_chains.py MALLARD QUERY_GENOME GCF_015476345.1_ZJU1.0_genomic.fna GCA_907165065.1_bCapEur3.1_genomic.fna --project_dir /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME --force_def --chaining_memory 200000 --seq1_chunk 50000000 --seq2_chunk 10000000
With error output:
Target path: /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.2bit | chrom sizes: /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.chrom.sizes
Query path: /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/QUERY_GENOME.2bit | query sizes: /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/QUERY_GENOME.chrom.sizes
Calling: /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/master_script.sh...
BIN: /beegfs/store4/chenyangkang/software/make_lastz_chains/doLastzChains
PARAMETERS:
clusterRunDir /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME
chainMinScore 1000
chainLinearGap loose
maxNumLastzJobs 6000
numFillJobs 1000
verbose 1
debug 0
/beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/DEF looks OK!
tDb=MALLARD
qDb=QUERY_GENOME
seq1dir=/beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.2bit
seq2dir=/beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/QUERY_GENOME.2bit
Will run RepeatFiller.
Will run chainCleaner with parameters: -LRfoldThreshold=2.5 -doPairs -LRfoldThresholdPairs=10 -maxPairDistance=10000 -maxSuspectScore=100000 -minBrokenChainScore=75000
max number of lastz cluster jobs: 6000
The /beegfs/store4/chenyangkang/software/make_lastz_chains/doLastzChains/doLastzChain.pl run will be performed in directory /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME. All temp files will be written there.
HgStepManager: executing from step 'partition' through step 'cleanChains'.
************ HgStepManager: executing step 'partition' Mon Jun 5 19:59:39 2023.
doPartition ....
doPartition: seq2MaxLength ....
doPartition: seq2MaxLength = 126318510
doPartition: partitionTargetCmd partitionSequence.pl 50000000 0 /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.2bit /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.chrom.sizes -xdir xdir.sh -rawDir /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_psl 4000 -lstDir tParts > MALLARD.lst
doPartition: partitionQueryCmd partitionSequence.pl 10000000 10000 /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/QUERY_GENOME.2bit /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/QUERY_GENOME.chrom.sizes 10000 -lstDir qParts > QUERY_GENOME.lst
content of /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_run.lastz/doPartition.bash
# chmod a+x /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_run.lastz/doPartition.bash
# /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_run.lastz/doPartition.bash
+ cd /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_run.lastz
+ partitionSequence.pl 50000000 0 /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.2bit /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.chrom.sizes -xdir xdir.sh -rawDir /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_psl 4000 -lstDir tParts
lstDir tParts must be empty, but seems to have files (part001.lst ...)
Uncaught exception from user code:
Command failed:
/beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/TEMP_run.lastz/doPartition.bash
HgAutomate::run('/beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_projec...') called at /beegfs/store4/chenyangkang/software/make_lastz_chains/doLastzChains/HgRemoteScript.pm line 117
HgRemoteScript::execute('HgRemoteScript=HASH(0x20cb858)') called at /beegfs/store4/chenyangkang/software/make_lastz_chains/doLastzChains/doLastzChain.pl line 370
main::doPartition() called at /beegfs/store4/chenyangkang/software/make_lastz_chains/doLastzChains/HgStepManager.pm line 169
HgStepManager::execute('HgStepManager=HASH(0x20c8e20)') called at /beegfs/store4/chenyangkang/software/make_lastz_chains/doLastzChains/doLastzChain.pl line 877
Error!!! Output file /beegfs/store4/chenyangkang/06.ebird_data/43.Phenology_project_files/data/chains/QUERY_GENOME/MALLARD.QUERY_GENOME.allfilled.chain.gz not found!
The pipeline crashed. Please contact developers by creating an issue at:
https://github.com/hillerlab/make_lastz_chains
The sequence headers are quite simple:
>BCE_CHR_109
>BCE_CHR_110
>BCE_CHR_111
or
>ZJU_CHR_748
>ZJU_CHR_749
>ZJU_CHR_750
Any hint? Thanks!
Yangkang
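For what it's worth, the "lstDir tParts must be empty" line suggests leftover partition lists from a previous attempt in the same project directory; one hedged workaround sketch (PROJECT_DIR is a placeholder for the --project_dir used above; double-check the paths before deleting anything):

```shell
# Remove stale partition lists so partitionSequence.pl can rerun.
# Directory names are taken from the error message; verify first.
rm -rf "$PROJECT_DIR/TEMP_run.lastz/tParts" "$PROJECT_DIR/TEMP_run.lastz/qParts"
```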
If the target_name/query_name has the same value as the basename of the target_genome/query_genome file, then the pipeline fails with the non-specific "no non-empty files" error:
### Concatenating Lastz Results (Cat) Step ###
Concatenating LASTZ output from 1 buckets
* skip bucket bucket_ref__chrX_in_0_156040895: nothing to concat
An error occurred while executing cat: Error! No non-empty files found at ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/temp_concat_lastz_output. The failed operation label is: cat_step
Traceback (most recent call last):
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/modules/step_manager.py", line 70, in execute_steps
step_result = step_to_function[step](params, project_paths, step_executables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/modules/pipeline_steps.py", line 58, in cat_step
do_cat(params, project_paths, executables)
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/steps_implementations/cat_step.py", line 51, in do_cat
has_non_empty_file(project_paths.cat_out_dirname, "cat_step")
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/modules/common.py", line 51, in has_non_empty_file
raise PipelineFileNotFoundError(err_msg)
modules.error_classes.PipelineFileNotFoundError: Error! No non-empty files found at ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/temp_concat_lastz_output. The failed operation label is: cat_step
In fact, the initial error seems to come from Lastz. Tracing the chains_joblist commands, I see this command:
~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit:chrX:0-156040895 ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/query.2bit:chrX:0-50000000 ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/pipeline_parameters.json ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/temp_lastz_psl_output/bucket_ref__chrX_in_0_156040895/chrX_chrX__1.psl ~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py --output_format psl --axt_to_psl ~/repos/toga-pipeline/.snakemake/conda/d1516d7b42018368bb384d7b98ada0c3_/bin/axtToPsl
Which, when run, gives a traceback point to Lastz:
Traceback (most recent call last):
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 197, in call_lastz
lastz_out = subprocess.check_output(cmd, shell=True, stderr=subprocess.PIPE).decode("utf-8")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/repos/toga-pipeline/.snakemake/conda/d1516d7b42018368bb384d7b98ada0c3_/lib/python3.12/subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/repos/toga-pipeline/.snakemake/conda/d1516d7b42018368bb384d7b98ada0c3_/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'lastz "~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit/chrX[1,156040895][multiple]" "~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/query.2bit/chrX[1,50000000][multiple]" Y=9400 H=2000 L=3000 K=2400 --traceback=800.0M --format=axt+' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 336, in <module>
main()
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 317, in main
lastz_output = call_lastz(cmd)
^^^^^^^^^^^^^^^
File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 201, in call_lastz
raise LastzProcessError(
LastzProcessError: Lastz command failed with exit code 1. Error message: FAILURE: fopen_or_die failed to open "~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit/chrX" for "rb"
When I checked ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit/chrX, I found that ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit is a soft link to a file, which is probably why Lastz failed, because it thinks that is a directory (?).
When I changed the target_name/query_name so that they differed from the basenames of the target_genome/query_genome files (and deleted all the intermediate files), this error resolved. I regret that I do not have time to investigate further and provide a more detailed report.
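A guard like the following sketch could catch the collision early (the function name and error text are my own illustration, not pipeline code):

```python
# Sketch of an early guard against name/basename collisions.
# Function name and error message are hypothetical.
import os

def check_name_collision(target_name, query_name, target_genome, query_genome):
    """Raise if an alignment name collides with its input file's basename."""
    for name, path in ((target_name, target_genome), (query_name, query_genome)):
        base = os.path.basename(path)
        if name in (base, os.path.splitext(base)[0]):
            raise ValueError(
                f"name '{name}' collides with input file '{base}'; "
                "pick a different target/query name"
            )

check_name_collision("target", "query", "hg38.fa", "mm10.fa")  # no collision: passes
```

Failing fast with an explicit message would be friendlier than the downstream "no non-empty files" error.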
Hi,
I have been running make_lastz_chains with several species (all closely related species, >50 MYA divergence) and it works great. However, for only 3 species I have had an error in the last jobs (>90% completed) of the 'lastz' step. I was wondering if you could identify the error for these particular species. (I am running everything with the defaults.)
Thanks in advance,
Diego
Hello! I am trying to make genome alignment chains using your pipeline, but I keep getting an error related to the nextflow submission process. It fails to submit the joblist that it produces, giving me this error message:
N E X T F L O W ~ version 23.04.0 Launching
.../TOGA/make_lastz_chains/parallelization/execute_joblist.nf`
ERROR ~ Error executing process > 'execute_jobs (12)'
Caused by:
Failed to submit process to grid scheduler for execution
Command executed:
sbatch .command.run
Command exit status:
1
Command output:
sbatch: error: invalid partition specified: batch
sbatch: error: Batch job submission failed: Invalid partition name specified`
I looked into the execute_joblist.nf file, and it calls a config file for submissions. I believe editing the partition in that config file will probably fix this, but I am not sure where to find it. Any help with this would be much appreciated!
Details about input data:
I softmasked the genomes (around 30-40% of genome is soft-masked), and then turned them into 2bit files using faToTwoBit.
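In case it helps, Nextflow itself reads the Slurm partition from the `process.queue` setting of a config file; a minimal sketch (whether make_lastz_chains exposes a user-editable config here is an assumption):

```groovy
// nextflow.config sketch: submit to an existing Slurm partition.
// Replace 'your_partition' with a partition that exists on your cluster.
process {
    executor = 'slurm'
    queue = 'your_partition'
}
```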
Hi @kirilenkobm,
Since the response of Michael in a previous issue, I successfully masked 3 dog genomes available (35-43%) and the lastz step was done within the expected time (~ 3 days on a 32-core machine). Sadly, I am stuck at the fillChains step with various error exit status. I started running make_lastz_chains with ROS_Cfam_1.0 (NCBI reference; 43% masked) against hg38 as follows:
./make_chains.py "human" "dog" "hg38_masked.fa" "ROS_Cfam_1.0.masked.fa" --queue_executor 32 --project_dir test
I re-ran the job about 3 times after the first error with --continue_arg "fillChains".
This is my log file: make_chains.log
Any advice?
Regards,
Alejandro
Hi everyone,
Do you think I can run make_lastz_chains on fragmented genomes? One has 757 scaffolds and the other has 60,750. I already used RagTag to improve the assemblies, and they are already masked. I am using a 32-CPU server to run the analysis, and I am planning to run TOGA too.
Thanks in advance
Hello, I got an error at the last step, Clean Chains:
### Clean Chains Step ###
Chains were filled: using /users/hpctestuser/mprylutskyi/makechains/make_lastz_chains/deschrambler_test_files/temp_chain_run/human.cow.filled.chain.gz as input
Chain to be cleaned saved to: /users/hpctestuser/mprylutskyi/makechains/make_lastz_chains/deschrambler_test_files/temp_chain_run/human.cow.before_cleaning.chain.gz
An error occurred while executing clean_chains: 'str' object has no attribute 'removesuffix'
I would be grateful for any help
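`str.removesuffix` was added in Python 3.9, so the error above usually means the pipeline is running under an older interpreter; upgrading Python is the simplest fix. For illustration, an equivalent helper (the function name is my own):

```python
# Equivalent of str.removesuffix, which requires Python >= 3.9.
def remove_suffix(s: str, suffix: str) -> str:
    if suffix and s.endswith(suffix):
        return s[: -len(suffix)]
    return s

print(remove_suffix("human.cow.filled.chain.gz", ".gz"))  # → human.cow.filled.chain
```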
Inspired by:
hillerlab/TOGA#109
Hello, @kirilenkobm and @MichaelHiller,
Thanks for creating this tool - I was looking for an option to run lastz genome-to-genome alignments on multiple threads and ended up here.
I was submitting a job to a single SGE node that has 16 CPUs. Since I am using only one machine and am not going to qsub from the node, I used the default (--executor local) option (--executor sge switched back to local anyway when it realized qsub is not allowed on the node).
Nextflow documentation says
The local executor is used by default. It runs the pipeline processes in the computer where Nextflow is launched. The processes are parallelised by spawning multiple threads and by taking advantage of multi-cores architecture provided by the CPU
But looking at the temp folders and files during the run, I am not sure whether all available threads are being used. It seems the query sequence is divided into 4 chunks (this might be decided by the chunk-size settings rather than the number of threads, I guess) and aligned to the target. But the 4 chunks appear to be aligned one after another, rather than in parallel.
...
The following was the command line I used:
make_chains.py Dme Dsi Dme_genomic.fna Dsi_genomic.fna --project_dir DmeDsi_1stAttempt
I also tried to convey that there are 16 CPUs, but this ran exactly the same way (four chunks of query being processed one after another):
make_chains.py Dme Dsi Dme_genomic.fna Dsi_genomic.fna --executor local --cluster_parameters "cpus=16" --project_dir DmeDsi_1stAttempt2
Is there a way to run this pipeline on multiple threads in parallel on a local machine? Or is each of the four chunks somehow already being aligned using all available CPU threads?
Thanks again!
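On the local-executor question above: Nextflow's local executor parallelises up to `executor.cpus` (by default the machine's CPU count), which can be set in a config file; a minimal sketch (whether make_lastz_chains picks up such a user config is an assumption):

```groovy
// nextflow.config sketch: local-executor parallelism cap.
executor {
    name = 'local'
    cpus = 16
}
```

Whether the four query chunks then run concurrently also depends on how many jobs the joblist produces per process.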