Comments (21)
Just to debug: Have you tried calling the tool by using its full path? I.e.
/home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua
instead of however you called it before (so no relative paths)?
Also, what GenomeTools (gt) version are you using? Can you try calling the tool as:
gt /home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua
and see what happens?
from companion.
Thanks for your prompt reply.
xin@compare-vm-1:/analysis/xin/parasite/bin$ gt /home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua
tool '/home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua' not found; option -help lists possible tools
xin@compare-vm-1:/analysis/xin/parasite/bin$ ls -l /home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua
-rwxrwxr-x 1 xin xin 14960 Jun 4 17:17 /home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua
gt -version
gt (GenomeTools) 0.6.5 (2018-06-04 10:46:00)
Copyright (c) 2003-2007 Gordon Gremme [email protected]
Copyright (c) 2003-2007 Center for Bioinformatics, University of Hamburg
See LICENSE file or http://genometools.org/license.html for license details.
Used compiler: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Compile flags: -Wall -Os -I/homes/xin/Downloads/genometools-0.6.5/src -I/homes/xin/Downloads/genometools-0.6.5/obj
from companion.
Thanks! First of all, please install a newer GenomeTools version. The one you seem to be using is quite old and does not support some of the features the Lua script needs to run. GenomeTools is currently at version 1.5.10 (http://genometools.org/pub/).
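To compare an installed gt against a given release, the version strings can be checked in the shell. This is a minimal sketch, assuming GNU sort (for -V) is available. The function names are made up for illustration, and the 1.5.10 threshold is just the current release mentioned above, not a confirmed minimum for Companion.

```shell
# version_ge A B: succeeds when version A is >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from the first line of `gt -version` output,
# e.g. "gt (GenomeTools) 0.6.5 (2018-06-04 10:46:00)" gives "0.6.5".
gt_version_from_banner() {
    printf '%s\n' "$1" | head -n 1 | awk '{ print $3 }'
}

# Usage (assumes gt is on PATH; 1.5.10 is an illustrative threshold):
#   installed=$(gt_version_from_banner "$(gt -version)")
#   version_ge "$installed" 1.5.10 || echo "GenomeTools $installed looks too old" >&2
```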
from companion.
Many thanks for your support!
from companion.
@satta
I am considering downloading the reference genomes from ftp://ftp.sanger.ac.uk/pub/project/pathogens/companion/CryptoDB.org/. There are references.json and references-in.json files, but no config file.
Can they be used directly with the command line "nextflow run sanger-pathogens/companion", or does update_references.lua still need to be run to import the reference data?
from companion.
update_references.lua is only needed to build new references from basic sequence+annotation files. The pre-compiled files should be usable directly by downloading them and using their location in the Companion workflow's config file as the value of the ref_dir variable. For example:
ref_dir = "/path/to/my/companion/CryptoDB.org"
ref_species = "Cryptosporidium_parvum_Iowa_II"
etc.
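Before launching the workflow, it can help to confirm that ref_dir points at a directory that really contains the pre-compiled reference index. This is a minimal sketch with a hypothetical helper, check_ref_dir. It assumes that the downloaded directory holds a references.json file (as listed on the FTP site) and that this file mentions the species name used for ref_species. The exact JSON layout is not checked.

```shell
# check_ref_dir DIR SPECIES: sanity-check a downloaded reference directory.
# Assumption: DIR holds a references.json that mentions SPECIES by name.
check_ref_dir() {
    dir=$1
    species=$2
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/references.json" ]; then
        echo "missing $dir/references.json" >&2
        return 1
    fi
    if ! grep -qF -- "$species" "$dir/references.json"; then
        echo "$species not mentioned in $dir/references.json" >&2
        return 1
    fi
}

# Usage:
#   check_ref_dir /path/to/my/companion/CryptoDB.org Cryptosporidium_parvum_Iowa_II
```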
from companion.
There is WEIGHT_FILE = "" in the config file. Is this field required? If yes, how can I get the weight file for a species such as Cryptosporidium_parvum_Iowa_II?
from companion.
If you don't have a kinetoplastid genome (or anything else with weird polycistronic stuff) just use the plasmodium weight file.
from companion.
@satta
Got the following error while running companion:
xin@compare-vm-1:/analysis/xin/parasite/bin$ ./nextflow run sanger-pathogens/companion -profile /analysis/xin/parasite/data/companion/CryptoDB.org/Cryptosporidium_parvum_Iowa_II.config.txt
N E X T F L O W ~ version 0.29.1
Launching `sanger-pathogens/companion` [clever_dalembert] - revision: db91c7dc11 [master]
Unknown configuration profile: '/analysis/xin/parasite/data/companion/CryptoDB.org/Cryptosporidium_parvum_Iowa_II.config.txt'
Is there a problem with the config file? /analysis/xin/parasite/data/companion/CryptoDB.org/Cryptosporidium_parvum_Iowa_II.config.txt is attached:
Cryptosporidium_parvum_Iowa_II.config.txt
from companion.
Without looking at the file, shouldn't it be -c and not -profile?
Example:
nextflow run -c Cryptosporidium_parvum_Iowa_II.config.txt sanger-pathogens/companion -profile docker
from companion.
Thanks! But got the following error:
.command.stub: line 45: ps: command not found
ZOE ERROR (from /usr/lib/snap/snap): error opening file (/usr/share/snap/Zoe/HMM/snap.hmm)
- ps is available on my server:
xin@compare-vm-1:~/.nextflow/assets/sanger-pathogens/companion$ which ps
/bin/ps
- There is NO /usr/share/snap/ on my server, and I am not root, so even if I install SNAP I cannot put it into /usr/share/.
The full output follows:
nextflow run sanger-pathogens/companion -c /analysis/xin/parasite/data/Cryptosporidium_parvum_Iowa_II.config.txt -profile docker
N E X T F L O W ~ version 0.30.0
Launching `sanger-pathogens/companion` [loving_torricelli] - revision: db91c7d [master]
C O M P A N I O N ~ version 1.0.2
query : /analysis/xin/parasite/working_dir/assembly/scaffolds.fasta
reference : Cryptosporidium_parvum_Iowa_II
reference directory : /analysis/xin/parasite/data/companion/CryptoDB.org
WARN: Access to undefined parameter dist_dir
-- Initialise it to a default value eg. params.dist_dir = some_value
[warm up] executor > local
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition -- check channel ncrna_cmindex
WARN: Access to undefined parameter TRANSCRIPT_FILE
-- Initialise it to a default value eg. params.TRANSCRIPT_FILE = some_value
[f3/bf5099] Submitted process > press_ncRNA_cms
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition -- check channel pseudochr_last_index
[84/96dc71] Submitted process > truncate_input_headers
[43/f15da4] Submitted process > exonerate_empty_hints
[8f/8cf2d2] Submitted process > ratt_make_ref_embl
[29/28e63b] Submitted process > transcript_empty_hints
[6c/084117] Submitted process > pseudogene_indexing
[3f/6b4fbf] Submitted process > make_ref_input_for_orthomcl
[df/a31d6e] Submitted process > sanitize_input
[11/47014d] Submitted process > merge_hints
[9c/5c455f] Submitted process > contiguate_pseudochromosomes
[d6/8c6913] Submitted process > predict_tRNA
[26/634d2a] Submitted process > make_distribution_seqs
[59/5e1a05] Submitted process > run_augustus_contigs
[f8/310cfd] Submitted process > run_snap
WARN: Access to undefined parameter print_paths
-- Initialise it to a default value eg. params.print_paths = some_value
[33/a13d6b] Submitted process > run_ratt
ERROR ~ Error executing process > 'run_snap'
Caused by:
Process `run_snap` terminated with an error exit status (2)
Command executed:
echo '##gff-version 3' > snap.gff3
snap -gff -quiet snap.hmm pseudo.pseudochr.fasta > snap.tmp
snap_gff_to_gff3.lua snap.tmp > snap.tmp.2
if [ -s 1 ]; then
gt gff3 -sort -tidy -retainids snap.tmp.2 > snap.gff3;
fi
Command exit status:
2
Command output:
(empty)
Command error:
.command.stub: line 45: ps: command not found
ZOE ERROR (from /usr/lib/snap/snap): error opening file (/usr/share/snap/Zoe/HMM/snap.hmm)
ZOE library version 2013-02-16
Work dir:
/home/xin/.nextflow/assets/sanger-pathogens/companion/work/f8/310cfde2451caac375658810e06b5f
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (2)
from companion.
Which of the following steps are required to annotate parasite genomes: run_exonerate, run_snap, run_ratt, do_contiguation, do_circos, do_pseudo, make_embl, use_reference, fix_polycistrons, truncate_input_headers?
from companion.
That depends on what you want as output and what features you want in your annotation:
- run_exonerate: use this if you want to use reference protein alignments to aid in gene model building
- run_snap: also run SNAP to find genes (I would disable this, it might also help with your earlier error)
- run_ratt: use this if you want to try and map reference gene models directly on the new sequence
- do_contiguation: use this if you want to order your input sequence to be like the reference chromosomes -- this will assign chromosome names as in the reference and obviously requires that the reference is good enough to have chromosomes
- do_circos: generate Circos plots with reference vs. new genome matches, gene locations, etc. - see Companion paper or website for example
- do_pseudo: try to detect pseudogenes
- make_embl: create EMBL files as output (e.g. to speed up ENA submission)
- use_reference: if this is false then only ab initio detection is done
- fix_polycistrons: needed to make sure that polycistronic translation units are not broken by interspersed antisense genes (only important for kinetoplastids)
- truncate_input_headers: fixes some issues with input data, shouldn't hurt keeping this on
Please keep in mind that some of the options only work when supported by the reference (e.g. do_contiguation) or by some of the tools used in the workflow (e.g. run_snap).
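Switching one of these options on or off comes down to editing a single line in the config file. The sketch below uses a hypothetical helper, set_option, and assumes that each option appears once in the file as a plain "name = value" line, like the ref_dir and WEIGHT_FILE entries quoted earlier in this thread. It is not part of Companion.

```shell
# set_option FILE NAME VALUE: rewrite a "NAME = ..." line in a config file.
# Assumptions: the option sits on its own line as "NAME = value", and
# VALUE contains no '|', '&' or backslash characters (they are special to sed).
set_option() {
    file=$1
    name=$2
    value=$3
    sed -E "s|^([[:space:]]*)${name}[[:space:]]*=.*|\1${name} = ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage (my.config is a placeholder for your own config file):
#   set_option my.config run_snap false
```

Existing indentation is kept and other lines are left untouched, so the edited file can still be compared against the original with diff.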
from companion.
As for the ps part: Nextflow jobs are run in a Docker container, so you don't need to have all the dependent software for Companion installed. No idea why SNAP won't run but as I stated above, just disable it for now. There are no pre-generated gene models for Cryptosporidium in the container anyway, it's optimized for kinetoplastids at the moment. Using SNAP would probably need some extra development work and preparation of the correct models.
from companion.
@satta
Many thanks for your reply. Unfortunately, another error appeared:
Command error:
.command.stub: line 45: ps: command not found
warning: line 1 in file "stdin" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "stdin" does not contain 9 tab (\t) separated fields
The full output follows:
N E X T F L O W ~ version 0.30.0
Launching `sanger-pathogens/companion` [furious_khorana] - revision: db91c7d [master]
C O M P A N I O N ~ version 1.0.2
query : /analysis/xin/parasite/working_dir/assembly/scaffolds.fasta
reference : Cryptosporidium_parvum_Iowa_II
reference directory : /analysis/xin/parasite/data/companion/CryptoDB.org
WARN: Access to undefined parameter dist_dir
-- Initialise it to a default value eg. params.dist_dir = some_value
[warm up] executor > local
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition -- check channel ncrna_cmindex
WARN: Access to undefined parameter TRANSCRIPT_FILE
-- Initialise it to a default value eg. params.TRANSCRIPT_FILE = some_value
[2a/16e905] Submitted process > truncate_input_headers
[6a/b6c1d4] Submitted process > press_ncRNA_cms
WARN: The `into` operator should be used to connect two or more target channels -- consider to replace it with .set { integrated_gff3_processed }
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition -- check channel pseudochr_last_index
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition -- check channel core_comp_circos_chr
WARN: The operator `first` is useless when applied to a value channel which returns a single value by definition -- check channel core_comp_circos_bin
[72/d923fb] Submitted process > exonerate_empty_hints
[b2/c73b4e] Submitted process > ratt_make_ref_embl
[d2/cc41d4] Submitted process > transcript_empty_hints
[18/0c296c] Submitted process > make_empty_snap
[fb/ecaf05] Submitted process > pseudogene_indexing
[da/bbc4fe] Submitted process > make_ref_input_for_orthomcl
[b3/b03478] Submitted process > make_empty_circos_clusters
[d6/97a69b] Submitted process > sanitize_input
[9b/3998bf] Submitted process > merge_hints
[5a/f383f6] Submitted process > contiguate_pseudochromosomes
[6b/2acfa2] Submitted process > predict_tRNA
[f9/673055] Submitted process > make_distribution_seqs
[42/f3900c] Submitted process > run_augustus_contigs
WARN: Access to undefined parameter print_paths
-- Initialise it to a default value eg. params.print_paths = some_value
[1a/c7a5d1] Submitted process > run_ratt
[e7/1cb383] Submitted process > pseudogene_last (1)
[db/23430c] Submitted process > predict_ncRNA (1)
[f9/b71129] Submitted process > run_augustus_pseudo
[7a/7c4e7f] Submitted process > pseudogene_last (2)
[41/630edd] Submitted process > pseudogene_last (3)
[79/90216d] Submitted process > predict_ncRNA (2)
[6b/6e8ed1] Submitted process > blast_for_circos
[3f/1f0634] Submitted process > ratt_to_gff3
[06/6b3916] Submitted process > merge_genemodels
[07/69c9f8] Submitted process > integrate_genemodels
[32/219bc3] Submitted process > remove_exons
[a3/ff1c95] Submitted process > pseudogene_calling (1)
[d1/4c488b] Submitted process > merge_ncrnas (1)
[25/1057d1] Submitted process > merge_structural (1)
[57/68f989] Submitted process > add_gap_features (1)
[b0/ffb5db] Submitted process > split_splice_models_at_gaps (1)
[14/7891f7] Submitted process > add_polypeptides (1)
ERROR ~ Error executing process > 'add_polypeptides (1)'
Caused by:
Process `add_polypeptides (1)` terminated with an error exit status (1)
Command executed:
create_polypeptides.lua input.gff3 "CM0004(%w+)" | gt gff3 -sort -retainids -tidy > output.gff3
Command exit status:
1
Command output:
(empty)
Command error:
.command.stub: line 45: ps: command not found
warning: line 1 in file "stdin" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "stdin" does not contain 9 tab (\t) separated fields
Work dir:
/home/xin/work/14/7891f764130fb085fb9af9c0251d8f
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
-- Check '.nextflow.log' file for details
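The "does not contain 9 tab (\t) separated fields" message means that gt gff3 received a line that is not a valid GFF3 feature line. To locate such lines, the output of the first command in the pipe can be saved to a file and scanned. This is a minimal sketch with a hypothetical helper, gff3_bad_lines. It only counts tab-separated fields and is not a full GFF3 validator.

```shell
# gff3_bad_lines FILE: print "line_number: text" for every feature line
# that does not have exactly 9 tab-separated fields.
# Blank lines and lines starting with '#' (directives, comments) are skipped.
# This sketch does not handle an embedded ##FASTA section.
gff3_bad_lines() {
    awk -F '\t' '
        /^#/ || /^[[:space:]]*$/ { next }
        NF != 9 { printf "%d: %s\n", NR, $0 }
    ' "$1"
}

# Usage, run inside the task work dir (polypeptides.tmp is a made-up file name):
#   create_polypeptides.lua input.gff3 "CM0004(%w+)" > polypeptides.tmp
#   gff3_bad_lines polypeptides.tmp | head
```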
from companion.
Looks like it got much further this time :)
Difficult to debug without access to the data though... can you provide the contents of your work directory so I can understand what is the matter with the GFF3? If you want, you can give me only /home/xin/work/14/7891f764130fb085fb9af9c0251d8f, but please make sure all files symlinked from outside this directory are included.
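One way to include the symlinked files is to let tar follow the links when packing the directory. This is a minimal sketch with a hypothetical helper, pack_workdir. It assumes a tar that supports -h (follow symlinks), which GNU tar and bsdtar do.

```shell
# pack_workdir DIR OUT: archive a Nextflow task directory into OUT (.tar.gz),
# storing the targets of symlinks instead of the links themselves (-h).
# Hidden files such as .command.sh are included because the directory
# itself is archived.
pack_workdir() {
    dir=$1
    out=$2
    tar -czhf "$out" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Usage:
#   pack_workdir /home/xin/work/14/7891f764130fb085fb9af9c0251d8f workdir.tar.gz
```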
from companion.
There are only 2 files under ~/work/14/7891f764130fb085fb9af9c0251d8f. One of them is empty and the other is a symlink:
lrwxrwxrwx 1 xin xin 64 Jun 7 19:01 input.gff3 -> /home/xin/work/b0/ffb5db1c4a5a8d99ba967179826917/merged_out.gff3
-rw-r--r-- 1 xin xin 0 Jun 7 19:01 output.gff3
/home/xin/work/b0/ffb5db1c4a5a8d99ba967179826917/merged_out.gff3 attached:
merged_out.gff3.txt
from companion.
OK there are indeed annotations in there. I'll need to redo the part of Companion that failed for you, which will take some time as I am not working full time on Companion anymore.
from companion.
@satta
">>I'll need to redo the part of Companion that failed for you"
How is it going? We are trying to build a parasite analysis pipeline and hope to use Companion as the annotation tool. Thanks.
from companion.
lua5.3 ../../bin/update_references.lua
lua5.3: ../../bin/update_references.lua:20: attempt to index a nil value (global 'gt')
stack traceback:
../../bin/update_references.lua:20: in main chunk
[C]: in ?
from companion.
When I run "../../bin/update_references.lua", I get the following error:
"
gt: error: could not execute script ../../bin/update_references.lua:268: bad argument #1 to 'pairs' (table expected, got nil)
"
from companion.