The companion from sanger-pathogens

Download option for table content

It should be possible to download the content of all browseable tables in the Companion results in CSV/XLS/... format. This chould probably be easy with a DataTables plugin.

ENA style EMBL output must contain reference information

It looks like the latest version of the EMBL validator (1.1.138) is checking for the correct formatting of publication details in the header of the EMBL file now. It reports this error below otherwise.

ERROR: Submitter references are mandatory in EMBL-BANK entries line: 1 of 1.embl

Updating the publication details to look like this made it go away:

RN   [1]
RA   Pathogen Genomics;
RT   "Draft assembly annotated with Prokka";
RL   Submitted (13-Jul-2016) to the INSDC.

Companion needs to make sure that valid reference information is generated in submission compatible EMBL, inserting publication stubs if necessary.

sub-optimal performance on fungal genome

Hello,

Not sure who is monitoring these issues now @satta has left Sanger, but I wasn't sure where else to send it.

Sascha showed me how to run companion on the command line using a fungidb reference. I have attached the config I used.
crypto.config.txt
It took me some time to come around to quality check the annotation and there seems to be a bit of a problem. I ran a couple of my own samples through and there were a large number of pseudo-genes (3372 of total 5237 without exonerate, 2727 of 5514 with exonerate). These were not only just slightly miscalled as pseudo-genes, when I took the protein output of companion and blasted it vs the reference proteome, only 3515 of 5514 proteins had 60% reciprocal coverage (i.e. 60% of the query protein was covered by a hit which covered 60% of the reference protein).

As another quality check, I ran the reference genome fasta through companion, which is identical to the sequence in teh fungidb and should have a very very similar annotation. there were 2693 pseudo-genes and only 3811 of 5681 proteins had 60% reciprocal coverage vs the reference proteome.

I was just wondering if I could get some pointers as to where to start de-bugging. Perhaps in the RATT parameters, as the fact that given an identical reference genome, there are still lots of pseudo-genes called indicates the transfer is not working well?

Best,

Phil

Use preferred product name in EMBL output

Currently the ENA submission EMBL output uses the first product name encountered in the input GFF. However, if transferred from a reference there might be a preferred product name designated in the product attribute. If such a preferred product is specified, it should be used as the product used for submission.

GTF file not correct format or does not contain any exon feature

Hello,

Could you help me check why I am getting this error message?

I have verified the correct format of my GTF file and it does have exon features.

This is a sample of my gtf file:

TRINITY_GG_1_c0_g1 Trinity_gene exon 1 890 0 + . gene_id "TRINITY_GG_1_c0_g1"; transcript_id "TRINITY_GG_1_c0_g1_i1";

TRINITY_GG_1_c0_g2 Trinity_gene exon 1 680 0 + . gene_id "TRINITY_GG_1_c0_g2"; transcript_id "TRINITY_GG_1_c0_g2_i1";

TRINITY_GG_1_c0_g3 Trinity_gene exon 1 3214 0 + . gene_id "TRINITY_GG_1_c0_g3"; transcript_id "TRINITY_GG_1_c0_g3_i1";

TRINITY_GG_1_c0_g4 Trinity_gene exon 1 1847 0 + . gene_id "TRINITY_GG_1_c0_g4"; transcript_id "TRINITY_GG_1_c0_g4_i1";

TRINITY_GG_2_c0_g1 Trinity_gene exon 1 277 0 + . gene_id "TRINITY_GG_2_c0_g1"; transcript_id "TRINITY_GG_2_c0_g1_i1";

TRINITY_GG_2_c0_g2 Trinity_gene exon 1 1887 0 + . gene_id "TRINITY_GG_2_c0_g2"; transcript_id "TRINITY_GG_2_c0_g2_i1";

TRINITY_GG_3_c0_g1 Trinity_gene exon 1 307 0 + . gene_id "TRINITY_GG_3_c0_g1"; transcript_id "TRINITY_GG_3_c0_g1_i1";

TRINITY_GG_4_c0_g1 Trinity_gene exon 1 1951 0 + . gene_id "TRINITY_GG_4_c0_g1"; transcript_id "TRINITY_GG_4_c0_g1_i1";

TRINITY_GG_5_c0_g1 Trinity_gene exon 1 2366 0 + . gene_id "TRINITY_GG_5_c0_g1"; transcript_id "TRINITY_GG_5_c0_g1_i1";

TRINITY_GG_6_c0_g1 Trinity_gene exon 1 3319 0 + . gene_id "TRINITY_GG_6_c0_g1"; transcript_id "TRINITY_GG_6_c0_g1_i1";

TRINITY_GG_7_c0_g1 Trinity_gene exon 1 218 0 + . gene_id "TRINITY_GG_7_c0_g1"; transcript_id "TRINITY_GG_7_c0_g1_i1";

TRINITY_GG_7_c0_g2 Trinity_gene exon 1 1404 0 + . gene_id "TRINITY_GG_7_c0_g2"; transcript_id "TRINITY_GG_7_c0_g2_i1";

TRINITY_GG_7_c0_g3 Trinity_gene exon 1 874 0 + . gene_id "TRINITY_GG_7_c0_g3"; transcript_id "TRINITY_GG_7_c0_g3_i1";

How to generate the file ".../example-data/references/_all/all_orthomcl.out"

Would you tell me how to generate ".../example-data/references/_all/all_orthomcl.out" ?

When I run my data, error appears that it can not load "/example-data/references/_all/all_orthomcl.out". Because in my directory, there is no "_all/all_orthomcl.out".

Allow limiting the maximum length of product calls

In some circumstances, e.g. transposon pol genes, there might be many protein domain hits in one predicted gene. Currently their descriptions are all joined by '/' to produce a product description, which might become very long and hence not suitable for database submission. It should be possible to limit the maximal number of domains (or characters?) to include in the autogenerated product description.

gff files do not contain exons

As far as I know it is standard to have a feature called "exon" as the lowest level in a gtf/gff file. However, companion only outputs CDS and no exons. This leads to some problems in the downstream analysis (e.g. with STAR or zUMIs). Could you please add the option to also output exon features?

Docker build fail

Fixed docker build (builds fine in trstickland/companion on docker hub)

Missing SVG Perl module

The Circos task is returning the following error message

Error executing process > 'circos_run_chrs (1)'

Caused by:
  Missing output file(s): 'image.png' expected by process: circos_run_chrs (1)

Command executed:

  circos  -conf /opt/data/circos/circos.debian.conf -param image/file=image.png                  -param chromosomes='LmjF.01;LDON_01'                 -param chromosomes_reverse=LmjF.01

Command exit status:
  0

Command output:

  *** REQUIRED MODULE(S) MISSING OR OUT-OF-DATE ***

  You are missing one or more Perl modules, require newer versions, or some modules failed to load. Use CPAN to install it as described in this tutorial

  http://www.circos.ca/documentation/tutorials/configuration/perl_and_modules

  missing SVG
    error Can't locate SVG.pm in @INC (you may need to install the SVG module) (@INC contains: /usr/bin/lib /usr/bin/../lib /usr/bin /opt/ORTHOMCLV1.4/ /opt/RATT/ /opt/ABACAS2/ /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.20.2 /usr/local/share/perl/5.20.2 /usr/lib/x86_64-linux-gnu/perl5/5.20 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.20 /usr/share/perl/5.20 /usr/local/lib/site_perl .) at (eval 60) line 1.

Work dir:
  /Users/pditommaso/scratch/de/24295ce116a4f938ae3f764954e4bd

missing pseudochr.fasta.gz for Leishmania in Public server (is Companion still supported)

Dear Companion team,

I would like to use your public server for annotating some assemblies from leishmania that I have generated. However, it seems to me that the server does not actually contain the training sets mentioned in the webpage list. I have use a Leishmania mexicana downloaded from NCBI and the job fails, despite a lot testing of trial and error. The error obtained is the below. I have emailed the contact address given in the website but no one replies. Is it still supported? I appreciate your input.

/www/companion/annot-web/app/workers/hard_worker.rb:34:in add_result_file': required file not present: /www/companion/annot-web/public/jobs/74f8275b60bbaeba61562e29/pseudochr.fasta.gz (RuntimeError) /www/companion/annot-web/app/workers/hard_worker.rb:142:in perform'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:150:in execute_job' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:132:in block (2 levels) in process'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/chain.rb:127:in block in invoke' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-status-0.6.0/lib/sidekiq-status/server_middleware.rb:37:in call'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/chain.rb:129:in block in invoke' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/server/active_record.rb:6:in call'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/chain.rb:129:in block in invoke' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/server/retry_jobs.rb:74:in call'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/chain.rb:129:in block in invoke' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/server/logging.rb:11:in block in call'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/logging.rb:30:in with_context' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/server/logging.rb:7:in call'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/chain.rb:129:in block in invoke' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/middleware/chain.rb:132:in invoke'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:127:in block in process' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:166:in stats'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:126:in process' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:79:in process_one'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/processor.rb:67:in run' /www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/util.rb:16:in watchdog'
/www/companion/annot-web/vendor/ruby/2.3.0/gems/sidekiq-4.0.2/lib/sidekiq/util.rb:24:in `block in safe_thread'

Many thanks,
Ivan

Bioconda

Hi,
Would you be interested in creating a Bioconda for companion?

Michal

SNAP error

Hello everyone.!!

I did not understand the error that Companion is presenting and I did not find the motivation for it. What should I do to fix it?

Error executing process > 'run_snap'

Caused by:
Process run_snap terminated with an error exit status (2)

Command executed:

echo '##gff-version 3' > snap.gff3
snap -gff -quiet snap.hmm pseudo.pseudochr.fasta > snap.tmp
snap_gff_to_gff3.lua snap.tmp > snap.tmp.2
if [ -s 1 ]; then
gt gff3 -sort -tidy -retainids snap.tmp.2 > snap.gff3;
fi

Command exit status:
2

Command output:
(empty)

Command error:
ZOE ERROR (from snap): error opening file ((null)/HMM/snap.hmm)
ZOE library version 2013-02-16

Work dir:
/home/wanessa/work/6b/94be813b401ee67df7d4ab7bbc0277

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

How to generate the file ".../example-data/references/_all/all_orthomcl.out"

Would you tell me how to generate ".../example-data/references/_all/all_orthomcl.out" ?

When I run my data, error appears that it can not load "/example-data/references/_all/all_orthomcl.out". Because in my directory, there is no "_all/all_orthomcl.out".

SNAP error

Hello everyone.!!

I did not understand the error that Companion is presenting and I did not find the motivation for it. What should I do to fix it?

Error executing process > 'run_snap'

Caused by:
Process run_snap terminated with an error exit status (2)

Command executed:

echo '##gff-version 3' > snap.gff3
snap -gff -quiet snap.hmm pseudo.pseudochr.fasta > snap.tmp
snap_gff_to_gff3.lua snap.tmp > snap.tmp.2
if [ -s 1 ]; then
gt gff3 -sort -tidy -retainids snap.tmp.2 > snap.gff3;
fi

Command exit status:
2

Command output:
(empty)

Command error:
ZOE ERROR (from snap): error opening file ((null)/HMM/snap.hmm)
ZOE library version 2013-02-16

Work dir:
/home/wanessa/work/6b/94be813b401ee67df7d4ab7bbc0277

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

update readme format

docker problems

Hello,

I found the following error when trying the example pipeline with nextflow run . -profile docker

N E X T F L O W  ~  version 19.08.1-edge
Launching `./annot.nf` [stoic_majorana] - revision: 72bf3b3c14

C O M P A N I O N  ~  version 1.0.2
query               : /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/example-data/L_donovani.1.fasta
reference           : LmjF.1
reference directory : /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/example-data/references
output directory    : /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/example-output

executor >  local (1)
[5e/668b71] process > truncate_input_headers [100%] 1 of 1, failed: 1 ✘
[-        ] process > sanitize_input         -
Error executing process > 'truncate_input_headers'

Caused by:
  Process `truncate_input_headers` terminated with an error exit status (127)

Command executed:

  truncate_header.lua < L_donovani.1.fasta > truncated.fasta

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: truncate_header.lua: command not found

Work dir:
  /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/work/5e/668b7104c74fafeb84d53a88b50cdb

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Also, I read in the nextflow doc page that the pipeline should be run with nextflow run annot.nf -with-docker 8e10a66f7203 where 8e10a66f7203 is the ID of the companion docker image. However, to test the example pipeline I have to delete most of the file because 3 arguments can't be null.

I'll appreciate you if you could corroborate or help me further to run the full pipeline.
Thanks and best regards

sudo

Hi, looks fascinating.

Nextflow installation worked fine for me without sudo rights.

However the docker pull required a "sudo docker pull xxxx", else I got an error message like this "dial unix /var/run/docker.sock: no such file or directory.
Are you trying to connect to a TLS-enabled daemon without TLS?"

Ubuntu 15.10 System.

can'r run tutorial's example "nextflow run script7.nf -with-singularity nextflow/rnaseq-nf"

Hello,

I can't run the tutorial's example: nextflow run script7.nf -with-singularity nextflow/rnaseq-nf

I get this error message:

Error executing process > 'fastqc (FASTQC on lung)'

Caused by:
  Process `fastqc (FASTQC on lung)` terminated with an error exit status (1)

Command executed:

  fastqc.sh "lung" "lung_1.fq lung_2.fq"

Command exit status:
  1

Command output:
  (empty)

Command error:
  /bin/bash: line 0: cd: /home/remi/Documents/Programming/nextflow-tutorial/work/f0/046b9bd50e095d683c155edc968eec: No such file or directory
  /bin/bash: .command.sh: No such file or directory

Work dir:
  /home/remi/Documents/Programming/nextflow-tutorial/work/f0/046b9bd50e095d683c155edc968eec

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

My nextfolw.config file is quite simple:

process.container = 'nextflow/rnaseq-nf'
docker.runOptions='-u $(id -u):$(id -g)'

The script is simple too

/*
 * pipeline input parameters
 */
params.reads = "$baseDir/data/ggal/gut_{1,2}.fq"
params.transcript = "$baseDir/data/ggal/transcriptome.fa"
params.multiqc = "$baseDir/multiqc"
params.outdir = "results"

log.info """\
         R N A S E Q - N F   P I P E L I N E
         ===================================
         transcriptome: ${params.transcript}
         reads        : ${params.reads}
         outdir       : ${params.outdir}
         """
         .stripIndent()


/*
 * define the `index` process that create a binary index
 * given the transcriptome file
 */
process index {

    input:
    path transcriptome from params.transcript

    output:
    path 'index' into index_ch

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}


Channel
    .fromFilePairs( params.reads, checkIfExists:true )
    .into { read_pairs_ch; read_pairs2_ch }

/*
 * Run Salmon to perform the quantification of expression using
 * the index and the matched read files
 */
process quantification {

    input:
    path index from index_ch
    tuple val(pair_id), path(reads) from read_pairs_ch

    output:
    path(pair_id) into quant_ch

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}

/*
 * Run fastQC to check quality of reads files
 */
process fastqc {
    tag "FASTQC on $sample_id"
    publishDir params.outdir, mode:'copy'

    input:
    tuple val(sample_id), path(reads) from read_pairs2_ch

    output:
    path("fastqc_${sample_id}_logs") into fastqc_ch

    script:
    """
    fastqc.sh "$sample_id" "$reads"
    """
}

/*
 * Create a report using multiQC for the quantification
 * and fastqc processes
 */
process multiqc {
    publishDir params.outdir, mode:'copy'

    input:
    path('*') from quant_ch.mix(fastqc_ch).collect()

    output:
    path('multiqc_report.html')

    script:
    """
    multiqc .
    """
}

workflow.onComplete {
	log.info ( workflow.success ? "\nDone! Open the following report in your browser --> $params.outdir/multiqc_report.html\n" : "Oops .. something went wrong" )
}

Note that no script of the tutorials seems to work (scrip6.nf, script5.nf...).

However, execution with the docker options, sudo nextflow run script6.nf -resume -with-docker --reads 'data/ggal/*_{1,2}.fq', works.

I am new to all these technologies. Does someone know what the problem is ? Can it be a problem in the installation or configuration of singularity ?

Doc rewrite

Documentation tweak (README file), could you review see if it makes sense before merging, please?

Extremely long runtime due to OrthoMCL

I am running companion locally on a distributed cluster environment. While most steps can be easily parallelized and are completed within a few hours, the step nf-run_orthomcl runs for about 3 days on rather small (30MB) genome. Is this run time expected? I have been looking through the code, but it seems the tool is inherently unable to even use multiple cores.

Running companion through the web interface however does not nearly take as long, so I am wondering whether it can be optimized after all. I am using a custom genome locally, which is not available on the web version.

Thanks for your contribution to the field!

Bioconda

Hi,
Would you be interested in creating a Bioconda for companion?

Michal

error from update_references.lua

@satta
While running /home/xin/.nextflow/assets/sanger-pathogens/companion/bin/update_references.lua
I continuously got the following error message: "tool './bin/update_references.lua' not found; option -help lists possible tools" . What is the problem?

Allow optional alphanumeric random locus tags

For some use cases (i.e. genomes with many expected changes), it might be useful to assign non-sequential, randomly chosen alphanumeric locus tags, combined with a fixed prefix. Example:

PKB237_CG98
PKB237_S141
PKB237_DMSL
PKB237_020C

etc.
It would make sense to allow multiple locus tag schemes in the pipeline, either through plugin code or through a config file switch.

First attempt to run test job fails

I'm going to try looking into this, but in the meantime though I'd report it. Here's the output from my test run after following the documentation.

$ nextflow run companion -profile docker
N E X T F L O W ~ version 19.10.0
Launching companion/annot.nf [prickly_mcclintock] - revision: 63734f6d0c

C O M P A N I O N ~ version 1.0.2
query : /opt/companion/example-data/L_donovani.1.fasta
reference : LmjF.1
reference directory : /opt/companion/example-data/references
output directory : /opt/companion/example-output

[- ] process > truncate_input_headers -
[- ] process > truncate_input_headers -
[- ] process > sanitize_input -
executor > local (2)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [ 0%] 0 of 1
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[- ] process > exonerate_empty_hints -
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
executor > local (2)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
executor > local (3)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [ 0%] 0 of 1
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[e2/7fdb92] process > exonerate_empty_hints [ 0%] 0 of 1
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
[- ] process > run_augustus_pseudo -
[- ] process > run_augustus_contigs -
[- ] process > run_snap -
[- ] process > merge_genemodels -
[- ] process > integrate_genemodels -
[- ] process > fix_polycistrons -
[- ] process > remove_exons -
[- ] process > pseudogene_indexing -
[- ] process > pseudogene_last -
[- ] process > pseudogene_calling -
[- ] process > merge_structural -
[- ] process > add_gap_features -
executor > local (3)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [100%] 1 of 1, failed: 1 ✘
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[e2/7fdb92] process > exonerate_empty_hints [ 0%] 0 of 1
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
[- ] process > run_augustus_pseudo -
[- ] process > run_augustus_contigs -
[- ] process > run_snap -
[- ] process > merge_genemodels -
[- ] process > integrate_genemodels -
[- ] process > fix_polycistrons -
[- ] process > remove_exons -
[- ] process > pseudogene_indexing -
[- ] process > pseudogene_last -
[- ] process > pseudogene_calling -
[- ] process > merge_structural -
[- ] process > add_gap_features -
executor > local (3)
[d4/a2f1c2] process > truncate_input_headers [100%] 1 of 1 ✔
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [100%] 1 of 1, failed: 1 ✘
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[e2/7fdb92] process > exonerate_empty_hints [100%] 1 of 1, failed: 1
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
[- ] process > run_augustus_pseudo -
[- ] process > run_augustus_contigs -
[- ] process > run_snap -
[- ] process > merge_genemodels -
[- ] process > integrate_genemodels -
[- ] process > fix_polycistrons -
[- ] process > remove_exons -
[- ] process > pseudogene_indexing -
[- ] process > pseudogene_last -
[- ] process > pseudogene_calling -
[- ] process > merge_structural -
[- ] process > add_gap_features -
[- ] process > split_splice_models_at_gaps -
[- ] process > add_polypeptides -
[- ] process > get_proteins_for_orthomcl -
[- ] process > make_ref_input_for_orthomcl -
[- ] process > make_target_input_for_orthomcl -
[- ] process > blast_for_orthomcl_formatdb -
[- ] process > blast_for_orthomcl -
[- ] process > run_orthomcl -
[- ] process > annotate_orthologs -
[- ] process > run_pfam -
[- ] process > pfam_to_gff3 -
[- ] process > annotate_pfam -
[- ] process > make_distribution_gff -
[- ] process > make_distribution_gaf -
[- ] process > make_distribution_seqs -
[- ] process > make_genome_stats -
[- ] process > reference_compare -
[- ] process > make_tree -
[- ] process > blast_for_circos -
[- ] process > make_circos_inputs -
[- ] process > circos_run_chrs -
[- ] process > circos_run_bin -
[- ] process > merge_gff3_for_gff3toembl -
[- ] process > make_embl -
[- ] process > make_report -
[- ] process > make_genelist -
[- ] process > add_products_to_protein_fasta -
WARN: The operator first is useless when applied to a value channel which returns a single value by definition -- check channel ncrna_cmindex
WARN: Access to undefined parameter TRANSCRIPT_FILE -- Initialise it to a default value eg. params.TRANSCRIPT_FILE = some_value
WARN: The operator first is useless when applied to a value channel which returns a single value by definition -- check channel pseudochr_last_index
WARN: Killing pending tasks (1)
Error executing process > 'press_ncRNA_cms'

Caused by:
Process press_ncRNA_cms terminated with an error exit status (1)

Command executed:

cp /opt/data/cm/rnas.cm ./models.cm
cmpress -F models.cm

Command exit status:
1

Command output:
(empty)

Command error:
cp: cannot stat '/opt/data/cm/rnas.cm': No such file or directory

Work dir:
/opt/work/4e/8398aa9b63844d3e9e2108bcaf01e0

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

travis CI now pulls rather than builds the docker image

First attempt to run test job fails

I'm going to try looking into this, but in the meantime though I'd report it. Here's the output from my test run after following the documentation.

$ nextflow run companion -profile docker
N E X T F L O W ~ version 19.10.0
Launching companion/annot.nf [prickly_mcclintock] - revision: 63734f6d0c

C O M P A N I O N ~ version 1.0.2
query : /opt/companion/example-data/L_donovani.1.fasta
reference : LmjF.1
reference directory : /opt/companion/example-data/references
output directory : /opt/companion/example-output

[- ] process > truncate_input_headers -
[- ] process > truncate_input_headers -
[- ] process > sanitize_input -
executor > local (2)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [ 0%] 0 of 1
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[- ] process > exonerate_empty_hints -
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
executor > local (2)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
executor > local (3)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [ 0%] 0 of 1
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[e2/7fdb92] process > exonerate_empty_hints [ 0%] 0 of 1
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
[- ] process > run_augustus_pseudo -
[- ] process > run_augustus_contigs -
[- ] process > run_snap -
[- ] process > merge_genemodels -
[- ] process > integrate_genemodels -
[- ] process > fix_polycistrons -
[- ] process > remove_exons -
[- ] process > pseudogene_indexing -
[- ] process > pseudogene_last -
[- ] process > pseudogene_calling -
[- ] process > merge_structural -
[- ] process > add_gap_features -
executor > local (3)
[d4/a2f1c2] process > truncate_input_headers [ 0%] 0 of 1
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [100%] 1 of 1, failed: 1 ✘
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[e2/7fdb92] process > exonerate_empty_hints [ 0%] 0 of 1
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
[- ] process > run_augustus_pseudo -
[- ] process > run_augustus_contigs -
[- ] process > run_snap -
[- ] process > merge_genemodels -
[- ] process > integrate_genemodels -
[- ] process > fix_polycistrons -
[- ] process > remove_exons -
[- ] process > pseudogene_indexing -
[- ] process > pseudogene_last -
[- ] process > pseudogene_calling -
[- ] process > merge_structural -
[- ] process > add_gap_features -
executor > local (3)
[d4/a2f1c2] process > truncate_input_headers [100%] 1 of 1 ✔
[- ] process > sanitize_input -
[- ] process > contiguate_pseudochromosomes -
[- ] process > predict_tRNA -
[4e/8398aa] process > press_ncRNA_cms [100%] 1 of 1, failed: 1 ✘
[- ] process > predict_ncRNA -
[- ] process > merge_ncrnas -
[e2/7fdb92] process > exonerate_empty_hints [100%] 1 of 1, failed: 1
[- ] process > ratt_make_ref_embl -
[- ] process > run_ratt -
[- ] process > ratt_to_gff3 -
[- ] process > transcript_empty_hints -
[- ] process > merge_hints -
[- ] process > run_augustus_pseudo -
[- ] process > run_augustus_contigs -
[- ] process > run_snap -
[- ] process > merge_genemodels -
[- ] process > integrate_genemodels -
[- ] process > fix_polycistrons -
[- ] process > remove_exons -
[- ] process > pseudogene_indexing -
[- ] process > pseudogene_last -
[- ] process > pseudogene_calling -
[- ] process > merge_structural -
[- ] process > add_gap_features -
[- ] process > split_splice_models_at_gaps -
[- ] process > add_polypeptides -
[- ] process > get_proteins_for_orthomcl -
[- ] process > make_ref_input_for_orthomcl -
[- ] process > make_target_input_for_orthomcl -
[- ] process > blast_for_orthomcl_formatdb -
[- ] process > blast_for_orthomcl -
[- ] process > run_orthomcl -
[- ] process > annotate_orthologs -
[- ] process > run_pfam -
[- ] process > pfam_to_gff3 -
[- ] process > annotate_pfam -
[- ] process > make_distribution_gff -
[- ] process > make_distribution_gaf -
[- ] process > make_distribution_seqs -
[- ] process > make_genome_stats -
[- ] process > reference_compare -
[- ] process > make_tree -
[- ] process > blast_for_circos -
[- ] process > make_circos_inputs -
[- ] process > circos_run_chrs -
[- ] process > circos_run_bin -
[- ] process > merge_gff3_for_gff3toembl -
[- ] process > make_embl -
[- ] process > make_report -
[- ] process > make_genelist -
[- ] process > add_products_to_protein_fasta -
WARN: The operator first is useless when applied to a value channel which returns a single value by definition -- check channel ncrna_cmindex
WARN: Access to undefined parameter TRANSCRIPT_FILE -- Initialise it to a default value eg. params.TRANSCRIPT_FILE = some_value
WARN: The operator first is useless when applied to a value channel which returns a single value by definition -- check channel pseudochr_last_index
WARN: Killing pending tasks (1)
Error executing process > 'press_ncRNA_cms'

Caused by:
Process press_ncRNA_cms terminated with an error exit status (1)

Command executed:

cp /opt/data/cm/rnas.cm ./models.cm
cmpress -F models.cm

Command exit status:
1

Command output:
(empty)

Command error:
cp: cannot stat '/opt/data/cm/rnas.cm': No such file or directory

Work dir:
/opt/work/4e/8398aa9b63844d3e9e2108bcaf01e0

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Trstickland patch sangerpathogens pull

travis build failing to build docker image, OrthoMCL download times out though build OK locally; so switch to pulling docker image from dockerhub (and hope the build there doesn't break...(

Added changelog

missing main.nf

curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip &&
unzip companion-master.zip &&
mv companion
docker pull sangerpathogens/companion

and this is my error message

Mar-26 09:24:44.142 [main] DEBUG nextflow.scm.AssetManager - Project manifest does not exist: ProjectLeish/nextflow.config (No such file or directory)

travis build fixed

The travis build now works OK; it uses the docker image for the dependencies, and runs the tests within the container. There's also a bug fix (augustus paths) in the Dockerfile, though the tests pass fine with the old Dockerfile too.

gff files do not contain exons

As far as I know it is standard to have a feature called "exon" as the lowest level in a gtf/gff file. However, companion only outputs CDS and no exons. This leads to some problems in the downstream analysis (e.g. with STAR or zUMIs). Could you please add the option to also output exon features?

No such file or directory for .fasta (when using Singularity)

I am trying to run Companion using Singularity instead of docker, but it seems that the paths are not mounted correctly.

This is my config:

env {
    GT_RETAINIDS = "yes"
    AUGUSTUS_CONFIG_PATH = "/opt/data/augustus"
    FILTER_SHORT_PARTIALS_RULE = "/opt/data/filters/filter_short_partials.lua"
    PFAM = "/opt/pfam/Pfam-A.hmm"
    PFAM2GO = "/opt/data/pfam2go/pfam2go.txt"
    RATT_CONFIG = "/opt/RATT/RATT.config_euk_NoPseudo_SpliceSite"
}

params.GO_OBO = "/opt/go.obo"
params.NCRNA_MODELS = "/opt/data/cm/rnas.cm"
params.CIRCOS_CONFIG_FILE = "/opt/data/circos/circos.debian.conf"
params.CIRCOS_BIN_CONFIG_FILE = "/opt/data/circos/circos.bin.debian.conf"
params.SPECFILE = "/opt/data/speck/output_check.lua"
params.AUGUSTUS_EXTRINSIC_CFG = "/opt/data/augustus/extrinsic.cfg"

process {
    container = 'sangerpathogens/companion:latest'
}

singularity {
    enabled = true
}

executor {
    name = 'local'
    queueSize = 2
    pollInterval = '3sec'
}

I can start the pipeline, but I receive the following error:

Error executing process > 'truncate_input_headers'

Caused by:
  Process `truncate_input_headers` terminated with an error exit status (1)

Command executed:

  truncate_header.lua < phased.1_scaffolds_FINAL.fasta > truncated.fasta

Command exit status:
  1

Command output:
  (empty)

Command error:
  .command.sh: line 2: phased.1_scaffolds_FINAL.fasta: No such file or directory

Work dir:
  /work/project/ladsie_002/work/77/1854982bdacdd60fbe447554ab153b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

However, when I look into the path, the file does indeed exist:

$ ll /work/project/ladsie_002/work/77/1854982bdacdd60fbe447554ab153b
total 1
lrwxrwxrwx 1 bbrink users 76 30. Aug 13:44 phased.1_scaffolds_FINAL.fasta -> /work/project/ladsie_002/companion/input/pleo/phased.1_scaffolds_FINAL.fasta

I tried to find a solution for this, but I was unable to do so. Do you have any insights for me?

Extremely long runtime due to OrthoMCL

I am running companion locally on a distributed cluster environment. While most steps can be easily parallelized and are completed within a few hours, the step nf-run_orthomcl runs for about 3 days on rather small (30MB) genome. Is this run time expected? I have been looking through the code, but it seems the tool is inherently unable to even use multiple cores.

Running companion through the web interface however does not nearly take as long, so I am wondering whether it can be optimized after all. I am using a custom genome locally, which is not available on the web version.

Thanks for your contribution to the field!

docker problems

Hello,

I found the following error when trying the example pipeline with nextflow run . -profile docker

N E X T F L O W  ~  version 19.08.1-edge
Launching `./annot.nf` [stoic_majorana] - revision: 72bf3b3c14

C O M P A N I O N  ~  version 1.0.2
query               : /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/example-data/L_donovani.1.fasta
reference           : LmjF.1
reference directory : /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/example-data/references
output directory    : /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/example-output

executor >  local (1)
[5e/668b71] process > truncate_input_headers [100%] 1 of 1, failed: 1 ✘
[-        ] process > sanitize_input         -
Error executing process > 'truncate_input_headers'

Caused by:
  Process `truncate_input_headers` terminated with an error exit status (127)

Command executed:

  truncate_header.lua < L_donovani.1.fasta > truncated.fasta

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: truncate_header.lua: command not found

Work dir:
  /shared/D2/opt/repositories/git-reps/companion.sanger-pathogens/work/5e/668b7104c74fafeb84d53a88b50cdb

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Also, I read in the nextflow doc page that the pipeline should be run with nextflow run annot.nf -with-docker 8e10a66f7203 where 8e10a66f7203 is the ID of the companion docker image. However, to test the example pipeline I have to delete most of the file because 3 arguments can't be null.

I'll appreciate you if you could corroborate or help me further to run the full pipeline.
Thanks and best regards

Add track with contig placements to Circos plots

It could be useful to a user to see what contigs map to a target pseudochromosome. It should be evaluated whether Circos allows to label such regions with contig IDs in a separate track.

Option for filtering gene models with introns as pseudogenes in kinetoplastids

Genes with introns are currently allowed for kinetoplastids, but only a couple of genes per species have introns. The rest of predicted models with introns could be pseudogenes or frameshifted models due to errors in sequencing (typical for PacBio assemblies). It would be good to offer the option of filtering models with introns as pseudogenes.

snap wasn't in PATH; symlink from /usr/local/bin

missing main.nf

curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip &&
unzip companion-master.zip &&
mv companion
docker pull sangerpathogens/companion

and this is my error message

Mar-26 09:24:44.142 [main] DEBUG nextflow.scm.AssetManager - Project manifest does not exist: ProjectLeish/nextflow.config (No such file or directory)

No such file or directory for .fasta (when using Singularity)

I am trying to run Companion using Singularity instead of docker, but it seems that the paths are not mounted correctly.

This is my config:

env {
    GT_RETAINIDS = "yes"
    AUGUSTUS_CONFIG_PATH = "/opt/data/augustus"
    FILTER_SHORT_PARTIALS_RULE = "/opt/data/filters/filter_short_partials.lua"
    PFAM = "/opt/pfam/Pfam-A.hmm"
    PFAM2GO = "/opt/data/pfam2go/pfam2go.txt"
    RATT_CONFIG = "/opt/RATT/RATT.config_euk_NoPseudo_SpliceSite"
}

params.GO_OBO = "/opt/go.obo"
params.NCRNA_MODELS = "/opt/data/cm/rnas.cm"
params.CIRCOS_CONFIG_FILE = "/opt/data/circos/circos.debian.conf"
params.CIRCOS_BIN_CONFIG_FILE = "/opt/data/circos/circos.bin.debian.conf"
params.SPECFILE = "/opt/data/speck/output_check.lua"
params.AUGUSTUS_EXTRINSIC_CFG = "/opt/data/augustus/extrinsic.cfg"

process {
    container = 'sangerpathogens/companion:latest'
}

singularity {
    enabled = true
}

executor {
    name = 'local'
    queueSize = 2
    pollInterval = '3sec'
}

I can start the pipeline, but I receive the following error:

Error executing process > 'truncate_input_headers'

Caused by:
  Process `truncate_input_headers` terminated with an error exit status (1)

Command executed:

  truncate_header.lua < phased.1_scaffolds_FINAL.fasta > truncated.fasta

Command exit status:
  1

Command output:
  (empty)

Command error:
  .command.sh: line 2: phased.1_scaffolds_FINAL.fasta: No such file or directory

Work dir:
  /work/project/ladsie_002/work/77/1854982bdacdd60fbe447554ab153b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

However, when I look into the path, the file does indeed exist:

$ ll /work/project/ladsie_002/work/77/1854982bdacdd60fbe447554ab153b
total 1
lrwxrwxrwx 1 bbrink users 76 30. Aug 13:44 phased.1_scaffolds_FINAL.fasta -> /work/project/ladsie_002/companion/input/pleo/phased.1_scaffolds_FINAL.fasta

I tried to find a solution for this, but I was unable to do so. Do you have any insights for me?

adding JBrowse output

Hi,
any plans to add JBrowse out?

Thank you in advance.

Mic

Final (?) documentation rewrite