sanger-pathogens / roary
Rapid large-scale prokaryote pan genome analysis
Home Page: http://sanger-pathogens.github.io/Roary
License: Other
Is it missing FastTree or fastml or exonerate or R or kraken?
2015/11/17 11:54:58 Looking for 'awk' - found /usr/bin/awk
2015/11/17 11:54:58 Looking for 'bedtools' - found /bio/linuxbrew/bin/bedtools
2015/11/17 11:54:58 Determined bedtools version is 2.24
2015/11/17 11:54:58 Looking for 'blastp' - found /bio/linuxbrew/bin/blastp
2015/11/17 11:54:58 Determined blastp version is 2.2.31
2015/11/17 11:54:58 Looking for 'cd-hit' - found /bio/linuxbrew/bin/cd-hit
2015/11/17 11:54:58 Determined cd-hit version is 4.6
2015/11/17 11:54:58 Optional tool 'cdhit' not found in your $PATH
2015/11/17 11:54:58 Looking for 'grep' - found /usr/bin/grep
2015/11/17 11:54:58 Looking for 'mafft' - found /bio/linuxbrew/bin/mafft
2015/11/17 11:54:58 Determined mafft version is 7.221
2015/11/17 11:54:58 Looking for 'makeblastdb' - found /bio/linuxbrew/bin/makeblastdb
2015/11/17 11:54:58 Determined makeblastdb version is 2.2.31
2015/11/17 11:54:58 Looking for 'mcl' - found /bio/linuxbrew/bin/mcl
2015/11/17 11:54:58 Determined mcl version is 14-137
2015/11/17 11:54:58 Looking for 'parallel' - found /bio/linuxbrew/bin/parallel
2015/11/17 11:54:58 Determined parallel version is 20150922
2015/11/17 11:54:59 Looking for 'prank' - found /bio/linuxbrew/bin/prank
2015/11/17 11:55:00 Determined prank version is 140603
2015/11/17 11:55:00 Looking for 'sed' - found /bio/linuxbrew/bin/sed
Roary version 3.5.1
Hi,
On our cluster setup we are using Perl version 5.10.1, which gives the following errors when running
roary
or
perl ~/PATH_TO_ROARY/roary
Type of arg 1 to values must be hash (not hash element) at [...]/perl5/lib/perl5/Bio/Roary/AnnotateGroups.pm line 106, near "} )"
BEGIN not safe after errors--compilation aborted at [...]/perl5/lib/perl5/Bio/Roary/AnnotateGroups.pm line 312.
Compilation failed in require at [...]/perl5/lib/perl5/Bio/Roary.pm line 15.
BEGIN failed--compilation aborted at [...]/perl5/lib/perl5/Bio/Roary.pm line 15.
Compilation failed in require at [...]/perl5/lib/perl5/Bio/Roary/CommandLine/Roary.pm line 8.
BEGIN failed--compilation aborted at [...]/perl5/lib/perl5/Bio/Roary/CommandLine/Roary.pm line 8.
Compilation failed in require at [...]/perl5/bin/roary line 12.
BEGIN failed--compilation aborted at [...]/perl5/bin/roary line 12.
Thanks a lot,
Marco
It would be good if the -a option checked the dependencies and then carried on with the clustering if GFF files and other parameters were given.
Hi!
I have downloaded Roary 3.2.7 using Homebrew on my Mac (OS X Yosemite). It seemed to install properly, but when I run it there seems to be a problem with an intermediate file not being created or found. I ran it in verbose mode to get more clues but cannot see the root of the problem. Do you know what could be causing it? I append the command line output below.
Best wishes
Kaisa
roary -e -v *.gff
Please cite Roary if you use any of the results it produces:
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill (2015), "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics,
doi: http://doi.org/10.1093/bioinformatics/btv421
2015/09/23 11:26:57 Fixing input GFF files
2015/09/23 11:27:03 Extracting proteins from GFF files
Extracting proteins from 513A.gff
Extracting proteins from PC2022III.gff
Extracting proteins from PC2777IV.gff
Extracting proteins from PC3053II.gff
Extracting proteins from PC3517II.gff
Extracting proteins from PC3714II.gff
Extracting proteins from PC390II.gff
Extracting proteins from PC3939II.gff
Extracting proteins from PC3997IV.gff
Extracting proteins from PC4226IV.gff
Extracting proteins from PC4580III.gff
Extracting proteins from PC4597II.gff
Extracting proteins from PC5099IV.gff
Extracting proteins from PC5538III.gff
Extracting proteins from PC5587platt.gff
Extracting proteins from PC5587u.gff
Extracting proteins from W1090330.gff
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
Cluster with MCL
2015/09/23 11:51:37 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr --output_multifasta_files -i _gff_files -f _fasta_files -t 11 --dont_create_rplots -v -j Local --processors 1 --group_limit 50000 -cd 99
2015/09/23 11:51:37 Reinflate clusters
Cant open file: _uninflated_mcl_groups
KaisaTiMac:kaisa$ ls
513A.gff PC3714II.gff.proteome.faa PC4597II.gff W1090330.gff.proteome.faa
513A.gff.proteome.faa PC390II.gff PC4597II.gff.proteome.faa _clustered
PC2022III.gff PC390II.gff.proteome.faa PC5099IV.gff _clustered.clstr
PC2022III.gff.proteome.faa PC3939II.gff PC5099IV.gff.proteome.faa _combined_files
PC2777IV.gff PC3939II.gff.proteome.faa PC5538III.gff _combined_files.groups
PC2777IV.gff.proteome.faa PC3997IV.gff PC5538III.gff.proteome.faa _fasta_files
PC3053II.gff PC3997IV.gff.proteome.faa PC5587platt.gff _gff_files
PC3053II.gff.proteome.faa PC4226IV.gff PC5587platt.gff.proteome.faa blast_identity_frequency.Rtab
PC3517II.gff PC4226IV.gff.proteome.faa PC5587u.gff
PC3517II.gff.proteome.faa PC4580III.gff PC5587u.gff.proteome.faa
PC3714II.gff PC4580III.gff.proteome.faa W1090330.gff
Some idea of the average or min/max length of the proteins in each cluster would be helpful.
Perhaps even the %id (amino acid?) level.
Torst
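A minimal Python sketch of the per-cluster length summary requested here. It is not part of Roary; the function name and the toy clusters are made up for illustration, and real input would have to be read from the per-cluster protein sequences.

```python
from statistics import mean

def cluster_length_stats(clusters):
    """Return {cluster: (min_len, max_len, mean_len)} for protein sequences."""
    stats = {}
    for name, seqs in clusters.items():
        lengths = [len(s) for s in seqs]
        if not lengths:
            continue  # skip empty clusters rather than fail on min()/max()
        stats[name] = (min(lengths), max(lengths), mean(lengths))
    return stats

# Hypothetical toy clusters, only to show the shape of the input.
example = {
    "group_1": ["MKT", "MKTAYI", "MKTA"],
    "group_2": ["MSS"],
}
for name, (shortest, longest, average) in cluster_length_stats(example).items():
    print(f"{name}\tmin={shortest}\tmax={longest}\tmean={average:.1f}")
```

A %id column would need pairwise alignments within each cluster, which this sketch does not attempt.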
Very cool tool!
I notice that after the alignment of the core genome is complete, the contents of the output directory 'pan_genome_sequences', which holds the complete set of pan genome .fa.aln files, get deleted.
I could put these individual .fa.aln files to good use. Is it possible to stop this occurring?
Cheers
Dan
Hi. Great software. Unfortunately, when running on the server with no X11, obtaining the PNG plots runs into the following problem:
Error in .External2(C_X11, paste("png::", filename, sep = ""), g$width, :
unable to start device PNG
In addition: Warning message:
In png("test.png") : unable to open connection to X11 display ''
The same does not occur if the plot is a PDF or bitmap:
bitmap(filename,"png16m")
This would still generate a readable PNG.
Or:
pdf(filename)
Thank you.
Anders.
Mafft is installed and in PATH.
I deleted the original bin/roary from previous install too.
It looks like the FASTA file comparison fails - lower case vs uppercase?
t/Bio/Roary/CommandLine/QueryRoary.t .................... ok
t/Bio/Roary/CommandLine/Roary.t ......................... 46/? Unknown option: mafft
Unknown option: mafft
Unknown option: mafft
Unknown option: mafft
t/Bio/Roary/CommandLine/Roary.t ......................... 48/?
# Failed test 'Actual and expected output match for '-j Local --dont_delete_files --dont_split_groups --output_multifasta_files --mafft --dont_delete_files t/data/real_data_1.gff t/data/real_data_2.gff''
# at t/lib/TestHelper.pm line 75.
# +---+--------------------------------------------------------------+--------------------------------------------------------------+
# | |Got |Expected |
# | Ln| | |
# +---+--------------------------------------------------------------+--------------------------------------------------------------+
# | 1|>11111_1#11_04119 |>11111_1#11_04119
# * 2|ATGAATAAAACAACTGAGTATATTGACGCACTGCTGCTTTCTGAACGTGAGAAAGCGGCA |atgaataaaacaactgagtatattgacgcactgctgctttctgaacgtgagaaagcggca *
Test Summary Report
-------------------
t/Bio/Roary/CommandLine/Roary.t (Wstat: 256 Tests: 50 Failed: 1)
Failed test: 49
Non-zero exit status: 1
Files=48, Tests=704, 140 wallclock secs ( 0.17 usr 0.05 sys + 73.77 cusr 26.66 csys = 100.65 CPU)
Result: FAIL
Failed 1/48 test programs. 1/704 subtests failed.
make: *** [test_dynamic] Error 255
AJPAGE/Bio-Roary-3.2.4.tar.gz
/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
reports AJPAGE/Bio-Roary-3.2.4.tar.gz
Stopping: 'install' failed for 'Bio::Roary'.
Failed during this command:
AJPAGE/Bio-Roary-3.2.4.tar.gz : make_test NO
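If the failed comparison in the log above really is only a matter of lower-case versus upper-case bases, a case-insensitive check would confirm it. This is a minimal Python sketch, not the test suite's own helper; the function names are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records

def same_sequences(got, expected):
    """Compare two FASTA texts, ignoring only the case of the residues."""
    a, b = read_fasta(got), read_fasta(expected)
    return a.keys() == b.keys() and all(a[k].upper() == b[k].upper() for k in a)

# Header and sequence prefix copied from the diff in the log.
got = ">11111_1#11_04119\nATGAATAAAACAACTGAG\n"
expected = ">11111_1#11_04119\natgaataaaacaactgag\n"
print(same_sequences(got, expected))
```

Headers are still compared exactly, so a genuine difference in record names would not be hidden.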
[/bio/linuxbrew/bin/mcxdeblast] all secondary elements were also seen as primary elements (check ok)
cannot remove directory for split_groups: Directory not empty at /bio/perl5/lib/perl5/Bio/Roary/SplitGroups.pm line 167.
The files generated by the "query_pan_genome -a difference --input_set_one 1.gff,2.gff --input_set_two 3.gff,4.gff,5.gff -g clustered_proteins" (set_difference_unique_set_one/two_statistics.csv) do not have the information in the annotation column, which the gene_presence_absence.csv does have. It would be helpful if they do, is that possible?
Installing using the "Installation - With bundled binaries" instructions.
You list:
cpanm Array::Utils BioPerl Exception::Class File::Find::Rule File::Grep File::Slurp::Tiny Graph Moose Moose::Role Text::CSV
On a fresh install of Ubuntu 14.04 you also need:
Log::Log4perl
File::Which
MAFFT not here:
ok(scalar PATH->Whence($_), "$_ in PATH") for qw(blastp makeblastdb mcl mcxdeblast bedtools prank parallel);
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.gff.gz
Would it be possible to get Roary to check for the installation of its non-Perl dependencies, like MAFFT, exonerate etc., and report which ones are missing?
I note in the manual you use "cpan -f" to force install.
I didn't use it - and this happened:
Do I need to force?
AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz
/bin/make -- OK
Running make test
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/Bio/*.t t/Bio/RetrieveAssemblies/*.t
t/Bio/RetrieveAssemblies.t ................ 1/? Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 2/?
# Failed test 'Expected file for command -q Mycobacterium -a -f gff PRJEB8877 exists: downloaded_files/CVMX01.1.gbff.gz.gff'
# at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 3/?
# Failed test 'Expected file for command -q Mycobacterium -f fasta PRJEB8877 exists: downloaded_files/CVMX01.1.fsa_nt.gz'
# at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 4/?
# Failed test 'Expected file for command -q Mycobacterium -f gff PRJEB8877 exists: downloaded_files/CVMX01.1.gbff.gz.gff'
# at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 5/?
# Failed test 'Expected file for command -q Mycobacterium -o my_dir PRJEB8877 exists: my_dir/CVMX01.1.gbff.gz'
# at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 6/?
# Failed test 'Expected file for command -q Mycobacterium PRJEB8877 exists: downloaded_files/CVMX01.1.gbff.gz'
# at t/lib/TestHelper.pm line 31.
# Looks like you failed 5 tests of 6.
t/Bio/RetrieveAssemblies.t ................ Dubious, test returned 5 (wstat 1280, 0x500)
Failed 5/6 subtests
t/Bio/RetrieveAssemblies/AccessionFile.t .. ok
t/Bio/RetrieveAssemblies/RefWeak.t ........ ok
t/Bio/RetrieveAssemblies/WGS.t ............ ok
t/requires_external.t ..................... ok
Test Summary Report
-------------------
t/Bio/RetrieveAssemblies.t (Wstat: 1280 Tests: 6 Failed: 5)
Failed tests: 2-6
Non-zero exit status: 5
Files=5, Tests=33, 95 wallclock secs ( 0.02 usr 0.01 sys + 1.40 cusr 0.20 csys = 1.63 CPU)
Result: FAIL
Failed 1/5 test programs. 5/33 subtests failed.
make: *** [test_dynamic] Error 255
AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz
/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
reports AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz
Stopping: 'install' failed for 'A/AJ/AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz'.
This is not an issue, just a request for a future release. It would be great if there were an additional EMBL output that mapped the core genes to the core_gene_alignment.aln file. Alternatively, this could be output in the gene_presence_absence.csv (e.g. core gene = yes/no, position in aln = 1..1469).
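The requested mapping amounts to a running offset over the concatenated per-gene alignments. A minimal Python sketch, assuming the genes are available as an ordered list of (name, aligned length) pairs; the gene names and the second length are hypothetical, and this is not how Roary itself builds the file.

```python
def core_gene_coordinates(genes):
    """Map each gene to 1-based inclusive (start, end) in the concatenated alignment.

    `genes` is an ordered list of (name, aligned_length) pairs, in the same
    order in which the per-gene alignments were concatenated.
    """
    coords, offset = {}, 0
    for name, length in genes:
        coords[name] = (offset + 1, offset + length)
        offset += length
    return coords

# Hypothetical genes; the first length matches the "1..1469" example above.
for name, (start, end) in core_gene_coordinates([("geneA", 1469), ("geneB", 300)]).items():
    print(f"{name}\t{start}..{end}")
```

The same (start, end) pairs could be written as EMBL feature locations or as an extra CSV column.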
Hi there,
When starting up, Roary does not check whether the tools it needs are in its path, leading to error reports that are incomprehensible for users. So, instead of something like "Could not run 'bedtools', is it installed? Is it in your $PATH?" the user gets (e.g.):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file '/scratch2/tmp/homebachtmp/tmp/compgenomes/tm9omL9Afb/bpg16.gff.proteome.faa.intermediate.extracted.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:449
STACK: Bio::Root::IO::_initialize_io /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:270
STACK: Bio::SeqIO::_initialize /usr/local/share/perl/5.10.0/Bio/SeqIO.pm:499
STACK: Bio::SeqIO::fasta::_initialize /usr/local/share/perl/5.10.0/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.0/Bio/SeqIO.pm:375
STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.0/Bio/SeqIO.pm:421
STACK: Bio::Roary::ExtractProteomeFromGFF::_fastatranslate /opt/biosw/roary/lib/Bio/Roary/ExtractProteomeFromGFF.pm:135
STACK: Bio::Roary::ExtractProteomeFromGFF::_convert_nucleotide_to_protein /opt/biosw/roary/lib/Bio/Roary/ExtractProteomeFromGFF.pm:149
STACK: Bio::Roary::ExtractProteomeFromGFF::fasta_file /opt/biosw/roary/lib/Bio/Roary/ExtractProteomeFromGFF.pm:40
STACK: Bio::Roary::CommandLine::ExtractProteomeFromGff::run /opt/biosw/roary/lib/Bio/Roary/CommandLine/ExtractProteomeFromGff.pm:95
STACK: /opt/biosw/bin/extract_proteome_from_gff:19
Prokka (from tseemann) has a pretty flexible runtime tool checker which could be adapted for Roary in no time.
I asked on Twitter just an hour ago about a tool to find presence/absence of genes in bacterial genomes and got a pointer to Roary. What I see on GitHub looks fantastic both in terms of presentation and code.
I have a couple of bugs / observations for which I will open separate tickets. Please excuse the spam, but I think Roary is simply too good not to make it even better.
This is occurring on a fairly old machine (Kubuntu 9.10) and I need to install a lot by hand to keep it running with current software. That might explain a couple of the oddities I report, which you would not expect on newer distributions or when installing via simple apt-get, homebrew or similar. Still, having Roary quickly point to the obvious reason for a failure will make life easier for a couple of people.
Best,
Bastien
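The start-up check asked for in this post can be sketched in a few lines of Python with the standard `shutil.which`. This is an illustration only, not Roary's or Prokka's checker; the function names are made up, and the tool list is the one from the `qw(...)` line quoted elsewhere on this page.

```python
import shutil

def missing_tools(tools):
    """Return the names from `tools` that cannot be found on the current PATH."""
    return [t for t in tools if shutil.which(t) is None]

def report(tools):
    """Build one readable message per missing tool."""
    return [f"Could not run '{t}', is it installed? Is it in your $PATH?"
            for t in missing_tools(tools)]

required = ["blastp", "makeblastdb", "mcl", "mcxdeblast", "bedtools", "prank", "parallel"]
for message in report(required):
    print(message)
```

Running such a check before any work starts would replace the BioPerl stack trace with one line per missing tool.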
Hi,
I'm using Roary with a bunch of bacterial genomes; some have been annotated with prokka, some others not. A genbank file is available for all of them. I've converted all the genbank files to gff3 using the bcbio gff writer (https://github.com/chapmanb/bcbb/tree/master/gff), which to the best of my knowledge produces valid GFF3 files.
When running using the prokka generated gff files the program runs smoothly; when running with the gff files derived from the genbank file, the program halts with the following error:
BLAST Database error: No alias or index file found for protein database [/home/user/workspace/Roary/bin/UcWJpjcOru/output_contigs] in search path [/home/user/workspace/Roary/bin::]
Some files are still produced, however, like gene_presence_absence.csv, even though the genome columns do not contain the locus_tag but either nothing or the EC_number (see below). Would more detailed documentation on the expected GFF format (the order of the annotations, for instance) maybe help?
Thanks a lot,
Marco
Example of annotation from prokka:
gnl|Prokka|GENOME02_contig000001 Prodigal:2.6 CDS 42 578 . + 0 ID=GENOME02_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=GENOME02_00001;product=hypothetical protein;protein_id=gnl|Prokka|GENOME02_00001
Example annotation from the gff file converted from the prokka genbank file:
GENOME02_contig000001 feature CDS 42 578 . + 0 codon_start=1;inference=ab initio prediction:Prodigal:2.6;locus_tag=GENOME02_00001;product=hypothetical protein;protein_id=Prokka:GENOME02_00001;transl_table=11;translation=MIAEIFQGGFVVFQQQFSKVHFEAATTHNAHHHDVGGFTAESEGRNLPAAQTQTFREVVQGVSRIFTIFQFEANRRDAFVRATRTDELIRPQFGDFIRQISGNLVRGVLYFGIAFTTEAQEFIVLCNYLTRRAGEVDGKSTNLTTQVVNVEHQFLRQRFFVTPDNPAAAQRSQTEFMA
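One visible difference between the two annotation examples is that the Prokka line carries an ID attribute and the converted line does not. Whether that is what trips Roary up is an assumption, but it is easy to test for. A minimal Python sketch, assuming tab-separated GFF3 input; the function names are made up for illustration.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs

def cds_without_id(gff_lines):
    """Yield (seqid, start, end) for CDS features that have no ID attribute."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the annotation section is over; sequence follows
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "CDS" and "ID" not in parse_attributes(cols[8]):
            yield cols[0], int(cols[3]), int(cols[4])

# Shortened copy of the converted example line, with tabs restored.
converted = "GENOME02_contig000001\tfeature\tCDS\t42\t578\t.\t+\t0\tcodon_start=1;locus_tag=GENOME02_00001"
print(list(cds_without_id([converted])))
```

If every CDS in the converted files shows up in this list, adding an ID (for example, copied from locus_tag) before running Roary would be a cheap experiment.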
gene_presence_absence.csv produced from the prokka gff files (6036 lines):
"group_4797","","hypothetical protein","3","3","1","","","","","","GENOME02_00001","GENOME03_01386","GENOME04_00768"
gene_presence_absence.csv produced from the gff files derived from the genbank (2472 lines):
"group_1","","","1","1","1","","","","","","","","EC_number=2.7.2.11"
/software/pathogen/external/apps/usr/local/bin/Rscript
should be
/usr/bin/env Rscript
Hi there,
Maybe this is due to the preliminary status of Roary.
The prepackaged download .tar.gz file on http://sanger-pathogens.github.io/Roary/ apparently comes with binaries for Linux and OSX. However, if the user does not actively add them to the $PATH, Roary will not use them (and will fail if there are no other versions of them in the $PATH).
Test whether a tool is in the current PATH (see #214) and if not, make sure the prepackaged version is used.
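The fallback proposed here could look roughly like the Python sketch below. It is an illustration, not Roary code: `resolve_tool` and the `bundled_dir` parameter are made-up names, and the location of the bundled binaries inside the tarball is an assumption to be filled in by the caller.

```python
import os
import shutil

def resolve_tool(name, bundled_dir):
    """Prefer a tool found on PATH; otherwise fall back to a bundled copy.

    Returns the path to use, or None when neither exists.
    """
    found = shutil.which(name)
    if found:
        return found
    candidate = os.path.join(bundled_dir, name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None

# "binaries/linux" is a placeholder for wherever the tarball keeps its binaries.
path = resolve_tool("bedtools", "binaries/linux")
print(path if path else "bedtools not found on PATH or among the bundled binaries")
```

Preferring the PATH copy keeps a site's own installation in charge; reversing the order would pin Roary to the versions it ships with.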
If the -e option is used, prank seg faults if all the sequences are the same.
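One possible workaround, sketched in Python under the assumption that the caller holds the sequences of a cluster in memory: identical sequences are already aligned, so the aligner can be skipped for them. This is not what Roary does, and the function names are made up for illustration.

```python
def all_identical(sequences):
    """True when every sequence is the same string (case-insensitive)."""
    return len({s.upper() for s in sequences}) <= 1

def needs_alignment(sequences):
    """Identical sequences are trivially aligned, so no aligner run is needed."""
    return not all_identical(sequences)

print(needs_alignment(["ATGAAA", "ATGAAA"]), needs_alignment(["ATGAAA", "ATGAAC"]))
```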
FYI, I noticed that in the folder Roary is run from, a traditional tmpdir() folder is made, but there are also lots of temp files (some with an underscore, some .faa etc.) which are not in that tmpdir.
Not sure if this is deliberate or not.
The conversion of accessory gene presence/absence to Newick files has gone wrong since the update to 3.5.1. The .fa file contains only C's, and hence no differences are visible.
This is tested with datasets that previously gave clear differences.
Summary file:
Core genes (99% <= strains <= 100%): 1461
Soft core genes (95% <= strains < 99%): 765
Shell genes (15% <= strains < 95%): 1389
Cloud genes (0% <= strains < 15%): 7657
Total genes: 11272
So branch lengths of 0 are not expected.
The Newick file looks like this:
(L1_Lm_10KSM:0.0,L1_Lm_11KSM:0.0,L1_Lm_13KSM:0.0,L1_Lm_15KSM:0.0,L1_Lm_4KSM:0.0,L1_Lm_6KSM:0.0,L1_Lm_8KSM:0.0,L1_Lm_BHU1:0.0,L1_Lm_BHU2:0.0,L1_Lm_BHU3:0.0, etc
The .fa file says:
L1_Lm_10KSM
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
L1_Lm_11KSM
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
L1_Lm_13KSM
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
(etc)
Hi there,
this could get messy and I do not know whether you want to go down this path.
The binaries prepackaged in the .tar.gz downloadable from http://sanger-pathogens.github.io/Roary/ are dynamically linked, which means they will fail on older installations with messages like these:
bach@trinity:/opt/biosw/roary/binaries/linux$ ./bedtools
./bedtools: /lib/libc.so.6: version `GLIBC_2.15' not found (required by ./bedtools)
./bedtools: /lib/libc.so.6: version `GLIBC_2.14' not found (required by ./bedtools)
I only saw problems with GLIBC, but there may be others lurking in the background. Users do not always have the possibility to upgrade their machines or patch in newer versions of GLIBC.
You could provide statically linked executables in the download (I do that for MIRA), but this might become a messy thing for OSX.
I've written a homebrew package for Roary in my tap.
https://github.com/tseemann/homebrew-bioinformatics-linux/blob/master/roary.rb
It doesn't install Roary itself (as Brew doesn't do Perl) but it installs its dependencies and checks that the Bio::Roary Perl module is installed.
The accessory_binary_genes.fa.newick addition is great. One thing I noticed is that the name of each entry contains the full path name, i.e. /usr/name/yadda1/yadda2/roary/cxgf68xy/sample1, /usr/name/yadda1/yadda2/roary/cxgf68xy/sample2, etc.
I would prefer only the names without the path. Is inclusion of the full pathname by design?
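Until that changes, the paths can be stripped from the Newick file after the fact. A minimal Python sketch, assuming unquoted leaf labels that contain none of the characters `( ) , : ;`; the function name is made up for illustration.

```python
import posixpath
import re

# A leaf label is the text after '(' or ',' and before ':', ',', ')' or ';'.
_LABEL = re.compile(r"(?<=[(,])([^(),:;]+)")

def strip_paths(newick):
    """Replace every leaf label with its basename (the part after the last '/')."""
    return _LABEL.sub(lambda m: posixpath.basename(m.group(1).strip()), newick)

tree = ("(/usr/name/yadda1/yadda2/roary/cxgf68xy/sample1:0.0,"
        "/usr/name/yadda1/yadda2/roary/cxgf68xy/sample2:0.0);")
print(strip_paths(tree))
```

Labels of internal nodes follow a closing parenthesis and are left untouched by this pattern.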
cpan Bio::Roary
Reading '/bio/perl5/.cpan/Metadata'
Database was generated on Sun, 31 May 2015 22:41:02 GMT
Bio::Roary is up to date (undef).
I asked for the version; that was a success, not a failure :)
Important for brew test results.
% roary --version
3.5.1
% echo $?
255
Also, roary -a
returns 2
which is non-zero and indicates error.
Important for pipelines that check before they run.
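To show why the exit status matters, here is a minimal Python sketch of the kind of check a pipeline might run. It uses stand-in commands so that it runs without Roary installed; the function names are made up for illustration.

```python
import subprocess
import sys

def exit_status(command):
    """Run `command` (a list of arguments) and return its exit status."""
    return subprocess.run(command, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode

def tool_is_usable(command):
    """Pipelines commonly treat 0 as success and anything else as failure."""
    return exit_status(command) == 0

# Stand-ins for a well-behaved tool and for one that exits with 255.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
bad = [sys.executable, "-c", "raise SystemExit(255)"]
print(tool_is_usable(ok), tool_is_usable(bad))
```

With the behaviour reported above, `tool_is_usable(["roary", "--version"])` would come back false even though the version was printed.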
Hello,
after asking IT to install Roary on our local cluster I have been stuck with the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file 'nameX.gff.proteome.faa.intermediate.extracted.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/Root/Root.pm:449
STACK: Bio::Root::IO::_initialize_io /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/Root/IO.pm:270
STACK: Bio::SeqIO::_initialize /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO.pm:499
STACK: Bio::SeqIO::fasta::_initialize /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO.pm:375
STACK: Bio::SeqIO::new /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO.pm:421
STACK: Bio::Roary::ExtractProteomeFromGFF::_fastatranslate /opt/apps/roary/3.2.7/lib/Bio/Roary/ExtractProteomeFromGFF.pm:138
STACK: Bio::Roary::ExtractProteomeFromGFF::_convert_nucleotide_to_protein /opt/apps/roary/3.2.7/lib/Bio/Roary/ExtractProteomeFromGFF.pm:152
STACK: Bio::Roary::ExtractProteomeFromGFF::fasta_file /opt/apps/roary/3.2.7/lib/Bio/Roary/ExtractProteomeFromGFF.pm:43
STACK: Bio::Roary::CommandLine::ExtractProteomeFromGff::run /opt/apps/roary/3.2.7/lib/Bio/Roary/CommandLine/ExtractProteomeFromGff.pm:86
I am sure my input files are OK; they finished fine on my laptop with an older Roary (2.0.0).
Any ideas of what to check are very welcome!
Thank you,
Nadejda
CPAN installs something with version 2.2.3, but there is no git release for it.
Hi,
The binaries supplied with roary complain about GLIBC:
/lib64/libc.so.6: version `GLIBC_2.14' not found
/usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found
Can the binaries supplied with the package be recompiled against the lowest possible GLIBC, please? We have 2.12, but I am sure there are others who run older versions of GLIBC.
OLD
# Triple memory for worst case senario
$memory_required *= 5;
NEW?
# Pentuple memory for worst case sCenario
$memory_required *= 5;
Hi. I downloaded Roary on our iridis (SSH). I am having a problem while trying to run it (roary *.gff) on the command line. The error looks something like this:
EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open PROKKA_SG1.gff.proteome.faa.intermediate.extracted.fa: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /local/software/perl-modules/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::Root::IO::_initialize_io /local/software/perl-modules/share/perl5/Bio/Root/IO.pm:351
STACK: Bio::SeqIO::_initialize /local/software/perl-modules/share/perl5/Bio/SeqIO.pm:491
STACK: Bio::SeqIO::fasta::_initialize /local/software/perl-modules/share/perl5/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /local/software/perl-modules/share/perl5/Bio/SeqIO.pm:372
STACK: Bio::SeqIO::new /local/software/perl-modules/share/perl5/Bio/SeqIO.pm:413
STACK: Bio::Roary::ExtractProteomeFromGFF::_fastatranslate /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/ExtractProteomeFromGFF.pm:138
STACK: Bio::Roary::ExtractProteomeFromGFF::_convert_nucleotide_to_protein /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/ExtractProteomeFromGFF.pm:152
STACK: Bio::Roary::ExtractProteomeFromGFF::fasta_file /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/ExtractProteomeFromGFF.pm:43
STACK: Bio::Roary::CommandLine::ExtractProteomeFromGff::run /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/CommandLine/ExtractProteomeFromGff.pm:79
I have Muscle and revtrans.py (v1.4) installed and in the PATH, but the .aln file produced contains only Ns. The GFFs are from Prokka and work fine for creating the gene lists, but somewhere the reverse translation goes wrong.
Any thoughts?
The commandline is: roary -v -i 90 -e *.gff.
Edit: the temporary files are created fine, so the revtrans.py works fine, it is just in the conversion of all the temporary files to the final alignment which goes wrong
All the binary dependencies are in Homebrew Science. The main issue is the Perl modules. Homebrew only checks whether they are installed. You can install your own in custom places etc., but the dependencies are trickier.
Have you considered fatpack?
http://search.cpan.org/~mstrout/App-FatPacker-0.010003/lib/App/FatPacker.pm
I'm happy to help get it into Brew. If all the Perl deps were already installed, could I run it from the untarred file somehow? Or customize where it is installed?
Hi
I got this error when I try to create a core alignment
Thanks
I didn't have much luck getting Marco's contrib/ plots script to work, so I ended up writing a basic SVG plotter for roary output. It's going to be a permanent part of Nullarbor but you can add it to your contrib/ folder if you think others would benefit.
https://raw.githubusercontent.com/tseemann/nullarbor/master/bin/roary2svg.pl
It's pretty basic... SVG sucks at fonts, so it's more confusing than it needs to be, and it may not even work 100% for crazy pan-genomes, not sure yet!
The --taxacol is so it's easy to adapt when you break the .csv file structure ;-)
I just use "convert foo.svg foo.png" to get an image if I need it.
Below are some details:
Usage: /home/tseemann/git/nullarbor/bin/roary2svg.pl [options] gene_presence_absence.csv > pan_genome.svg
--help This help.
--verbose! Verbose output (default '0').
--width=i Canvas width (default '1024').
--height=i Row height (and ~ font height) (default '20').
--taxacol=i Column in gpa.csv where taxa begin (default '14').
--panonly! Only non-core genes (default '0').
Is it possible to use query_pan_genome with a percentage, just as with roary? For example, when comparing two groups, to be able to set thresholds like "max 30% present in group 1, min 70% present in group 2" or vice versa. Maybe it's already there and I am missing it.
Thanks :)
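The thresholds described in this request can be sketched in Python as below. It assumes the presence/absence matrix has already been loaded into a mapping from gene name to the set of isolates carrying it, and that both groups are non-empty. None of the names are query_pan_genome options; they are made up for illustration.

```python
def presence_fraction(present_in, group):
    """Fraction of isolates in `group` that carry the gene."""
    return sum(1 for isolate in group if isolate in present_in) / len(group)

def differential_genes(presence, group1, group2, max_in_1=0.30, min_in_2=0.70):
    """Genes present in at most `max_in_1` of group1 and at least `min_in_2` of group2.

    `presence` maps gene name -> set of isolates in which it was found.
    """
    return sorted(
        gene for gene, isolates in presence.items()
        if presence_fraction(isolates, group1) <= max_in_1
        and presence_fraction(isolates, group2) >= min_in_2
    )

# Hypothetical genes and isolates.
presence = {"geneA": {"s3", "s4", "s5"}, "geneB": {"s1", "s2", "s3"}, "geneC": {"s1", "s4", "s5"}}
print(differential_genes(presence, ["s1", "s2"], ["s3", "s4", "s5"]))
```

Swapping the two group arguments gives the "vice versa" comparison.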
Hi
I used Roary 3.2.7 to find the orthologs in a collection of 650 E. coli isolates (command line: roary -e --mafft *.gff). To my surprise, identical sequences are placed in different clusters. (I have reproduced this error with Roary 3.3.4 in a smaller set of 11 genomes; there, only two out of 4000 genes are duplicates, so it appears to be fixed in part, but not completely.)
Total genes: 40335 according to summary_statistics
cat pan_genome_reference.fa |fasta2tab |cut -f 3 |wc -l
40335
cat pan_genome_reference.fa |fasta2tab |cut -f 3 |sort |uniq |wc -l
38458
This is not a few OGs; almost 2000 OGs with identical nucleotide sequences are being split up.
I see other, even weirder errors as well, e.g. with ~1700 Neisseria genomes:
Total genes: 46351 according to summary_statistics
cat pan_genome_reference.fa |fasta2tab |cut -f 3 |wc -l
158976 (far too high)
cat pan_genome_reference.fa |fasta2tab |cut -f 3 |sort|uniq |wc -l
53305 (triplication of the sequences??? and still more than 46351)
or with 15 or so Staphylococcus genomes:
Total genes: 4684 according to summary_statistics
cat mrsa/pan_genome_reference.fa |fasta2tab |cut -f 3 |wc -l
11696
cat mrsa/pan_genome_reference.fa |fasta2tab |cut -f 3 | sort|uniq |wc -l
7199 (duplicate sequences)
cat mrsa/pan_genome_reference.fa |fasta2tab |cut -f 1 | sort|uniq |wc -l
7202 (duplicate names as well?? the protein names are unique in my genome files)
In some cases there are duplicate genes in the pan genome reference file; in others there are genes with duplicate sequences. The numbers don't add up with the summary statistics file or the gene presence absence matrix. Also, by checking manually I found genes with exactly the same sequence placed in two different clusters in the gene_presence_absence file.
These are serious errors and I feel some release testing should have been implemented (especially as this is published software). Although the 3.3.4 release appears to fix some of the issues, it is still not correct. Could you please check whether you can reproduce this issue? Here's the 11-genome set:
http://klif.vet.uu.nl/mrsa/
(roary -e --mafft *.gff results in 4683 OGs, with 2 identical sequences as separate OGs)
fasta2tab transforms a FASTA file into a tab-delimited file. It is just some Perl code:
perl -e '$count=0;
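The duplicate check described above (total records versus unique sequences versus unique names) can also be done without the fasta2tab helper. The following Python sketch is an independent illustration, not the poster's Perl script. It assumes the record name is the first whitespace-delimited token of each FASTA header. The function names and the example records are hypothetical.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def duplicate_report(text):
    """Count records, distinct sequences, and distinct record names."""
    records = list(parse_fasta(text))
    seq_counts = Counter(seq.upper() for _, seq in records)
    # Assumption: the name is the first token of the header line.
    name_counts = Counter(h.partition(" ")[0] for h, _ in records)
    return {
        "total": len(records),
        "unique_sequences": len(seq_counts),
        "unique_names": len(name_counts),
    }


# Hypothetical input: id1 and id2 carry the same sequence.
example = """>id1 geneA
ATGC
>id2 geneB
ATGC
>id3 geneC
GG
TT
"""

report = duplicate_report(example)
print(report)
```

If "total" and "unique_sequences" differ, at least two records share a sequence. If "total" and "unique_names" differ, at least two records share a name.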
I primarily use the CSV files produced by Roary, and currently missing markers are represented by an empty cell when the file is opened in a spreadsheet. With large datasets (say 500-1000 genomes) this is sometimes awkward, especially if one wants to do formatting. If all cells were filled (e.g. N/A for missing values), this would make life a lot easier.
Is this possible? It saves doing Find/Replace of empty cells in Excel. I have done Find/Replace of "" in the text file, but this gives some errors and hence does not work well.
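Until something like this exists in Roary itself, the empty cells can be filled with a CSV-aware script. A plain-text replace of "" is fragile, because in CSV a doubled quote inside a quoted field stands for a literal quote character. The sketch below is one possible workaround under the assumption that the file is standard comma-separated CSV. The function name, placeholder, and example rows are illustrative only.

```python
import csv
import io


def fill_empty_cells(csv_text, placeholder="N/A"):
    """Return CSV text with every empty cell replaced by the placeholder."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # Quote every field so the output keeps a uniform, fully quoted style.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow([cell if cell.strip() else placeholder for cell in row])
    return out.getvalue()


# Hypothetical two-sample table with one missing marker per gene.
example = '"Gene","s1","s2"\n"geneA","","a_2"\n"geneB","b_1",""\n'

filled = fill_empty_cells(example)
print(filled)
```

For a real file, the same function could be applied to the contents of gene_presence_absence.csv and the result written to a new file, leaving the original untouched.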
Not sure what these relate to:
2015/11/05 09:32:30 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presen
sence.csv -c _clustered.clstr -i /mnt/seq/JOBS/J2014-06814/nullarbor.modern/roary/PGjpiKTwDC//_gff_files -f /mnt/s
BS/J2014-06814/nullarbor.modern/roary/PGjpiKTwDC//_fasta_files -t 11 --dont_create_rplots -v -j Parallel --proc
s 8 --group_limit 50000 -cd 99
Use of uninitialized value in require at /bio/perl5/lib/perl5/File/Slurper.pm line 32.
2015/11/05 09:32:32 Reinflate clusters
2015/11/05 09:32:32 Split groups with paralogs
Use of uninitialized value in require at /bio/perl5/lib/perl5/x86_64-linux-thread-multi/Encode.pm line 59.
Hi!
I get this recurring error message, also in the newest version.
I ran with the -v option to get a bit more info on when it happens, and it seems to be at the end of the FastTree part.
2015/11/13 09:40:24 Running command: /usr/local/bin/FastTree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
FastTree Version 2.1.8 SSE3, OpenMP (4 threads)
Alignment: accessory_binary_genes.fa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.00 seconds
Refining topology: 0 rounds ME-NNIs, 2 rounds ME-SPRs, 0 rounds ML-NNIs
Total branch-length 0.000 after 0.00 sec
Total time: 0.00 seconds Unique: 1/16 Bad splits: 0/0
Aligning each cluster
Use of uninitialized value in require at (eval 1957) line 1.
The next thing that happens is:
2015/11/13 09:41:08 Running command: protein_alignment_from_nucleotides -v pan_genome_sequences/acdA.fa pan_genome_sequences/ackA.fa pan_genome_sequences/acpP.fa pan_genome_sequences/acyP.fa pan_genome_sequences/addA.fa pan_genome_sequences/adh.fa pan_genome_sequences/adk.fa pan_genome_sequences/ahpC.fa pan_genome_sequences/alaS.fa pan_genome_sequences/alr.fa
After this it continues to run and finishes without further complaints. Is the error something one should be concerned about?
Best wishes,
Kaisa
Currently summary_statistics.txt is space-padded. I am currently splitting it on ":" to format it into a table for reports. Maybe make it TSV?
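As a stopgap, the colon-splitting approach can be wrapped in a small converter. The sketch below assumes each line has the shape "label: value" with space padding, and it splits on the last colon so that labels containing colons survive. The exact layout of summary_statistics.txt is an assumption here, and the example lines and counts are invented for illustration.

```python
def summary_to_tsv(text):
    """Convert 'label:   value' lines into tab-separated 'label<TAB>value' lines."""
    rows = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrecognised lines
        label, _, value = line.rpartition(":")
        rows.append(f"{label.strip()}\t{value.strip()}")
    return "\n".join(rows)


# Hypothetical space-padded input; the real file layout may differ.
example = "Core genes:      3000\nTotal genes:     4684\n"

tsv = summary_to_tsv(example)
print(tsv)
```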
Roary outputs a lot of files. I was hoping you would add an --outdir
option to place everything in a specific folder?
This would simplify pipelines so they don't have to do lots of mkdir/cd logic.
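Until such an option exists, a pipeline can get a similar effect by running the command with its working directory set to the target folder. The following is a minimal sketch of that workaround, not a Roary feature. The helper name run_in_outdir is hypothetical, and input paths are made absolute so they still resolve from the new working directory.

```python
import os
import subprocess
import sys
import tempfile


def run_in_outdir(command, outdir, inputs=()):
    """Create outdir if needed and run command there, appending absolute input paths."""
    os.makedirs(outdir, exist_ok=True)
    abs_inputs = [os.path.abspath(p) for p in inputs]
    return subprocess.run(list(command) + abs_inputs, cwd=outdir, check=True)


# Demonstration with a stand-in command that writes one file to its
# current directory, the way a tool without an output option would.
outdir = os.path.join(tempfile.mkdtemp(), "roary_out")
run_in_outdir(
    [sys.executable, "-c", "open('result.txt', 'w').write('ok')"],
    outdir,
)
result_path = os.path.join(outdir, "result.txt")
print(os.path.isfile(result_path))
```

With Roary installed, the call might look like run_in_outdir(["roary", "-e", "--mafft"], "roary_out", glob.glob("*.gff")), which mirrors the command line quoted earlier in this thread.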