orthomcl-pipeline's Introduction

OrthoMCL Pipeline

Automates running the OrthoMCL software from http://orthomcl.org/orthomcl/.

Usage

A brief overview of running the OrthoMCL pipeline is as follows:

  1. Run the following command to set up the database, verify the setup, and generate an OrthoMCL configuration file.

    perl scripts/orthomcl-setup-database.pl --user orthomcl_database_user --password orthomcl_database_password --host orthomcl_database_host --database orthomcl_database --outfile configure_outfile.conf [--no-create-database]
  2. Run the following command to start OrthoMCL.

    perl scripts/orthomcl-pipeline.pl -i input/ -o output/ -m orthomcl.conf --nocompliant

    Where input/ contains a set of gene annotations in FASTA format, one file per genome (e.g. genome1.fasta, genome2.fasta; file names must end in .fasta), output/ is the location to store the OrthoMCL output files, orthomcl.conf is the OrthoMCL configuration file generated in step 1 (the file passed to --outfile), and --nocompliant indicates that the FASTA headers are not yet OrthoMCL-compliant, so the pipeline adjusts gene names in the FASTA files to make them unique.
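
The input requirement above can be checked before starting a run. The following is a minimal Python sketch and not part of the pipeline: the helper names split_inputs and check_input_dir are hypothetical, and it assumes the .fasta suffix check is case-sensitive, which the text above does not state.

```python
import os


def split_inputs(filenames):
    """Separate file names ending in .fasta from everything else."""
    accepted = sorted(n for n in filenames if n.endswith(".fasta"))
    ignored = sorted(n for n in filenames if not n.endswith(".fasta"))
    return accepted, ignored


def check_input_dir(input_dir):
    """Print which files in input_dir look like pipeline inputs (one per genome)."""
    accepted, ignored = split_inputs(os.listdir(input_dir))
    for name in accepted:
        print("genome file:", name)
    for name in ignored:
        print("not a .fasta file, likely skipped:", name)
    return accepted
```

Calling check_input_dir("input/") before step 2 lists the genome files found and flags anything with a different extension, such as .fa or .faa.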

A walkthrough of using the OrthoMCL pipeline on example data can be found at https://github.com/apetkau/microbial-informatics-2014/tree/master/labs/orthomcl.

Installation

Please see the Installation documentation for details on how to install.

Detailed Usage

Usage: orthomcl-pipeline -i [input dir] -o [output dir] -m [orthomcl config] [Options]
	Options:
	-i|--input-dir: The input directory containing the files to process.
	-o|--output-dir: The output directory for the job.
	-s|--split:  The number of times to split the fasta files for blasting
	-c|--config:  The main config file (optional, overrides default config).
	-m|--orthomcl-config:  The orthomcl config file
	--compliant:  If fasta data is already compliant (headers match, etc) (default).
	--nocompliant:  If fasta data is not already compliant (headers match, etc).
	--print-config: Prints default config file being used.
	--print-orthomcl-config:  Prints example orthomcl config file.
	--yes: Automatically answers yes to every question (could overwrite/delete old data).
	--scheduler: Defined scheduler (sge or fork).
	--no-cleanup: Does not remove temporary tables from database.
	-h|--help:  Show help.

	Examples:
	orthomcl-pipeline -i input/ -o output/ -m orthomcl.config
		Runs orthomcl using the input fasta files under input/ and orthomcl.config as config file.
		Places data in output/.  Gets other parameters (blast, etc) from default config file.

	orthomcl-pipeline -i input/ -o output/ -m orthomcl.config -c orthomcl-pipeline.conf
		Runs orthomcl using the given input/output directories.  Overrides parameters (blast, etc)
		from file orthomcl-pipeline.conf.

	orthomcl-pipeline --print-config
		Prints default orthomcl-pipeline.conf config file (which can then be changed).

	orthomcl-pipeline --print-orthomcl-config
		Prints orthomcl example config file which must be changed to properly run.

	orthomcl-pipeline -i input/ -o output/ -m orthomcl.config --compliant
		Runs orthomcl with the given input/output/config files.
		Skips the orthomclAdjustFasta stage on input files.

	orthomcl-pipeline -i input/ -o output/ -m orthomcl.config --no-cleanup
		Runs orthomcl with the given input/output/config files.
		Does not clean up temporary tables.
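
When the pipeline is launched from another script, the command line can be assembled programmatically. This is a minimal Python sketch that uses only flags listed in the usage text above; the helper build_command and its parameter names are hypothetical and not part of orthomcl-pipeline.

```python
import shlex


def build_command(input_dir, output_dir, orthomcl_config,
                  compliant=True, split=None, scheduler=None, yes=False):
    """Build an orthomcl-pipeline argument list from the documented options."""
    cmd = ["orthomcl-pipeline", "-i", input_dir, "-o", output_dir,
           "-m", orthomcl_config]
    # --compliant is the documented default; --nocompliant adjusts the headers.
    cmd.append("--compliant" if compliant else "--nocompliant")
    if split is not None:
        cmd += ["-s", str(split)]
    if scheduler is not None:
        if scheduler not in ("sge", "fork"):
            raise ValueError("scheduler must be 'sge' or 'fork'")
        cmd += ["--scheduler", scheduler]
    if yes:
        # Answers yes to every question, so old data may be overwritten.
        cmd.append("--yes")
    return cmd


print(shlex.join(build_command("input/", "output/", "orthomcl.config",
                               compliant=False, scheduler="fork")))
# prints: orthomcl-pipeline -i input/ -o output/ -m orthomcl.config --nocompliant --scheduler fork
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with directory names that contain spaces.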

orthomcl-pipeline's People

Contributors

apetkau, jencabral

orthomcl-pipeline's Issues

YAML/Tiny.pm preventing set-up

When trying to run orthomcl-pipeline-setup.pl, the error message says: Can't locate YAML/Tiny.pm in @INC (you may need to install the YAML::Tiny module). When I try to install this module, it fails while installing another module, Devel::Leak::Module, and therefore I cannot get the YAML::Tiny module to install.

Verify README instructions for setting up database.

While testing this, I ran into issues when following the instructions for setting up the database at https://github.com/apetkau/orthomcl-pipeline/blob/development/INSTALL.md. In particular, after running perl scripts/orthomcl-setup-database.pl ... and then attempting to run the pipeline, I ran into errors like DBD::mysql::st execute failed: CREATE VIEW command denied to user 'ortho3'@'localhost' for table 'InterTaxonMatch' at /home/aaron/Projects/software/orthomcl-pipeline/orthomclsoftware-custom/bin/orthomclInstallSchema line 184, <F> line 12. I'm wondering if more privileges have to be added to the GRANT command for this all to work.

Can you look into this? Thanks.

Suggestion : Update to use blast+

Hi,
The legacy BLAST (blastall, formatdb) is not being updated anymore.
I would suggest updating the requirement to the BLAST+ software.
Best,
Greg

failing tests with t/test_pipeline.pl

Hi, I am trying to install OrthoMCL in my Google Colab environment.

I have successfully set up everything up to the orthomcl-setup-database.pl step, but I got errors when trying to run test_pipeline.pl, so I was hoping to get some advice about it.

When I tried to run !perl orthomcl-pipeline/t/test_pipeline.pl -m orthomcl.conf -s fork -t /tmp , I received error messages as below:

Test using scheduler fork

TESTING NON-COMPLIANT INPUT
TESTING FULL PIPELINE RUN 2
README:
Test with split=2 of fasta files.
Could not execute command /content/orthomcl-pipeline/t/../bin/orthomcl-pipeline --nocompliant --scheduler fork --yes -c /content/orthomcl-pipeline/t/data/basic/2/etc/orthomcl-pipeline.conf -i /content/orthomcl-pipeline/t/data/basic/2/input -o /tmp/orthomcl-pipeline.fEJpC0/output -m /tmp/orthomcl-pipeline.fEJpC0/orthomcl.config 2>/tmp/orthomcl-pipeline.fEJpC0/orthomcl-pipeline.err.log 1>/tmp/orthomcl-pipeline.fEJpC0/orthomcl-pipeline.out.log

The following are error log files:

!cat /tmp/orthomcl-pipeline.fEJpC0/orthomcl-pipeline.err.log

Error executing command: /content/orthomclSoftware-v2.0.9/bin/orthomclLoadBlast "/tmp/orthomcl-pipeline.fEJpC0/orthomcl.config" "/tmp/orthomcl-pipeline.fEJpC0/output/blast_load/similarSequences.txt" 1>/tmp/orthomcl-pipeline.fEJpC0/output/log/10.orthomclLoadBlast.out.log 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/10.orthomclLoadBlast.err.log. See logs /tmp/orthomcl-pipeline.fEJpC0/output/log/10.orthomclLoadBlast.out.log and /tmp/orthomcl-pipeline.fEJpC0/output/log/10.orthomclLoadBlast.err.log

!cat /tmp/orthomcl-pipeline.fEJpC0/orthomcl-pipeline.err.log

DBD::mysql::st execute failed: Loading local data is disabled; this must be enabled on both the client and server sides at /content/orthomclSoftware-v2.0.9/bin/orthomclLoadBlast line 39, <F> line 12.

!cat /tmp/orthomcl-pipeline.fEJpC0/orthomcl-pipeline.out.log

Starting OrthoMCL pipeline on: Sun May 19 09:51:53 2024
Git commit: d0bacb3bd0f655406e09bc7fc3f776a40a57c75c


=Stage 1: Validate Files =
Validating 2.fasta ... 5 sequences
Validating 1.fasta ... 5 sequences
Validating 3.fasta ... 5 sequences
Validated 3 files
Stage 1 took 0.00 minutes 

=Stage 2: Validate Database=
Warning: some tables exist already in database dbi:mysql:orthomcl:localhost:mysql_local_infile, user=orthomcl, database_name=orthomcl. Do you want to remove (y/n)? Executing: 'drop database orthomcl'
Executing: 'create database orthomcl'
Successfully removed old database entries
Stage 2 took 0.02 minutes 


=Stage 3: Load OrthoMCL Database Schema=
/content/orthomclSoftware-v2.0.9/bin/orthomclInstallSchema "/tmp/orthomcl-pipeline.fEJpC0/orthomcl.config" "/tmp/orthomcl-pipeline.fEJpC0/output/log/orthomclSchema.log" 1>/tmp/orthomcl-pipeline.fEJpC0/output/log/3.loadschema.stdout.log 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/3.loadschema.stderr.log
Stage 3 took 0.00 minutes 


=Stage 4: Adjust Fasta=
/content/orthomclSoftware-v2.0.9/bin/orthomclAdjustFasta 2 "/content/orthomcl-pipeline/t/data/basic/2/input/2.fasta" 1
/content/orthomclSoftware-v2.0.9/bin/orthomclAdjustFasta 1 "/content/orthomcl-pipeline/t/data/basic/2/input/1.fasta" 1
/content/orthomclSoftware-v2.0.9/bin/orthomclAdjustFasta 3 "/content/orthomcl-pipeline/t/data/basic/2/input/3.fasta" 1
Stage 4 took 0.00 minutes 


=Stage 5: Filter Fasta=
/content/orthomclSoftware-v2.0.9/bin/orthomclFilterFasta "/tmp/orthomcl-pipeline.fEJpC0/output/compliant_fasta" 10 20
Stage 5 took 0.00 minutes 


=Stage 6: Split Fasta=
splitting /tmp/orthomcl-pipeline.fEJpC0/output/blast_dir/goodProteins.fasta into 2 pieces
Stage 6 took 0.00 minutes 


=Stage 7: Format Database=
/content/blast-2.2.26/bin/formatdb -i "/tmp/orthomcl-pipeline.fEJpC0/output/blast_dir/goodProteins.fasta" -p "T" -l "/tmp/orthomcl-pipeline.fEJpC0/output/log/formatdb.log" 1>/tmp/orthomcl-pipeline.fEJpC0/output/log/7.format-stdout.log 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/7.format-stderr.log
Stage 7 took 0.00 minutes 


=Stage 8: Perform Blast=
performing blastsexecuting /content/blast-2.2.26/bin/blastall -p "blastp" -i "/tmp/orthomcl-pipeline.fEJpC0/output/blast_dir/goodProteins.fasta.1" -m "8" -d "/tmp/orthomcl-pipeline.fEJpC0/output/blast_dir/goodProteins.fasta" -o "/tmp/orthomcl-pipeline.fEJpC0/output/blast_results/blast_results.1" -v "100000" -F "F" -e "1e-5" -b "100000" 1>/tmp/orthomcl-pipeline.fEJpC0/output/log/blast/8.stdout.blast.1 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/blast/8.stderr.blast.1
executing /content/blast-2.2.26/bin/blastall -p "blastp" -i "/tmp/orthomcl-pipeline.fEJpC0/output/blast_dir/goodProteins.fasta.2" -m "8" -d "/tmp/orthomcl-pipeline.fEJpC0/output/blast_dir/goodProteins.fasta" -o "/tmp/orthomcl-pipeline.fEJpC0/output/blast_results/blast_results.2" -v "100000" -F "F" -e "1e-5" -b "100000" 1>/tmp/orthomcl-pipeline.fEJpC0/output/log/blast/8.stdout.blast.2 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/blast/8.stderr.blast.2
Stage 8 took 0.02 minutes 
done


=Stage 9: Parse Blast Results=
cat /tmp/orthomcl-pipeline.fEJpC0/output/blast_results/blast_results.* > /tmp/orthomcl-pipeline.fEJpC0/output/blast_load/all.fasta
/content/orthomclSoftware-v2.0.9/bin/orthomclBlastParser "/tmp/orthomcl-pipeline.fEJpC0/output/blast_load/all.fasta" "/tmp/orthomcl-pipeline.fEJpC0/output/compliant_fasta" 1>/tmp/orthomcl-pipeline.fEJpC0/output/blast_load/similarSequences.txt 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/9.parseBlast.log
Stage 9 took 0.00 minutes 


=Stage 10: Load Blast Results=
/content/orthomclSoftware-v2.0.9/bin/orthomclLoadBlast "/tmp/orthomcl-pipeline.fEJpC0/orthomcl.config" "/tmp/orthomcl-pipeline.fEJpC0/output/blast_load/similarSequences.txt" 1>/tmp/orthomcl-pipeline.fEJpC0/output/log/10.orthomclLoadBlast.out.log 2>/tmp/orthomcl-pipeline.fEJpC0/output/log/10.orthomclLoadBlast.err.log

Thank you for reading through all this. Looking forward to your reply,
Sarah

orthomclpairs table lock size error

Hello, when I run Stage 11 (OrthoMCL Pairs):
orthomclSoftware-v2.0.9/bin/orthomclPairs "orthomcl-pipeline/orthomcl.conf" "/orthomcl/log/11.orthomclPairs.log" "cleanup=yes" 1>/orthomcl/log/11.orthomclPairs.log.stdout 2>/orthomcl/log/11.orthomclPairs.log.stderr

it outputs this error:
DBD::mysql::st execute failed: The total number of locks exceeds the lock table size at /he_lab/share/data/local/orthomcl-pipeline/orthomclSoftware-v2.0.9/bin/orthomclPairs line 709, line 12.

Can you give me some suggestions and solutions? Thank you!

Stopped @ stage 8 BLAST4

My orthomcl-pipeline run stopped at stage 8 due to a power failure, since I am working from home. It was working fine until then without any errors.
Do you have any suggestions on how to continue or resume my run from this stage, rather than starting all over again? I am working on a total of 15 proteomes, the results file of each of the 4 BLAST runs is 18 GB, and the run has already taken a lot of time.

groups datafile further analysis

Hi there,
Thanks a lot! It is a very useful and powerful pipeline!
I used your pipeline to analyse my genome data (5 strains) and got a result file named groups.txt. I would like to know how to interpret this file, because I want to get the core gene set and the details of the accessory genome (strain-specific genes plus genes shared by several strains). Could you give me some advice? How can I get these results?
I am looking forward to your reply, and I would be grateful for your help!
Best regards,
Wenjing Cui
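
One way to approach this question is to classify each line of groups.txt by the set of genomes it covers. The following is a minimal Python sketch, assuming each line has the form "group_name: taxon|gene taxon|gene ..." (the taxon|gene member format matches the entries shown in the test output elsewhere on this page). The function names are hypothetical, and genes that were never clustered may not appear in groups.txt at all, so strain-specific counts from this sketch can be incomplete.

```python
from collections import defaultdict


def parse_groups(lines):
    """Map each group name to {taxon: [gene, ...]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, members = line.partition(":")
        taxa = defaultdict(list)
        for member in members.split():
            taxon, _, gene = member.partition("|")
            taxa[taxon].append(gene)
        groups[name.strip()] = dict(taxa)
    return groups


def classify(groups, all_taxa):
    """Split group names into core, shared (2+ taxa, not all) and specific (1 taxon)."""
    wanted = set(all_taxa)
    result = {"core": [], "shared": [], "specific": []}
    for name, taxa in groups.items():
        present = set(taxa)
        if present == wanted:
            result["core"].append(name)
        elif len(present) == 1:
            result["specific"].append(name)
        else:
            result["shared"].append(name)
    return result
```

With five strains, all_taxa would be the five taxon prefixes used in the compliant FASTA headers; the "core" list then holds the groups with at least one member from every strain.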

Multiple species

Hello

Could you please help me run your pipeline? I have only one gene, and I would like to find its orthologs in 136 sequenced plant genomes.

Do I need to download the sequenced genomes (into a folder), or does this tool work with databases?

Does this tool create and set up MySQL?

Please give me an example script as well.

Regards

Is there a problem with my input file?

=Stage 1: Validate Files =
Validating mfilter.fasta ... 47599 sequences
Error: file /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi2_in/bfilter.fasta contains a sequence (TRINITY_DN17801_c2_g3_i1.p2) containing non-protein alphabet (dna) at /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/bin/../scripts/orthomcl-pipeline.pl line 357, line 50938.
Validating bfilter.fasta ...
The above is the error reported during my run.
I looked at the sequence and didn't see anything wrong, but if I just delete that sequence, my file runs to completion.
Although I got the output file, I'd like to ask why Stage 1 reported the error.
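
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the validation step guesses the alphabet from sequence composition, a protein made up mostly of A, C, G and T residues (alanine, cysteine, glycine, threonine) can be mistaken for DNA. The Python sketch below illustrates such a heuristic. The 0.85 threshold and the letter set ACGTN are assumptions chosen for illustration and are not taken from the pipeline source.

```python
def dna_like_fraction(sequence):
    """Fraction of letters in the sequence that are also nucleotide codes."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        return 0.0
    hits = sum(1 for c in residues if c in "ACGTN")
    return hits / len(residues)


def may_be_mistaken_for_dna(sequence, threshold=0.85):
    """True when a composition-based guess would likely call the sequence DNA."""
    return dna_like_fraction(sequence) > threshold
```

Running may_be_mistaken_for_dna on the sequence named in the error message would show whether its composition is unusually nucleotide-like; short ORF predictions are the most likely to trip a check of this kind.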

failing tests with t/test_pipeline.pl

Hi, I am trying to install orthomcl-pipeline. I was able to run orthomcl-setup-database.pl, but it could not create the database, and I received the following error message:

$ perl orthomcl-setup-database.pl --user orthomcl --password password --host localhost --database orthomcl --outfile orthomcl.conf
Warning: file orthomcl.conf already exists ... overwrite? (Y/N) Y

Config file, orthomcl.conf will be overwritten
Connecting to mysql and creating database orthomcl on host localhost with user orthomcl ...
DBI connect('mysql:localhost:mysql_local_infile=1','orthomcl',...) failed: Access denied for user 'orthomcl'@'localhost' to database 'mysql' at orthomcl-setup-database.pl line 89.
error connecting to database at orthomcl-setup-database.pl line 93, line 1..

I was able to create the orthomcl database manually in MySQL and grant access. I can access MySQL from the command line with the user 'orthomcl' that I created, but it seems that execution from within orthomcl-pipeline is having trouble.

I passed the orthomcl-pipeline-setup.pl script successfully, but when I try the test run

$ perl t/test_pipeline.pl -m orthomcl.conf -s fork -t /tmp

Test using scheduler fork

TESTING NON-COMPLIANT INPUT
TESTING FULL PIPELINE RUN 5
README:
Tests case of one gene (in 1.fasta) not present in any other files but with a paralog in 1.fasta.
/Path_to_programs/orthomcl-pipeline/t/data/basic/5/groups/groups.txt contains entries (1|a 2|a 3|b 1|b 2|b 3|c 1|c 2|c 3|d 1|d 2|d 3|e 1|e 2|e 3|a 1|f 1|g) not in /tmp/orthomcl-pipeline.dD7Xqn/output/groups/groups.txt
not ok 1 - Expected matched returned groups file
Failed test 'Expected matched returned groups file'
at t/test_pipeline.pl line 204.

TESTING FULL PIPELINE RUN 3
README:
Tests case of one gene (in 1.fasta and 2.fasta) not present in other files.
/Path_to_programs/orthomcl-pipeline/t/data/basic/3/groups/groups.txt contains entries (1|a 2|a 3|b 1|b 2|b 3|c 1|c 2|c 3|d 1|d 2|d 3|e 1|e 2|e 3|a 1|f 2|f) not in /tmp/orthomcl-pipeline.oJnvbO/output/groups/groups.txt
not ok 2 - Expected matched returned groups file
Failed test 'Expected matched returned groups file'
at t/test_pipeline.pl line 204.

And it continues to fail 7 out of 8 tests. However, I cannot locate an error file to see what is going wrong. I have gathered that it is not generating files in the -t directory (i.e. /tmp/orthomcl-pipeline.dD7Xqn/output/groups/groups.txt, named in the error message above, doesn't exist). Is this a problem related to the MySQL configuration? Any advice would be greatly appreciated.

Thanks
Zaid

Error in step 10 when I run the orthomcl-pipeline with the sge scheduler

Hello,
I get an error when I run the orthomcl-pipeline with the sge scheduler.

In the log folder, I see the following files:
orthomclDumpPairs.log
11.orthomclPairs.log.stderr
11.orthomclPairs.log
11.orthomclPairs.log.stdout
10.orthomclLoadBlast.err.log
10.orthomclLoadBlast.out.log
9.parseBlast.log
blast
formatdb.log
7.format-stderr.log
7.format-stdout.log
split.log
filterFasta.log
adjustFasta.log
3.loadschema.stderr.log
3.loadschema.stdout.log
run.properties

In the 10.orthomclLoadBlast.err.log file, I got the error message:
DBI connect('orthomcl:localhost:mysql_local_infile=1','orthomcl',...) failed: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) at /share/nas1/xub/softwares/anaconda3/bin/../lib/perl/OrthoMCLEngine/Main/Base.pm line 56.

My cluster has 1 manager node, 1 storage node, and 3 compute nodes. On the manager node, I can connect to MySQL with the "mysql -uorthomcl -porthomcl" command; on a compute node, I can connect to MySQL with the "mysql -h ip -uorthomcl -porthomcl" command.
Does this error mean that I should install and configure another MySQL server on all compute nodes, or have I made some other mistake?

Any help is much appreciated.
Thanks.
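
The error above shows the job connecting to localhost through a Unix socket, which on a compute node means a MySQL server running on that node itself. A quick check is whether the connect string in the OrthoMCL config names localhost rather than the manager node. The Python sketch below assumes the layout dbi:mysql:<database>:<host>:<options>, which is how the connect string appears in pipeline log output in other reports on this page; the function names are hypothetical.

```python
def parse_connect_string(connect_string):
    """Split a DBI-style connect string into driver, database, host and options."""
    parts = connect_string.split(":")
    if len(parts) < 4 or parts[0].lower() != "dbi":
        raise ValueError("expected dbi:<driver>:<database>:<host>[:options]")
    return {
        "driver": parts[1],
        "database": parts[2],
        "host": parts[3],
        "options": parts[4:],
    }


def uses_local_server(connect_string):
    """True when the string points at a MySQL server on the machine running the job."""
    return parse_connect_string(connect_string)["host"] in ("localhost", "127.0.0.1")
```

If uses_local_server returns True for the config used on the cluster, jobs dispatched to compute nodes will look for a server there; pointing the host field at the manager node (the same address used with mysql -h) is the setting to try first.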

Error in test, step 3

Hi,

I'm receiving an error in step 3 during testing. It seems to be related to DBD::mysql not being found, despite the fact that it is installed. I am wondering if it is because I am using perlbrew to have perl 5.28 active, but I am not certain, since other modules are being found just fine.

Command:
perl t/test_pipeline.pl -m orthomcl.conf -s fork -t /tmp

The file 3.loadschema.stderr contains the following:

Can't locate DBD/mysql.pm in @INC (you may need to install the DBD::mysql module) (@INC contains: /Users/phelix/apps/orthomclsoftware-custom/bin/../lib/perl /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2/darwin-thread-multi-2level /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 .) at /Users/phelix/apps/orthomclsoftware-custom/bin/../lib/perl/OrthoMCLEngine/Main/Base.pm line 51, line 12.

But if I try to install DBD::mysql I am told it is up to date. If I turn off perlbrew, I am instead told other modules can't be located and that is very early on.

which MySQL version is appropriate

DBD::mysql::st execute failed: The used command is not allowed with this MySQL version at /home/galaxy/Downloads/orthomclSoftware-v2.0.9/bin/orthomclLoadBlast line 39, line 12.

slurm support

Hi,
I would like to know whether slurm is supported as a scheduler. Currently you mention sge or fork as option values. Is the sge option compatible with slurm?

Thanks

Joseph

error in step 8: orthomclBlastParser blastresult compliantFasta > similarSequences.txt

Here is my script:
#! /bin/bash
#SBATCH -J processdata.sh
#SBATCH -p cast
#SBATCH -N 1
#SBATCH --ntasks=1
#SBATCH --mem=48gb
#SBATCH --mail-type=ALL
#SBATCH --mail-user=[email protected]
module load OrthoMCL/2.0.9
orthomclBlastParser blastresult compliantFasta > similarSequences.txt
#end

and the Slurm output file shows:
couldn't find taxon for gene 'apis|J9JPJ9' at /apps/local/software/bioinformatics/OrthoMCL-2.0.9/bin/orthomclBlastParser line 105, line 1.
Here is my blastresult file in case you need it:
(base) [qulujiang@master orthomcl]$ head blastresult
fin|TRINITY_DN10008_c0_g1_i2.p1 apis|J9JPJ9 31.868 182 115 5 16 192 53 230 6.72e-20 91.7
fin|TRINITY_DN10008_c0_g1_i2.p1 gpap|GPPI003973 20.859 163 120 5 35 192 70 228 1.07e-10 66.2

As you can see, apis|J9JPJ9 is the subject ID of the first hit in my input file, so the run is stuck right at the start. Could you tell me how to fix the issue so I can move on to the next step?
Thanks a lot!
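
One way to narrow this down, under the assumption (not confirmed here) that orthomclBlastParser looks up every gene ID from the BLAST table among the headers of the FASTA files in the compliantFasta directory: list the BLAST IDs that have no matching FASTA header. The Python sketch below works on lines already read from those files; the function names are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Collect the ID (first word after '>') from each FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unknown_genes(blast_lines, known_ids):
    """Return query/subject IDs from tabular BLAST output that are not in known_ids."""
    missing = []
    for line in blast_lines:
        cols = line.split()
        if len(cols) < 2:
            continue
        # Columns 1 and 2 of tabular BLAST output are the query and subject IDs.
        for gene in cols[:2]:
            if gene not in known_ids and gene not in missing:
                missing.append(gene)
    return missing
```

If apis|J9JPJ9 shows up in the result, the compliantFasta directory probably has no apis file containing that exact ID, for example because the BLAST database was built from a different set of sequences than the compliant FASTA files.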

Re-run orthoMCL-pipeline from a given stage ?

Hello,
Is it possible to re-run orthoMCL-pipeline from a given step? The pipeline failed after the BLAST stage, which is obviously the most time-consuming step, so I would greatly prefer not to have to go through it again.
Thanks in advance, Damien

Error while connection with db

Hello,

Can you please help me figure out what's wrong with the OrthoMCL database connection?

The error is:
Connecting to mysql and creating database orthomcl1 on host localhost with user [email protected] ...
DBI connect('mysql:localhost:mysql_local_infile=1','[email protected]',...) failed: Access denied for user '[email protected]'@'localhost' (using password: YES) at ./orthomcl-setup-database.pl line 89.
error connecting to database at ./orthomcl-setup-database.pl line 93.

Does the MySQL version have an impact on the results?

Hello, when I run this process, it always reports an error at step 10. My MySQL version started out as 5.5, and I saw someone say that it had to be 5.7, so I changed the MySQL version from 5.5 to 5.6 and then to 5.7, but it still always fails at step 10. Is there any requirement for the MySQL version? Besides that, why does my step 10 fail?

=Stage 10: Load Blast Results=
/data/users/zhangjingjing/OrthoMCL/orthomclSoftware-v2.0.9/bin/orthomclLoadBlast "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/scripts/orthomcl.conf" "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/blast_load/similarSequences.txt" 1>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/log/10.orthomclLoadBlast.out.log 2>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/log/10.orthomclLoadBlast.err.log
Error executing command: /data/users/zhangjingjing/OrthoMCL/orthomclSoftware-v2.0.9/bin/orthomclLoadBlast "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/scripts/orthomcl.conf" "/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/blast_load/similarSequences.txt" 1>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/log/10.orthomclLoadBlast.out.log 2>/data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/log/10.orthomclLoadBlast.err.log. See logs /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/log/10.orthomclLoadBlast.out.log and /data/users/zhangjingjing/OrthoMCL/orthomcl-pipeline/ceshi11_out/log/10.orthomclLoadBlast.err.log

Stage 5: Filter Fasta error

Hi there,
I got an error at stage 5 indicating that the headers of the FASTA files are not correct:
processing file GCF_014441545.1_ROS_Cfam_1.0_protein.fasta
The ID on def line '>GCF_014441545.1_ROS_Cfam_1.0_protein|NP_001002930.1' is missing the prefix '0_protein|' 'GCF_014441545.1_ROS_Cfam_1.0_protein'

I downloaded the protein FASTA files from the NCBI genome annotation for each species. I wonder if I need to edit these files before processing them with the pipeline. Can you please let me know which FASTA format orthomcl-pipeline requires?
Thanks
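
The prefix in the error message ('0_protein|') is exactly the last run of word characters before .fasta in the file name, which suggests that dots inside the file name cut the expected taxon prefix short. This is an inference from the message, not a confirmed reading of the orthomclFilterFasta source. The Python sketch below reproduces that behaviour with a hypothetical regular expression and hypothetical function names.

```python
import re


def expected_prefix(filename):
    """Return the last run of word characters before '.fasta', or None."""
    match = re.search(r"(\w+)\.fasta$", filename)
    return match.group(1) if match else None


def header_has_prefix(header, filename):
    """Check whether a FASTA header starts with '>' + expected prefix + '|'."""
    prefix = expected_prefix(filename)
    return prefix is not None and header.startswith(">" + prefix + "|")


name = "GCF_014441545.1_ROS_Cfam_1.0_protein.fasta"
print(expected_prefix(name))
# prints: 0_protein
```

Under this reading, a file name whose stem contains only letters, digits and underscores (for example Cfam.fasta) would yield a prefix that matches headers of the form >Cfam|NP_001002930.1.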

=Stage 5: Filter Fasta= error

=Stage 5: Filter Fasta=
/home/orthomcl/opt/orthomclSoftware-v2.0.9/bin/orthomclFilterFasta "/home/orthomcl/test_data/zbl-out/compliant_fasta" 10 20
Failed for command /home/orthomcl/opt/orthomclSoftware-v2.0.9/bin/orthomclFilterFasta "/home/orthomcl/test_data/zbl-out/compliant_fasta" 10 20. Check log /home/orthomcl/test_data/zbl-out/log/filterFasta.log at /home/orthomcl/orthomcl-pipeline/bin/../scripts/orthomcl-pipeline.pl line 433, <> line 1.

I used new data, but I got the error above. I think maybe it's a data problem this time.
How can I fix it? Thanks! @apetkau

Stage 5: Filter Fasta error in Windows 11/Ubuntu22.02.4

I have Ubuntu22.02.4 running in VMware on Windows 11, and I have installed the orthomcl-pipeline in it.
I got an error at stage 5: “Failed for command /home/tian/orthomclsoftware-custom-master/bin/orthomclFilterFasta "/home/tian/List/Output/compliant_fasta" 10 20.”
The filterFasta.log named in the "Check log" message shows:
processing file GCF_000225385.1protein.fasta
The ID on def line '>GCF_000225385.1protein|WP_000068665.1' is missing the prefix '1protein|' 'GCF_000225385.1protein'

The orthomcl version is orthomclsoftware-custom. I don't know if the problem is related to Ubuntu, orthomcl, or something else.
Looking forward to your answer. Thank you very much!
@apetkau
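
A possible workaround, assuming the taxon prefix is derived from the part of the file name before .fasta and that a dot inside that part is what breaks the match (an inference from the log above, not a confirmed diagnosis): rename the input files so the stem contains only letters, digits and underscores, then re-run with --nocompliant so the headers are regenerated. The helper name in this Python sketch is hypothetical.

```python
import os
import re


def compliant_file_name(filename):
    """Replace every non-word character in the stem with an underscore."""
    stem, ext = os.path.splitext(filename)
    return re.sub(r"\W", "_", stem) + ext


print(compliant_file_name("GCF_000225385.1protein.fasta"))
# prints: GCF_000225385_1protein.fasta
```

Renaming with os.rename(old, compliant_file_name(old)) inside the input directory leaves the sequences untouched; only the name that the pipeline uses as the taxon label changes.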

Wrong syntax in scripts/orthomcl-pipeline.pl at line 294.

The warning message 'Odd number of elements ...' is a known issue that, as mentioned in a previous post, does not prevent OrthoMCL from running.
I had also ignored this warning in my Ubuntu14.04 (perl v5.18.2) environment because everything ran properly.

But an OrthoMCL runtime error occurs after upgrading to an Ubuntu16.04 (perl v5.22.1) environment.
I diagnosed this error and traced its cause to scripts/orthomcl-pipeline.pl at line 294.

***** scripts/orthomcl-pipeline.pl at line 294 *****
Before: Bio::SeqIO->new(-file => $file_path, 'Fasta')
After: Bio::SeqIO->new(-file => $file_path, -format => 'Fasta')

After modifying line 294 as above, OrthoMCL runs properly with no warning message.
The wrong syntax causes both the warning message 'Odd number of elements ...' and a critical error in my Ubuntu16.04 (perl v5.22.1) environment.

Can you please modify scripts/orthomcl-pipeline.pl at line 294 as described above?

Yuki

Questions about OrthoMCL software

Hi, I ran into a problem. How can I fix it? Thanks!
/orthomcl/orthomcl-pipeline-master/scripts/nml_parse_orthomcl.pl -i ./groups.txt -g genome-groups.txt -s --draw -o orthomcl-stats.txt --genes
What is the file for the -g parameter? The help text says: -g : A CSV file that contains list of groups to view their ortholog groups. How do I get this CSV file? Is it a generated file, or a file that I need to write myself? Thank you very much.

failing tests with t/test.pipeline.pl

Hi,
I am trying to install orthomcl-pipeline and I successfully wrote the path of all programs into orthomcl-pipeline.conf. Here is the content of the file.

blast:
  F: 'm S'
  b: '100000'
  e: '1e-5'
  v: '100000'
filter:
  max_percent_stop: '20'
  min_length: '10'
mcl:
  inflation: '1.5'
path:
  blastall: /usr/bin/blastall
  formatdb: /usr/bin/formatdb
  mcl: /home/huiyu/.linuxbrew/bin/mcl
  orthomcl: /home/huiyu/Desktop/orthomclSoftware-v2.0.9/bin
scheduler: fork
split: '4'

However, when I try to run the test, 7 of the 8 tests fail. Here is the report:
$ perl t/test_pipeline.pl -m orthomcl.conf -s fork -t /tmp
Test using scheduler fork

TESTING NON-COMPLIANT INPUT
TESTING FULL PIPELINE RUN 4
README:
Tests case of one gene (in 1.fasta) not present in any other files.
/home/huiyu/orthomcl-pipeline/t/data/basic/4/groups/groups.txt contains entries (1|e 2|e 3|a) not in /tmp/orthomcl-pipeline.98cOZk/output/groups/groups.txt
not ok 1 - Expected matched returned groups file

Failed test 'Expected matched returned groups file'

at t/test_pipeline.pl line 204.

TESTING FULL PIPELINE RUN 2
README:
Test with split=2 of fasta files.
/home/huiyu/orthomcl-pipeline/t/data/basic/2/groups/groups.txt contains entries (1|e 2|e 3|a) not in /tmp/orthomcl-pipeline.IcSyak/output/groups/groups.txt
not ok 2 - Expected matched returned groups file

Failed test 'Expected matched returned groups file'

at t/test_pipeline.pl line 204.

TESTING FULL PIPELINE RUN 5
README:
Tests case of one gene (in 1.fasta) not present in any other files but with a paralog in 1.fasta.
/home/huiyu/orthomcl-pipeline/t/data/basic/5/groups/groups.txt contains entries (1|e 2|e 3|a) not in /tmp/orthomcl-pipeline.eOWOw9/output/groups/groups.txt
not ok 3 - Expected matched returned groups file

Failed test 'Expected matched returned groups file'

at t/test_pipeline.pl line 204.

TESTING FULL PIPELINE RUN 3
README:
Tests case of one gene (in 1.fasta and 2.fasta) not present in other files.
/home/huiyu/orthomcl-pipeline/t/data/basic/3/groups/groups.txt contains entries (1|e 2|e 3|a) not in /tmp/orthomcl-pipeline.90SFfP/output/groups/groups.txt
not ok 4 - Expected matched returned groups file

Failed test 'Expected matched returned groups file'

at t/test_pipeline.pl line 204.

TESTING FULL PIPELINE RUN 1
README:
Test using with no splitting of fasta files.
/home/huiyu/orthomcl-pipeline/t/data/basic/1/groups/groups.txt contains entries (1|d 2|e 3|a) not in /tmp/orthomcl-pipeline.NWIJa1/output/groups/groups.txt
not ok 5 - Expected matched returned groups file

Failed test 'Expected matched returned groups file'

at t/test_pipeline.pl line 204.

TESTING COMPLIANT INPUT
TESTING FULL PIPELINE RUN 1
README:
Test non-compliant fasta input, no splitting.
ok 6 - No compliant parameter successfully caught
/home/huiyu/orthomcl-pipeline/t/data/compliant/1/groups/groups.txt contains entries (1|d 2|e 3|a) not in /tmp/orthomcl-pipeline.FqsGz5/output/groups/groups.txt
not ok 7 - Pipeline succeeded with compliant parameter. Expected matched returned groups file

Failed test 'Pipeline succeeded with compliant parameter. Expected matched returned groups file'

at t/test_pipeline.pl line 239.

/home/huiyu/orthomcl-pipeline/t/data/compliant/1/groups/groups.txt contains entries (1|d 2|e 3|a) not in /tmp/orthomcl-pipeline.FqsGz5/output/groups/groups.txt
not ok 8 - Pipeline succeeded with (default) compliant parameter. Expected matched returned groups file

Failed test 'Pipeline succeeded with (default) compliant parameter. Expected matched returned groups file'

at t/test_pipeline.pl line 248.

1..8

Looks like you failed 7 tests of 8.


Could someone please help me figure this out?
Thank you so much!
Huiyu

Add --force option on setup script

I want to be able to generate an ISO with your scripts installed on it, but every time I run the scripts I get prompted to overwrite existing software.

Can you please add a --force option to forcibly overwrite options so that I can execute the script non-interactively?

=Stage 6: Split Fasta= no fasta records identified, exiting.

Hello,

I encountered an issue while running the software. The specific output log is as follows:

Use of uninitialized value $file_count in concatenation (.) or string at /public/home/soft/orthomcl-pipeline/orthomcl-pipeline-master/bin/../scripts/orthomcl-pipeline.pl line 371.
Warning: directory "/public/home/pep_orthomcl/" already exists, are you sure you want to store data here [Y]? Starting OrthoMCL pipeline on: Thu Aug  8 16:25:17 2024
Git commit: unknown
=Stage 1: Validate Files =
Validated  files
Stage 1 took 0.00 minutes 
=Stage 2: Validate Database=
Warning: some tables exist already in database dbi:mysql:orthomcl:10.10.101.6:mysql_local_infile, user=orthomcl, database_name=orthomcl. Do you want to remove (y/n)? Executing: 'drop database orthomcl'
Executing: 'create database orthomcl'
Successfully removed old database entries
Stage 2 took 0.02 minutes 
=Stage 3: Load OrthoMCL Database Schema=
/public/home/soft/orthomclSoftware-v2.0.9/bin/orthomclInstallSchema "/public/home/soft/orthomcl-pipeline/orthomcl-pipeline-master/orthomcl.conf" "/public/home/pep_orthomcl/log/orthomclSchema.log" 1>/public/home/pep_orthomcl/log/3.loadschema.stdout.log 2>/public/home/pep_orthomcl/log/3.loadschema.stderr.log
Stage 3 took 0.02 minutes 
=Stage 4: Adjust Fasta=
Stage 4 took 0.00 minutes 
=Stage 5: Filter Fasta=
/public/home/soft/orthomclSoftware-v2.0.9/bin/orthomclFilterFasta "/public/home/pep_orthomcl/compliant_fasta" 10 20
Stage 5 took 0.00 minutes 
=Stage 6: Split Fasta=
splitting /public/home/pep_orthomcl/blast_dir/goodProteins.fasta into 4 pieces
no fasta records identified, exiting.

Upon checking, I found that goodProteins.fasta is empty, even though the files in the compliant_fasta directory (my input files) do exist.

Here is the command I ran:

orthomcl-pipeline -i /public/home/pep_sequences/ -o /public/home/pep_orthomcl/ -m /public/home/soft/orthomcl-pipeline/orthomcl-pipeline-master/orthomcl.conf --yes #--nocompliant

Here is a sample format of my input files:

>Solyc00T000002.1
MPVIPLFFFLLAFVWQAAVNCVMLTLKL......
>Solyc00T000003.1
MVTIRADEISNIIRERIEQYNREVKIVNTG.....
>Solyc00T000004.1

Could it be that the name format in my protein files is causing the issue, or is there a problem with my input files? I have only one input file, which contains protein sequences for all genes of the species that can be mapped to the reference genome. Could this be affecting the process?
Could you please help me understand what might be causing this issue? I look forward to your response and appreciate your assistance.

Thank you very much.
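
A possible cause worth ruling out: the command above has --nocompliant commented out, so the default --compliant applies, yet the sample headers (>Solyc00T000002.1) carry no taxon| prefix. The sketch below is a minimal Python check, assuming compliant headers have the form >taxon|protein_id (as in lycopersicum|XP_025883763.1 from another report on this page). The function name and the regular expression are illustrative and are not taken from the pipeline.

```python
import re

# Assumed shape of a compliant header: ">taxon|protein_id", with no whitespace
# inside either field. This pattern is an illustration, not the pipeline's own check.
COMPLIANT_HEADER = re.compile(r"^>[^|\s]+\|[^|\s]+")


def non_compliant_headers(lines):
    """Return the FASTA header lines that lack a 'taxon|id' prefix."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.startswith(">") and not COMPLIANT_HEADER.match(line)
    ]


# Headers taken from the reports on this page; sequences are shortened.
sample = [
    ">Solyc00T000002.1\n",
    "MPVIPLFFFLLAFVWQAAVNCVMLTLKL\n",
    ">lycopersicum|XP_025883763.1\n",
    "MVTIRADEISNIIRERIEQYNREVKIVNTG\n",
]
print(non_compliant_headers(sample))
```

If every header in the input is reported, rerunning with --nocompliant (so that the pipeline rewrites the headers itself) may be the simpler fix.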

Where can I find the core gene set?

Hi sir,
I have successfully run your scripts, but where can I find the core gene set?
My final results text file looks like this:
Number of genes seen in the following genomes:
CCBAU83666: 6266
CCBAU25509: 6218
CCBAU45436: 6097
CCBAU05684: 4798
CCBAU05631: 5411
Total genes seen: 28790
'Core' gene sets that is contained: 5 genomes has 3697 genes.
So where can I find this core gene set?
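
One way to recover the core set is to derive it from the OrthoMCL groups file. The following is a minimal Python sketch, assuming each line of that file has the form "group_name: genome|gene genome|gene ..." and treating a group as core when it has at least one member from every genome. The function name is hypothetical, and the location of the groups file inside the output directory should be checked in your own run.

```python
def core_groups(lines, genomes):
    """Return {group_name: [members]} for groups that include every genome."""
    wanted = set(genomes)
    core = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "group_name: genome|gene genome|gene ..."
        name, _, members = line.partition(":")
        ids = members.split()
        seen = {member.split("|", 1)[0] for member in ids}
        if wanted <= seen:
            core[name.strip()] = ids
    return core


# Made-up groups for three genomes A, B and C.
example = [
    "group_1: A|g1 B|g7 C|g3",
    "group_2: A|g2 B|g9",
    "group_3: A|g4 A|g5 B|g1 C|g8",
]
print(sorted(core_groups(example, ["A", "B", "C"])))
```

For the run above, the genomes argument would be the five names listed in the summary (CCBAU83666, CCBAU25509, and so on). Whether the count matches the 3697 reported depends on how the pipeline defines "core", which this sketch does not attempt to reproduce.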

Conda installation

Do you recommend installing with conda? Does conda also set up the MySQL database required for OrthoMCL?

Error executing

Hi,
I ran orthomcl-pipeline and got the following error. Could you help me solve it? Thanks.

=Stage 9: Parse Blast Results=
cat /home/software/orthomcl-pipeline/testout/blast_results/blast_results.* > /home/software/orthomcl-pipeline/testout/blast_load/all.fasta
/home/software/orthomclSoftware-v2.0.9/bin/orthomclBlastParser "/home/software/orthomcl-pipeline/testout/blast_load/all.fasta" "/home/software/orthomcl-pipeline/testout/compliant_fasta" 1>/home/software/orthomcl-pipeline/testout/blast_load/similarSequences.txt 2>/home/software/orthomcl-pipeline/testout/log/9.parseBlast.log
Error executing command: /home/software/orthomclSoftware-v2.0.9/bin/orthomclBlastParser "/home/software/orthomcl-pipeline/testout/blast_load/all.fasta" "/home/software/orthomcl-pipeline/testout/compliant_fasta" 1>/home/software/orthomcl-pipeline/testout/blast_load/similarSequences.txt 2>/home/software/orthomcl-pipeline/testout/log/9.parseBlast.log. See logs /home/software/orthomcl-pipeline/testout/blast_load/similarSequences.txt and /home/software/orthomcl-pipeline/testout/log/9.parseBlast.log

error at Stage 10: Load Blast Results

Hi,
When I run orthomcl-pipeline, it reports the error below at Stage 10: Load Blast Results.
DBD::mysql::st execute failed: The table 'SimilarSequences' is full at /data/liuyu/Software/orthomcl/bin/orthomclLoadBlast line 39, line 12.
Can you give me some advice?
Thanks in advance.

error at Stage 10: Load Blast Results

Hi,
When I run orthomcl-pipeline, it reports the error below at Stage 10: Load Blast Results.
DBD::mysql::st execute failed: Data too long for column 'SUBJECT_ID' at row 279 at /orthomclSoftware-v2.0.9/bin/orthomclLoadBlast line 39, <F> line 12.
Can you give me some advice?
Thanks,
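
The MySQL message says that a value was longer than the SUBJECT_ID column allows, so over-long sequence identifiers are a likely suspect. Below is a minimal Python sketch that lists FASTA identifiers exceeding a given length. The function name is hypothetical, and max_len is a placeholder: the real column width should be read from the SimilarSequences table definition in your database.

```python
def long_ids(lines, max_len):
    """Return (identifier, length) pairs for FASTA IDs longer than max_len."""
    hits = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # The identifier is the first whitespace-delimited token of the header.
            if fields and len(fields[0]) > max_len:
                hits.append((fields[0], len(fields[0])))
    return hits


# Made-up records; 60 is an arbitrary limit chosen for the demonstration.
records = [
    ">tax|short_id\n",
    "MKVL\n",
    ">tax|" + "x" * 70 + " some description\n",
    "MKVL\n",
]
for identifier, length in long_ids(records, 60):
    print(length, identifier)
```

Shortening the reported identifiers in the input files before rerunning is one option; widening the column is another, but that changes the OrthoMCL schema.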

containing non-protein alphabet (dna)

Hi,
When I run orthomcl-pipeline, it reports the error below.
sicum.fasta contains a sequence (lycopersicum|XP_025883763.1) containing non-protein alphabet (dna) at /scripts/orthomcl-pipeline.pl line 361, <GEN5> line 22972.
And the sequence of XP_025883763.1 is NNNNNNNNNNNTNTNNNTNNNNNNNTNNNNNNNDNNNDNNDNNNNNNNDNNNNNNNDTNNNNNNDNNNDDDNNDNNDNNNNNNNNNDNNNTNNNTNNNNNNNNNTNNTNNNNNNNNNNNTTTTTNNTNTNNTNNNNTNNNTNNNNNNNKNNNKNNNNNNTNTNNNNNNNNTNTNNNNNNNTYNNNNNNNTNNNNTNNNTNNNNNNNNNSNNSNNNNNNNDDDDDDNNDNNNNNDNNNNNNNDDNNNNNNNNNNNSNSNNSNNSSSSSSNNNNNNNNNNNDDDDDNNNNNNNNNKNNKNNNNNDDDDDNNNDNNNNNNNNNNNNDNNDNNNNDNNNDNYNDNNDDNNNNNNNNNNNSNNSNNNN
Why is this sequence not recognized as protein by the software?
Thanks,
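
A possible explanation, offered as an assumption about how the check works: every residue in the quoted sequence (N, T, D, K, Y, S) is also a valid IUPAC nucleotide code, so a check that guesses the alphabet from letter composition cannot tell this low-complexity protein apart from DNA. The Python sketch below only measures that overlap. It is not the pipeline's own test, and the function name is illustrative.

```python
# IUPAC nucleotide codes, including the ambiguity codes.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def nucleotide_fraction(seq):
    """Fraction of residues that are also valid IUPAC nucleotide codes."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(ch in IUPAC_NUCLEOTIDE for ch in seq) / len(seq)


# Opening residues of XP_025883763.1 as quoted above.
fragment = "NNNNNNNNNNNTNTNNNTNNNNNNNTNNNNNNNDNNNDNNDNNNNNNNDNNNNNNNDT"
print(nucleotide_fraction(fragment))  # 1.0: every letter doubles as a nucleotide code
print(nucleotide_fraction("MPVIPLFFFLLAFVWQAAVNCVMLTLKL"))
```

If this is the cause, removing or masking such sequences in the input would be a workaround.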

=Stage 1: Validate Files =

Hi,
I ran into an error. How can I fix it? Thanks!
=Stage 1: Validate Files =
Error: file /home/guming/test_data/zbl/U.fasta contains invalid header for "lcl|CP017025.1_prot_AOH49748.1_1": files not marked as compliant but found compliant header.
Perhaps try removing --nocompliant, or checking files. at /home/guming/orthomcl/bin/../scripts/orthomcl-pipeline.pl line 334, line 1.
Validating U.fasta ... [guming@localhost orthomcl]$

failing tests with t/test.pipeline.pl

Hi,
When I tried to test the pipeline, I ran into a problem.
"Can't load '/data/liuyu/Software/miniconda3/envs/LTR_retriever/lib/site_perl/5.26.2/x86_64-linux-thread-multi/auto/List/Util/Util.so' for module List::Util: /data/liuyu/Software/miniconda3/envs/LTR_retriever/lib/site_perl/5.26.2/x86_64-linux-thread-multi/auto/List/Util/Util.so: undefined symbol: Perl_drand48_r at /data/liuyu/Software/miniconda3/envs/LTR_retriever/lib/site_perl/5.26.2/x86_64-linux-thread-multi/XSLoader.pm line 96."
Could you please give me some advice?
Thanks in advance.
Liu

error at orthomcl-setup-database step

I successfully installed both OrthoMCL and orthomcl-pipeline by following this tutorial: http://darencard.net/blog/2018-01-12-orthomcl-tutorial/

orthomcl.conf was created without any problem.

However, when I tried to set up the OrthoMCL database with the following command, it gave me an error:

/usr/bin/perl /home/wat/Desktop/software/orthomcl-pipeline/scripts/orthomcl-setup-database.pl --user orthomcl_test --password password --host localhost --database orthomcl --outfile orthomcl.conf --no-create-database

The error message is shown below.
Connecting to database orthomcl on host localhost with user orthomcl_test ...
DBI connect('orthomcl:localhost:mysql_local_infile=1','orthomcl_test',...) failed: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) at /home/wat/Desktop/software/orthomcl-pipeline/scripts/orthomcl-setup-database.pl line 71.
error connecting to database (please ensure database exists and try again) at /home/wat/Desktop/software/orthomcl-pipeline/scripts/orthomcl-setup-database.pl line 75.

Has anyone had the same problem? Any help would be appreciated.

improvement: restart from specific step in case the pipeline stopped prematurely

Hi,
I have been running orthomcl-pipeline on a cluster, and the job has sometimes timed out or run out of memory.

At present, correct me if I am wrong, any restart of the pipeline causes all results from the previous run (including blast, which takes the longest) to be lost, as one would typically use the --yes option to answer yes to all questions.

It would be nice to have the possibility to restart at the step that failed.

Thanks

JML
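
The request above could be sketched roughly as follows: record each finished stage in a marker file and, on restart, skip the stages already recorded. This Python sketch only illustrates the idea. The pipeline does not offer this behavior as far as this report indicates, and every name here (run_stages, the marker file, the stage names) is hypothetical.

```python
import os
import tempfile


def run_stages(stages, state_file):
    """Run (name, func) stages in order, skipping names already in state_file."""
    done = set()
    if os.path.exists(state_file):
        with open(state_file) as fh:
            done = {line.strip() for line in fh if line.strip()}
    executed = []
    for name, func in stages:
        if name in done:
            continue
        func()  # if this raises, earlier stages stay recorded as done
        with open(state_file, "a") as fh:
            fh.write(name + "\n")
        executed.append(name)
    return executed


def failing():
    raise RuntimeError("simulated timeout")


with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, "stages.done")
    try:
        # First run: "blast" completes, "load" fails.
        run_stages([("blast", lambda: None), ("load", failing)], state)
    except RuntimeError:
        pass
    # Second run: "blast" is skipped, only "load" is executed.
    resumed = run_stages([("blast", lambda: None), ("load", lambda: None)], state)
    print(resumed)
```

A real implementation would also have to verify that the outputs of a recorded stage still exist and are complete before skipping it.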

Can't set up a new MySQL user account

When I log into the MySQL server as root and type

mysql> GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, CREATE VIEW, INDEX, DROP on *.* to orthomcl;

I get

ERROR 1133 (42000): Can't find any matching row in the user table

Typing flush privileges; doesn't help.
