
biobakery / phylophlan


Precise phylogenetic analysis of microbial isolates and genomes from metagenomes

Home Page: https://huttenhower.sph.harvard.edu/phylophlan

License: MIT License

Python 83.90% Shell 2.85% Perl 13.25%
python tools biobakery phylogenetic-trees

phylophlan's Issues

write_default_configs.sh missing

I've forked phylophlan and am planning to write some unit tests, since the repo appears to have none. I'm working on a shortened version of Example 01, but am running into this error:

[e] "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan_configs/" folder does not exists
Creating folder "output_references"
Creating folder "output_references/tmp"
"low-fast" preset
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/bin/phylophlan", line 11, in <module>
    load_entry_point('PhyloPhlAn==3.0', 'console_scripts', 'phylophlan')()
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3169, in phylophlan_main
    project_name = check_args(args, sys.argv, verbose=args.verbose)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 492, in check_args
    elif os.path.isfile(os.path.join(args.configs_folder, args.config_file)):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)

It appears that phylophlan is looking for a config file that doesn't exist in the bioconda install. The phylophlan CLI help doc states:

                      The configuration file to load, four ready-to-use
                        configuration files can be generated using the
                        "write_default_configs.sh" script present in the
                        "configs" folder (default: None)

...but write_default_configs.sh doesn't exist.

A couple of general comments about the phylophlan code:

  1. The bash scripts in the examples use the *.py extensions for the phylophlan executables, but the executables, once installed, don't have the *.py extension.
  2. Why not use the logging package instead of creating custom info() and error() functions?
  3. Do you see any problem with adding a --max_proteins param to phylophlan_setup_database in order to reduce the execution time for testing (e.g., a maximum of 100 core proteins)?
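
To illustrate the second comment, here is a minimal sketch of how the standard logging package could produce bracketed, level-prefixed messages similar to the "[e]" lines above. The logger name and the helper make_logger are hypothetical and not part of PhyloPhlAn; the prefix comes out in upper case ("[E]", "[I]") because it is taken from the first letter of the level name.

```python
import logging
import sys


def make_logger(verbose: bool = False) -> logging.Logger:
    """Return a logger that prints '[E] message' / '[I] message' to stderr."""
    # "phylophlan_sketch" is a made-up logger name used only for this example.
    logger = logging.getLogger("phylophlan_sketch")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # attach the handler only once
        handler = logging.StreamHandler(sys.stderr)
        # %(levelname).1s keeps only the first letter of the level name.
        handler.setFormatter(logging.Formatter("[%(levelname).1s] %(message)s"))
        logger.addHandler(handler)
    return logger


log = make_logger(verbose=True)
log.info('Creating folder "output_references"')
log.error("configs folder does not exist")
```

With this approach the verbosity switch becomes a log level rather than an argument passed to every info() call.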

[e] build_gene_tree crashed

My command line:

phylophlan_write_default_configs.sh
phylophlan \
    -i ../../F-06-MAG/03_modify/7_final/ \
    -d phylophlan --diversity high -f supertree_aa.cfg \
    --genome_extension .fa \
    --maas ~/Software/anaconda3/envs/phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_substitution_models/phylophlan.tsv \
    --verbose

stderr:


[e] Command '['~/Software/anaconda3/envs/phylophlan/bin/FastTree', '-quiet', '-pseudo', '-spr', '4', '-mlacc', '2', '-slownni', '-fastest', '-no2nd', '-mlnni', '4', '-lg', '-out', '~/Work/2020-09-MgAffect/Analyze/phylophlan/7_final_phylophlan/tmp/gene_tree1/p0197.tre', '7_final_phylophlan/tmp/sub/p0197.aln']' returned non-zero exit status 1.

[e] error while building gene tree
    command_line: ~/Software/anaconda3/envs/phylophlan/bin/FastTree -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -lg -out 7_final_phylophlan/tmp/gene_tree1/p0197.tre 7_final_phylophlan/tmp/sub/p0197.aln
           stdin: None
          stdout: None
             env: {'CONDA_SHLVL': '3', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.m4a=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.oga=01;36:*.opus=01;36:*.spx=01;36:*.xspf=01;36:', 'CONDA_EXE': '~/Software/anaconda3/bin/conda', 'SSH_CONNECTION': '192.168.137.1 63214 192.168.137.128 22', '_': '~/Software/anaconda3/envs/phylophlan/bin/phylophlan', 'LANG': 'en_US.UTF-8', 'HISTCONTROL': 'ignoredups', 'HOSTNAME': 'localhost.localdomain', 'OLDPWD': '../../', 'COLORTERM': 'truecolor', 'CONDA_PREFIX': '~/Software/anaconda3/envs/phylophlan', 'DOTNET_ROOT': '/usr/lib64/dotnet', '_CE_M': '', 'XDG_SESSION_ID': '2', 
'DOTNET_BUNDLE_EXTRACT_BASE_DIR': '~/.cache/dotnet_bundle_extract', 'USER': 'hwrn', 'CONDA_PREFIX_1': '~/Software/anaconda3', 'CONDA_PREFIX_2': '~/Software/anaconda3/envs/mylib', 'CONDA_PYTHON_EXE': '~/Software/anaconda3/bin/python', 'VSCODE_GIT_ASKPASS_NODE': '~/.vscode-server/bin/d2e414d9e4239a252d1ab117bd7067f125afd80a/node', 'TERM_PROGRAM': 'vscode', 'SSH_CLIENT': '192.168.137.1 63214 22', 'TERM_PROGRAM_VERSION': '1.50.1', 'TMUX': '/tmp/tmux-1000/default,129144,0', 'XDG_DATA_DIRS': '~/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share', '_CE_CONDA': '', 'VSCODE_IPC_HOOK_CLI': '/run/user/1000/vscode-ipc-f567ba04-d1be-40de-9dbe-205a254be7c3.sock', 'CONDA_PROMPT_MODIFIER': '(phylophlan) ', 'MAIL': '/var/spool/mail/hwrn', 'VSCODE_GIT_ASKPASS_MAIN': '~/.vscode-server/bin/d2e414d9e4239a252d1ab117bd7067f125afd80a/extensions/git/dist/askpass-main.js', 'SHELL': '/bin/bash', 'TERM': 'screen', 'TMUX_PANE': '%0', 'SHLVL': '4', 'VSCODE_GIT_IPC_HANDLE': '/run/user/1000/vscode-git-723a0c4359.sock', 'LOGNAME': 'hwrn', 'DBUS_SESSION_BUS_ADDRESS': 'unix:path=/run/user/1000/bus', 'GIT_ASKPASS': '~/.vscode-server/bin/d2e414d9e4239a252d1ab117bd7067f125afd80a/extensions/git/dist/askpass.sh', 'XDG_RUNTIME_DIR': '/run/user/1000', 'PATH': '~/Software/anaconda3/envs/phylophlan/bin:~/.vscode-server/bin/d2e414d9e4239a252d1ab117bd7067f125afd80a/bin:~/Software/anaconda3/condabin:~/.local/bin:~/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:~/.dotnet/tools', 'CONDA_DEFAULT_ENV': 'phylophlan', 'HISTSIZE': '1000', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s'}

[e] Command '['~/Software/anaconda3/envs/phylophlan/bin/FastTree', '-quiet', '-pseudo', '-spr', '4', '-mlacc', '2', '-slownni', '-fastest', '-no2nd', '-mlnni', '4', '-lg', '-out', '7_final_phylophlan/tmp/gene_tree1/p0197.tre', '7_final_phylophlan/tmp/sub/p0197.aln']' returned non-zero exit status 1.

[e] error while building gene tree
    {'program_name': '~/Software/anaconda3/envs/phylophlan/bin/FastTree', 'params': '-quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -lg', 'output': '-out', 'command_line': '#program_name# #params# #output# #input#'}
    PROTCATLG
    7_final_phylophlan/tmp/sub/p0197.aln
    7_final_phylophlan/tmp/gene_tree1
    p0197.tre

[e] Command '['~/Software/anaconda3/envs/phylophlan/bin/FastTree', '-quiet', '-pseudo', '-spr', '4', '-mlacc', '2', '-slownni', '-fastest', '-no2nd', '-mlnni', '4', '-lg', '-out', '7_final_phylophlan/tmp/gene_tree1/p0197.tre', '7_final_phylophlan/tmp/sub/p0197.aln']' returned non-zero exit status 1.

[e] build_gene_tree crashed

Then I tried

FastTree -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -lg -out 7_final_phylophlan/tmp/gene_tree1/p0197.tre 7_final_phylophlan/tmp/sub/p0197.aln

stderr:

Non-unique name 'maxbin2_107.032' in the alignment

First 20 lines of 7_final_phylophlan/tmp/sub/p0197.aln:

>76_sub
GTRLKMIFYLMMSGIPPGLAAEKDR
>55_sub22
SPERHEGHHLGHAPVPAGL------
>maxbin2_107.027_sub
GTRLKMIYYLMLAGIPPGIAAEKDR
>82_sub
GTRLKMIYYLMLAGIPPGLAAEKDR
>maxbin2_107.013_sub
GTRLKMIYYLMLAGIPPGLAAEKDR
>maxbin2_107.028_sub
GTRLKMVYYLMLAGIPPGLAAEKDR
>maxbin2_107.022
GARLKMIYYLMLAGIPPGLAAEKDR
>maxbin2_107.032
GTRLKMIYYLMMADIPPGLAAEKDR
>maxbin2_107.032
GARVREGHHLGHPTLAHYLRARHDR
>maxbin2_107.026
GTRLKMAYYLMLAGIPPGLAAEKDR

What should I do to resolve this error? Also, how can I tell which files are generated by which external-software command?

Thanks for the program, and thanks for your advice!
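
The FastTree message points at a repeated sequence name, and the excerpt above does contain maxbin2_107.032 twice. A small sketch for listing repeated names in an alignment before handing it to a tree builder; the function name duplicate_fasta_ids is made up for illustration, and the sample data is copied from the excerpt.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return sequence IDs that occur more than once, in first-seen order."""
    counts = Counter(
        line[1:].split()[0]           # ID = first token after '>'
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return [seq_id for seq_id, n in counts.items() if n > 1]


# Three records taken from the p0197.aln excerpt in this issue.
alignment = """>maxbin2_107.022
GARLKMIYYLMLAGIPPGLAAEKDR
>maxbin2_107.032
GTRLKMIYYLMMADIPPGLAAEKDR
>maxbin2_107.032
GARVREGHHLGHPTLAHYLRARHDR
""".splitlines()

print(duplicate_fasta_ids(alignment))
```

For a file on disk, the same function can be called on an open file handle, since it only iterates over lines.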

Both Nucleotide Databases Appear Empty

Hello! Cool program. I'm trying to get it to work with the supermatrix_nt.cfg file on some of my SAGs, MAGs, and reference sequences acquired from GenBank. I am having issues at the first makeblastdb stage. I have tried both options for -d and get the following error for each:

[e] Command '['/home/jmcgonigle/.conda/envs/phylophlan/bin/makeblastdb', '-parse_seqids', '-dbtype', 'nucl', '-in', 'phylophlan_databases/amphora2/amphora2.fna', '-out', 'phylophlan_databases/amphora2/amphora2']' returned non-zero exit status 1.
[e] Command '['/home/jmcgonigle/.conda/envs/phylophlan/bin/makeblastdb', '-parse_seqids', '-dbtype', 'nucl', '-in', 'phylophlan_databases/phylophlan/phylophlan.fna', '-out', 'phylophlan_databases/phylophlan/phylophlan']' returned non-zero exit status 1.

This seems related to the fact that both databases' .fna files are empty. I added the --verbose option to see where these files are downloaded from, and they are also empty when I download them from the source myself and inspect the contents.

UnboundLocalError: local variable 'input_fna_clean' referenced before assignment

Hi developer,
I ran into the error below. How can I deal with it? Also, regarding the -d argument: after the first time I ran the command, the database file phylophlan.dmnd was produced. Can I just specify the path of the database, like /path/phylophlan_databases/phylophlan/phylophlan.dmnd, for my next run? I would appreciate your help.

(phylophlan) [root@instance-xvlawteu phylophlan3]# phylophlan -i /home/prodigal/output/0827fna/ --accurate --diversity low -d phylophlan -f supermatrix_aa.cfg -t a --nproc 2 --output_folder /home/phylophlan3/phy_output/ --proteome_extension faa
Loading files from "/home/prodigal/output/0827fna"
Traceback (most recent call last):
  File "/root/anaconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
    sys.exit(phylophlan_main())
  File "/root/anaconda3/envs/phylophlan/lib/python3.8/site-packages/phylophlan/phylophlan.py", line 3213, in phylophlan_main
    standard_phylogeny_reconstruction(project_name, configs, args, db_dna, db_aa)
  File "/root/anaconda3/envs/phylophlan/lib/python3.8/site-packages/phylophlan/phylophlan.py", line 2910, in standard_phylogeny_reconstruction
    if input_fna_clean:
UnboundLocalError: local variable 'input_fna_clean' referenced before assignment
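
For readers unfamiliar with this exception: Python raises UnboundLocalError when a function reads a local variable that was only assigned inside a branch that did not run. The sketch below reproduces the general pattern and the usual fix. It is not PhyloPhlAn's actual code; the function names and the has_genomes flag are invented for illustration, and only the variable name is borrowed from the traceback.

```python
def collect_inputs_buggy(has_genomes: bool) -> bool:
    if has_genomes:
        input_fna_clean = ["genome.fna"]
    # If the branch above was skipped, the name was never bound in this scope.
    return bool(input_fna_clean)


def collect_inputs_fixed(has_genomes: bool) -> bool:
    input_fna_clean = []  # bind the name on every code path
    if has_genomes:
        input_fna_clean = ["genome.fna"]
    return bool(input_fna_clean)


try:
    collect_inputs_buggy(False)
except UnboundLocalError as exc:
    print("buggy:", exc)

print("fixed:", collect_inputs_fixed(False))
```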

Can't locate phylophlan_write_default_configs.sh

Hi Francesco!

I just downloaded PhyloPhlAn 3 through Conda but can't seem to locate phylophlan_write_default_configs.sh. It is not on the PATH.

Could you check this out?

Many thanks,
Sander

Fragmentary entries not accepted by RAxML

Hello Francesco,
thank you for the great documentation for PhyloPhlAn. I looked for a solution to my issue but could not find one, so I opened a new issue. When analyzing a couple of MAGs, I see entries consisting only of gaps in the subsampled alignments:

>3300_18
YVVDLGDLTSWPDDRIG
>3376_74
-----------------
>33012_7
FAVDLGDLSSWPGDRIS

Later, RAxML writes files (e.g. RAxML_info.p0111.tre) to the gene_tree2 folder containing an error for each MAG whose sequence consists only of gaps:

ERROR: Sequence 3376_74 consists entirely of undetermined values which will be treated as missing data

My command is below:

phylophlan -i /mnt/DATA01/user/mags/ \
-d phylophlan \
--diversity high \
-f /mnt/DATA01/user/mags/supertree_aa.cfg \
--proteome_extension .faa \
--fast \
--nproc 20 \
--maas /mnt/DATA01/user/mags/phylophlan.tsv \
--submod_folder /mnt/DATA01/user/mags/ \
--configs_folder /mnt/DATA01/user/mags/

I assume remove_fragmentary_entries works correctly, since the marker is not completely missing in the non-subsampled alignment:

>3300_18
EKN-ENFDWNAFENY-DQPEQITEAYDKTLSNVAVGEVVEGTVTAITKREVLVNIYSEGV
IPVSEFRYNP---DLKVDKIEVYVESAEDKNQLALHKKARQLKSDRVNEALEKDEIIKGY
IKCRTKMIVDVGIEALGQIDVKPIRDYDIYVDKTMEFKVVKINQEFRNVVVHKALIEAEL
EAQKQVIMSKLEKQILETKNITSYVVDLGVDLIITDLSWGRVNHPEEIVSLDQKINVILD
FDDQKKRIAGLQLTPHEALDPNLKVGDKVKGRVVVMADYAVEIAPVEIVEMSSQHLRSAQ
EFMKVGDEVEAVILTLDREERKMSGIKQLTPDENIETKYPVTKCTAKVRNFNFVVEIEEG
IDGLIISLSTKKVKHPGEFTQVADIDVVVEIDKENRRLSLHKQLEENWNGFEAQFPVESI
HEGTITEMTDKAVVALGNIEGFCPARQLVEDG-----TTPKVGDKLNFKVIEFSKATKRI
TLLRTYDDARREA-AAAATKTKASEKTTLGDI--------
>3376_74
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
--------------E------------IVLGEIVDITDFAVRIGPTDLLQV---------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
----------------------------------------
>33012_7
---ENDL--LSLEA------TM---LEN-------GQLVEGTVVRVDKDEVLIDIYSEGV
IPPRELSN-PTEV-VSLERIEAVVLQREDKERLVLKKRAQYERASKIEEVKEADGVVTGS
VIEVVKLIVDIGLRGLALVELRRVRDLQPYIGTNVEAKIIELDKNRNNVVLRRAWLEENQ
KEQREEFLHDLRPEVRKVSSVVNFAVDLGMDLIVSELSWKHVDHPGSIVTVGDEIDQVLE
IDMSRERISSLATQQDQEFATAHQVGELVYGRVTKLVPFAVQVGEIEVIEMSAHHVELPE
QVVTPAEELWVKIIDLDLQRRRISSIKQ-------------------A---------AEG
-----VA---------AE-----E-----HFGE-----------ADD-------------
-EGNIG----------G--DDSE-------------------------------------
----------------------------------------

Do you have any suggestions on how to proceed with RAxML?
Thank you in advance!
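
One way to see which entries would trigger the RAxML message is to scan the subsampled alignment for sequences made up only of gap characters. The sketch below does that on the three records quoted at the top of this issue. The helper names parse_fasta and only_gap_entries are hypothetical, and treating only "-" as a gap is an assumption; RAxML may also count other symbols as undetermined.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {sequence_id: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def only_gap_entries(records: Dict[str, str], gap_chars: str = "-") -> List[str]:
    """Return IDs whose sequence contains nothing but gap characters."""
    return [sid for sid, seq in records.items() if set(seq) <= set(gap_chars)]


# Records copied from the subsampled alignment shown in this issue.
subsampled = """>3300_18
YVVDLGDLTSWPDDRIG
>3376_74
-----------------
>33012_7
FAVDLGDLSSWPGDRIS
""".splitlines()

print(only_gap_entries(parse_fasta(subsampled)))
```

Whether such entries should be dropped before the gene-tree step, or handled by an existing PhyloPhlAn option, is a question for the maintainers; this only identifies them.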

phylophlan_write_default_configs.sh doesn't write threads parameter for RAxML

Dear Phylophlan Team!

Thanks for the 3.0 release and all the effort to make it work better.
phylophlan_write_default_configs.sh does not add a threads setting to the configs. I spent several hours waiting for the tree to be processed before noticing that RAxML was running single-threaded.

The default config section looks as follows:

[tree2]
program_name = /usr/bin/raxmlHPC-PTHREADS-SSE3
params = -p 1989 -m GTRCAT
database = -t
input = -s
output_path = -w
output = -n
version = -v
command_line = #program_name# #params# #threads# #database# #output_path# #input# #output#

The following line needs to be added:
threads = -T
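
Until the script is fixed, the line can be added by hand or with a few lines of Python. The sketch below uses the standard configparser module on the [tree2] section quoted above; the function name add_threads_option is made up for illustration, and it assumes the generated .cfg files are plain INI files that configparser can read. Interpolation is disabled so that special characters in values are left untouched.

```python
import configparser
import io

# The [tree2] section exactly as quoted in this issue.
CONFIG_TEXT = """[tree2]
program_name = /usr/bin/raxmlHPC-PTHREADS-SSE3
params = -p 1989 -m GTRCAT
database = -t
input = -s
output_path = -w
output = -n
version = -v
command_line = #program_name# #params# #threads# #database# #output_path# #input# #output#
"""


def add_threads_option(config_text: str, section: str = "tree2", flag: str = "-T") -> str:
    """Return the config text with 'threads = <flag>' added to the section if missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    if not parser.has_option(section, "threads"):
        parser.set(section, "threads", flag)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(add_threads_option(CONFIG_TEXT))
```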

/phylophlan_configs/" folder does not exists and unable to download phylophlan_databases.txt?dl=1

Hi

I would like to use phylophlan, but I get an error message when I run the following command:

phylophlan -i AR_and_Bac/ -d phylophlan --diversity low -f supermatrix_nt.cfg --nproc 42

[e] "/home/emma/anaconda2/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_configs/" folder does not exists
[e] unable to download "https://www.dropbox.com/s/x7cvma5bjzlllbt/phylophlan_databases.txt?dl=1"

I installed phylophlan using the following two commands:

1: conda create -n "phylophlan" -c bioconda phylophlan=3.0
then: conda activate phylophlan
2: phylophlan_write_default_configs.sh [output_folder]
and then test:
phylophlan --version
PhyloPhlAn version 3.0.51 (11 May 2020)

but I get the error above when I try the basic usage.

I don't know the reason. Could you help me fix this?

Thank you very much!
Looking forward to your reply!

Core dump occurred

Hi Dr. Asnicar,

When I ran phylophlan with multiple threads, the following error occurred during mapping:

[e] Command '['/root/anaconda3/envs/phylophlan/bin/diamond', 'blastp', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', '/mnt/hgfs/share/phylophlan3/output/0827faa_phylophlan/tmp/clean_aa/Ahniella_affigens_D13_T_GCA_003015185.1_prodigal_protein.faa', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', '/mnt/hgfs/share/phylophlan3/output/0827faa_phylophlan/tmp/map_aa/Ahniella_affigens_D13_T_GCA_003015185.1_prodigal_protein.b6o.bkp']' died with <Signals.SIGABRT: 6>.

[e] gene_markers_identification crashed

Then I ran diamond directly with 6 threads:

diamond blastp --threads 6 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0 --query /mnt/hgfs/share/phylophlan3/0827faa/Dokdonella_koreensis_DS-123_T_GCA_001632775.1_prodigal_protein.faa --db phylophlan_databases/phylophlan/phylophlan.dmnd --out /mnt/hgfs/share/phylophlan3/output/0827faa_phylophlan/tmp/map_aa/Dokdonella_koreensis_DS-123_T_GCA_001632775.1_prodigal_protein.b6o.bkp

It aborted with a core dump:
#CPU threads: 6
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /mnt/hgfs/share/phylophlan3/output/0827faa_phylophlan/tmp/map_aa
Opening the database... [0.000605s]
#Target sequences to report alignments for: unlimited
Opening the input file... [0.000501s]
Opening the output file... [0.000501s]
Loading query sequences... [0.005972s]
Masking queries... [0.060089s]
Building query seed set... [0.000322s]
Algorithm: Double-indexed
Building query histograms... [0.03639s]
Allocating buffers... [5.8e-05s]
Loading reference sequences... [0.133974s]
Building reference histograms... [4.28004s]
Allocating buffers... [6.8e-05s]
Initializing temporary storage... [0.057323s]
Processing query chunk 0, reference chunk 0, shape 0, index chunk 0.
Building reference index... [1.41807s]
Building query index... [0.018558s]
Building seed filter... [0.049821s]
Searching alignments... No such file or directory
Error: Error writing file /mnt/hgfs/share/phylophlan3/output/0827faa_phylophlan/tmp/map_aa/diamond-tmp-E9ek5S
terminate called after throwing an instance of 'File_write_exception'
what(): Error writing file /mnt/hgfs/share/phylophlan3/output/0827faa_phylophlan/tmp/map_aa/diamond-tmp-E9ek5S
Aborted (core dumped)

This does not happen when I use a single thread.

Phylophlan_metagenomic.py : Problem to generate output files

Hi,

First of all thank you for this very useful tool.

I have a small problem using phylophlan_metagenomic.py:
three outputs are supposed to be generated. I have the first two (output_sketches/ and output_dists), but the third (output.tsv) is missing.

Additionally, here is the error message I get when I try to relaunch the script:

/home/vdarbot/.conda/envs/phylophlan/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3334: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/vdarbot/.conda/envs/phylophlan/lib/python3.8/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/vdarbot/.conda/envs/phylophlan/bin/phylophlan_metagenomic", line 10, in <module>
    sys.exit(phylophlan_metagenomic())
  File "/home/vdarbot/.conda/envs/phylophlan/lib/python3.8/site-packages/phylophlan/phylophlan_metagenomic.py", line 808, in phylophlan_metagenomic
    binn_2_dists[binn].append(float(rc[2]))
KeyError: ''

Thanks in advance for your help.

Problem with Example 01: S. aureus

Hi developer, I ran into the error below. How can I deal with it?
I installed PhyloPhlAn with Conda using: conda create -n "nsqphy" -c bioconda phylophlan=3.0
and generated the four default configuration files with: phylophlan_write_default_configs.sh phylo

When I follow the instructions of Example 01: S. aureus, at step 2:
phylophlan_setup_database -g s__Staphylococcus_aureus \
    -o examples/01_saureus/ \
    --verbose 2>&1 | tee logs/phylophlan_setup_database.log
I immediately get the error:
[e] ciao


Tree of life for MetaPhlAn3 species

Hi,

I regularly run bacterial composition analyses using UniFrac distances based on the relative abundances returned by MetaPhlAn. The UniFrac algorithm requires a phylogenetic tree describing the distances between the species, and so far I have always used the tree shipped with the Bioconductor package CuratedMetagenomicData, which is returned by its function getMetaphlanTree. According to its help page, the tree was created by you, @fasnicar, using PhyloPhlAn.

Due to the update to the new ChocoPhlAn database, there are now many more species in the MetaPhlAn database, and the tree in CuratedMetagenomicData does not include all of them. Therefore, I extracted all genomes from the MetaPhlAn pickle file (database version v293), selected one representative per species, and ran PhyloPhlAn3 with the options --diversity high and --accurate using the PhyloPhlAn marker database. After trimming and everything, I concatenated all alignments into one super-alignment and ran IQtree to get the tree.
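
For readers unfamiliar with the concatenation step mentioned here, the sketch below shows one common way to build a super-alignment from per-marker alignments: every species gets one row, and a species missing from a marker is padded with gaps of that marker's width. This is a generic illustration, not the poster's or PhyloPhlAn's implementation; the function name and the toy data are invented.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Concatenate per-marker alignments ({taxon: aligned_seq}) into one supermatrix."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad with gaps when a taxon has no sequence for this marker.
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: two markers, three species, each marker missing one species.
marker_a = {"sp1": "MKV", "sp2": "MRV"}
marker_b = {"sp1": "LL", "sp3": "LI"}

for taxon, seq in concatenate_alignments([marker_a, marker_b]).items():
    print(taxon, seq)
```

Species with many padded markers carry little signal in such a matrix, which is consistent with some inputs being discarded for low information content.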

PhyloPhlAn3 returns a tree including most of the input species. A few of them were discarded because of the low information content of their alignments. Most of the closely related species form species-specific clades, but there is definitely conformity that all archaea form a single monophyletic clade etc.

My question: did you do something differently to have such a clean tree of life based on the species present in MetaPhlAn?

gene_markers_identification crashed

Dear Phylophlan Team:
I downloaded protein files from NCBI and changed the suffix to '.faa'. The files include "Candidatus Methanomethylophilus alvus Mx1201"(https://www.ncbi.nlm.nih.gov/nuccore/NC_020913.1?report=fasta), "Candidatus Methanoplasma termitum strain MpT1 chromosome"(https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP010070.1?report=fasta) and "Candidatus_Methanomassiliicoccus_intestinalis_Issoire_Mx1"(https://www.ncbi.nlm.nih.gov/nuccore/NC_021353.1?report=fasta). The download method is shown in the attached screenshot.
Then I ran:

phylophlan_write_config_file -d a \
    -o 02_tol.cfg \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 iqtree \
    --verbose 2>&1 | tee phylophlan_write_config_file.log

and then:

phylophlan -i bins \
    -d phylophlan \
    -f 02_tol.cfg \
    --diversity high \
    --fast \
    -o total \
    -t a \
   --verbose

I got this error:

phylophlan -i bins \
>     -d phylophlan \
>     -f 02_tol.cfg \
>     --diversity high \
>     --fast \
>     -o total \
> -t a \
> --verbose
PhyloPhlAn version 3.0.51 (11 May 2020)

Command line: /home/wusu/anaconda3/envs/micro/bin/phylophlan -i bins -d phylophlan -f 02_tol.cfg --diversity high --fast -o total -t a --verbose

Automatically setting "input=bins" and "input_folder=/mnt/d/wudb"
Creating folder "total/tmp"
"high-fast" preset
Setting "sort=True" because "database=phylophlan"
Arguments: {'input': 'bins', 'clean': None, 'output': 'total', 'database': 'phylophlan', 'db_type': 'a', 'config_file': '02_tol.cfg', 'diversity': 'high', 'accurate': False, 'fast': True, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 1, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 1, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.9, 'subsample': <function phylophlan at 0x7f19f2da59e0>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x7f19f2da7050>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.67, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/mnt/d/wudb/bins', 'data_folder': 'total/tmp', 'databases_folder': 'phylophlan_databases/', 'submat_folder': '/home/wusu/anaconda3/envs/micro/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/wusu/anaconda3/envs/micro/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': 'phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "02_tol.cfg"
Checking configuration file
Checking "/home/wusu/anaconda3/envs/micro/bin/diamond"
Checking "/home/wusu/anaconda3/envs/micro/bin/mafft"
Checking "/home/wusu/anaconda3/envs/micro/bin/trimal"
Checking "/home/wusu/anaconda3/envs/micro/bin/iqtree"
"db_aa" database "phylophlan_databases/phylophlan/phylophlan.dmnd" present
Loading files from "/mnt/d/wudb/bins"
Loading files from "/mnt/d/wudb/bins"
Checking 3 inputs
Checking "/mnt/d/wudb/bins/Candidatus Methanomethylophilus alvus Mx1201.faa"
Checking "/mnt/d/wudb/bins/Candidatus Methanoplasma termitum strain MpT1 chromosome.faa"
Checking "/mnt/d/wudb/bins/Candidatus_Methanomassiliicoccus_intestinalis_Issoire_Mx1.faa"
Creating folder "total/tmp/clean_aa"
Cleaning 3 inputs
Cleaning "/mnt/d/wudb/bins/Candidatus Methanomethylophilus alvus Mx1201.faa"
"total/tmp/clean_aa/Candidatus Methanomethylophilus alvus Mx1201.faa" generated in 0s
Cleaning "/mnt/d/wudb/bins/Candidatus Methanoplasma termitum strain MpT1 chromosome.faa"
"total/tmp/clean_aa/Candidatus Methanoplasma termitum strain MpT1 chromosome.faa" generated in 0s
Cleaning "/mnt/d/wudb/bins/Candidatus_Methanomassiliicoccus_intestinalis_Issoire_Mx1.faa"
"total/tmp/clean_aa/Candidatus_Methanomassiliicoccus_intestinalis_Issoire_Mx1.faa" generated in 0s
Loading files from "total/tmp/clean_aa"
Creating folder "total/tmp/map_aa"
Mapping "phylophlan" on 3 inputs (key: "map_aa")
Mapping "total/tmp/clean_aa/Candidatus Methanomethylophilus alvus Mx1201.faa"

[e] Command '['/home/wusu/anaconda3/envs/micro/bin/diamond', 'blastp', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'total/tmp/clean_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.faa', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'total/tmp/map_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.b6o.bkp']' returned non-zero exit status 1.

[e] cannot execute command
    command_line: /home/wusu/anaconda3/envs/micro/bin/diamond blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0 --query total/tmp/clean_aa/Candidatus Methanomethylophilus alvus Mx1201.faa --db phylophlan_databases/phylophlan/phylophlan.dmnd --out total/tmp/map_aa/Candidatus Methanomethylophilus alvus Mx1201.b6o.bkp
           stdin: None
          stdout: None
             env: {'LESSOPEN': '| /usr/bin/lesspipe %s', 'CONDA_PROMPT_MODIFIER': '(micro) ', 'USER': 'wusu', 'SHLVL': '1', 'HOME': '/home/wusu', 'CONDA_SHLVL': '2', '_CE_M': '', 'WSL_DISTRO_NAME': 'Ubuntu', 'LOGNAME': 'wusu', 'NAME': 'wusu', '_': '/home/wusu/anaconda3/envs/micro/bin/phylophlan', 'TERM': 'xterm-256color', '_CE_CONDA': '', 'PATH': '/home/wusu/anaconda3/envs/micro/bin:/home/wusu/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/d/rtools40/usr/bin:/mnt/d/bug/BugBase:/mnt/d/移动Linux软件/LxRunOffline-v3.3.3:/mnt/d/FastTree/mafft/mafft-win:/mnt/d/FastTree:/mnt/d/java/bin:/mnt/c/Users/wusu/AppData/Local/Microsoft/WindowsApps:/mnt/d/rtools40/usr/bin:/mnt/d/Rnew/R:/mnt/d/Rnew/R/bin/x64:/snap/bin', 'LANG': 'C.UTF-8', 'CONDA_PREFIX_1': '/home/wusu/anaconda3', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*
.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:', 'CONDA_PYTHON_EXE': '/home/wusu/anaconda3/bin/python', 'SHELL': '/bin/bash', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'CONDA_DEFAULT_ENV': 'micro', 'PWD': '/mnt/d/wudb', 'CONDA_EXE': '/home/wusu/anaconda3/bin/conda', 'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop', 'CONDA_PREFIX': '/home/wusu/anaconda3/envs/micro', 'HOSTTYPE': 'x86_64', 'WSLENV': ''}

[e] Command '['/home/wusu/anaconda3/envs/micro/bin/diamond', 'blastp', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'total/tmp/clean_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.faa', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'total/tmp/map_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.b6o.bkp']' returned non-zero exit status 1.

[e] error while mapping
    {'program_name': '/home/wusu/anaconda3/envs/micro/bin/diamond', 'params': 'blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0', 'input': '--query', 'database': '--db', 'output': '--out', 'version': 'version', 'command_line': '#program_name# #params# #input# #database# #output#'}
    total/tmp/clean_aa/Candidatus Methanomethylophilus alvus Mx1201.faa
    phylophlan_databases/phylophlan/phylophlan.dmnd
    total/tmp/map_aa
    Candidatus Methanomethylophilus alvus Mx1201.b6o.bkp
    1
    True

[e] Command '['/home/wusu/anaconda3/envs/micro/bin/diamond', 'blastp', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'total/tmp/clean_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.faa', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'total/tmp/map_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.b6o.bkp']' returned non-zero exit status 1.

[e] gene_markers_identification crashed      

PhyloPhlAn installation info:

conda list | grep phylophlan

phylophlan 3.0 py_5 bioconda

phylophlan --version

PhyloPhlAn version 3.0.51 (11 May 2020)

What should I do next? Thanks.
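
The argument list in the error above shows the query path broken into separate pieces at each space ('total/tmp/clean_aa/Candidatus', 'Methanomethylophilus', 'alvus', 'Mx1201.faa'), so diamond never receives the full file name. One possible workaround, not confirmed by the maintainers in this thread, is to rename the inputs so they contain no whitespace. The sketch below only plans the renames on the three names from the log; safe_name, plan_renames, and apply_renames are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, Iterable


def safe_name(filename: str) -> str:
    """Replace every run of whitespace in a file name with one underscore."""
    return "_".join(filename.split())


def plan_renames(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each name that contains whitespace to its whitespace-free version."""
    return {name: safe_name(name) for name in filenames if safe_name(name) != name}


def apply_renames(folder: Path, plan: Dict[str, str]) -> None:
    """Rename the planned files inside 'folder' (not called in this demo)."""
    for old, new in plan.items():
        (folder / old).rename(folder / new)


# Input names as they appear in the log of this issue.
inputs = [
    "Candidatus Methanomethylophilus alvus Mx1201.faa",
    "Candidatus Methanoplasma termitum strain MpT1 chromosome.faa",
    "Candidatus_Methanomassiliicoccus_intestinalis_Issoire_Mx1.faa",
]

for old, new in plan_renames(inputs).items():
    print(f"{old!r} -> {new!r}")
```

Calling apply_renames(Path("bins"), plan_renames(inputs)) would perform the renames in the input folder; it is left out of the demo so that running the snippet touches no files.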

Is there anywhere to download the built reference phylogeny?

Is there anywhere we can download the reference phylogeny built in example 2?

The software works fine. But downloading and rebuilding are too computationally expensive, so I would like to use the tree directly. Is there any way to download it?

Thanks!

Minimum quality for including MAGs?

I am trying to build a de novo tree using only a set of MAGs we binned from a dataset. We have 295 of these at the species level with completeness > 50 % and redundancy < 10 %.

We built the tree using phylophlan -d phylophlan -f supermatrix_aa.cfg --diversity high --accurate. But the tree seems quite sensitive to the way proteins are called, for example using only full proteins (prodigal with the -c flag) versus all proteins (including partial ones). Most clades seem stable, but some are quite variable.

When building de novo trees with only MAGs, without reference genomes, is there a recommended quality threshold?

Phylophlan_metagenomic

My output for a MAG shows an average distance of 0.35328 (from output_metagenomic.tsv) to its closest SGBs. What does this mean? How do we assign a taxonomic label based on this value (the threshold is < 5%)? Could you please clarify?
Thanks
Jitesh

version

Hey,

I have been using phylophlan2, thanks for the cool software, I installed it in a python 3 environment using:
conda install -c bioconda phylophlan=3.0

But when I check the version:
phylophlan --version
PhyloPhlAn version 0.43 (2 March 2020)

I guess this is still v0.43 of phylophlan3?

Cheers

question about example 01 S. aureus, Step 4. Add S. aureus reference genomes

When I use the following command, I get an error:

../../phylophlan.py \
    -i input_references \
    -o output_references \
    -d s__Staphylococcus_aureus \
    -t a -f references_config.cfg \
    --nproc 10 \
    --subsample twentyfivepercent \
    --diversity low \
    --fast \
    2>&1 |tee logs/phylophlan__reference_genomes__s__Staphylococcus_aureus.log
"output_references/tmp/sub/A0A2C9TN08.aln" generated in 4s
Concatenating alignments
Alignments concatenated "output_references/input_references_concatenated.aln" in 2s
Building phylogeny "output_references/input_references_concatenated.aln"

[e] Command '['FastTreeMP', '-quiet', '-mlacc', '2', '-slownni', '-spr', '4', '-fastest', '-mlnni', '4', '-no2nd', '-lg', '-out', '/public/home/sample_lib/ckzhu/software/phylophlan/phylophlan/phylophlan/examples/01_saureus/output_references/input_references.tre', 'output_references/input_references_concatenated.aln']' returned non-zero exit status 1.

[e] error while executing
    command_line: FastTreeMP -quiet -mlacc 2 -slownni -spr 4 -fastest -mlnni 4 -no2nd -lg -out /public/home/sample_lib/ckzhu/software/phylophlan/phylophlan/phylophlan/examples/01_saureus/output_references/input_references.tre output_references/input_references_concatenated.aln
           stdin: None
          stdout: None

Then I ran the FastTreeMP command directly and got another error:

FastTreeMP -quiet -mlacc 2 -slownni -spr 4 -fastest -mlnni 4 -no2nd -lg -out /public/home/sample_lib/ckzhu/software/phylophlan/phylophlan/phylophlan/examples/01_saureus/output_references/input_references.tre output_references/input_references_concatenated.aln
Wrong number of characters for GCA_000543025: expected 1257 but have 712 instead.
This sequence may be truncated, or another sequence may be too long.

So I checked the input_references_concatenated.aln file:

>GCA_000543025
VAVVRQNVSAASNVAIDKVTPPFTKSHNVSPNGKNITATTRTTLLKSLSQTENRMMNSDS
ANEQANPAFLAIHFLNANIKINVIVAAKSHLISVNTNTNNVHKAILAREFEAAMQSDAAV
LIRTQPTVEIFTVATLRKNAKVAHSALDSIFLLASIVVLFISHFIMKIFTLRALNSEDAK
GTSDKIQSEETNQQKITTKDITHDQVQYHNRWNNNAAYTINQNRNFNFALKHIPTNFTIM
KRSSVHLFLDMGV------FMVNKAVSANERFQQQNDAANNGSVQFPNHQNDTTSANEQY
QQQNDAANQTRVDVANTVELMVILASYSAFKSQTKIAIIQFQIACVTQAVIYEKPWLSML
YRAYNNTAMTTYNVRVYSSQAGNFWRAYNNTSMSTYDILIYPSEVSAFTSAAFTGKHHES
EKSYCNKSRREETTNSLFSTLLGFVISHRYKRPLFSTLLRFVINFEYYESMLASFYGVIA
SHRQKEPHATSKFKQGEVWSKTIQYQYFTEDETYTAQEAAASFKSTHPNSESRAMQCYEF
EEFRNEVKSNIVAVLTFSMIEWHYRRAIERAVLQPSIIEKEFEGHSIERNLIYRKNKLYE
AMAMDNTNSDTTVQDTNVANNGLSAQASGSATSVSPQTGNTVSATTNNGGDAAYASGTDF
ANTDIAFDYETDKPVKDTYTPNDSVNENGLVTDTANTTNTVETITKAKATVA
>GCA_900041155
VVLGRQNVSAASDAAIERIKPPFTKSHNVSPNDKNIASAAKTALLKNLSQTKDRMMNSNG
ANKQANPAFLAIHFLNANTKINVIVAANTQSVSANTNTSNVHQALLIREFEAAMESDAAV
LIRTQSIVEIYAVVTFRKKTKVAHNTLNTIFLLASIVVLFISHFIMKIFTLRALNSEDAK
GTSDNIQSEETNQQKITTKDITHDQVQYHNRWNNNAAYTINQNWYDKGDKGQSFKVRENR
NFNFALKHIPTNFTIMKRSSVHLFLDMGFMANKAVSANERFQQQNDAANSANEQYQQQND
AANQTRVDESNAVQFQIACVTQAVIYEKPWLSMLELMVILASYSAFKSQTKIAIIWRAYN
NTSMSTYDILIYPSEVSAFYRAYNNTAMTTYNVRVYSSQAGNFYRAYNNTAMTTYNVRVY
SNQASNFWKSFDRHSLTVFDILIWSSEVSSYTSAAFTGKHHESEKSYCNKSRREETTNSL
FSTLLGFVISHRYKRPLFSTLLRFVINFEYYESMLASFYGVIASHRQKEPHATSKFKQGE
VWSKTIQYQYFTEDETYTAQEAAASFKSTHPNSESRAMQCYEFEEFRNEVNTSEHYRRAI
ERAVSKEFEHYAKNPSKEEHQEPTTYQTNNTYANTCSAENFYKAKYSRQTTQHVIGIMAL
TLLSKKFKQGEVWSKEGKTVCSIQYQAYTLDDQVMFIEDAIVATLKLEKSKSNIVAVLTF
SMIEWHYRRAIERAVLQPSIIEKEFEGHSIERNLIYRKNKLYEAMAMDNTQGESLGHNTN
VDTSDISSQTSVGVMPVPSSSAKSAATNTNDDRDAAYISGTDFANVDVGFDYESDKQIKD
TFSPEDSVNENRLVADMVDATNTIEALAQANNITA
...

I am a novice working through all your examples. Can you give me some help?
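FastTree reports that GCA_000543025 has 712 characters where 1257 were expected, which means the sequences in the concatenated alignment do not all have the same length. A minimal sketch for listing every sequence whose length differs from the most common one is shown below; it is a plain FASTA reader written for illustration, and the function names are hypothetical rather than part of PhyloPhlAn.

```python
from collections import Counter


def read_fasta_lengths(path):
    """Return {sequence_id: total_length} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]  # id is the first word of the header
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def find_length_outliers(lengths):
    """Return the entries whose length differs from the most common length."""
    if not lengths:
        return {}
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    return {name: n for name, n in lengths.items() if n != expected}
```

Calling `find_length_outliers(read_fasta_lengths("output_references/input_references_concatenated.aln"))` would show which genomes are short, which can help decide whether a single marker alignment or a whole input genome is the source of the truncation.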

Default for --min_num_markers

What is the default for the --min_num_markers parameter? The wiki says it's 100 and 34 for the phylophlan and amphora2 marker sets, respectively; however, phylophlan -h shows that it's set to 1 by default.

When running it with the phylophlan database using -d phylophlan, --min_num_markers called by phylophlan still seems to default to 1

Tree of life at nucleotide level

Is there a way to build the tree of life using the phylophlan database at the nucleotide level? From my understanding, the default phylophlan database only contains protein sequences.

database "phylophlan" not found in "phylophlan_databases"

I've installed phylophlan (the version in today's GitHub repo, 2020-04-21), and I'm attempting to run Example-02:-Tree-of-life.

I've run phylophlan_get_reference, and created the config file:

phylophlan_write_config_file -d a \
    -o 02_tol.cfg \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 iqtree \
    --verbose 2>&1 | tee phylophlan_write_config_file.log

When I run phylophlan:

phylophlan -i input_genomes \
    -d phylophlan \
    -f 02_tol.cfg \
    --diversity high \
    --fast \
    -o output_tol \
    --nproc 8 \
    --verbose 2>&1 | tee logs/phylophlan.log

I get this error:

[e] database "phylophlan" not found in "phylophlan_databases"
Available databases in "phylophlan_databases":

Am I supposed to put something in phylophlan_databases before I run this example? Bear with me, I'm a sysadmin rather than a biologist.

Thanks,
Matthew Cahn

Cannot download all the reference genomes

Dear Developers,

When I download the reference genomes using the command "phylophlan_get_reference -g all -o input_genomes/ -n 1 --verbose 2>&1 | tee logs/phylophlan_get_reference.log", it usually stops downloading before it has finished.

Downloading file of size: 1.16 MB
1.16 MB 100.06 % 0.48 MB/sec 0 min -0 sec
Downloading 1 reference genomes for k__Bacteria|p__Candidatus_Veblenbacteria|c__Candidatus_Veblenbacteria_unclassified|o__Candidatus_Veblenbacteria_unclassified|f__Candidatus_Veblenbacteria_unclassified|g__Candidatus_Veblenbacteria_unclassified|s__Candidatus_Veblenbacteria_bacterium_RIFOXYC2_FULL_42_11
Downloading "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/822/985/GCA_001822985.1_ASM182298v1/GCA_001822985.1_ASM182298v1_genomic.fna.gz" to "input_genomes/GCA_001822985.fna.gz"
Downloading file of size: 0.23 MB
0.23 MB 100.10 % 0.17 MB/sec 0 min -0 sec
Downloading 1 reference genomes for k__Bacteria|p__Candidatus_Veblenbacteria|c__Candidatus_Veblenbacteria_unclassified|o__Candidatus_Veblenbacteria_unclassified|f__Candidatus_Veblenbacteria_unclassified|g__Candidatus_Veblenbacteria_unclassified|s__Candidatus_Veblenbacteria_bacterium_RIFOXYD1_FULL_43_11
Downloading "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/823/015/GCA_001823015.1_ASM182301v1/GCA_001823015.1_ASM182301v1_genomic.fna.gz" to "input_genomes/GCA_001823015.fna.gz"
Downloading file of size: 0.19 MB
0.05 MB 24.20 % 0.04 MB/sec 0 min 4 sec

I do not know whether this is because the connection with NCBI dropped or for some other reason. What can I do about it?
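If the interruptions come from a dropped connection, one generic workaround is to fetch the files outside the tool with automatic retries. Below is a minimal sketch using only the Python standard library; it is not how phylophlan_get_reference downloads files, the function name `download_with_retries` is hypothetical, and each retry restarts the file from the beginning rather than resuming it.

```python
import time
import urllib.request


def download_with_retries(url, dest, attempts=5, wait=2.0,
                          fetch=urllib.request.urlretrieve):
    """Download `url` to `dest`, retrying on network errors.

    Returns the number of attempts that were needed. `fetch` is injectable
    so the retry logic can be exercised without network access.
    """
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest)
            return attempt
        except OSError:
            # urllib's URLError and ContentTooShortError are OSError subclasses
            if attempt == attempts:
                raise
            time.sleep(wait * attempt)  # wait a little longer after each failure
```

The URLs printed in the log above (for example the `ftp.ncbi.nlm.nih.gov` links) could be passed to this function one at a time, with `dest` set to the file name the tool reports.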

Documentation does not say how to download databases

Hello

Section https://github.com/biobakery/phylophlan/wiki#databases says "PhyloPhlAn 3.0 is able to automatically download two databases of universal markers for prokaryotes:" but does not then go on to say how phylophlan does this, or what commands a user needs to execute to download those databases.

When installing from conda, no databases come with the install:

phylophlan --database_list --diversity low
Available databases in "phylophlan_databases/":

which tree should be used

Thank you for developing the amazing tool.

After I run it, which output file should I use? And why doesn't raxml_bestree_final.tre show branch values?

Thanks very much!

New feature of Phylophlan3

Hi, thanks for your help with installing PhyloPhlAn 3.

We are able to run the pipeline. I have some more questions about the new features of PhyloPhlAn 3.

One feature of PhyloPhlAn 3 is integrating new genomes into the tree of life. We want to integrate a set of genomes into the tree of life. The tree-of-life tutorial contains 17000 genomes, which is too big. The 3171-genome tree seems a good size. I am wondering how this can be done, and where we can download these 3171 genomes.

https://huttenhower.sph.harvard.edu/phylophlan
FEATURES
Completely automatic, as the user needs only to provide the (unannotated) protein sequences of the input genomes (as multifasta files of peptides – not nucleotides)

The possibility of integrating new genomes in the already reconstructed most comprehensive tree of life (3,171 microbial genomes)

Thank you!
Jie

Using manually downloaded database

Thanks for the great-looking tool. My question/issue relates to phylophlan_metagenomic: is there a simple way to download, extract, and point to the database manually?

I have spent the last day trying to get phylophlan_metagenomic working but keep getting stuck at the database.

My cloud instance can't seem to complete the download from within the program without an occasional connectivity drop breaking the download, so I keep having to restart and hope (so far to no avail).

I can easily download the .tar file manually with wget -c to avoid connectivity issues, but then I can't find a way for the tool to see that the database exists.

I've tried the following
phylophlan_metagenomic -i myfolder -o output-folder --nproc 8 -d SGB.DEC19 --database_folder place/with/the/database/

and get the following error
[e] invalid number of URLs for "SGB.DEC19" in the downloaded file

Looking at the code, I can see a check for whether the database exists, or if the md5 exists

   if (    not os.path.exists(os.path.join(args.database_folder, args.database)) or
               not os.path.exists(os.path.join(args.database_folder, args.mapping)) or
               not os.path.exists(os.path.join(args.database_folder, args.database + '.md5'))    )

both should be true (though technically the file is .tar so not sure if that would return true), but the program still runs the URL check and fails. Is there a means of downloading the database manually, set it up, and running the tool without having it try to download everything again?

I'm sure I'm missing something obvious, but just can't work it out
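Since the check quoted above looks for a `.md5` file next to the database, it can be useful to confirm that a manually downloaded archive is intact before pointing the tool at it. The following is a minimal sketch, assuming the `.md5` file starts with the hexadecimal digest (the layout produced by `md5sum`); that layout and the function names are assumptions, not something the source confirms.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(path, md5_path):
    """Compare a file's digest with the first token of an md5 file.

    Assumption: the md5 file begins with the hex digest, as written by md5sum.
    """
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of_file(path) == expected
```

A mismatch would suggest the download was truncated, which is worth ruling out before investigating why the tool still tries to fetch the database again.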

gene_markers_selection crashed

[e] expected str, bytes or os.PathLike object, not NoneType

[e] gene_markers_selection crashed
"output_tree/tmp/map_dna/JIAJIAN16_Bin_002.LargeContigs.b6o.bz2" generated in 1s

What does the following error mean? I used the following command.

phylophlan -i input_genomes -d phylophlan -f /mnt/home/gunturus/phylophlan/supermatrix_aa.cfg --diversity high --fast -o output_tree --nproc 20 --verbose

local variable 'input_faa_clean' referenced before assignment

Dear phylophlan team,
Thanks for your great effort in producing phylophlan 3.0.
I recently encountered the same issue described in #9; however, I installed phylophlan successfully from GitHub, which should not have this problem.
So maybe something else is wrong.
My error log:
Traceback (most recent call last):
  File "/home/yzz0191/anaconda3/bin/phylophlan", line 11, in <module>
    load_entry_point('PhyloPhlAn==3.0', 'console_scripts', 'phylophlan')()
  File "/home/yzz0191/anaconda3/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3203, in phylophlan_main
    standard_phylogeny_reconstruction(project_name, configs, args, db_dna, db_aa)
  File "/home/yzz0191/anaconda3/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3008, in standard_phylogeny_reconstruction
    all_inputs = (os.path.splitext(os.path.basename(i))[0] for i in input_faa_clean)
UnboundLocalError: local variable 'input_faa_clean' referenced before assignment

It looks like the error log in #9, but with different line numbers: 3203 and 3008 in my case versus 3200 and 3005 in #9. What's more, I have checked "/home/yzz0191/anaconda3/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", and I am pretty sure this script is the new version. The head of the script:

#!/usr/bin/env python

author = ('Francesco Asnicar ([email protected]), '
'Francesco Beghini ([email protected]), '
'Claudia Mengoni ([email protected]), '
'Mattia Bolzan ([email protected]), '
'Nicola Segata ([email protected])')
version = '3.0.53'
date = '8 June 2020'

I really don't know how to fix it. Here is my command:

phylophlan -i ${input_folder} -d phylophlan --force_nucleotides --diversity high -f supermatrix_aa.cfg --nproc 24

Many thanks for any suggestions!
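For readers unfamiliar with this class of error: Python raises `UnboundLocalError` when a function reads a local variable that was only assigned on a code path that did not run. The sketch below is a generic illustration, not PhyloPhlAn's code; the function and variable names are hypothetical.

```python
def collect_inputs(has_proteomes):
    """Deliberately fragile: `cleaned` is bound only when the branch runs."""
    if has_proteomes:
        cleaned = ["a.faa", "b.faa"]
    # When has_proteomes is False, `cleaned` was never assigned, so reading
    # it here raises UnboundLocalError.
    return [name.rsplit(".", 1)[0] for name in cleaned]


def collect_inputs_safe(has_proteomes):
    """Same logic, with the variable initialised before the branch."""
    cleaned = []
    if has_proteomes:
        cleaned = ["a.faa", "b.faa"]
    return [name.rsplit(".", 1)[0] for name in cleaned]
```

In practice this usually means that a particular combination of options or input types skipped the step that would normally have produced the variable.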

[e] both db_dna and db_aa are None!

I am getting the error:

[e] both db_dna and db_aa are None!

My config file looks like:

[db_dna]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#

[map_dna]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/blastn
params = -outfmt 6 -max_target_seqs 1000000
input = -query
database = -db
output = -out
version = -version
command_line = #program_name# #params# #input# #database# #output#

[msa]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#
environment = TMPDIR=/tmp

[tree1]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/raxmlHPC-PTHREADS-SSE3
params = -p 1989 -m GTRCAT
input = -s
output_path = -w
output = -n
version = -v
command_line = #program_name# #params# #threads# #output_path# #input# #output#
threads = -T

The script calling phylophlan looks like:

phylophlan -i $path -d phylophlan --diversity low -f phylophlan_custom_config.cfg

My config file is defining db_dna so I am unsure why this error is occurring
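As a first sanity check, the configuration file can be parsed independently to see which sections it actually defines. The sketch below uses Python's standard `configparser`; the list of required sections is passed in by the caller and is a hypothetical parameter, since the source does not state which sections PhyloPhlAn requires. It only confirms that the file parses and lists its sections; it does not show why PhyloPhlAn reports the error.

```python
import configparser


def config_sections(text):
    """Return the section names defined in an INI-style config string."""
    # interpolation=None keeps values such as '#program_name#' untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser.sections()


def missing_sections(text, required):
    """Return the names in `required` that the config does not define."""
    present = set(config_sections(text))
    return [name for name in required if name not in present]
```

For the file above, `config_sections` would be expected to report `db_dna`, `map_dna`, `msa` and `tree1`, which at least shows that the `[db_dna]` section itself is readable.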

Phylophlan_metagenomic.py

Hi,
Thank you for this useful tool.

I got errors using phylophlan_metagenomic.py as follows:
phylophlan_metagenomic -i test -d SGB.Dec19 -o test-out --nproc 4 -n 10

The error details:
Traceback (most recent call last):
File "/home/wjn/.conda/envs/phylophlan/bin/phylophlan_metagenomic", line 10, in
sys.exit(phylophlan_metagenomic())
File "/home/wjn/.conda/envs/phylophlan/lib/python3.8/site-packages/phylophlan/phylophlan_metagenomic.py", line 884, in phylophlan_metagenomic
f.write('\t'.join([binn] + ["{}{}:{}:{}:{}".format(sgb_2_info[i[0]][5],
File "/home/wjn/.conda/envs/phylophlan/lib/python3.8/site-packages/phylophlan/phylophlan_metagenomic.py", line 884, in
f.write('\t'.join([binn] + ["{}
{}:{}:{}:{}".format(sgb_2_info[i[0]][5],
KeyError: '54917'

The attached file is the log.

Many thanks,
Jianing Wang

Phylophlan 2.0 release?

There are tags for 1.0, 3.0, and 3.0.1 but 2.x is completely missing. Former links to the project on bitbucket are dead. Is there any way to access this version?

I require it for the MAGpy workflow.

phylophlan_strain_finder - wrong threshold arguments

Dear Francesco,

First, thank you for the very useful and configurable tool! Great job.
I ran into a "Namespace" error when I tried to use the phylophlan_strain_finder tool:
File "/urigo/vadimd/conda_phylophlan3/lib/python3.8/site-packages/phylophlan/phylophlan_strain_finder.py", line 168, in phylophlan_strain_finder
check_params(args, args.verbose)
File "/urigo/vadimd/conda_phylophlan3/lib/python3.8/site-packages/phylophlan/phylophlan_strain_finder.py", line 114, in check_params
if args.p_threshold < 0.0:
AttributeError: 'Namespace' object has no attribute 'p_threshold'

In the documentation, under the "Finding strains in trees" part, you wrote that the thresholds can be tuned using --phylo_thr and --mutrate_thr; however, under "phylophlan_strain_finder.py" there are two different arguments instead: --p_threshold P_THRESHOLD and --m_threshold M_THRESHOLD.
I have checked phylophlan_strain_finder.py: you added --phylo_thr and --mutrate_thr as argparse arguments, but the check_params function checks:

if args.p_threshold < 0.0:
        error('p_threshold should be a positive number', exit=True)
if args.m_threshold < 0.0:
        error('m_threshold should be a positive number', exit=True)

These attributes are not defined, hence the error. I replaced --phylo_thr and --mutrate_thr with these two argument names in argparse, and the script seems to work fine with the default thresholds of 0.05.

In addition, I wanted to ask how you recommend reading and interpreting the output table from this script. It is not easy to read in its current format.
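The mismatch described here is a general argparse pitfall: the attribute name on the parsed namespace is derived from the option string unless `dest` is given. A minimal sketch of one way to keep the documented option names while exposing the attribute names that the check expects is shown below. It is an illustration, not the actual phylophlan_strain_finder.py code; only the option names, attribute names, error messages and the 0.05 default come from the report above.

```python
import argparse


def build_parser():
    """Options keep their documented names; `dest` sets the attribute names."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--phylo_thr", dest="p_threshold", type=float, default=0.05)
    parser.add_argument("--mutrate_thr", dest="m_threshold", type=float, default=0.05)
    return parser


def check_params(args):
    """Validate thresholds using the attribute names set through `dest`."""
    if args.p_threshold < 0.0:
        raise ValueError("p_threshold should be a positive number")
    if args.m_threshold < 0.0:
        raise ValueError("m_threshold should be a positive number")
```

Without `dest`, the first option would be stored as `args.phylo_thr`, and reading `args.p_threshold` would raise the `AttributeError` shown in the traceback.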

Thank you
Vadimd

db_dna

Hi,

I'm running phylophlan3 on a set of MAGs together with reference genomes. I was wondering if you could help me troubleshoot the following error:
[e] both db_dna and db_aa are None!

My command is:

phylophlan \
    --input_folder ./fna \
    -o ./out \
    --nproc 48 \
    --diversity high \
    -d phylophlan \
    -f /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/default_nt.cfg \
    --configs_folder /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/ \
    --submat_folder /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices \
    --maas /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/phylophlan.tsv \
    -i reducedtree \
    --force_nucleotides

The config file is:

[db_dna]
program_name = makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#
[map_dna]
program_name = blastn
params = -outfmt 6 -max_target_seqs 1000000
input = -query
database = -db
output = -out
version = -version
command_line = #program_name# #params# #input# #database# #output#
[msa]
program_name = muscle
params = -quiet -maxiters 2
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#
[tree1]
program_name = iqtree
params = -quiet -nt AUTO -m GTR
input = -s
output = -pre
version = -version
command_line = #program_name# #params# #input# #output#

So I have defined db_dna in the config, how come the software cannot find it?
PS. I used conda installation and want to use the phylophlan 400 proteins.

Many thanks

Package folder not found

I have just installed phylophlan3 using conda and am getting the following error straight away, any idea of the cause?

(analysis) ecoli@bact:~/Hitch_analysis/analysis/v0.4/Output/Afrizal-2$ phylophlan -i Genome_tree -o Genome_tree/Output -d phylophlan --diversity medium
[e] "/home/ecoli/anaconda/envs/Analysis/lib/python3.8/site-packages/phylophlan/phylophlan_configs/" folder does not exists

[e] unable to download "https://www.dropbox.com ... in phylophlan_metagenomic

Now I'm trying to run phylophlan_metagenomic and found 3 issues:

  1. All the bash files in your samples/ folder have commands like phylophlan_metagenomic.py \. However, there should not be a .py suffix.
  2. In this version, can -d be omitted?
  3. Although I have already downloaded phylophlan_metagenomic.txt to --database_folder, the program still prints Downloading "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1" to "phylophlan_metagenomic.txt", and then [e] unable to download "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1"

Thanks!

Error in "Refining phylogeny"

Hi Francesco,

I work with Matthew. With your help, we successfully installed phylophlan3 on our cluster and completed the test run.
However, when I used it on my real data, I got the error shown below.
It seems the alignment and the initial tree were completed, but the error occurs when there are polytomies in the tree.

I am wondering whether the initial tree is still accurate and useful, and how this can be fixed.

my command:
phylophlan -i dereplicated_genomes_fna -d phylophlan -o dereplicated_genomes --nproc 16 --diversity high -f supermatrix_aa.cfg --verbose

The inputs are all .fna genomes.

The error is:

Concatenating alignments
Alignments concatenated "dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_concatenated.aln" in 0s
Building phylogeny "dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_concatenated.aln"
Phylogeny "dereplicated_genomes_fna.tre" built in 55s
Resolving 1 polytomies
Resolving polytomies for "dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna.tre"
"dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_resolved.tre" generated in 0s

Refining phylogeny "dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_resolved.tre"

[e] Command '['/tigress/MOLBIO/local/bin/raxmlHPC-PTHREADS-SSE3', '-p', '1989', '-m', 'PROTCATLG', '#threads#', '-t', 'dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_resolved.tre', '-w', '/projects/DONIA/jliu/Temporal/Phylophlan3/dereplicated_genomes_fna_supermatrix_aa', '-s', 'dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_concatenated.aln', '-n', 'dereplicated_genomes_fna_refined.tre']' returned non-zero exit status 127.

[e] error while executing
command_line: /tigress/MOLBIO/local/bin/raxmlHPC-PTHREADS-SSE3 -p 1989 -m PROTCATLG #threads# -t dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_resolved.tre -w /projects/DONIA/jliu/Temporal/Phylophlan3/dereplicated_genomes_fna_supermatrix_aa -s dereplicated_genomes_fna_supermatrix_aa/dereplicated_genomes_fna_concatenated.aln -n dereplicated_genomes_fna_refined.tre
stdin: None
stdout: None
env: {'SLURM_NODELIST': 'tiger-h21d1', 'SLURM_JOB_NAME': 'runPhylophlan_dereplicated_genomes_fna-JL.sh', 'MANPATH': '/usr/share/man:/usr/local/share/man:/opt/puppetlabs/puppet/share/man', 'XDG_SESSION_ID': '55254', 'SLURMD_NODENAME': 'tiger-h21d1', 'SLURM_TOPOLOGY_ADDR': 'tiger-h21d1', 'HOSTNAME': 'tiger-h21d1', 'SELINUX_ROLE_REQUESTED': '', 'SLURM_NODE_ALIASES': '(null)', 'SHELL': '/bin/bash', 'TERM': 'xterm-256color', 'SLURM_JOB_QOS': 'tiger-donia', 'HISTSIZE': '1000', 'TMPDIR': '/tmp', 'SLURM_TOPOLOGY_ADDR_PATTERN': 'node', 'SSH_CLIENT': '128.112.70.123 52389 22', 'CONDA_SHLVL': '2', 'CONDA_PROMPT_MODIFIER': '(/tigress/MOLBIO/local/pythonenv/phylophlan)', 'SELINUX_USE_CURRENT_RANGE': '', 'NCARG_FONTCAPS': '/usr/lib64/ncarg/fontcaps', 'QTDIR': '/usr/lib64/qt-3.3', 'QTINC': '/usr/lib64/qt-3.3/include', 'SSH_TTY': '/dev/pts/261', 'QT_GRAPHICSSYSTEM_CHECKED': '1', 'SLURM_NNODES': '1', 'USER': 'jl103', 'LS_COLORS': 'rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:.tar=38;5;9:.tgz=38;5;9:.arc=38;5;9:.arj=38;5;9:.taz=38;5;9:.lha=38;5;9:.lz4=38;5;9:.lzh=38;5;9:.lzma=38;5;9:.tlz=38;5;9:.txz=38;5;9:.tzo=38;5;9:.t7z=38;5;9:.zip=38;5;9:.z=38;5;9:.Z=38;5;9:.dz=38;5;9:.gz=38;5;9:.lrz=38;5;9:.lz=38;5;9:.lzo=38;5;9:.xz=38;5;9:.bz2=38;5;9:.bz=38;5;9:.tbz=38;5;9:.tbz2=38;5;9:.tz=38;5;9:.deb=38;5;9:.rpm=38;5;9:.jar=38;5;9:.war=38;5;9:.ear=38;5;9:.sar=38;5;9:.rar=38;5;9:.alz=38;5;9:.ace=38;5;9:.zoo=38;5;9:.cpio=38;5;9:.7z=38;5;9:.rz=38;5;9:.cab=38;5;9:.jpg=38;5;13:.jpeg=38;5;13:.gif=38;5;13:.bmp=38;5;13:.pbm=38;5;13:.pgm=38;5;13:.ppm=38;5;13:.tga=38;5;13:.xbm=38;5;13:.xpm=38;5;13:.tif=38;5;13:.tiff=38;5;13:.png=38;5;13:.svg=38;5;13:.svgz=38;5;13:.mng=38;5;13:.pcx=38;5;13:.mov=38;5;13:.mpg=38;5;13:.mpeg=38;5;13:.m2v=38;5;13:.mkv=38;5;13:.webm=38;5;13:
.ogm=38;5;13:.mp4=38;5;13:.m4v=38;5;13:.mp4v=38;5;13:.vob=38;5;13:.qt=38;5;13:.nuv=38;5;13:.wmv=38;5;13:.asf=38;5;13:.rm=38;5;13:.rmvb=38;5;13:.flc=38;5;13:.avi=38;5;13:.fli=38;5;13:.flv=38;5;13:.gl=38;5;13:.dl=38;5;13:.xcf=38;5;13:.xwd=38;5;13:.yuv=38;5;13:.cgm=38;5;13:.emf=38;5;13:.axv=38;5;13:.anx=38;5;13:.ogv=38;5;13:.ogx=38;5;13:.aac=38;5;45:.au=38;5;45:.flac=38;5;45:.mid=38;5;45:.midi=38;5;45:.mka=38;5;45:.mp3=38;5;45:.mpc=38;5;45:.ogg=38;5;45:.ra=38;5;45:.wav=38;5;45:.axa=38;5;45:.oga=38;5;45:.spx=38;5;45:*.xspf=38;5;45:', 'CONDA_EXE': '/usr/licensed/anaconda3/2019.10/bin/conda', 'SLURM_JOBID': '4674298', 'COBBLER_SERVER': '10.36.16.2', 'SLURM_NTASKS': '16', 'NCARG_GRAPHCAPS': '/usr/lib64/ncarg/graphcaps', '_CE_CONDA': '.', 'CONDA_PREFIX_1': '/usr/licensed/anaconda3/2019.10', 'SLURM_TASKS_PER_NODE': '16', 'PATH': '/tigress/MOLBIO/local/diamond-0.9.31:/tigress/MOLBIO/local-rhel7/bin:/tigress/MOLBIO/local/pythonenv/phylophlan/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/home/jl103/.local/bin:/home/jl103/bin:/tigress/MOLBIO/local/bin', 'MAIL': '/var/spool/mail/jl103', 'SLURM_WORKING_CLUSTER': 'tiger2:tiger2-slurm:6820:8704:101', 'SLURM_JOB_ID': '4674298', 'CONDA_PREFIX': '/tigress/MOLBIO/local/pythonenv/phylophlan', 'SLURM_JOB_USER': 'jl103', 'PWD': '/tigress/DONIA/jliu/Temporal/Phylophlan3', 'LMFILES': '/usr/licensed/Modules/modulefiles/anaconda3/2019.10:/tigress/MOLBIO/Modules/modulefiles-rhel7/blast/2.7.1:/tigress/MOLBIO/Modules/modulefiles-rhel7/diamond/0.9.31.132:/tigress/MOLBIO/Modules/modulefiles-rhel7/phylophlan/3.0', 'NCARG_ROOT': '/usr', 'LANG': 'en_US.UTF-8', 'MODULEPATH': '/tigress/MOLBIO/Modules/modulefiles-rhel7:/tigress/MOLBIO/Modules/modulefiles-shared:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/local/share/Modules/modulefiles:/opt/share/Modules/modulefiles:/usr/licensed/Modules/modulefiles:/projects/SOFTWARE/Modules/modulefiles/tiger', 'NCARG_DATABASE': '/usr/lib64/ncarg/database', 
'SLURM_JOB_UID': '150058', 'LOADEDMODULES': 'anaconda3/2019.10:blast/2.7.1:diamond/0.9.31.132:phylophlan/3.0', 'SLURM_NODEID': '0', 'SLURM_SUBMIT_DIR': '/projects/DONIA/jliu/Temporal/Phylophlan3', 'SELINUX_LEVEL_REQUESTED': '', 'SLURM_TASK_PID': '41165', 'SLURM_NPROCS': '16', 'SLURM_CPUS_ON_NODE': '16', 'CE_M': '.', 'SLURM_PROCID': '0', 'ENVIRONMENT': 'BATCH', 'HISTCONTROL': 'ignoredups', 'NCARG_LIB': '/usr/lib64/ncarg', 'SLURM_JOB_NODELIST': 'tiger-h21d1', 'HOME': '/home/jl103', 'SHLVL': '2', 'NCARG_NCARG': '/usr/share/ncarg', 'SLURM_LOCALID': '0', 'SLURM_JOB_GID': '30054', 'SLURM_JOB_CPUS_PER_NODE': '16', 'SLURM_CLUSTER_NAME': 'tiger2', 'SLURM_GTIDS': '0', 'SLURM_SUBMIT_HOST': 'tigercpu.princeton.edu', 'SLURM_JOB_PARTITION': 'donia', 'CONDA_PYTHON_EXE': '/usr/licensed/anaconda3/2019.10/bin/python', 'LOGNAME': 'jl103', 'CVS_RSH': 'ssh', 'QTLIB': '/usr/lib64/qt-3.3/lib', 'SLURM_JOB_ACCOUNT': 'molbio', 'SSH_CONNECTION': '128.112.70.123 52389 128.112.172.210 22', 'XDG_DATA_DIRS': '/home/jl103/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share', 'SLURM_JOB_NUM_NODES': '1', 'MODULESHOME': '/usr/share/Modules', 'CONDA_DEFAULT_ENV': '/tigress/MOLBIO/local/pythonenv/phylophlan', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'XDG_RUNTIME_DIR': '/run/user/150058', 'SLURM_MEM_PER_NODE': '153600', 'BASH_FUNC_module()': '() { eval /usr/bin/modulecmd bash $*\n}', '': '/tigress/MOLBIO/local/pythonenv/phylophlan/bin/phylophlan'}
Elapsed time is 116 minutes and 35 seconds.

Thank you!
Jie
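One detail visible in the log above is that the executed command still contains the literal token `#threads#`, which suggests that this placeholder from the `command_line` template was never substituted. The sketch below shows a generic way to fill `#name#` placeholders and to detect any that remain; it is an illustration under that reading of the log, not PhyloPhlAn's implementation, and the function names are hypothetical.

```python
import re

# Matches tokens such as #program_name# or #threads#
PLACEHOLDER = re.compile(r"#(\w+)#")


def fill_command(template, values):
    """Replace each #name# token that has a value; leave the others as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return PLACEHOLDER.sub(substitute, template)


def unresolved_placeholders(command):
    """Return the names of #name# tokens still present in a command string."""
    return PLACEHOLDER.findall(command)
```

Checking a configuration this way before a long run can surface a missing option (for example a `threads` entry in the tree-building section, as appears in other configs in this thread) before the alignment step has been spent.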

database of phylophlan 3.0

Dear Francesco
Thanks for the brilliant tool.
When running the test command phylophlan_setup_database -g s__Staphylococcus_aureus -o 01_saureus --verbose, it sometimes fails because of a slow or unstable network connection, with errors like this:

Downloading "http://www.uniprot.org/uniref/UniRef90_Q5HNU0.fasta" to "./s__Staphylococcus_aureus/Q5HNU0.faa"
[e] unable to download "http://www.uniprot.org/uniref/UniRef90_Q5HNU0.fasta"
Downloading "http://www.uniprot.org/uniref/UniRef90_Q2FHD9.fasta" to "./s__Staphylococcus_aureus/Q2FHD9.faa"
[e] unable to download "http://www.uniprot.org/uniref/UniRef90_Q2FHD9.fasta"
Downloading "http://www.uniprot.org/uniref/UniRef90_A7X5K8.fasta" to "./s__Staphylococcus_aureus/A7X5K8.faa"
[e] unable to download "http://www.uniprot.org/uniref/UniRef90_A7X5K8.fasta"
Downloading "http://www.uniprot.org/uniref/UniRef90_A0A181DVE6.fasta" to "./s__Staphylococcus_aureus/A0A181DVE6.faa"
[e] unable to download "http://www.uniprot.org/uniref/UniRef90_A0A181DVE6.fasta"
......

So is it possible to modify the script to extract sequences from the downloaded "uniref90.fasta.gz" according to their UniRef IDs? If possible, I think this would be more convenient and user-friendly.

Thank you very much.

Scheduler system instead of multiprocessing

Hi everyone,

First of all, thanks for keeping up the great work. I was wondering about the design philosophy that you chose for executing the PhyloPhlAn commands and whether using the Python module multiprocessing is the best way to go for very large (> 10,000 species) data sets?

While multiprocessing's functions allow the individual tasks to be distributed across multiple cores of the same computational node, they are restricted by the size of that node. In the environments I currently work in, we typically have a large number of nodes (> 50) that are relatively small (on average 36 cores) connected by a scheduling system. On my current system, PhyloPhlAn would benefit substantially if the individual tasks were submitted to different nodes rather than run on a single node.

I think the main functionality of PhyloPhlan (function standard_phylogeny_reconstruction from phylophlan.py) could be easily replaced by a pipeline constructed for being run using Snakemake or Nextflow. Especially Snakemake should be an easy port because it uses Python for configuration. Using such a pipeline would then be suitable for both scenarios, either one big node with many cores or many small nodes with fewer cores, because one could decide whether to run the pipeline locally or using a scheduling system, e.g. SLURM.

I was wondering whether you have already thought about this and, if so, what your reasons for deciding against it were.

Thanks, Alex

Is it possible to add eukaryotic core proteins to the PhyloPhlAn database?

Hi all,

I am new to this program and working with MAGs from skin microbiome samples. I need Malassezia yeast to be included in my database. Is there a way to add species from this genus to the precomputed database? Or do I have to create a new one with both prokaryotic and eukaryotic information?

I appreciate any help! Thanks!

phylophlan_metagenomic's mash dist uses all CPUs despite --nproc

Hi,

Thank you for your work.

Checking our server with htop, I realised that mash dist was using all of our CPUs. I believe this is caused by the missing -p argument here:

cmd = ['mash', 'dist', sgb_msh_idx, msh_idx]

While it is present in this call:

cmd = ['mash', 'dist', '-t', '-p', str(nproc), inp_a, inp_b]
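A minimal sketch of building both calls through one helper, so that the `-p` thread limit is always passed, is shown below. The helper name `mash_dist_cmd` is hypothetical and this is not the project's code; the flags `-t` and `-p` are the ones already used in the second call quoted above.

```python
def mash_dist_cmd(sketch_a, sketch_b, nproc=1, table=False):
    """Build a `mash dist` command list that always sets the thread count."""
    cmd = ["mash", "dist"]
    if table:
        cmd.append("-t")  # tabular output, as in the second call above
    cmd += ["-p", str(nproc), sketch_a, sketch_b]
    return cmd
```

With this helper, the first call would become `mash_dist_cmd(sgb_msh_idx, msh_idx, nproc=nproc)`, and the resulting list can be passed to `subprocess` as before.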

question about example 5 step 3

When I run this step, I get an error:

phylophlan_get_reference -g c__Espilonproteobacteria \
    -n -1 \
    -o input_bins \
    --verbose 2>&1 | tee logs/phylophlan_get_reference.log
phylophlan_get_reference.py version 0.16 (9 April 2020)

Command line: ../../phylophlan_get_reference.py -g c__Espilonproteobacteria -n -1 -o input_bins --verbose

Arguments: {'get': 'c__Espilonproteobacteria', 'list_clades': False, 'database_update': False, 'output_file_extension': '.fna.gz', 'output': 'input_bins', 'how_many': None, 'genbank_mapping': 'assembly_summary_genbank.txt', 'verbose': True}
File "taxa2genomes.txt" present
File "taxa2genomes_cpa201901_up201901.txt.bz2" present
Output folder "input_bins" present
File "assembly_summary_genbank.txt" present
[e] no reference genomes found for "c__Espilonproteobacteria", please check the taxonomic label provided
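The error message asks to check the taxonomic label, and the label used here reads "Espilonproteobacteria" rather than the usual spelling "Epsilonproteobacteria". A minimal sketch for catching this kind of misspelling with the standard library's `difflib` follows; the list of known labels is a made-up example, and in practice it would have to come from the clade list the tool provides.

```python
import difflib


def suggest_clades(label, known_labels, n=3, cutoff=0.8):
    """Return up to `n` known labels that closely resemble `label`."""
    return difflib.get_close_matches(label, known_labels, n=n, cutoff=cutoff)


# Hypothetical label list, for illustration only
EXAMPLE_LABELS = [
    "c__Epsilonproteobacteria",
    "c__Gammaproteobacteria",
    "c__Bacilli",
]
```

Calling `suggest_clades("c__Espilonproteobacteria", EXAMPLE_LABELS)` ranks the correctly spelled label first, which is a cheap check to run before starting a large download.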

Location of examples folder and database files

Hi! I installed phylophlan using conda. I am able to successfully print the version after installation, but my phylophlan directory does not contain an examples folder. Am I supposed to clone the examples folder from GitHub to be able to follow the tutorials?

Also, I would like to use the nucleotide version of the default databases for other purposes. Is this found somewhere in the phylophlan directories after installation or am I able to access it from somewhere else?

local variable 'input_faa_clean' referenced before assignment

Hi, I'm still trying to get a successful run on Example 02 (tree of life). I'm using the first 10 files in input_genomes to make a shorter test.

Traceback (most recent call last):
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/bin/phylophlan", line 11, in <module>
    load_entry_point('PhyloPhlAn==3.0', 'console_scripts', 'phylophlan')()
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3200, in phylophlan_main
    standard_phylogeny_reconstruction(project_name, configs, args, db_dna, db_aa)
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3005, in standard_phylogeny_reconstruction
    all_inputs = (os.path.splitext(os.path.basename(i))[0] for i in input_faa_clean)
UnboundLocalError: local variable 'input_faa_clean' referenced before assignment

Best,
Matthew Cahn

Problem at last step (gene refining) and --maas options

Dear developers,
I have encountered a problem when running PhyloPhlAn 3. The program does not generate the final tree; it stops at the final stage (gene refining with RAxML). This is the command and the parameters I used:

phylophlan -i folder-genomes -d phylophlan -o try-out -t a --diversity high --nproc 72 -f /home/egg/miniconda3/envs/phylophlan3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/supertree_aa.cfg --maas /home/egg/miniconda3/envs/phylophlan3/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/phylophlan.tsv

I just want to obtain the consensus tree, as was done in the old versions of PhyloPhlAn with the 400 markers (using the option -u, user tree, with my input folder). My input files are .faa.

The initial steps run properly:
Inputs already checked
Inputs already cleaned
Loading files from "picos-try-phylo3-out/tmp/clean_aa"
"phylophlan" markers already mapped (key: "map_aa")
Markers already selected
Markers already extracted
Inputs already translated into markers
Markers already aligned (key: "msa")
Markers already trimmed (key: "trim")
Markers already subsampled
Gene trees already built
Polytomies already resolved
Refining 206 gene trees
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0351.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0336.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0350.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0133.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0023.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0298.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0218.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0233.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0267.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0257.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0177.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0135.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0000.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0084.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0272.aln"

Then, these are the specific errors:

[e] Command '['/home/egg/miniconda3/envs/phylophlan3/bin/raxmlHPC', '-m', 'PROTCATLG', '-p', '1989', '-t', 'picos-try-phylo3-out/tmp/gene_tree1_polytomies/p0298.tre', '-w', '/home/pedroj/phyloplan/picos-try-phylo3-out/tmp/gene_tree2', '-s', 'picos-try-phylo3-out/tmp/sub/p0298.aln', '-n', 'p0298.tre']' returned non-zero exit status 255.

[e] error while executing
command_line: /home/egg/miniconda3/envs/phylophlan3/bin/raxmlHPC -m PROTCATLG -p 1989 -t picos-try-phylo3-out/tmp/gene_tree1_polytomies/p0298.tre -w /home/pedroj/phyloplan/picos-try-phylo3-out/tmp/gene_tree2 -s picos-try-phylo3-out/tmp/sub/p0298.aln -n p0298.tre
stdin: None
stdout: None
env: {'LESSOPEN': '| /usr/bin/lesspipe %s', 'CONDA_PROMPT_MODIFIER': '(phylophlan3) ', 'USER': 'pedroj', 'SSH_CLIENT': '193.147.133.245 49821 22', 'LC_TIME': 'es_ES.UTF-8', 'BLASTDB': '/home/egg/Databases/blastdb/', 'XDG_SESSION_TYPE': 'tty', 'SHLVL': '1', 'MOTD_SHOWN': 'pam', 'HOME': '/home/pedroj', 'CONDA_SHLVL': '1', 'OLDPWD': '/home/pedroj', 'SSH_TTY': '/dev/pts/1', 'LC_MONETARY': 'es_ES.UTF-8', 'DBUS_SESSION_BUS_ADDRESS': 'unix:path=/run/user/1002/bus', 'CE_M': '', 'LIBVIRT_DEFAULT_URI': 'qemu:///system', 'LOGNAME': 'pedroj', '': '/home/egg/miniconda3/envs/phylophlan3/bin/phylophlan', 'XDG_SESSION_CLASS': 'user', 'TERM': 'xterm', 'XDG_SESSION_ID': '902', '_CE_CONDA': '', 'SSUALIGNDIR': '/home/egg/Programs/ssu-align/lib', 'PATH': '/home/egg/miniconda3/envs/phylophlan3/bin:/home/egg/miniconda3/condabin:/home/egg/Programs/kofam_scan-1.3.0:/home/egg/Programs/sratoolkit.2.10.9-ubuntu64/bin:/home/egg/Programs/snippy-4.6.0/bin:/home/egg/Programs/get_homologues-3.3.3:/home/egg/Programs/ssu-align/bin:/home/egg/Programs/velvet:/home/egg/Programs/picardtools:/home/egg/Programs/blat:/home/egg/Programs/SPAdes-3.14.1/bin:/home/egg/Programs/simka/build/bin:/home/egg/Programs/sharedtools:/home/egg/Programs/rodney/creep:/home/egg/Programs/metabat2/build/bin:/home/egg/Programs/megahit/build:/home/egg/Programs/idba/bin:/home/egg/Programs/Flye/bin:/home/egg/Programs/cmdtools:/home/egg/Programs/canu-2.1.1/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin', 'LC_ADDRESS': 'es_ES.UTF-8', 'XDG_RUNTIME_DIR': '/run/user/1002', 'DISPLAY': 'localhost:10.0', 'LANG': 'en_US.UTF-8', 'LC_TELEPHONE': 'es_ES.UTF-8', 'BIOPERL_INDEX': '/home/egg/Databases/blastdb/', 'LS_COLORS': 
'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:*.xspf=00;36:', 'HMMERDB': '/home/egg/Databases/pfam/', 'CONDA_PYTHON_EXE': '/home/egg/miniconda3/bin/python', 'SHELL': '/bin/bash', 'LC_NAME': 'es_ES.UTF-8', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'PFAMDB': '/home/egg/Databases/pfam/', 'CONDA_DEFAULT_ENV': 'phylophlan3', 'LC_MEASUREMENT': 'es_ES.UTF-8', 'LC_IDENTIFICATION': 'es_ES.UTF-8', 'PWD': '/home/pedroj/phyloplan', 'CONDA_EXE': '/home/egg/miniconda3/bin/conda', 'SSH_CONNECTION': '193.147.133.245 49821 10.128.0.15 22', 'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop', 'LC_NUMERIC': 'es_ES.UTF-8', 'CONDA_PREFIX': 
'/home/egg/miniconda3/envs/phylophlan3', 'CORTADO': '/home/egg/Programs/cmdtools/', 'LC_PAPER': 'es_ES.UTF-8'}

Is there any way to simplify this? Maybe I need to write another config file instead of supertree_aa.cfg?
Many thanks for your help,
Best regards,
Pedro J

--subsample None?

The wiki says that the full alignment will be used if --subsample is set to None. However, the program currently reports that no such option exists.

local variable 'out' referenced before assignment

This command:

phylophlan -i test_genomes -d phylophlan -o sbtest --diversity high -f supertree_aa.cfg --maas phylophlan.tsv

is giving this error:

Traceback (most recent call last):
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/bin/phylophlan", line 11, in <module>
    load_entry_point('PhyloPhlAn==3.0', 'console_scripts', 'phylophlan')()
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3194, in phylophlan_main
    standard_phylogeny_reconstruction(project_name, configs, args, db_dna, db_aa)
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 2983, in standard_phylogeny_reconstruction
    build_gene_tree(configs, 'gene_tree1', sub_mod, inp_f, out_f, nproc=args.nproc, verbose=args.verbose)
  File "/tigress/MOLBIO/local/pythonenv/phylophlan/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 2376, in build_gene_tree
    if (not os.path.isfile(os.path.join(output_folder, out))) and \
UnboundLocalError: local variable 'out' referenced before assignment

I think that's because the "if" block starting at line 2376 is supposed to be indented under the "for" loop just above it.

-- Matthew Cahn
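The indentation problem described in this report can be reproduced in isolation. The sketch below is not the PhyloPhlAn code: the function names, the `.tre` suffix and the inputs are assumptions chosen for illustration. It shows one way the error can surface: when the check sits outside the loop and the loop runs zero times, the loop variable is never assigned.

```python
def last_missing_buggy(names, existing):
    """The check is dedented out of the loop, mirroring the reported layout."""
    for name in names:
        out = name + '.tre'
    # BUG: runs once after the loop; 'out' is unbound if 'names' was empty
    if out not in existing:
        return out
    return None


def missing_fixed(names, existing):
    """The check is indented under the loop, so every name is tested."""
    missing = []
    for name in names:
        out = name + '.tre'
        if out not in existing:
            missing.append(out)
    return missing


try:
    last_missing_buggy([], set())
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
print(missing_fixed(['p0001', 'p0002'], {'p0001.tre'}))
```

With a non-empty input the buggy version does not crash, but it only tests the last name, so the indented form is the safer layout in both cases.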
