Comments (10)

widdowquinn avatar widdowquinn commented on June 14, 2024

Hi Jinhui,

I don't think this problem is to do with the number of input files. It looks like at least one of your inputs is so distinct from another that there is no similarity being identified by BLAST. The error being thrown is that no data is being read from a BLAST output file.

It would help me diagnose this properly if you could please provide a small (minimal) dataset that reproduces your issue, and/or please inspect the intermediate BLAST output, to see if one of them lacks content.
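One way to inspect the intermediate BLAST output for files that lack content is sketched below. This is only an illustration and not part of pyani: the function name `find_empty_outputs` is hypothetical, and the default `*.blast_tab` pattern is an assumption based on the file names that appear later in this thread (adjust it if your output files use a different extension).

```python
from pathlib import Path


def find_empty_outputs(outdir, pattern="*.blast_tab"):
    """Return the sorted names of zero-byte files in outdir matching pattern.

    Hypothetical helper for illustration only; it is not a pyani function.
    """
    return sorted(
        path.name
        for path in Path(outdir).glob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )
```

Calling `find_empty_outputs("ANIblastall_out/blastall_output")`, for example, would list the pairwise comparisons for which BLAST wrote no output.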

It would also help if you could run pyani with the -v option and provide the output log file.

In the meantime, I'll look at making that error catching and reporting a bit more informative.

Thanks,

L.

from pyani.

jinhuiwang avatar jinhuiwang commented on June 14, 2024

Yes, you are right. I excluded two sequences that have very low similarity with others, and it worked!

jinhuiwang avatar jinhuiwang commented on June 14, 2024

Hi! I have attached the dataset. The two sequences Lcr_BT1_P1 and Lcr_BT1_P2 are distinct from the other sequences. Is it possible to include these two sequences in the blastall output file anyway?
ANI.zip

widdowquinn avatar widdowquinn commented on June 14, 2024

Hi Jinhui,

I have tried running your data with the current version on GitHub, and I don't get an error (see attached log file and output).

Can you please confirm whether you are using pyani 0.2.0.post1 or the development version on GitHub? I think the development version fixes your issue and correctly records the empty BLAST output results for Lcr_BT1_LC1.fasta and Lcr_BT1_LC2.fasta.

I note that the log file doesn't currently report the pyani version, so I'll fix that.

I would also very much recommend using ANIm, rather than ANIblastall.

If you can clone the current master branch and confirm that it works correctly on your data, I'll close the issue.

L.

jinhuiwang avatar jinhuiwang commented on June 14, 2024

Thanks for the suggestion! I am using the current version on GitHub, obtained with git clone from your repository. ANIm works fine for this dataset, but ANIblastall does not.
I have attached the BLAST output and the log file; no blastall result was found in any Lcr_BT1_P1/P2vs*.tab file. I think you are right about choosing ANIm for this dataset.
blastall_output.zip
ANIblastall.log.zip

jinhwang@jinhwang-HP:~/bio_app/pyani$ average_nucleotide_identity.py -i prophages -o prophages_ANIblastall -m ANIblastall -f -g -v --label prophages/labels.tab

(LP: deleted log text for space, as it is present in the ANIblastall.log.zip file.)

widdowquinn avatar widdowquinn commented on June 14, 2024

I'm a little puzzled. The command line I ran on your data was:

./average_nucleotide_identity.py -v -i tests/issue_10 -o tests/issue_10_output --method ANIblastall -g --gformat png -l issue_10.log

where tests/issue_10 contained the files you attached. This appears to be essentially identical to your command line:

/usr/local/bin/average_nucleotide_identity.py -i prophages -o prophages_ANIblastall -m ANIblastall -f -g -v --label prophages/labels.tab

and also generates a number of empty BLAST output files:

$ ls -ltrS *.blast_tab | head -n 30
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLso_ZC1_P2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLso_ZC1_P1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLso_NZ1_P1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLso_FIN114_phaA.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLas_psy62_FP2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLas_UF506_SC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLas_UF506_SC1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLam_SaoPaulo_SP2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC2_vs_CLaf_PTSAPSY_P1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC1_vs_CLas_psy62_FP2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC1_vs_CLas_UF506_SC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC1_vs_CLas_UF506_SC1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 Lcr_BT1_LC1_vs_CLam_SaoPaulo_SP2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLso_ZC1_P2_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLso_ZC1_P1_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLso_NZ1_P1_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLso_FIN114_phaA_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLas_psy62_FP2_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLas_psy62_FP2_vs_Lcr_BT1_LC1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLas_UF506_SC2_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLas_UF506_SC2_vs_Lcr_BT1_LC1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLas_UF506_SC1_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLas_UF506_SC1_vs_Lcr_BT1_LC1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLam_SaoPaulo_SP2_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLam_SaoPaulo_SP2_vs_Lcr_BT1_LC1.blast_tab
-rw-r--r--  1 lpritc  staff     0B 21 Sep 16:58 CLaf_PTSAPSY_P1_vs_Lcr_BT1_LC2.blast_tab
-rw-r--r--  1 lpritc  staff    64B 21 Sep 16:58 Lcr_BT1_LC1_vs_CLso_NZ1_P1.blast_tab
-rw-r--r--  1 lpritc  staff    64B 21 Sep 16:58 CLso_ZC1_P1_vs_Lcr_BT1_LC1.blast_tab
-rw-r--r--  1 lpritc  staff    65B 21 Sep 16:58 CLso_ZC1_P2_vs_Lcr_BT1_LC1.blast_tab
-rw-r--r--  1 lpritc  staff    68B 21 Sep 16:58 CLso_NZ1_P1_vs_Lcr_BT1_LC1.blast_tab

but also gives me result output, so I think the current version should work with your data. I think it may be that the script in /usr/local/bin/average_nucleotide_identity.py (the one which is being used, according to your log file) might not be the most current version.

Please could you try running the script from the repository directory, with ./average_nucleotide_identity.py instead, and see whether that makes a difference? If it does, there might be an installation issue to get past.
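A quick way to check whether the script found on PATH is the same file as the copy in the repository directory is sketched below. It uses only the standard library (`shutil.which` and `pathlib`); the function name `describe_script` and the shape of its return value are hypothetical choices made for this illustration.

```python
import shutil
from pathlib import Path


def describe_script(name, repo_dir="."):
    """Compare the copy of `name` found on PATH with the copy in repo_dir.

    Hypothetical helper for illustration; returns resolved paths (or None
    when a copy is missing) and whether both point at the same file.
    """
    installed = shutil.which(name)  # None if nothing on PATH matches
    installed_path = Path(installed).resolve() if installed else None

    local = Path(repo_dir) / name
    local_path = local.resolve() if local.is_file() else None

    return {
        "installed": installed_path,
        "local": local_path,
        "same_file": installed_path is not None and installed_path == local_path,
    }
```

If `describe_script("average_nucleotide_identity.py")` is run from the repository directory and reports `same_file` as `False`, the installed copy and the repository copy are different files, which would be consistent with an out-of-date installation.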

L.

jinhuiwang avatar jinhuiwang commented on June 14, 2024

Hi, I ran the script directly from the pyani repository this time.
ANIblastall.log.zip

jinhwang@jinhwang-HP:~/bio_app/pyani$ ./average_nucleotide_identity.py -v -i ANI -o ANIblastall_out -m ANIblastall -f -g -l ANIblastall.log
INFO: pyani version: 0.2.0.dev
INFO: Namespace(blastall_exe='blastall', blastn_exe='blastn', classes=None, force=True, formatdb_exe='formatdb', fragsize=1020, gformat='pdf,png,eps', gmethod='mpl', graphics=True, indirname='ANI', jobprefix='ANI', labels=None, logfile='ANIblastall.log', makeblastdb_exe='makeblastdb', maxmatch=False, method='ANIblastall', noclobber=False, nocompress=False, nucmer_exe='nucmer', outdirname='ANIblastall_out', rerender=False, scheduler='multiprocessing', seed=None, sgegroupsize=10000, skip_blastn=False, skip_nucmer=False, subsample=None, verbose=True, workers=None, write_excel=False)
INFO: command-line: ./average_nucleotide_identity.py -v -i ANI -o ANIblastall_out -m ANIblastall -f -g -l ANIblastall.log
INFO: Input directory: ANI
INFO: Creating directory ANIblastall_out
INFO: Output directory: ANIblastall_out
INFO: Using ANI method: ANIblastall
INFO: Using scheduler method: multiprocessing
INFO: Identifying FASTA files in ANI
INFO: Input files:
ANI/CLas_UF506_SC2.fasta
ANI/CLso_FIN114_phaA.fasta
ANI/CLso_ZC1_P2.fasta
ANI/CLas_UF506_SC1.fasta
ANI/CLaf_PTSAPSY_P1.fasta
ANI/CLso_NZ1_P1.fasta
ANI/CLas_psy62_FP2.fasta
ANI/CLam_SaoPaulo_SP2.fasta
ANI/CLso_ZC1_P1.fasta
INFO: Processing input sequence lengths
INFO: Sequence lengths:
CLso_FIN114_phaA: 38325
CLaf_PTSAPSY_P1: 40666
CLam_SaoPaulo_SP2: 39941
CLas_psy62_FP2: 38552
CLso_NZ1_P1: 40403
CLso_ZC1_P1: 40794
CLso_ZC1_P2: 43309
CLas_UF506_SC2: 38997
CLas_UF506_SC1: 40048
INFO: Carrying out ANIblastall analysis
INFO: Running ANIblastall
INFO: Writing BLAST output to ANIblastall_out/blastall_output
INFO: Fragmenting input files, and writing to ANIblastall_out
INFO: Creating job dependency graph
INFO: Running jobs with multiprocessing
INFO: Running job dependency graph
INFO: Command pool now running:
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLso_FIN114_phaA.fasta -t CLso_FIN114_phaA
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLam_SaoPaulo_SP2.fasta -t CLam_SaoPaulo_SP2
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLso_ZC1_P2.fasta -t CLso_ZC1_P2
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLaf_PTSAPSY_P1.fasta -t CLaf_PTSAPSY_P1
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLso_ZC1_P1.fasta -t CLso_ZC1_P1
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLso_NZ1_P1.fasta -t CLso_NZ1_P1
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLas_UF506_SC1.fasta -t CLas_UF506_SC1
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLas_psy62_FP2.fasta -t CLas_psy62_FP2
INFO: formatdb -p F -i ANIblastall_out/blastall_output/CLas_UF506_SC2.fasta -t CLas_UF506_SC2
Traceback (most recent call last):
File "./average_nucleotide_identity.py", line 806, in <module>
results = methods[args.method][0](infiles, org_lengths)
File "./average_nucleotide_identity.py", line 532, in unified_anib
logger=logger)
File "/home/jinhwang/bio_app/pyani/pyani/run_multiprocessing.py", line 45, in run_dependency_graph
cumretval += multiprocessing_run(cmdset, workers, verbose)
File "/home/jinhwang/bio_app/pyani/pyani/run_multiprocessing.py", line 86, in multiprocessing_run
for cline in cmdlines]
File "/home/jinhwang/bio_app/pyani/pyani/run_multiprocessing.py", line 86, in <listcomp>
for cline in cmdlines]
AttributeError: 'module' object has no attribute 'run'

widdowquinn avatar widdowquinn commented on June 14, 2024

The error you're getting is due to using a Python version earlier than 3.5: subprocess.run() (the function that the script is not finding) was introduced in Python 3.5 (see https://docs.python.org/3/library/subprocess.html).

Traceback (most recent call last):
File "./average_nucleotide_identity.py", line 806, in <module>
results = methods[args.method][0](infiles, org_lengths)
File "./average_nucleotide_identity.py", line 532, in unified_anib
logger=logger)
File "/home/jinhwang/bio_app/pyani/pyani/run_multiprocessing.py", line 45, in run_dependency_graph
cumretval += multiprocessing_run(cmdset, workers, verbose)
File "/home/jinhwang/bio_app/pyani/pyani/run_multiprocessing.py", line 86, in multiprocessing_run
for cline in cmdlines]
File "/home/jinhwang/bio_app/pyani/pyani/run_multiprocessing.py", line 86, in <listcomp>
for cline in cmdlines]
AttributeError: 'module' object has no attribute 'run'

If you upgrade your local Python to version 3.5 or later, then the error in your last message should go away. The documentation does not make it clear that you now need version 3.5+, which is my fault. Many apologies!
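A minimal sketch of an up-front check that would turn the AttributeError into a clearer message is shown below. It is not pyani code: the function names `supports_subprocess_run` and `require_subprocess_run` are hypothetical, and the only facts relied on are that `subprocess.run()` exists from Python 3.5 onwards and that `sys.version_info` reports the running interpreter's version.

```python
import subprocess
import sys


def supports_subprocess_run(version_info=sys.version_info):
    """Return True if the given Python version provides subprocess.run()."""
    return tuple(version_info[:2]) >= (3, 5)


def require_subprocess_run():
    """Raise a descriptive error if subprocess.run() is unavailable.

    Hypothetical guard for illustration; call it before scheduling jobs.
    """
    if not hasattr(subprocess, "run"):
        raise RuntimeError(
            "subprocess.run() is missing: Python 3.5 or later is required, "
            "but this interpreter is %d.%d" % tuple(sys.version_info[:2])
        )
```

Calling `require_subprocess_run()` at the start of a script would fail immediately with the version requirement spelled out, rather than partway through the run.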

jinhuiwang avatar jinhuiwang commented on June 14, 2024

I updated Python from v3.4 to v3.5.2, and now the script works fine with both the ANIblastall and ANIb options! Thank you!

widdowquinn avatar widdowquinn commented on June 14, 2024

Fantastic! I'll close the issue then, but if the same problem recurs, we can reopen it. Otherwise, please do open another issue if you have any questions or problems.
