billzt / mifish

This is the command-line version of the MiFish pipeline. It can also be used with any other eDNA metabarcoding primers.

Home Page: https://mitofish.aori.u-tokyo.ac.jp/mifish/

License: GNU General Public License v3.0

Python 100.00%
edna edna-pipeline

mifish's Introduction

MiFish

References

If you use MiFish Pipeline in your projects, please cite:

  • Zhu T, Sato Y, Sado T, Miya M, and Iwasaki W. 2023. MitoFish, MitoAnnotator, and MiFish Pipeline: Updates in ten years. Mol Biol Evol, 40:msad035. https://doi.org/10.1093/molbev/msad035
  • Sato Y, Miya M, Fukunaga T, Sado T, Iwasaki W. 2018. MitoFish and MiFish Pipeline: A Mitochondrial Genome Database of Fish with an Analysis Pipeline for Environmental DNA Metabarcoding. Mol Biol Evol 35:1553-1555.
  • Iwasaki W, Fukunaga T, Isagozawa R, Yamada K, Maeda Y, Satoh TP, Sado T, Mabuchi K, Takeshima H, Miya M, et al. 2013. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol 30:2531-2540.

If you use MiFish Primers in your projects, please cite:

  • Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, Minamoto T, Yamamoto S, Yamanaka H, Araki H, et al. 2015. MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R Soc Open Sci 2:150088.

Install

Currently we only support Linux. Please use conda to manage the environment. If you do not have a Linux OS, or you just want a quick look, you can try the Docker version.

External Dependencies

Add these programs to your system PATH. You can download all the external executable files here (except for MAFFT), or compile them yourself.

Install Steps

conda create -n MiFish python==3.9.13
conda activate MiFish
pip3 install numpy==1.23.1
pip3 install scikit-bio==0.5.6
pip3 install PyQt5==5.15.7
pip3 install ete3==3.1.2
pip3 install duckdb==0.6.1
pip3 install XlsxWriter==3.0.3
pip3 install cutadapt==4.1
pip3 install biopython==1.79
git clone https://github.com/billzt/MiFish.git
cd MiFish
git checkout vsearch
python3 setup.py develop
mifish -h

In Ubuntu, the following library is also needed.

sudo apt-get install -y libgl1

Test

cd test
mifish seq mifishdbv3.83.fa -d seq2

There are six files in the result directory MiFishResult. Note: seq and seq2 are two directories containing FASTQ files.

Parameters

Mandatory

mifish /path/to/your/amplicon/sequencing/directory/ /path/to/your/ref/db.fa

Directory for amplicon sequencing data (FASTQ/FASTA)

Since MiFish supports multi-sample analysis, amplicon sequencing data in compressed FASTQ/FASTA format should be put in directories. Pass the path of the directory as the first parameter. Refer to MiFish's Homepage to see the rules of filenames. Here are some examples:

  • MiFish-example-02_S73_L001_R1_001.fastq.gz
  • MiFish-example-02_S73_L001_R2_001.fastq.gz
  • DRR126155_1.fastq.bz2
  • DRR126155_2.fastq.bz2
  • mydata.1.fq.xz
  • mydata.2.fq.xz

RefDB of your metabarcoding primers

Prepare your RefDB in FASTA format and index it using makeblastdb from NCBI BLAST+. The RefDB for an old version of MiFish is in test/mifishdbv3.83.fa.

The header line of each RefDB (FASTA) record follows this rule:

gb|accessionID|species_scientific_name

Replace blanks with underscores in the species name. Here are examples.

>gb|LC021149|Ostorhinchus_angustatus
CACCGCGGTTATACGAGAGGCCCAAGCTGACAATCACCGGCGTAAAGAGTGGTTAATGAC
CCCACAATAATAAAGTCGAACATCTCCAAAGTTGTTGAACACATTCGAAGATATGAAGCT
CTACCACGAAAGTGACTTTACACTCTTTGAACCCACGAAAGCTAGGAAA
>gb|LC579122|Ostorhinchus_angustatus
CACCGCGGTTATACGAGGGGCCCAAGCTGACAATCACCGGCGTAAAGAGTGGTTAATAAC
CCCACAATAATAAAGTCGAACATCTCCAAAGTTGTTGAACACATTCGAAGATATGAAGCT
CTACCACGAAAGTGACTTTACACTCTTTGAACCCACGAAAGCTAGGAAA
>gb|LC717543|Trachidermus_fasciatus
CACCGCGGTTATACGAGAGACTCAAGCTGACAAACACCGGCGTAAAGCGTGGTTAAGCTA
AAAATTTGCTAAAGTCAAACACCTTCAAGACTGTTATACGTACCCGAAGGCAGGAAGCAC
AACCACGAAAGTGACTTTAACTAAGCTGAATCCACGAAAGCTAAGGAA

accessionID can be any unique string. Primers have been trimmed from the sequences.
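As a rough illustration of the header rule above, the following Python sketch builds and checks RefDB header lines. The helper names `make_header` and `is_valid_header` are hypothetical and not part of MiFish. The check only encodes the three-field rule stated here, and it assumes that no field contains whitespace or an extra `|`.

```python
import re

# Three '|'-separated fields: the literal "gb", an accession ID,
# and a species name without blanks (assumption: no extra '|' in any field).
HEADER_RE = re.compile(r"^>gb\|[^|\s]+\|[^|\s]+$")


def make_header(accession_id, species_name):
    """Build a header line, replacing blanks in the species name with underscores."""
    species = "_".join(species_name.split())
    return f">gb|{accession_id}|{species}"


def is_valid_header(line):
    """Return True if the line matches the >gb|accessionID|species_name pattern."""
    return HEADER_RE.match(line.strip()) is not None


print(make_header("LC021149", "Ostorhinchus angustatus"))
# prints: >gb|LC021149|Ostorhinchus_angustatus
```

Running `is_valid_header` over every line that starts with `>` before calling makeblastdb is one way to catch species names that still contain blanks.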

Optional (important❗️)

The following optional parameters are tuned for the MiFish metabarcoding primers. If you run the pipeline with other eDNA primers, change them to match your own primers.

Length filtering

  -m MIN_READ_LEN, --min-read-len MIN_READ_LEN
                        Minimum read length(bp) (default: 204)

  -M MAX_READ_LEN, --max-read-len MAX_READ_LEN
                        Maximum read length(bp) (default: 254)

This is the range of amplicon lengths (including primers). Adjust the values to match your own primers. You can estimate the range from your reference database file; since primers are trimmed from the RefDB sequences, add the primer lengths back.
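One way to estimate the range is sketched below in Python: read the sequence lengths from the RefDB FASTA file and add the lengths of both primers, because the RefDB sequences have the primers trimmed off. The function names are hypothetical, and the result is only the observed minimum and maximum; how much margin to add around them is a judgment call this sketch does not make.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file (multi-line records allowed)."""
    lengths = []
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def amplicon_length_range(path, primer_fwd, primer_rev):
    """Observed (min, max) amplicon length, with both primer lengths added back."""
    lengths = read_fasta_lengths(path)
    if not lengths:
        raise ValueError(f"no sequences found in {path}")
    primers = len(primer_fwd) + len(primer_rev)
    return min(lengths) + primers, max(lengths) + primers
```

For example, `amplicon_length_range("test/mifishdbv3.83.fa", "GTCGGTAAAACTCGTGCCAGC", "CATAGTGGGGTATCTAATCCCAGTTTG")` returns a pair that can serve as a starting point for -m and -M.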

Primer sequences

  -f PRIMER_FWD, --primer-fwd PRIMER_FWD
                        forward sequence of primer (5->3) (default: GTCGGTAAAACTCGTGCCAGC)
  -r PRIMER_REV, --primer-rev PRIMER_REV
                        reverse sequence of primer (5->3) (default: CATAGTGGGGTATCTAATCCCAGTTTG)

Change them to match your own primers.

Optional

The following optional parameters apply to all metabarcoding primers.

Group samples

  -d OTHER_DATA_DIR, --other-data-dir OTHER_DATA_DIR
                        other directory of the amplicon sequencing data file (FASTQ/FASTA). Can specify multiple times. Each directory is considered as a group (default: None)

If your samples are in multiple groups, arrange them in different directories and pass the -d parameter once per additional group, e.g. -d 2nd_group_dir -d 3rd_group_dir

Threshold of BLASTN identity

  -i BLAST_MIN_IDENTITY, --blast-min-identity BLAST_MIN_IDENTITY
                        Minimum identity (percentage) for filtering BLASTN results (default: 97.0)

Threshold of UNOISE3

  -u UNOISE_MIN, --unoise-min UNOISE_MIN
                        value for the -minsize option in UNOISE3 (default: 8)

Decreasing this value gives higher sensitivity but lower accuracy.

Skip downstream analysis

  -s, --skip-downstream-analysis
                        Skip abandance statics, phylogenetic and bio-diversity analysis (default: False)

Turn on this option if you only want to get taxonomy identification results and do not need other analysis.

Output directory

  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        directory for output (default: .)

By default, MiFishResult is created under your current directory. If you specify another directory /path/dir/, results are written to /path/dir/MiFishResult.

Number of threads

  -t THREADS, --threads THREADS
                        number of threads for BLASTN and usearch (default: 2)

This value is passed to external programs such as usearch.

Keep temporary files

  -k, --keep-tmp-files  Keep temporary files (default: False)

Useful for debugging. If you encounter problems, turn this on and share the Sample-* directory from the MiFishResult directory with me.

Results

There are six files in the MiFishResult directory.

QC.zip
read_stat.xlsx
taxonomy.xlsx
tree.zip (if not using -s)
relative_abandance.json  (if not using -s)
diversity.json  (if not using -s but using -d)

The first four files are the same as the web version of MiFish. (Screenshots were from DRR126155 against refDB v3.83)

[Screenshots: QC; Species tree]
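The conditions attached to the result files listed above can be written down as a small Python sketch. The function names are hypothetical and not part of MiFish; the file names are copied exactly as listed, including the spelling of relative_abandance.json.

```python
import os


def expected_result_files(skip_downstream=False, has_other_groups=False):
    """List the files expected in MiFishResult for a given combination of -s and -d."""
    files = ["QC.zip", "read_stat.xlsx", "taxonomy.xlsx"]
    if not skip_downstream:  # -s not given
        files += ["tree.zip", "relative_abandance.json"]
        if has_other_groups:  # at least one -d given
            files.append("diversity.json")
    return files


def missing_result_files(result_dir, skip_downstream=False, has_other_groups=False):
    """Return the expected files that are absent from result_dir."""
    expected = expected_result_files(skip_downstream, has_other_groups)
    return [name for name in expected
            if not os.path.isfile(os.path.join(result_dir, name))]
```

A call such as `missing_result_files("MiFishResult", has_other_groups=True)` after the test run gives a quick check that all six files were produced.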

An example on using other eDNA primers

See Riaz

Tips

  1. Please make sure that, within a FASTQ/FASTA file, all read names start with an identical word, such as:
@DRR231392.1
@DRR231392.2
@DRR231392.3

Otherwise usearch cannot work properly.
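A quick way to check this before running the pipeline is sketched below in Python. It rests on two assumptions that are not stated in this README: that the "word" is the part of the read name before the first dot, as in the example above, and that the FASTQ file uses four lines per record. The function name is hypothetical.

```python
def read_name_prefixes(fastq_lines):
    """Collect the leading word (text before the first '.') of each FASTQ read name."""
    prefixes = set()
    for index, line in enumerate(fastq_lines):
        # Line 0, 4, 8, ... of a four-line-per-record FASTQ file is the read name.
        if index % 4 == 0 and line.strip():
            name = line.strip().lstrip("@").split()[0]
            prefixes.add(name.split(".")[0])
    return prefixes


records = [
    "@DRR231392.1", "ACGT", "+", "IIII",
    "@DRR231392.2", "ACGT", "+", "IIII",
]
print(read_name_prefixes(records))
# prints: {'DRR231392'}
```

If the returned set holds more than one element, the read names do not share a common leading word under these assumptions.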

mifish's People

Contributors

billzt

mifish's Issues

Help with multiple input data files (groups)

The mitofish pipeline is working when I use a single input directory, however, I am trying to analyze the data in three subgroups. I have placed each sub-group into its own folder and specified each of the folders by using the "-d" argument.

However, I am now getting an error (see below) with the pipeline as it seems it cannot find the database.

What am I doing wrong?

$ mifish -o output/WTRBA-YEARS -t 124 -d /home/cbfgws6/MiFish/WTRBA_YEAR/21 -d /home/cbfgws6/MiFish/WTRBA_YEAR/22 -d /home/cbfgws6/MiFish/WTRBA_YEAR/23 /home/cbfgws6/MiFish/mifishdb-Oct2023/mitofish.db.fa
usage: mifish [-h] [-d OTHER_DATA_DIR] [-m MIN_READ_LEN] [-M MAX_READ_LEN] [-f PRIMER_FWD] [-r PRIMER_REV] [-u UNOISE_MIN] [-i BLAST_MIN_IDENTITY] [-s] [-k]
              [-o OUTPUT_DIR] [-t THREADS]
              seq_dir db
mifish: error: the following arguments are required: db
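The usage message lists two positional arguments, seq_dir and db, while the command above supplies only one path outside the -d options, so the parser appears to take that path as seq_dir and report db as missing. The sketch below reproduces this behaviour with a stripped-down argparse parser. It is a stand-in written under that assumption, not MiFish's actual parser, and `try_parse` is a hypothetical helper.

```python
import argparse


def build_parser():
    """A reduced parser with the same two positionals as the usage message."""
    parser = argparse.ArgumentParser(prog="mifish")
    parser.add_argument("-d", "--other-data-dir", action="append", default=None)
    parser.add_argument("-o", "--output-dir", default=".")
    parser.add_argument("-t", "--threads", type=int, default=2)
    parser.add_argument("seq_dir")
    parser.add_argument("db")
    return parser


def try_parse(argv):
    """Return the parsed arguments, or None if argparse rejects the command line."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:  # argparse prints the usage message and exits on errors
        return None


# Only one positional: it is consumed as seq_dir, so db is reported as missing.
print(try_parse(["-d", "group2", "-d", "group3", "db.fa"]))

# First group as seq_dir, remaining groups via -d, then the database.
args = try_parse(["group1", "db.fa", "-d", "group2", "-d", "group3"])
print(args.seq_dir, args.db, args.other_data_dir)
```

Under this reading, the first group goes in as the positional directory and only the remaining groups are passed with -d.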

mifish crashes when a group has been skipped

I am using three groups; this sample <Sample 21_Pt4_LO_S_1_> in the first group has only 27 reads and is skipped during initial processing.

Sample 21_Pt4_LO_S_1_ Step 0: Decompress
Sample 21_Pt4_LO_S_1_ Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample 21_Pt4_LO_S_1_ Step 2: filter read length and remove primers
Sample 21_Pt4_LO_S_1_ has not passed read length filter. Only has 27 reads. Skip

Later, the pipeline crashes:

Traceback (most recent call last):
  File "/home/cbfgws6/miniconda3/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/home/cbfgws6/MiFish/mifish/cmd/mifish.py", line 76, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/home/cbfgws6/MiFish/mifish/core/pipeline.py", line 397, in runMiFish
    json.dump(stat.eco_diversity(workdir, group_to_sample), fp=out_handle, indent=4)
  File "/home/cbfgws6/MiFish/mifish/core/stat.py", line 79, in eco_diversity
    with open(f'{workdir_sample}/04_blast/{sample_name}.json') as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'output/WTRBA-YEARS/MiFishResult/Sample-21_Pt4_LO_S_1_/04_blast/21_Pt4_LO_S_1_.json'

It is true that there is no JSON file, as it was skipped. The pipeline should figure this out, or at least not crash, and move on to the next sample.

The immediate workaround would be to remove this sample (or any samples that are "SKIPPED") from the analysis/pipeline.
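One possible guard is sketched below in Python: before computing diversity, drop every sample whose BLAST JSON file is missing. The path layout is taken from the traceback above. The shape of group_to_sample (a dict mapping each group to a list of sample names) is an assumption, and `samples_with_blast_json` is a hypothetical helper, not existing MiFish code.

```python
import os


def samples_with_blast_json(workdir, group_to_sample):
    """Keep only samples that have <workdir>/Sample-<name>/04_blast/<name>.json.

    Assumes group_to_sample maps a group name to a list of sample names.
    """
    kept = {}
    for group, samples in group_to_sample.items():
        kept[group] = [
            name for name in samples
            if os.path.isfile(
                os.path.join(workdir, f"Sample-{name}", "04_blast", f"{name}.json")
            )
        ]
    return kept
```

Passing the filtered mapping to the diversity step would let the run continue without the skipped sample, although a group that ends up empty may still need separate handling.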

error parsing the blastxml in mifish/core/pipeline.py

Hi there. I was getting no hit results back, and I noticed that the percent identities and numbers of mismatches reported in the "haploids with low identities" tab of the output taxonomy spreadsheet didn't make any sense.

Looking at pipeline.py line 256-261

# /core/pipeline.py
for alignment in blast_record.alignments:
    hsp = alignment.hsps[0]
    aln_len = alignment.length
    identity = hsp.identities/aln_len
    if identity >= blast_identity/100:
        good_alns.append(alignment)

For me aln_len is reporting the length of the hit record in the database, not the HSP overlap length. This means the identity number is really much smaller than it should be. I fixed it by assigning aln_len to hsp.align_length (see below).

#/core/pipeline.py
for alignment in blast_record.alignments:
    hsp = alignment.hsps[0]
    aln_len = hsp.align_length #alignment.length
    identity = hsp.identities/aln_len
    if identity >= blast_identity/100:
        good_alns.append(alignment)

Now I get correct reporting on the identity because it is dividing by the HSP length and not the hit record length.

This also needs to be fixed on lines 266 and 289 (moving it below the hsp assignment which occurs on line 269 and 292, respectively)
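To show how large the difference can be, here is a small self-contained Python sketch with made-up numbers. The two dataclasses are stand-ins that only mirror the attribute names used in the snippets above (alignment.length, alignment.hsps, hsp.identities, hsp.align_length); they are not the Biopython classes themselves.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    identities: int    # number of identical positions in the HSP
    align_length: int  # length of the HSP alignment


@dataclass
class Alignment:
    length: int        # length of the hit record in the database
    hsps: list


def passes_identity(alignment, blast_identity):
    """Filter on identity computed over the HSP length, as in the fix above."""
    hsp = alignment.hsps[0]
    identity = hsp.identities / hsp.align_length
    return identity >= blast_identity / 100


# Hypothetical hit: a 16,500 bp database record with a 172 bp HSP.
hit = Alignment(length=16500, hsps=[Hsp(identities=170, align_length=172)])

print(round(170 / hit.length, 4))   # prints 0.0103 (divided by hit record length)
print(round(170 / 172, 4))          # prints 0.9884 (divided by HSP length)
print(passes_identity(hit, 97.0))   # prints True
```

With the hit record length as the denominator, this near-perfect match would fall far below a 97% threshold.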

Species_num error

Hi all,

I've been trying to work with MiFish with a custom amplicon reference database built using makeblastdb, and a results file in .fasta format.

I've been using the command:

mifish seq/ database/crabtest.fasta

All the dependencies are found, but I get this error:

Detect your data as
#########
	zip warning: name not matched: ./MiFishResult/Sample-*/01_filter_fastq_and_merge/*.html

zip error: Nothing to do! (./MiFishResult/QC.zip)
Traceback (most recent call last):
  File "/home/labaccount/miniconda3/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/home/labaccount/projects/mifish_test/MiFish/mifish/cmd/mifish.py", line 82, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/home/labaccount/projects/mifish_test/MiFish/mifish/core/pipeline.py", line 395, in runMiFish
    if simple_result == False and 'species_num' in stat_data and stat_data['species_num'] > 3:
UnboundLocalError: local variable 'stat_data' referenced before assignment

I'm not sure how to interpret this. It looks like potentially there are few matches in the amplicon database?
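The log stops right after "Detect your data as" without listing any group or sample, and the traceback reports stat_data as unassigned. One pattern that produces this kind of error is a variable that is assigned only inside a loop over samples when that loop never runs. Whether pipeline.py is structured this way is an assumption; the Python sketch below, with hypothetical function names, only illustrates the pattern.

```python
def summarize(samples):
    """stat_data is assigned only inside the loop body."""
    for sample in samples:
        stat_data = {"species_num": len(sample)}
    # With an empty sample list the name was never bound, so this line raises.
    return stat_data


def summarize_safely(samples):
    """Initialise stat_data first so that an empty sample list is handled."""
    stat_data = {}
    for sample in samples:
        stat_data = {"species_num": len(sample)}
    return stat_data


try:
    summarize([])
except UnboundLocalError as error:
    print("UnboundLocalError:", error)

print(summarize_safely([]))  # prints {}
```

If that is what happened here, the error would point to no input files being recognised in seq/ rather than to too few matches in the database.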

core dumped with usearch -otutab

Hi there. While testing the pipeline with 2 groups of real data, I got a core dumped with usearch.

mifish seq/AA1 ../../MitoFish_db/MitoFish -d seq/AB2 -s -o MiFish_re_Result
#########
Sample AA1_3 Step 0: Decompress
Sample AA1_3 Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample AA1_3 Step 2: filter read length and remove primers
Sample AA1_3 Step 3: De-noise and generate haploid
sh: line 1: 165256 Aborted                 (core dumped) usearch -otutab MiFish_re_Result/MiFishResult/Sample-AA1_3/02_process_fasta/AA1_3.processed.fa -zotus MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.fasta -threads 2 -otutabout MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.size.txt > MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.otutab.log 2>&1
Traceback (most recent call last):
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish_re/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/cmd/mifish.py", line 71, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/core/pipeline.py", line 223, in runMiFish
    sizeFasIntegrator.run(zotusCountFile=f'{workdir_sample}/03_haploid/{sample_name}.zotus.size.txt', \
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/core/sizeFasIntegrator.py", line 5, in run
    with open(zotusCountFile) as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.size.txt'

At the end of AA1_3.otutab.log, we found:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Could the memory limit of the 32-bit version of usearch have caused the issue?
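std::bad_alloc means a memory allocation failed, and a 32-bit process cannot address more than 4 GB, so checking whether the installed usearch binary is a 32-bit build is a reasonable first step. The Python sketch below reads the ELF header of an executable: byte 4 is 1 for a 32-bit binary and 2 for a 64-bit one. The function name is hypothetical, and the check says nothing about whether memory was actually the cause here.

```python
def elf_word_size(path):
    """Return 32 or 64 for an ELF executable, or None if the file is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    # ELF files start with the magic bytes 0x7F 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(header[4])
```

For example, `elf_word_size(shutil.which("usearch"))` (after `import shutil`) reports the word size of the usearch found on PATH.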

ete3.parser.newick.NewickError in Step 5

Hi, thank you for the great tool.
I installed MiFish in a new conda environment as recommended. While testing with mifish seq mifishdbv3.83.fa -d seq2, I got an error during Step 5: Phylogenetic Analysis.

Detect your data as
#########
Group1: 1 samples
Sample DRR126155: read type = pe
#########
Group2: 1 samples
Sample DRR126155B: read type = pe
#########
Sample DRR126155 Step 0: Decompress
Sample DRR126155 Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample DRR126155 Step 2: filter read length and remove primers
Sample DRR126155 Step 3: De-noise and generate haploid
Sample DRR126155 Step 4: BLAST and calculate LOD Score
Sample DRR126155 Step 5: Phylogenetic Analysis
Traceback (most recent call last):
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/jdfsbjcas1/workdir/Tools/MiFish/mifish/cmd/mifish.py", line 71, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/jdfsbjcas1/workdir/Tools/MiFish/mifish/core/pipeline.py", line 349, in runMiFish
    drawTree.svg(species_result=species_result, tree_file=f'{workdir_sample}/05_MSA/{sample_name}.nwk', \
  File "/jdfsbjcas1/workdir/Tools/MiFish/mifish/core/drawTree.py", line 29, in svg
    tree_handle = Tree(tree_file)
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish/lib/python3.9/site-packages/ete3/coretype/tree.py", line 212, in __init__
    read_newick(newick, root_node = self, format=format,
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish/lib/python3.9/site-packages/ete3/parser/newick.py", line 264, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

If -s was added, the pipeline would finish smoothly. In my conda env, ete3==3.1.2, as recommended. How can I rule this out?
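The ete3 message covers two different cases, a missing file and a malformed tree, so it may help to look at the .nwk file directly. It is written to {workdir_sample}/05_MSA/{sample_name}.nwk according to the traceback, and running with -k should keep it. The Python sketch below performs a few quick checks with the standard library only. It is not a Newick parser, and `diagnose_tree_file` is a hypothetical helper.

```python
import os


def diagnose_tree_file(path):
    """Return a short description of the most obvious problem with a Newick file."""
    if not os.path.isfile(path):
        return "missing"
    with open(path) as handle:
        text = handle.read().strip()
    if not text:
        return "empty"
    if not text.endswith(";"):
        return "no trailing semicolon"
    if text.count("(") != text.count(")"):
        return "unbalanced parentheses"
    return "looks plausible"
```

A result of "missing" or "empty" would suggest that the step producing the tree failed before ete3 was reached, which narrows down where to look.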

For your reference, my conda env was as follows:

# packages in environment at /jdfsbjcas1/workdir/Env/miniconda/envs/MiFish:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alsa-lib                  1.2.8                h166bdaf_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
arrow-cpp                 10.0.1           ha770c72_6_cpu    conda-forge
asttokens                 2.2.1              pyhd8ed1ab_0    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.6.21               hd93a3ba_3    conda-forge
aws-c-cal                 0.5.20               hff2c3d7_3    conda-forge
aws-c-common              0.8.5                h166bdaf_0    conda-forge
aws-c-compression         0.2.16               hf5f93bc_0    conda-forge
aws-c-event-stream        0.2.18               h57874a7_0    conda-forge
aws-c-http                0.7.0                h96ef541_0    conda-forge
aws-c-io                  0.13.12              h57ca295_1    conda-forge
aws-c-mqtt                0.7.13              h0b5698f_12    conda-forge
aws-c-s3                  0.2.3                h82cbbf9_0    conda-forge
aws-c-sdkutils            0.1.7                hf5f93bc_0    conda-forge
aws-checksums             0.1.14               h6027aba_0    conda-forge
aws-crt-cpp               0.18.16             hf80f573_10    conda-forge
aws-sdk-cpp               1.10.57              ha834a50_1    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                pyhd8ed1ab_3    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
biopython                 1.79             py39hb9d737c_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cachecontrol              0.12.11            pyhd8ed1ab_1    conda-forge
cairo                     1.16.0            ha61ee94_1014    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.7            py39h4b4f3f3_0    conda-forge
cryptography              39.0.0           py39h079d5ae_0    conda-forge
cutadapt                  4.1              py39hbf8eff0_1    bioconda
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
cython                    0.29.33          py39h227be39_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
dnaio                     0.10.0           py39hbf8eff0_0    bioconda
ete3                      3.1.2              pyh9f0ad1d_0    conda-forge
exceptiongroup            1.1.0              pyhd8ed1ab_0    conda-forge
executing                 1.2.0              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                h27087fc_0    conda-forge
fftw                      3.3.10          nompi_hf0379b8_106    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.38.0           py39hb9d737c_1    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glib                      2.74.1               h6239696_1    conda-forge
glib-tools                2.74.1               h6239696_1    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gst-plugins-base          1.21.3               h4243ec0_1    conda-forge
gstreamer                 1.21.3               h25f0c4b_1    conda-forge
gstreamer-orc             0.4.33               h166bdaf_0    conda-forge
harfbuzz                  6.0.0                h8e241bc_0    conda-forge
hdmedians                 0.14.2           py39h2ae25f5_3    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
ipython                   8.9.0              pyh41d4057_0    conda-forge
isa-l                     2.30.0               ha770c72_4    conda-forge
jack                      1.9.21               h583fa2b_2    conda-forge
jedi                      0.18.2             pyhd8ed1ab_0    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py39hf939315_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.14                 hfd0df8a_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libarrow                  10.0.1           hf9c26a6_6_cpu    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcap                    2.66                 ha37c62d_0    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libclang                  15.0.7          default_had23c3d_0    conda-forge
libclang13                15.0.7          default_h3e3d535_0    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcups                   2.3.3                h36d4200_3    conda-forge
libcurl                   7.87.0               hdc1c0ab_0    conda-forge
libdb                     6.2.32               h9c3ff4c_0    conda-forge
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.2                h27087fc_0    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgcrypt                 1.10.1               h166bdaf_0    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libglib                   2.74.1               h606061b_1    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libgoogle-cloud           2.5.0                h21dfe5b_1    conda-forge
libgpg-error              1.46                 h620e276_0    conda-forge
libgrpc                   1.51.1               h30feacc_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.4                h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm15                 15.0.7               hadd5161_0    conda-forge
libnghttp2                1.51.0               hff17c54_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     15.1                 hb675445_3    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsndfile                1.2.0                hb75c966_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libsystemd0               252                  h2a991cd_0    conda-forge
libthrift                 0.16.0               he500d00_2    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libtool                   2.4.7                h27087fc_0    conda-forge
libudev1                  252                  h166bdaf_0    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.10.3               h7463322_0    conda-forge
libxslt                   1.1.37               h873f0b0_0    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
lockfile                  0.12.2                     py_1    conda-forge
lxml                      4.9.2            py39h14694de_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
matplotlib-base           3.6.3            py39he190548_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mifish                    1.0                       dev_0    <develop>
mpg123                    1.31.2               hcb278e6_0    conda-forge
msgpack-python            1.0.4            py39hf939315_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.32               ha901b37_0    conda-forge
mysql-libs                8.0.32               hd7da12d_0    conda-forge
natsort                   8.2.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.82                 he02c5a1_0    conda-forge
numpy                     1.23.1           py39hba7629e_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.0.7                h0b41bf4_2    conda-forge
orc                       1.8.2                hfdbbad2_0    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3            py39h2ad29b5_0    conda-forge
parquet-cpp               1.5.1                         1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
pbzip2                    1.1.13                        0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
pickleshare               0.7.5           py39hde42818_1002    conda-forge
pigz                      2.6                  h27826a3_0    conda-forge
pillow                    9.4.0            py39ha08a7e4_0    conda-forge
pip                       23.0               pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pluggy                    1.0.0            py39hf3d152e_4    conda-forge
ply                       3.11                       py_1    conda-forge
pooch                     1.6.0              pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.36             pyha770c72_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pulseaudio                16.1                 ha8d29e2_1    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   10.0.1          py39hf0ef2fd_6_cpu    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygments                  2.14.0             pyhd8ed1ab_0    conda-forge
pyopenssl                 23.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyqt                      5.15.7           py39h5c7b992_3    conda-forge
pyqt5-sip                 12.11.0          py39h227be39_3    conda-forge
pysocks                   1.7.1            py39hf3d152e_5    conda-forge
pytest                    7.2.1              pyhd8ed1ab_0    conda-forge
python                    3.9.15          hba424b6_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-duckdb             0.6.1            py39hb98b84a_1    conda-forge
python-isal               1.1.0            py39hb9d737c_1    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pytz                      2022.7.1           pyhd8ed1ab_0    conda-forge
qt-main                   5.15.6               h602db52_6    conda-forge
re2                       2022.06.01           h27087fc_1    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.2             pyhd8ed1ab_0    conda-forge
s2n                       1.3.31               h3358134_0    conda-forge
scikit-bio                0.5.6            py39h16ac069_4    conda-forge
scikit-learn              1.2.1            py39h86b2a18_0    conda-forge
scipy                     1.10.0           py39h7360e5f_0    conda-forge
setuptools                66.1.1             pyhd8ed1ab_0    conda-forge
sip                       6.7.6            py39h227be39_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
traitlets                 5.9.0              pyhd8ed1ab_0    conda-forge
tzdata                    2022g                h191b570_0    conda-forge
unicodedata2              15.0.0           py39hb9d737c_0    conda-forge
urllib3                   1.26.14            pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.6              pyhd8ed1ab_0    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.0                h516909a_0    conda-forge
xcb-util-image            0.4.0                h166bdaf_0    conda-forge
xcb-util-keysyms          0.4.0                h516909a_0    conda-forge
xcb-util-renderutil       0.3.9                h166bdaf_0    conda-forge
xcb-util-wm               0.4.1                h516909a_0    conda-forge
xlsxwriter                3.0.3              pyhd8ed1ab_0    conda-forge
xopen                     1.7.0            py39hf3d152e_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstandard                 0.19.0           py39h29414ee_1    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

Issue with crabs db_download with mitofish

I am trying to download the mitofish database using the crabs conda installation (as far as I know Docker does not play well on the NeSI infrastructure). We are getting the following error:
crabs db_download --source mitofish --output mitofish.fasta --keep_original yes

downloading sequences from the MitoFish database
Traceback (most recent call last):
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/bin/crabs", line 1372, in <module>
    main()
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/bin/crabs", line 1369, in main
    args.func(args)
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/bin/crabs", line 96, in db_download
    dl_file = mitofish_download(url)
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/lib/python3.6/site-packages/function/module_db_download.py", line 139, in mitofish_download
    os.remove('complete_partial_mitogenomes.zip')
FileNotFoundError: [Errno 2] No such file or directory: 'complete_partial_mitogenomes.zip'
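
The traceback ends in `os.remove`, which raises `FileNotFoundError` when the archive is not in the working directory. A likely reason, though this is an assumption and not confirmed in the report, is that the download step did not write `complete_partial_mitogenomes.zip` before the cleanup ran. The sketch below is not the crabs code; `remove_if_present` is a hypothetical helper that only illustrates how a cleanup step can tolerate a missing file, so that the underlying download failure is easier to see.

```python
import os


def remove_if_present(path):
    """Delete `path` if it exists; return True only if a file was removed.

    Hypothetical helper for illustration, not part of crabs or MiFish.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to clean up: the file was never created or is already gone.
        return False
```

A return value of `False` for `complete_partial_mitogenomes.zip` would point to the download itself as the step to investigate.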

How to Format DB for use with MitoFish

I downloaded the entire database from the site: http://mitofish.aori.u-tokyo.ac.jp/species/detail/download/?filename=download%2F/complete_partial_mitogenomes.zip

Then I built a BLAST database with this command:

$ makeblastdb -in mito-all.fa -dbtype nucl

Building a new DB, current time: 09/28/2023 16:06:29
New DB name:   /home/cbfgws6/MiFish/mifishdb/mito-all.fa
New DB title:  mito-all.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 825365 sequences in 24.0024 seconds.

I then tried to run the pipeline and got this error:

$ mifish -d /home/cbfgws6/MiFish/WTRBA_21-22-23/ seq /home/cbfgws6/MiFish/mifishdb/ -t 124 -o WTRBA_ALL
Error: /home/cbfgws6/MiFish/mifishdb/ does not seem to be a valid database for NCBI BLAST+

What am I doing wrong?
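
One thing worth checking, offered as an assumption and not as a confirmed diagnosis: BLAST+ addresses a database by the path prefix given to `makeblastdb` (here `/home/cbfgws6/MiFish/mifishdb/mito-all.fa`), not by the directory that contains it. For a nucleotide database, `makeblastdb` writes index files next to the FASTA with the extensions `.nhr`, `.nin`, and `.nsq`. The sketch below uses a hypothetical helper, `find_blast_nucl_prefixes`, to list the prefixes in a directory that have all three files. It is not part of MiFish, and whether `mifish` expects the prefix in this position is not stated in this thread.

```python
from pathlib import Path

# Index file extensions written by `makeblastdb -dbtype nucl`.
NUCL_INDEX_SUFFIXES = (".nhr", ".nin", ".nsq")


def find_blast_nucl_prefixes(directory):
    """Return path prefixes in `directory` that have all three index files.

    Hypothetical helper for illustration, not part of MiFish or BLAST+.
    """
    directory = Path(directory)
    prefixes = []
    for nhr in sorted(directory.glob("*.nhr")):
        # Strip only the final ".nhr", e.g. "mito-all.fa.nhr" -> "mito-all.fa".
        prefix = nhr.with_suffix("")
        if all(Path(str(prefix) + s).exists() for s in NUCL_INDEX_SUFFIXES):
            prefixes.append(str(prefix))
    return prefixes
```

Calling `find_blast_nucl_prefixes("/home/cbfgws6/MiFish/mifishdb/")` should return the prefix to try as the database argument. Large databases may be split into numbered volumes (for example `mito-all.fa.00.nhr`), in which case this simple check reports the per-volume prefixes and not the overall database name.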
