biobakery / shortbred

ShortBRED is a pipeline to take a set of protein sequences, reduce them to a set of unique identifying strings ("markers"), and then search for these markers in metagenomic data and determine the presence and abundance of the protein families of interest.

Home Page: http://huttenhower.sph.harvard.edu/shortbred

Contributors: chuttenh, kelsthom13, ljmciver, sagun98

shortbred's Issues

Key error

We are using ShortBRED on our EFI website (efi.igb.illinois.edu/efi-cgfp), and our code uses the diamond branch that was created over two years ago. DIAMOND is essential to our workflow because the BLAST option can take weeks to complete.

On this branch I consistently get key errors on line 566 of the MarkX function (process_identify.py). For example:

Finding overlap with reference database...
Traceback (most recent call last):
  File "/home/groups/efi/apps/shortbred/sb_diamond_2018-08-17/shortbred_identify.py", line 361, in
    dictGOICounts = pi.MarkX(dictSBFamilies,dictGOICounts)
  File "/home/groups/efi/apps/shortbred/sb_diamond_2018-08-17/src/process_identify.py", line 566, in MarkX
    dictOverlap[strName][i] = dictOverlap[strName][i] + 9999999
KeyError: 'A0A155BS15'

I am not sure what is causing this.
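
One way to narrow this down: the traceback shows that `dictOverlap` has no entry for the name 'A0A155BS15' when line 566 indexes it. The rest of `MarkX` is not shown here, so the following is only a minimal sketch of a defensive lookup, not the ShortBRED implementation. The function name `mark_overlap`, the `hits` structure, and the example names are hypothetical. It collects the missing names instead of crashing, which makes it possible to see which sequences are absent from the overlap dictionary.

```python
def mark_overlap(dict_overlap, hits, penalty=9999999):
    """Add a penalty at each hit position, skipping names absent from dict_overlap.

    dict_overlap: maps a sequence name to a list of per-position counts.
    hits: maps a sequence name to the positions to penalize (hypothetical structure).
    Returns the list of names that were skipped so they can be reported.
    """
    missing = []
    for name, positions in hits.items():
        counts = dict_overlap.get(name)
        if counts is None:
            # This is the situation that raises KeyError with direct indexing.
            missing.append(name)
            continue
        for i in positions:
            counts[i] += penalty
    return missing


overlap = {"P12345": [0, 0, 0, 0]}
hits = {"P12345": [1, 2], "A0A155BS15": [0]}
skipped = mark_overlap(overlap, hits)
print(overlap)
print(skipped)
```

If the skipped names all come from one input file, that would point to a mismatch between the sequence IDs in the search output and the IDs used to build the dictionary.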

MUSCLE in ShortBRED

I installed ShortBRED yesterday with Conda.
When I downloaded the example files and ran the following command:
shortbred_identify.py --goi input_prots.faa --ref ref_prots.faa --markers mytestmarkers.faa --tmp example_identify

I got the following error:

Traceback (most recent call last):
  File "/cluster/projects/nn8021k/Conda-env/my_shortbred/bin/shortbred_identify.py", line 281, in <module>
    "-dbtype", "prot", "-logfile", dirTmp + os.sep + "goidb.log"])
  File "/cluster/projects/nn8021k/Conda-env/my_shortbred/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['makeblastdb', '-in', 'example_identify/clust/clust.faa', '-out', 'example_identify/clustdb/goidb', '-dbtype', 'prot', '-logfile', 'example_identify/goidb.log']' returned non-zero exit status 1

When I tried ShortBRED with my own data, I got a similar error:

Invalid command line
Unknown option in

Checking dependencies...
Checking to make sure that installed version of usearch can make databases...
Traceback (most recent call last):
  File "/cluster/projects/nn8021k/Conda-env/my_shortbred/bin/shortbred_identify.py", line 274, in <module>
    pb.ClusterFams(dirClust, args.dClustID,strClustFile,args.dConsThresh,args.strMUSCLE )
  File "/cluster/projects/nn8021k/Conda-env/my_shortbred/bin/src/process_blast.py", line 320, in ClusterFams
    subprocess.check_call([strMUSCLE, "-in", str(fileFasta), "-out", str(fileAlign)])
  File "/cluster/projects/nn8021k/Conda-env/my_shortbred/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['muscle', '-in', '/cluster/projects/nn8021k/Databases/CARD_327/markers_shortbred/Temp/clust/fams/gb|AEZ36150.1|ARO:3000448|QepA1.faa', '-out', '/cluster/projects/nn8021k/Databases/CARD_327/markers_shortbred/Temp/clust/fams/gb|AEZ36150.1|ARO:3000448|QepA1.faa.aln']' returned non-zero exit status 1

Can you please help me with this?
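
A note on debugging this kind of failure: `subprocess.check_call` only reports the exit status, so the real message from the external tool is easy to miss. The "Unknown option in" line above may mean that one of the tools rejected the `-in` flag, but that is an inference from the message, not something confirmed here. The sketch below (Python 3, with a hypothetical helper name `run_and_report`) reruns a command and returns its stderr, so the tool's own complaint can be read directly. The demo command is a stand-in; the `muscle` or `makeblastdb` argument list from the traceback would take its place.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr) as text."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in command that fails the way an external tool would.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write('bad option'); sys.exit(1)"]
code, out, err = run_and_report(demo_cmd)
print("exit status:", code)
print("stderr:", err)
```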

Python version mismatch

Hello,

According to the Installation documentation (README.md), this requires Python 2.7.9, but according to setup.py it is Python 3.4. What versions of Python is this software compatible with?

Thanks!

Specifying strands in USEARCH within shortbred_quantify

Hi ShortBRED team & users,

I am trying to run shortbred_quantify on some paired-end reads. I am using the following command:

shortbred_quantify.py --markers mymarkers.faa --wgs R1_001-paired.fastq R2_001-paired.fastq --results results.txt --tmp tmp_quantify --avgreadBP 101

I am getting the following error, asking to specify strands in USEARCH:

---Fatal error---
Must specify -strand plus or both with nt db
('Using this version of usearch: ', u'v10.0.240')

Are there any workarounds for this issue, or known solutions?

Thank you in advance.
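
One thing that may be worth checking: the usearch message refers to an "nt db", which suggests usearch is treating the marker file as nucleotide sequence. Whether that is the cause here is an assumption. The sketch below is a rough heuristic, not part of ShortBRED or usearch, for testing whether a FASTA file looks like nucleotide data; the function name and the 0.9 threshold are arbitrary choices for illustration.

```python
def looks_like_nucleotide(fasta_text, threshold=0.9):
    """Return True if at least `threshold` of residue characters are A, C, G, T, U or N."""
    residues = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    ).upper()
    if not residues:
        return False
    nt_count = sum(residues.count(c) for c in "ACGTUN")
    return nt_count / len(residues) >= threshold


protein_example = ">m1\nMKVLAAGIVGLLLAQWERT\n"
dna_example = ">r1\nACGTTGCAACGTNNACGT\n"
print(looks_like_nucleotide(protein_example))
print(looks_like_nucleotide(dna_example))
```

Running it on the contents of mymarkers.faa would show whether the markers are protein sequences as expected.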

diamond branch

Hi. A while back we worked with you to make CGFP available on our web site to allow integration with SSN (sequence similarity networks). You made a branch that uses Diamond. We are still on that, and it seems that you have made some fixes (zero division, Python 3) that improve the code. Are there any plans to update that branch with the fixes from master, and perhaps merge diamond into master?

Protein of interest

Hi, I was quite confused while reading the manual.
I do not know how to get the protein of interest for each sample.
I am working with metagenome data to identify mobile genetic elements (MGEs) in my dataset.

So, to create the protein of interest for each sample, I have to do the assembly first, then do the annotation, then run BLAST against the MGEs database. From this, I have to extract the contigs that were assigned to MGEs.

Is this the way to get the protein of interest?
It will be annoying if this is the way!

Or can I run ShortBRED to obtain the markers from the database alone and then use these markers for quantification?

I have the short-read dataset and the MGEs database.

Can you please help me understand this?

Biopython, pip and Python-2 deprecation.

Pip no longer supports Python-2 because Python-2 reached end of life on 01/01/2020. Biopython is only available for Python-3 (at least via pip). It appears that your code does not support Python-3 (e.g. shortbred_quantify.py : 320).

Questions:

  1. Are there any plans on updating your code to make it Python-3 compatible?
  2. Is there a Python-3 compatible version of this project? Maybe a successor project?

Thanks!

SBhits.txt header

Hello,

Would it be possible to document the header and column meanings for SBhits.txt?

Thank you!

ValueError("Unknown format '%s'" % format) error in quantify step.

Dear users,

I wanted to use ShortBRED with a dataset of paired reads from several locations.
Following the tutorial at https://github.com/biobakery/shortbred, I managed to build my marker file ('marcadores.faa', named to distinguish it from the toy example). The markers were obtained using the CARD and UniRef databases.

Now, in Step 2, I get this error:

(shortbred) [andrespara@nagual SHORTBRED]$  ./shortbred-0.9.4/shortbred_quantify.py --markers marcadores.faa  --wgs AC-18_1.fq  --results exampleresults.txt --tmp example_quantify
Tested usearch. Appears to be working.
Treating input as a wgs file...
usearch v11.0.667_i86linux32, 4.0Gb RAM (264Gb total), 32 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

License: personal use only

00:00 41Mb    100.0% Reading mytestmarkers.faa
00:00 7.1Mb   100.0% Masking (fastamino)      
00:00 49Mb    100.0% Word stats         
00:00 49Mb    100.0% Alloc rows
00:01 49Mb    100.0% Build index
00:01 36Mb   Buffers (34 seqs)  
00:01 53Mb    100.0% Seqs

List of files in WGS set:AC-18_1.fq

List of files in WGS set (after unpacking tarfiles):AC-18_1.fq 

Working on file 1 of 1
Traceback (most recent call last):
  File "./shortbred-0.9.4/shortbred_quantify.py", line 522, in <module>
    for seq in SeqIO.parse(streamWGS, strFormat):
  File "/export/home/andrespara/miniconda3/envs/shortbred/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 609, in parse
    raise ValueError("Unknown format '%s'" % format)
ValueError: Unknown format 'unknown'

I also tried it with the uncompressed file.
The problem persists even with the example mymarkers.faa reference obtained by following Step 1 of the tutorial.

Here is a sample of my FASTQ file:

@K00114:872:H25N2BBXY:2:1101:2311:1173 1:N:0:NCCAATAC
ACGCCAGTGGCTTTATCCTTGAATGTTTGGTTATCACCGAAACCGCTAGAAGCTTCCTCGATGTTTGGATTGATTGTGACATCAAAAGATTTCAAATCTTCAAACACATCAATCTCTGGAACGCCAGTATTGATTTTCTCGCTACTGTA
+
AAFFJJJFFFFFFJJFFFAJFFJJJ-FJJJF7FJJFFJJJJJJJJJJJJJFJJJAFJJJ<7FFFJ<FFFFJFFJJFF<JJJ<JJJJJJFFJJFJA<JJ<F-FJJFJJJFJFJJJJJJJJJFFFJJF-7A-7F-AFJJJFJFJJJJAFJJ
@K00114:872:H25N2BBXY:2:1101:2585:1191 1:N:0:NCCAATCT
GAAAACATCAAGTTTGAAGGCGCTCAATACCGCCAAGATATTGGCTACAGGGCGATTAAGGGCTAATGCCTTGCAGAACGCACGCGGGCTTAAACACTTTTGAAAATCTAGCGGTGTGATTGCCTATCCAACCGAGGCGATTTTTGGTT
+
<A<AFJJ<FAJFFJFFJJJJJJJJJAJJJJJJJJJ-F-<AFF<FJF<JFFJAFJ-7<JA-<-FJ7AAJJJJFAFF-FJJJAFJJJFJJJ-AFAFJ-7J7AF-AJAAJFJ<FJFF7AJJF-<AAJJJ7AFFJJJJJJJ7A-77A<-7AFF
@K00114:872:H25N2BBXY:2:1101:2788:1191 1:N:0:NCCAATAT
TAAATGCGGCGCTTCGCCGATTGCAGTTTTACCAGAAATCGAAATTTCAATGTGTTGAGAACAAGTTTGCTATCCAACATTTACACGAGGCGCTCGGCTGCTTGCGGGCGCGAACCAACAAGCGGGAGCTCCGCGGTGTTGAAGGAACA

I am using Fedora 32 and conda 4.8.3.
I created a conda env solely for this program; here is the output of conda list.


# packages in environment at /export/home/andrespara/miniconda3/envs/shortbred:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
asn1crypto                1.3.0            py27h8c360ce_1    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.1                      py_0    conda-forge
beautifulsoup4            4.8.2                    py27_0  
biopython                 1.70                np112py27_1    bioconda
blas                      1.1                    openblas    conda-forge
blast                     2.9.0                h20b68b9_1    bioconda
boost                     1.68.0          py27h8619c78_1001    conda-forge
boost-cpp                 1.68.0            h11c811c_1000    conda-forge
brotlipy                  0.7.0           py27h516909a_1000    conda-forge
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
cd-hit                    4.8.1                h8b12597_3    bioconda
certifi                   2019.11.28               py27_0    anaconda
cffi                      1.14.0           py27hd463f26_0    conda-forge
chardet                   3.0.4           py27h8c360ce_1006    conda-forge
conda                     4.8.3            py27h8c360ce_1    conda-forge
conda-build               3.18.11                  py27_0  
conda-package-handling    1.6.0            py27hdf8410d_2    conda-forge
contextlib2               0.6.0.post1                py_0    conda-forge
cryptography              2.8              py27h1ba5d50_0  
enum34                    1.1.10           py27h8c360ce_1    conda-forge
filelock                  3.0.12             pyh9f0ad1d_0    conda-forge
freetype                  2.10.2               he06d7ca_0    conda-forge
futures                   3.3.0            py27h8c360ce_1    conda-forge
glob2                     0.7                        py_0    conda-forge
gmp                       6.2.0                he1b5a44_2    conda-forge
gnutls                    3.6.13               h79a8f9a_0    conda-forge
icu                       58.2              hf484d3e_1000    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
ipaddress                 1.0.23                     py_0    conda-forge
jinja2                    2.11.2             pyh9f0ad1d_0    conda-forge
jpeg                      9d                   h516909a_0    conda-forge
ld_impl_linux-64          2.34                 h53a641e_5    conda-forge
libarchive                3.3.3             h3a8160c_1008    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
libgomp                   9.2.0                h24d8f2e_2    conda-forge
libiconv                  1.15              h516909a_1006    conda-forge
liblief                   0.9.0                hf8a498c_1    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libtiff                   4.1.0                hc7e4089_6    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
libxml2                   2.9.10               he19cac6_1  
llvm-openmp               8.0.1                hc9558a2_0    conda-forge
lz4-c                     1.9.2                he1b5a44_1    conda-forge
lzo                       2.10              h14c3975_1000    conda-forge
markupsafe                1.1.1            py27hdf8410d_1    conda-forge
mmtf-python               1.1.2                      py_0    conda-forge
msgpack-python            1.0.0            py27h9e3301b_1    conda-forge
muscle                    3.8.1551             hc9558a2_5    bioconda
ncurses                   6.1               hf484d3e_1002    conda-forge
nettle                    3.4.1             h1bed415_1002    conda-forge
numpy                     1.12.1          py27_blas_openblash1522bff_1001  [blas_openblas]  conda-forge
olefile                   0.46                       py_0    conda-forge
openblas                  0.3.3             h9ac9557_1001    conda-forge
openmp                    8.0.1                         0    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
patchelf                  0.11                 he1b5a44_0    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
perl-archive-tar          2.32                    pl526_0    bioconda
perl-carp                 1.38                    pl526_3    bioconda
perl-common-sense         3.74                    pl526_2    bioconda
perl-compress-raw-bzip2   2.087           pl526he1b5a44_0    bioconda
perl-compress-raw-zlib    2.087           pl526hc9558a2_0    bioconda
perl-exporter             5.72                    pl526_1    bioconda
perl-exporter-tiny        1.002001                pl526_0    bioconda
perl-extutils-makemaker   7.36                    pl526_1    bioconda
perl-io-compress          2.087           pl526he1b5a44_0    bioconda
perl-io-zlib              1.10                    pl526_2    bioconda
perl-json                 4.02                    pl526_0    bioconda
perl-json-xs              2.34            pl526h6bb024c_3    bioconda
perl-list-moreutils       0.428                   pl526_1    bioconda
perl-list-moreutils-xs    0.428                   pl526_0    bioconda
perl-pathtools            3.75            pl526h14c3975_1    bioconda
perl-scalar-list-utils    1.52            pl526h516909a_0    bioconda
perl-types-serialiser     1.0                     pl526_2    bioconda
perl-xsloader             0.24                    pl526_0    bioconda
pillow                    6.2.1            py27hd70f55b_1    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
pkginfo                   1.5.0.1                    py_0    conda-forge
psutil                    5.7.0            py27hdf8410d_1    conda-forge
py-lief                   0.9.0            py27he1b5a44_1    conda-forge
pycosat                   0.6.3           py27hdf8410d_1004    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
pysocks                   1.7.1            py27h8c360ce_1    conda-forge
python                    2.7.15          h5a48372_1011_cpython    conda-forge
python-libarchive-c       2.9                      py27_0    conda-forge
python_abi                2.7                    1_cp27mu    conda-forge
pytz                      2020.1             pyh9f0ad1d_0    conda-forge
pyyaml                    5.3.1            py27hdf8410d_0    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
reportlab                 3.5.26           py27he686d34_0    anaconda
requests                  2.24.0             pyh9f0ad1d_0    conda-forge
ripgrep                   12.1.1               h516909a_0    conda-forge
ruamel_yaml               0.15.80         py27hdf8410d_1001    conda-forge
scandir                   1.10.0           py27hdf8410d_1    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
soupsieve                 1.9.5                    py27_0  
sqlite                    3.32.3               hcee41ef_0    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
tqdm                      4.47.0             pyh9f0ad1d_0    conda-forge
urllib3                   1.25.9                     py_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstd                      1.4.4                h6597ccf_3    conda-forge

Thanks for your help!
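
A possible angle on the "Unknown format 'unknown'" error: the traceback shows `strFormat` reaching `SeqIO.parse` with the value 'unknown', so the format was not recognized before parsing. How ShortBRED sets `strFormat` is not shown here; if it relies on the file name, the `.fq` extension might be the problem, and renaming or symlinking the file to `.fastq` would be a cheap test. That is an assumption. The sketch below (Python 3, hypothetical helper `sniff_seq_format`) guesses the format from the file content instead, which can confirm that the reads file itself is valid FASTQ.

```python
import os
import tempfile


def sniff_seq_format(path):
    """Guess 'fastq' or 'fasta' from the first non-empty line; otherwise 'unknown'."""
    with open(path, "r") as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            return "unknown"
    return "unknown"


# Demo with a tiny FASTQ file that uses the .fq extension.
with tempfile.TemporaryDirectory() as tmp:
    fq_path = os.path.join(tmp, "reads.fq")
    with open(fq_path, "w") as out:
        out.write("@read1\nACGT\n+\nFFFF\n")
    detected = sniff_seq_format(fq_path)
print(detected)
```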

shortbred warning usearch error

First, I installed ShortBRED via conda (https://anaconda.org/HCC/shortbred).
(1) I ran the following command, and it seems the usearch path is not right:
(shortbred) dell@dell-PowerEdge-R720:~$ shortbred_quantify.py --markers /home/dell/raw_data/amrfinder/amrfinder.fa --wgs /home/dell/raw_data/rmhg19fastqcat/H1.rmhg19.fastp.fastq --results /home/dell/raw_data/shortbred/H1_results.txt --tmp /home/dell/raw_data/shortbred/H1_quantify
Traceback (most recent call last):
  File "/home/dell/miniconda3/envs/shortbred/bin/shortbred_quantify.py", line 139, in
    src.CheckDependency(args.strUSEARCH,"","usearch")
  File "/home/dell/miniconda3/envs/shortbred/lib/python2.7/site-packages/shortbred_src/init.py", line 36, in CheckDependency
    raise IOError("\nShortBRED was unable to find " + strIntendedProgram + " at the path " + strCmd + "\nPlease check that the program is installed and the path is correct. Please note that ShortBRED will not be able to use the unix alias for the program.")
IOError:
ShortBRED was unable to find usearch at the path usearch
Please check that the program is installed and the path is correct. Please note that ShortBRED will not be able to use the unix alias for the program.

(2) So I used --usearch to specify the path, but it still returns an error. I wonder how to resolve it.
(shortbred) dell@dell-PowerEdge-R720:~$ shortbred_quantify.py --markers /home/dell/raw_data/amrfinder/amrfinder.fa --usearch /home/dell/bioinfo/usearch11.0.667_i86linux32 --wgs /home/dell/raw_data/rmhg19fastqcat/H1.rmhg19.fastp.fastq --results /home/dell/raw_data/shortbred/H1_results.txt --tmp /home/dell/raw_data/shortbred/H1_quantify
Traceback (most recent call last):
  File "/home/dell/miniconda3/envs/shortbred/bin/shortbred_quantify.py", line 139, in
    src.CheckDependency(args.strUSEARCH,"","usearch")
  File "/home/dell/miniconda3/envs/shortbred/lib/python2.7/site-packages/shortbred_src/init.py", line 43, in CheckDependency
    pCmd = subprocess.Popen([strCmd],stdout=subprocess.PIPE,stderr=subprocess.PIPE)
  File "/home/dell/miniconda3/envs/shortbred/lib/python2.7/subprocess.py", line 710, in init
    errread, errwrite)
  File "/home/dell/miniconda3/envs/shortbred/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied

thanks.
Qiang
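
The two errors above look like two different conditions: in (1) no program named usearch was found, and in (2) the file was found but could not be executed. "Permission denied" from `subprocess.Popen` commonly means the file lacks execute permission, although that is not confirmed for this case. The sketch below (Python 3, hypothetical helper names) tells the two cases apart and sets the owner execute bit, which is what `chmod u+x` does.

```python
import os
import shutil
import stat
import tempfile


def diagnose_program(cmd):
    """Return 'ok', 'not found', or 'not executable' for a program name or path."""
    if os.path.dirname(cmd):
        # An explicit path was given, so inspect that file directly.
        if not os.path.isfile(cmd):
            return "not found"
        return "ok" if os.access(cmd, os.X_OK) else "not executable"
    # A bare name is resolved through PATH (shell aliases are not considered).
    return "ok" if shutil.which(cmd) else "not found"


def make_executable(path):
    """Add the owner execute bit to path."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


# Demo with a stand-in file instead of a real usearch binary.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "usearch_demo")
    with open(fake, "w") as out:
        out.write("#!/bin/sh\necho demo\n")
    os.chmod(fake, 0o644)
    before = diagnose_program(fake)
    make_executable(fake)
    has_exec_bit = bool(os.stat(fake).st_mode & stat.S_IXUSR)
    missing = diagnose_program(os.path.join(tmp, "no_such_binary"))
print(before, has_exec_bit, missing)
```

Calling `diagnose_program` on the path passed to --usearch would show which of the two cases applies.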

subprocess.CalledProcessError:

When I run the following command, I get an error:

python shortbred_identify.py --goi ./tmp/tonB.faa --ref ./tmp/tutorial_ref_prots.faa --markers ./markers.faa --tmpdir example_identify_tmp3 --usearch ./usearch

Making BLAST database for the family consensus sequences...
Traceback (most recent call last):
  File "shortbred_identify.py", line 281, in
    "-dbtype", "prot", "-logfile", dirTmp + os.sep + "goidb.log"])
  File "/home/chenh/miniconda3/envs/shortbred/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['makeblastdb', '-in', 'example_identify_tmp3/clust/clust.faa', '-out', 'example_identify_tmp3/clustdb/goidb', '-dbtype', 'prot', '-logfile', 'example_identify_tmp3/goidb.log']' returned non-zero exit status 1

How can I fix it? Thanks.
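
A first step that may help: the command in the traceback writes a log to example_identify_tmp3/goidb.log, which should contain the actual makeblastdb message. One possible cause, though only a guess without that log, is that the input clust.faa is empty because an earlier clustering step produced nothing. The sketch below (Python 3, hypothetical helper `count_fasta_records`) counts the records and residues in a FASTA file so that this case can be ruled in or out.

```python
import os
import tempfile


def count_fasta_records(path):
    """Return (records, residues) for a FASTA file; (0, 0) means nothing to index."""
    records = 0
    residues = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records += 1
            elif line:
                residues += len(line)
    return records, residues


# Demo with one small file and one empty file.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "clust_demo.faa")
    with open(good, "w") as out:
        out.write(">fam1\nMKVLA\nAGIV\n>fam2\nMSTQ\n")
    empty = os.path.join(tmp, "empty_demo.faa")
    open(empty, "w").close()
    good_counts = count_fasta_records(good)
    empty_counts = count_fasta_records(empty)
print(good_counts, empty_counts)
```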

Merge diamond branch with master

Could I ask that the diamond branch be merged into master? We have been using it successfully for over a year on efi.igb.illinois.edu/efi-cgfp, and other than the issue I reported (#2) there have been no problems. It would be nice to have updates/bugfixes from master, rather than maintaining a separate branch.

Thanks in advance!
