Comments (4)
Hi Arian,
Sorry about that, let's see if we can figure out an answer!
A couple things to check:
First, are you using the standard bacterial RefSeq database that is available for distribution with SAMSA2, or is this a custom-built database?
Second, it looks from this error like there's at least one entry where there's an ID but no description. Could you try re-running with the following code added to replace the existing line 138 of the standardized_DIAMOND_analysis_counter.py script?
try:
    db_entry = db_entry[1][:-1]
except IndexError:
    print(line)
    print(db_entry)
    print("this occurs at line: " + str(db_line_counter))
The idea here is to see the offending line, as well as where in your reference database this line occurs (for easy fixing, if there's an issue with it). This would require you to re-run the script, although you don't need to re-run the whole master_script; you could just run the command:
python $python_programs/standardized_DIAMOND_analysis_counter.py -I $file -D $RefSeq_db -O
(You'll need to correct the paths and replace the variables with your output file and the RefSeq_db path.)
Essentially, this looks to be an issue with the reference database; happy to help identify where it's occurring and what, specifically, is causing it.
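As a complement to the try/except patch above, the reference database can also be scanned directly for headers that carry an ID but no description. This is only a minimal sketch and not part of SAMSA2; the function name find_headers_without_description is hypothetical, and the line numbers it reports are lines of the FASTA file, which may not match the script's db_line_counter.

```python
def find_headers_without_description(fasta_lines):
    """Return (line_number, header) pairs for headers with an ID but no description.

    `fasta_lines` is any iterable of lines, for example an open FASTA file.
    """
    offenders = []
    for line_number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # A header looks like ">ID description ..."; splitting once on
        # whitespace yields fewer than two fields when the description is missing.
        fields = line[1:].strip().split(None, 1)
        if len(fields) < 2:
            offenders.append((line_number, line.rstrip("\n")))
    return offenders


# Small in-memory example; for a real database, pass an open file handle instead.
sample = [
    ">WP_000001 hypothetical protein [Escherichia coli]\n",
    "MKTLLV\n",
    ">ID_ONLY\n",
    "MKVAAL\n",
]
for line_number, header in find_headers_without_description(sample):
    print("line " + str(line_number) + ": " + header)
```

Any header this reports would be a candidate for the IndexError described above.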
from samsa2.
Hi Sam!
Thanks for your quick response. I appreciate it.
I actually use a custom-built database containing information on three different organisms. While working with the data, I noticed that the Fungi dataset (which is included in the combined database) only had IDs and no descriptions. I've taken care of this by downloading new data from Ensembl and creating a new database for DIAMOND. It's currently running, but I'll let you know if I run into any issues.
By the way, I mentioned earlier that I downloaded Fungi protein sequences from Ensembl. Do you have any suggestions for other good sources or databases to use? I tried NCBI RefSeq as well, but I believe the sequence files were corrupted or malformed: the sequences had numbers or "_" characters in them, which caused an error with DIAMOND as well.
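One way to confirm whether a downloaded file really has digits or underscores inside its sequence lines is to scan it before building the DIAMOND database. The following is a rough sketch under a stated assumption: anything other than a letter or "*" in a sequence line is treated as suspicious. Which characters DIAMOND itself accepts is not checked here, and the function name is hypothetical.

```python
import re

# Assumption: sequence lines should contain only letters, plus "*" for stops.
SUSPICIOUS_CHARACTER = re.compile(r"[^A-Za-z*]")


def find_invalid_sequence_lines(fasta_lines):
    """Return (line_number, sorted_bad_characters) for suspicious sequence lines."""
    offenders = []
    for line_number, line in enumerate(fasta_lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(">"):
            continue  # skip blank lines and headers
        bad = sorted(set(SUSPICIOUS_CHARACTER.findall(stripped)))
        if bad:
            offenders.append((line_number, bad))
    return offenders


# Small in-memory example; for a real file, pass an open file handle instead.
sample = [">id1 some protein\n", "MKT_LV\n", "MK1TL\n", "MKTLV*\n"]
for line_number, bad in find_invalid_sequence_lines(sample):
    print("line " + str(line_number) + " contains: " + " ".join(bad))
```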
Thank you again for your help.
from samsa2.
Hello again Sam,
After replacing my old Fungi database with the one I downloaded from Ensembl, I now get a "list index out of range" error at line 159. :/
Traceback (most recent call last):
  File "/projects/tools/samsa2/python_scripts/standardized_DIAMOND_analysis_counter.py", line 159, in <module>
    db_org = split_db_org[1] + " " + split_db_org[2]
IndexError: list index out of range
Here is the Python script from line 144 to line 159:
db_org = splitline[line.count("[")].strip()[:-1]
if db_org[0].isdigit():
    split_db_org = db_org.split()
    try:
        if split_db_org[1] == "sp.":
            db_org = split_db_org[0] + " " + split_db_org[1] + " " + split_db_org[2]
        else:
            db_org = split_db_org[1] + " " + split_db_org[2]
    except IndexError:
        try:
            db_org = split_db_org[1]
        except IndexError:
            db_org = splitline[line.count("[")-1]
            if db_org[0].isdigit():
                split_db_org = db_org.split()
                db_org = split_db_org[1] + " " + split_db_org[2]
I think the Fungi dataset causes this issue; perhaps the annotations in the FASTA files are not compatible with your script. Do you know where I could find a Fungi dataset compatible with DIAMOND and your script? Thanks.
BTW, here is an example of a header/description from the Fungi data that causes the error (I think):
>KGQ13519 pep supercontig:BBA1.0:contig00047:97669:98940:-1 gene:BBAD15_g702 transcript:KGQ13519 gene_biotype:protein_coding transcript_biotype:protein_coding description:tRNA-(ms[2]io[6]A)-hydroxylase
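The header above can be traced through the quoted organism-parsing block to see why it fails. The sketch below rests on two assumptions: that splitline is line.split("["), which the quoted excerpt does not show, and that the nesting of the quoted block is as reconstructed here. Under those assumptions, the "[2]" and "[6]" inside the description are taken for the bracketed organism field, and the last line raises the same IndexError as in the traceback. The helper to_refseq_style is hypothetical and is one possible workaround, not the author's method: it rewrites the header as ">ID description [organism]". The organism name must be supplied by the user, since the Ensembl header does not contain it; "Genus species" is a placeholder.

```python
def parse_organism(line):
    """Mimic the quoted organism-parsing block (script lines 144 to 159).

    Assumption: `splitline` is `line.split("[")`; that assignment is not
    part of the quoted excerpt.
    """
    splitline = line.split("[")
    db_org = splitline[line.count("[")].strip()[:-1]
    if db_org[0].isdigit():
        split_db_org = db_org.split()
        try:
            if split_db_org[1] == "sp.":
                db_org = split_db_org[0] + " " + split_db_org[1] + " " + split_db_org[2]
            else:
                db_org = split_db_org[1] + " " + split_db_org[2]
        except IndexError:
            try:
                db_org = split_db_org[1]
            except IndexError:
                db_org = splitline[line.count("[") - 1]
                if db_org[0].isdigit():
                    split_db_org = db_org.split()
                    db_org = split_db_org[1] + " " + split_db_org[2]
    return db_org


def to_refseq_style(header, organism):
    """Hypothetical helper: rewrite a header as '>ID description [organism]'.

    Square brackets inside the description become parentheses, so the only
    bracketed field left is the organism name.
    """
    seq_id = header[1:].split()[0]
    # Take the free text after "description:" if that key is present.
    _, found, description = header.partition("description:")
    description = description.strip() if found else "no description"
    description = description.replace("[", "(").replace("]", ")")
    return ">" + seq_id + " " + description + " [" + organism + "]"


ensembl_header = (
    ">KGQ13519 pep supercontig:BBA1.0:contig00047:97669:98940:-1 "
    "gene:BBAD15_g702 transcript:KGQ13519 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding "
    "description:tRNA-(ms[2]io[6]A)-hydroxylase\n"
)

try:
    parse_organism(ensembl_header)
    failed = False
except IndexError:
    failed = True  # "[2]" and "[6]" were mistaken for the organism field

# "Genus species" is a placeholder; use the real organism for each source file.
fixed_header = to_refseq_style(ensembl_header, "Genus species")
organism = parse_organism(fixed_header + "\n")
print(failed, fixed_header, organism)
```

Whether the rest of the script accepts the rewritten header (for example, the description parsing near line 138) has not been verified here, so it is worth testing on a few entries before rebuilding the whole database.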
from samsa2.
Any solution for this?
from samsa2.