Comments (10)
from biocode.
Did you make any progress here? Also, the title says you were attempting this with pip2, but the body of your report says pip3. Just in case: biocode isn't supported on Python 2.x.
Marking as resolved, since the current code no longer has the dependency which caused this error.
Hi Jorvis,
thanks for the reply, I've been away.
I actually used `pip install biocode` to install biocode-0.5.3.
I'm just figuring out how to use it to convert my GFF or GTF file to GFF3.
Kay
OK, so you got a successful installation now? If so, you can just run the convert_augustus_to_gff3.py script, which should now be in your PATH. Run it without arguments to see usage:
$ convert_augustus_to_gff3.py
usage: convert_augustus_to_gff3.py [-h] -i INPUT -o OUTPUT
convert_augustus_to_gff3.py: error: the following arguments are required: -i/--input, -o/--output
What I've done is start Python:
$ python
and then run:
import biocode.gff
(assemblies, features) = biocode.gff.get_gff3_features( "/pub38/kayussky/gff_to_GTF3/augustus_erins.gff" )
which says:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pub38/kayussky/biocode/lib/biocode/gff.py", line 252, in get_gff3_features
    atts = column_9_dict(cols[8])
  File "/pub38/kayussky/biocode/lib/biocode/gff.py", line 101, in column_9_dict
    raise Exception("Bad column 9 format: {0}".format(colstring) )
Exception: Bad column 9 format: g1
This brings me back to why I wanted to install biocode in the first place: the last column (9), which contains the attributes, seems to be in a bad format, which is why I couldn't use the file. Unfortunately, I'm only just learning how to use Python to rewrite the affected columns.
Do you mind attaching the file or putting it somewhere I can download it? Also, which version of Augustus are you using?
biocode.gff.get_gff3_features() will only work if the GFF3 is already valid, and the output from Augustus isn't. The script I mentioned transforms the Augustus GFF into actual, valid GFF3, after which you can use the biocode libraries on it.
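To see quickly whether a file's column 9 is in GFF3 style before handing it to the library, a rough check like the one below may help. It is only a sketch: `looks_like_gff3_attributes` is a hypothetical helper written for this illustration, not part of biocode, and it tests only for `;`-separated `key=value` pairs rather than applying the full GFF3 specification.

```python
def looks_like_gff3_attributes(col9):
    """Rough check: is column 9 a ';'-separated list of key=value pairs?

    Hypothetical helper for illustration; this is not biocode's own logic.
    """
    col9 = col9.strip()
    if col9 == ".":
        # GFF3 uses "." for an empty column.
        return True
    pairs = [p.strip() for p in col9.split(";") if p.strip()]
    if not pairs:
        return False
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key.strip() or not value.strip():
            return False
    return True


# Column 9 values of the kinds that appear in this thread, plus GFF3-style ones.
for sample in ["g1", 'transcript_id "g1.t1"; gene_id "g1";', "ID=g1", "ID=g1.t1;Parent=g1"]:
    print(repr(sample), looks_like_gff3_attributes(sample))
```

A bare identifier such as `g1` fails this check, which is consistent with the "Bad column 9 format: g1" exception in the traceback.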
so I tried what you suggested:
convert_augustus_to_gff3.py -i augustus_erins.gtf -o new_augustus
My input looks like this:
scaffold10x_1 AUGUSTUS gene 3591 4530 0.27 - . g1
scaffold10x_1 AUGUSTUS transcript 3591 4530 0.27 - . g1.t1
scaffold10x_1 AUGUSTUS stop_codon 3591 3593 . - 0 transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS CDS 3591 3859 0.34 - 2 transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS exon 3591 3859 . - . transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS intron 3860 4022 0.28 - . transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS CDS 4023 4530 0.63 - 0 transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS exon 4023 4530 . - . transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS start_codon 4528 4530 . - 0 transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1 AUGUSTUS gene 26186 31433 0.2 - . g2
scaffold10x_1 AUGUSTUS transcript 26186 31433 0.2 - . g2.t1
scaffold10x_1 AUGUSTUS stop_codon 26186 26188 . - 0 transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS CDS 26186 26304 0.37 - 2 transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS exon 26186 26304 . - . transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS intron 26305 29389 0.28 - . transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS CDS 29390 30220 0.39 - 2 transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS exon 29390 30220 . - . transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS intron 30221 30844 0.45 - . transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS CDS 30845 31433 0.41 - 0 transcript_id "g2.t1"; gene_id "g2";
scaffold10x_1 AUGUSTUS exon 30845 31433 . - . transcript_id "g2.t1"; gene_id "g2";
The output just says:
##gff-version 3
and that's it. So I think it's the last column of the input file that I need to work on.
Thanks for your assistance so far
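For anyone who wants to rewrite column 9 by hand in the meantime, here is a minimal Python sketch. It is not the logic of convert_augustus_to_gff3.py. It assumes the real file is tab-separated with nine columns, that gene and transcript rows carry a bare identifier (as in the sample above), and that the gene ID is the transcript ID with its last `.`-suffix removed. The names `to_gff3_attributes` and `convert_line` are made up for this example.

```python
import re

# Matches GTF-style attributes such as: transcript_id "g1.t1";
GTF_ATTR = re.compile(r'(\S+)\s+"([^"]*)"')


def to_gff3_attributes(feature_type, col9):
    """Build a GFF3 column 9 from an Augustus-style column 9 (assumed layout)."""
    col9 = col9.strip()
    gtf = dict(GTF_ATTR.findall(col9))
    if feature_type == "gene":
        # Gene rows carry a bare identifier, e.g. "g1".
        return "ID={0}".format(gtf.get("gene_id", col9))
    if feature_type == "transcript":
        # Transcript rows carry a bare identifier, e.g. "g1.t1"; the gene ID
        # is assumed to be everything before the last ".".
        transcript_id = gtf.get("transcript_id", col9)
        gene_id = gtf.get("gene_id", transcript_id.rsplit(".", 1)[0])
        return "ID={0};Parent={1}".format(transcript_id, gene_id)
    # All other rows (CDS, exon, intron, ...) point at their transcript.
    if "transcript_id" not in gtf:
        raise ValueError("No transcript_id in column 9: {0}".format(col9))
    return "Parent={0}".format(gtf["transcript_id"])


def convert_line(line):
    """Rewrite column 9 of one tab-separated line; pass comments through."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError("Expected 9 tab-separated columns: {0}".format(line))
    cols[8] = to_gff3_attributes(cols[2], cols[8])
    return "\t".join(cols)


# Three rows from the sample above, rebuilt with tabs between columns.
rows = [
    "\t".join(["scaffold10x_1", "AUGUSTUS", "gene", "3591", "4530", "0.27", "-", ".", "g1"]),
    "\t".join(["scaffold10x_1", "AUGUSTUS", "transcript", "3591", "4530", "0.27", "-", ".", "g1.t1"]),
    "\t".join(["scaffold10x_1", "AUGUSTUS", "CDS", "3591", "3859", "0.34", "-", "2",
               'transcript_id "g1.t1"; gene_id "g1";']),
]
print("##gff-version 3")
for row in rows:
    print(convert_line(row))
```

The sketch leaves the feature types and coordinates untouched and changes only the attributes column, so the result should still be checked against a GFF3 validator or the biocode parser.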
OK, that helps. Can you tell me what version of Augustus was used to generate this file? Mario does like changing the output formats occasionally, so I'll need to modify the conversion script to account for the different versions.
Let's please shift this discussion to issue #51, since the installer isn't an issue here.