nal-i5k / gff3toolkit
Python programs for processing GFF3 files
License: Other
Related to sphinx-doc/sphinx#5490
Hi GFF3toolkit developers. I'd like to ask for your help with this issue. The QC step went well, but running the fix step raises a ValueError exception. For reference, I'm attaching the GFF file and error.txt here.
Thank you.
How can I use this command to merge three or more GFF files in different formats (such as gff3/gff/gbff, or gff3.gz/gff3)?
Thank you for developing these great GFF3 tools.
When I use "gff3_to_fasta -st user_defined -u mRNA CDS" to get a FASTA file, it seems to output the child (CDS) sequences.
How can I get the parent (mRNA) sequences in this situation?
Any reply is welcome.
At least gff3-py will continue to be maintained, so we should switch back to it (https://github.com/hotdogee/gff3-py/releases/tag/1.0.0) rather than including another copy inside the repo.
I've run into a problem with the statistics output file and am having trouble figuring out the cause. @tony006469 can you help? I'll send you the input files and command line in an email.
Here's the relevant stderr:
INFO Print QC report at diaall_apollo_annotations_1-28-2019-QC.txt
INFO Print QC statistic report at diaall_apollo_annotations_1-28-2019_stats.txt
Traceback (most recent call last):
File "/home/mpoelchau/.local/bin/gff3_QC", line 9, in <module>
load_entry_point('gff3tool==1.4.4', 'console_scripts', 'gff3_QC')()
File "/home/mpoelchau/.local/lib/python2.7/site-packages/gff3tool/bin/gff3_QC.py", line 134, in script_main
error_counts[s['eCode']]= {'count':0,'etag':ERROR_INFO[s['eCode']]}
KeyError: 'Esf0012'
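The KeyError above is raised because the reported code has no entry in the ERROR_INFO lookup table. A minimal sketch of a more tolerant tally follows. The `tally_errors` helper, the fallback text, and the sample table are assumptions made for illustration and are not taken from gff3_QC.py.

```python
from collections import OrderedDict

def tally_errors(errors, error_info, fallback="(no tag registered for this code)"):
    """Count errors per code; tolerate codes that are missing from error_info."""
    counts = OrderedDict()
    for err in errors:
        code = err["eCode"]
        if code not in counts:
            # dict.get avoids the KeyError when a code is absent from the table
            counts[code] = {"count": 0, "etag": error_info.get(code, fallback)}
        counts[code]["count"] += 1
    return counts

# Hypothetical data: the table lacks Esf0012, as in the traceback
ERROR_INFO = {"Esf0014": '##gff-version" missing from the first line'}
errors = [{"eCode": "Esf0014"}, {"eCode": "Esf0012"}, {"eCode": "Esf0012"}]
result = tally_errors(errors, ERROR_INFO)
```

A fallback keeps the statistics report printable, but the missing table entry would still need to be added for the tag text to be meaningful.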
For gff3_to_fasta.py: If no exons are present in the input gff3 file, and 'trans', 'pep', or 'cds' are specified as sequence types, the program will throw a warning (WARNING There is no exon feature for rna88 in the input gff. CDS features are used for splicing instead.), and will use CDS features for splicing. It would be better for the program to throw the same warning, and then not produce any output for the offending gene models.
I got the following error with gff3_fix:
Traceback (most recent call last):
File "/app/data/mpoelchau/python3_venv/bin/gff3_fix", line 9, in <module>
load_entry_point('gff3tool==2.0.1', 'console_scripts', 'gff3_fix')()
File "/app/data/mpoelchau/GFF3toolkit/gff3tool/bin/gff3_fix.py", line 95, in script_main
gff3_fix.fix.main(gff3=gff3, output_gff=args.output_gff, error_dict=error_dict, line_num_dict=line_num_dict, logger=logger_null)
File "/app/data/mpoelchau/GFF3toolkit/gff3tool/lib/gff3_fix/fix.py", line 689, in main
split(gff3=gff3, error_list=error_dict[error_code], logger=logger)
File "/app/data/mpoelchau/GFF3toolkit/gff3tool/lib/gff3_fix/fix.py", line 180, in split
childgroup = connected_compoents(childrenlist, hitpair)
File "/app/data/mpoelchau/GFF3toolkit/gff3tool/lib/gff3_fix/fix.py", line 275, in connected_compoents
for v in nodelist.itervalues():
AttributeError: 'dict' object has no attribute 'itervalues'
It looks like the problem is with the connected_compoents function, which is used by the split function. This function is not currently tested with our test files.
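For context, `dict.itervalues()` exists only in Python 2; `dict.values()` works in both Python 2 and 3. The sketch below shows a generic connected-components grouping written that way. The function signature and the layout of `nodes` are hypothetical and do not reproduce the code in fix.py.

```python
from collections import deque

def connected_components(nodes, pairs):
    """Group node IDs into connected components.

    nodes: dict mapping any key to a node ID (hypothetical layout)
    pairs: iterable of (id_a, id_b) edges between node IDs
    """
    # .values() replaces the Python 2-only .itervalues()
    adjacency = {node_id: set() for node_id in nodes.values()}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        queue, group = deque([start]), []
        seen.add(start)
        while queue:  # breadth-first walk over one component
            current = queue.popleft()
            group.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(group))
    return components

groups = connected_components({0: "mRNA1", 1: "mRNA2", 2: "mRNA3"},
                              [("mRNA1", "mRNA2")])
```

A small test case like the last two lines would also exercise this code path, which the post notes is currently untested.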
When I run gff3_merge I get the following error
Traceback (most recent call last):
File "/tools/python/3.6.3/bin/gff3_merge", line 8, in <module>
sys.exit(script_main())
File "/tools/python/3.6.3/lib/python3.6/site-packages/gff3tool/bin/gff3_merge.py", line 229, in script_main
main(args.gff_file1, args.gff_file2, args.fasta, report_fh, args.output_gff, args.all, args.auto_assignment, args.user_defined_file1, args.user_defined_file2, logger=logger_stderr)
File "/tools/python/3.6.3/lib/python3.6/site-packages/gff3tool/bin/gff3_merge.py", line 85, in main
gff3_merge.merge.main(autoReviseGff, gff_file2, output_gff, report, user_defined1, user_defined2, logger)
File "/tools/python/3.6.3/lib/python3.6/site-packages/gff3tool/lib/gff3_merge/merge.py", line 22, in main
gff3_sort.main(gff_file1, output='WA_sorted.gff', logger=logger)
File "/tools/python/3.6.3/lib/python3.6/site-packages/gff3tool/bin/gff3_sort.py", line 279, in main
report.write(TwoParent(child['attributes']['ID'],exon))
File "/tools/python/3.6.3/lib/python3.6/site-packages/gff3tool/bin/gff3_sort.py", line 138, in TwoParent
attributes_line = ";".join("=".join((str(k),str(v))) for k,v in attributes.iteritems())
AttributeError: 'dict' object has no attribute 'iteritems'
The gff3_sort.py script seems to use Python 2-only syntax (dict.iteritems).
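The failing line builds the attribute column with `iteritems()`, which was removed in Python 3. A minimal sketch of the same join using `items()` follows. The `format_attributes` name and the handling of list values are assumptions for illustration, not the code in gff3_sort.py.

```python
def format_attributes(attributes):
    """Serialise a GFF3 attribute dict as 'key=value;key=value'."""
    parts = []
    for key, value in attributes.items():  # .items() works on Python 2 and 3
        if isinstance(value, (list, tuple)):
            # assumed handling for multi-valued attributes such as Parent
            value = ",".join(str(v) for v in value)
        parts.append("{0}={1}".format(key, value))
    return ";".join(parts)

line = format_attributes({"ID": "exon1", "Parent": ["mRNA1", "mRNA2"]})
```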
Hi,
I installed GFF3toolkit with pip under Python 3.5.
My command is
/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/bin/gff3_merge -g1 transdecoder_final.gff3 -g2 Zm_B73.gff3 -f Zm-B73-REFERENCE-NAM-5.0.fa -og merged_final+anno.gff3 -r merged_fina_anno_report.txt
It fails at the step that identifies the types of replacement based on the replace tag.
INFO Extract sequences from transdecoder_final.gff3...
INFO Extract CDS sequences...
INFO Extract premature transcript sequences...
INFO Extract sequences from Zm_B73.gff3...
INFO Extract CDS sequences...
INFO Extract premature transcript sequences...
INFO Catenate transdecoder_final.gff3 and Zm_B73.gff3...
INFO Make blastDB for CDS sequences from auto_replace_tag/tmp/gff2_cds.fa...
INFO Sequence alignment for cds fasta files between transdecoder_final.gff3 and Zm_B73.gff3...
INFO Find CDS matched pairs between transdecoder_final.gff3 and Zm_B73.gff3...
INFO Make blastDB for premature transcript sequences from auto_replace_tag/tmp/gff2_pre_trans.fa...
INFO Sequence alignment for premature transcript fasta files between transdecoder_final.gff3 and Zm_B73.gff3...
INFO Find premature transcript matched pairs between transdecoder_final.gff3 and Zm_B73.gff3...
INFO Generate auto_replace_tag/check1.txt for Check Point 1 internal reviewing...
INFO Reading revision file... (auto_replace_tag/check1.txt)
INFO Reading gff3 file... (transdecoder_final.gff3)
INFO Writing summary report (auto_replace_tag/replace_tag_report.txt)...
INFO Writing revised gff: (auto_replace_tag/Revised_transdecoder_final.gff3)...
INFO ========== Check whether there are missing replace tags ==========
INFO - All models have replace tags.
INFO ========== Merge the two gff files ==========
INFO Sorting the WA gff by following the order of Scaffold number and coordinates...
INFO Sorting and printing out...
INFO Sorting the other gff by following the order of Scaffold number and coordinates...
INFO Sorting and printing out...
INFO Reading WA gff3 file...
INFO Reading the other gff3 file...
INFO Identifying types of replacement based on replace tag...
Traceback (most recent call last):
File "/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/bin/gff3_merge", line 8, in <module>
sys.exit(script_main())
File "/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/lib/python3.5/site-packages/gff3tool/bin/gff3_merge.py", line 229, in script_main
main(args.gff_file1, args.gff_file2, args.fasta, report_fh, args.output_gff, args.all, args.auto_assignment, args.user_defined_file1, args.user_defined_file2, logger=logger_stderr)
File "/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/lib/python3.5/site-packages/gff3tool/bin/gff3_merge.py", line 85, in main
gff3_merge.merge.main(autoReviseGff, gff_file2, output_gff, report, user_defined1, user_defined2, logger)
File "/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/lib/python3.5/site-packages/gff3tool/lib/gff3_merge/merge.py", line 34, in main
ReplaceGroups = replace_OGS.Groups(WAgff=gff3, Pgff=gff3M, outsideNum=1, user_defined1=user_defined1, user_defined2=user_defined2, logger=logger_null)
File "/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/lib/python3.5/site-packages/gff3tool/lib/replace_OGS.py", line 253, in __init__
self.name2id(Pgff, user_defined2)
File "/work/LAS/mash-lab/jing/bin/Anaconda3/envs/mypy3.5/lib/python3.5/site-packages/gff3tool/lib/replace_OGS.py", line 483, in name2id
idprefix = tmp.groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'
Please help me solve the problem.
Thanks,
Jing
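The AttributeError above means that `tmp` was None, which is what Python's `re` matching functions return when an ID does not have the shape the pattern expects. The sketch below guards against that case. The pattern and the `split_id` helper are hypothetical, since the actual expression used in replace_OGS.py is not shown in the traceback.

```python
import re

# Hypothetical pattern: any prefix followed by trailing digits
ID_PATTERN = re.compile(r"^(.*?)(\d+)$")

def split_id(feature_id):
    """Return (prefix, number), or None if the ID has no trailing digits."""
    match = ID_PATTERN.search(feature_id)
    if match is None:  # guard instead of calling .groups() on None
        return None
    prefix, digits = match.groups()
    return prefix, int(digits)
```

With a guard like this, the caller can report which ID failed to parse rather than stopping with a traceback.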
Thanks for developing this tool. May I know how gff3_to_fasta handles IUPAC bases in the genome FASTA during translation?
When using gff3_merge, I had to use the user-defined file option. The sequence extraction and blast seemed to run fine, but the replacement still didn't work. I tried the same command with a python 2 installation and the expected models were replaced.
The dataset I was working with is private, so I can't post it - but I will keep it around until we have capacity to fix this (probably in February).
Hi-
just looking over https://buildmedia.readthedocs.org/media/pdf/gff3toolkit/latest/gff3toolkit.pdf
I noticed that sections like:
2.1.3 Single feature (Esf)
4.2 gff3_fix
and some others appear to be missing tabular formatting, which makes them a little difficult for a prospective user to take in. It looks like a case of:
https://stackoverflow.com/questions/44461762/sphinx-is-not-recognising-my-markdown-tables
The Markdown from which I'm guessing it was generated renders quite nicely at
https://github.com/NAL-i5K/GFF3toolkit/blob/master/docs/gff3_fix.py-documentation.md
so I'm just going to refer to that, but I thought I'd mention it anyway since I stumbled across the PDF first. It may not be worth fixing, but https://pypi.org/project/sphinx-markdown-tables/, as mentioned in the SO post, may be worth a try.
Thanks for a nicely documented set of tools!
Hello,
I ran gff3_sort using the command below and got the error that follows
gff3_sort --gff_file mysample_results20220802/annot.gff --output_gff mysample_sort.gff3
ERROR [SeqID] SeqID does not end with a number.
I went ahead and added the flag -r
gff3_sort --gff_file mysample_results20220802/annot.gff --output_gff mysample_sort.gff3 -r
But I got this
Traceback (most recent call last):
File "/apps/gff3toolkit/2.0.3/bin/gff3_sort", line 8, in <module>
sys.exit(script_main())
File "/apps/gff3toolkit/2.0.3/lib/python3.9/site-packages/gff3tool/bin/gff3_sort.py", line 437, in script_main
main(args.gff_file, output=args.output_gff, isoform_sort=args.isoform_sort, sorting_order=sorting_order, logger=logger_stderr, reference=args.reference)
File "/apps/gff3toolkit/2.0.3/lib/python3.9/site-packages/gff3tool/bin/gff3_sort.py", line 223, in main
sequence_regions[sequence_region['seqid']] = (sequence_region['start'], sequence_region['end'])
KeyError: 'end'
It seems to me that the above "Line 6" in the file annot.gff must be skipped.
Any thoughts on that?
Thanks,
TJ
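One plausible reading of the `KeyError: 'end'` is that a ##sequence-region directive in the file did not parse into seqid, start, and end. The sketch below collects well-formed directives and records the line numbers of malformed ones instead of failing. Both helper names are assumptions and this is not how gff3_sort.py is written.

```python
def parse_sequence_region(line):
    """Parse '##sequence-region seqid start end'; return None if malformed."""
    fields = line.strip().split()
    if len(fields) != 4 or fields[0] != "##sequence-region":
        return None
    seqid, start, end = fields[1:]
    if not (start.isdigit() and end.isdigit()):
        return None
    return {"seqid": seqid, "start": int(start), "end": int(end)}

def collect_regions(lines):
    """Return (regions, skipped): parsed regions plus malformed line numbers."""
    regions, skipped = {}, []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("##sequence-region"):
            continue
        region = parse_sequence_region(line)
        if region is None:
            skipped.append(number)  # report it rather than raising KeyError
        else:
            regions[region["seqid"]] = (region["start"], region["end"])
    return regions, skipped
```

Running `collect_regions` over the header of annot.gff would show whether a directive is indeed the line that needs correcting.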
Intra-model: Multiple features within a model (Ema)
The error category 'Intra-model' collects formatting errors that can be
found by jointly considering multiple features within a gene model, such
as gene, mRNA, exon, and CDS features. Errors in this category are given
an 'Error\_Code' starting with 'Ema'.
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Error\_Code | Error\_Tag | Flag type |
+===============+=========================================================================================+============================+
| Ema0001 | Parent feature start and end coordinates exceed those of child features | Warning |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0002 | Protein sequence contains internal stop codons | Warning |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0003 | This feature is not contained within the parent feature coordinates | Warning |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0004 | Incomplete gene feature that should contain at least one mRNA, exon, and CDS | Info |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0005 | Pseudogene has invalid child feature type | Info (we need to replace this function in the future) |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0006 | Wrong phase | Info (we need to replace this function in the future) |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0007 | CDS and parent feature on different strands | Warning |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0008 | Warning for distinct isoforms that do not share any regions | Warning |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
| Ema0009 | Incorrectly merged gene parent? Isoforms that do not share coding sequences are found | Warning |
+---------------+-----------------------------------------------------------------------------------------+----------------------------+
Inter-model: Multiple features across models (Emr)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The error category 'Inter-model' collects formatting errors that can be
found by comparing multiple gene models. Errors in this category are
given an 'Error\_Code' starting with 'Emr'.
+---------------+----------------------------------+----------------------------+
| Error\_Code | Error\_Tag | Flag type |
+===============+==================================+============================+
| Emr0001 | Duplicate transcript found | Warning |
+---------------+----------------------------------+----------------------------+
| Emr0002 | Incorrectly split gene parent? | Warning |
+---------------+----------------------------------+----------------------------+
| Emr0003 | Duplicate ID | Error |
+---------------+----------------------------------+----------------------------+
Single feature (Esf)
~~~~~~~~~~~~~~~~~~~
The error category 'Single Feature' collects formatting errors that can
be found by searching the GFF3 file line by line. Errors in this
category are given an 'Error\_Code' starting with 'Esf'.
+---------------+--------------------------------------------------------------------------+----------------------------+
| Error\_Code | Error\_Tag | Flag type |
+===============+==========================================================================+============================+
| Esf0001 | Feature type may need to be changed to pseudogene | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0002 | Start/Stop is not a valid 1-based integer coordinate | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0003 | strand information missing | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0004 | Seqid not found in any ##sequence-region | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0005 | Start is less than the ##sequence-region start | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0006 | End is greater than the ##sequence-region end | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0007 | Seqid not found in the embedded ##FASTA | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0008 | End is greater than the embedded ##FASTA sequence length | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0009 | Found Ns in a feature using the embedded ##FASTA | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0010 | Seqid not found in the external FASTA file | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0011 | End is greater than the external FASTA sequence length | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0012 | Found Ns in a feature using the external FASTA | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0013 | White chars not allowed at the start of a line | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0014 | ##gff-version" missing from the first line | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0015 | Expecting certain fields in the feature | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0016 | ##sequence-region seqid may only appear once | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0017 | Start/End is not a valid integer | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0018 | Start is not less than or equal to end | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0019 | Version is not "3" | Info (we'll need to look into this later) |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0020 | Version is not a valid integer | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0021 | Unknown directive | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0022 | Features should contain 9 fields | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0023 | escape certain characters | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0024 | Score is not a valid floating point number | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0025 | Strand has illegal characters | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0026 | Phase is not 0, 1, or 2, or not a valid integer | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0027 | Phase is required for all CDS features | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0028 | Attributes must escape the percent (%) sign and any control characters | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0029 | Attributes must contain one and only one equal (=) sign | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0030 | Empty attribute tag | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0031 | Empty attribute value | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0032 | Found multiple attribute tags | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0033 | Found ", " in a attribute, possible unescaped | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0034 | attribute has identical values (count, value) | Info |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0035 | attribute has unresolved forward reference | Info (for now, need to look into this more) |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0036 | Value of a attribute contains unescaped "," | Info (for now, need to check whether multiple Target values are possible) |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0037 | Target attribute should have 3 or 4 values | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0038 | Start/End value of Target attribute is not a valid integer coordinate | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0039 | Strand value of Target attribute has illegal characters | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0040 | Value of Is\_circular attribute is not "true" | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
| Esf0041 | Unknown reserved (uppercase) attribute | Error |
+---------------+--------------------------------------------------------------------------+----------------------------+
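To make the single-feature category concrete, here is a simplified sketch of how a few of the line-by-line checks could be expressed. It covers only Esf0017, Esf0018, Esf0022, Esf0025, Esf0026, and Esf0027, and the `check_feature_line` function is an illustration written for this note, not the toolkit's implementation.

```python
def check_feature_line(line):
    """Return the Esf codes triggered by one tab-separated feature line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["Esf0022"]  # Features should contain 9 fields
    codes = []
    _seqid, _source, ftype, start, end, _score, strand, phase, _attrs = fields
    if not (start.isdigit() and end.isdigit()):
        codes.append("Esf0017")  # Start/End is not a valid integer
    elif int(start) > int(end):
        codes.append("Esf0018")  # Start is not less than or equal to end
    if strand not in ("+", "-", ".", "?"):
        codes.append("Esf0025")  # Strand has illegal characters
    if ftype == "CDS" and phase == ".":
        codes.append("Esf0027")  # Phase is required for all CDS features
    elif phase not in (".", "0", "1", "2"):
        codes.append("Esf0026")  # Phase is not 0, 1, or 2
    return codes
```

For example, a CDS line with start 200, end 100, strand `x`, and phase `.` would be flagged with Esf0018, Esf0025, and Esf0027.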
I was trying to extract the sequences from a GFF and the program stopped with an error, even though I supplied the requested flag:
$ python -V
Python 2.7.13
$ python ../../bin/gff3_to_fasta.py -f Ofas.scaffolds.fa -st all -g oncfas_OGSv1.2_original.gff -o test
INFO Checking gff file (oncfas_OGSv1.2_original.gff)...
INFO Checking genome fasta (Ofas.scaffolds.fa)...
INFO Specifying sequence type: (all)...
usage: gff3_to_fasta.py [-h] [-g GFF] [-f FASTA] [-st SEQUENCE_TYPE]
[-d DEFLINE] [-o OUTPUT_PREFIX] [-noQC] [-v]
Extract sequences from specific regions of genome based on gff file.
Testing enviroment:
1. Python 2.7
Required inputs:
1. GFF3: specify the file name with the -g argument
2. Fasta file: specify the file name with the -f argument
3. Output prefix: specify with the -o argument
Outputs:
1. Fasta formatted sequence file based on the gff3 file.
Example command:
python2.7 bin/gff3_to_fasta.py -g example_file/example.gff3 -f example_file/reference.fa -st all -d simple -o test_sequences
optional arguments:
-h, --help show this help message and exit
-g GFF, --gff GFF Genome annotation file in GFF3 format
-f FASTA, --fasta FASTA
Genome sequences in FASTA format
-st SEQUENCE_TYPE, --sequence_type SEQUENCE_TYPE
Type of sequences you would like to extract:
"all" - FASTA files for all types of sequences listed below;
"gene" - gene sequence for each record;
"exon" - exon sequence for each record;
"pre_trans" - genomic region of a transcript model (premature transcript);
"trans" - spliced transcripts (only exons included);
"cds" - coding sequences;
"pep" - peptide sequences.
-d DEFLINE, --defline DEFLINE
Defline format in the output FASTA file:
"simple" - only ID would be shown in the defline;
"complete" - complete information of the feature would be shown in the defline.
-o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
Prefix of output file name
-noQC, --quality_control
Specify this option if you do not want to excute quality control for gff file. (default: QC is excuted)
-v, --version show program's version number and exit
ERROR Required field -st missing...
I cannot install gff3tool in a freshly created conda env (including python=2.7 and perl):
ERROR: Failed building wheel for gff3tool
Running setup.py clean for gff3tool
Failed to build gff3tool
Installing collected packages: gff3tool
Running setup.py install for gff3tool ... error
ERROR: Complete output from command /mnt/scratch_dir/userconda/envs/gff3tool_env/bin/python -u -c 'import setuptools, tokenize;file='"'"'/tmp/pip-install-m6SnWy/gff3tool/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-R72i2q/install-record.txt --single-version-externally-managed --compile:
ERROR: running install
running build
error: [Errno 17] File exists: '/tmp/pip-install-m6SnWy/gff3tool/gff3tool/lib/ncbi-blast+'
----------------------------------------
ERROR: Command "/mnt/scratch_dir/user/conda/envs/gff3tool_env/bin/python -u -c 'import setuptools, tokenize;file='"'"'/tmp/pip-install-m6SnWy/gff3tool/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-R72i2q/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-m6SnWy/gff3tool/
Hello,
I would like to use the GFF3toolkit to remove some gene models (all with one isoform, from an external list) from a gff3 file. I first ran
gff3_QC -g assembly_MAKER1.gff -f assembly.fa -o QC_report1 -s QC_stats1
and got this report:
==> QC_report <==
Line_num Error_code Error_level Error_tag
['Line 1'] Esf0014 Error ["##gff-version" missing from the first line]
['Line 15079'] Esf0012 Info [Found 5 Ns in CDS feature of length 296 using the external FASTA, consists of 1 segment (start, length): (210940, 5)]
==> QC_stats <==
Error_code Number_of_problematic_models Error_level Error_tag
Esf0014 1 Error ##gff-version" missing from the first line
Esf0012 1 Info Found Ns in a feature using the external FASTA
(I can fix the header myself)
I wonder how I can use gff3_fix to remove ~1500 genes (gene, mRNA, exon, and CDS lines): is it possible to create a 4-column file to submit to -qc_r? Can I use any of the error codes that have a "delete_model" function? Is there a way to specify the gene ID instead of the line number?
Also, is there a feature to remove gene models whose protein sequence does not start with M?
Thanks,
Dario
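Independently of gff3_fix, removing gene models by ID can be sketched in a few lines of plain Python. The sketch assumes that parents appear before their children in the file and that no kept feature shares a parent with a removed one. The function names are hypothetical.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y,z' into {'ID': ['x'], 'Parent': ['y', 'z']}."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value.split(",")
    return attributes

def drop_gene_models(lines, gene_ids):
    """Yield lines, skipping listed genes and every feature descending from them."""
    removed = set(gene_ids)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            yield line  # keep directives, comments, and unparsable lines
            continue
        attributes = parse_attributes(fields[8])
        feature_ids = attributes.get("ID", [])
        parents = attributes.get("Parent", [])
        if removed.intersection(feature_ids) or removed.intersection(parents):
            removed.update(feature_ids)  # so that grandchildren are dropped too
            continue
        yield line
```

The gene IDs would be read from the external list, one per line, and the kept lines written to a new file, which could then be re-checked with gff3_QC.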
Thank you for your wonderful toolkit!
When I ran gff3_fix following your instructions, I encountered an error:
This is the error report generated by gff3_QC:
This is my gff file:
Finalv2.rep.genename.v2.nored.zip
Can you help me fix this problem?
Hello,
I have a GFF3 file in which some lines lack an ID attribute (column 9). These lines are mainly CDS features and stop codons. I tried to use "gff3_ID_generator.py" with the command:
"python3 gff3_ID_generator.py -g ../../../UK0001.gff3 -og UK0001Mo.gff3"
where UK0001.gff3 is my GFF file with missing IDs and UK0001Mo.gff3 is my desired output.
I get this error as the output:
INFO Reading input gff3 file: (../../../UK0001.gff3)
INFO Generate new ID for features in (../../../UK0001.gff3)
Traceback (most recent call last):
File "/home/mestiri/GFF3toolkit/gff3tool/lib/gff3_ID_generator.py", line 333, in <module>
main(in_gff=args.gff, merge_report=args.merge_report, out_merge_report=args.out_merge_report, out_gff=args.output_gff, uuid_on=args.universally_unique_identifier, prefix=args.idprefix, digitlen=args.digitlen, report=args.report, alias=args.alias)
File "/home/mestiri/GFF3toolkit/gff3tool/lib/gff3_ID_generator.py", line 260, in main
if descend['attributes']['ID'] not in ID_dict:
KeyError: 'ID'
I don't understand what it means, to be honest. Is there something I misunderstood about this Python script?
Thanks for your help ! Have a nice day !
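The `KeyError: 'ID'` means that a feature's attribute dictionary has no `ID` key at all, so the lookup `descend['attributes']['ID']` fails. The sketch below tests for the key first and fills in a generated ID. The feature layout mirrors what the traceback suggests, while the `assign_missing_ids` helper and its naming scheme are assumptions for illustration.

```python
import itertools

def assign_missing_ids(features, prefix="auto"):
    """Give every feature without an ID attribute a generated one (in place)."""
    used = {f["attributes"]["ID"] for f in features if "ID" in f["attributes"]}
    counter = itertools.count(1)
    for feature in features:
        attributes = feature["attributes"]
        if "ID" in attributes:  # membership test instead of attributes['ID']
            continue
        new_id = "{0}_{1}_{2}".format(prefix, feature["type"], next(counter))
        while new_id in used:  # never reuse an ID already present in the file
            new_id = "{0}_{1}_{2}".format(prefix, feature["type"], next(counter))
        attributes["ID"] = new_id
        used.add(new_id)
    return features
```

Applied to a transcript whose CDS and stop codon lines carry only a Parent attribute, this leaves existing IDs untouched and adds one to each of the other features.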
Hey All,
I really like the tool and prefer its runtime behavior over things like GenomeTools, which terminates after the first incongruity is found.
One thing: many FASTA and GFF3 files are conveniently compressed and indexed using bgzip, both to be served in various browsers via block compression and to conserve disk space. It would be awesome if the tool could read these when provided. The tool wouldn't need to know whether they were block-compressed, since gzip can decompress them.
Below I modified the necessary bits to run gff3_QC on compressed, uncompressed or combinations.
--- a/gff3tool/lib/check_gene_parent/find_wrongly_split_gene_parent.pl
+++ b/gff3tool/lib/check_gene_parent/find_wrongly_split_gene_parent.pl
@@ -19,7 +19,11 @@ my %id2owner = ();
my $line = 0;
my $typeflag = 0;
print "Reading the gff file: $gff...\n";
-open FI, "$gff" or die "[Error] Cannot open $gff.";
+if ( $gff =~ /\.gz$/ ){ # gzip support
+ open FI, "<:gzip", $gff or die "[Error] Cannot open $gff.";
+} else {
+ open FI, "$gff" or die "[Error] Cannot open $gff.";
+}
--- a/gff3tool/lib/gff3/gff3.py
+++ b/gff3tool/lib/gff3/gff3.py
@@ -16,6 +16,7 @@ try:
except ImportError:
from urllib.parse import quote, unquote
import re
+import gzip
import string
import logging
import gff3tool.lib.ERROR as ERROR
@@ -69,7 +70,10 @@ def fasta_file_to_dict(fasta_file, id=True, header=False, seq=False):
"""
fasta_file_f = fasta_file
if isinstance(fasta_file, str):
- fasta_file_f = open(fasta_file, 'r')
+ if fasta_file.endswith('.gz'):
+ fasta_file_f = gzip.open(fasta_file, 'rt') # gzip support
+ else:
+ fasta_file_f = open(fasta_file, 'r')
fasta_dict = OrderedDict()
keys = ['id', 'header', 'seq']
@@ -528,7 +532,10 @@ class Gff3(object):
gff_fp = gff_file
if isinstance(gff_file, str):
- gff_fp = open(gff_file, 'r')
+ if gff_file.endswith('.gz'):
+ gff_fp = gzip.open(gff_file, 'rt') # gzip support
+ else:
+ gff_fp = open(gff_file, 'r')
Anyway, really like the tool and the general idea of the various classes of incongruity with the sequence ontology. Looking forward to further development.
Thank you
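The two Python hunks above repeat the same extension test, so one way to factor them is a single helper that both call sites share. The sketch below checks the gzip magic number rather than the file name, so a compressed file without a `.gz` suffix is still recognised. The `open_text` name is an assumption, and the `'rt'` mode requires Python 3.

```python
import gzip

def open_text(path):
    """Open a plain or gzip/bgzip-compressed file for reading as text."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number; bgzip output starts with it too
        return gzip.open(path, "rt")
    return open(path, "r")
```

Both `fasta_file_to_dict` and the `Gff3` reader could then call the helper in place of `open(..., 'r')`.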
Hi,
When I run the command "python bin/gff-QC.py -g test.gff3 -f ~/data/rice_genome/test.fasta -o test.txt",
I get the following error:
INFO Checking errors in the gff files: (test.gff3)...
Traceback (most recent call last):
File "bin/gff-QC.py", line 90, in <module>
if not gff3.check_parent_boundary():
File "bin/../lib/gff3_modified/gff3_modified.py", line 243, in check_parent_boundary
self.add_line_error(line, {'message': '{2:s}: {0:s}: {1:s}'.format(parent_feature[0]['attributes']['ID'], ','.join(['({0:s}, {1:d}, {2:d})'.format(line['seqid'], line['start'], line['end']) for line in parent_feature]), ERROR_INFO['Ema0003']), 'error_type': 'BOUNDS', 'location': 'parent_boundary', 'eCode': 'Ema0003'})
IndexError: list index out of range
How can I find out which lines in the GFF contain the error, so that I can fix them?
Thanks.
I ran gff3_QC with no problems.
When I run gff3_fix I get the following error:
INFO Checking QC report file (error.txt)...
INFO Checking GFF3 file (DUL_02_latest_Melon_V4_liftoff.gff)...
INFO Reading QC report file: (error.txt)...
INFO Reading GFF3 file: (DUL_02_latest_Melon_V4_liftoff.gff)...
Traceback (most recent call last):
File "/home/eoren/miniconda3/bin/gff3_fix", line 8, in <module>
sys.exit(script_main())
File "/home/eoren/miniconda3/lib/python3.9/site-packages/gff3tool/bin/gff3_fix.py", line 95, in script_main
gff3_fix.fix.main(gff3=gff3, output_gff=args.output_gff, error_dict=error_dict, line_num_dict=line_num_dict, logger=logger_null)
File "/home/eoren/miniconda3/lib/python3.9/site-packages/gff3tool/lib/gff3_fix/fix.py", line 683, in main
fix_phase(gff3=gff3, error_list=error_dict[error_code], line_num_dict=line_num_dict, logger=logger)
File "/home/eoren/miniconda3/lib/python3.9/site-packages/gff3tool/lib/gff3_fix/fix.py", line 437, in fix_phase
phase = (3 - ((CDS['end'] - CDS['start'] + 1 - phase) % 3)) % 3
TypeError: unsupported operand type(s) for -: 'int' and 'str'
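The message says an `int` and a `str` were subtracted, so `phase` was most likely still a string when the formula on line 437 ran. Where that string comes from (the GFF3 column or the QC report) is not visible in the traceback. A sketch of the same formula with an explicit conversion, under the hypothetical name `next_phase`:

```python
def next_phase(start, end, phase):
    """Phase of the following CDS segment, using the formula from the traceback.

    Hypothetical helper: `phase` may arrive as text, so it is converted first.
    """
    phase = int(phase)  # raises ValueError for '.', which is worth surfacing
    return (3 - ((end - start + 1 - phase) % 3)) % 3
```

For a CDS at 1181..1625 with phase 0, this gives 2 for the next segment.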
Hi,
After successfully using gff3_QC, gff3_fix is giving me the following error:
(genometools) [safiand@login001 grass]$ gff3_fix -qc_r test.txt -g turneri_annotation.gff3 -og new_corrected.gff3
INFO Checking QC report file (test.txt)...
INFO Checking GFF3 file (turneri_annotation.gff3)...
INFO Reading QC report file: (test.txt)...
INFO Reading GFF3 file: (turneri_annotation.gff3)...
Traceback (most recent call last):
File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/bin/gff3_fix", line 8, in <module>
sys.exit(script_main())
File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/lib/python3.10/site-packages/gff3tool/bin/gff3_fix.py", line 95, in script_main
gff3_fix.fix.main(gff3=gff3, output_gff=args.output_gff, error_dict=error_dict, line_num_dict=line_num_dict, logger=logger_stderr)
File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/lib/python3.10/site-packages/gff3tool/lib/gff3_fix/fix.py", line 692, in main
split(gff3=gff3, error_list=error_dict[error_code], logger=logger)
File "/camp/home/safiand/home/users/safiand/.conda/envs/genometools/lib/python3.10/site-packages/gff3tool/lib/gff3_fix/fix.py", line 165, in split
childrenlist.append(c1['attributes']['ID'])
KeyError: 'ID'
So I tried gff3_ID_generator.py, but it also gives me a similar message:
(genometools) [safiand@login001 grass]$ python gff3_ID_generator.py -g turneri_annotation.gff3 -og new.gff3
INFO Reading input gff3 file: (turneri_annotation.gff3)
INFO Generate new ID for features in (turneri_annotation.gff3)
Traceback (most recent call last):
File "/camp/lab/cardoso-moreiam/home/users/safiand/genome_annotation/turneri/busco/turneri_rna_prot_multiples_species/grass/gff3_ID_generator.py", line 333, in <module>
main(in_gff=args.gff, merge_report=args.merge_report, out_merge_report=args.out_merge_report, out_gff=args.output_gff, uuid_on=args.universally_unique_identifier, prefix=args.idprefix, digitlen=args.digitlen, report=args.report, alias=args.alias)
File "/camp/lab/cardoso-moreiam/home/users/safiand/genome_annotation/turneri/busco/turneri_rna_prot_multiples_species/grass/gff3_ID_generator.py", line 238, in main
ID_dict[child['attributes']['ID']] = [newcID]
KeyError: 'ID'
What can I do to solve this problem? Am I doing something wrong?
My gff3 file looks like this:
(genometools) [safiand@login001 grass]$ head turneri_annotation.gff3 -n 20
# gffread augustus.hints.gtf -o turnerifiltered.gff3 --merge -L -g GCA_922788865.1_HVK001PTURNERI_genomic.shortID.fna
# gffread v0.11.6
##gff-version 3
CAKLNU010000942.1 gffcl locus 724 2835 . + . ID=RLOC_00000001;transcripts=jg1.t1
CAKLNU010000942.1 AUGUSTUS transcript 724 2835 . + . ID=jg1.t1;geneID=jg1;locus=RLOC_00000001
CAKLNU010000942.1 AUGUSTUS CDS 724 1083 . + 0 Parent=jg1.t1
CAKLNU010000942.1 AUGUSTUS CDS 1181 1625 0.34 + 0 Parent=jg1.t1
CAKLNU010000942.1 AUGUSTUS CDS 2270 2835 0.42 + 2 Parent=jg1.t1
CAKLNU010000422.1 gffcl locus 1528 9153 . + . ID=RLOC_00000002;transcripts=jg2.t1
CAKLNU010000422.1 AUGUSTUS transcript 1528 9153 . + . ID=jg2.t1;geneID=jg2;locus=RLOC_00000002
CAKLNU010000422.1 AUGUSTUS CDS 1528 1574 0.69 + 1 Parent=jg2.t1
CAKLNU010000422.1 AUGUSTUS CDS 1718 1788 0.68 + 2 Parent=jg2.t1
CAKLNU010000422.1 AUGUSTUS CDS 9010 9153 0.6 + 0 Parent=jg2.t1
CAKLNU010000746.1 gffcl locus 834 3644 . - . ID=RLOC_00000003;transcripts=jg3.t1
CAKLNU010000746.1 AUGUSTUS transcript 834 3644 . - . ID=jg3.t1;geneID=jg3;locus=RLOC_00000003
CAKLNU010000746.1 AUGUSTUS CDS 834 878 0.96 - 2 Parent=jg3.t1
CAKLNU010000746.1 AUGUSTUS CDS 988 1011 1 - 2 Parent=jg3.t1
CAKLNU010000746.1 AUGUSTUS CDS 1310 1336 1 - 2 Parent=jg3.t1
CAKLNU010000746.1 AUGUSTUS CDS 2483 2518 1 - 2 Parent=jg3.t1
CAKLNU010000746.1 AUGUSTUS CDS 2597 2695 1 - 2 Parent=jg3.t1
Thanks!
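Both tracebacks read `['attributes']['ID']` on a child feature, and the CDS lines shown carry only `Parent=`, with no `ID=`. A possible workaround is to add IDs before running the tools. The sketch below is an assumption-laden preprocessing step, not part of GFF3toolkit: `add_missing_ids` is a hypothetical name, and the `<parent>-<type><NNN>` scheme is invented for illustration.

```python
from collections import defaultdict


def add_missing_ids(lines):
    """Yield GFF3 lines, adding an ID attribute to features that lack one."""
    counters = defaultdict(int)
    for line in lines:
        text = line.rstrip('\n')
        columns = text.split('\t')
        if text.startswith('#') or len(columns) != 9:
            yield text  # directives, comments and malformed lines pass through
            continue
        attrs = [a for a in columns[8].split(';') if a]
        keys = [a.split('=', 1)[0] for a in attrs]
        if 'ID' not in keys:
            parents = [a.split('=', 1)[1] for a in attrs if a.startswith('Parent=')]
            # Use the first parent only; 'noparent' is a placeholder label.
            parent = parents[0].split(',')[0] if parents else 'noparent'
            counters[(parent, columns[2])] += 1
            new_id = '{0}-{1}{2:03d}'.format(
                parent, columns[2], counters[(parent, columns[2])])
            columns[8] = ';'.join(['ID=' + new_id] + attrs)
        yield '\t'.join(columns)
```

With the file above, the first CDS line would end in `ID=jg1.t1-CDS001;Parent=jg1.t1`. Whether the tools then accept the `locus` and `transcript` features is a separate question this sketch does not address.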
Hi,
I tried to run gff3_fix and ran into the following error.
Before that, of course, I ran QC:
gff3_QC -g in.gff3 -f in.fasta -i -o QC-Check.out -s QC_Check.stats
Afterwards I extracted only the wrong phases into wrongPhase.out
grep "Wrong phase" QC-Check.out > wrongPhase.out
It looks like this:
head wrongPhase.out
Line_num Error_code Error_level Error_tag
['Line 1030'] Ema0006 Info [Wrong phase 1, should be 0]
['Line 1102'] Ema0006 Info [Wrong phase 2, should be 0]
['Line 2797'] Ema0006 Info [Wrong phase 1, should be 0]
['Line 3384'] Ema0006 Info [Wrong phase 0, should be 1]
['Line 3408'] Ema0006 Info [Wrong phase 2, should be 0]
['Line 3414'] Ema0006 Info [Wrong phase 0, should be 1]
['Line 3504'] Ema0006 Info [Wrong phase 1, should be 2]
['Line 3528'] Ema0006 Info [Wrong phase 1, should be 0]
['Line 3530'] Ema0006 Info [Wrong phase 0, should be 1]
['Line 3552'] Ema0006 Info [Wrong phase 2, should be 0]
Then running gff3_fix:
gff3_fix -qc_r wrongPhase.out -g in.gff3 -og out.gff3
INFO Checking QC report file (wrongPhase.out)...
INFO Checking GFF3 file (in.gff3)...
INFO Reading QC report file: (wrongPhase.out)...
INFO Reading GFF3 file: (in.gff3)...
Traceback (most recent call last):
File "/home/user/.local/bin/gff3_fix", line 8, in <module>
sys.exit(script_main())
File "/home/user/.local/lib/python3.10/site-packages/gff3tool/bin/gff3_fix.py", line 95, in script_main
gff3_fix.fix.main(gff3=gff3, output_gff=args.output_gff, error_dict=error_dict, line_num_dict=line_num_dict, logger=logger_stderr)
File "/home/user/.local/lib/python3.10/site-packages/gff3tool/lib/gff3_fix/fix.py", line 686, in main
fix_phase(gff3=gff3, error_list=error_dict[error_code], line_num_dict=line_num_dict, logger=logger)
File "/home/user/.local/lib/python3.10/site-packages/gff3tool/lib/gff3_fix/fix.py", line 424, in fix_phase
phase = list(map(int,re.findall(r'\d',line_num_dict[sorted_CDS_list[0]['line_index']+1]['Ema0006']))[1])
TypeError: 'map' object is not subscriptable
The error stays the same when the complete report file is input.
I'm running this in a conda environment :
python --version
Python 3.12.0
Also tried it in a non-conda environment:
python --version
Python 2.7.18
gff3_fix --version
gff3_fix 2.1.0
gff3_QC --version
gff3_QC 2.1.0
What could be the problem here? Thanks a lot in advance.
Best,
Nadine
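In Python 3, `map()` returns an iterator, so the `[1]` in line 424 of fix.py is applied to the `map` object before `list()` ever runs; in Python 2 the same expression subscripted a list. A sketch of an equivalent extraction that works on Python 3 follows. `expected_phase` is a hypothetical name, and the message format is taken from the report lines quoted above.

```python
import re


def expected_phase(message):
    """Extract the 'should be' phase from an Ema0006 message.

    Hypothetical helper. Builds a real list first, then indexes it, which
    avoids subscripting a map object.
    """
    digits = [int(d) for d in re.findall(r'\d', message)]
    return digits[1]  # digits[0] is the wrong phase, digits[1] the expected one
```

Moving the subscript outside the `list(...)` call in the original expression would have the same effect.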
Dear developers,
I get the following error when using gff3_merge on two gff3 files:
INFO ========== Merge the two gff files ==========
INFO Sorting the WA gff by following the order of Scaffold number and coordinates...
INFO Sorting and printing out...
INFO Sorting the other gff by following the order of Scaffold number and coordinates...
INFO Sorting and printing out...
INFO Reading WA gff3 file...
INFO Reading the other gff3 file...
INFO Identifying types of replacement based on replace tag...
INFO Replacing...
Traceback (most recent call last):
File "/public/home/zpxu/miniconda2/bin/gff3_merge", line 11, in <module>
load_entry_point('gff3tool==1.3.0', 'console_scripts', 'gff3_merge')()
File "/public/home/zpxu/miniconda2/lib/python2.7/site-packages/gff3tool/bin/gff3_merge.py", line 230, in script_main
main(args.gff_file1, args.gff_file2, args.fasta, report_fh, args.output_gff, args.all, args.auto_assignment, args.user_defined_file1, args.user_defined_file2, logger=logger_stderr)
File "/public/home/zpxu/miniconda2/lib/python2.7/site-packages/gff3tool/bin/gff3_merge.py", line 85, in main
gff3_merge.merge.main(autoReviseGff, gff_file2, output_gff, report, user_defined1, user_defined2, logger)
File "/public/home/zpxu/miniconda2/lib/python2.7/site-packages/gff3tool/lib/gff3_merge/merge.py", line 145, in main
ans = ReplaceGroups.replacer_multi(root, ReplaceGroups, gff3M, u1_types, u2_types)
File "/public/home/zpxu/miniconda2/lib/python2.7/site-packages/gff3tool/lib/gff3_merge/../replace_OGS.py", line 874, in replacer_multi
self.info.append('{0:s}\t{1:s}\t{2:s}\t{3:s}'.format(originalID, newtarget['attributes']['ID'], newtarget['attributes']['replace'], newtarget['attributes']['modified_track']))
KeyError: 'replace'
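The failing line reads `newtarget['attributes']['replace']`, so at least one feature reached this step without a `replace` attribute. Which features are expected to carry it is not stated in the traceback; the sketch below assumes transcript-level features and simply lists the lines where the attribute is absent. `lines_missing_attribute` is a hypothetical name, not a toolkit function.

```python
def lines_missing_attribute(lines, key, feature_types):
    """Return line numbers of features of the given types lacking attribute `key`."""
    missing = []
    for number, line in enumerate(lines, start=1):
        columns = line.rstrip('\n').split('\t')
        if line.startswith('#') or len(columns) != 9:
            continue  # skip directives, comments and non-feature lines
        if columns[2] not in feature_types:
            continue
        keys = {pair.split('=', 1)[0] for pair in columns[8].split(';') if pair}
        if key not in keys:
            missing.append(number)
    return missing
```

For example, `lines_missing_attribute(open('modified.gff3'), 'replace', {'mRNA'})` (file name assumed) would show which mRNA lines to inspect.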
Hi,
I am experiencing the following error with gff3_sort:
ERROR [Missing SeqID] Missing SeqID. - Line 2246892: chrX CAT gene 44080 557481 . - . ID=OLDFIELD_G0043730;Name=Aff2;source_gene_common_name=Aff2;source_gene=ENSMUSG00000031189.12;transcript_modes=transMap;gene_biotype=protein_coding
As the line is not missing anything as far as I can tell, it seems that the error is caused by the naming of the seqID ("chrX").
Should we refactor this to make unit testing easier?
Was looking at how to implement test coverage and add unit tests.
We have 40 repositories in the NAL_i5k org.
When I run “gff3_sort -g Cl.rename.FINAL.gff3 -og sorted.gff3”, I get the following error:
INFO Checking GFF3 file (Cl.rename.FINAL.gff3)...
INFO Reading gff3 file...
INFO Sorting and printing out...
Traceback (most recent call last):
File "/Users/wjx/miniconda3/bin/gff3_sort", line 8, in <module>
sys.exit(script_main())
File "/Users/wjx/miniconda3/lib/python3.7/site-packages/gff3tool/bin/gff3_sort.py", line 437, in script_main
main(args.gff_file, output=args.output_gff, isoform_sort=args.isoform_sort, sorting_order=sorting_order, logger=logger_stderr, reference=args.reference)
File "/Users/wjx/miniconda3/lib/python3.7/site-packages/gff3tool/bin/gff3_sort.py", line 257, in main
otherlines.extend(gff3.collect_descendants(grandchild))
File "/Users/wjx/miniconda3/lib/python3.7/site-packages/gff3tool/lib/gff3/gff3.py", line 172, in collect_descendants
collected_list.extend(self.collect_descendants(child))
File "/Users/wjx/miniconda3/lib/python3.7/site-packages/gff3tool/lib/gff3/gff3.py", line 172, in collect_descendants
collected_list.extend(self.collect_descendants(child))
File "/Users/wjx/miniconda3/lib/python3.7/site-packages/gff3tool/lib/gff3/gff3.py", line 172, in collect_descendants
collected_list.extend(self.collect_descendants(child))
[Previous line repeated 993 more times]
File "/Users/wjx/miniconda3/lib/python3.7/site-packages/gff3tool/lib/gff3/gff3.py", line 171, in collect_descendants
collected_list.append(child)
RecursionError: maximum recursion depth exceeded while calling a Python object
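`collect_descendants` calls itself once per level of the feature tree, so hitting the recursion limit suggests either a very deep hierarchy or, more plausibly, a cycle in the Parent/ID relationships (for example a feature that is its own ancestor). That diagnosis is an assumption. A sketch of a non-recursive traversal that also guards against cycles follows; it works on a plain dict of IDs (`children_of`, a hypothetical structure), not on the toolkit's feature objects.

```python
def collect_descendants(children_of, root_id):
    """Return descendant IDs of root_id in depth-first order, without recursion."""
    collected = []
    seen = {root_id}
    # Reverse so that popping from the end visits children in file order.
    stack = list(reversed(children_of.get(root_id, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited: a shared child or a cycle
        seen.add(node)
        collected.append(node)
        stack.extend(reversed(children_of.get(node, [])))
    return collected
```

With a cyclic input such as `{'a': ['b'], 'b': ['a']}` the function terminates instead of recursing forever.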
Hi there,
I am trying to install GFF3toolkit on macOS with pip install, but the installation fails. What could be the reason? Thank you very much for helping!
The errors are as follows:
apple@pc-206-171 ~ % python3 -m pip install git+https://github.com/NAL-i5K/GFF3toolkit.git
Collecting git+https://github.com/NAL-i5K/GFF3toolkit.git
Cloning https://github.com/NAL-i5K/GFF3toolkit.git to /private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-req-build-wynu3ymp
Running command git clone -q https://github.com/NAL-i5K/GFF3toolkit.git /private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-req-build-wynu3ymp
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-req-build-wynu3ymp/setup.py'"'"'; file='"'"'/private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-req-build-wynu3ymp/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-pip-egg-info-sbqce5fp
cwd: /private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-req-build-wynu3ymp/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/c_/qk6ctf6513q1ygygbvsw15880000gn/T/pip-req-build-wynu3ymp/setup.py", line 22, in <module>
from wheel.bdist_wheel import bdist_wheel as _bdist_wheel
ModuleNotFoundError: No module named 'wheel'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Running
pip install gff3tool
stops at
Building wheel for gff3tool (setup.py) ... -
The gff3 specification states that discontinuous features, such as CDS, need not have unique IDs. Instead they can share an ID to indicate that they are all part of a discontinuous feature. Whether or not you'll want unique or the same IDs for individual CDS lines of a given CDS feature usually depends on what you'll do with the gff downstream - for example, for Tripal ingest, CDS lines corresponding to a single feature should share an ID. So, it would be great if gff3_ID_generator.py had an option to not generate unique IDs for features that share a parent feature. For the user, I'd envision this as something like '-n'. Then, the program would only generate 1 ID for all CDS features that share a parent feature.
Example result for 1 gene with 2 isoforms using the proposed flag '-n CDS':
KZ848496.1 . gene 715 17058 . + . ID=LSTR000001;
KZ848496.1 . mRNA 715 7345 . + . Parent=LSTR000001;ID=LSTR000001-RA;
KZ848496.1 . exon 715 899 . + . ID=LSTR000001-RA-exon001;Parent=LSTR000001-RA
KZ848496.1 . CDS 1418 1584 . + 0 ID=LSTR000001-RA-CDS001;Parent=LSTR000001-RA
KZ848496.1 . exon 7255 7345 . + . ID=LSTR000001-RA-exon002;Parent=LSTR000001-RA
KZ848496.1 . CDS 7255 7345 . + 1 ID=LSTR000001-RA-CDS001;Parent=LSTR000001-RA
KZ848496.1 . mRNA 13242 17058 . + . Parent=LSTR000001;ID=LSTR000001-RB;
KZ848496.1 . exon 13242 13331 . + . ID=LSTR000001-RB-exon001;Parent=LSTR000001-RB;
KZ848496.1 . CDS 13242 13331 . + 1 ID=LSTR000001-RB-CDS001;Parent=LSTR000001-RB;
KZ848496.1 . exon 15348 17058 . + . ID=LSTR000001-RB-exon002;Parent=LSTR000001-RB;
KZ848496.1 . CDS 15348 15540 . + 1 ID=LSTR000001-RB-CDS001;Parent=LSTR000001-RB;
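The requested behaviour can be sketched as a small ID allocator: features of a "shared" type reuse one ID per parent, while all other types keep a running counter. This is only an illustration of the idea, not the generator's actual code; `assign_shared_ids` is a hypothetical name, and the ID format simply mirrors the example above.

```python
def assign_shared_ids(features, shared_types=('CDS',)):
    """Assign IDs to (feature_type, parent_id) pairs.

    Features whose type is in `shared_types` get a single ID per parent;
    other features get a new numbered ID each time.
    """
    counters = {}
    shared = {}
    ids = []
    for ftype, parent in features:
        key = (parent, ftype)
        if ftype in shared_types and key in shared:
            ids.append(shared[key])  # reuse the ID of the discontinuous feature
            continue
        counters[key] = counters.get(key, 0) + 1
        new_id = '{0}-{1}{2:03d}'.format(parent, ftype, counters[key])
        if ftype in shared_types:
            shared[key] = new_id
        ids.append(new_id)
    return ids
```

Fed the exon and CDS lines of isoform RA in file order, it yields `exon001`, `CDS001`, `exon002`, `CDS001`, as in the example.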
I ran the gff3_merge program and got this error:
Traceback (most recent call last):
File "/home/meiyang/anaconda3/bin/gff3_merge", line 33, in <module>
sys.exit(load_entry_point('gff3tool==2.1.0', 'console_scripts', 'gff3_merge')())
File "/home/meiyang/anaconda3/lib/python3.9/site-packages/gff3tool-2.1.0-py3.9.egg/gff3tool/bin/gff3_merge.py", line 229, in script_main
main(args.gff_file1, args.gff_file2, args.fasta, report_fh, args.output_gff, args.all, args.auto_assignment, args.user_defined_file1, args.user_defined_file2, logger=logger_stderr)
File "/home/meiyang/anaconda3/lib/python3.9/site-packages/gff3tool-2.1.0-py3.9.egg/gff3tool/bin/gff3_merge.py", line 70, in main
gff3_merge.revision.main(gff_file=gff_file1, revision_file=autoFILE, output_gff=autoReviseGff, report_file=autoReviseReport, user_defined1=user_defined1, auto=auto, logger=logger)
File "/home/meiyang/anaconda3/lib/python3.9/site-packages/gff3tool-2.1.0-py3.9.egg/gff3tool/lib/gff3_merge/revision.py", line 227, in main
tag = ','.join(child['attributes']['replace']).replace(' ','')
KeyError: 'replace'
My gff3 files look like this:
one:
##gff-version 3
##sequence-region BMSK_chr10_RagTag 64593 17730062
BMSK_chr10_RagTag Liftoff gene 64593 65277 . + . ID=gene5668;name=BMSK0005247;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=BMSK0005247_0
BMSK_chr10_RagTag Liftoff mRNA 64593 65277 . + . ID=mRNA5668;Parent=gene5668;name=BMSK0005247.1;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
BMSK_chr10_RagTag Liftoff exon 64593 64628 . + . Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag Liftoff five_prime_UTR 64593 64628 . + . Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag Liftoff five_prime_UTR 64752 64818 . + . Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag Liftoff exon 64752 65277 . + . Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag Liftoff CDS 64819 65028 . + 0 Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag Liftoff three_prime_UTR 65029 65277 . + . Parent=mRNA5668;extra_copy_number=0
two:
##gff-version 3
##sequence-region BMSK_chr10_RagTag 60014 17730543
BMSK_chr10_RagTag Liftoff gene 60014 62435 . - . ID=gene6047;Name=KWMTBOMO05391;coverage=0.997;sequence_ID=0.996;valid_ORFs=0;extra_copy_number=0;copy_num_ID=KWMTBOMO05391_0
BMSK_chr10_RagTag Liftoff mRNA 60014 62435 . - . ID=mRNA6047;Name=KWMTBOMO05391;Parent=gene6047;matches_ref_protein=False;valid_ORF=False;missing_stop_codon=True;extra_copy_number=0
BMSK_chr10_RagTag Liftoff transcription_end_site 60014 60014 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff exon 60014 60869 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff stop_codon 60169 60171 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff CDS 60169 60869 . - 1 Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff terminal 60169 60869 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff CDS 62184 62227 . - 0 Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff initial 62184 62227 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff exon 62184 62338 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff start_codon 62225 62227 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff exon 62427 62435 . - . Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag Liftoff transcription_start_site 62435 62435 . - . Parent=mRNA6047;extra_copy_number=0
Can anyone help?