Comments (6)

tony006469 avatar tony006469 commented on June 27, 2024

The statistics function I wrote is based on the error module, so I may need to add exception handling, because the error code 'Esf0012' has been commented out in the error module.

https://github.com/NAL-i5K/GFF3toolkit/blob/master/gff3tool/lib/ERROR/ERROR.py

line26.png

from gff3toolkit.

tony006469 avatar tony006469 commented on June 27, 2024

I tried making changes across gff3.py, error.py and gff3_QC.py.
I found that when I modified the commented-out line in error.py that I mentioned yesterday, the statistics function worked normally. I am still not sure whether the result is correct, so I have uploaded a screenshot and the output files here.
I also tested the example file from GFF3toolkit, and it ran without problems.

screenshot(s).png

https://app.zenhub.com/files/45868018/da2feb53-7eae-4139-ad9c-546c0ebeb54a/download

https://app.zenhub.com/files/45868018/1cace338-331d-4e8d-9306-102943f153e3/download


mpoelchau avatar mpoelchau commented on June 27, 2024

Resolved in #84


hsiaoyi0504 avatar hsiaoyi0504 commented on June 27, 2024

Sorry for the late response. It seems to me that this might be related to the following: this error code is handled specially in other places.

https://gitlab.com/search?utf8=%E2%9C%93&snippets=&scope=&search=Esf0012&project_id=1602091

and

if error_set and len(error_set):
    escaped_error = ['Esf0012','Esf0033']
    eSet = list()
    for e in error_set:
        if not e['eCode'] in escaped_error:
            eSet.append(e)
    if len(eSet):
        logger.warning('The extracted sequences might be wrong for the following features which have formatting errors...')
        print('ID\tError_Code\tError_Tag')
        for e in eSet:
            tag = '[{0:s}]'.format(e['eTag'])
            print e['ID'], e['eCode'], tag
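
For anyone reading along, here is a minimal Python 3 sketch of the same filtering idea, not the toolkit's actual code. The helper names `filter_reportable` and `format_rows` and the sample records are hypothetical; only the dictionary keys (`ID`, `eCode`, `eTag`) and the two escaped codes come from the snippet above.

```python
# Codes left out of the warning table, as in the quoted snippet.
ESCAPED_CODES = {"Esf0012", "Esf0033"}


def filter_reportable(error_set, escaped=ESCAPED_CODES):
    """Return the error records whose eCode is not in `escaped`."""
    return [e for e in (error_set or []) if e["eCode"] not in escaped]


def format_rows(errors):
    """Build tab-separated report lines, header first."""
    rows = ["ID\tError_Code\tError_Tag"]
    for e in errors:
        rows.append("{0}\t{1}\t[{2}]".format(e["ID"], e["eCode"], e["eTag"]))
    return rows


# Hypothetical records, shaped like the dictionaries used above.
sample = [
    {"ID": ["geneA"], "eCode": "Esf0012", "eTag": "placeholder tag"},
    {"ID": ["geneB"], "eCode": "Ema0002",
     "eTag": "Protein sequence contains internal stop codons at bp 31840"},
]

for line in format_rows(filter_reportable(sample)):
    print(line)
```

If 'Esf0012' is only ever skipped at this point, a statistics function that looks the code up elsewhere would still need its own handling for it, which may be why commenting it out in the error module mattered.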


tony006469 avatar tony006469 commented on June 27, 2024

I ran gff3_to_fasta.
I think this part of the output may be related to this issue.

Command:
gff3_to_fasta -g diaall_apollo_annotations_1-28-2019_nofasta.gff3 -f GCF_001412515.1_Dall1.0_genomic.fna -st all -d simple -o test_sequences

The result:
ID Error_Code Error_Tag
['2CD3D3CFEDBCA62F27212F6E8D5141C4'] Ema0002 [Protein sequence contains internal stop codons at bp 31840]
['E5C5AC34E694DD9C21F38B6030CF4A1A'] Ema0002 [Protein sequence contains internal stop codons at bp 382893, and 382911, and 382986, and 383010, and 383091, and 383094, and 383109, and 383121, and 383133, and 383148, and 383618, and 383768, and 383810, and 383819, and 383822, and 384721, and 384730, and 384739, and 384748]
['9A7320593595480F81C3677B0B495047'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['D905D6E7092F0DB8B735AE790E5FD636'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['0128370F0C807B8E76577D3BFEEB517A'] Ema0002 [Protein sequence contains internal stop codons at bp 381284, and 380195, and 380165, and 380126, and 380048, and 380024, and 379961, and 379958, and 379931, and 379841, and 379763, and 379751, and 379745, and 379691, and 379091, and 379052, and 379016, and 379001]
['BF0A3C8EBC03D902551D917CCB718A5B'] Ema0002 [Protein sequence contains internal stop codons at bp 373533, and 373470, and 373467, and 373458, and 373416]
['28F51C9A88FB32FE2FEF53ABFA9D08FE'] Ema0002 [Protein sequence contains internal stop codons at bp 18998]
['C0F5D99F90174892227783D4BEB9E6B7'] Ema0002 [Protein sequence contains internal stop codons at bp 982392]
['6C10D466513380786704B9312C6DED35'] Ema0009 [Incorrectly merged gene parent? Isoforms that do not share coding sequences are found: Between Line [724, 731]]
['46FAC1E471CBF0124EB39C45C110DE14'] Ema0002 [Protein sequence contains internal stop codons at bp 24817]
['A3901D087072C49CB90432C96549F440'] Ema0002 [Protein sequence contains internal stop codons at bp 153002, and 153005, and 153086, and 153137, and 153149, and 153203, and 153233, and 153317, and 153329, and 153359, and 153446, and 153630, and 153636, and 153642]
['A8A8AF53619020FAA1FD02B8D8E5D38F'] Ema0002 [Protein sequence contains internal stop codons at bp 1296792, and 1296831, and 1296852, and 1296864, and 1296894, and 1296906, and 1296921, and 1296978, and 1296984, and 1297002, and 1297032, and 1297068, and 1297077, and 1297080, and 1297098, and 1297215, and 1297296, and 1297398]
['E82EACA7A59EA11C7EF6C2949F5DE9B7'] Ema0004 [Incomplete gene feature that should contain at least one mRNA, exon, and CDS]
['F1B5024FDE60B7543149F84AD0CD1067'] Ema0002 [Protein sequence contains internal stop codons at bp 36587]
['FBCB4D5D69F51248ED07BF35106B0E6F'] Ema0002 [Protein sequence contains internal stop codons at bp 1412762]
['1DA76B2275ADE8080097FA8CA3A98643'] Ema0002 [Protein sequence contains internal stop codons at bp 339347, and 339401, and 339488, and 339497, and 339524, and 339548, and 339647, and 339722, and 339851, and 339884, and 339896, and 339908, and 339917, and 339926, and 339929, and 339968, and 339998, and 340025, and 340037, and 340070, and 340073, and 340094, and 340100, and 340115, and 340145, and 340232, and 340235, and 340271, and 340319, and 340346, and 340349, and 340364, and 340394, and 340430]
['A5D88CB824303822DDC5C29B42C7A6FB'] Ema0002 [Protein sequence contains internal stop codons at bp 22977, and 22959, and 22917, and 21819, and 21816, and 21783, and 21768, and 21717, and 21627, and 21624, and 21615, and 21576, and 21573, and 21558, and 21543, and 18782, and 18761, and 18758, and 18755, and 18743, and 18740, and 18671, and 18665, and 18629, and 18626, and 18444, and 18438, and 18411, and 18221]
['292DA28C21E5927C5CE5513E99758901'] Ema0002 [Protein sequence contains internal stop codons at bp 1266346, and 1266349, and 1266385, and 1266475, and 1266520, and 1266535, and 1266675, and 1266696, and 1266705, and 1266711, and 1267700, and 1267706, and 1267721, and 1267730, and 1267781, and 1267947, and 1267974, and 1267983, and 1267989, and 1268013, and 1268348, and 1268375]
['A472DE5AA33E66B41153060DBFEEF13A'] Ema0002 [Protein sequence contains internal stop codons at bp 111934]
['877092F9829DCBD597A49145B5B69731'] Ema0009 [Incorrectly merged gene parent? Isoforms that do not share coding sequences are found: Between Line [2828, 2839]]
['BBB638E4E0D4C0FA1040749E8C66CECA'] Ema0009 [Incorrectly merged gene parent? Isoforms that do not share coding sequences are found: Between Line [2889, 2900], and Line[2878, 2900]]
['C1A643F9F8643FDB6353D64E6F06AAEA'] Ema0002 [Protein sequence contains internal stop codons at bp 4344635]
['2DEFA1F69A2FDD1AE567B740AABF31B2'] Ema0002 [Protein sequence contains internal stop codons at bp 22491]
['1E8DAFA5C2A2E661043D6AD55E553EAC'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['23D1E38D10389CE3DA357D29E896017B'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['02630E40CC3B86D91379BA780B26BBF3'] Ema0002 [Protein sequence contains internal stop codons at bp 2009081]
['F708D8C5131CB5D7049A278CCC2F12A0'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['CD93E0AA4002D2A235F44298B1B8F3E6'] Ema0002 [Protein sequence contains internal stop codons at bp 3067, and 3043, and 3022, and 2998, and 2962, and 2947, and 2935, and 2911, and 2872, and 2854, and 2812, and 2776, and 2737, and 2725, and 2722, and 2659, and 2644, and 2629, and 2533, and 2521, and 2500, and 2371, and 2326, and 2308, and 2278, and 1521, and 1346, and 1328]
['620442AD633F78E619051191D0F21DF3'] Ema0002 [Protein sequence contains internal stop codons at bp 1703408, and 1703429, and 1703471, and 1703740, and 1703782, and 1703791, and 1703827, and 1704195]
['7691D0C1C08EF5F236B08FE49A6D22DA'] Ema0002 [Protein sequence contains internal stop codons at bp 1571282]
['12132A2AB8E8A13DE1D56C56A20EEC5D'] Ema0002 [Protein sequence contains internal stop codons at bp 1575910, and 1576593, and 1576614, and 1576623, and 1576632, and 1576733, and 1576745]
['FF2A0E57D228A85FA4E6CD97736DEE0A'] Ema0002 [Protein sequence contains internal stop codons at bp 1563012, and 1563015, and 1563018, and 1563021, and 1563024, and 1563155, and 1563179, and 1563212, and 1563269, and 1563305, and 1563329, and 1565229, and 1565250, and 1565259, and 1565295, and 1565298, and 1568077, and 1568237, and 1568267, and 1568270, and 1568282, and 1570066]
['63A9610CB4224034130B6CBC04A98BD4'] Ema0002 [Protein sequence contains internal stop codons at bp 1624734]
['1566BD81828DF739695C92F87DB86CEF'] Ema0009 [Incorrectly merged gene parent? Isoforms that do not share coding sequences are found: Between Line [5474, 5487]]
['6B217D40A533BDF89D2BE96D1CF79F9A'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['7F8A7D0412164F04F1D93A15E112972C'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['A2868A6698F1A942610961DE0CE549B1'] Ema0009 [Incorrectly merged gene parent? Isoforms that do not share coding sequences are found: Between Line [5571, 5584], and Line[5558, 5584]]
['D19E5FE39AD05ABC9E6F49CF359ECD11'] Ema0002 [Protein sequence contains internal stop codons at bp 16688]
['0C5B5FB9E0768A3611B6A66AA5B7D168'] Ema0002 [Protein sequence contains internal stop codons at bp 26915]
['DB8ECE5D3B2DE42AA7FA95F92A7D304A'] Ema0002 [Protein sequence contains internal stop codons at bp 1847909, and 1847237, and 1847234, and 1847105, and 1847072, and 1847054, and 1847048, and 1847030, and 1847027, and 1846988, and 1846886, and 1846853, and 1846829, and 1846820, and 1846772, and 1846739, and 1846667, and 1846664, and 1846643, and 1846634, and 1846631, and 1846460, and 1846403, and 1846349]
['A3901D087072C49CB90432C96549F440'] Esf0001 [Feature type may need to be changed to pseudogene]
['C1A643F9F8643FDB6353D64E6F06AAEA'] Esf0001 [Feature type may need to be changed to pseudogene]
['DB12B4D9A6E1404690280E69194C439A'] Esf0001 [Feature type may need to be changed to pseudogene]
['7691D0C1C08EF5F236B08FE49A6D22DA'] Esf0001 [Feature type may need to be changed to pseudogene]
['975F4915146FAFDCDC77FFB6971DA097'] Esf0001 [Feature type may need to be changed to pseudogene]
Screenshot of the statistics file:
Screen.png
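
As a rough way to cross-check the statistics file against the output above, the error codes can be tallied directly from the printed lines. This is only a sketch: the regular expression and the helper name `count_error_codes` are my own assumptions about the line layout shown above, not part of GFF3toolkit, and the sample lines are copied from the result listing.

```python
import re
from collections import Counter

# Assumed line layout: ['<ID>'] <code> [<message>]
# The code pattern (E + two lowercase letters + four digits) is inferred
# from the codes seen in the listing, e.g. Ema0002 and Esf0001.
LINE_RE = re.compile(r"^\['[^']+'\]\s*(?P<code>E[a-z]{2}\d{4})\s+\[.*\]$")


def count_error_codes(lines):
    """Count how many report lines carry each error code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # header and blank lines simply do not match
            counts[match.group("code")] += 1
    return counts


report = """ID Error_Code Error_Tag
['2CD3D3CFEDBCA62F27212F6E8D5141C4'] Ema0002 [Protein sequence contains internal stop codons at bp 31840]
['9A7320593595480F81C3677B0B495047'] Ema0001 [Parent feature start and end coordinates exceed those of child features]
['A3901D087072C49CB90432C96549F440'] Esf0001 [Feature type may need to be changed to pseudogene]
['F1B5024FDE60B7543149F84AD0CD1067'] Ema0002 [Protein sequence contains internal stop codons at bp 36587]
""".splitlines()

for code, n in sorted(count_error_codes(report).items()):
    print(code, n)
```

Comparing these per-code counts with the numbers in the statistics file would show whether the change to the error module altered what gets counted.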


mpoelchau avatar mpoelchau commented on June 27, 2024

@tony006469 could confirm that the change in #84 does not affect the QC output from gff3_to_fasta. Closing. Thanks @tony006469!
