Comments (21)

ChaoTang-SCU commented on August 13, 2024 7

When I used the Ensembl GTF file, I got the same error. In the end, I found that a GTF containing only the transcript and exon rows works well:
awk -F "\t" '$3=="exon"||$3=="transcript"' Homo_sapiens.GRCh38.87.gtf > Homo_sapiens.GRCh38.87.transcript.exon.gtf

from ggsashimi.

ManavalanG commented on August 13, 2024 1

Just wanted to add that the GENCODE GTF runs into the same issue.

dgarrimar commented on August 13, 2024 1

Dear Lea @bellenger-l,

You could try using GENCODE annotation files. The release corresponding to mouse Ensembl 83 is GENCODE M8. Alternatively, could you provide some lines of your GTF so we can check what the problem is? As stated in previous comments, make sure that the file follows the proper format. In particular, the transcript_id attribute should be present in every line of the GTF.
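
To see which rows lack the attribute before posting lines from a GTF, a small check along these lines may help. It is only a sketch: the function name `count_missing_transcript_id` and the example rows are made up for illustration, and it uses a plain substring test rather than a full attribute parser.

```python
from collections import Counter


def count_missing_transcript_id(lines):
    """Count, per feature type (column 3), the rows lacking transcript_id."""
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a nine-column GTF row
        feature, attributes = fields[2], fields[8]
        if "transcript_id" not in attributes:
            missing[feature] += 1
    return missing


# Made-up rows: the gene row has no transcript_id, the transcript row does.
example = [
    "#!comment line\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "A";\n',
    '1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
print(count_missing_transcript_id(example))  # Counter({'gene': 1})
```

On a real file, the same function can be given an open file handle instead of the list.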

KrotosBenjamin commented on August 13, 2024 1

I fixed this issue by falling back to the gene ID inside a try statement; rows that do have a transcript_id still work as originally intended. This is easier than editing a GTF file.

Replace:
transcript_id = d["transcript_id"]

with the try statement below:

try:
    transcript_id = d["transcript_id"]
except KeyError:
    transcript_id = d["gene_id"]

emi80 commented on August 13, 2024

The problem is that we assume the transcript_id attribute is present in every line of the GTF (except for comments of course). I see here that the gene line does not have it. One solution: you could preprocess the annotation and add the transcript_id field where it is not present.
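
A minimal sketch of such a preprocessing step is given here. It assumes that reusing the gene_id value as a placeholder transcript_id is acceptable for your use case; `add_missing_transcript_id` is a hypothetical helper, not part of ggsashimi.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]*)"')


def add_missing_transcript_id(line):
    """Append a transcript_id attribute to a GTF row that lacks one.

    Assumption: the gene_id value is reused as a placeholder transcript_id.
    """
    if line.startswith("#") or "transcript_id" in line:
        return line  # comments and complete rows pass through unchanged
    match = GENE_ID.search(line)
    if match is None:
        return line  # no gene_id to copy from; leave the row untouched
    body = line.rstrip()
    if not body.endswith(";"):
        body += ";"
    return f'{body} transcript_id "{match.group(1)}";\n'


# Made-up gene row without a transcript_id attribute.
gene_row = '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "A";\n'
print(add_missing_transcript_id(gene_row), end="")
```

Applied line by line while copying the annotation to a new file, this leaves transcript and exon rows untouched and only extends rows such as the gene row above.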

@abreschi what do you suggest?

abreschi commented on August 13, 2024

Hi! Sorry about this issue. Unfortunately the transcript_id is a required field in the GTF format (https://genome.ucsc.edu/FAQ/FAQformat.html#format4), even in gene rows. So, I would modify the Ensembl file like @emi80 said. Hope it helps.

sridhar0605 commented on August 13, 2024

Hello @emi80 @abreschi ,

Thank you for your reply. I will modify the file and try again. Currently I do not see any issue if I change the build to 37.67, so maybe it is build-specific.

I guess you can close this issue.

thanks

bellenger-l commented on August 13, 2024

When I used the Ensembl GTF file, I got the same error. In the end, I found that a GTF containing only the transcript and exon rows works well:
awk -F "\t" '$3=="exon"||$3=="transcript"' Homo_sapiens.GRCh38.87.gtf > Homo_sapiens.GRCh38.87.transcript.exon.gtf

I'm sorry, it didn't fix the issue for me; I have a new error:

Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #17 has length 7; 2 is required

I am using the Mus_musculus.GRCm38.83 annotation.

If you have a solution, I would appreciate it.

Thanks,
Lea

bellenger-l commented on August 13, 2024

Dear @dgarrimar,

Thanks a lot! It works like a charm with the GENCODE annotation, but it doesn't print the different transcripts under the sashimi plots. I can't figure out which option does that.

Best,
Lea

dgarrimar commented on August 13, 2024

In principle it should. Could you send the command that you are using and the output that it generated? Thanks!

bellenger-l commented on August 13, 2024

I'm sorry, I hadn't checked the GENCODE GTF: its chromosome names differ from those in the Ensembl GTF ("chr1" versus "1"). I removed "chr" from the first column and now I get the transcripts.

Thanks a lot for your help anyway,
Best
Lea
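
For anyone hitting the same mismatch, the renaming described above can be sketched as follows. `strip_chr_prefix` is a hypothetical helper; it only drops a leading "chr" from the first column and does not handle other naming differences, such as the mitochondrial chromosome ("chrM" versus "MT").

```python
def strip_chr_prefix(line):
    """Drop a leading "chr" from the first (seqname) column of a GTF row."""
    if line.startswith("#"):
        return line  # keep header and comment lines as they are
    seqname, sep, rest = line.partition("\t")
    if seqname.startswith("chr"):
        seqname = seqname[3:]
    return seqname + sep + rest


# Made-up row using the "chr1" naming style.
row = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
print(strip_chr_prefix(row), end="")
```

Whether to rename the GTF or the alignments depends on which naming the BAM files use; the two just need to agree.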

kylinson commented on August 13, 2024

Just replace the original code on line 284 with transcript_id = d.get("transcript_id", "transcript_id_missing").

PhKoch commented on August 13, 2024

@kylinson this hack didn't work for me. The following error occurred:

Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #10 has length 7; 2 is required

I'll edit my GTF as suggested earlier.

archana433 commented on August 13, 2024

@tangchao7498
I'm sorry, it didn't fix the issue for me; I also have a new error.
I am using the Mus_musculus.GRCm38.99 annotation:

Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #11 has length 7; 2 is required
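
One plausible cause of this ValueError (an assumption, since the offending GTF line is not shown) is an attribute value that contains spaces, which makes `kv.strip().split(" ")` return more than two items. The sketch below reproduces the failure on a made-up attribute string and shows a more tolerant regex-based parser. `parse_attributes` is a hypothetical helper, not part of sashimi-plot.py, and unlike the original line it strips the surrounding quotes from values.

```python
import re

# A key, then either a double-quoted value (spaces allowed) or a bare token.
ATTRIBUTE = re.compile(r'([^\s;"]+)\s+(?:"([^"]*)"|([^\s;"]+))')


def parse_attributes(tags):
    """Parse a GTF attribute column into a dict (hypothetical helper)."""
    d = {}
    for key, quoted, bare in ATTRIBUTE.findall(tags):
        # At most one of the two value groups is non-empty;
        # a repeated key keeps its last value, as dict() would.
        d[key] = quoted or bare
    return d


# Made-up attribute column with a space inside the gene_name value.
tags = 'gene_id "G1"; gene_name "PDH-E1 ALPHA"; transcript_id "T1";'

try:
    # The parsing line from the traceback, applied to the same string.
    dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
except ValueError as err:
    print("naive parser failed:", err)

print(parse_attributes(tags))
# {'gene_id': 'G1', 'gene_name': 'PDH-E1 ALPHA', 'transcript_id': 'T1'}
```

Because the quotes are removed here, any downstream code that expects quoted values would need adjusting before this could replace the original line.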

dgarrimar commented on August 13, 2024

Dear @archana433, have you tried using the GENCODE annotation? I believe it is equivalent to the one you use. Give it a try and let me know!

archana433 commented on August 13, 2024

Thanks, it worked. Now I get this error:

Error in seq.default(start, max(start + 1, end - 4), by = 2425) : 
  'from' must be of length 1
Calls: rbind -> [ -> [.data.table -> seq -> seq.default
Execution halted

dgarrimar commented on August 13, 2024

Great, as the annotation issue is solved, let's continue the discussion regarding this error in #33.

antonioggsousa commented on August 13, 2024

Hi,

I faced the same problem. I'm running the Python script, but instead of changing the GTF file, I added a few lines of code that skip rows lacking "transcript_id" and also join values that contain a space (such as gene names) with an underscore:

# AGGS: skip lines without a "transcript_id" tag
if "transcript_id" not in tags:
    continue
# Join multi-word values with "_", e.g. gene_name "PDH-E1 ALPHA" becomes "PDH-E1_ALPHA"
dict_list = []
for ele in tags.strip(";").split("; "):
    l = ele.strip().split(" ")
    if len(l) > 2:
        l = [l[0], "_".join(l[1:])]
    dict_list.append(l)
d = dict(dict_list)
# AGGS: original line, now commented out
# d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))

You might consider adding these lines (lines 283-297) to your Python script. I know it is not very Pythonic.

António

antonioggsousa commented on August 13, 2024

Thanks! @KrotosBenjamin's solution is by far more Pythonic and elegant.

António

stale commented on August 13, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dgarrimar commented on August 13, 2024

Following suggestions in PR #52 by @ygidtu, with minor modifications, GTFs with gene rows without the transcript_id attribute will not throw an error anymore. However, the transcript_id attribute will still be required in transcript/exon rows. We included a more informative error message for this case.
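
The behaviour described above could look roughly like the sketch below. This is not the code from PR #52: the function name `transcript_id_for_row`, its parameters, and the wording of the error message are all assumptions made for illustration.

```python
def transcript_id_for_row(feature, attributes, line_number):
    """Return the transcript_id of a GTF row, or None if the row can be skipped.

    Hypothetical helper: rows other than transcript/exon (for example gene rows)
    may lack transcript_id, while transcript and exon rows must carry it.
    """
    transcript_id = attributes.get("transcript_id")
    if transcript_id is not None:
        return transcript_id
    if feature in ("transcript", "exon"):
        raise ValueError(
            f"GTF line {line_number}: {feature} row has no transcript_id attribute"
        )
    return None  # tolerated: the caller can simply skip this row


# Made-up parsed rows: a gene row without transcript_id and a complete exon row.
print(transcript_id_for_row("gene", {"gene_id": "G1"}, 1))  # None
print(transcript_id_for_row("exon", {"gene_id": "G1", "transcript_id": "T1"}, 2))  # T1
```

An exon or transcript row without the attribute would raise the ValueError, with the line number pointing to the row to fix.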
