KeyError: 'transcript_id' with Ensemble human annotation about ggsashimi HOT 21 CLOSED

guigolab commented on August 13, 2024

KeyError: 'transcript_id' with Ensemble human annotation

from ggsashimi.

Comments (21)

ChaoTang-SCU commented on August 13, 2024

When I used the Ensembl GTF file, I got the same error. In the end, I found that a GTF containing only transcript and exon rows works well:
awk -F "\t" '$3=="exon"||$3=="transcript"' Homo_sapiens.GRCh38.87.gtf > Homo_sapiens.GRCh38.87.transccript.exon.gtf
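
For anyone without awk at hand, the same filtering can be sketched in Python. This is only an illustration of the idea above: the function name `keep_transcript_exon_rows` is made up for this example, and unlike the awk one-liner it keeps `#` comment/header lines.

```python
def keep_transcript_exon_rows(lines):
    """Yield header lines and rows whose feature column is transcript or exon."""
    for line in lines:
        # Keep comment/header lines (the awk command above drops them).
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 of a GTF row holds the feature type.
        if len(fields) >= 3 and fields[2] in ("transcript", "exon"):
            yield line


if __name__ == "__main__":
    import sys

    # Usage: python filter_gtf.py < input.gtf > filtered.gtf
    sys.stdout.writelines(keep_transcript_exon_rows(sys.stdin))
```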

ManavalanG commented on August 13, 2024

Just wanted to add that the GENCODE GTF runs into the same issue.

dgarrimar commented on August 13, 2024

Dear Lea @bellenger-l,

You could try using GENCODE annotation files. The release corresponding to mouse Ensembl 83 is GENCODE M8. Alternatively, could you provide some lines of your GTF so we can check what the problem is? As stated in previous comments, make sure that the file follows the proper format. In particular, the transcript_id attribute should be present in every line of the GTF.

KrotosBenjamin commented on August 13, 2024

I fixed this issue by falling back to the gene ID (gene_id) with a try statement; the script otherwise still works as originally intended. This is easier than editing a GTF file.

Replace:
transcript_id = d["transcript_id"]

with the try statement below:

try:
    transcript_id = d["transcript_id"]
except KeyError:
    transcript_id = d["gene_id"]

emi80 commented on August 13, 2024

The problem is that we assume the transcript_id attribute is present in every line of the GTF (except for comments of course). I see here that the gene line does not have it. One solution: you could preprocess the annotation and add the transcript_id field where it is not present.

@abreschi what do you suggest?
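
A minimal sketch of that preprocessing step, in case it helps. Everything here is an assumption for illustration: the helper name `add_missing_transcript_id` is invented, and reusing the gene_id as the placeholder value is just one possible choice (it mirrors the fallback suggested elsewhere in this thread).

```python
import re

# Patterns for the two attributes we care about in column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id\s+"[^"]*"')
GENE_ID = re.compile(r'gene_id\s+"([^"]*)"')


def add_missing_transcript_id(line):
    """Return the GTF line with a transcript_id attribute appended if absent."""
    if line.startswith("#") or not line.strip():
        return line  # comments and blank lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or TRANSCRIPT_ID.search(fields[8]):
        return line  # malformed row or transcript_id already present
    match = GENE_ID.search(fields[8])
    # Assumption: use the gene_id as placeholder, or an empty string if none.
    placeholder = match.group(1) if match else ""
    attrs = fields[8].strip().rstrip(";")
    prefix = attrs + "; " if attrs else ""
    fields[8] = '%stranscript_id "%s";' % (prefix, placeholder)
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python add_transcript_id.py < input.gtf > fixed.gtf
    for gtf_line in sys.stdin:
        sys.stdout.write(add_missing_transcript_id(gtf_line))
```

Rows that already carry a transcript_id are left untouched, so running it twice should not change the file further.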

abreschi commented on August 13, 2024

Hi! Sorry about this issue. Unfortunately the transcript_id is a required field in the GTF format (https://genome.ucsc.edu/FAQ/FAQformat.html#format4), even in gene rows. So, I would modify the Ensembl file like @emi80 said. Hope it helps.

sridhar0605 commented on August 13, 2024

Hello @emi80 @abreschi ,

Thank you for your reply. I will modify the file and try again. Currently I do not see any issue if I change the build to 37.67, so it may be build-specific.

I guess you can close this issue.

Thanks

bellenger-l commented on August 13, 2024

> When I used the Ensembl GTF file, I got the same error. In the end, I found that a GTF containing only transcript and exon rows works well:
> awk -F "\t" '$3=="exon"||$3=="transcript"' Homo_sapiens.GRCh38.87.gtf > Homo_sapiens.GRCh38.87.transccript.exon.gtf

I'm sorry, it didn't fix the issue for me. I have a new error:

Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #17 has length 7; 2 is required

I am using the Mus_musculus.GRCm38.83 annotation.

If you have a solution, I would appreciate it.

Thanks,
Lea
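
A note on the traceback above: the parsing line at 283 splits each attribute on single spaces, so it raises this kind of ValueError whenever an attribute does not split into exactly two pieces, for example a quoted value that contains a space. The sketch below reproduces that with a made-up attribute string and shows one possible more tolerant parser. Both function names are hypothetical and this is not ggsashimi's code; note also that, unlike line 283, the tolerant version strips the surrounding quotes from values.

```python
import re

# A key, whitespace, then either a double-quoted value (spaces allowed)
# or a bare token such as a number, optionally followed by ";".
ATTRIBUTE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^\s;]+))\s*;?')


def naive_parse(tags):
    """Same expression as line 283 in the traceback."""
    return dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))


def parse_attributes(tags):
    """Parse GTF column 9 into a dict, tolerating spaces inside quotes."""
    d = {}
    for m in ATTRIBUTE.finditer(tags):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        d[m.group(1)] = value  # repeated keys keep the last value, as dict() would
    return d


# Made-up example: the gene_name value contains a space.
example = 'gene_id "G1"; gene_name "PDH-E1 ALPHA"; exon_number 3;'
try:
    naive_parse(example)
except ValueError as err:
    print("naive parser failed:", err)
print(parse_attributes(example))
```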

bellenger-l commented on August 13, 2024

Dear @dgarrimar,

Thanks a lot! It works like a charm with the GENCODE annotation, but it doesn't print the different transcripts under the sashimi plots. I can't figure out which option does that.

Best,
Lea

dgarrimar commented on August 13, 2024

In principle it should. Could you send the command that you are using and the output that you generated? Thanks!

bellenger-l commented on August 13, 2024

I'm sorry, I hadn't checked the GENCODE GTF: its chromosome names differ from those in the Ensembl GTF ("chr1" versus "1"). I removed "chr" from the first column and now I have the transcripts...

Thanks a lot for your help anyway,
Best
Lea
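
For reference, the renaming described above could be sketched like this in Python. The function name `strip_chr_prefix` is hypothetical. One caveat: the mitochondrial chromosome is "chrM" in GENCODE but "MT" in Ensembl, so stripping the prefix alone does not make those two names match.

```python
def strip_chr_prefix(line):
    """Remove a leading "chr" from the first column of a GTF row."""
    if line.startswith("#"):
        return line  # leave comment/header lines alone
    chrom, sep, rest = line.partition("\t")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom + sep + rest


if __name__ == "__main__":
    import sys

    # Usage: python strip_chr.py < gencode.gtf > gencode.nochr.gtf
    for gtf_line in sys.stdin:
        sys.stdout.write(strip_chr_prefix(gtf_line))
```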

kylinson commented on August 13, 2024

Just replace line 284 of the original code with transcript_id = d.get("transcript_id", "transcript_id_missing").

PhKoch commented on August 13, 2024

@kylinson this hack didn't work for me. The following error occurred:

Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #10 has length 7; 2 is required

I'll edit my gtf as suggested earlier.

archana433 commented on August 13, 2024

@tangchao7498
I'm sorry, it didn't fix the issue for me. I also have a new error.
I am using the Mus_musculus.GRCm38.99 annotation:

Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #11 has length 7; 2 is required

dgarrimar commented on August 13, 2024

Dear @archana433, have you tried using the GENCODE annotation? I believe this is the equivalent of the one you use. Give it a try and let me know!

archana433 commented on August 13, 2024

Thanks, it worked. Now I get this error:

Error in seq.default(start, max(start + 1, end - 4), by = 2425) : 
  'from' must be of length 1
Calls: rbind -> [ -> [.data.table -> seq -> seq.default
Execution halted

dgarrimar commented on August 13, 2024

Great, as the annotation issue is solved, let's continue the discussion regarding this error in #33.

antonioggsousa commented on August 13, 2024

Hi,

I faced the same problem. I'm trying to run the Python script, but instead of changing the GTF file, I added a few lines of code that skip rows lacking "transcript_id" and join gene names containing a space with an underscore:

    # AGGS: skip lines without a "transcript_id" tag
    if "transcript_id" not in tags:
        continue
    # Join values that contain spaces with "_", e.g. gene_name "PDH-E1 ALPHA" becomes "PDH-E1_ALPHA"
    dict_list = []
    for ele in tags.strip(";").split("; "):
        l = ele.strip().split(" ")
        if len(l[1:]) > 1:
            gene_name = "_".join(l[1:])
            l = [l[0], gene_name]
        dict_list.append(l)
    d = dict(dict_list)
    # d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))  # AGGS: original line, commented out

You might consider adding these lines (lines 283-297 in my copy) to your Python script. I know it is not very pythonic.

António

antonioggsousa commented on August 13, 2024

Thanks, @KrotosBenjamin's solution is by far more pythonic and elegant.

António

stale commented on August 13, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dgarrimar commented on August 13, 2024

Following suggestions in PR #52 by @ygidtu, with minor modifications, GTFs with gene rows without the transcript_id attribute will not throw an error anymore. However, the transcript_id attribute will still be required in transcript/exon rows. We included a more informative error message for this case.
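
To illustrate the behaviour described above (not the actual code from PR #52, which is not shown in this thread), a check along these lines would tolerate gene rows and fail with a clearer message on transcript/exon rows. The function name, its parameters, and the message text are all assumptions made for this sketch.

```python
def get_transcript_id(feature, attributes, line_number):
    """Return the transcript_id, requiring it only for transcript/exon rows.

    feature: value of GTF column 3; attributes: parsed column 9 as a dict;
    line_number: used only to make the error message easier to act on.
    """
    transcript_id = attributes.get("transcript_id")
    if transcript_id is None and feature in ("transcript", "exon"):
        raise ValueError(
            "GTF line %d: %s row has no transcript_id attribute"
            % (line_number, feature)
        )
    # For other features (e.g. gene rows) a missing transcript_id is tolerated.
    return transcript_id
```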
