Comments (21)
Here,
When I used the Ensembl GTF file, I got the same error. In the end, I found that a GTF containing only transcript and exon rows works well.
awk -F "\t" '$3=="exon"||$3=="transcript"' Homo_sapiens.GRCh38.87.gtf > Homo_sapiens.GRCh38.87.transccript.exon.gtf
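For anyone who prefers to do the same filtering in Python, here is a minimal sketch that mirrors the awk command above. The function names and the sample rows are made up for illustration; only the rule (keep rows whose third column is `exon` or `transcript`) comes from the command.

```python
def keep_transcript_and_exon_rows(lines):
    """Yield only GTF rows whose feature column (3rd) is 'transcript' or 'exon'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Header/comment lines have no third tab-separated column, so they are
        # dropped here, just as they are by the awk filter.
        if len(fields) >= 3 and fields[2] in ("transcript", "exon"):
            yield line


def filter_gtf(in_path, out_path):
    """Write the filtered rows of in_path to out_path (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(keep_transcript_and_exon_rows(src))


# Made-up rows, for illustration only.
sample_rows = [
    "#!genome-build placeholder\n",
    '1\tsource\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001";\n',
    '1\tsource\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";\n',
    '1\tsource\texon\t100\t300\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";\n',
]
filtered_rows = list(keep_transcript_and_exon_rows(sample_rows))
```

With the sample above, `filtered_rows` holds only the transcript and exon rows.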
from ggsashimi.
Just wanted to add that the GENCODE GTF runs into the same issue.
Dear Lea @bellenger-l,
You could try using GENCODE annotation files. The release corresponding to mouse Ensembl 83 is GENCODE M8. Alternatively, could you provide some lines of your GTF so we can check what the problem is? As stated in previous comments, make sure that the file follows the proper format. In particular, the transcript_id attribute should be present in every line of the GTF.
I've worked around this issue with a try statement that falls back to the gene ID; the script otherwise still works as originally intended. This is easier than editing a GTF file.
Replace:
transcript_id = d["transcript_id"]
with the try statement below:
try:
    transcript_id = d["transcript_id"]
except KeyError:
    transcript_id = d["gene_id"]
The problem is that we assume the transcript_id attribute is present in every line of the GTF (except for comments, of course). I see here that the gene line does not have it. One solution: you could preprocess the annotation and add the transcript_id field where it is not present.
@abreschi what do you suggest?
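The preprocessing idea above could look roughly like the sketch below. It is only an illustration, not something taken from ggsashimi: the function name is hypothetical, and reusing the gene_id value as a placeholder transcript_id is an assumption that may not suit every annotation.

```python
import re

# Assumes gene_id values are double-quoted, as in Ensembl and GENCODE GTFs.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def add_missing_transcript_id(line):
    """Return the GTF line with a placeholder transcript_id added if it lacks one."""
    if line.startswith("#"):
        return line  # leave header/comment lines untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or "transcript_id" in fields[8]:
        return line  # malformed row, or transcript_id already present
    match = GENE_ID_RE.search(fields[8])
    if match is None:
        return line  # nothing sensible to copy from
    attributes = fields[8].rstrip()
    if not attributes.endswith(";"):
        attributes += ";"
    # Assumption: the gene_id value doubles as the placeholder transcript_id.
    fields[8] = '%s transcript_id "%s";' % (attributes, match.group(1))
    return "\t".join(fields) + "\n"


# Made-up gene row, for illustration only.
gene_row = '1\tsource\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "Abc1";\n'
patched_row = add_missing_transcript_id(gene_row)
```

Rows that already carry a transcript_id, as well as comment lines, pass through unchanged.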
Hi! Sorry about this issue. Unfortunately the transcript_id is a required field in the GTF format (https://genome.ucsc.edu/FAQ/FAQformat.html#format4), even in gene rows. So, I would modify the Ensembl file like @emi80 said. Hope it helps.
Thank you for your reply. I will modify the file and try again. Currently I do not see any issue if I change the build to 37.67, so maybe it is build-specific.
I guess you can close this issue.
thanks
Here,
When I used the Ensembl GTF file, I got the same error. In the end, I found that a GTF containing only transcript and exon rows works well.
awk -F "\t" '$3=="exon"||$3=="transcript"' Homo_sapiens.GRCh38.87.gtf > Homo_sapiens.GRCh38.87.transccript.exon.gtf
I'm sorry, it didn't fix the issue for me. I have a new error:
Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #17 has length 7; 2 is required
I am using the Mus_musculus.GRCm38.83 annotation.
If you have a solution, I would appreciate it.
Thanks,
Lea
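One plausible cause of this ValueError is an attribute whose quoted value contains spaces: splitting each `key "value"` pair on a single space then yields more than two pieces. The sketch below reproduces that with a made-up attribute string and shows a regex-based alternative. The names `ATTRIBUTE_RE` and `parse_gtf_attributes` are hypothetical and not part of ggsashimi. Unlike the original line, this parser strips the surrounding quotes from values.

```python
import re

# key followed by either a double-quoted value or a bare (unquoted) token
ATTRIBUTE_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_gtf_attributes(tags):
    """Parse a GTF attribute column into a dict, tolerating spaces inside quotes."""
    attributes = {}
    for match in ATTRIBUTE_RE.finditer(tags):
        key, quoted, bare = match.groups()
        # Repeated keys (for example several "tag" entries) keep the last value.
        attributes[key] = quoted if quoted is not None else bare
    return attributes


# Made-up attribute column; the last value contains spaces inside the quotes.
tags = ('gene_id "GENE0001"; transcript_id "TX0001"; '
        'transcript_support_level "1 (assigned to previous version 5)";')

# The one-liner from the traceback fails on such a value.
naive_error = None
try:
    dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
except ValueError as error:
    naive_error = str(error)

parsed = parse_gtf_attributes(tags)
```

Here `naive_error` holds a message of the same kind as the one reported above, while `parsed["transcript_support_level"]` keeps the whole quoted value.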
Dear @dgarrimar,
Thanks a lot! It works like a charm with the GENCODE annotation, but it doesn't print the different transcripts under the sashimi plots. I can't figure out which option does that.
Best,
Lea
In principle it should, could you send the command that you are using and the output that you generated? Thanks!
I'm sorry, I hadn't checked the GENCODE GTF: its chromosome names differ from those in the Ensembl GTF ("chr1" versus "1"). I removed "chr" from the first column and now I have the transcripts...
Thanks a lot for your help anyway,
Best
Lea
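The chromosome-name fix described above can be sketched as follows, assuming the only difference is a leading "chr" in the first column. The function name is hypothetical. One caveat: the mitochondrial chromosome is "chrM" in GENCODE but "MT" in Ensembl, so stripping the prefix alone would not make those names match.

```python
def strip_chr_prefix(line):
    """Remove a leading 'chr' from the first (chromosome) column of a GTF line."""
    if line.startswith("#"):
        return line  # leave header/comment lines untouched
    chrom, sep, rest = line.partition("\t")
    if chrom.startswith("chr"):
        chrom = chrom[len("chr"):]
    return chrom + sep + rest


# Made-up row, for illustration only.
gencode_style_row = 'chr1\tsource\texon\t100\t300\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";\n'
ensembl_style_row = strip_chr_prefix(gencode_style_row)
```

Whether to rename the annotation or the alignments depends on which naming the BAM files use; the sketch only covers the annotation side.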
Just replace line 284 of the original code with transcript_id = d.get("transcript_id", "transcript_id_missing").
@kylinson this hack didn't work for me. The following error occurred:
Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #10 has length 7; 2 is required
I'll edit my GTF as suggested earlier.
@tangchao7498
I'm sorry, it didn't fix the issue for me. I also get a new error.
I am using the Mus_musculus.GRCm38.99 annotation:
Traceback (most recent call last):
  File "./sashimi-plot.py", line 612, in <module>
    transcripts, exons = read_gtf(args.gtf, args.coordinates)
  File "./sashimi-plot.py", line 283, in read_gtf
    d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))
ValueError: dictionary update sequence element #11 has length 7; 2 is required
Dear @archana433, have you tried using the GENCODE annotation? I believe this is the equivalent of the one you use. Give it a try and let me know!
Thanks, it worked. Now I get this error:
Error in seq.default(start, max(start + 1, end - 4), by = 2425) :
  'from' must be of length 1
Calls: rbind -> [ -> [.data.table -> seq -> seq.default
Execution halted
Great, as the annotation issue is solved, let's continue the discussion regarding this error in #33.
Hi,
I faced the same problem. I'm trying to run the Python script, but instead of changing the GTF file, I added a few lines of code that skip rows lacking "transcript_id" and also join gene names containing a space with an underscore:
`#--------------------------------------------------------------------------------
## AGGS: skip lines without "transcript_id" tag
if "transcript_id" not in tags:
    continue
dict_list = []  # concatenate "gene_name" with space, e.g., "PDH-E1 ALPHA" into "PDH-E1_ALPHA"
for ele in tags.strip(";").split("; "):
    l = ele.strip().split(" ")
    if len(l[1::]) > 1:
        gene_name = "_".join(ele.strip().split(" ")[1::])
        l = [l[0], gene_name]
    dict_list.append(l)
dic_tuple = tuple(dict_list)
d = dict(dic_tuple)
#--------------------------------------------------------------------------------
#d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; ")) #aggs: commented line`
You might consider adding these lines (lines 283-297) to your Python script. I know it is not very pythonic.
António
Thanks @KrotosBenjamin, your version is by far more pythonic and elegant.
António
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Following suggestions in PR #52 by @ygidtu, with minor modifications, GTFs with gene rows without the transcript_id attribute will no longer throw an error. However, the transcript_id attribute will still be required in transcript/exon rows. We included a more informative error message for this case.
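The behavior described above can be illustrated with the sketch below. It is not the code from PR #52 or from ggsashimi: the function name, its signature, and the wording of the error message are assumptions made purely for illustration.

```python
def get_transcript_id(feature, attributes, line_number):
    """Return the transcript_id, None for rows that may lack it, or raise.

    feature      -- third GTF column, e.g. "gene", "transcript" or "exon"
    attributes   -- dict parsed from the ninth GTF column
    line_number  -- used only to make the error message more helpful
    """
    transcript_id = attributes.get("transcript_id")
    if transcript_id is not None:
        return transcript_id
    if feature in ("transcript", "exon"):
        # Hypothetical message; the real one in ggsashimi may differ.
        raise ValueError(
            "GTF line %d: %s row has no transcript_id attribute"
            % (line_number, feature)
        )
    return None  # e.g. a gene row: the caller can simply skip it


# A gene row without transcript_id is tolerated.
gene_result = get_transcript_id("gene", {"gene_id": "GENE0001"}, 1)

# An exon row without transcript_id produces an informative error.
exon_error = None
try:
    get_transcript_id("exon", {"gene_id": "GENE0001"}, 2)
except ValueError as error:
    exon_error = str(error)
```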