Comments (4)
Hi,
This is a little tricky because the refound genes have no original locus tag. At the moment we output the protein and DNA sequences to gene_data.csv but not the location information. The final number in the refound gene name is just the order in which Panaroo found it, so it is not particularly helpful.
As the sequence in gene_data.csv should exactly match that in the original GFF (or its reverse complement) you could use this to search for its location. I will have a think about how best to retain this information.
Its location in relation to other genes is available in the 'final_graph.gml' file, which can be viewed in Cytoscape.
Hopefully this helps a bit.
from panaroo.
Hi, thank you for the quick response. I understand now why it is difficult, and yes, it is possible to see where the refound genes are on the graph. Still, I think in some applications it could be very useful to keep track of the locations, i.e. to know where the gene sits within the isolate in the context of the whole genome. So I guess I'll leave this as a feature request.
"As the sequence in gene_data.csv should exactly match that in the original GFF (or its reverse complement) you could use this to search for its location. I will have a think about how best to retain this information."
Yes, this works. I can track down where the gene is by just aligning it against the whole genome sequence. For instance, the one in my question (40_refound_2166) is actually on the plasmid. This could be interesting; I'll see if I can interpret what it means. Thanks!
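For anyone wanting to do the same lookup without an aligner, the exact-match search suggested above can be sketched in plain Python. This is only a minimal illustration, not part of Panaroo: the function names (`reverse_complement`, `parse_fasta`, `locate_gene`) and the toy sequences are made up for the example, and it assumes you have already pulled the refound gene's DNA sequence out of gene_data.csv and the isolate's contigs out of its FASTA (for instance the sequence section of the GFF). It only finds exact matches on either strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {contig_id: uppercase sequence}."""
    contigs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the contig id.
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {contig: "".join(parts) for contig, parts in contigs.items()}


def locate_gene(gene_seq: str, contigs: dict) -> list:
    """Find exact matches of gene_seq, or its reverse complement, in each contig.

    Returns a list of (contig_id, start, end, strand) tuples using
    1-based inclusive coordinates, as in GFF.
    """
    gene = gene_seq.upper()
    queries = [("+", gene)]
    rc = reverse_complement(gene)
    if rc != gene:  # avoid reporting a palindromic sequence twice
        queries.append(("-", rc))

    hits = []
    for contig, seq in contigs.items():
        for strand, query in queries:
            start = seq.find(query)
            while start != -1:
                hits.append((contig, start + 1, start + len(query), strand))
                start = seq.find(query, start + 1)
    return hits


# Toy data for illustration only (not real Panaroo output).
genome = ">chromosome\nAAACCCGGGTTTACGT\n>plasmid1\nTTTTGCATCGAAAA\n"
gene = "TCGATGC"  # present on plasmid1 only as its reverse complement

for hit in locate_gene(gene, parse_fasta(genome)):
    print(hit)
```

More than one hit for the same gene would suggest repeated or paralogous copies, in which case the graph context in final_graph.gml is probably needed to tell them apart.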
Hi, I am also interested in knowing whether it would be possible to add the gene locations to the gene_data.csv file. It could be useful for some applications to have this info for all genes and not just the refound genes. Would it be OK if I worked on a PR to add this feature? I think I understood where the changes should go, but if you have strong opinions about having such a feature please let me know :)
Ok, I managed to add the location information for the original genes, but for the refound ones I was only able to get the scaffold id, as the location info I was able to find seems to cover a larger area than the actual nucleotide sequence reported in the file. Is that the reason why it's not easy to report this information?
Here are the changes I've made so far in case they are useful: mgalardini@d487c42