Comments (3)
Hi Michael,
I'm not sure if I'm understanding the problem correctly, so apologies if this isn't what you mean. In the example you've given, the 'NCBI-ified' version of Genome2 differs from the true map in only two ways: the direction of the entire molecule is reversed, and the gene locations are numbered from a different starting point:
library(tidyverse)
library(gggenes)
true_sub_genome_A <- sub_genomeA <- example_genes %>% filter(molecule == "Genome2")
sub_genomeA$start <- -sub_genomeA$start
sub_genomeA$end <- -sub_genomeA$end
baseval <- min(c(sub_genomeA$start,sub_genomeA$end))
sub_genomeA$start <- sub_genomeA$start - baseval
sub_genomeA$end<- sub_genomeA$end - baseval
true_sub_genome_A$molecule <- "TrueGenome2"
sub_genome <- rbind(sub_genomeA,true_sub_genome_A)
ggplot(sub_genome, aes(xmin = start, xmax = end, y = molecule, fill = gene,forward = orientation)) +
geom_gene_arrow() +
facet_wrap(~ molecule, scales = "free", ncol = 1) +
scale_fill_brewer(palette = "Set3") +
theme_genes()
Created on 2022-03-24 by the reprex package (v2.0.1)
If it's unclear whether or not this 'NCBI-ification has been applied', or perhaps has been applied to some molecules but not others, you can't automatically detect and correct for this without some assumption about the true biology – for example, 'every molecule has a genA oriented in a positive direction'. In that situation would would need some code to go though each molecule, detect which ones have a genA in the 'wrong' direction, and reverse the orientation of those molecules. Similarly, to correct for the change in how the gene locations are numbered, you would need to have an assumption based on information from outside the dataset e.g. 'the true location of the 5′ end of protB in each molecule is 14,000'.
If I am understanding correctly, you are looking for some code to perform the first task only (i.e. reverse the orientation of molecules that have a certain known gene oriented the 'wrong' way)?
from gggenes.
Thanks for looking at the example. In my examples, mainly from bacterial genomes, it is actually the first point you recognized :
the entire molecule with all genes are reversed, the cluster starts from a different starting point.
Therefore, only the positions have to be recalculated based on the new starting point.
For most of my research partners, the orientation and exact position (14,000 etc.) of the individual genes dont even matter. They want to show the co-linearity of the gene cluster und this means some of the input genomes need to be reversed, that is the start and end points have to be recalculated.
My question is :
(1) What would be a robust measure to detect if the genome cluster is reversed (gene positions) ?
(2) What is the best way to transform start and end coordinates ? One way I used in the example above, maybe there are others.
I probably can also find an example from real-world E.coli, to show an example.
from gggenes.
Here are a couple of options. First, I'll set up a data frame containing a 'true' and an 'NCBI-ified' version of Genome1.
library(tidyverse)
library(gggenes)
genome1 <- example_genes %>%
filter(molecule == "Genome1") %>%
mutate(molecule = "TrueGenome1") %>%
select(-strand, -orientation)
ncbi_genome1 <- genome1 %>%
mutate_at(c("start", "end"), ~ -.x) %>%
mutate_at(c("start", "end"), ~ .x - min(c(start, end))) %>%
mutate(molecule = "NCBIifiedGenome1")
genomes <- bind_rows(genome1, ncbi_genome1)
ggplot(genomes, aes(xmin = start, xmax = end, y = molecule, fill = gene)) +
geom_gene_arrow() +
facet_wrap(~ molecule, scales = "free", ncol = 1) +
scale_fill_brewer(palette = "Set3") +
theme_genes()
The first method looks at each molecule and checks whether a selected gene (in this example, genA) is oriented in the correct direction. If not, it marks the entire molecule as 'NCBI-ified'.
# Method 1: detect if a certain gene (genA) is oriented in the correct direction
genomes %>%
nest(gene, start, end) %>%
mutate(genAstart = map_dbl(data, ~ filter(.x, gene == "genA") %>% pull(start))) %>%
mutate(genAend = map_dbl(data, ~ filter(.x, gene == "genA") %>% pull(end))) %>%
mutate(isNCBIified = genAstart > genAend) %>%
select(-genAstart, genAend) %>%
unnest(data)
#> Warning: All elements of `...` must be named.
#> Did you want `data = c(gene, start, end)`?
#> # A tibble: 20 × 6
#> molecule gene start end genAend isNCBIified
#> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
#> 1 TrueGenome1 genA 15389 17299 17299 FALSE
#> 2 TrueGenome1 genB 17301 18161 17299 FALSE
#> 3 TrueGenome1 genC 18176 18640 17299 FALSE
#> 4 TrueGenome1 genD 18641 18985 17299 FALSE
#> 5 TrueGenome1 genE 18999 20078 17299 FALSE
#> 6 TrueGenome1 genF 20086 20451 17299 FALSE
#> 7 TrueGenome1 protC 22777 22989 17299 FALSE
#> 8 TrueGenome1 protD 22986 24023 17299 FALSE
#> 9 TrueGenome1 protE 24024 25010 17299 FALSE
#> 10 TrueGenome1 protF 20474 22720 17299 FALSE
#> 11 NCBIifiedGenome1 genA 9621 7711 7711 TRUE
#> 12 NCBIifiedGenome1 genB 7709 6849 7711 TRUE
#> 13 NCBIifiedGenome1 genC 6834 6370 7711 TRUE
#> 14 NCBIifiedGenome1 genD 6369 6025 7711 TRUE
#> 15 NCBIifiedGenome1 genE 6011 4932 7711 TRUE
#> 16 NCBIifiedGenome1 genF 4924 4559 7711 TRUE
#> 17 NCBIifiedGenome1 protC 2233 2021 7711 TRUE
#> 18 NCBIifiedGenome1 protD 2024 987 7711 TRUE
#> 19 NCBIifiedGenome1 protE 986 0 7711 TRUE
#> 20 NCBIifiedGenome1 protF 4536 2290 7711 TRUE
The second method looks at two different genes and checks if they appear the correct order. In this example, protF should start after genC.
# Method 2: detect if a certain gene (protF) starts after another gene (genC)
genomes %>%
nest(gene, start, end) %>%
mutate(protFstart = map_dbl(data, ~ filter(.x, gene == "protF") %>% pull(start))) %>%
mutate(genCend = map_dbl(data, ~ filter(.x, gene == "genC") %>% pull(end))) %>%
mutate(isNCBIified = protFstart < genCend) %>%
select(-protFstart, genCend) %>%
unnest(data)
#> Warning: All elements of `...` must be named.
#> Did you want `data = c(gene, start, end)`?
#> # A tibble: 20 × 6
#> molecule gene start end genCend isNCBIified
#> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
#> 1 TrueGenome1 genA 15389 17299 18640 FALSE
#> 2 TrueGenome1 genB 17301 18161 18640 FALSE
#> 3 TrueGenome1 genC 18176 18640 18640 FALSE
#> 4 TrueGenome1 genD 18641 18985 18640 FALSE
#> 5 TrueGenome1 genE 18999 20078 18640 FALSE
#> 6 TrueGenome1 genF 20086 20451 18640 FALSE
#> 7 TrueGenome1 protC 22777 22989 18640 FALSE
#> 8 TrueGenome1 protD 22986 24023 18640 FALSE
#> 9 TrueGenome1 protE 24024 25010 18640 FALSE
#> 10 TrueGenome1 protF 20474 22720 18640 FALSE
#> 11 NCBIifiedGenome1 genA 9621 7711 6370 TRUE
#> 12 NCBIifiedGenome1 genB 7709 6849 6370 TRUE
#> 13 NCBIifiedGenome1 genC 6834 6370 6370 TRUE
#> 14 NCBIifiedGenome1 genD 6369 6025 6370 TRUE
#> 15 NCBIifiedGenome1 genE 6011 4932 6370 TRUE
#> 16 NCBIifiedGenome1 genF 4924 4559 6370 TRUE
#> 17 NCBIifiedGenome1 protC 2233 2021 6370 TRUE
#> 18 NCBIifiedGenome1 protD 2024 987 6370 TRUE
#> 19 NCBIifiedGenome1 protE 986 0 6370 TRUE
#> 20 NCBIifiedGenome1 protF 4536 2290 6370 TRUE
In terms of transforming the start and end coordinates, the simplest way would be to just flip the sign of all the start and end coordinates for molecules that have been identified as 'NCBI-ified'.
# Identify NCBIified molecules using Method 1, then flip the sign of all start
# and end coordinates for those molecules only
genomes %>%
nest(gene, start, end) %>%
mutate(genAstart = map_dbl(data, ~ filter(.x, gene == "genA") %>% pull(start))) %>%
mutate(genAend = map_dbl(data, ~ filter(.x, gene == "genA") %>% pull(end))) %>%
mutate(isNCBIified = genAstart > genAend) %>%
select(-genAstart, genAend) %>%
unnest(data) %>%
mutate(start = if_else(isNCBIified, -start, start)) %>%
mutate(end = if_else(isNCBIified, -end, end)) %>%
ggplot(aes(xmin = start, xmax = end, y = molecule, fill = gene)) +
geom_gene_arrow() +
facet_wrap(~ molecule, scales = "free", ncol = 1) +
scale_fill_brewer(palette = "Set3") +
theme_genes()
#> Warning: All elements of `...` must be named.
#> Did you want `data = c(gene, start, end)`?
Created on 2022-03-26 by the reprex package (v2.0.1)
You would probably also want to transpose the coordinates by adding or subtracting a constant, using a similar method.
from gggenes.
Related Issues (20)
- theme_genes() produces error HOT 3
- Can you modify the length of the following genomic region HOT 3
- Terminator point feature HOT 5
- Deprecate `size` in favour of `linewidth`
- Implement SBOL sequence feature glyphs HOT 3
- Break strand in intra-CDS regions HOT 2
- geom_feature_label text size very small when ncol=1 in facet_wrap HOT 1
- How to prepare for gggenes HOT 3
- Gene Orientation not correct HOT 1
- An incomprehensible error HOT 4
- would be nice for protein people HOT 2
- How about an option to draw an interrupted X-axis? HOT 1
- Genome alignment HOT 1
- error: No "geom_terminator" functions HOT 2
- Can I draw two different arrows simultaneously on a gene HOT 8
- how to rescale and align genes of different size? HOT 1
- Gene length issue HOT 1
- How to deal with larger length for the genes HOT 1
- Calculate the direction of data obtained from gff file HOT 4
- How to use scale_colour_manual to specify color for individual genes HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gggenes.