A tree can be considered as pangenome metadata. Provide a method to add a <code class=

Should contain: <div class="highlight highlight-source-r notranslate position-rela

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Thank you very much for the great response <a class="user-mention notranslate" data-ho

Great! Yes, the ggtree::gheatmap function uses <code

Add `phylo` object as metadata about pagoo HOT 5 OPEN

iferres commented on May 24, 2024

Add `phylo` object as metadata

from pagoo.

Comments (5)

iferres commented on May 24, 2024

Should contain:

p$phylo # or p$tree ? A phylo (ape) object
p$gg_tree() # A ggtree object

from pagoo.

shigdon commented on May 24, 2024

I am really enjoying my time exploring how to use this R package and like the concept of your suggested enhancement. Would this be something that would provide functionality to say, order the rows of $gg_binmap based on a tree? Thanks for your contributions to this space - so helpful!

from pagoo.

iferres commented on May 24, 2024

Hi @shigdon , I'm glad you found pagoo useful! Yes, that's exactly one of the applications I see of incorporating the phylo object to the pagoo classes. To add this field should be straight-forward, and I haven't done it because we have focused on getting it published (now finishing revisions, so we hope to announce the publication soon). The thing is if I should let this field alone, or to also make other methods "aware" as I said in the issue description. This last thing is much more difficult if I'd want to provide a stable and robust method, and I'm not sure to implement it because there's no obvious or natural way to do it, and will probably look very arbitrary. Still thinking, tho.

For now you can use the gheatmap function (ggtree package) to obtain a tree attached to a heatmap. Here I copy-paste an example I had, with some notes in case you want to try and adapt the recipe:

# Extract only the shell clusters from the panmatrix
pm <- p$pan_matrix[,  as.character(p$shell_clusters$cluster) ] 
pm[which(pm >= 1, arr.ind = TRUE)] <- 1L # If you just want to show presence/absence, not abundance

# The following is to order the columns to make the hearmap "pretty"
csm <- colSums(pm)
spl <- split(names(csm), csm)
tpm <- t(pm)
norder <- lapply(spl, function(x) {
  if (length(x)<2){
    x
  }else{
    d <- vegan::vegdist(tpm[x, , drop=F], method = "jaccard", binary = T, na.rm = T)
    hc <- hclust(d, "single")
    x[hc$order]
  }
})
norder <- norder[order(as.integer(names(norder)), decreasing = T)]
forder <- unlist(norder)
pm <- pm[, forder,  drop = F]

# Now transform matrix from integer to character
pm[which(pm == 1 , arr.ind = TRUE)] <- "Present"
pm[which(pm == 0, arr.ind = TRUE)] <- "Absent"

library(phangorn)
library(ggtree)
library(magrittr)

# Suppose you already have a `phylo` object: 
tree <- midpoint(phylo) %>%         # midpoint root
  ggtree() %<+%                     # Create ggtree
  as.data.frame(p$organisms)        # Attach organism metadata

tree + geom_tippoint(aes(color = Host)) %>%
   gheatmap(pm, colnames = F, color = NA) +
      scale_fill_manual(breaks=c("Present", "Absent"), 
                        values=c("darkblue", "white"), 
                        name="genotype")

This method works if the selected clusters (columns) are not too many. I usually avoid showing also the core clusters as they would appear as a big square, but under the hood there are potentially hundreds or thousands of small tiles which slow down the rendering.

Let me know if this helps you.
Bests!

from pagoo.

shigdon commented on May 24, 2024

Thank you very much for the great response @iferres! The approach worked very well. The only change I had to make was changing pm[which(pm == 1 , arr.ind = TRUE)] <- "Present" to pm[which(pm >= 1 , arr.ind = TRUE)] <- "Present" because I read in my p/a csv file from roary and some of the values were integers > 1. The pan genome I was working with had ~ 10,000 gene clusters total of which ~ 1000 made up the core. Just a detail in response to your comment that the method works if the selected clusters are not too many. This recipe worked great for me and hopefully other people will navigate here if they want to do something similar. Cheers!

from pagoo.

iferres commented on May 24, 2024

Great! Yes, the ggtree::gheatmap function uses ggplot2's geom_tile() under the hood. It would probably be more efficient to use geom_raster() instead, which is a high performance version of geom_tile(), and would suit better these cases. In their defence, it wasn't implemented to render these type of huge arrays we are asking for :P
Bests!

from pagoo.

Add `phylo` object as metadata about pagoo HOT 5 OPEN

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent