jimwhiting91 / genotype_plot

A set of functions to visualise genotypes based on a VCF

License: Other

genotype_plot's Introduction

Genotype Plot

Visualise and cluster genotypes based on a VCF

This function can be used to subset VCFs for regions and individuals of interest and produce high-quality figures for publications.

The function depends on R being able to access a local installation of bcftools, which is used to handle VCFs outside of R before they are read in and visualised. This route is therefore only available on UNIX systems. On other systems, the function can still be used through the vcf_object functionality (see below).
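Before calling the function with a VCF path, it can be worth confirming that R itself can see bcftools. The following is a minimal sketch using base R's Sys.which(); the helper name has_tool is hypothetical and is not part of GenotypePlot.

```r
# Hypothetical helper (not part of GenotypePlot): returns TRUE if an
# executable called `tool` can be found on the PATH that R sees.
has_tool <- function(tool = "bcftools") {
  unname(nzchar(Sys.which(tool)))
}

if (has_tool("bcftools")) {
  message("bcftools found at: ", Sys.which("bcftools"))
} else {
  message("bcftools not found on R's PATH; the vcf_object route may be the better option")
}
```

Note that an R session started from a GUI does not necessarily inherit the same PATH as a terminal, so this result can differ from what `which bcftools` reports in a shell.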

For a demo of the plotting options, see the shiny app: https://jimw91.shinyapps.io/genotype_plot_demo

Citation

If you use Genotype Plot, please cite the Zenodo DOI using the following (BibTeX):

@software{james_r_whiting_2022_5913504,
  author       = {James R Whiting},
  title        = {JimWhiting91/genotype\_plot: Genotype Plot},
  month        = jan,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {v0.2.1},
  doi          = {10.5281/zenodo.5913504},
  url          = {https://doi.org/10.5281/zenodo.5913504}
}

Installation

install.packages("remotes")
remotes::install_github("JimWhiting91/genotype_plot")

Tutorial

The package is really just a single function that handles everything. A typical call looks like this (NOTE: you can just give the full path to your VCF rather than use system.file):

library(GenotypePlot)

# popmap = two column data frame with column 1 for individual IDs as they appear in the VCF and column 2 for pop labels
our_popmap <- data.frame(ind = c("HG01914", "HG01985", "HG01986", "HG02013", "HG02051", "HG01879", "HG01880"),
                         pop = c(rep("CEU", 5), "XAA", "LL1"),
                         stringsAsFactors = FALSE)

# Make the genotype plot
new_plot <- genotype_plot(vcf    = system.file("example.vcf.gz",            # bgzipped VCF
                                               package = "GenotypePlot"),   
                          chr    = 1,                                       # chr or scaffold ID
                          start  = 11700000,                                # start of region
                          end    = 11800000,                                # end of region
                          popmap = our_popmap,                              # population membership
                          cluster        = FALSE,                           # whether to organise haplotypes by PCA clustering
                          snp_label_size = 10000,                          # breaks for position labels, e.g. plot a position every 10,000 bp
                          colour_scheme=c("#FCD225","#C92D59","#300060"),   # character vector of colour values
                          invariant_filter = TRUE)                         # Filter any invariant sites before plotting

This can be minimally assembled into a plot for output.

combine_genotype_plot(new_plot)

This function uses cowplot to combine all the elements from the genotype plot (genotypes and positions, or genotypes, positions, and dendrogram if cluster=TRUE). The relative widths and heights flags can be used to edit the width of the dendrogram and the height of the SNP positions, respectively.

Depending on the size of the region you’re looking at, these can take a long time to plot within R, so I tend to plot them directly to a PDF, e.g.

pdf("your_genotype_plot.pdf",width=10,height=8)
combine_genotype_plot(new_plot)
dev.off()

Genotypes are plotted according to the popmap order, so genotypes are visualised within populations, labelled according to unique(popmap[,2]), and are plotted in the order they appear in popmap[,1], unless cluster=TRUE (see below).

If cluster=TRUE, individuals are clustered according to a PCA (dudi.pca()) followed by hclust. This is not designed to be an explicit test of phylogeny, but can be useful to quickly visualise haplotype relationships. If cluster=TRUE, haplotypes are no longer labelled, because the ordering is defined by the clustering rather than by the user. New labels in clustered order are returned in the output as dendro_labels (see Outputs). The clustering PCA output is also returned as cluster_pca if needed, for example to plot PC scores (cluster_pca$li).
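The general idea of clustering on PCA scores can be sketched in base R. This is only an illustration under stated assumptions: it uses prcomp() in place of dudi.pca(), a made-up genotype matrix coded as alternate-allele counts, and the first two PCs; the package's own choices may differ.

```r
# Toy genotype matrix (illustrative data only):
# rows = individuals, columns = SNPs, values = alternate allele count (0, 1 or 2).
geno <- rbind(
  ind1 = c(0, 0, 0, 1, 0, 0),
  ind2 = c(0, 0, 1, 0, 0, 0),
  ind3 = c(2, 2, 2, 1, 2, 2),
  ind4 = c(2, 2, 1, 2, 2, 2),
  ind5 = c(1, 1, 1, 1, 1, 1)
)

# PCA on the centred genotype matrix (prcomp is a stand-in for dudi.pca here)
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# Euclidean distances between individuals on the first two PCs,
# then hierarchical clustering of those distances
pc_dist <- dist(pca$x[, 1:2])
tree <- hclust(pc_dist)

# Individual labels in dendrogram order (analogous in spirit to dendro_labels)
clustered_order <- tree$labels[tree$order]

# Cut the tree into two groups for a quick look at cluster membership
groups <- cutree(tree, k = 2)
```

Plotting pca$x[, 1] against pca$x[, 2] gives the same kind of view as plotting the PC scores in cluster_pca$li.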

The script first uses bcftools to subset the VCF based on the path given and co-ordinates. This is simply so we’re not reading a VCF larger than needs be into R.

# Subset our VCF
vcf_tempfile <- tempfile(pattern = "gt_plot", fileext = '.vcf')
system(glue("bcftools view -r {chr}:{start}-{end} {vcf} > {vcf_tempfile}"), wait=TRUE)

This command therefore writes a new file to the session temporary directory, which is read in and then removed from the system using unlink. The location of this directory is changed by modifying the environment variable TMPDIR, usually in ~/.Renviron.
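To see where these temporary files will land in the current session, something like the following base R sketch can be used. It only inspects the session; the variable name vcf_tempfile mirrors the snippet above.

```r
# Per-session temporary directory that tempfile() writes into by default
session_tmp <- tempdir()

# Value of TMPDIR as seen by R ("" if it is not set)
tmpdir_env <- Sys.getenv("TMPDIR")

# A temporary file name built the same way as in the snippet above;
# tempfile() only generates the name, it does not create the file
vcf_tempfile <- tempfile(pattern = "gt_plot", fileext = ".vcf")

message("Session temp dir: ", session_tmp)
message("TMPDIR: ", if (nzchar(tmpdir_env)) tmpdir_env else "<unset>")
message("Example temp VCF: ", vcf_tempfile)
```

R reads TMPDIR when the session starts, so a change made in ~/.Renviron should only take effect after restarting R.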

After this, all plots are generated.

No bcftools?

No problem. It is possible to use the package on vcfR objects directly by reading any pre-made VCF into R with my_vcf <- read.vcfR("path/to/vcf") and then passing this object to genotype_plot() with vcf_object = my_vcf. If doing this, the chr, start, and end variables are set automatically using the chromosome ID and the min and max BP positions of the vcf object. For example:

new_plot <- genotype_plot(vcf_object  =  my_vcf,
                          popmap = our_popmap,                              
                          cluster        = FALSE,                           
                          snp_label_size = 10000,                          
                          colour_scheme=c("#d4b9da","#e7298a","#980043"))

As for plotting VCFs generally, only vcfR objects with one chromosome or scaffold ID are permitted.

This functionality also makes it simpler to plot VCFs that are already being manipulated/analysed in R with vcfR.

For more info on vcfR, see here: https://knausb.github.io/vcfR_documentation/

Inputs

The function handles the VCF outside of R using calls to system(), so the input in terms of the VCF and region of interest are just character strings, as described above.

Preparing the VCF

The VCF needs to be sorted, bgzipped and indexed prior to use. Sorting can be done with either bcftools sort or vcftools’ vcf-sort (if your VCF was produced by Stacks, you’ll need to use vcf-sort). To zip and index, the following should work:

bgzip -c your_data.vcf > your_data.vcf.gz
tabix -f -p vcf your_data.vcf.gz
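Before handing the path to genotype_plot(), a quick pre-flight check in R can catch the most common path problems. The sketch below is a hypothetical helper, not part of GenotypePlot, and the checks are assumptions about what is worth verifying: that the input is a plain character path, that the file exists, that its name ends in .gz, and that a .tbi or .csi index sits next to it.

```r
# Hypothetical pre-flight check (not part of GenotypePlot).
# Returns a character vector of problems; an empty vector means none were found.
check_vcf_path <- function(vcf) {
  if (!is.character(vcf) || length(vcf) != 1) {
    return("vcf should be a single character string (a path), not a connection or other object")
  }
  if (!nzchar(vcf)) {
    # system.file() returns "" when the file is not found inside the named package
    return("path is empty (system.file() returns \"\" when it cannot find the file)")
  }
  problems <- character(0)
  if (!file.exists(vcf)) {
    problems <- c(problems, "file does not exist")
  }
  if (!grepl("\\.gz$", vcf)) {
    problems <- c(problems, "file name does not end in .gz")
  }
  if (!file.exists(paste0(vcf, ".tbi")) && !file.exists(paste0(vcf, ".csi"))) {
    problems <- c(problems, "no .tbi or .csi index found next to the file")
  }
  problems
}

# Example: report any problems for a path of your own
print(check_vcf_path("your_data.vcf.gz"))
```

system.file() only looks inside installed packages, so for your own data the full path should be given as a character string instead.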

Multi-allelic VCFs are coerced to bi-allelic VCFs with bcftools, such that multi-allelic variants are recoded as bi-allelic variants at the same position. This is done based on the following:

bcftools norm -m - your_multi.vcf > your_bi.vcf

Popmap

The popmap should be a data.frame object with two columns: first column = individual IDs as they appear in the VCF, and second column = population label. The values from column 2 are used as labels in the final plot. Column names are irrelevant, but the order must be column 1 for inds and column 2 for pop. This can either be made within R or read in from a file with read.table().

An example popmap should look like this:

> head(popmap)
     ind pop
1 LT_F16  TACHP
2 LT_F18  TACHP
3 LT_F19  TACHP
4 LT_F23  TACHP
5 LT_F24  TACHP
6  LT_M1  TACHP
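A popmap stored as a two-column, tab-delimited file can be read in roughly as follows. This is a sketch under assumptions: the file here is a temporary one written just for the example, there is no header line, and the column names ind and pop are arbitrary.

```r
# Write a small example popmap to a temporary file (illustrative data only)
popmap_file <- tempfile(fileext = ".txt")
writeLines(c("LT_F16\tTACHP",
             "LT_F18\tTACHP",
             "LT_M1\tTACHP"),
           popmap_file)

# Read it back as a two-column data.frame.
# colClasses = "character" keeps IDs as strings even if they look like numbers.
popmap <- read.table(popmap_file,
                     header = FALSE,
                     sep = "\t",
                     col.names = c("ind", "pop"),
                     colClasses = "character",
                     stringsAsFactors = FALSE)

head(popmap)
```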

The VCF is also filtered using the popmap, so you can read in a VCF with many samples but only plot individuals in the popmap. When this is the case, the VCF is also filtered for invariant sites between the remaining individuals with a call to vcfR::is.polymorphic().
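It can help to check the popmap against the VCF sample names yourself before plotting. The sketch below uses a hypothetical vector of sample names, vcf_samples; with a vcfR object these could be taken from colnames(my_vcf@gt)[-1], since the first column of the gt slot is FORMAT. Coercing the popmap to a plain data.frame first is a precaution: on a tibble, popmap[,1] returns a one-column tibble rather than a vector. Whether that matters for genotype_plot() is an assumption here.

```r
# Hypothetical sample names; with vcfR: vcf_samples <- colnames(my_vcf@gt)[-1]
vcf_samples <- c("LT_F16", "LT_F18", "LT_F19", "LT_F23")

# Example popmap (illustrative data only)
popmap <- data.frame(ind = c("LT_F16", "LT_F18", "LT_M1"),
                     pop = c("TACHP", "TACHP", "TACHP"),
                     stringsAsFactors = FALSE)

# Coerce to a plain data.frame in case the popmap is a tibble or similar
popmap <- as.data.frame(popmap, stringsAsFactors = FALSE)

# [[1]] always returns a vector, whatever the class of the popmap
missing_inds <- setdiff(as.character(popmap[[1]]), vcf_samples)

if (length(missing_inds) > 0) {
  message("Individuals in popmap but not in VCF: ",
          paste(missing_inds, collapse = ", "))
}
```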

If you want to produce a figure with a row per individual, rather than a row with only a population label (the default), you can give each individual a unique value in the pop column. For example, to edit the popmap above we can just do popmap[,2] <- popmap[,1], which produces a popmap that looks like this:

> popmap[,2] <- popmap[,1]
> head(popmap)
      ind pop
1 LT_F16 LT_F16
2 LT_F18 LT_F18
3 LT_F19 LT_F19
4 LT_F23 LT_F23
5 LT_F24 LT_F24
6  LT_M1  LT_M1

Alternative Plotting options

The easiest way to explore plotting options is to use the demo shiny app at https://jimw91.shinyapps.io/genotype_plot_demo

Polarisation

Typically, genotypes in a VCF are polarised relative to the reference genome. However, in some cases it can be desirable to re-polarise genotypes to the major allele in a given population. This is done by passing the name of the focal population as a character string to polarise_genotypes, e.g. polarise_genotypes="pop1".
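What re-polarising means can be illustrated with a small base R sketch. It assumes bi-allelic genotypes coded as alternate-allele counts (0, 1, 2) and made-up data; the function polarise_to_major is hypothetical and is not the package's implementation. Wherever the alternate allele is the major allele in the focal population, the coding is flipped for all individuals.

```r
# Toy genotypes (illustrative data only):
# rows = individuals, columns = SNPs, values = alternate allele count.
geno <- rbind(
  a1 = c(2, 0, 1),
  a2 = c(2, 0, 2),
  b1 = c(0, 0, 2),
  b2 = c(1, 2, 2)
)
pop <- c(a1 = "pop1", a2 = "pop1", b1 = "pop2", b2 = "pop2")

# Hypothetical helper: recode so that 0 = homozygous for the focal population's major allele
polarise_to_major <- function(geno, pop, focal) {
  # Alternate allele frequency per SNP within the focal population
  alt_freq <- colMeans(geno[pop == focal, , drop = FALSE], na.rm = TRUE) / 2
  # Flip SNPs where the alternate allele is the major allele in the focal population
  flip <- !is.na(alt_freq) & alt_freq > 0.5
  geno[, flip] <- 2 - geno[, flip]
  geno
}

polarised <- polarise_to_major(geno, pop, focal = "pop1")
```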

Population Allele Frequencies

Per-population allele frequencies can be visualised by setting plot_allele_frequency=TRUE. These are based on the frequency of alleles within the populations defined in the popmap, giving a value between 0 (absent) and 1 (fixed). This flag is compatible with polarisation, so the frequency of the major allele within a focal population can be visualised. This parameter is not compatible with clustering, as the returned genotypes object only includes one row per population, rather than one row per individual.
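As a worked illustration of what a per-population allele frequency is, the base R sketch below computes the alternate-allele frequency per SNP and population from genotype strings. The data are made up, the helper alt_count is hypothetical, bi-allelic sites are assumed, and this is not the package's own code.

```r
# Toy genotype strings (illustrative data only): rows = SNPs, columns = individuals
gt <- rbind(
  snp1 = c(ind1 = "0/0", ind2 = "0/1", ind3 = "1/1", ind4 = "1/1"),
  snp2 = c(ind1 = "0/1", ind2 = "./.", ind3 = "0/0", ind4 = "0/0")
)
pop <- c(ind1 = "popA", ind2 = "popA", ind3 = "popB", ind4 = "popB")

# Hypothetical helper: alternate allele count per genotype (NA if any allele is missing)
alt_count <- function(g) {
  alleles <- strsplit(as.vector(g), "[/|]")
  vapply(alleles,
         function(a) if (any(a == ".")) NA_real_ else sum(as.numeric(a)),
         numeric(1))
}

# Same shape as gt, but holding 0/1/2 dosages
dosage <- matrix(alt_count(gt), nrow = nrow(gt), dimnames = dimnames(gt))

# Frequency = mean dosage / 2 within each population, ignoring missing genotypes
inds_by_pop <- split(colnames(gt), pop[colnames(gt)])
allele_freq <- sapply(inds_by_pop, function(inds) {
  rowMeans(dosage[, inds, drop = FALSE], na.rm = TRUE) / 2
})

print(allele_freq)
```

The result has one row per SNP and one column per population, which is why a frequency plot has one row per population rather than one per individual.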

Phased haplotypes

If your data is phased, it can be desirable to visualise/cluster haplotypes as opposed to genotypes. This can be done by setting plot_phased=TRUE, but is contingent on genotypes being coded in phased format (ie. 0|0, not 0/0). Phased genotypes are separated into two haplotypes per individual, and are plotted in the first and last colour given to colour_scheme. plot_phased is not compatible with plot_allele_frequency.
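Separating phased genotypes into haplotypes can be sketched in base R as follows. The data are made up and the `_1`/`_2` naming of haplotypes is an assumption for illustration, not necessarily the labelling the package uses.

```r
# Toy phased genotypes at one SNP (illustrative data only)
gt <- c(ind1 = "0|1", ind2 = "1|1", ind3 = "0|0")

# Phased genotypes use "|" as the separator; unphased ones use "/"
is_phased <- all(grepl("|", gt, fixed = TRUE))

# Split each genotype into its two alleles: one row per individual, one column per haplotype
hap_matrix <- do.call(rbind, strsplit(gt, "|", fixed = TRUE))

# Flatten to one value per haplotype, keeping each individual's two haplotypes adjacent
haplotypes <- as.integer(t(hap_matrix))
names(haplotypes) <- paste0(rep(names(gt), each = 2), c("_1", "_2"))

print(haplotypes)
```

Because each haplotype carries a single allele (0 or 1), only two colours are needed, which fits with the first and last values of colour_scheme being used.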

Outputs

The function returns a list where elements correspond to different parts of the plots. As a standard, all plots return a positions and genotypes element which correspond to the main genotype figure (genotypes) and the genome position labels (positions).

If cluster=TRUE, the function also returns dendrogram and dendro_labels, which correspond to a dendrogram of the clustered haplotypes and the tip labels, respectively. The PCA output cluster_pca used for clustering is also returned.

Each element is a ggplot object that can be modified as an individual object in order for the user to modify any aspect of the plot as they wish.

Manipulating the dendrogram

The dendrogram is outputted with minimal formatting by default, but it may be desirable to format this in such a way as to highlight populations or individuals etc. The dendrogram object is just a ggplot object made with the ggdendro package, so can be edited however you wish (for examples, see http://www.sthda.com/english/wiki/beautiful-dendrogram-visualizations-in-r-5-must-known-methods-unsupervised-machine-learning#ggdendro-package-ggplot2-and-dendrogram).

For example, if we want to add back in the tip labels, we can plot as such:

# Add dendrogram tips (geom_text() and aes() come from ggplot2, so it must be loaded)
library(ggplot2)
dendro_with_tips <- new_plot$dendrogram +
                    geom_text(aes(x=seq_along(new_plot$dendro_labels),
                    y=-2.5,
                    label=new_plot$dendro_labels))

Note here that x and y are inverted because the dendrogram has been rotated 90 degrees. So here we are simply adding the new_plot$dendro_labels back in at inverted x positions of 1 to however many tips we have, and plotting them at an inverted y position of -2.5 so they don’t overlap with the plot. In the example figure below, where individuals are represented by points, this is done by using the new_plot$dendro_labels to build a metadata data.frame in which individuals have a Predation and River label that is then added in a similar way to the above but with geom_point().

Another way to add tip labels to the dendrogram is to run genotype_plot twice. The first time, set cluster=TRUE and use output1$dendro_labels to build a new popmap, e.g. popmap2 <- data.frame(ind=output1$dendro_labels,pop=output1$dendro_labels). Then run again with cluster=FALSE and the cluster-ordered popmap2, in which individuals are labelled in the pop column as individuals. You can then plot output1$dendrogram with output2$genotypes.

Like all outputs from genotype_plot(), you can simply overwrite the dendrogram slot with the new edited version and combine the plots as normal.

Change Log

#### v0.2.1
  * is.haploid flag for handling VCFs with haploid genotypes
  * Bug fix for cases where VCF IDs are numbers rather than characters
  
#### v0.2
  * Includes plotting options: phasing, allele frequencies, polarisation
  * Handling for multi-allelic VCFs
  * Missing data and invariant filtering
  * Update to clustering method based on PCA of genotype matrix
  * Updated documentation and shiny app demo
  
#### v0.1
  * Individual filtering/reordering performed with bcftools, resolves issues with filtering vcf with vcfR.
  * Popmap is checked against the VCF prior to reading in, and errors are caught.
  * `vcf_object` flag added, can be used to read vcfR objects already in R and allows the package to be used on Windows
  * Error messages and general reporting fixes
  * `invariant_filter` flag added, so invariant filtering now optional but still default.
  * SNP marker labels improved so are added at regular intervals

#### v0.0.9
Original R package

genotype_plot's Issues

ERROR The following inds are not in vcfR object

Hey Jim,

I'm trying to use your repo to plot some haplotypes but the popmap validation is failing and I can't seem to figure out why. It happens whether I'm using a vcf_object (i.e., though vcfR) or vcf (i.e., using bcftools) as input.

For reference, here are the rownames in the transposed genotype matrix extracted from the vcf_object:

> rownames(test)
  [1] "s_40_1"   "s_40_3"   "s_40_6"   "s_40_7"   "s_40_8"   "s_40_10"  "s_40_12"  "s_40_17"  "s_40_19"  "s_41_1"   "s_41_2"   "s_41_7"   "s_41_8"  
 [14] "s_41_12"  "s_41_13"  "s_41_14"  "s_41_16"  "s_41_18"  "s_42_5"   "s_42_9"   "s_42_10"  "s_42_11"  "s_42_13"  "s_42_17"  "s_42_20"  "s_43_4"  
 [27] "s_43_5"   "s_43_6"   "s_43_8"   "s_43_10"  "s_43_12"  "s_43_13"  "s_43_14"  "s_43_15"  "s_1_9"    "s_2_3"    "s_3_5"    "s_4_18"   "s_5_16"  
 [40] "s_97_3"   "s_97_6"   "s_97_7"   "s_97_10"  "s_97_11"  "s_97_13"  "s_97_14"  "s_98_1"   "s_99_5"   "s_100_13" "s_101_17" "s_6_8"    "s_7_4"   
 [53] "s_7_6"    "s_7_7"    "s_7_11"   "s_7_13"   "s_7_16"   "s_7_19"   "s_7_20"   "s_21_18"  "s_22_15"  "s_23_4"   "s_23_6"   "s_23_7"   "s_23_9"  
 [66] "s_23_15"  "s_24_6"   "s_37_6"   "s_37_8"   "s_37_10"  "s_37_12"  "s_37_13"  "s_37_15"  "s_37_16"  "s_52_3"   "s_53_11"  "s_54_7"   "s_54_8"  
 [79] "s_54_15"  "s_54_16"  "s_54_18"  "s_54_20"  "s_56_1"   "s_77_18"  "s_78_4"   "s_79_17"  "s_80_18"  "s_81_3"   "s_82_19"  "s_83_5"   "s_83_9"  
 [92] "s_83_10"  "s_83_13"  "s_83_14"  "s_83_15"  "s_83_17"  "s_83_18"  "s_95_15"  "s_96_5"   "s_115_6"  "s_116_1"  "s_116_3"  "s_116_7"  "s_116_12"
[105] "s_116_15" "s_116_17" "s_116_18" "s_117_6"  "s_119_19"

And here is the start of the popmap file:

> head(my_popmap)
# A tibble: 6 × 2
  ind     pop  
  <chr>   <chr>
1 s_40_1  Urban
2 s_40_3  Urban
3 s_40_6  Urban
4 s_40_7  Urban
5 s_40_8  Urban
6 s_40_10 Urban

You can see that those first six samples match the first six samples in the rownames of the genotype matrix. Nonetheless, I get the following error when trying to generate a plot:

> new_plot <- genotype_plot(vcf_object = vcf, popmap = my_popmap, snp_label_size = 50000)
Removing 0 SNPs with > 50% missing data
Plotting SNP label markers
Error in genotype_plot(vcf_object = vcf, popmap = my_popmap, snp_label_size = 50000) : 
  ERROR The following inds are not in vcfR object: c("s_40_1", "s_40_3", "s_40_6", "s_40_7", "s_40_8", "s_40_10", "s_40_12", "s_40_17", "s_40_19", "s_41_1", "s_41_2", "s_41_7", "s_41_8", "s_41_12", "s_41_13", "s_41_14", "s_41_16", "s_41_18", "s_42_5", "s_42_9", "s_42_10", "s_42_11", "s_42_13", "s_42_17", "s_42_20", "s_43_4", "s_43_5", "s_43_6", "s_43_8", "s_43_10", "s_43_12", "s_43_13", "s_43_14", "s_43_15", "s_37_6", "s_37_8", "s_37_10", "s_37_12", "s_37_13", "s_37_15", "s_37_16", "s_1_9", "s_2_3", "s_3_5", "s_4_18", "s_5_16", "s_97_3", "s_97_6", 
"s_97_7", "s_97_10", "s_97_11", "s_97_13", "s_97_14", "s_98_1", "s_99_5", "s_100_13", "s_101_17", "s_6_8", "s_7_4", "s_7_6", "s_7_7", "s_7_11", "s_7_13", "s_7_16", "s_7_19", "s_7_20", "s_77_18", "s_78_4", "s_79_17", "s_80_18", "s_81_3", "s_82_19", "s_83_5", "s_83_9", "s_83_10", "s_83_13", "s_83_14", "s_83_15", "s_83_17", "s_83_18", "s_95_15", "s_96_5")

Any idea what might be going on here?

Thanks in advance for your help!

James

Feature Request: multiple chromosomes

It would be great to be able to look at data from multiple chromosome simultaneously. This would be especially great for RRL applications where de novo assemblies are not very contiguous.

bcftools command not found

It is a wonderful tool and thanks for posting this. However, when I run the example code, it shows an error message below,

sh: bcftools: command not found
Error in if (substr(vcf, start = 1, stop = 17) != "##fileformat=VCFv") { :

I then install the bcftools on my Mac OS but still see the error message, any ideas?

Thanks.
HG

Problem with phased genotypes

Hi,

In line 139 you separate genotypes into alleles with the function "separate" and use the "/" as separator.
This will not work for phased genotypes where the 2 alleles are separated with "|".

May be a solution is to not use a sep at all, this will use the default value which is a regular expression that matches any sequence of non-alphanumeric values ?

Regards,

Khalid BELKHIR

Genotype labels

Thank you for the nice tool. I'm wondering if there is a possibility to adjust the size of the labels and also to add those labels to the dendrogram:

VCF file not bgzipp

Hi,

I am having a small issue, I have bgzipped and index my vcf file using the following code:
bgzip -c my_data.vcf > my_data.vcf.gz
tabix -f -p vcf my_data.vcf.gz

However, whenever I try to use the file in genotype_plot, I received the error "vcf needs to be bgzipped"
Here is what I enter into R:
new_plot <- genotype_plot(vcf = file("ECA13_260_new.vcf.gz"), chr = 1, start = 11700000, end = 11800000, popmap = popmap, cluster = FALSE, snp_label_size = 10000, colour_scheme = c("#d4b9da","#e7298a","#980043"))

I do have bgzip properly installed as I've used this method before, so I'm wondering if I am putting the wrong information into the R code above. I made the popmap using data.frame().

Thank you for any help,
Caitlin

Plotting Error

Hi,

I'm trying to view the genotype plot, however I always get this error: Error: Insufficient values in manual scale. 13 needed but only 3 provided. If I run cluster = TRUE, then I can view the dendrograms perfectly, although I still get the same error when trying to view the genotype plot. I'm sure it's a small thing that I'm not typing in correctly or changing in your provided script:
geno_and_labels <- cowplot::plot_grid(new_plot$positions,

                                  new_plot$genotypes,

                                  axis="tblr",align="v",nrow=9,ncol=2,rel_heights=c(1,9))

I have 9 individuals, and the new_plot generated by GenotypePlot has 4 elements, so I assume that is why the error says that 13 is needed. Could you please help? Below is a more details on the error message I received. I do have ggplot2 installed/activated.

Error: Insufficient values in manual scale. 13 needed but only 3 provided.
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/rlang_error>
Insufficient values in manual scale. 13 needed but only 3 provided.
Backtrace:

(function (x, ...) ...
ggplot2:::print.ggplot(x)
ggplot2:::ggplot_build.ggplot(x)
base::lapply(data, scales_map_df, scales = npscales)
ggplot2:::FUN(X[[i]], ...)
base::lapply(scales$scales, function(scale) scale$map_df(df = df))
ggplot2:::FUN(X[[i]], ...)
scale$map_df(df = df)
ggplot2:::f(..., self = self)
base::lapply(aesthetics, function(j) self$map(df[[j]]))
ggplot2:::FUN(X[[i]], ...)
self$map(df[[j]])
ggplot2:::f(..., self = self)
self$palette(n)
ggplot2:::f(...)
Run rlang::last_trace() to see the full context.

Fail to open vcf.gz: could not load index

Hi Jim,

First of all, thank you very much for making this nice function!

I am trying to apply it to my project right now but there is one major error that I can not fix, although which may not directly be related to the function (I do googled hard but still no idea).

My dataset is generated by ddRAD and the vcf file was made by the populations program in STACKS. The samples were all reference mapped but the genome is under scaffold stage.

I first bgzip my vcf and then as it asked for index, I did one using bgzip -i to make a vcf.gz.i. However the function didn't take it so I tried again with bcftools index, then the error is
Chromosome blocks not continuous
index: failed to create index

So I am wondering what I should do now, any suggestions will be highly appreciated!

P.S. I really hope that it can also applied to reduced genome sequencing like my case!

Best regards,
Han Xiao

shiny app interface code

Hi @JimWhiting91,
really liking this code base. Would you consider making the shiny app interface available as well? It would be really helpful for data exploration during collaboration.
thanks,
@stsmall

Clustering method

Hi Jim,

The clustering method used here seems to be based on the hclust procedure applied to a matrix distance.
This distance matrix is calculated from an object returned by the vcfR function extract.gt.
This function when called with as.numeric = TRUE seems to just return the first allele of each genotype :
data(vcfR_test)
extract.gt(vcfR_test)
NA00001 NA00002 NA00003
rs6054257 "0|0" "1|0" "1/1"
20_17330 "0|0" "0|1" "0/0"
rs6040355 "1|2" "2|1" "2/2"
20_1230237 "0|0" "0|0" "0/0"
microsat1 "0/1" "0/2" "1/1"

extract.gt(vcfR_test, as.numeric=T)
NA00001 NA00002 NA00003
rs6054257 0 1 1
20_17330 0 0 0
rs6040355 1 2 2
20_1230237 0 0 0
microsat1 0 0 1

I'm not sure how this can reflect distance between the underlying pairwise genotypes ?

Regards,

Khalid BELKHIR

genotype_plot can't deal with haploid vcf files

Hi! I am trying to run genotype_plot from a vcfR object with haploid individuals. The vcf has only biallelic SNPs without missing data of a single chromosome for 112 individuals. If I run the plotting as instructed I get:

> new_plot <- genotype_plot(vcf_object  =  my_vcf,
                          popmap = our_popmap,                              
                          cluster        = FALSE,                           
                          snp_label_size = 10000,                          
                          colour_scheme=c("#d4b9da","#e7298a","#980043"))
Removing 0 SNPs with > 50% missing data
83 invariants have been pruned
Plotting SNP label markers
Converting genotypes for plotting...
Expected 2 pieces. Missing pieces filled with `NA` in 3016384 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Plotting genotypes without clustering

The function seems to try to make the general structure of the plot, with the chromosome, the lines of the SNPs and the two populations panels, but no variants are plotted (since they were all set as NA, it seems).

Thanks in advance!

Plot Appearance

Hello,
Thank you for developing this wonderful tool! I have two questions regarding the plots I have been generating:

A) Regardless of changing width and height, the y-axis labels are not lining up with the appropriate row. In the attached pdf you can see that the first two labels are collapsed and there are rows where there is no label present. How can I fix this?

B) When I have visualized this same vcf in IGV I can see data for all 94 samples. However when I plot the same file here in GenotypePlot there is a great deal of missing individuals. I know that there are filters applied on a variant-site basis, but are there filters being applied on the individual level (removing full individuals from what is plotted)?

Thanks!
EX1_3000000.pdf

[error] VCF needs to be bgzipped, or path mis-specified

Hi. I am trying to run GenotypePlot after preparing the input files following the recommendation.

bcftools norm -m - ssa21.vcf > ssa21bi.vcf
bgzip -c ssa21bi.vcf > ssa21bi.vcf.gz
tabix -f -p vcf ssa21bi.vcf.gz

# Make the genotype plot
new_plot <- genotype_plot(vcf    = system.file("ssa21bi.vcf.gz",            # bgzipped VCF
                                               package = "GenotypePlot"),   
                          chr    = ssa21,                                       # chr or scaffold ID
                          start  = 18798698,                                # start of region
                          end    = 18808698,                                # end = end of region
                          popmap = our_popmap,                              # population membership
                          cluster        = FALSE,                           # whether to organise haplotypes by PCA clustering
                          snp_label_size = 10000,                          # breaks for position labels, eg. plot a position every 100,000 bp
                          colour_scheme=c("#FCD225","#C92D59","#300060"),   # character vector of colour values
                          invariant_filter = TRUE)                         # Filter any invariant sites before plotting

But I got the following error, though I have the bgzipped file and index file.

Error in genotype_plot(vcf = system.file("ssa21bi.vcf.gz", package = "GenotypePlot"),  : 
  VCF needs to be bgzipped, or path mis-specified

I appreciate your suggestions to solve it. Thank you very much.

Issue adding dendogram tip labels: "could not find function "geom_text" "

Hi @JimWhiting91!

Love your tool, has been really helpful for visualising fixed SNPs in my data!
I'm having an issue trying to plot the dendogram tip labels back on the clustered plot.
Have been following your suggested code on your page, but for some reason ggplot2 can't find the function geom_text.

  dendro_with_tips <- new_plot$dendrogram +
+                     geom_text(aes(x=1:length(new_plot$dendro_labels),
+                     y=-2.5,
+                     label=new_plot$dendro_labels))
Error in geom_text(aes(x = 1:length(new_plot$dendro_labels), y = -2.5,  :
  could not find function "geom_text"

Any suggestions on how to fix this?

Thanks!