The ggoncoplot from selkamand

Update combine_plots to respect gg_tmb_height and friends

gg_tmb_height and gg_gene_width are currently ignored by combine_plots

Similarly, there is no gg_metadata_height option yet.

We need to

add gg_metadata_height as a user-configurable parameter
rework combine_plots to respect gg_tmb_height, gg_gene_width and gg_metadata_height

Shorten error message when lots of samples with metadata have no mutations

[X] samples with metadata have no mutations. Fitering these out
ℹ To keep these samples, set metadata_require_mutations = FALSE. To view them in the oncoplot ensure you additionally set show_all_samples = TRUE

Then lists all samples - even if there are hundreds.

If there are more than 10 samples missing, just print the number
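
A minimal sketch of the proposed behaviour in base R. The helper name `format_dropped_samples()` and its `max_listed` argument are hypothetical and not part of ggoncoplot; the real fix would presumably live wherever the message is currently emitted.

```r
# Hypothetical helper: build the message about samples that have metadata
# but no mutations. Sample IDs are listed only when there are at most
# `max_listed` of them; otherwise only the count is reported.
format_dropped_samples <- function(samples, max_listed = 10) {
  n <- length(samples)
  msg <- sprintf("%d samples with metadata have no mutations. Filtering these out", n)
  if (n > 0 && n <= max_listed) {
    msg <- paste0(msg, ": ", paste(samples, collapse = ", "))
  }
  msg
}

format_dropped_samples(c("S1", "S2"))       # short list: IDs are included
format_dropped_samples(paste0("S", 1:200))  # long list: count only
```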

Remove commented code from minimal usage example in usage vignette

gbm_df |>
  ggoncoplot(
    col_genes = 'Hugo_Symbol',
    col_samples = 'Tumor_Sample_Barcode',
    #col_mutation_type = 'Variant_Classification',
    # topn = 10,
    # interactive = TRUE
  )

to

gbm_df |>
  ggoncoplot(
    col_genes = 'Hugo_Symbol',
    col_samples = 'Tumor_Sample_Barcode'
  )

Create Gene Barplot Functionality

To the right side of an oncoplot, we should optionally plot a barplot showing # of samples with gene mutated (fill colour based on mutation type)

Add option to show samples with no mutations in genes

Rendering of Static version of oncoplot shows lines collapsing into one another

Add option to oncoplot specific genes, but still sort based on mutational status

Add option to automatically filter out any samples with metadata but ZERO mutations, even if `show_all_samples = TRUE`

Add option to avoid collapsing mutation_type into 'multi-hit' if all mutations are the same type

Should default to FALSE.

Reason for implementing:
It allows col_mutation_type to relate to anything, e.g. colour by pathways
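
One way the option could behave, sketched in base R. The function name, the `keep_identical_types` argument, and the "multi-hit" label are all assumptions made for illustration; none of them are taken from the ggoncoplot source.

```r
# Hypothetical helper: collapse the mutation types observed for a single
# sample/gene pair into one label.
# keep_identical_types = FALSE (the default) always collapses 2+ mutations
# to `multi_label`; TRUE keeps the shared type when all mutations agree.
collapse_mutation_types <- function(types, keep_identical_types = FALSE,
                                    multi_label = "multi-hit") {
  if (length(types) <= 1) return(types)
  distinct_types <- unique(types)
  if (keep_identical_types && length(distinct_types) == 1) {
    return(distinct_types)
  }
  multi_label
}

collapse_mutation_types(c("Pathway A", "Pathway A"))
collapse_mutation_types(c("Pathway A", "Pathway A"), keep_identical_types = TRUE)
```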

Avoid .data in tidyselect expressions

Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
i Please use all_of(var) (or any_of(var)) instead of .data[[var]]
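
For reference, a small sketch of the replacement pattern, assuming dplyr is installed. The data frame and column-name variables are made up for illustration and are not the package's internals.

```r
library(dplyr)

# Toy data; the column names mirror a MAF-style input
df <- data.frame(
  Hugo_Symbol = c("TP53", "EGFR"),
  Tumor_Sample_Barcode = c("S1", "S2"),
  Other = 1:2
)
col_genes <- "Hugo_Symbol"
col_samples <- "Tumor_Sample_Barcode"

# Deprecated in tidyselect contexts such as select():
# select(df, .data[[col_genes]], .data[[col_samples]])

# Preferred: all_of() takes a character vector and errors if a column is missing
selected <- select(df, all_of(c(col_genes, col_samples)))

# .data is still appropriate in data-masking verbs such as filter()
tp53_rows <- filter(df, .data[[col_genes]] == "TP53")
```

Use `any_of()` instead of `all_of()` where a missing column should be silently skipped rather than raise an error.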

Oncoplots upside down

Oncoplot gene rankings are inverted. Tests should have picked this up.

fix unit tests to pick up gene ranking order appropriately

Change legend layout

As we add sample annotations, each needing their own legends (for non-interactive plots at least), it might be better to collect all legends, on the right side of the plot:
example below

Original Source:

https://www.researchgate.net/figure/The-mutational-landscape-of-33-cancer-genes-in-Chinese-breast-tumors-n305-Genomic_fig1_332730584

How I found it
https://www.biostars.org/p/9473274/

This solution is clear, easy to implement and solves the core problem

Add arguments for controlling non-interactive ggoncoplot aspect ratio

Recreate all tests using simulated data instead of large GBM dataset

Now that we've created datasets for testing oncoplots that cover all the edge cases I can think of, we should recreate our unit tests using just these smaller datasets

Separate logic for identifying topN mutated genes from a mutations db.

Building a general function to do this will be useful in segmenting testing logic but also re-usable for other plots
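
A rough base R sketch of what such a standalone function might look like. The name `identify_topn_genes()` and its arguments are hypothetical, and ranking genes by the number of distinct mutated samples is an assumption about the intended definition of "top N".

```r
# Hypothetical helper: return the `topn` genes mutated in the most samples.
# A sample with several mutations in the same gene is counted once.
identify_topn_genes <- function(data, col_genes, col_samples, topn = 10) {
  # One row per distinct gene/sample pair
  pairs <- unique(data[, c(col_genes, col_samples)])
  # Number of distinct samples per gene, most frequently mutated first
  counts <- sort(table(pairs[[col_genes]]), decreasing = TRUE)
  names(head(counts, topn))
}

mutations <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "TP53", "TP53", "EGFR", "EGFR", "PTEN"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S1", "S2", "S1")
)
top_genes <- identify_topn_genes(mutations, "Hugo_Symbol", "Tumor_Sample_Barcode", topn = 2)
```

Because it only takes a data frame and column names, a function like this can be unit tested without building any plot.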

Enforce fixed colour scheme across different calls

Problem:

To maximise flexibility of ggoncoplot, we don't force the mutation types defined by col_mutation_type to align to any ontology. The end-user can use whatever mutation types they like. The problem is that this makes it difficult to automatically choose colours for these mutation types in a manner that's consistent across different datasets.

Currently, we use an RColorBrewer palette and decide which colour is attached to each mutation type based on the frequency of the mutation types. To demonstrate why this is not ideal, let's go through an example. Say you produce an oncoplot for two different cohorts, one of which is dominated by missense mutations, the other by silent mutations. In one of these oncoplots missense mutations will be the same colour as silent mutations in the other. This would be extremely confusing.

Potential solutions

1. Force users to use some ontology for 'mutation_type'. Then we'll know all the possible mutation types in advance and can make a single manual palette that maps each value to a colour consistently no matter what data is input. The major downside is the lack of choice for the end user. It may also be a lot of work for end-users to convert their mutation_type ontology to whatever we enforce. What ontology should be enforced? Should we try to guess the mapping based on names of mutation_type? We might be able to provide mappings from one ontology to another to help users streamline data preprocessing.
2. Force users to define a mapping of mutation_types to colours. We make sure they have accounted for every value in their dataset. We could help with this by providing users with a basic example palette they should supply to ggoncoplot. ggoncoplot would error unless the user supplied this mapping.
3. Both: force an ontology UNLESS the user supplies a palette mapping all mutation_types to colours. Best of both worlds.

Each potential solution has its benefits and drawbacks. 1 is more work for the end-user but will make it easier to integrate ggoncoplot in shiny apps and pipelines. 2 is easier and more flexible for the end-user, and allows domain-specific mutation_types to be used (e.g. there'd be the option to colour mutations based on germline/somatic origins in cancer data visualisation). 3 is more work for me, and adds some complexity to the usage, BUT with some careful info/warning messages sent to cli we could probably make this quite intuitive for end-users.
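
The validation step needed for a user-supplied palette (solutions 2 and 3) could look roughly like this in base R. The function name `check_palette_covers()` and the example palette are illustrative assumptions, not existing ggoncoplot code.

```r
# Hypothetical check: every mutation type present in the data must have a
# colour in `palette`, a named character vector (mutation type -> colour).
check_palette_covers <- function(mutation_types, palette) {
  observed <- unique(mutation_types[!is.na(mutation_types)])
  missing_types <- setdiff(observed, names(palette))
  if (length(missing_types) > 0) {
    stop("Palette is missing colours for: ",
         paste(missing_types, collapse = ", "), call. = FALSE)
  }
  # Keep only the colours in use, in palette order, so a given mutation type
  # maps to the same colour regardless of how frequent it is in the data
  palette[names(palette) %in% observed]
}

example_palette <- c(Missense = "#1b9e77", Silent = "#999999", Nonsense = "#d95f02")
used <- check_palette_covers(c("Silent", "Missense", "Silent"), example_palette)
```

Since the mapping is keyed by name rather than by frequency rank, two cohorts dominated by different mutation types would still share colours.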

Plan of attack

Start implementing (1) as step 1. If I have time I'll work towards (3)

Pull out sample order code to its own function

On click copy sample ID to clipboard

ggiraph supports running javascript on click events (without shiny)

See below for details
https://davidgohel.github.io/ggiraph/articles/offcran/using_ggiraph.html#using-onclick-1

One typically annoying thing about oncoplots is seeing interesting samples and having to copy out sample IDs. It would be more convenient to just click the id and automatically copy the sample name.

In javascript, you can write text to clipboard

navigator.clipboard.writeText('text')

Could fire this on an onclick event
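
On the R side this mostly comes down to building one JavaScript snippet per sample. A minimal sketch, where `clipboard_onclick()` is a hypothetical helper and sample IDs are assumed to contain no single quotes or backslashes:

```r
# Hypothetical helper: build one JavaScript onclick snippet per sample ID.
# No escaping is attempted, so IDs containing ' or \ are rejected up front.
clipboard_onclick <- function(sample_ids) {
  stopifnot(
    !grepl("'", sample_ids, fixed = TRUE),
    !grepl("\\", sample_ids, fixed = TRUE)
  )
  sprintf("navigator.clipboard.writeText('%s')", sample_ids)
}

onclick_js <- clipboard_onclick(c("TCGA-01", "TCGA-02"))
```

The resulting character vector could then be stored as a column and mapped to the `onclick` aesthetic of the ggiraph interactive geoms.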

Add ability to control legend size - either through controlling rows/columns of table or size of text

Or potentially add larger margins around the legend so that it's forced away from the edges of the drawing area

Clinical annotation y axis text too far away

Looks like there's a problem with patchwork alignment forcing the y axis text of clinical annotations away from the plot

separate business vs vis logic in ggoncoplot

I need to be able to unit test the data transformation code required to plot an oncoplot.

Currently the data transformation code is packaged in the same function as ggoncoplot. I should pull the data transformation out into a separate function, e.g. ggoncoplot_data_prep. Then I can unit test it separately from the visualisation code

show_all_samples bug - shows sample with mutation last, after all the samples missing mutational data were shown

Originally discovered with unpublished data, will need to find a public / simulated dataset to replicate

unit test clinical annotations

Control fontsize of axis titles + gene names (y axis text) from `ggoncoplot` call

expose margin control and na_marker options of gg

Expose gg1d_plot `palettes` option to allow users to customise metadata tile colours

Switch assertthat to assertions

Documentation: fix 'Add both TMB and ' section

title is wrong and code doesn't show both TMB and gene barplots

Sort by clinical annotations

Should be powered by the rank package.

There's a commented section that indicates where the sample sorting code should go (right before the refactor of clinical & mutational dataframes). No need to use the inbuilt sorting functionality of the gg1d package

Deal with size issue

CRAN is the best place for this package, but the package size currently exceeds the 5 MB limit.

What's blowing out our size:

1. documentation (3.8MB)
2. testdata (1.6MB)

Both of these have nothing to do with the actual package functionality, so should be super solvable
2 is the easiest to solve - just move MAF csv files / R dataframes that we're using for testing into its own github R package with functions that stream the data. We can then install this package and since the data is only used for testing and docs we can add as a suggests (not an import a.k.a required dependency)

1 is a little trickier. It's probably all the interactive plots in the vignette. Storing these will require some space. Rendering static plots would save the space but really take away from the documentation. The best solution would be to keep the docs big but decouple them from the R package. Not sure of the best way to do this without causing too much pain long term. It's just so convenient to use vignettes and CI workflows. Will need more thought

Change custom tooltip multi-mutation collapse separator from ; to \n

Should multiple mutations with same class in same gene be collapsed to 'multiple' or unique('variant class')

When collapsing multiple mutations in the same gene down to 1 row - when all mutations have the same classification do we classify as multiple or just the classification itself

Add option to Sort pathways by total distinct samples with any gene in pathway mutated

Don't drop mutations if col_mutation_type = NA

Currently mutations are dropped if col_mutation_type is NA. This is horribly incorrect behaviour. Add a unit test to detect it, then fix it

Run usethis::use_github_action("test-coverage") to setup coverage github action

Unify look and scale of ggplot vs ggiraph

Add support for grouping oncoplot by pathways

Think about:

How should sorting be affected?
How should you choose which pathways to show first/second/third etc.?
What should the input look like? (almost certainly a 2-column dataframe: 1 with genes, 1 with pathways)

Add option for 'coding only' that defaults to true

Make ggiraph hover functionality stick on click

https://stackoverflow.com/questions/67478061/how-do-i-make-the-ggiraph-hover-functionality-stick-on-click

opts_selection(only_shiny = FALSE, type = "single", css = "stroke:yellow;")

Maybe should do multiple selection. Would be useful if you could select a bunch of samples, so that you could see where they fall on a matched RNAseq tsne, for example

Add data_id to ggoncoplot_prep_df returned dataframes and add test

Current tests

# Check dataframe has required names
expect_named(prepped_df, expected = c('Sample', 'Gene', 'MutationType', 'Tooltip'), ignore.order = TRUE)
expect_named(prepped_df_no_mutation_type, expected = c('Sample', 'Gene', 'MutationType', 'Tooltip'), ignore.order = TRUE)

Should we add test for data_id column
