d3b-center / annoFuse
Filter and prioritize fusion calls
License: Other
Full set of examples for the functions exported
star_fusion_calls <- read.csv("NTRK_control.star-fusion.fusion_predictions.abridged.tsv", sep="\t")
fusion_standardization(
fusion_calls=star_fusion_calls,
caller = c("STARFUSION"),
tumorID = "NTRK_control"
)
Most recent version as of Dec 17 2020
See attached screenshot
Run sessionInfo() and post the output below
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin19.5.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.3.10_1/lib/libopenblasp-r0.3.10.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] grid stats graphics grDevices
[5] utils datasets methods base
other attached packages:
[1] data.table_1.13.2 stringr_1.4.0
[3] arsenal_3.5.0 annoFuse_0.90.0
[5] dplyr_1.0.1 ggrepel_0.8.2
[7] kableExtra_1.1.0 knitr_1.29
[9] ggplot2_3.3.2
loaded via a namespace (and not attached):
[1] readxl_1.3.1
[2] backports_1.1.8
[3] BiocFileCache_1.12.1
[4] plyr_1.8.6
[5] lazyeval_0.2.2
[6] shinydashboard_0.7.1
[7] BiocParallel_1.22.0
[8] usethis_1.6.3
[9] GenomeInfoDb_1.24.2
[10] digest_0.6.25
[11] ensembldb_2.12.1
[12] htmltools_0.5.0
[13] magick_2.4.0
[14] fansi_0.4.1
[15] magrittr_1.5
[16] memoise_1.1.0
[17] openxlsx_4.1.5
[18] remotes_2.2.0
[19] Biostrings_2.56.0
[20] readr_1.3.1
[21] matrixStats_0.57.0
[22] askpass_1.1
[23] prettyunits_1.1.1
[24] colorspace_1.4-1
[25] blob_1.2.1
[26] rvest_0.3.6
[27] rappdirs_0.3.1
[28] haven_2.3.1
[29] xfun_0.16
[30] callr_3.5.1
[31] crayon_1.3.4
[32] RCurl_1.98-1.2
[33] jsonlite_1.7.1
[34] glue_1.4.1
[35] gtable_0.3.0
[36] zlibbioc_1.34.0
[37] XVector_0.28.0
[38] webshot_0.5.2
[39] DelayedArray_0.14.1
[40] car_3.0-10
[41] pkgbuild_1.1.0
[42] BiocGenerics_0.34.0
[43] abind_1.4-5
[44] scales_1.1.1
[45] qdapRegex_0.7.2
[46] DBI_1.1.0
[47] ggthemes_4.2.0
[48] rstatix_0.6.0
[49] Rcpp_1.0.5
[50] viridisLite_0.3.0
[51] xtable_1.8-4
[52] progress_1.2.2
[53] foreign_0.8-80
[54] bit_4.0.4
[55] stats4_4.0.2
[56] DT_0.16
[57] htmlwidgets_1.5.1
[58] httr_1.4.2
[59] ellipsis_0.3.1
[60] pkgconfig_2.0.3
[61] XML_3.99-0.5
[62] farver_2.0.3
[63] dbplyr_1.4.4
[64] reshape2_1.4.4
[65] tidyselect_1.1.0
[66] labeling_0.3
[67] rlang_0.4.7
[68] later_1.1.0.1
[69] AnnotationDbi_1.50.3
[70] cellranger_1.1.0
[71] munsell_0.5.0
[72] tools_4.0.2
[73] cli_2.0.2
[74] generics_0.0.2
[75] RSQLite_2.2.0
[76] devtools_2.3.2
[77] rintrojs_0.2.2
[78] broom_0.7.3
[79] shinyBS_0.61
[80] evaluate_0.14
[81] fastmap_1.0.1
[82] yaml_2.2.1
[83] processx_3.4.4
[84] bit64_4.0.2
[85] fs_1.5.0
[86] shinycssloaders_1.0.0
[87] zip_2.1.0
[88] purrr_0.3.4
[89] AnnotationFilter_1.12.0
[90] mime_0.9
[91] xml2_1.3.2
[92] biomaRt_2.44.1
[93] compiler_4.0.2
[94] shinythemes_1.1.2
[95] rstudioapi_0.11
[96] curl_4.3
[97] EnsDb.Hsapiens.v86_2.99.0
[98] testthat_2.3.2
[99] ggsignif_0.6.0
[100] tibble_3.0.3
[101] stringi_1.4.6
[102] highr_0.8
[103] ps_1.3.3
[104] GenomicFeatures_1.40.1
[105] desc_1.2.0
[106] forcats_0.5.0
[107] lattice_0.20-41
[108] ProtGenerics_1.20.0
[109] Matrix_1.2-18
[110] vctrs_0.3.2
[111] pillar_1.4.6
[112] lifecycle_0.2.0
[113] bitops_1.0-6
[114] httpuv_1.5.4
[115] rtracklayer_1.48.0
[116] GenomicRanges_1.40.0
[117] R6_2.4.1
[118] promises_1.1.1
[119] rio_0.5.16
[120] IRanges_2.22.2
[121] sessioninfo_1.1.1
[122] assertthat_0.2.1
[123] pkgload_1.1.0
[124] SummarizedExperiment_1.18.2
[125] openssl_1.4.2
[126] rprojroot_1.3-2
[127] withr_2.2.0
[128] GenomicAlignments_1.24.0
[129] Rsamtools_2.4.0
[130] S4Vectors_0.26.1
[131] GenomeInfoDbData_1.2.3
[132] parallel_4.0.2
[133] hms_0.5.3
[134] tidyr_1.1.2
[135] rmarkdown_2.3
[136] carData_3.0-4
[137] ggpubr_0.4.0
[138] Biobase_2.48.0
[139] shiny_1.5.0
[140] base64enc_0.1-3
[141] tinytex_0.25
Branch: exemplary
Fusion breakpoint plots not displaying.
Warning in plot_breakpoints(domainDataFrame = breakpoints_info, exons = values$data_exons, :
FusionName not provide; using first row of domainDataFrame
Warning: Error in if: argument is of length zero
[No stack trace available]
Plot link in the wiki (item 4) is broken.
Add options to customize plot_summary ?
This topic came up while we reviewed output from a different project: what if the user wants to gather a summary for specific columns only? For example, should we show the distribution of fusion type per user-provided input (maybe a subgroup) instead of per fusion caller, as in the current top right corner?
I am not sure the generated PDF is the best option; we might instead need a widget with column options and generic bar plots per column. This is up for discussion, so adding @jharenza for input.
develop by @kgaonkar6 and @jharenza for input
from annoFuse to shinyFuse
This can go in the About section
Hi @federicomarini - this is a quick writeup of some of the things we can add to the text boxes.
I think we can have 2 sections in About, then News in a separate tab, and Contact in another tab, if possible.
FusionExplorer is an interactive way to explore the putative oncogenic fusion results from annoFuse.
Required input: PutativeOncogenicFusion.tsv
Features:
Filtering: Select a column and enter one or more attributes on which to filter the rows.
Data export: Export the filtered table using the download button above.
Fusion visualization: To view the gene fusion, exons, protein domains, and breakpoints:
exons and pfam domains by clicking the buttons on the upper right side.
Plot export: To save the fusion plot, select the green download arrow below the plot.
Features:
Recurrent fusions: To analyze recurrent fusions within a cohort:
Grouping Column. This will be the column used for analysis of recurrent fusions. For example, if your dataset has multiple histologies and you are interested in recurrent fusions by histology, this value would be histology.
Counting column. This is usually a patient-level identifier.
Fusion visualization: To view the recurrent fusions results, click on the FusionSummary tab, where plots for recurrent fusions and recurrently-fused genes will be generated.
Plot export: To save the fusion plot, select the green download arrow below the plot.
[Date of Launch]. Welcome to ShinyFuse, an application for exploration of fusion output derived from annoFuse! The purpose of this application is Please take the tour, activated by clicking on the question mark in the upper right corner, and test out the demo dataset by clicking on Load demo data in the upper left corner. Thanks for visiting!
Please file application issues, suggestions, or bug reports here.
I was wondering if we should allow users to provide the grouping column for plot_summary() call here:
Line 799 in 92cdd69
groupby <- "Sample"
The plot looks quite busy when we have a lot of samples.
Can we add groupby=gby_rf to the function call to use the reactive value from the user, as we do for the recurrent plots? This would be useful when the user has subgroup columns, like our "broad_histology", to plot the fusion summary and recurrent fusions/genes per subgroup, which might look cleaner.
From @jharenza regarding reviewer's comment:
How the authors handle the annotation of fusions if they don't necessarily fall in exon-exon boundaries? For example in MALAT1-GLI1 fusions, the break happens in the middle of the gene body of MALAT1 (a non-coding gene) and the break on the GLI1 can happen some bases upstream on exon 5-6. Would an important fusion like be annotated correctly and not filtered away? How would you classify this fusion since it is not "in-frame" (there is no frame on the MALAT1, since it is non-coding)? Would it provide some annotation saying that is likely a "promoter-swap" kind of event? I understand most fusions occur in exons borders but I would like to know if this tool is able to handle interesting real biology cases like the one described
Can we add some annotation saying that a fusion is likely to be a "promoter-swap" kind of event?
For this and similar FASTA sequence analyses (epitopes, etc.), the standard fusion format will also need to include the fusion sequence extracted from the caller input.
shinyFuseshinyfuse.csv
download table from shinyFuse
v0.90.0
Rename output tables to something more descriptive, like shinyFuseFusionTable.csv (?)
Branches tested: loading_progress and exemplary
"Insufficient values in manual scale. 6 needed but only 3 provided." and "Insufficient values in manual scale. 4 needed but only 3 provided." are errors when doing grouping.
Perhaps also rename TableSummary to Recurrent Fusions and TableExplorer to Fusion Explorer.
See below when trying to install via GitHub:
library(devtools)
install_github("d3b-center/annoFuse")
ERROR: dependency ‘EnsDb.Hsapiens.v86’ is not available for package ‘annoFuse’
Add templates for issues and PRs - model after OpenPBTA analysis?
Hello,
Thanks for this useful resource.
I am trying to find ways to use annoFuse for our internal RNAseq samples.
We use bcbio-RNAseq pipeline for the processing of samples in a single sample setting. Also, our focus is on the fusion calls from Arriba.
I was thinking of starting with the annotate_fusion_calls function to annotate Arriba calls and then later applying prioritisation. However, I have a few questions:
Would it be reasonable to use annoFuse on single-sample fusion calls, or is the tool more useful for cohort-based analysis? From the annoFuse paper, it seems you tested the tool on a cohort of samples from the OpenPBTA project.
If yes, can you please clarify what is expected as input for geneListReferenceDataTab and fusionReferenceDataTab in the annotate_fusion_calls function? I could not figure this out from the description in the man page. Can you please share any examples?
This will be much appreciated.
Sehrish K
Idea: to have a quick numeric summary returned after processing files
Rationale: often one might overlook what happened and "just take" the output. Exposing a compact summary of what was retained/filtered out can help the user stay aware of the effect of parameter values.
Implementation: could be a combination of message calls that, e.g., can be printed out to the console without messing around with the returned output 😉
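A minimal sketch of the idea, assuming a standardized data frame with a JunctionReadCount column. The helper name `filter_with_summary()` and the toy data are hypothetical and not part of annoFuse; the point is only that `message()` reports counts on the console while the returned object stays a plain data frame.

```r
# Hypothetical helper (not an annoFuse function): filter on a minimum junction
# read count and report what was kept or dropped via message().
filter_with_summary <- function(calls, min_junction_reads = 1) {
  stopifnot(is.data.frame(calls), "JunctionReadCount" %in% names(calls))
  keep <- !is.na(calls$JunctionReadCount) &
    calls$JunctionReadCount >= min_junction_reads
  # message() writes to stderr, so the returned value is not affected
  message("Input fusion calls: ", nrow(calls))
  message("Retained (JunctionReadCount >= ", min_junction_reads, "): ", sum(keep))
  message("Filtered out: ", sum(!keep))
  calls[keep, , drop = FALSE]
}

# Toy data, made up for illustration only
toy_calls <- data.frame(
  FusionName = c("A--B", "C--D", "E--F"),
  JunctionReadCount = c(0, 3, 12),
  stringsAsFactors = FALSE
)
filtered <- filter_with_summary(toy_calls, min_junction_reads = 1)
```

Because the summary goes through `message()`, a user who does not want it can wrap the call in `suppressMessages()`.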
General issue:
Updates for plots for use in shiny_fuse and app
Specifics:
Domain plots
Recurrent fusion/genes plots:
Who will work on this:
@kgaonkar6 will work on the topics which are "internal to annoFuse" functions
@federicomarini can take a look at the "shiny_fuse" specific tasks
add kinaseGeneDomainRetained and rename GeneLoc -> kinaseDomainGeneLoc
by row
functionalityRemove this filter
Few suggestions:
@federicomarini @jharenza
It seems annoFuse imports tidyr::one_of() which was only added to tidyr1.02 or later but we have tidyr0.8.3 in openpbta docker so I was trying to update the version but without luck.
I think changing the @importFrom to dplyr::one_of() fixed the issue while loading annoFuse in openPBTA docker
Run sessionInfo() and post the output below
Add function to only generate the pfambioMart and exon once when needed within the package.
@federicomarini can we add BreakpointLocation as a filter?
Basically by doing the following filters using the demo file we will be able to reproduce the figure in the MS.
sfc <- as.data.frame(sfc[ which(sfc$Fusion_Type == "in-frame" & sfc$BreakpointLocation == "Genic"),])
before running
plot_recurrent_genes(sfc, groupby = "broad_histology", countID = "Kids_First_Participant_ID")
@jharenza should we also make BreakpointLocation a standard fusion call column, so that the filter is applicable to all annoFuse output?
Be able to filter prior to plotting:
Seeing instances of intergenic gene fusions being called intragenic in V16:
BS_03FT4S8B: LINC01019--LINC01019/IRX1
BS_7ZV3DPGT: CNTN6--CNTN6/RPL23AP38
Comparing the intergenic annotations to these, it seems that these are being called intragenic due to the fact that Gene1A and Gene1B are identical.
If all intergenic fusions will always have a /, then I suggest simple logic built around that.
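The suggested logic could look roughly like the sketch below. It assumes, as the issue does, that every intergenic fusion name contains a "/", and it checks for that before comparing Gene1A and Gene1B. The function name `classify_breakpoint_location()` is hypothetical, not an annoFuse function; the first two rows are the fusions listed above and the third is a plain two-gene fusion for contrast.

```r
# Hypothetical helper (not part of annoFuse): label a fusion as Intergenic when
# its name contains "/", otherwise Intragenic when both partner genes are the
# same, otherwise Genic.
classify_breakpoint_location <- function(fusion_name, gene1a, gene1b) {
  has_slash <- grepl("/", fusion_name, fixed = TRUE)
  same_gene <- !is.na(gene1a) & !is.na(gene1b) & gene1a == gene1b
  # The "/" test comes first so that Gene1A == Gene1B cannot override it
  ifelse(has_slash, "Intergenic", ifelse(same_gene, "Intragenic", "Genic"))
}

examples <- data.frame(
  FusionName = c("LINC01019--LINC01019/IRX1", "CNTN6--CNTN6/RPL23AP38", "KIAA1549--BRAF"),
  Gene1A = c("LINC01019", "CNTN6", "KIAA1549"),
  Gene1B = c("LINC01019", "CNTN6", "BRAF"),
  stringsAsFactors = FALSE
)
examples$BreakpointLocation <- classify_breakpoint_location(
  examples$FusionName, examples$Gene1A, examples$Gene1B
)
print(examples)
```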
Delete DS_store and remove one of the license files
Rproj.user folder needs to be removed
Clean up doc and build vignettes from them
Depending on results of comparison between TCGA and PBTA, and potentially another longread dataset with high confidence fusion calls, we can make this an option with default filtering turned on (could make this by read length or not - need to inspect fusion calls in PBTA for false +).
I found pcawg fusion dataset (hg19)
standardFusioncalls_pcawg.txt
https://dcc.icgc.org/releases/PCAWG/transcriptome/fusion
Just wanted to add here that the hg19 support is mostly important because published PCAWG/DKFZ workflows are hg19 based.
Note: the dataset is already filtered; I didn't find raw calls.
This resulted in detection of 3540 fusion events, from these 2268 were detected by both FusionCatcher/STAR-Fusion and FusionMap (from these 1821 had SV support) and 1112 were detected by only one method and had SV support. All fusions are available in Synapse (syn10003873)
Works for most plots except plot_breakpoints, since the exon RDS is based on hg38.
For example, the RET gene locations are as follows:
chr10:43,077,027-43,130,351 (GRCh38/hg38)
chr10:43,572,475-43,625,799 (GRCh37/hg19)
So, using the current exon RDS file, the breakpoint falls outside the gene body.
Gene1A | Gene1B | RightBreakpoint | LeftBreakpoint |
---|---|---|---|
CCDC6 | RET | 10:61665880 | 10:43612032 |
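The mismatch can be checked with a few lines of base R. This sketch uses only the RET coordinates and the 10:43612032 breakpoint quoted above; the variable names are illustrative.

```r
# RET gene body in both genome builds, as quoted in this issue
ret_gene_body <- data.frame(
  build = c("hg38", "hg19"),
  chr   = c("10", "10"),
  start = c(43077027, 43572475),
  end   = c(43130351, 43625799),
  stringsAsFactors = FALSE
)

# Breakpoint on chr10 from the PCAWG (hg19) call in the table above
breakpoint <- "10:43612032"
parts  <- strsplit(breakpoint, ":", fixed = TRUE)[[1]]
bp_chr <- parts[1]
bp_pos <- as.numeric(parts[2])

# TRUE when the breakpoint lies inside the gene body for that build
ret_gene_body$contains_breakpoint <-
  ret_gene_body$chr == bp_chr &
  bp_pos >= ret_gene_body$start &
  bp_pos <= ret_gene_body$end

print(ret_gene_body)
```

The hg19 breakpoint lies inside the hg19 gene body but outside the hg38 one, which is why a plot built on hg38 exon annotation places it outside the gene.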
Hi there, this looks to be an excellent tool, I am hoping I can get it to work!
This is the command I am trying:
standardizedSTARfusion<-fusion_standardization(fusion_calls = ST.STARFusion, caller = "STARfusion")
I do see that caller string options are STARfusion/arriba
#ERROR
Error in fusion_standardization(fusion_calls = ST.STARFusion, caller = "STARfusion") :
STARfusion is not a supported caller string.
I would also like some clarity on the workflow. I have calls from Arriba and STARfusion, and I would like to merge and prioritize them. Is this done after standardizing, and what is the command for that?
I was also not able to run FusionAnnotator on Arriba output. Is this done after standardization?
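The error above suggests the caller string is matched exactly, including case: the STAR-Fusion example at the top of this page passes caller = c("STARFUSION"). A minimal sketch of a tolerant lookup follows. `normalize_caller()` is a hypothetical helper, not part of annoFuse, and the set of accepted strings (in particular "ARRIBA") is an assumption that should be checked against the documentation of the installed version.

```r
# Assumed set of accepted caller strings: "STARFUSION" appears in the example
# earlier on this page; "ARRIBA" is an assumption to verify in the package docs.
supported_callers <- c("STARFUSION", "ARRIBA")

# Hypothetical helper: upper-case the input and drop "-", "_" and spaces, so
# that "STARfusion", "star-fusion" and "arriba" map onto the assumed strings.
normalize_caller <- function(caller) {
  cleaned <- toupper(gsub("[-_ ]", "", caller))
  if (!cleaned %in% supported_callers) {
    stop(caller, " is not a supported caller string.", call. = FALSE)
  }
  cleaned
}

print(normalize_caller("STARfusion"))
print(normalize_caller("arriba"))
```

The normalized value could then be passed on, for example as caller = normalize_caller("STARfusion") in the fusion_standardization() call shown above.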
Let's use these:
Sample
FusionName
LeftBreakpoint
RightBreakpoint
Fusion_Type
JunctionReadCount
SpanningFragCount
Confidence
CalledBy
General issue:
R source file names and function names don't match up
Specifics:
plotRecurrentFusion is in plot_recurrent_fusion.R
plotRecurrentGenes is in plot_recurrent_genes.R
plotBreakpoints is in plot_breakpoints.R
fusion_multifused is in fusionMultifused.R
annotate_fusion_calls is in annotate_fusion.R
expressionFilterFusion is in expression_filtering.R
Styler for all scripts
All functions to be in snake_case as per discussion below
Who will work on this
@federicomarini will work on changing the names in the code
@kgaonkar6 to update the Fig1 function names
For some reason, this got removed
Branch: takeatour
Some rows error out and default back to plotting multiple breakpoints. For example, the KIAA1549--BRAF fusion in sample BS_17WYVEEC gives the error:
Warning: Error in plot_breakpoints: domainDataFrame is empty after selecting for fusionname
[No stack trace available]
and plot:
Related: the plot render here is smushed. @kgaonkar6
Suggestions (all may not be feasible) - edit where necessary @kgaonkar6
I was searching for our demo data (the putative oncogenic file we use for shinyfuse) and noticed that it, along with the reference folder, is missing.
Current master
NA
Run sessionInfo() and post the output below
NA
When I try to download the table, I get an RStudio popup but no table being saved.
Also, I noticed that when I try to copy to clipboard after filtering, say, for in-frame fusion_type, it only copies the visible rows (25) instead of the true selection (1,613) of 4,462. In addition, the pasted output is misaligned because of blank cells. Maybe we remove copy altogether for now?
leftover testing opt_args in annoFuseSingleSample function needs to be removed
@kgaonkar6 will work on this
Add TCGA validation nb and files as vignette
https://cavatica.sbgenomics.com/u/kfdrc-harmonization/sd-bhjxbdqk-ad-hoc-annofuse/tasks/a5fc74b3-da0e-4683-ad2d-4298f1469d8f/stats/
2021-01-06T12:41:57.101146919Z 4: In fusion_filtering_QC(standardFusioncalls = standardFusioncalls, :
2021-01-06T12:41:57.101151409Z No fusion calls with readingframe: in-frameNo fusion calls with readingframe: frameshift
2021-01-06T12:41:57.101155305Z Execution halted
It seems no fusion call has at least 1 junction read supporting the fusion, so fusion_filtering_QC leaves no fusion calls as output.
Run sessionInfo() and post the output below
Was run on cavatica
Minor column name suggestion to standardize. Eg:
caller.count is the only column name with a period in it
Some columns have no underscore between words, and some do
I've added v16 version of PutativeDriverAnnoFuse files here https://github.com/d3b-center/annoFuse/tree/add_v16_files/inst/extdata
This file needs to be filtered:
Filter clinical columns to only integrated_diagnosis and broad_histology
Add BreakpointLocation column with values "Genic","Intergenic" and "Intragenic"
And renamed to just PutativeDriverAnnoFuse.tsv to be used in all examples and shiny_fuse demo data
With the current sliding scales, it is hard to select a single number for filtering, but this can be achieved by setting the range to 10 … 10
Here's a list of items that likely will need to be addressed if we want annoFuse included in Bioconductor
expand the vignettes (e.g. nice ideas could come from the current doc folder)
roxygen to fully document the package and precisely specify importFrom commands
examples to showcase the usage of functions
a NEWS.md file would not hurt?
specification of some biocViews to define the "landing area"
link to some potential output files to use as showcasing? (could be then plugged in e.g. into the shiny app)
version number to be adjusted (0.99.0 for submission)
related to the examples - it is a nice thing if we had unit tests, would make the code pretty robust to the nasty edge cases 😃
We could split them up to single issues for easier tracking, if desired!
Federico
Add an internal check for files provided as input to shiny_fuse, the plots, and other functions; ideally a single function that checks all required columns.
For domain plots basic columns are "LeftBreakpoint", "RightBreakpoint", "FusionName", "Gene1A" and "Gene1B"
For recurrent plots and summary plot the basic columns are "LeftBreakpoint", "RightBreakpoint", "FusionName", "Gene1A" and "Gene1B" and a "grouping variable"
For fusion_filtering_QC the basic columns are "LeftBreakpoint", "RightBreakpoint", "FusionName", "Sample", "Caller", "Fusion_Type", "JunctionReadCount", "SpanningFragCount", "annots"
Who will work on this
@federicomarini has added an internal function that checks the input file and emits a message about whether it is a correct input.
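For illustration, a single shared check could be as small as the sketch below. This is not the internal function mentioned above, whose name and behaviour are not shown here; `check_required_columns()` and the toy input are hypothetical, and only the column list for domain plots is taken from this issue.

```r
# Hypothetical checker (not the annoFuse internal): stop with a readable error
# listing missing columns, otherwise confirm via message().
check_required_columns <- function(df, required, context = "input") {
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop(context, " is missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  message(context, ": all ", length(required), " required columns found")
  invisible(TRUE)
}

# Basic columns for domain plots, as listed in this issue
domain_plot_columns <- c("LeftBreakpoint", "RightBreakpoint", "FusionName",
                         "Gene1A", "Gene1B")

# Toy input that lacks two of the required columns
toy <- data.frame(LeftBreakpoint = "10:43612032",
                  FusionName = "CCDC6--RET",
                  Gene1A = "CCDC6",
                  stringsAsFactors = FALSE)

result <- tryCatch(
  check_required_columns(toy, domain_plot_columns, "domain plot input"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Each plotting or filtering function could call the same checker with its own vector of required columns, which keeps the column lists in one place.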
Not sure how to fix this aside from a dropdown, but for instance in the new BreakpointLocation column, when searching for Genic, all 3 values (Genic, Intergenic, and Intragenic) come up in the FusionExplorer, due to the match of genic in each term. Thoughts, @federicomarini?
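The behaviour is easy to reproduce outside the app. The base R sketch below contrasts a case-insensitive substring search with an anchored regular expression; whether the table widget in the app can be configured to accept such a pattern in its search box is an assumption not verified here.

```r
# The three BreakpointLocation values, with one repeated
locations <- c("Genic", "Intergenic", "Intragenic", "Genic")

# Case-insensitive substring search: "genic" occurs in every value
substring_hits <- grepl("genic", locations, ignore.case = TRUE)

# Anchored pattern: only the exact value "Genic" matches
exact_hits <- grepl("^Genic$", locations)

print(data.frame(locations, substring_hits, exact_hits))
```

A dropdown, as suggested above, amounts to the same thing as the anchored pattern: an exact comparison such as locations == "Genic".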
Update description with package authors:
Authors@R: c(person("Krutika S.", "Gaonkar", role = c("aut", "cre"),
                    email = "[email protected]",
                    comment = c(ORCID = "0000-0003-0838-2405")),
             person("Federico", "Marini", role = "aut",
                    email = "[email protected]",
                    comment = c(ORCID = "0000-0003-3252-7758")),
             person("Komal S.", "Rathi", role = "ctb",
                    email = "[email protected]",
                    comment = c(ORCID = "0000-0001-5534-6904")),
             person("Jaclyn N.", "Taroni", role = "ctb",
                    email = "[email protected]",
                    comment = c(ORCID = "0000-0003-4734-4508")))
@federicomarini - I switched you from ctb to aut because, using this guidance, you had substantial contributions with shinyFuse and reportFuse.