amica: an interactive and user-friendly web-based platform for the analysis of proteomics data

License: GNU General Public License v3.0

Dockerfile 0.36% R 99.64%
proteomics proteomics-data-analysis maxquant fragpipe shiny-r

amica's Introduction

Project Status: Active – The project has reached a stable, usable state and is being actively developed. minimal R version GPLv3 license bioRxiv BMC Genomics

amica is freely available at https://bioapps.maxperutzlabs.ac.at/app/amica

Check out our wiki and user manual for extensive online documentation.

amica

amica is an interactive and user-friendly web-based platform that accepts proteomic input files from different sources and provides automatically generated quality control, set comparisons, differential expression, biological network and over-representation analysis on the basis of minimal user input.

Functionality

  • Facilitating interactive analyses and visualizations with just a couple of clicks

Input

  • DDA: MaxQuant's proteinGroups.txt, FragPipe's combined_protein.tsv
  • DIA: Spectronaut's PG report, DIA-NN's PG matrix
  • TMT: FragPipe's [abundance/ratio]protein[normalization].tsv
  • or any custom, tab-separated file.
  • Processed data can be downloaded in a developed amica format which can also be used as input
  • Experimental design mapping samples to conditions
  • Contrast matrix file for group comparisons in case of MaxQuant, FragPipe or custom upload
  • Specification file for mapping relevant columns in case of custom file upload
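
As a rough illustration of the experimental design and contrast files listed above, the following sketch writes two small tab-separated files with base R. The column headers (`samples`, `groups`, `group1`, `group2`) and all sample and group names are assumptions made for this example; the wiki and user manual remain the authoritative description of the expected layout.

```r
# Hypothetical design: six samples mapped to two conditions.
# Column headers are assumptions for illustration only.
design <- data.frame(
  samples = c("ctrl_1", "ctrl_2", "ctrl_3", "ko_1", "ko_2", "ko_3"),
  groups  = rep(c("ctrl", "ko"), each = 3)
)

# Hypothetical contrast table: one row per group comparison.
contrasts <- data.frame(group1 = "ko", group2 = "ctrl")

design_file   <- file.path(tempdir(), "design.txt")
contrast_file <- file.path(tempdir(), "contrasts.txt")

# Write both tables as tab-separated text without quotes or row names.
write.table(design, design_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(contrasts, contrast_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Sanity check: every group named in a contrast must occur in the design.
stopifnot(all(unlist(contrasts) %in% design$groups))
```

Checking that all contrast groups occur in the design before uploading can catch naming mismatches early.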

Outputs

  • Analyzed data downloadable as amica format
  • Almost all plots are produced by plotly (hover over a plot and download it as svg or png with the camera icon)
  • All plots have customizable plot parameters (width, height, file format, etc.)
  • Downloadable data tables

Analysis options

  • Remove decoys and proteins only identified by site (MaxQuant)
  • Filter on minimum peptide count and spectral count values
  • Filter on minimum valid values per group
  • Select intensities to:
      • (Re-)normalize intensities (VSN, quantile, median centering)
      • Impute missing values from a normal distribution or replace them with a constant value (useful for pilots)
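
The filtering, normalization and imputation options above can be sketched in base R as follows. This is a minimal illustration on toy data, not amica's implementation: the "at least `min_valid` values in at least one group" rule, the down-shift of 1.8 standard deviations and the width of 0.3 are assumptions borrowed from common proteomics practice, and `impute_normal` is a hypothetical helper.

```r
set.seed(1)

# Toy log2 intensity matrix: 5 proteins x 6 samples, NA marks a missing value.
samples <- c("ctrl_1", "ctrl_2", "ctrl_3", "ko_1", "ko_2", "ko_3")
groups  <- rep(c("ctrl", "ko"), each = 3)
x <- matrix(rnorm(30, mean = 25, sd = 2), nrow = 5,
            dimnames = list(paste0("P", 1:5), samples))
x["P1", groups == "ko"] <- NA   # missing in one whole group
x["P2", c(1, 2, 4, 5)]  <- NA   # only one valid value per group

# 1) Keep proteins with >= min_valid valid values in at least one group.
min_valid <- 3
valid_per_group <- sapply(unique(groups), function(g) {
  rowSums(!is.na(x[, groups == g, drop = FALSE]))
})
keep   <- rowSums(valid_per_group >= min_valid) > 0
x_filt <- x[keep, , drop = FALSE]

# 2) Median centering: subtract each sample's median intensity.
x_norm <- sweep(x_filt, 2, apply(x_filt, 2, median, na.rm = TRUE))

# 3) Impute missing values per sample from a down-shifted, narrowed normal
#    distribution (assumed parameters: shift 1.8 SD, width 0.3 SD).
impute_normal <- function(m, downshift = 1.8, width = 0.3) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    mu   <- mean(m[, j], na.rm = TRUE)
    s    <- sd(m[, j], na.rm = TRUE)
    m[miss, j] <- rnorm(sum(miss), mean = mu - downshift * s, sd = width * s)
  }
  m
}
x_imp <- impute_normal(x_norm)
```

Here `P2` is dropped by the valid-value filter, while `P1` is kept and its missing knockout values are imputed.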

QC-plots

For different intensities (Raw intensities, LFQ intensities, imputed intensities)

  • PCA
  • Box plots
  • Density plots
  • Correlation plots (Pearson correlation)
  • Bar plots (identified proteins, % contaminants, most abundant proteins) per sample
  • Scatter plots
  • Automated QC report
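
For readers who want to reproduce two of these QC views outside the app, a minimal base R sketch on simulated data is given below. The matrix, sample names and the choice to restrict PCA to proteins without missing values are assumptions of this example and need not match what amica does internally.

```r
set.seed(42)

# Simulated log2 intensities: 100 proteins x 6 samples with some missing values.
x <- matrix(rnorm(600, mean = 25, sd = 2), nrow = 100,
            dimnames = list(NULL, paste0("s", 1:6)))
x[sample(length(x), 20)] <- NA

# Pearson correlation between samples, using all pairwise complete observations.
cor_mat <- cor(x, method = "pearson", use = "pairwise.complete.obs")

# PCA on samples: prcomp() cannot handle NA, so keep fully observed proteins.
complete      <- x[complete.cases(x), ]
pca           <- prcomp(t(complete), center = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two principal components.
scores <- pca$x[, 1:2]
```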

Differential abundance analysis

  • Primary filter options (log2FC thresholds, multiple-testing correction, select enriched or reduced proteins)
  • Analyze single or multiple selected group comparisons
  • Volcano and MA plots
  • Set comparisons (UpSet plots and Euler diagrams)
  • Customizable output data table (can be further filtered)
  • Heatmap
  • Dot plot
  • Fold change plot
  • Profile plot
  • Protein-protein interaction (PPI) network
  • Over-Representation Analysis (ORA)
  • Automated Diff. abundance report
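
The primary filter options can be pictured with a small base R sketch. The result table below is made up, its column names (`logFC`, `P.Value`, `adj.P.Val`) follow limma's `topTable` naming, and the thresholds and the Benjamini-Hochberg correction are example choices rather than amica defaults.

```r
# Hypothetical result table for one group comparison.
res <- data.frame(
  protein = paste0("P", 1:6),
  logFC   = c(2.1, -1.8, 0.3, 1.2, -0.2, 3.0),
  P.Value = c(0.0005, 0.002, 0.40, 0.03, 0.75, 0.0001)
)

# Multiple-testing correction (Benjamini-Hochberg).
res$adj.P.Val <- p.adjust(res$P.Value, method = "BH")

# Example thresholds: |log2FC| >= 1 and adjusted p-value <= 0.05.
fc_cut <- 1
alpha  <- 0.05
sig    <- res$adj.P.Val <= alpha & abs(res$logFC) >= fc_cut

# Split significant proteins by direction of change.
enriched <- res[sig & res$logFC > 0, ]
reduced  <- res[sig & res$logFC < 0, ]
```

With these toy values, `P1`, `P4` and `P6` end up in `enriched` and `P2` in `reduced`.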

Compare multiple amica files

  • Upload a second amica file from another experiment/analysis to combine datasets
  • Download combined dataset
  • Correlate intensities from combined dataset (scatter and correlation plots)
  • Differential abundance analysis for combined amica dataset

Dependencies

All dependencies can be installed by executing the install_dependencies.R script.

Session info


> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 21.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] eulerr_6.1.1       colourpicker_1.1.0 RColorBrewer_1.1-2 dplyr_1.0.7        data.table_1.14.0  Rmisc_1.5         
 [7] plyr_1.8.6         lattice_0.20-45    pheatmap_1.0.12    colourvalues_0.3.7 UpSetR_1.4.0       visNetwork_2.0.9  
[13] igraph_1.2.6       reshape2_1.4.4     bslib_0.2.5.1      gprofiler2_0.2.0   DEqMS_1.10.0       limma_3.48.1      
[19] DT_0.18            heatmaply_1.2.1    viridis_0.6.1      viridisLite_0.4.0  plotly_4.9.4.1     ggfortify_0.4.12  
[25] ggplot2_3.3.5      shinyBS_0.61       shinyjs_2.0.0      shiny_1.6.0       

loaded via a namespace (and not attached):
 [1] httr_1.4.2        sass_0.4.0        tidyr_1.1.3       jsonlite_1.7.2    foreach_1.5.1     assertthat_0.2.1 
 [7] yaml_2.2.1        pillar_1.6.1      glue_1.4.2        digest_0.6.27     promises_1.2.0.1  colorspace_2.0-2 
[13] htmltools_0.5.1.1 httpuv_1.6.1      pkgconfig_2.0.3   purrr_0.3.4       xtable_1.8-4      scales_1.1.1     
[19] webshot_0.5.2     later_1.2.0       tibble_3.1.2      generics_0.1.0    ellipsis_0.3.2    cachem_1.0.5     
[25] withr_2.4.2       lazyeval_0.2.2    magrittr_2.0.1    crayon_1.4.1      mime_0.11         fs_1.5.0         
[31] fansi_0.5.0       registry_0.5-1    lifecycle_1.0.0   stringr_1.4.0     munsell_0.5.0     compiler_4.1.1   
[37] jquerylib_0.1.4   rlang_0.4.11      grid_4.1.1        iterators_1.0.13  htmlwidgets_1.5.3 crosstalk_1.1.1  
[43] miniUI_0.1.1.1    gtable_0.3.0      codetools_0.2-18  DBI_1.1.1         TSP_1.1-10        R6_2.5.0         
[49] seriation_1.3.0   gridExtra_2.3     fastmap_1.1.0     utf8_1.2.1        dendextend_1.15.1 stringi_1.7.3    
[55] Rcpp_1.0.7        vctrs_0.3.8       tidyselect_1.1.1 

Local installation

  • Using git and RStudio
## Clone the repository
git clone https://github.com/tbaccata/amica.git

## Move to the folder
cd amica

## Install the dependencies
Rscript install_dependencies.R

## Inside the R console or RStudio
> library("shiny")
> runApp()

  • Using Docker

Have docker installed and running (www.docker.com/get-started)

## Clone the repository
git clone https://github.com/tbaccata/amica.git

## Move to the folder
cd amica

## Build amica, the -t flag is the name of the docker image
docker build -t amica .

## Start amica from terminal

docker run -p 3838:3838 amica

## Open local interface

http://localhost:3838/amica


Deploy amica with ShinyProxy

When deploying a Shiny application with ShinyProxy, the application is simply bundled as an R package and installed into a Docker image. Every time a user runs an application, a container spins up and serves the application.

Detailed documentation is provided here (https://www.shinyproxy.io/documentation/).

A minimum working example based on documentation (https://www.shinyproxy.io/documentation/deployment):


## install docker image for amica (follow the above instructions)
git clone https://github.com/tbaccata/amica.git
cd amica
docker build -t amica .
 
## download latest version and install it (for debian based systems)
wget https://www.shinyproxy.io/downloads/shinyproxy_2.5.0_amd64.deb
sudo dpkg -i shinyproxy_2.5.0_amd64.deb

## enable system process
sudo systemctl enable shinyproxy

## Add amica into specs part of the server /etc/shinyproxy/application.yml:
## In this file you can also specify the port for shinyproxy.

specs:
  - id: amica
    display-name: amica Shiny App
    description: Analysis and visualization tool for quantitative MS
    container-cmd: ["R", "-e", "shiny::runApp('/root/amica')"]
    container-image: amica
    access-groups: [scientists, mathematicians]

Used libraries and resources

  • (Differential expression analysis) limma: Ritchie, Matthew E., et al. "limma powers differential expression analyses for RNA-sequencing and microarray studies." Nucleic acids research 43.7 (2015): e47-e47.
  • (Differential expression analysis) DEqMS: Zhu, Yafeng, et al. "DEqMS: a method for accurate variance estimation in differential protein expression analysis." Molecular & Cellular Proteomics 19.6 (2020): 1047-1057.
  • (ORA) gprofiler2: Raudvere, Uku, et al. "g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update)." Nucleic acids research 47.W1 (2019): W191-W198.
  • (PPI Networks) IntAct: Orchard, Sandra, et al. "The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases." Nucleic acids research 42.D1 (2014): D358-D363.
  • (Subcell. localization) Human CellMap: Go, Christopher D., et al. "A proximity-dependent biotinylation map of a human cell." Nature (2021): 1-5.
  • (Heatmaply) heatmaply: Galili, Tal, et al. "heatmaply: an R package for creating interactive cluster heatmaps for online publishing." Bioinformatics 34.9 (2018): 1600-1602.

amica's People

Contributors

tbaccata

amica's Issues

Heatmap error

Hi, everything works nicely for me and I find amica outstanding; the only thing I can't produce is heatmaps. For some reason I get this error: "Not all levels of the column_side_colors are mapped in the column_side_palette". Could you help with this?
Thanks a lot!

`RawIntensity` values identical to `ImputedIntensity`

Hello,

During a recent data analysis, I observed that the RawIntensity values in the exported amica file are identical to the ImputedIntensity values. This occurred after a run with vsn normalization and no imputation.

Steps to Reproduce:

  1. Conduct a run with vsn normalization.
  2. Ensure no imputation is selected.
  3. Export the amica file.
  4. Inspect the values of RawIntensity and ImputedIntensity.

I suspect this behavior might be a bug. Would you be able to provide clarification or confirm the issue?

Kind Regards, Bola

Can't visualize tables

Hi, when running the app in RStudio I repeatedly get the following warning in the console: "Warning in is.call(funBody[[idx]]) && as.character(funBody[[idx]][[1]]) == : 'length(x) = 3 > 1' in coercion to 'logical(1)'". I can't visualize the table with the differentially expressed proteins, but all the figures are produced normally. Could you help solve this issue?
Thanks

Error while creating a Heatmap in the Differential Abundance tab

Hello,

I encountered an error when trying to generate a heatmap using the "Differential Abundance" tab.

Steps to Reproduce:

  1. Navigate to the Differential Abundance tab.
  2. After running the DE, go to Heatmap.
  3. Press "Submit" to create a heatmap.

After pressing "Submit", it gave two errors:
a) in the UI:
Error: NA/NaN/Inf in foreign function call (arg 10)

b) in the cmd:
Warning: Error in hclustfun: NA/NaN/Inf in foreign function call (arg 10)
107: %||%
104: modify_list
103: config
101: renderPlotly [\amica\amica/server.R#1571]
100: func
97: shinyRenderWidget
96: func
83: renderFunc
82: output$compareHeatmap
1: runApp

The "Download heatmap data" function appears to work correctly. However, upon inspection of the downloaded data, there are several NA values present in the table. This might be related to the error encountered.

Regards,
Bola
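
The NA values mentioned in this report are a plausible cause: `hclust` stops with an "NA/NaN/Inf in foreign function call" error whenever the distance matrix itself contains NA, which happens when two rows share no observed column. The toy matrix below is an assumption made for illustration; it reproduces the failure and shows one possible workaround (clustering only fully observed rows), which is not necessarily how amica handles it.

```r
# Toy matrix: rows "a" and "b" have no column in which both are observed.
m <- rbind(a = c(1, 2, NA, NA),
           b = c(NA, NA, 3, 4),
           c = c(1, 2, 3, 4),
           d = c(2, 3, 4, 5),
           e = c(5, 6, 7, 8))

d <- dist(m)
has_na_distance <- anyNA(d)   # the a-b distance cannot be computed

# hclust() fails on such a distance matrix; capture the message instead of stopping.
msg <- tryCatch(hclust(d), error = function(e) conditionMessage(e))

# Possible workaround: cluster only rows without missing values.
m_ok <- m[complete.cases(m), , drop = FALSE]
hc   <- hclust(dist(m_ok))
```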

"No data uploaded" message with custom input

Hi,
Firstly, let me say I think this tool looks amazing. I am struggling a bit with the input formats, however. I have an output from DIA-NN (report.pg-matrix.tsv), which contains protein ID, gene name and LFQ intensity for my runs. I have formatted this into the format required for custom file upload, including a specification file to let amica know my headers. When I click Analyze, I get a small popup in the right corner saying "no data uploaded". I am not sure what to make of this (amica reports upload complete when I add the proteingroups.txt and other files).
I would process with FragPipe or something else if that was easier, but to my knowledge FragPipe does not yet support diaPASEF data.
Thanks in advance
Silas

Amica frozen

Hello,

I am using Amica for the first time. I've uploaded my data (QC'ed MetaMorpheus output) using the custom format option as well as the experimental design, contrast matrix, and specification file. However, after hitting upload I do not get an error and the application appears to be frozen. It is unclear if it is running and even after several hours looks the same. Can you please advise?

Thank you.

Minor bug fix and suggestion

Hello again,
I just wanted to report a minor bug, which I've encountered: It's not possible for me to label individual proteins in the QC correlation dot plots. I can use the selection tools, however no labels appear.

Suggestion (not demand :D):
Regarding the volcano plots, it would be really nice if it was possible to fix labeled proteins in the volcano plot, so that the labeling can be fine-tuned.
Best
Andree

Error processing DIA-NN data

Hi,

I managed to upload data from DIA-NN correctly, but when I click on Analyze the app crashes and the console prints the following error:

Warning: Error in :: argument of length 0

Is there anything I'm missing?

Thanks
Lucio

After uploading files, nothing happened in the web and error message in R

Hi, I would like to use Amica for my proteomic data.
However, when I uploaded my files, the screen changed to grey and does not move from there.
So, I tried to run Amica from R and there was an error message like below.

Warning: Error in $<-.data.frame: replacement has 0 rows, data has 48

Could you tell me what I did wrong?

Kind regards,
J.

Warning: Error in [[: attempt to select less than one element in get1index 1: runApp

Hello,
Thank you for developing this tool. I am using FragPipe output. I used to be able to use it, but now I am stuck on a faded grey page after clicking on Analyze, both on the web version and the R version. On the R version, this error is displayed in the terminal: "Warning: Error in [[: attempt to select less than one element in get1index 1: runApp".
Could you help, please?
Thank you in advance for your help.

Input file firsttwolines.txt
Contrast file contrast.txt
Design file design.txt

Enrichment analysis and gene names

Hello and great work on this tool! Hope that you are able to get this published very soon :)

I would like to request a new feature for the enrichment analysis. Could you please have the genes from the input that are associated with each enrichment term be output in a column in the output table? This would be great to know which genes are the reason for a certain enrichment.

Thanks!

Sample order on x axis not working

Hello,

I tried to adjust the order of samples on the x-axis for the QC & intensity plots from here
image

but unfortunately this is not reflected in the graph as shown here:
image

am I missing something here?

Thank you so much for your appreciated help.

Best Regards,
Bola

Feature request: custom background in ORA

Add option to choose only quantified proteins as background gene list in over-representation analysis.
By default, all protein-coding genes are considered as background.
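
Why the choice of background matters can be sketched with a hypergeometric test in base R. All counts below are invented for illustration, and `ora_p` is a hypothetical helper; the actual enrichment in amica is delegated to gprofiler2.

```r
# Over-representation p-value: probability of seeing at least `hits`
# term members in a query of `query_size` genes drawn from the background.
ora_p <- function(hits, query_size, term_size, background_size) {
  phyper(hits - 1, term_size, background_size - term_size,
         query_size, lower.tail = FALSE)
}

# Hypothetical query: 100 significant proteins, 15 of them annotated to a term.
# Background A: all protein-coding genes (assumed 20000, 400 in the term).
p_genome <- ora_p(hits = 15, query_size = 100,
                  term_size = 400, background_size = 20000)

# Background B: only quantified proteins (assumed 3000, 300 in the term).
p_quantified <- ora_p(hits = 15, query_size = 100,
                      term_size = 300, background_size = 3000)
```

In this made-up case the term looks far more significant against the genome-wide background than against the quantified proteins, because the term is already enriched among the proteins that were measurable at all.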

Maximum upload size exceeded

Hi,
I am trying to analyze MaxQuant data on a local installation of amica. But when I upload the proteinGroups.txt file, a message appears stating that the maximum upload size was exceeded. How can I solve this issue?
image

Regards,
Bola
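
For local installations, Shiny's upload limit is controlled by the `shiny.maxRequestSize` option, which defaults to 5 MB. A minimal sketch, assuming the option is set in the R session before the app is started and is not overridden elsewhere in the app code (the 500 MB value is an arbitrary example):

```r
# Raise Shiny's per-file upload limit to 500 MB (value is given in bytes).
options(shiny.maxRequestSize = 500 * 1024^2)

# Then start the app as usual from the amica folder:
# shiny::runApp()
```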

"Friday deploy" amica FragPipe v18 support

This issue was already resolved, see (#15).

But somehow the functionality on the server broke completely when this fix was applied, hence the version on the server does not currently support FragPipe v18 output.
This does not affect the code on GitHub; running locally should work without issues!
Hopefully this will be resolved soon. Something must have gone wrong with docker build, as the code seems fine.

The full functionality is back again on the public server. Very sorry!

Best,
Sebastian

Error: names in input file does not match with design file

Hello,

I am trying to analyse DIA data using amica. I uploaded the PG matrix together with the design and contrast files in the required format. However, I keep getting an error saying that the sample names in the input file don't match those in the design file. I have tried several times, also with changed names. Could you please help me find where I am going wrong? I have attached the files from two trials (sample names differ between the trials). I changed the extension of the PG matrix file to .txt as I cannot upload .tsv files here.
contrasts_trial1.txt
design_trial1.txt
contrasts_trial2.txt
design_trial2.txt
report.pg_matrix_trial1.txt
[report.pg_matrix.tsv__trial2.txt](https://github.com/tbaccata/amica/files/15292070/report.pg_matrix.tsv__trial2.txt)

Thank you
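
A quick way to narrow down such a mismatch is to compare the sample names of the design file with the column headers of the input file before uploading. The names below are hypothetical, with a trailing space planted in one header; for real files the headers could be read with `names(read.delim(path, nrows = 1, check.names = FALSE))`.

```r
# Hypothetical column headers of the input file; note the trailing space in "ko_2 ".
input_cols <- c("Protein.Group", "Genes", "ctrl_1", "ctrl_2", "ko_1", "ko_2 ")

# Hypothetical sample names from the design file.
design_samples <- c("ctrl_1", "ctrl_2", "ko_1", "ko_2")

# Samples named in the design that have no exactly matching input column.
missing_in_input <- setdiff(design_samples, input_cols)

# The same comparison after trimming surrounding whitespace from the headers.
missing_after_trim <- setdiff(design_samples, trimws(input_cols))
```

Here `missing_in_input` reports `ko_2`, and the mismatch disappears once whitespace is trimmed. Full file paths used as column headers, or differences in upper and lower case, would show up in the same way.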

"X" prefix added to sample names upon uploading experimental design file

Hello,

I've encountered an issue where an "X" prefix gets added to sample names when I upload the experimental design file.

Below is a screenshot from the original .txt file, where the sample names don't have the "X" prefix:
image

However, when uploaded to the amica server, the "X" prefix appears:
image

Is there any workaround for this issue? The prefix also gets carried over to the graphs, which is not ideal.

Thank you for your assistance!
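
One plausible source of the prefix is R itself: names that start with a digit are not syntactically valid, so `make.names` (applied by `read.delim` when `check.names = TRUE`, the default) prepends an "X". Whether amica relies on this behaviour is an assumption here; the sketch only demonstrates the mechanism with made-up sample names. Renaming samples so that they start with a letter avoids the conversion.

```r
# make.names() turns names starting with a digit into valid R names.
fixed <- make.names(c("388_Control", "405_KO", "ctrl_1"))

# The same happens to column headers when a file is read with default settings.
tmp <- tempfile(fileext = ".txt")
writeLines(c("protein\t388_Control\t405_KO", "P1\t1.5\t2.5"), tmp)

default_names  <- names(read.delim(tmp))                       # check.names = TRUE
verbatim_names <- names(read.delim(tmp, check.names = FALSE))  # headers kept as-is
```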

Error

Hi, I am trying to analyse my data from MQ and am getting the following error: "error in limma Error in names(x) <- value: 'names' attribute [2] must be the same length as the vector [0]". Could you please tell me what the problem could be?

Also, the example files are in .csv format and Amica asks for tsv files. Is it possible to change that?

error in uploading data from DIA-NN

Hello,

when trying to upload DIA-NN output, this error pops up:
"simpleWarning in readLines(inFile$datapath, n = 1): line 1 appears to contain an embedded nul"

How to fix it?

Thanks
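
This warning usually means that the file is not plain single-byte or UTF-8 text, for example because it was saved as UTF-16 or is actually a binary file such as a spreadsheet. The sketch below is an illustration under that assumption and not a confirmed diagnosis of this report: it creates a small UTF-16LE file, detects the nul bytes, and reads the header through a connection with an explicit encoding.

```r
# Create a small UTF-16LE encoded file to mimic the problem.
tmp   <- tempfile(fileext = ".tsv")
bytes <- iconv("Protein.Group\tGenes\n", from = "UTF-8", to = "UTF-16LE",
               toRaw = TRUE)[[1]]
writeBin(bytes, tmp)

# Check the first bytes of the file for nul bytes.
head_bytes <- readBin(tmp, what = "raw", n = 200)
has_nul    <- any(head_bytes == as.raw(0))

# If the file is UTF-16LE text, read it with an explicit encoding.
con    <- file(tmp, open = "r", encoding = "UTF-16LE")
header <- readLines(con, n = 1)
close(con)
```

If nul bytes are found, re-exporting the table as UTF-8 tab-separated text before uploading is one option.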

analyzing DIA data

Dear amica developer team,
Thank you for developing this really useful tool. The design and implementation of the shiny app is nicely done.
Have you tested amica for DIA data? In particular, I am interested in analyzing DIA output from Spectronaut.
Thanks

Manual submission of protein list for highlighting in volcano plot

Hi Sebastian,

I was asked by a user of Amica if it would be possible to add an additional protein selection option to the Volcano plot in the "Differential abundance" tab? Specifically, it would be useful to have the option to manually submit a list of protein identifiers (Uniprot IDs, Gene names, ...), which are then highlighted and labeled in the volcano plot, similar to what is happening if you use the selection tool. Optimally, the highlighted dots would be plotted above the other points and highlighted with a different color.

Please let me know if it is possible to add this feature.

Best,
David

Error while creating a Dotplot in the Differential Abundance tab

Hello,

I encountered an error when trying to generate a dotplot in the "Differential Abundance" tab.
In the UI: it shows in red: Error: 'list' object cannot be coerced to type 'double'
In the cmd:
Warning: Error in dist: 'list' object cannot be coerced to type 'double'
171:
170: stop
169: dotplot
168: renderPlot [\amica\amica/server.R#1174]
166: func
126: drawPlot
112: reactive:plotObj
96: drawReactive
83: renderFunc
82: output$dotplot
1: runApp

Thanks for your help.

Regards,
Bola

TMT

Is this software compatible with Fragpipe TMT experiments?

Awesome

I just wanted to quickly express my gratitude for your great tool. I really enjoy using it.
I would only like to suggest enabling different ways of depicting protein profile plots across several samples. It would be great if you could make it possible to depict all individual data points in each sample. Another great way to depict the profile plots would be via violin plots.

To reach a large user base, I think it would be good for this tool to function independently of RStudio etc. I am only a little familiar with coding outside R and had to troubleshoot a lot to get it running, and I feel this may be a high entry barrier for people.
However, I think it was worth it ;)
Cheers

Custom file uploading issue

Hi,
I have output from DIA-NN. Would you be able to help with the correct formatting?

Protein.Group | Protein.Ids | Protein.Names | Genes | First.Protein.Description | 388 | 399 | 404 | 405 | 407 | 408
388 _Control
399_Control
404_Control
405_KO
407_KO
408_KO

Best,

Dot plots in Diff. abundance tab

Implement Dot plots, from https://prohits-viz.org/help/analysis/dotplot:

Dot plots have the advantage over heat maps in that they use the same amount of space but visualize more information. In addition to raw quantitative values being displayed via coloured circles, dot plots display the relative readout measurement between conditions via circle size and confidence in the measurement via coloured edge. Heat maps, however, are better for presenting very large data sets as the detailed information of a dot plot gets lost in these instances.

image

Things to consider

  • User defined significance thresholds (p-value vs adj.p-value)
  • Pilots without p-values
  • Adjustable color gradients (log2FCs)
  • User selection: group comparisons
  • Hier. clustering (user def. row and column order as input?)
  • Relative abundance: intensities and/or avg. spectral counting (e.g select option?)

requesting a new QC feature

Dear Sebastian and amica team,
Thanks again for all your work to make this useful tool publicly available.
I understand that imputation is necessary for some of the features in amica; however, it may affect the analysis of some proteomics data. This includes the analysis of IP samples, where the bait protein and some potential binders are only present in one group, or the proteomic analysis of samples with knocked-out genes. Are there any options in amica to account for these cases?
Can we have a new feature to generate new plots (maybe Venn?) for the samples before imputation?
Thanks
