romanhaa / cerebro

Visualization of scRNA-seq data.

License: MIT License


cerebro's Introduction

⚠️ Discontinuation notice: Sadly, Cerebro and cerebroApp are no longer in active development. See here for more info.

Cerebro

Table of Contents

Introduction to the Cerebro interface
Motivation
Installation
- Details: cerebroApp R package
- Details: romanhaa/cerebro Docker container
Example data sets
Conversion of other single cell data formats
Technical notes
Building from source
- On macOS
- On Windows
Troubleshooting
Credits
Contribute
Citation
License

This is the standalone version of Cerebro (cell report browser), currently available for macOS and Windows, which allows users to interactively visualize various parts of single cell transcriptomics data without requiring bioinformatic expertise.

The core of Cerebro is the cerebroApp Shiny application which is bottled into a standalone app using Electron. Therefore, it can also be run on web servers and Linux machines, requiring only R and a set of dependencies.

Input data needs to be prepared using the cerebroApp R package which was built specifically for this purpose. It offers functionality to export a Seurat object (both v2 and v3 are supported) to the correct format in a single step. The file should be saved either with the .crb or .rds extension, indicating that internally it is an RDS object. Furthermore, the cerebroApp package also provides functions to perform a set of (optional) analyses, e.g. gene set enrichment analysis, pathway enrichment analysis based on marker gene lists of groups of cells, and more.

The exported .crb file is then loaded into Cerebro and shows all available information.

Key features:

Interactive 2D and 3D dimensional reductions.
Sample and cluster overview panels.
Tables of most expressed genes and marker genes for samples and clusters.
Tables of enriched pathways for samples and clusters.
Query gene(s) and gene sets from MSigDB and show their expression in dimensional reductions.
NEW Visualize trajectories calculated with Monocle v2.
All plots can be exported to PNG. In addition, 2D dimensional reductions can be exported to PDF.
Tables can be downloaded in CSV or Excel format.

Basic examples for Seurat v2 and v3 and scanpy workflows and subsequent exporting can be found in the examples folder. There you can also find the raw data and the output file that can be loaded into Cerebro.

Further screenshots can be found in the screenshots folder.

Introduction to the Cerebro interface

Below you find a brief description of what each panel of the Cerebro interface shows.

For a more detailed description, written for biologists without computational expertise, head over here.

Load data

Select input file (.rds or .crb). Shows number of cells, samples, clusters, as well as experiment name and organism.
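If a file refuses to load, a quick sanity check of the file itself can help. The sketch below assumes the file was written with R's default saveRDS() compression (gzip), in which case it starts with the bytes 1f 8b. The helper name looks_like_gzip is hypothetical and not part of Cerebro. A pass does not prove that the file holds a valid Cerebro object, and an RDS file saved uncompressed or with bzip2/xz would fail this check while still being a valid RDS file.

```shell
# Succeeds if the file starts with the gzip magic bytes 1f 8b.
looks_like_gzip() {
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Demo on a throwaway file; replace "$demo" with the path to your own .crb or .rds file.
demo=$(mktemp)
printf 'not an RDS file' > "$demo"
if looks_like_gzip "$demo"; then
  echo "gzip-compressed"
else
  echo "not gzip-compressed"
fi
rm -f "$demo"
```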

Overview

Shows 2D and 3D dimensional reductions. Cells can be colored by meta data variables, automatically coloring the cells using a categorical or continuous scale. Cells can be randomly down-sampled to improve performance.

Samples

Shows sample-centric perspective of data.

Composition of samples by cluster as table and plot.
Distribution of number of transcripts and expressed genes by sample.
Distribution of mitochondrial and ribosomal gene expression by sample (if it was computed with cerebroApp).
Cell cycle by sample, either determined by the Seurat function or using Cyclone (if it was computed and assigned during exporting).

Clusters

Shows cluster-centric perspective of data. See info about Samples panel above for more details.

Most expressed genes

If computed in cerebroApp, provides tables of most expressed genes by sample and cluster.

Marker genes

If computed in cerebroApp, provides tables of marker genes by sample and cluster.

Enriched pathways

If computed in cerebroApp, provides tables of enriched pathways in marker gene lists of samples and clusters.

Gene expression

Shows the expression of specified genes in the data set (averaged per cell if multiple genes are given). Calculation is triggered after pressing SPACE or ENTER. Multiple genes must be entered on separate lines or separated by a space, comma, or semicolon. The panel reports which genes are available in the data set and which are missing (or misspelled). Expression levels are shown in dimensional reductions and as violin plots for every sample and cluster. The average expression across all cells of the 50 most expressed genes (among those specified by the user) is shown as well, to quickly spot which genes drive the color scale.
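The accepted separators can be illustrated outside the app. This is a minimal shell sketch, not Cerebro's own parsing code: normalize_gene_list is a hypothetical helper that puts one gene per line, which can be handy for tidying a gene list before pasting it into the panel. The gene symbols are arbitrary examples.

```shell
# Put one gene per line: commas, semicolons, spaces, and tabs become newlines,
# then empty lines are dropped.
normalize_gene_list() {
  tr ',; \t' '\n\n\n\n' | sed '/^$/d'
}

# Mixed separators and a blank line, as a user might type them.
printf 'CD3E, CD4;CD8A\n\nMS4A1 CD19\n' | normalize_gene_list
```

The example prints the five gene symbols on five separate lines. Whether a gene is actually present in a data set can only be checked by Cerebro itself after loading the file.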

Gene set expression

Works like the gene expression panel, except that it lets you select gene sets from MSigDB (requires an internet connection). Only available for human and mouse data.

Trajectory

This tab gives access to trajectory information, if data is available. Currently, we support trajectories generated by Monocle v2, which can be extracted through cerebroApp::extractMonocleTrajectory(). Multiple trajectories can be added to a single Seurat object, so the user needs to choose which of the available trajectories to visualize. Several interactive plots are shown, including the dimensional reduction, the distribution of categorical variables along pseudotime, the composition of transcriptional states by sample and cluster, and the distribution of transcript counts and number of expressed genes by state.

Gene ID conversion

Provides a table for converting gene IDs and names. Includes GENCODE identifier, ENSEMBL identifier, HAVANA identifier, gene symbol and gene type. Only available for mouse and human. Based on GENCODE annotation version M16 (mouse) and version 27 (human).

Analysis info

Overview of parameters that were used during the analysis, as long as they were provided. Also shows list of mitochondrial and ribosomal genes present in the data set if computed with cerebroApp.

Motivation

Single cell RNA-sequencing data is rich and complex. Allowing experimental biologists to explore the results is beneficial for the iterative scientific process of performing analysis and deriving conclusions. Cerebro provides an easy way to access the data without any bioinformatic expertise.

Installation

For people without any experience in using the command line, getting access to Cerebro is probably easiest by downloading Cerebro for your OS from here, then unpacking and launching it. Currently, Cerebro is available only for macOS and Windows.

More experienced users of all platforms can alternatively launch the app through the dedicated cerebroApp R package (which is the core of Cerebro) or the romanhaa/cerebro Docker container.

Please check the table below for an overview of the supported operating systems and requirements of each way to start Cerebro.

	Standalone desktop application	cerebroApp R package	Docker container
Link	Releases	GitHub	Docker Hub
Supported OS	macOS, Windows	macOS, Windows, Linux	macOS, Windows, Linux (not all tested)
Requirements	-	R (3.5.1 or higher)	Docker client
Installation	Download current release from GitHub repository	Through BiocManager::install()	Pull container from Docker Hub
Launch Cerebro	Double-click executable	Inside R	Start container

Details: `cerebroApp` R package

Requirements: R (version 3.5.1 or higher)

A convenient IDE would be RStudio but it can be done from any R session. Make sure to install cerebroApp using BiocManager::install() to get the most recent version of dependencies on Bioconductor.

BiocManager::install("romanhaa/cerebroApp")
cerebroApp::launchCerebro()
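Before installing, it can be worth confirming that the installed R meets the 3.5.1 requirement. The following is a small sketch in POSIX shell. version_ge is a hypothetical helper written for this example, and the only R function used is getRversion(). The helper sets a few plain shell variables (have, need, h, n) because POSIX sh has no local variables.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Components are compared numerically, so 3.10.0 counts as newer than 3.5.1.
version_ge() {
  have=$1
  need=$2
  while [ -n "$have" ] || [ -n "$need" ]; do
    # Take the leading component of each version; missing components count as 0.
    h=${have%%.*}
    n=${need%%.*}
    h=${h:-0}
    n=${n:-0}
    if [ "$h" -gt "$n" ]; then return 0; fi
    if [ "$h" -lt "$n" ]; then return 1; fi
    # Drop the component that was just compared.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $need in *.*) need=${need#*.} ;; *) need= ;; esac
  done
  return 0
}

# Ask R for its version only if Rscript is available on PATH.
if command -v Rscript >/dev/null 2>&1; then
  r_version=$(Rscript -e 'cat(as.character(getRversion()))')
  if version_ge "$r_version" 3.5.1; then
    echo "R $r_version meets the 3.5.1 requirement"
  else
    echo "R $r_version is older than 3.5.1; please upgrade"
  fi
else
  echo "Rscript not found on PATH; install R first"
fi
```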

Details: `romanhaa/cerebro` Docker container

Requirements: Docker client

docker pull romanhaa/cerebro:latest
docker run -p 8080:8080 -v <export_folder>:/plots romanhaa/cerebro
# for example
docker run -p 8080:8080 -v ~/Desktop:/plots romanhaa/cerebro

Then, in your browser, navigate to the address printed in the terminal, e.g. 127.0.0.1:8080.

Note 1: Binding a local directory with -v <export_folder>:/plots is only necessary if you want to export dimensional reductions from Cerebro.

Note 2: If you need to change the port, you can do that like this:

docker run -p <port_of_choice>:8080 -v <export_folder>:/plots romanhaa/cerebro
# OR
docker run -p <port_of_choice>:<port_of_choice> -v <export_folder>:/plots romanhaa/cerebro Rscript -e 'shiny::runApp(cerebroApp::launchCerebro(), port=<port_of_choice>, host="0.0.0.0", launch.browser=FALSE)'

Example data sets

We provide documentation and commands for the following example data sets:

pbmc_10k_v3: single sample of human peripheral blood mononuclear cells
GSE108041: 4 samples of A549 cells before and after infection with influenza virus
GSE129845: 3 samples of human bladder cells (from 3 patients)

Conversion of other single cell data formats

Currently, the cerebroApp R package only provides a function to export a Seurat (v2 or v3) object to the Cerebro input file. However, there are a few other important single cell data storage formats, e.g. AnnData (used by scanpy), SingleCellExperiment (used by scran and scater), and CellDataSet (used by Monocle).

We believe using the existing network of conversion/exporting functions is more efficient than creating a dedicated export function for scanpy data. To highlight how data processed with scanpy (stored in AnnData format) can be prepared for loading into Cerebro, we have prepared a scanpy-based workflow for the pbmc_10k_v3 example data set.

In the figure below, we highlight how you can generate the Cerebro input file from any of the four major formats.

Technical notes

Cerebro is a Shiny app that is bottled into a standalone application using Electron.
Plotting relies heavily on ggplot2 and plotly.
Tables are built with formattable.
Access to MSigDB through msigdbr.

Building from source

On macOS

To package Cerebro you need Git and Node.js (which comes with npm) installed on your computer. Then, from the command line, run:

# clone this repository
git clone https://gitlab.com/romanhaa/Cerebro.git
# install Electron packager
npm install electron-packager --global
# go into the repository
cd Cerebro
# install dependencies
npm install
# run the app
npm start
# build the app
npm run package-mac
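The steps above assume that git, node, and npm are already on your PATH. A minimal sketch for checking this up front is shown below. missing_tools is a hypothetical helper, not part of the repository, and it only tests whether the commands exist, not which versions are installed.

```shell
# Print the name of every listed command that is not available on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the prerequisites named in the build instructions.
missing=$(missing_tools git node npm)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "git, node, and npm were found"
fi
```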

To build the Windows version under macOS, it is necessary to install Wine. I experienced problems with missing libraries in the stable version (4.0), so I recommend installing the development version (4.4) with Homebrew:

brew tap caskroom/versions
brew update
brew install caskroom/versions/wine-devel
npm run package-win

On Windows

If you're using Linux Bash for Windows, see this guide or use node from the command prompt.

Troubleshooting

If the app shows a blank/white window, press CMD+R (macOS) or CTRL+R (Windows) to refresh the page. Especially on slower machines, the interface can load before the Shiny application has launched.

Credits

Columbus Collaboratory laid out the basics of using Electron to create a standalone Shiny application: https://github.com/ColumbusCollaboratory/electron-quick-start
Color palettes were put together with colors from: https://flatuicolors.com/
The initial app icon (until v1.2) was made by Kiranshastry from https://www.flaticon.com and is licensed by CC 3.0 BY

Contribute

To report any bugs, submit patches, or request new features, please log an issue through the issue tracker. For direct inquiries, please send an email to [email protected].

Citation

If you used Cerebro for your research, please cite the following publication:

Roman Hillje, Pier Giuseppe Pelicci, Lucilla Luzi, Cerebro: interactive visualization of scRNA-seq data, Bioinformatics, btz877, https://doi.org/10.1093/bioinformatics/btz877

License

The MIT License (MIT)

cerebro's Issues

New panel with useful links about methods.

How UMAP works: https://umap-learn.readthedocs.io/en/latest/how_umap_works.html
How to interpret distances in UMAP : lmcinnes/umap#92
How to effectively use t-SNE: https://distill.pub/2016/misread-tsne/
How to interpret distances in t-SNE: https://stats.stackexchange.com/questions/263539/clustering-on-the-output-of-t-sne
Problems of t-SNE: https://stats.stackexchange.com/questions/270391/should-dimensionality-reduction-for-visualization-be-considered-a-closed-probl/270414

Consider using packrat to control R packages.

https://rstudio.github.io/packrat/

Gene set expression with module score?

Hi and thanks for the very useful tool!

Currently gene set expression is calculated as an arithmetic mean, which has pitfalls like sensitivity to outliers. Could you add a feature where expression for gene sets is calculated like the module score in Seurat AddModuleScore? This would provide a sort of per-cell "enrichment" of a gene set, and would be more useful than just the mean.

Best regards,
Daniel

Hide elements that are missing?

Does it make sense to hide elements that have nothing to show (and would therefore only show a text that info is missing)?

Info text is useful because it shows user that there could be more info, it's just not generated.

Example: Cell cycle boxes for samples and clusters. Especially the Cyclone info will probably often be missing.

Cerebro under proxy server

This is a screen capture of running Cerebro under a proxy server:

Under this technical condition:

MacOSX High Sierra v. 10.13.6
no firewall: no Little Snitch, no Hands Off!, etc.
wifi under a proxy server

Input of Monocle trajectory analysis

Hi,
My name is Phoebe.
Thanks for the nice work, with such a clear demonstration!
I wonder what's the input in your trajectory step in this README.md? To my knowledge, the "seurat@assays$RNA@data" stores the normalized UMI count matrix, which might not be recommended to import in the Monocle object as I saw here:

I supposed "seurat@assays$RNA@counts" would be the suggested one as mentioned above?

Any suggestions would be appreciated. Not sure whether I get it wrong or not.
Thank you in advance!

Phoebe

High performance scatter plots.

Candidates 2D:

Candidates 3D:

rthreejs: https://github.com/bwlewis/rthreejs

Improve scalability of dimensional reductions.

3D is doable up to 150,000 cells. At 300,000 cells it is barely usable on my MacBook Pro.

Allow user to assign custom colors to samples and clusters.

White screen not going away

Even when reloading, the white screen remains, and I can't access the app past it. My hardware isn't particularly slow, so I don't know why.

Use qs package for faster RDS file loading.

https://cran.r-project.org/web/packages/qs/readme/README.html

Can I run the whole program on command line in R?

Can I run the whole program on command line in R/Rstudio instead of using the GUI?
Thanks!
Rini

Input files size/cell number limit

Hi and thanks again for this super useful tool!

I wonder if there is any pre-defined limit (number of cells or file size) in the .crb files that can be uploaded to Cerebro? For some data I'm able to upload and explore data with 20,001 cells but the upload prompts an error when uploading a file with 20,002 cells. Is there a way to increase this limit?

Many thanks!

Maximum Upload Size Exceeded in Standalone Version

Hey @romanhaa

First of all: THANK YOU for this amazing tool! I use Cerebro on an everyday basis and my non-bioinformatician colleagues love it for exploring results I share. I'm also sharing Cerebro objects for our new article submissions :)

I'm opening this issue because of a minor issue when using Cerebro on a Windows machine. Similarly to Cerebro implementation in R, an error message appears when trying to upload large files to Cerebro, stating 'Maximum Upload Size Exceeded'.

When I use Cerebro within R, I can easily bypass this by setting MaxFileSize to a larger value. However, I couldn't figure out how to do this when using a Windows standalone version (which I requested to be installed on my lab study room computer). Is there any way to set this value when using Cerebro in the standalone version?

Ability to create custom cell clusters

Hello!

First words: impressive work! I really like Cerebro 🙂

Also, I like some features of Loupe Cell Browser. Namely it is the ability to:

Using mouse to select cells of interest and create a custom cluster.

Using gene expression to create a custom cluster matching expression criteria (e.g. log2(counts) > 1).

Same as 2., but more advanced filters can be specified.

Using custom clusters to do the differential gene expression analysis.
This is possible on both global (my cluster vs. all other cells not in my cluster) and local (my cluster 1 vs. my cluster 2) scale.

Not really a thing Loupe can do, but would it be possible to calculate an enrichment of custom gene set (possibly using custom clusters)?

Would it be possible to implement some of these features? We think analysis of scRNA-seq data is, in general, composed of a lot of manual work, and so we want to provide biologists a tool, which will be able not to only visualize data, but also to do some useful analyses.

Consider the case, when biologist will identify some interesting cell cluster and want to see its differential expression relative to all other cells, and also enriched pathways. I can imagine biologist could give me information I can use to create the cell cluster of interest, but then I have to manually run DEA and GSEA, and share the results. That's very time-consuming and we think such analysis can be easily done in a proper tool (Cerebro 🙂).

Thanks in advance! I think I could contribute to Cerebro, but I am not a Shiny expert ☹️

Can't open file using Cerebro Shiny app

Good day,

I can't load a .crb file using the Shiny app. After launching cerebroApp::launchCerebro(maxFileSize =100000), the file does not load. The Cerebro file was created successfully. I do not know what the issue might be.

crb file:

[14:05:40] Start collecting data...

[14:05:40] Overview of Cerebro object:


class: Cerebro_v1.3
cerebroApp version: 1.3.0
experiment name: ascites_prettx
organism: hg
date of analysis: 2020-12-21
date of export: 2020-12-21
number of cells: 19,325
number of genes: 20,762
grouping variables (2): orig.ident, cell_type
cell cycle variables (1): Phase
projections (3): mnn, umap, UMAP_3D
trees (1): cell_type
most expressed genes: orig.ident, cell_type
marker genes:
  - cerebro_seurat (2): orig.ident, cell_type
enriched pathways:
  - cerebro_seurat_enrichr (2): orig.ident, cell_type, 
  - cerebro_ssGSEA_go (2): orig.ident, cell_type
trajectories:
extra material:


[14:05:40] Saving Cerebro object to: cerebro_ascites_prettx_2020-12-21.crb

[14:06:18] Done!

Console output:

...
Warning in writeBin(bytes, req$.bodyData) :
  problem writing to connection
Warning in writeBin(bytes, req$.bodyData) :
  problem writing to connection
Warning in file(filename, open = "wb") :
  cannot open file '/tmp/RtmpSAaHxk/a88d7f66f4c38577f3b227ea/0.crb': Disk quota exceeded
Warning: Error in file: cannot open the connection
  [No stack trace available]

App screenshot:

Thanks in advance for the help

Use loading animations for plots.

Is it even useful?

Show sample/cluster label on top of cells in exported dimensional reduction plots.

Gene expression tab: clean up "textAreaInput" automatically.

Remove empty lines.
Remove genes that are not present in the data?

Cholmod error in performGeneSetEnrichmentAnalysis(); large dataset

Hello,
Thank you for your amazing work! Our biologists greatly appreciate your application!

I am currently working on a large dataset (around 60,000 cells for 40,000 genes), and I have an error during the GSEA.

sobj <- cerebroApp::performGeneSetEnrichmentAnalysis(object = sobj, assay = "RNA", GMT_file = gmt.file, parallel.sz = 4)
[16:03:25] Loading gene sets...
[16:03:25] Loaded 50 gene sets from GMT file.
[16:03:25] Extracting transcript counts...
Error in asMethod(object) :
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

I don't have this problem with a small dataset.
I tried with more memory (I work on a computing cluster), but the problem persists (I allocated 150 GB, but it uses only 70 GB).
Do you have a solution?

Docker container to launch application.

Error: object 'seurat' not found when running 'exportFromSeurat' function

Demo
Here is a minimal reproducible example which uses the official Seurat's object and exports it to cerebro file via 'exportFromSeurat'.

library(Seurat)
library(cerebroApp)
cerebroApp::exportFromSeurat(object=pbmc_small, file='./crb.rds', organism='hg', column_cluster='res.0.8', column_sample='orig.ident', experiment_name='pbmc')

Problem
The error is:

Error in cerebroApp::exportFromSeurat(object = pbmc_small, file = "./Downloads/crb.rds", :
object 'seurat' not found

Possible Solution
I suspect the variable name 'seurat' is hard-coded in the function. See these lines in the source code: https://github.com/romanhaa/cerebroApp/blob/e830e9b7191db75214a3fca838e95b9373ba75ed/R/exportFromSeurat.R#L363-L373

It might be the variable named 'export' in the function.

Use shinyEventLogger for logging.

https://cran.r-project.org/web/packages/shinyEventLogger/vignettes/shinyEventLogger.html

unable to find required package ‘Seurat’ on data load with docker

I tried both
docker run -p 8080:8080 -v ~/Desktop:/plots romanhaa/cerebro:latest
and
docker run -p 8080:8080 -v ~/Desktop:/plots romanhaa/cerebro:v1.1.1

when I load the .rds using the browser I get the error unable to find required package ‘Seurat’

Windows application stuck on white screen.

When starting the Cerebro v1.1 windows app, it loads up to just a white screen. The log screen shows the following:

[2020-03-21 15:50:46.039] [info] stderr:
 Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  namespace 'dplyr' 0.8.0.1 is being loaded, but >= 0.8.3 is required
Calls: <Anonymous> ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

[2020-03-21 15:50:46.049] [info] stderr:
 Execution halted

[2020-03-21 15:50:49.125] [info] mainWindow loaded
[2020-03-21 15:53:08.103] [info] window-all-closed

Loading in multiple seurat object

Hi,

Thanks for the great app. I find it very useful to use your app. However, currently I am facing some issues with loading in files.

Based on your example, you load in 4 different H5 files which were produced by 10X Cell Ranger.

For me, I have a hashtag data. I proceeded to demultiplex it and currently I have one big seurat object with all the identifiers. I proceeded to break it down into 5 individual seurat objects (I have 5 groups) and I am lost here. I don't know how to load it into CerebroApp.

Could you offer some advice?

Thank you very much.

regards,
Fong

Cannot find 'print' in this Seurat object

I successfully loaded the seurat object using
cerebroApp::launchCerebro(maxFileSize =8000)
But I get an error when I click on any analysis tab,
Warning: Error in : Cannot find 'print' in this Seurat object
[No stack trace available]
Warning: Error in : Cannot find 'print' in this Seurat object
74:
Warning: Error in : Cannot find 'print' in this Seurat object
108:
Warning: Error in : Cannot find 'print' in this Seurat object
74:
Warning: Error in : Cannot find 'print' in this Seurat object
104:
Warning: Error in : Cannot find 'print' in this Seurat object
104:
Warning: Error in : Cannot find 'print' in this Seurat object
104:

What does this mean?
Thanks,
Rini

Manual X and Y axis settings don't work anymore in overview panel.

Error: cannot add bindings to a locked environment