parkerici / grappolo



R package to cluster single-cell data and generate features that can be used for model building

License: GNU General Public License v3.0

R 100.00%

clustering fcs-files flow-cytometry mass-cytometry single-cell

grappolo's Introduction

Please use GitHub issues to report bugs and to make feature requests.

Installation

Install the flowCore package:

# If using a version of R >= 3.6
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("flowCore")
# else (older versions of R only)
source("http://bioconductor.org/biocLite.R")
biocLite("flowCore")

Make sure devtools is installed on your system:

install.packages("devtools")

Install grappolo with the following command:

devtools::install_github("ParkerICI/grappolo")

Usage

This is an R package for clustering single-cell flow cytometry data and generating features to be used in model building. The output of this clustering can be used to generate different types of visualizations with the vite package.

The following snippets show example usage. Documentation for all functions can be accessed directly in R (for example with ?cluster_fcs_files).

Clustering

Given a set of FCS files, two modes of clustering are possible:

- Each file is clustered separately
- Data from multiple files is pooled together before clustering

The choice between these two possibilities has very important implications for feature generation and model building (see below).

Assuming an input directory called foo that contains four files:

- A.fcs
- B.fcs
- C.fcs
- D.fcs
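Rather than typing every path by hand, a file vector for cluster_fcs_files can be built from the contents of foo with base R's list.files. This is a minimal sketch: the helper name list_fcs_files is hypothetical (not part of grappolo), and keeping only files with an .fcs extension is an assumption about what you want to cluster.

```r
# Hypothetical helper (not part of grappolo): return the full paths of all
# files in `dir` whose name ends in ".fcs" (case-insensitive), sorted by name
list_fcs_files <- function(dir) {
    list.files(dir, pattern = "\\.fcs$", full.names = TRUE, ignore.case = TRUE)
}

# Example: files.list <- list_fcs_files("foo")
# The result can then be passed to cluster_fcs_files(files.list, ...)
```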

This is how a clustering run is set up if you want to cluster each file individually:

library(grappolo)

# These are the names of the columns in the FCS files that you want to use for clustering.
# The column descriptions from the FCS files are used as names when available (corresponding
# to the $PxS FCS keyword). When descriptions are missing, the channel names are used
# instead ($PxN keyword)

col.names <- c("Marker1", "Marker2", "Marker3")

# Please refer to the documentation of this function for an explanation of the parameters
# and for a description of the output. The output is saved on disk, and the function
# simply returns the list of files that have been clustered
cluster_fcs_files_in_dir("foo", num.cores = 1, col.names = col.names, num.clusters = 200,
    asinh.cofactor = 5)

# You can also specify a list of files directly using the cluster_fcs_files function,
# which takes essentially the same arguments
files.list <- c("foo/A.fcs", "foo/B.fcs")
cluster_fcs_files(files.list, num.cores = 1, col.names = col.names, num.clusters = 200,
    asinh.cofactor = 5)
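Clustering columns are selected by name, and in base R selecting a data frame column that does not exist stops with an "undefined columns selected" error. The sketch below checks col.names against the naming rule described in the comments above before launching a run. Both helpers are hypothetical (not part of grappolo), and treating NA or blank descriptions as missing is an assumption.

```r
# Hypothetical helper: apply the naming rule described above, i.e. use the
# $PxS description when present, otherwise fall back to the $PxN channel name.
# Treating NA or blank descriptions as "missing" is an assumption.
effective_col_names <- function(channel.names, descriptions) {
    descriptions <- as.character(descriptions)
    has.desc <- !is.na(descriptions) & nzchar(trimws(descriptions))
    ifelse(has.desc, descriptions, as.character(channel.names))
}

# Hypothetical helper: return the requested clustering columns that do not
# match any column name derived from the file
missing_col_names <- function(col.names, channel.names, descriptions) {
    setdiff(col.names, effective_col_names(channel.names, descriptions))
}

# With a real file, the two vectors could be taken from flowCore, e.g.:
# p <- flowCore::parameters(flowCore::read.FCS("foo/A.fcs"))
# missing_col_names(col.names, p$name, p$desc)
```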

If instead you want to pool some files together, set up the run as follows:

# Assuming for instance that you wanted to pool A.fcs and B.fcs in group1, and C.fcs
# and D.fcs in group2 (once again please refer to the documentation for details)
files.groups <- list(
    group1 = c("foo/A.fcs", "foo/B.fcs"),
    group2 = c("foo/C.fcs", "foo/D.fcs")
)

cluster_fcs_files_groups(files.groups, num.cores = 1, col.names = col.names,
    num.clusters = 200, asinh.cofactor = 5)
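With many files, writing the groups list by hand gets tedious. Assuming you keep a sample sheet that maps each file to a group (the sample.sheet data frame and its column names are hypothetical), base R's split() builds a named list with the same structure:

```r
# Hypothetical sample sheet: one row per FCS file and the group it belongs to
sample.sheet <- data.frame(
    file = c("foo/A.fcs", "foo/B.fcs", "foo/C.fcs", "foo/D.fcs"),
    group = c("group1", "group1", "group2", "group2"),
    stringsAsFactors = FALSE
)

# split() returns a named list with one character vector of files per group,
# the same shape as the files.groups list written out by hand
files.groups <- split(sample.sheet$file, sample.sheet$group)
```

Note that split() orders the groups by factor level, which for character input means alphabetically.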

Using the GUI

A GUI is available to launch a clustering run. The GUI allows you to specify all the input options in a graphical environment, instead of having to write R code.

To launch the GUI, type the following in your R console:

grappolo::clustering_GUI()

When the GUI starts, you will be prompted to select a working directory. This directory must contain all the files that you want to include in the analysis. Select any file in that directory, and the directory that contains the file will be selected as the working directory.

Output

Both clustering modes output two types of data:

- A summary table of per-cluster statistics
- One or more RDS (R binary format) files containing cluster memberships for every cell event

The summary table contains one row for each cluster and one column for each channel in the original FCS files, with each entry giving the median intensity of that channel in the corresponding cluster. If multiple files have been pooled together, the table also contains columns of the form Marker1@A.fcs, which contain the median expression of Marker1 calculated only on the cells in that cluster that came from sample A.fcs.

The RDS files contain R data frames, where each row represents a different cell and the columns represent the intensity of different markers. A special column called cellType indicates cluster membership.
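To make the link between the two outputs concrete, the sketch below recomputes per-cluster medians from a data frame shaped like the RDS files described above, plus the Marker@sample columns of a pooled run. It is a sketch, not grappolo's own code: it assumes numeric marker columns, and the sample column identifying each cell's source file in a pooled run is a hypothetical name, so check the actual columns of your RDS files first.

```r
# Per-cluster median of each marker, one row per value of cellType
cluster_medians <- function(df, markers) {
    aggregate(df[, markers, drop = FALSE], by = list(cellType = df$cellType),
              FUN = median)
}

# Per-cluster, per-sample medians in wide form, with columns named
# "<marker>@<sample>" as in the pooled summary table. `sample.col` is a
# hypothetical column identifying the file each cell came from.
per_sample_medians <- function(df, markers, sample.col = "sample") {
    out <- data.frame(cellType = sort(unique(df$cellType)))
    for (s in as.character(unique(df[[sample.col]]))) {
        sub <- df[df[[sample.col]] == s, , drop = FALSE]
        med <- cluster_medians(sub, markers)
        names(med)[-1] <- paste(markers, s, sep = "@")
        # Clusters with no cells from this sample get NA
        out <- merge(out, med, by = "cellType", all.x = TRUE)
    }
    out
}

# Usage (the path is hypothetical):
# cells <- readRDS("path/to/clustered_output.rds")
# cluster_medians(cells, c("Marker1", "Marker2"))
```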


grappolo's Issues

grappolo not working when clustering is launched from GUI

Hi, My team is having trouble running the SCAFFoLD package in R. We are using R Studio 3.4.4. The GUI returns a successful completion message, but the warning message in R Studio is:

Warning in parallel::mclapply(files.list, mc.cores = num.cores, mc.preschedule = FALSE, : 21 function calls resulted in an error

Is this an issue with the SCAFFoLD GUI package, the Bioconductor packages, or something else? Thank you! -Courtney

I reinstalled all associated packages, and this is what I am running so far:

- R Console version 3.4.4
- FlowCore from Bioconductor version 3.6 (BiocInstaller 1.28.0)
- devtools from CRAN (revolutionanalytics.com)
- install_github("nolanlab/scaffold")

All running in a Mac OS environment, with a C++ compiler (XCode) installed...

also getting same error using grappolo GUI, unfortunately.

**** I HAVE been able to run grappolo using R script.****

Error in `[.data.frame`(tab, , col.names) : undefined columns selected

Greetings,

I am trying to cluster some fcs files for a scaffold analysis, but I keep getting the same error, regardless of what files or marker names used:

> devtools::install_github("ParkerICI/premessa")
> devtools::install_github("ParkerICI/grappolo")
> devtools::install_github("ParkerICI/vite")
> devtools::install_github("ParkerICI/panorama")
> 
> library(premessa)
> library(grappolo)
> library(vite)
> library(panorama)
> 
> setwd("T:\\CyTOF\\Data for Analysis\\Lynn\\Mouse study\\WD M10 B1")
> 
> ## CLUSTERING
> 
> # These are the names of the columns in the FCS files that you want to use for clustering. 
> # The column descriptions from the FCS files are used as name when available (corresponding
> # to the $PxS FCS keyword). When descriptions are missing the channel names are used
> # instead ($PxN keyword)
> grappolo::cluster_fcs_files_in_dir(num.cores = 1, wd = ".", col.names = col.names, num.clusters = 200, asinh.cofactor = 5, output.dir = "clustered_single_samples")
uneven number of tokens: 753
The last keyword is dropped.
uneven number of tokens: 753
The last keyword is dropped.
Error in `[.data.frame`(tab, , col.names) : undefined columns selected
In addition: Warning message:
No '$PnE' keyword available for the following channels: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60
Using '0,0' as default.

I've checked with FlowJo whether the markers were all the same throughout all the input files, and they seem to be. I have attached a few of the input files we are trying to use, and the session info.
Some help would be appreciated!

Scaffold_Test_Data.zip
Sessioninfo.txt

Thank you and greetings,

Mike

question about output files from grappolo

Hi Federico - is there a way to generate .fcs files after clustering in grappolo? Or rather, is there a script that can convert .RData files from grappolo to .fcs?

Asking for my team, as they had previously analyzed data in SCAFFoLD, used a script to create .fcs files, then opened the resulting .fcs file in a different program such as cyt3, etc.

Please let me know if my question is unclear.

no RData or txt files created after pooled clustering in grappolo

Hi,
I'm having similar issues with grappolo as in the past, except now I am attempting to generate pooled clusters using the clustering_GUI(). Is it reasonable to suspect the same cause as before?
("The problem is due to the use of parallel::mclapply when called from a shiny app. I have opened an issue on the shiny github and will keep you posted rstudio/shiny#2163...")
I do not get an error message in R Studio, but the same symptoms persist: successful clustering message window pops up instantaneously, while no RData or .txt files are generated. Again, in MacOS.

Clara replacement?

Hi,
Did you think this could improve the speed without decreasing quality?
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008625
Best.

Discrepancy between the pooled clustering and SCAFFOLD analysis

Hi Federico,
I have been running a pooled clustering analysis using grappolo::cluster_fcs_files_groups, but then I can't use those data for the Scaffold analysis.
It seems the problem is that grappolo produces one .txt file (all.samples.clustered.txt) with all the data, while the current Scaffold (run_scaffold_analysis) code expects a .txt file for each .fcs file.
Thank you!

Chiara

GUI not working

Hi Federico,

The GUI is not working, but we can still run the program directly on R.
Thanks!