
compass's Introduction


COMPASS

Combinatorial Polyfunctionality Analysis of Single Cells

COMPASS is now available on Bioconductor!

Getting Started

Install the release version of COMPASS with:

library(BiocManager)
BiocManager::install("COMPASS")

or the development version with:

library(BiocManager)
BiocManager::install(version = "devel")
BiocManager::install("COMPASS")

To get an idea of how to use COMPASS, read the vignette.

compass's People

Contributors

dtenenba, gfinak, hpages, kayla-morrell, kevinushey, link-ny, llynn, malisas, nturaga, ptvan, vobencha


compass's Issues

Implement option to ignore cell subsets of degree one.

For constitutively activated markers like Granzyme B or Perforin, we want to ignore degree one subsets. In general we'll have the option of ignoring degree one subsets and merging them with the all-negative cell subset.
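
A minimal sketch of the merge step, assuming the subset counts sit in a samples-by-subsets matrix whose column names join markers with "&" and flag negative markers with a leading "!" (for example "IFNg&!IL2"). The helper name merge_degree_one and that naming convention are assumptions made for illustration, not an existing COMPASS function.

```r
# Hypothetical helper: fold all degree-one subsets into the all-negative subset.
merge_degree_one <- function(counts) {
  # Degree = number of markers in the column name that are NOT negated with "!"
  parts <- strsplit(colnames(counts), "&", fixed = TRUE)
  degree <- vapply(parts, function(p) sum(!startsWith(p, "!")), integer(1))

  neg <- which(degree == 0L)
  stopifnot(length(neg) == 1L)  # exactly one all-negative column is expected

  # Add the degree-one counts to the all-negative column, then drop them
  is_one <- degree == 1L
  counts[, neg] <- counts[, neg] + rowSums(counts[, is_one, drop = FALSE])
  counts[, !is_one, drop = FALSE]
}

toy <- matrix(c(1, 2, 3, 4,
                5, 6, 7, 8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("A&B", "A&!B", "!A&B", "!A&!B")))
merged <- merge_degree_one(toy)
```

Using a logical index to drop columns keeps the helper safe when no degree-one subsets are present.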

Incorrect countFilterThreshold Definition

The description for the countFilterThreshold in the COMPASSContainer function says that it removes files "if the number of cells expressing at least one marker of interest is less than this threshold".

But I believe the code actually removes files if the total number of cells in the parent population is less than the threshold.

In line 150: filter <- counts > countFilterThreshold

where the counts argument is defined as "A named integer vector of the cell counts(of the parent population) for each sample in \code{data}"

Note: The definition should be updated in COMPASSContainerFromGatingSet as well.
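
To make the difference concrete, here is a small sketch on toy inputs. The sample names, intensities and the convention that 0 means "not expressing" are assumptions for illustration; only the first filter mirrors the line quoted above.

```r
# Toy inputs (hypothetical): parent-population counts per sample, and
# cells-by-markers intensity matrices in which 0 means "not expressing".
counts <- c(s1 = 10000L, s2 = 3000L)
data <- list(
  s1 = matrix(c(0, 1.2, 0,   0, 0, 0.8), ncol = 2,
              dimnames = list(NULL, c("IFNg", "IL2"))),
  s2 = matrix(c(2.1, 0,   0, 1.5), ncol = 2,
              dimnames = list(NULL, c("IFNg", "IL2")))
)
countFilterThreshold <- 5000

# Behaviour of the quoted code: filter on the parent-population count
keep_by_parent <- counts > countFilterThreshold

# Behaviour described in the documentation: filter on the number of cells
# expressing at least one marker of interest
n_expressing <- vapply(data, function(m) sum(rowSums(m > 0) > 0), integer(1))
keep_by_expressing <- n_expressing >= countFilterThreshold
```

With these inputs the two definitions disagree for s1, which is kept by the first filter and removed by the second.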

Bugs in COMPASSContainerFromGatingSet

Daryl reports that COMPASSContainerFromGatingSet is giving warnings:

Creating COMPASS Container
A COMPASSContainer with 112 samples from 8 individuals, containing data across 7 markers.
Warning message:
In if (!is.na(markers)) { :
  the condition has length > 1 and only the first element will be used

with the latest release of flowWorkspace and related tools.
He also gets an error:

Error in if (any(sapply(data, function(x) any(x < 0)))) { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In if (!is.na(markers)) { :
  the condition has length > 1 and only the first element will be used
2: '.local' is deprecated.
Use 'getSingleCellExpression' instead.
See help("Deprecated")

Presumably the API has changed somewhere and this did not get updated.
I don't think this is high priority since we can construct the objects by hand anyway.
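
Both messages point at conditions that do not reduce to a single TRUE or FALSE. A possible guard is sketched below on toy inputs; the variable names mirror the messages above, but the surrounding function body is not reproduced and the proposed conditions are an assumption about the intent. Note that since R 4.2.0 a condition of length greater than one in if() is an error rather than a warning.

```r
# Toy inputs standing in for the arguments named in the messages above
markers <- c("CD154", "IFNg")
data <- list(matrix(c(0, 1.5, NA, 2), nrow = 2))

# if (!is.na(markers)) breaks when 'markers' has more than one element;
# collapse the test to a single logical value instead.
has_markers <- !is.null(markers) && !all(is.na(markers))

# any(x < 0) returns NA when x contains NA and nothing is negative;
# na.rm = TRUE keeps the result a plain TRUE/FALSE.
has_negative <- any(vapply(data, function(x) any(x < 0, na.rm = TRUE), logical(1)))

if (has_markers) message("markers supplied: ", paste(markers, collapse = ", "))
if (has_negative) stop("negative intensities found")
```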

Ideas for improving matching in CCFGS

We could consider some more strategies for name matching between the gate names, and the underlying flowSet parameters (name and desc) in COMPASSContainerFromGatingSet:

  • Choosing a mapping with a minimal 'distance', e.g. Levenshtein distance + preference for mappings for which one is an abbreviation of the other (i.e., GzB -> Granzyme B)
  • Fuzzy regex matching, e.g. agrep
  • Other ideas?

One would hope that the cytokine gate names are generally chosen as some simple transformation of the actual cytokine / marker names, but it is probably impossible to predict each case...
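
As a rough sketch of the first idea, the snippet below picks, for each gate name, the marker with the smallest generalized Levenshtein distance using utils::adist. The helper name match_gates_to_markers and the cleaning rule (strip trailing "+" or "-") are assumptions for illustration; abbreviation handling such as GzB -> Granzyme B is not covered.

```r
# Hypothetical helper: map each gate name to the closest marker name.
match_gates_to_markers <- function(gates, markers) {
  # Strip trailing "+" / "-" that people append to population names
  cleaned <- gsub("[+-]+$", "", gates)

  # Edit-distance matrix: one row per gate, one column per marker
  d <- utils::adist(cleaned, markers, ignore.case = TRUE)

  # Index of the closest marker for every gate (first one wins on ties)
  best <- apply(d, 1, which.min)
  setNames(markers[best], gates)
}

mapping <- match_gates_to_markers(
  gates   = c("IL2+", "TNFa+", "IFN-g+"),
  markers = c("IFNg", "IL2", "TNFa")
)
```

A distance cut-off above which the function refuses to guess would probably be needed in practice, so that the user is asked for a manual map instead of receiving a poor match.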

@gfinak , thoughts on whether it may be worth the effort -- do we imagine using this function more for new data sets which might need some more robust mapping? Or should we just make it easy to generate maps manually?

COMPASSContainerFromGatingSet error on GatingSet parsed from flowJo xml

Due to a change made a while ago (RGLab/flowWorkspace#171), the name column is no longer consistent with the rownames of the pData of a GatingSet parsed from flowJo, which causes the COMPASS constructor to fail:

>       CC <- COMPASSContainerFromGatingSet(subgslist,node="4+",swap=FALSE,
+                individual_id = "PTID.VISITNO", sample_id = "name",
+                markers=c("CD154","GzB","IFNg", "IL2"),
+                mp = list("4+/154+"   = "CD154",
+                          "4+/GzB+"   = "GzB",                          
+                          "4+/IFNg+"  = "IFNg", 
+                          "4+/IL2+"   = "IL2" 
                              ))           

Extracting cell counts
Fetching 4+
Error in COMPASSContainerFromGatingSet(subgslist, node = "4+", swap = FALSE,  : 
  sample names are not consistent with rownames of pData!

The fix would be simply to drop the sample_id argument and automatically overwrite the name column with the rownames. That is to say, the rownames will always be the sample_id.
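
The proposed overwrite can be illustrated on a plain data.frame standing in for the pData. The column values and row names here are invented, and applying the same assignment to a real GatingSet is an assumption not tested in this sketch.

```r
# Toy stand-in for pData(gs): after parsing from flowJo the 'name' column
# no longer matches the rownames (values here are invented).
pd <- data.frame(
  name = c("a.fcs", "b.fcs"),
  PTID.VISITNO = c("P1.2", "P2.2"),
  row.names = c("a.fcs_1000", "b.fcs_2000"),
  stringsAsFactors = FALSE
)

consistent_before <- identical(pd$name, rownames(pd))

# Proposed fix: the rownames always act as the sample id
pd$name <- rownames(pd)

consistent_after <- identical(pd$name, rownames(pd))
```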

An example data set for COMPASS

Is there example data we can use? Or can we develop sufficiently interesting simulated data that demonstrates the use and efficacy of COMPASS?

Clean up plot / plot2 methods

A lot of the functionality for subsetting, reshaping and such has been a bit 'bolted on' at this point. The code could use a cleanup.

Plot differences of mean gamma with plot.COMPASSResult

E.g., given two fits, fit_s and fit_u, we might want to write something like

plot(fit_s, fit_u, ...)

to get a plot of the mean gamma from fit_s, minus those from fit_u.

The color scale should be divergent from 1 to -1.
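
A minimal sketch of the computation, assuming each fit exposes a subjects-by-categories mean gamma matrix with row and column names. The helper name diff_mean_gamma and the blue-white-red palette are illustrative choices, not part of plot.COMPASSResult.

```r
# Hypothetical helper: difference of two mean gamma matrices on their
# shared subjects (rows) and categories (columns).
diff_mean_gamma <- function(mg_s, mg_u) {
  rows <- intersect(rownames(mg_s), rownames(mg_u))
  cols <- intersect(colnames(mg_s), colnames(mg_u))
  mg_s[rows, cols, drop = FALSE] - mg_u[rows, cols, drop = FALSE]
}

# Divergent scale: 101 colours centred on white, with breaks spanning -1..1
div_palette <- colorRampPalette(c("blue", "white", "red"))(101)
div_breaks <- seq(-1, 1, length.out = 102)

mg_s <- matrix(c(0.9, 0.1, 0.5, 0.5), nrow = 2,
               dimnames = list(c("p1", "p2"), c("A", "B")))
mg_u <- matrix(c(0.2, 0.3), nrow = 1,
               dimnames = list("p1", c("A", "B")))
d <- diff_mean_gamma(mg_s, mg_u)
```

Fixing the breaks at -1 and 1, rather than at the observed range, keeps white at zero so that plots from different comparisons stay comparable.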

Construct COMPASSContainer from flowWorkspace objects.

I've written a COMPASSContainerFromGatingSet method to construct a COMPASSContainer from a GatingSet or GatingSetList.

COMPASSContainerFromGatingSet <- function(gs=NULL,node=NULL,filter.fun=NULL,individual_id="PTID",sample_id="name",stimulation_id="Stim",mp=NULL)

Some notes:
filter.fun is a function, that may be user provided. It does some string substitution to clean up the node names (removing plus signs and other common symbols that people stuff in their cell population names). The code tries to guess the mapping between node names and markers. If it fails, it lets the user know, and the user can supply the mapping via mp= a list of name=value pairs where name is the node name and value is the marker name.
This is really only necessary for manual gating schemes where we sometimes have two dimensional gates for single markers.
The function will handle the most common data sets we get, but should provide the flexibility to be used on other data.

Matching markers in COMPASSContainerFromGatingSet()

I am having trouble turning a GatingSetList into a COMPASSContainer.
Before this, I merged several GatingSets following the examples at:
http://www.bioconductor.org/packages/3.7/bioc/vignettes/flowWorkspace/inst/doc/HowToMergeGatingSet.html
There seems to be a mismatch between markers requested, and markers returned.

In these two cases when I ask for DUMP, I get data for IL-4, and when I ask for IL-4, it is not found.
(it's not a one off problem, all markers seem to be mis-matched)
Does this represent a problem with the code, or potentially a problem with the data?

Any comments would be helpful.

> foo <- COMPASSContainerFromGatingSet(L, 'Memory/CD154+ CD69+', individual_id = "sampleId", markers=c("DUMP"))
Extracting cell counts
Fetching CD154+ CD69+
Fetching child nodes
common markers are: 
CXCR3 DUMP IL-4 CD4 CCR7 CD45RA IL-17 CD69 CCR6 CD3 TNF CD154 IFNg CCR4 CD8 IL-2 
We will map the following nodes to markers:
Extracting single cell data for Memory/CD154+ CD69+/IL-4+
...................................................................................................................................................................................................................................Creating COMPASS Container
Filtering low counts
Filtering 75 samples due to low counts
> foo <- COMPASSContainerFromGatingSet(L, 'Memory/CD154+ CD69+', individual_id = "sampleId", markers=c("IL-4"))
Extracting cell counts
Fetching CD154+ CD69+
Fetching child nodes
common markers are: 
CXCR3 DUMP IL-4 CD4 CCR7 CD45RA IL-17 CD69 CCR6 CD3 TNF CD154 IFNg CCR4 CD8 IL-2 
We will map the following nodes to markers:
Extracting single cell data for NA
Error in .cpp_getNodeID(obj@pointer, sampleNames(obj)[1], y) : 
  NA not found!

order_by_max_functionality in plotCOMPASSResult

I thought this had been fixed, but seemingly it is still not working.

I want to turn re-ordering of columns OFF in the COMPASS heatmaps. Both plotCOMPASSResult and pheatmap have an argument that should accomplish this: order_by_max_functionality.

If I set that to FALSE, it still is re-ordering.

If I'm looking at the correct code, I think I see the problem. That argument is handled in plotCOMPASSResult, but FALSE also needs to be passed to pheatmap (since that argument defaults to TRUE).

bundle boost dependencies with COMPASS

COMPASS depends on Boost, a very large library, just for the digamma function. If we really want to keep that implementation, we should bundle the part of Boost that implements digamma into the src of the package. The bcp utility makes this easy.

ShinyCOMPASS cannot find "transform_subset_label"?

Where can I find the definition of the function "transform_subset_label"?
Here is the error log:

> shinyCOMPASS(fit)
Preparing data for the Shiny application, please wait a moment...
The files necessary for launching the COMPASS Shiny application have been copied to '/var/folders/13/r18ls6xn1llg31h11b4lybh80000gn/T//RtmpTGPJIz/shinyCOMPASS'.
Starting the Shiny application...

Listening on http://127.0.0.1:6281
Warning: Error in unname: could not find function "transform_subset_label"
Stack trace (innermost first):
    42: unname
    41: rev
     2: runApp
     1: shinyCOMPASS
Error in unname(transform_subset_label(colnames(DATA$data$n_s)[-ncol(DATA$data$n_s)])) : 
  could not find function "transform_subset_label"

pheatmap duplicates viewports

For grid viewports, there are two 'pairs' of functions for navigating viewports.

pushViewport()
popViewport()

and

downViewport()
upViewport()

The main difference: push and pop create and delete viewports, while down and up simply navigate between existing viewports.

The code in pheatmap matches pushViewport with upViewport, hence producing many duplicated viewports, each containing a single object, making them difficult to interact with.
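
The difference between the two pairs can be seen in a short, self-contained grid session. The viewport name "panel" is arbitrary, and the off-screen pdf(NULL) device is only there so the sketch runs without a display.

```r
library(grid)

pdf(NULL)        # off-screen device
grid.newpage()

pushViewport(viewport(name = "panel"))  # creates "panel" and descends into it
upViewport()                            # back at the root; "panel" still exists
downViewport("panel")                   # navigates to the existing viewport
popViewport()                           # back at the root; "panel" is removed

# After the pop there is no "panel" left to navigate to
gone <- inherits(try(downViewport("panel"), silent = TRUE), "try-error")

invisible(dev.off())
```

Pairing each pushViewport() with popViewport(), or naming viewports and revisiting them with downViewport(), avoids leaving stray viewports in the tree.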

All categories filtered out

Hi.

I have recently been trying to use COMPASS to compare different PBMC populations expressing 5 cytokines. When I try to plot(fit) my data, the threshold filter removes all of my categories (unless I set it to < 0 ) and I noticed that my gamma and mean_gamma are all 0s and my alpha_s and A_alphas are all quite high. What influences the binary outcome of the gamma and mean_gamma matrices? I have been trying to sift through the code, but cannot determine what about my data is causing the fit values to all be 0s. I have included a screenshot of various values from my COMPASSContainer.

Thanks for the help

[Screenshot: fit values]

[Attachment: my data]

Something is wrong with the way COMPASS is evaluating expressions in the call

The treatment= and control= arguments take expressions and should evaluate them for subsetting, but the code breaks when we iterate over, for example, the levels of a factor. Like this:

  results<-sapply(levels(cc.cd4$meta$Stim)[3:6],function(x){
      COMPASS(cc.cd4,
      treatment=Stim%in%x,
      control=Stim%in%"negctrl",
      model="discrete",
      iterations=100000,
      replications=10,
      verbose=TRUE)
  })
 Error in match(x, table, nomatch = 0L) : object 'x' not found 
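
The failure can be reproduced on a toy function that captures its argument unevaluated and evaluates it without access to the caller's variables. Whether COMPASS fails for exactly this reason is an assumption; toy_select and meta below are invented for illustration. Splicing the value into the call with bquote() before evaluation sidesteps the lookup.

```r
# Toy function using non-standard evaluation: the condition is evaluated in
# the data frame, with only the base environment as a fallback, so variables
# local to the caller are invisible.
toy_select <- function(df, cond) {
  e <- substitute(cond)
  df[eval(e, df, baseenv()), , drop = FALSE]
}

meta <- data.frame(Stim = c("negctrl", "CMV", "TB"), stringsAsFactors = FALSE)

# Fails: 'x' only exists inside the anonymous function
broken <- try(
  lapply(c("CMV", "TB"), function(x) toy_select(meta, Stim %in% x)),
  silent = TRUE
)

# Works: bquote() replaces .(x) with its value before the call is evaluated
fixed <- lapply(c("CMV", "TB"), function(x) {
  eval(bquote(toy_select(meta, Stim %in% .(x))))
})
```

The same eval(bquote(...)) pattern could presumably be wrapped around the COMPASS() call inside the sapply() above as a workaround until the evaluation environment is fixed.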

Odd bug, not reproduced

I had this bug pop up on one COMPASS run, but not the next run:

test<-COMPASS(cc.cd4,
+ treatment=Stim%in%"TB 10.4",
+ control=Stim%in%"negctrl",
+ model="discrete",
+ iterations=100000,
+ replications=10,
+ verbose=TRUE)
There are a total of 57 samples from 57 individuals in the 'treatment' group.
There are a total of 110 samples from 59 individuals in the 'control' group.
There are multiple samples per individual; these will be aggregated.
The selection criteria for 'treatment' and 'control' do not produce paired samples for each individual. The following individual(s) will be dropped:
    1004363, 1004571
The model will be run on 57 paired samples.
The category filter has removed 43 of 51 categories.
There are a total of 8 categories to be tested.
Fitting discrete COMPASS model.
Initializing parameters...
Computing initial parameter estimates...
Iteration 1000 of 1e+05.
Iteration 2000 of 1e+05.
Error: unimplemented type 'double' in 'coerceToReal'

It did not occur on the next run.

speed up package compilation

The package takes an unfortunately long time to compile. We should fix this (probably with a build step to concatenate source files together)

Tag each COMPASS model fit with sessionInfo()

Following up on the excellent suggestion of @kevinushey we'll tag each COMPASS model fit with R's sessionInfo().
We will use R's attr() to do this, i.e. after fitting the model but before returning the COMPASSResult:

attr(result,"sessionInfo")<-sessionInfo()
return(result)

We may need to modify the show method for the object so that these don't get printed when the object is printed to console.
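
A small sketch of both pieces on a toy S3 object: the class name ToyResult and its print method are stand-ins invented here, not the COMPASSResult implementation.

```r
# Toy stand-in for a fitted result object
fit_model <- function() {
  result <- structure(list(mean_gamma = matrix(0, 1, 1)), class = "ToyResult")
  # Tag the fit with the session it was produced in
  attr(result, "sessionInfo") <- sessionInfo()
  result
}

# A dedicated print method means the attribute is not dumped to the console
print.ToyResult <- function(x, ...) {
  cat("A toy model fit.\n")
  invisible(x)
}

fit <- fit_model()
printed <- capture.output(print(fit))
```

The session details remain available on request via attr(fit, "sessionInfo").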

Accessors for COMPASSResult

We should write accessors for COMPASSResult to get the

  • metadata
  • mean_gamma matrix

And to generate plots of metadata against functionality or polyfunctionality score.

function to make COMPASSResults comparable for heatmaps

We'd like to have different results show the same categories in individual heatmaps.
I've hacked something together.
makeComparable()
takes a list of COMPASSResults and modifies the contents by augmenting the categories and mean_gamma matrices so that they display common categories and common subjects (even when data are missing; it imputes with zeros).

Still has some inefficiencies.

  • augments the fit$gamma matrix, which is huge. This is just because plotting code in plot() looks at that matrix to get a dimension. We may want to just look at fit$mean_gamma for the same information.
  • The code makes n COMPASSResults comparable by looking at n choose 2 combinations. This can certainly be made more efficient.
  • probably a few other things, but we have something that works for now.
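
One way to avoid the pairwise comparisons is to compute the union of subjects and categories once and pad every matrix to it. The sketch below works on plain named matrices; the helper name pad_to_common is an assumption and it does not touch the larger gamma array.

```r
# Hypothetical helper: pad each matrix with zeros to the union of all row
# names (subjects) and column names (categories) across the list.
pad_to_common <- function(mats) {
  rows <- Reduce(union, lapply(mats, rownames))
  cols <- Reduce(union, lapply(mats, colnames))
  lapply(mats, function(m) {
    out <- matrix(0, nrow = length(rows), ncol = length(cols),
                  dimnames = list(rows, cols))
    out[rownames(m), colnames(m)] <- m  # copy the observed values in place
    out
  })
}

m1 <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2,
             dimnames = list(c("p1", "p2"), c("A", "B")))
m2 <- matrix(0.9, nrow = 1, dimnames = list("p3", "C"))
padded <- pad_to_common(list(m1, m2))
```

This touches each result once, so the cost grows linearly with the number of results rather than with the number of pairs.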

Compass:::simpleCOMPASS -> Posterior() function

Dear Dr. Greg Finak
I appreciate your recent help.
I post my issue here.

cd4 <- read.xlsx2("cd4_inter.xls", sheetIndex = 1)
cd4u3 <- U3 <- subset(cd4, (cd4$Stimulation == "Contol") & (cd4$Day == 3)); nrow(U3) # 70
cd4g3 <- G3 <- subset(cd4, (cd4$Stimulation == "Sample") & (cd4$Day == 3)); nrow(G3) # 70

fit2 <- COMPASS:::SimpleCOMPASS(data.matrix(cd4u3[,-c(1:6)]), data.matrix(cd4g3[,-c(1:6)]), meta, "iid", "sid", iterations=100, verbose = TRUE) # 20 second running time

post2 <- Posterior(fit2) ; post2 # NULL
plot(fit2) # working well

I don't think I am using Posterior() correctly.
It works on the simulated data, but not on my SimpleCOMPASS output, fit2.

fit # Sim-data output
A COMPASS model fit on 10 paired samples.
fit2 # my Compass:::simpleCOMPASS output
A COMPASS model fit on 70 paired samples.
My guess is that it has too many paired samples. Is that right?

Any help would be appreciated.
Sincerely,
Dohoon

Displaying COMPASS heatmap with other plots in a figure

I have a COMPASS heatmap I would like to display in a figure with some other plots, like the figure shown here from this paper:
[Figure: f2 large]
I am curious how the figure was created. This is the closest I can get using cowplot and some example data:

library(COMPASS) # modified
library(cowplot)
library(ggplot2)
library(gtable)

# Prepare heatmap
cytokine_annotation_colors <- c("black", "black", "black", "black", "black", "black", "black")
grouping <- "Status"
compassResult <- readRDS("/path/to/compassResult.rds")
# Modified plot.COMPASSResult to return heatmap as grob
heatmap_grob <- plot_1.COMPASSResult(compassResult, grouping, show_rownames = FALSE,
                            main = "Heatmap of Mean Probability of Response",
                            fontsize=14, fontsize_row=13, fontsize_col=11,
                            cytokine_annotation_colors=cytokine_annotation_colors)
class(heatmap_grob) # [1] "gTree" "grob"  "gDesc"
# Turn grob into gtable to make compatible with cowplot
heatmap_gtable <- gtable(unit(1, c("grobwidth"), data=heatmap_grob), unit(1, "grobheight", data=heatmap_grob))
heatmap_gtable <- gtable_add_grob(heatmap_gtable, heatmap_grob, 1, 1)

example_plot <- ggplot(data=mtcars, aes(x=mpg, y=cyl)) + geom_point() + labs(title="mtcars example plot")

cowplot_figure <- plot_grid(heatmap_gtable, NULL, example_plot, rel_widths = c(2, 4, 2), labels = c("A", "", "B"), nrow=1)
ggsave(filename="cowplot_fig_example.png",
       plot=cowplot_figure, path="/home/malisa/Desktop", device="png",
       width=9, height=6, units="in")

[Figure: cowplot_fig_example]

This required modifying plot.COMPASSResult to return the final heatmap as a grob and then turning it into a gtable to make compatible with cowplot. As you can see, the plot positions are still all bungled and I'm not sure how to make it look neat. If I do show_rownames = TRUE it looks even worse.

Is there an easy solution using R? If not, I could perhaps save the heatmap as an svg and then put together a figure using some other software...?
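
One option that stays within R is to lay the panels out directly with grid, which accepts any grob. This is only a sketch: the two grobs below are placeholders for the heatmap gTree and a second panel, and it does not reproduce the published figure. A ggplot object could be converted for the second panel with ggplot2::ggplotGrob().

```r
library(grid)

# Placeholders (hypothetical) for the heatmap gTree and another panel
heatmap_grob <- rectGrob(gp = gpar(fill = "grey85"))
other_grob <- circleGrob(r = 0.4)

# Draw one grob plus a panel label in the given layout column
draw_panel <- function(grob, col, label) {
  pushViewport(viewport(layout.pos.row = 1, layout.pos.col = col))
  grid.draw(grob)
  grid.text(label, x = unit(2, "mm"), y = unit(1, "npc") - unit(2, "mm"),
            just = c("left", "top"))
  popViewport()
}

pdf(NULL)  # off-screen device; replace with png() or pdf("file.pdf") to save
grid.newpage()
# One row, two columns, with relative widths 2:1
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2,
                                           widths = unit(c(2, 1), "null"))))
draw_panel(heatmap_grob, 1, "A")
draw_panel(other_grob, 2, "B")
popViewport()
invisible(dev.off())
```

Because each panel gets its own layout cell, the heatmap's internal spacing cannot spill into the neighbouring plot.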

merge functionality for COMPASSContainers

Merge

  • Drop markers not in common between the CCs, and warn the user if markers are dropped.
  • Make sure there are no duplicated sample names between the COMPASS containers, and generate new unique names if necessary. We can set an attribute, or something similar, to link back to the original sample name in such a case.
  • Merge the metadata, dropping columns not in common with a warning.
  • Make sure the mappings are common between the two CCs (individual_id, sample_id, stimulation_id) and appropriately set for the new CC.
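
The first three steps can be sketched on toy containers represented as plain lists with data, counts and meta elements. That representation, the helper name merge_containers, and the assumption that data, counts and the metadata rows share one sample order are all simplifications for illustration; the id mappings in the last step are not handled.

```r
# Hypothetical merge for toy containers: list(data = <named list of
# cells-by-markers matrices>, counts = <named vector>, meta = <data.frame>).
merge_containers <- function(a, b) {
  # 1. Keep only the markers common to both containers
  markers_a <- colnames(a$data[[1]])
  markers_b <- colnames(b$data[[1]])
  markers <- intersect(markers_a, markers_b)
  dropped <- setdiff(union(markers_a, markers_b), markers)
  if (length(dropped) > 0) {
    warning("Dropping markers not in common: ", paste(dropped, collapse = ", "))
  }

  # 2. Make sample names unique, remembering the original names
  original <- c(names(a$data), names(b$data))
  unique_names <- make.unique(original)
  data <- lapply(c(a$data, b$data), function(m) m[, markers, drop = FALSE])
  names(data) <- unique_names
  counts <- setNames(c(a$counts, b$counts), unique_names)

  # 3. Merge metadata on the shared columns only
  cols <- intersect(names(a$meta), names(b$meta))
  lost <- setdiff(union(names(a$meta), names(b$meta)), cols)
  if (length(lost) > 0) {
    warning("Dropping metadata columns not in common: ",
            paste(lost, collapse = ", "))
  }
  meta <- rbind(a$meta[, cols, drop = FALSE], b$meta[, cols, drop = FALSE])
  meta$original_name <- original
  rownames(meta) <- unique_names

  list(data = data, counts = counts, meta = meta)
}
```

Storing the original name as a metadata column is one of several ways to keep the link back to the source sample; an attribute would serve equally well.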

reorder `count` within COMPASSContainer call

## ensure that the counts are >= the number of rows in the data
  if (any(sapply(data, nrow) > counts)) {
    stop("There are entries in 'counts' that are greater than the ",
         "number of rows included in the 'data' matrices.", call.=FALSE)
  }

This validity check in the COMPASSContainer function assumes that counts and data are in the same sample order, which should not be required.

Looking at COMPASSContainerFromGatingSet call

message("Filtering low counts")
    filter <- counts > countFilterThreshold
    keep.names <- names(counts)[filter]
    sc_data <- sc_data[keep.names]
    counts <- counts[keep.names]
    pd <- subset(pd, eval(as.name(sample_id)) %in% keep.names)
    message(gettextf("Filtering %s samples due to low counts", length(filter) -
                       length(keep.names)))

The low-count sample filtering could be moved to COMPASSContainer function to address this issue.
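
Independently of where the filtering lives, the order dependence could be removed by aligning counts to data by name before the check. A minimal sketch, assuming both objects are named by sample; the helper name align_counts is invented here.

```r
# Hypothetical helper: reorder 'counts' to follow the sample order of 'data'.
align_counts <- function(data, counts) {
  stopifnot(
    !is.null(names(data)),
    !is.null(names(counts)),
    setequal(names(data), names(counts))
  )
  counts[names(data)]
}

data <- list(s1 = matrix(0, 3, 2), s2 = matrix(0, 5, 2))
counts <- c(s2 = 50L, s1 = 30L)  # same samples, different order

counts <- align_counts(data, counts)
# After alignment an element-wise comparison compares like with like
rows_exceed_counts <- any(vapply(data, nrow, integer(1)) > counts)
```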

individual_id? or metadata feed for COMPASSContainerFromGatingSet

I cannot figure out how to supply the metadata to COMPASSContainerFromGatingSet. Can you give an example that works on a GatingSet?
I get a bunch of "Missing" warnings when I run:
COMPASSContainerFromGatingSet(gs = gs_man, "CD154",individual_id = pData(gs_man)$name)

Extracting cell counts
Fetching CD154
Some columns not found in metadata
Expected: name P1_S003_V1_CD154_ICS_Enr.fcsExpected: name P1_S003_V1_CD154_ICS_Pre.fcsExpected: name P1_S003_V3_CD154_ICS_Enr.fcsExpected: name P1_S003_V3_CD154_ICS_Pre.fcsExpected: name P1_S003_V4_CD154_ICS_Enr.fcsExpected: name P1_S003_V4_CD154_ICS_Pre.fcsExpected: name P1_S005_V1_CD154_ICS_Enr.fcsExpected: name P1_S005_V1_CD154_ICS_Pre.fcsExpected: name P1_S005_V3_CD154_ICS_Enr.fcsExpected: name P1_S005_V3_CD154_ICS_Pre.fcsExpected: name P1_S005_V4_CD154_ICS_Enr.fcsExpected: name P1_S005_V4_CD154_ICS_Pre.fcsExpected: name P1_S007_V1_CD154_ICS_Enr.fcsExpected: name P1_S007_V1_CD154_ICS_Pre.fcsExpected: name P1_S007_V3_CD154_ICS_Enr.fcsExpected: name P1_S007_V3_CD154_ICS_Pre.fcsExpected: name P1_S007_V4_CD154_ICS_Enr.fcsExpected: name P1_S007_V4_CD154_ICS_Pre.fcsExpected: name P1_S009_V1_CD154_ICS_Enr.fcsExpected: name P1_S009_V1_CD154_ICS_Pre.fcsExpected: name P1_S009_V3_CD154_ICS_Enr.fcsExpected: name P1_S009_V3_CD154_ICS_Pre.fcsExpected: name P1_S009_V4_CD154_ICS_Enr.fcsExpected: n...
Missing: P1_S003_V1_CD154_ICS_Enr.fcs
Missing: P1_S003_V1_CD154_ICS_Pre.fcs
Missing: P1_S003_V3_CD154_ICS_Enr.fcs

COMPASSContainerFromGatingSet error due to the incorrect child node

4+/PD1+ is supplied in mp, which contradicts the parent node setting 8+. This fails the validity check of data and counts in the downstream call to COMPASSContainer. We want to catch this kind of logic error earlier, in COMPASSContainerFromGatingSet, and report a less obscure message.

On 08/04/2016 11:43 AM, Morris, Daryl E wrote:

>       COMPASSContainerFromGatingSet(subgslist,node="8+",swap=FALSE,
+                individual_id = "PTID.VISITNO",
+                markers=c("CD154","GzB","IFNg", "IL2","IL4", "TNFa","IL21","PD-1"),
+                countFilterThreshold = 180, #5000,               
+                mp = list("8+/154+"   = "CD154",
+                          "8+/GzB+"   = "GzB",                          
+                          "8+/IFNg+"  = "IFNg", 
+                          "8+/IL2+"   = "IL2", 
+                          "8+/IL4+"   = "IL4",
+                          "8+/TNFa+"  = "TNFa",                         
+                          "8+/IL21+"  = "IL21",                         
+                          "4+/PD1+"  =  "PD-1")))
Extracting cell counts
Fetching 8+
Fetching child nodes
...
Extracting single cell data for 
8+/154+|8+/GzB+|8+/IFNg+|8+/IL2+|8+/IL4+|8+/TNFa+|8+/IL21+|4+/PD1+
..........................Creating COMPASS Container
Error : There are entries in 'counts' that are greater than 
the number of rows included in the 'data' matrices.
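
An early check along these lines might look as follows. It assumes the names of mp are written as "<node>/<child>", as in the call above; the helper name check_child_nodes and the wording of the message are invented for illustration.

```r
# Hypothetical early validation: every node named in 'mp' must sit directly
# under the requested parent node.
check_child_nodes <- function(node, mp) {
  bad <- names(mp)[!startsWith(names(mp), paste0(node, "/"))]
  if (length(bad) > 0) {
    stop("These nodes in 'mp' are not children of '", node, "': ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

ok <- check_child_nodes("8+", list("8+/154+" = "CD154", "8+/GzB+" = "GzB"))
err <- try(
  check_child_nodes("8+", list("8+/154+" = "CD154", "4+/PD1+" = "PD-1")),
  silent = TRUE
)
```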

Clean out the continuous model

There are a load of C++ files dedicated to the 'continuous' model which is not actually used in COMPASS. We can get rid of them (for now).

User-friendly interface to COMPASS when only counts are available

Although there are internal functions to handle such data, the canonical format expected is still the 'list of matrices of cell intensities', from which counts are constructed.

We need a user-friendly interface to COMPASS that works for users who only have the counts matrices available.

Heatmap display

Add an option for the heatmap display, e.g. a 1/0 flag: 1 plots all the categories, while 0 plots only the categories where colSums(Mgamma) > 0.
That is, if every subject in a category has probability 0, that category is not shown.
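
A minimal sketch of the filter on a toy matrix; the function name filter_categories and the show_all argument are illustrative names, not an existing plotting option.

```r
# Hypothetical helper: optionally drop categories (columns) in which every
# subject has mean gamma 0.
filter_categories <- function(Mgamma, show_all = TRUE) {
  if (show_all) {
    return(Mgamma)
  }
  Mgamma[, colSums(Mgamma) > 0, drop = FALSE]
}

Mgamma <- matrix(c(0.2, 0.0,
                   0.0, 0.0,
                   0.7, 0.1),
                 nrow = 2,
                 dimnames = list(c("p1", "p2"), c("A", "B", "C")))
shown <- filter_categories(Mgamma, show_all = FALSE)
```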

Fix metadata generated on COMPASS model fit

When COMPASS is called, a collapsed data.frame of metadata is generated and placed into e.g. the fit$data$meta slot. However, this metadata is misleading as it collapses over sample-specific metadata, and gives the appearance of mis-subsetted data.

Either this metadata should be removed (if possible) or only metadata that can be confirmed as individual-specific should be returned.
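
The second option could be approximated by keeping only the columns that take a single value within every individual. The sketch below works on a plain data.frame; the helper name individual_level_meta and the toy columns are assumptions for illustration.

```r
# Hypothetical helper: keep only metadata columns that are constant within
# each individual, then collapse to one row per individual.
individual_level_meta <- function(meta, individual_id) {
  groups <- meta[[individual_id]]
  constant <- vapply(meta, function(col) {
    per_individual <- split(col, groups, drop = TRUE)
    all(vapply(per_individual, function(v) length(unique(v)) == 1L, logical(1)))
  }, logical(1))
  unique(meta[, constant, drop = FALSE])
}

meta <- data.frame(
  PTID = c("P1", "P1", "P2", "P2"),
  Sex  = c("F", "F", "M", "M"),
  Stim = c("negctrl", "CMV", "negctrl", "CMV"),
  stringsAsFactors = FALSE
)
collapsed <- individual_level_meta(meta, "PTID")
```

Here Stim varies within each individual and is dropped, so the collapsed table no longer suggests that samples were subset incorrectly.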

Checkpoint

It would be nice to include a checkpoint at the C level so that the code can be interrupted from the console with CTRL C.

Include real data in COMPASS

It would be nice if we could include (some subset of) real data, rather than simulated data, in the COMPASS package.

Will we have a green light to include part of the HVTN or RV144 data in the package once the paper is published?
