Giter Site home page Giter Site logo

myles-lewis / glmmseq Goto Github PK

View Code? Open in Web Editor NEW
18.0 5.0 10.0 14.17 MB

Gene-level general linear mixed model

Home Page: https://myles-lewis.github.io/glmmSeq/

License: Other

R 98.08% HTML 1.92%
bioinformatics differential-gene-expression gene-expression transcriptomics cran glmm mixed-models

glmmseq's Introduction

Lifecycle: Maturing License: MIT CRAN status Hits GitHub issues GitHub tag Downloads Travis

glmmSeq

This R package is designed to model gene expression with a general linear mixed model (GLMM). This allows us to include random effects as well as fixed effects. For the purpose of the package we use the glmer function from the lme4 package which fits a GLMM.

This package focuses in particular on changes in genes expression between different response or treatment groups over time.

Loading the package

From CRAN

install.packages("glmmSeq")

From Github

devtools::install_github("myles-lewis/glmmSeq")

Locally

You can also download the source directory and load the functions individually:

functions = list.files("./R", full.names = TRUE)
invisible(lapply(functions, source))

But you will need to load in the additional libraries then:

# Install CRAN packages
invisible(lapply(c("MASS", "car", "ggplot2", "ggpubr", "lme4", 
                   "lmerTest", "methods", "parallel", "plotly", 
                   "pbapply", "pbmcapply"),
                 function(p){
                   if(! p %in% rownames(installed.packages())) {
                     install.packages(p)
                   }
                   library(p, character.only=TRUE)
                 }))

# Install BioConductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
invisible(lapply(c("qvalue"), function(p){
  if(! p %in% rownames(installed.packages())) BiocManager::install(p)
  library(p, character.only=TRUE)
}))

Example script

For examples see the vignette.

Reference

glmmSeq was developed by the bioinformatics team at the Experimental Medicine & Rheumatology department and Centre for Translational Bioinformatics at Queen Mary University London.

If you use this package please cite as:

citation("glmmSeq")

## To cite package ‘glmmSeq’ in publications use:
##
##  Myles Lewis, Katriona Goldmann, Elisabetta Sciacca, Cankut Cubuk and Anna Surace (2021). 
##  glmmSeq: General Linear Mixed Models for Gene-level Differential Expression. 
##  R package version 0.5.4. https://github.com/myles-lewis/glmmSeq
##
## A BibTeX entry for LaTeX users is
##
##  @Manual{,
##    title = {glmmSeq: General Linear Mixed Models for Gene-level Differential Expression},
##    author = {Myles Lewis and Katriona Goldmann and Elisabetta Sciacca and Cankut Cubuk and Anna Surace},
##    year = {2022},
##    note = {R package version 0.5.4},
##    url = {https://github.com/myles-lewis/glmmSeq},
##  }

glmmseq's People

Contributors

elisabettasciacca avatar katrionagoldmann avatar myles-lewis avatar zktuong avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

glmmseq's Issues

vector memory exhausted issue

Hi,

Thanks for developing this package!

I'm following the vignette and trying to run glmmSeq on a relatively small dataset (26 samples * 20k genes). The 26 samples are 13 pairs as a random effects variable of (1|individual). If I'm using the model ~disease+(1|condition)+covar1+covar2+covar3+covar4, R will give me Error: cannot allocate vector of size 6223.5 Gb. It runs ok if I remove 1 fixed effect variable. It wouldn't run on an HPC either, and I suppose no cores can handle a vector of this size.

sizeFactors instructions in vignette

hi,

in your vignette under size factors,

when using:

sizeFactors <- calcNormFactors(counts, method="TMM")

the sizeFactor should have an additional step like:

sizeFactors <- dgelist$samples$lib.size * dgelist$samples$norm.factors

because in the calcNormFactors details, they explicitly state:

This function computes scaling factors to convert observed library sizes into effective library sizes. The effective library sizes for use in downstream analysis are lib.size * norm.factors where lib.size contains the original library sizes and norm.factors is the vector of scaling factors computed by this function.

https://rdrr.io/bioc/edgeR/man/calcNormFactors.html

related to #20

plotCutoff parameter in fcPlot

hi Kat,

I don't know if it's me, but the role of the plotCutoff option in the fcPlot function doesn't look to be very clear .
I think it would be good if you can explain better what it is meant for.
I was trying to remove all the dots that are below my significance threshold (q < 0.05), I thought this option was able to help me but maybe it's not the case.

Dispersion memory issue

Hi

Do you have any recommendations on how to calculate the dispersion in a memory-efficient manner?

I've tried both:

  1. disp <- apply(tpm, 1, function(x){
    (var(x, na.rm=TRUE)-mean(x, na.rm=TRUE))/(mean(x, na.rm=TRUE)**2)
    })
  2. disp <- setNames(edgeR::estimateDisp(tpm)$tagwise.dispersion, rownames(tpm))

Normally I use edgeR::filterByExpr; however, then it won't work for the downstream glmmSeq it seems since some dispersion values are missing.

colors in base plot

Something is still going wrong with colors :(
if I do:

  pairedPlot(glmmResult=fitted.model,
                geneName = gene,
                x1Label = "Time",
                x2Label="Treatment",
                xTitle="Time",
                IDColumn = "Patient.ID",
                graphics = "base",
                colours = c("DMF" = "khaki", "Vehicle" = "slategray2"),
                lineColour = c("DMF" = "khaki", "Vehicle" = "slategray2"),
                modelSize = 3,
                modelColour = c("DMF" = "gold3", "Vehicle" = "darkblue"),
                modelLineColour = c("DMF" = "gold3", "Vehicle" = "darkblue"),
                fontSize=1,
                x2Offset = 6,
                logTransform=TRUE,
                addViolin = FALSE, addBox = FALSE,
                pairedOnly = FALSE)

I get:
Cattura

while what I wanted was:
immagine

(this sencond plot is done with the code of my latest push)
Do let me know if you need my data to test it :)

mention size factors in the vignette

Although it's optional, the vignette could mention that we can specify size factors as offsets to be included in the linear predictor during fitting.
Three options to get size factors are:

  • Manually:
sizeFactors <- colSums(tpmdata)  #total no of reads 
sizeFactors <- sizeFactors2 / mean(sizeFactors2)  #scale so that mean=1 (normalise) 
  • edgeR:
    sizeFactors <- calcNormFactors(counts, method="TMM")
  • DESeq2:
    sizeFactors <- estimateSizeFactorsForMatrix(tpmdata)

Add progress bar

Add progress bar to glmmSeq. It is currently difficult to tell how long each run will take. This will require the pbmcapply package.

Output summary

Could the model output table be explained?
Also is there a journal article for the methods used in creating this package?

Carrie

GSEA Support

Thanks for the outstanding package!

I am trying to formulate an analogous GSEA metric for the output of glmmSeq. Using other methods (edgeR, limma, DESeq2, wilcoxon tests) the test statistic is signed. The statistic is used for ranking the genes for differential expression, and the sign conveniently indicates the direction of enrichment. In the case with using a chi-squared statistic, I am wondering if there can be an analog. So far I have been doing the following for the contrast implied by b and the statistic stat:

sign(res@stats$coef[,b])*res@stats$Chisq[,stat]

Any thoughts on this would be much appreciated!

Can glmmseq be used in non-longitudinal data?

Hi,

I'm trying to use glmmseq to analyze some dataset in which each patient contributed multiple samples. I want to consider patient ID as random factor and the diagnosis as fixed factor. So my formula would be like:

glmmSeq(~ diagnosis + (1 | patient_ID),
countdata = OTU.table,
metadata = Sample.table,
dispersion = disp,
progress = TRUE)

As you can see, I don't have a "Timepoint" variable in this model.

Based on glmmseq manual, glmmseq is "Designed for longitudinal analysis..." If I don't have Timepoint can I still use this model?

Thanks very much!
Leran

Using TPM normalised vs raw count expression data

Hello,

Great package, many thanks for making this! I had a question regarding the example in the vignette provided. The example uses TPM normalised expression data, and with this data it calculates size factors, dispersions and fits it to a NB.

I was wondering if this is the recommended way of using this package? A negative binominal distribution can be unsuitable for TPM normalised data according to some (see for example https://biostars.org/p/471335/ ), and I'm confused why it would still be useful to calculate size factors on normalised data?

Thanks in advance.
Best wishes, Rik

fontSize option does not work

We have to fix the fontSize option.
If I try with:

fontSize =12,

I get the following error:

Error in pairedPlot(glmmResult = fitted.model, geneName = gene, x1Label = "VisitCode",  : 
  formal argument "fontSize" matched by multiple actual arguments

pairedPlot error using GlmmSeq object with covariates

Thanks for your work on this package!

When using an object of class GlmmSeq in pairedPlot(), if additional covariates are included the model, the pairedPlot() function does not work due to the code shown below. The vignette shows an example with an interaction and a random effect ( ~ Timepoint * EULAR_6m + (1 | PATID)). However, if other covariates (e.g., gender) are in the model, the following error occurs.

Is there a reason why you've chosen to throw an error here (e.g. plotting difficulty, statistical messiness), and is there a workaround if other covariates are in the model?

relevant code: lines 133-136 of pairedPlots.R

if(ncol(glmmResult@modelData) != 2){
  stop(
    paste("These plots only work for interactions between two variable.",
          "Therefore nrow(glmmResult@modelData) should be 2."))

Thanks!

About turing off annotationPosition

Hi Kat,

I have a minor issue.
I don't know if you had a particular idea in mind when you set up annotationPosition to TRUE in the fcPlot (line 212, fcPlot.R).
If there's no specif reason, I would turn it off by default as I find it very annoying when I try to move gene labels and I move the arrow instead by mistake. This makes me loose the original position of the label and it's difficult to put it back to the original dot.
From the related plotly code:

annotationPosition determines if the main anchor of the annotation is editable. The main anchor corresponds to the text (if no arrow) or the arrow (which drags the whole thing leaving the arrow length & direction unchanged).

If you turn it off the label movement will still be assured by annotationText = TRUE and annotationTail = TRUE.

do I need to add age into the model?

Hi,
Thanks for developing this useful package. I am wondering about the age correction. Do I need to add it to the model? Or other confounders?

Best,

Add '...' to fcPlot

Hi Kat,

I'm working at a shiny app where I want to interactively show a pairedPlot when a gene is clicked on the fcPlot.
To do this I need to populate two more options of the plotly fcPlot: source and key. If would just add ... among your fcPlot options and then to your plotly call (line #195) this would be easily allowed.

Thanks!!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.