thelovelab / deseq2 Goto Github PK

View Code? Open in Web Editor NEW

346.0 346.0 88.0 3.9 MB

Differential expression of RNA-seq data using the Negative Binomial

Home Page: https://bioconductor.org/packages/DESeq2

R 84.60% Mathematica 7.05% C++ 5.58% TeX 2.77%

deseq2's People

Contributors

Stargazers

Watchers

Forkers

wenzhenl vivekbhr kc-lan elmahyai nalcala agdenadel petershan1119 ardaamen duydn dagooten felixshiyong inambioinfo akmorrow13 yjingj rajaldebnath mayoneday giovancholico zachary-foster const-ae davan690 chrislou-bioinfo javrodriguez meijian federicomarini varnold16 weitingkang coke-zhang calpan stjordanis chenrx2123 hjames1 kewalinsamart hfyuanuq aloriot kimoyerr kwameforbes jlv100 gadams1959 lindazana elittmann ryo-nac zhaojk humaasif alaminzju ftucos liuchao-nj csoneson moritzschaefer lewis-scott-element rgiordan klausvigo lindberg-m biomystery mohanbabu29 procha2 shafiq-islam-cse rookiedata1 gsarfo-boateng wesleybarriosufv daenbol albertkuo alienzj nsrilaxmi samhuairen jafsia sz-yang j-81 zerack5 mcj0713 jiaguihua hasihays rimanb librachris areyesq89 pamelawanwan josephpascal s-a-nersisyan mmwsygmyr hxj5 linuxpham oeh77 mdbabumiamssm rebentisu ycl6 nine-sarayut frederikziebell shuostat timothyez

deseq2's Issues

Check that `length` assay exists when processing `tximeta` SE

When using a tximeta SE as the input to DESeqDataSet, even if the length assay has been manually removed, this code still tells the user that it's "using counts and average transcript lengths from tximeta" (and the check that all the values in the missing assay are >0 is not raising an error): https://github.com/mikelove/DESeq2/blob/1fce7c80e08d0231b0897580cec9fd9e3ead199c/R/AllClasses.R#L497-L504

While the rest of the downstream analysis goes through without errors, the message might be confusing. I guess there are two ways around this, not sure which one would be preferable:

if the length assay does not exist, use just the counts (i.e., add a check to the if statement)
if length should be there (based on the countsFromAbundance), stop if it is not available

Can DESeq2 deal with comparison between different species?

Dear Michael,

I want to compare the gene expression between different species, using orthogroup ID to connnect their gene ID. Can DESeq2 deal with that?

Yours,
Wei

Updating version introduced optim error with apeglm

Hello!

I recently updated to the most recent version of DESeq2 (and a number of other packages), and am suddenly getting an optim error that was not present in the same analysis before the update. The error is
Error in optimHess(par = init, fn = nbinomFn, gr = nbinomGr, x = x, y = y, : non-finite value supplied by optim

My session info is below. Any advice you may have would be greatly appreciated!

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.38.3               SummarizedExperiment_1.28.0 Biobase_2.58.0             
 [4] MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2       
 [7] GenomeInfoDb_1.34.9         IRanges_2.32.0              S4Vectors_0.36.2           
[10] BiocGenerics_0.44.0        

loaded via a namespace (and not attached):
 [1] bdsmatrix_1.3-6        Rcpp_1.0.10            locfit_1.5-9.7         mvtnorm_1.1-3         
 [5] apeglm_1.20.0          lattice_0.20-45        png_0.1-8              Biostrings_2.66.0     
 [9] Rsubread_2.12.3        utf8_1.2.3             plyr_1.8.8             R6_2.5.1              
[13] emdbook_1.3.12         coda_0.19-4            RSQLite_2.3.0          httr_1.4.5            
[17] ggplot2_3.4.1          pillar_1.8.1           zlibbioc_1.44.0        rlang_1.0.6           
[21] rstudioapi_0.14        annotate_1.76.0        blob_1.2.3             Matrix_1.5-3          
[25] bbmle_1.0.25           BiocParallel_1.32.5    geneplotter_1.76.0     RCurl_1.98-1.10       
[29] bit_4.0.5              munsell_0.5.0          DelayedArray_0.24.0    numDeriv_2016.8-1.1   
[33] compiler_4.2.2         xfun_0.37              pkgconfig_2.0.3        tidyselect_1.2.0      
[37] KEGGREST_1.38.0        tibble_3.2.0           GenomeInfoDbData_1.2.9 codetools_0.2-19      
[41] XML_3.99-0.13          fansi_1.0.4            crayon_1.5.2           dplyr_1.1.0           
[45] MASS_7.3-58.3          bitops_1.0-7           grid_4.2.2             xtable_1.8-4          
[49] gtable_0.3.1           lifecycle_1.0.3        DBI_1.1.3              magrittr_2.0.3        
[53] scales_1.2.1           cli_3.6.0              cachem_1.0.7           XVector_0.38.0        
[57] vctrs_0.5.2            generics_0.1.3         RColorBrewer_1.1-3     tools_4.2.2           
[61] bit64_4.0.5            glue_1.6.2             parallel_4.2.2         fastmap_1.1.1         
[65] AnnotationDbi_1.60.0   colorspace_2.1-0       memoise_2.0.1          knitr_1.42

Different DESeq2 results (both FC and pvalue) by changing col/row orders in input

I asked this question in bioconductor support (https://support.bioconductor.org/p/9144947/#9144981). Not sure if it's properer to ask it here.

I'm getting different p-values and FCs from DESeq2 by simply changing the order of the columns in the count table. I can't understand why. I made sure the cols of count table match the rows of colData in both.

The differences with betaPrior=T are small ([-0.003, 0.003] for p-value, and [-0.002, 0.001] for FC). But if I do betaPrior=F and then use apeglm in lfcShrink to shrink the FC, the difference of log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning.

The difference of log2FoldChange before shrinking is much smaller: [-4.5e-05, 4.5e05]

Column Genotype in colData is the only tested variable in the model. It contains WT and Mutant as factors, with WT as the 1st level.

Actually if I change column Genotype to character type and rerun DESeq2 (so DESeq2 will convert them to factors with the warning message), I get the same results independent of the col/row orders in the input. So it seems the difference is somehow related to the factorization of the columns in colData, but I don't know why, and which results I should trust.

The input data (data.rds) can be downloaded here:
https://github.com/weishwu/test_data/blob/main/data.rds?raw=true

My code:

data <- readRDS('data.rds')

dds1 <- DESeqDataSetFromMatrix(countData = data[['counts']], colData = data[['colData']], design = ~ Genotype)
dds1 <- dds1[rowSums(counts(dds1)) > 1, ]
dds1 <- DESeq(dds1, betaPrior = FALSE)
res1 <- as.data.frame(lfcShrink(dds1, coef='Genotype_Mutant_vs_WT', type='apeglm'))

# only reorder the inputs, but the columns of counts still match the rows of colData:
dds2 <- DESeqDataSetFromMatrix(countData = data[['counts']][,c(4:6,1:3)], colData = data[['colData']][c(4:6,1:3),], design = ~ Genotype)
dds2 <- dds2[rowSums(counts(dds2)) > 1, ]
dds2 <- DESeq(dds2, betaPrior = FALSE)
res2 <- as.data.frame(lfcShrink(dds2, coef='Genotype_Mutant_vs_WT', type='apeglm'))

# difference of log2FC after shrinking
range(res1$log2FoldChange-res2$log2FoldChange)

# difference of log2FC before shrinking
range(results(dds1)$log2FoldChange-results(dds2)$log2FoldChange)

sessionInfo( )

Output:

[1] -0.82441013  0.01714274


[1] -4.515867e-05  4.509163e-05


R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /opt/conda/envs/WAT_diffex/lib/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] forcats_0.5.1               stringr_1.4.0              
 [3] dplyr_1.0.9                 purrr_0.3.4                
 [5] readr_2.1.2                 tidyr_1.2.0                
 [7] tibble_3.1.7                ggplot2_3.3.6              
 [9] tidyverse_1.3.1             DESeq2_1.34.0              
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[13] MatrixGenerics_1.6.0        matrixStats_0.62.0         
[15] GenomicRanges_1.46.1        GenomeInfoDb_1.30.0        
[17] IRanges_2.28.0              S4Vectors_0.32.3           
[19] BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           fs_1.5.2               lubridate_1.8.0       
 [4] bit64_4.0.5            RColorBrewer_1.1-3     httr_1.4.3            
 [7] numDeriv_2016.8-1.1    tools_4.1.3            backports_1.4.1       
[10] utf8_1.2.2             R6_2.5.1               DBI_1.1.2             
[13] colorspace_2.0-3       apeglm_1.16.0          withr_2.5.0           
[16] tidyselect_1.1.2       bit_4.0.4              compiler_4.1.3        
[19] cli_3.3.0              rvest_1.0.2            xml2_1.3.3            
[22] DelayedArray_0.20.0    scales_1.2.0           mvtnorm_1.1-3         
[25] genefilter_1.76.0      XVector_0.34.0         pkgconfig_2.0.3       
[28] bbmle_1.0.25           dbplyr_2.2.0           fastmap_1.1.0         
[31] rlang_1.0.2            readxl_1.4.0           rstudioapi_0.13       
[34] RSQLite_2.2.8          generics_0.1.2         jsonlite_1.8.0        
[37] BiocParallel_1.28.3    RCurl_1.98-1.7         magrittr_2.0.3        
[40] GenomeInfoDbData_1.2.7 Matrix_1.4-1           Rcpp_1.0.8.3          
[43] munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.1       
[46] stringi_1.7.6          MASS_7.3-57            zlibbioc_1.40.0       
[49] plyr_1.8.7             grid_4.1.3             blob_1.2.3            
[52] parallel_4.1.3         bdsmatrix_1.3-6        crayon_1.5.1          
[55] lattice_0.20-45        Biostrings_2.62.0      haven_2.5.0           
[58] splines_4.1.3          annotate_1.72.0        hms_1.1.1             
[61] KEGGREST_1.34.0        locfit_1.5-9.5         pillar_1.7.0          
[64] geneplotter_1.72.0     reprex_2.0.1           XML_3.99-0.9          
[67] glue_1.6.2             modelr_0.1.8           png_0.1-7             
[70] vctrs_0.4.1            tzdb_0.3.0             cellranger_1.1.0      
[73] gtable_0.3.0           assertthat_0.2.1       cachem_1.0.6          
[76] emdbook_1.3.12         xtable_1.8-4           broom_0.8.0           
[79] coda_0.19-4            survival_3.3-1         AnnotationDbi_1.56.1  
[82] memoise_2.0.1          ellipsis_0.3.2

How to plot PCA without background grid lines?

Hi,
I want to create a PCA plot without background grid lines.
I could not find an option in DESeq. Is there any way to do it

genewise dispersion estimate orders of magnitude too high

I find that for many genes in my data set, the gene-wise dispersion estimates are 3 orders of magnitude higher than expected. I set up a minimal example at https://github.com/frederikziebell/deseq2-bug. There it seems that the preliminary dispersion estimate is ok, but since the estimation did not converge, re-fitting with fitDispGrid intoduces the high dispersion estimate.

meta-analysis based on log2FC and P-value from DESeq2

Hi Mike,

Do you have any suggestion if I want to perform a meta-analysis based on log2FC and P-value from DESeq2?

Thanks

Shicheng

A weird error

Hi,

When I want to run results function on a DESeqDataset object I get this weird error:

Error in if (!expanded & (hasIntercept | noInterceptPullCoef)) { : 
  argument is of length zero

Do you have any idea why?

plotPCA Question

Hi there. I'm trying to get the loadings on the PCA I make with the plotPCA function after I run DESeq. I haven't been able to find anything about this process in the manuals or vignettes.

Result interpretation

Hi,
Thanks for developing this useful package.
I have a question about the results of DESeq2. Usually, I use q-value<=0.05 and |FC|>2 as the criteria to screen differentially expressed genes. If I want to identify genes with similar expression under different conditions, which criteria should I use? I was thinking of only using q-value>0.05. But there are also some genes with |FC|~1 and q-value<=0.05.
Thank you very much!

Unable to import data when using single factor

I have this dataset which I can not import when using a single variable for the model:

> se.sub
class: SummarizedExperiment 
dim: 59181 7 
metadata(0):
assays(1): counts
rownames(59181): ENSG00000000003 ENSG00000000419 ... ENSG00000288578
  ENSG00000288588
rowData names(3): ensembl_gene_id gene_name ensembl_id
colnames(7): BRBRUN1_OT_NPC017977945 BRBRUN1_OT_NPC017977946 ...
  BRBRUN1_OT_NPC017977944 BRBRUN1_OT_NPC017977947
colData names(15): library project ... clone replicate
> colData(se.sub)$genotype
# A tibble: 7 x 1
  genotype
  <fct>   
1 WT      
2 WT      
3 HT      
4 HT      
5 HT      
6 WT      
7 HT      
> table(colData(se.sub)$genotype)

WT HT 
 3  4 
> dds <- DESeqDataSet(se.sub, design = ~genotype)
Error in DESeqDataSet(se.sub, design = ~genotype) : 
  design has a single variable, with all samples having the same value.
  use instead a design of '~ 1'. estimateSizeFactors, rlog and the VST can then be used
>

I have narrowed down the snippet and there seems to be a bug here:

if (length(designVars) == 1) {
      var <- colData(se)[[designVars]]
      if (all(var == var[1])) {
        stop("design has a single variable, with all samples having the same value.
  use instead a design of '~ 1'. estimateSizeFactors, rlog and the VST can then be used")
      }
    }

Turns out that colData() returns a tibble data structure which is not compatible and all(var == var[1]) fails to do what it is supposed to:

> vars <- colData(se.sub)[['genotype']]
> vars
# A tibble: 7 x 1
  genotype
  <fct>   
1 WT      
2 WT      
3 HT      
4 HT      
5 HT      
6 WT      
7 HT  

## This is not correct
> vars == vars[1]
     genotype
[1,]     TRUE
[2,]     TRUE
[3,]     TRUE
[4,]     TRUE
[5,]     TRUE
[6,]     TRUE
[7,]     TRUE
>

If I cast the tibble to dataframe all is good:

vars <- as.data.frame(colData(se.sub))[['genotype']]
> vars
[1] WT WT HT HT HT WT HT
Levels: WT HT
> vars == vars[1]
[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.10.0
LAPACK: /usr/lib/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] car_3.0-11                  carData_3.0-4              
 [3] vsn_3.60.0                  ggplot2_3.3.5              
 [5] biomaRt_2.48.0              DESeq2_1.32.0              
 [7] SummarizedExperiment_1.22.0 Biobase_2.52.0             
 [9] MatrixGenerics_1.4.0        matrixStats_0.59.0         
[11] GenomicRanges_1.44.0        GenomeInfoDb_1.28.0        
[13] IRanges_2.26.0              S4Vectors_0.30.0           
[15] BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           bit64_4.0.5            filelock_1.0.2        
 [4] RColorBrewer_1.1-2     progress_1.2.2         httr_1.4.2            
 [7] tools_4.1.2            utf8_1.2.1             R6_2.5.0              
[10] affyio_1.62.0          DBI_1.1.1              colorspace_2.0-1      
[13] withr_2.4.2            tidyselect_1.1.1       prettyunits_1.1.1     
[16] bit_4.0.4              curl_4.3.1             compiler_4.1.2        
[19] preprocessCore_1.54.0  cli_3.1.0              xml2_1.3.2            
[22] DelayedArray_0.18.0    labeling_0.4.2         scales_1.1.1          
[25] hexbin_1.28.2          genefilter_1.74.0      affy_1.70.0           
[28] rappdirs_0.3.3         stringr_1.4.0          digest_0.6.27         
[31] foreign_0.8-81         rio_0.5.27             XVector_0.32.0        
[34] pkgconfig_2.0.3        dbplyr_2.1.1           fastmap_1.1.0         
[37] limma_3.48.0           readxl_1.3.1           rlang_0.4.11          
[40] rstudioapi_0.13        RSQLite_2.2.7          generics_0.1.0        
[43] farver_2.1.0           BiocParallel_1.26.0    zip_2.2.0             
[46] dplyr_1.0.6            RCurl_1.98-1.3         magrittr_2.0.1        
[49] GenomeInfoDbData_1.2.6 Matrix_1.3-4           Rcpp_1.0.7            
[52] munsell_0.5.0          fansi_0.5.0            abind_1.4-5           
[55] lifecycle_1.0.0        stringi_1.7.6          zlibbioc_1.38.0       
[58] BiocFileCache_2.0.0    grid_4.1.2             blob_1.2.1            
[61] forcats_0.5.1          crayon_1.4.1           lattice_0.20-45       
[64] haven_2.4.1            Biostrings_2.60.1      splines_4.1.2         
[67] annotate_1.70.0        hms_1.1.0              KEGGREST_1.32.0       
[70] locfit_1.5-9.4         pillar_1.6.1           geneplotter_1.70.0    
[73] XML_3.99-0.6           glue_1.4.2             data.table_1.14.0     
[76] BiocManager_1.30.15    png_0.1-7              vctrs_0.3.8           
[79] cellranger_1.1.0       gtable_0.3.0           purrr_0.3.4           
[82] assertthat_0.2.1       cachem_1.0.5           openxlsx_4.2.4        
[85] xtable_1.8-4           survival_3.2-13        tibble_3.1.2          
[88] AnnotationDbi_1.54.0   memoise_2.0.0          ellipsis_0.3.2

MeanSD plot

Hey,

Since recently, I observed a problem with the meanSD plot function originating from the vsn package. My plot looks exactly like one linked to a very recent report of yours: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
which I found via the google images search. It was published 18h ago, although by now the plots in the html manual on the DESeq2 bioconductor site look okay again. How did you fix that issue?
Would greatly appreciate a tip.

Thank you in advance!

Failed to find a rowRanges() issue

Hello,

I have been working with DESeq2 package perfectly until I installed another package. Now when making a dds object, I get this error.

Error in MatrixGenerics:::.load_next_suggested_package_to_search(x) :
Failed to find a rowRanges() method for RangedSummarizedExperiment objects.

My R package version is: R 4.0.2 (2020-06-22)
Bioconductor version 3.12 (BiocManager 1.30.16)

I have updated my Bioconductor and all the dependent packages. However, it still doesn't work even with the example dataset.
I appreciate it if you could lead me through solving this issue
Thanks,
Shadi

plotCounts Scaling Definitions

In the function documentation is "normalized by size factor" but in the vignette "normalizes counts by sequencing depth". These two could mean the same but might not. I would like to see the function's documentation explicitly define what is meant by size factor. estimateSizeFactors has three different approaches to calculate the factors, so this could be more precisely defined.

maxit parameter cannot be changed in DESeq function

Hi,

When launching the DESeq function, I get the following warning:

25 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Nevertheless, there is no way to specify the maxit argument to the DESeq function so it can pass it along to nbinomWaldTest. Could this feature be added?

Thanks in advance!

Nicolas

Different DESeq2 results

Hello,

I performed patch-seq for 2 sets of neurons and then used DESeq2 to look for transcriptomic differences between the groups. One group consists of 7 neurons and the second group consists of 9 neurons.

Genes that meet a threshold criteria of L2FC of more than 1.5 and adjusted p-value less than 0.01 are considered as differentially expressed (DE) genes. I get a total of 123 DE genes for my dataset.

I notice that some genes which are visibly different are not picked as DE genes by DESeq2. A plausible reason for this is DESeq2 is treating the zero counts as dropouts. So, if I added a constant value to all the gene counts which get rid of the zeroes. Now the visibly different gene is picked as a DE gene. Also, adding the constant value of 1 to the gene counts now gives me 953 DE genes. What this means is that DESeq2 got misled by the zeroes and treated some gene counts as false dropouts.

An example of such a gene that has the following gene counts (gene counts for each neuron are separated by a comma within the group):

**Gene 1:-**
Group-1: 400,6,0,118,644,0,4738
Group-2: 0,34,0,0,0,0,0,0,0

The DESeq2 statistics for this gene are:
base mean = 399.68; L2FC = -7.84; lfcSE = 2.56; **adjusted p-value = 0.13**

If I add a constant value (=1) to all the gene counts, now the DESeq2 statistics are:
base mean = 370.11; L2FC = -7.44; lfcSE = 1.26; **adjusted p-value = 1.35e-06**

Notice the big difference for the p-values (in bold above) between the original counts and after adding 1 to the counts. Can someone please explain why this is happening? At what step in DESeq2, the zeroes in the gene count are misleading the conclusions?

DESeq2 version in R being used is v1.30.1.

Thanks,
Prajkta

DESeqDataSetFromMatrix error

Hi,

I am trying to create an object from a matrix file using the DESeqDataSetFromMatrix function. But getting the following error

Error in validObject(.Object) :
  invalid class "CompressedGRangesList" object: invalid object for slot "elementMetadata" in class "CompressedGRangesList": got class "S4", should be or extend class "DataFrame"

I thought it might be an issue related to weird data structure from my end, so I tested it with the example given in the function help but the result was the same

countData <- matrix(1:100,ncol=4)
condition <- factor(c("A","A","B","B"))
dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), ~ condition)  
Error in validObject(.Object) :
  invalid class "CompressedGRangesList" object: invalid object for slot "elementMetadata" in class "CompressedGRangesList": got class "S4", should be or extend class "DataFrame"

Here is my session info

R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /home/abhijit/bin/R-4.0.0/lib/libRblas.so
LAPACK: /home/abhijit/bin/R-4.0.0/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] DESeq2_1.38.0               SummarizedExperiment_1.20.0
 [3] Biobase_2.50.0              MatrixGenerics_1.2.1
 [5] matrixStats_0.63.0          GenomicRanges_1.42.0
 [7] GenomeInfoDb_1.26.7         IRanges_2.24.1
 [9] S4Vectors_0.36.2            BiocGenerics_0.44.0

loaded via a namespace (and not attached):
 [1] genefilter_1.72.1      locfit_1.5-9.4         tidyselect_1.1.2
 [4] purrr_0.3.4            splines_4.0.0          lattice_0.20-45
 [7] generics_0.1.3         colorspace_2.0-3       vctrs_0.4.1
[10] utf8_1.2.3             blob_1.2.3             XML_3.99-0.13
[13] survival_3.5-3         rlang_1.0.2            pillar_1.8.1
[16] glue_1.6.2             DBI_1.1.3              BiocParallel_1.24.1
[19] bit64_4.0.5            RColorBrewer_1.1-3     GenomeInfoDbData_1.2.4
[22] lifecycle_1.0.1        zlibbioc_1.36.0        munsell_0.5.0
[25] gtable_0.3.1           memoise_2.0.1          geneplotter_1.68.0
[28] fastmap_1.1.1          parallel_4.0.0         AnnotationDbi_1.52.0
[31] fansi_1.0.3            Rcpp_1.0.8.3           xtable_1.8-4
[34] scales_1.2.1           cachem_1.0.7           DelayedArray_0.24.0
[37] annotate_1.68.0        XVector_0.30.0         bit_4.0.5
[40] ggplot2_3.3.6          dplyr_1.0.7            grid_4.0.0
[43] cli_3.2.0              tools_4.0.0            bitops_1.0-7
[46] magrittr_2.0.3         RCurl_1.98-1.10        RSQLite_2.3.0
[49] tibble_3.1.6           pkgconfig_2.0.3        ellipsis_0.3.2
[52] Matrix_1.5-3           assertthat_0.2.1       httr_1.4.5
[55] R6_2.5.1               compiler_4.0.0

I also tested with the most latest version of DESeq2, but nothing changed.
Any help is much appreciated.

glmGamPoi fit fails if normMatrix is provided

Looking at the glmGamPoi test case:

dds <- makeExampleDESeqDataSet(n=100, m=8)
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition
dds <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi")

This also works when I call estimateSizeFactors and estimateDispersions manually:

dds <- makeExampleDESeqDataSet(n=100, m=8)
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds, fitType="glmGamPoi")
dds <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi")

However, when I supply the normMatrix argument the subsequent glmGamPoi fails:

dds <- makeExampleDESeqDataSet(n=100, m=8)
nm = matrix(2, nrow=nrow(dds), ncol=ncol(dds))
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition
dds <- estimateSizeFactors(dds, normMatrix=nm)
dds <- estimateDispersions(dds, fitType="glmGamPoi")
dds <- nbinomLRT(dds, reduced=~1, type="glmGamPoi")
# Error in combine_size_factors_and_offset(offset, size_factors, Y, verbose = verbose) :
#   is.numeric(size_factors) && (length(size_factors) == 1 || length(size_factors) ==  .... is not TRUE
#dds <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi") # same if used instead of nbinomLRT

Is it possible to use normMatrix with glmGamPoi?

The reason why I want to do that is that I'm looking at single-cell gene expression changes with chromosome aneuploidies.

(Filing this here instead of Bioc support because I think it's a bug.)

How to get DESeq2 results in large count matix files?

Dear All
I am working on a denovo transcriptome analysis of the sugar cane plant. The count matrix has a 1792064 number of transcripts. Now, the differential gene expression is running by the tools DESeq2. The total number of samples is 36 and it has 12 groups comparison. The differential gene expression has been running for more than 25 days to date. Some files are generated in the data folder

readCountRlogNorm.xls
rlog_Normalized_Values.txt
readCountNorm.xls

and the figure folder
SampleDistanceRlog.pdf
SampleCorrelationRlog.pdf
PCA.png

absPairVenn folder showing empty

Please check the library used in the script.
library(DESeq2)
library("BiocParallel")
register(MulticoreParam(100))

Command used
Rscript DEseq2_replicate.R counts.matrix SampleInfo_Replicates contrast

Please help me!!!

Add testing against a threshold for ashr shrinkage

Hi Mike,

I recently found out that it's easy to add testing against a fold-change in ashr and thought this could be added in lfcShrink. Here's an example:

library(airway)
data(airway)

keep <- rowSums(assay(airway) >= 5) >=3
dds <- DESeqDataSet(airway[keep,], design = ~ cell + dex)
dds <- DESeq(dds)

res <- results(dds, name = "dex_untrt_vs_trt")
betahat <- res$log2FoldChange
sebetahat <- res$lfcSE
fit <- ashr::ash(betahat, sebetahat, mixcompdist = "normal", method = "shrink")

lfcThreshold <- log2(1.5)
cdf_pos_lfc <- ashr::cdf_post(fit$fitted_g, lfcThreshold, ashr::set_data(betahat, sebetahat))
cdf_neg_lfc <- ashr::cdf_post(fit$fitted_g, -lfcThreshold, ashr::set_data(betahat, sebetahat))

res$log2FoldChange <- fit$result$PosteriorMean
res$lfcSE <- fit$result$PosteriorSD

res$lfsr <- ifelse(res$log2FoldChange > 0, cdf_pos_lfc, 1 - cdf_neg_lfc)
res <- res[order(res$lfsr),]
res$svalue <- dplyr::cummean(res$lfsr)
res$pvalue <- NULL
res$padj <- NULL

plotMA(lfcShrink(dds, coef="dex_untrt_vs_trt", type="ashr"))
plotMA(res)

Could you maybe add this functionality?

Best,
Frederik

Possible bug: results function calculates too large contrasts

svalue argument seems not work in lfcShrink

HI Mike,
The svalue argument seems not work in lfcShrink (always behave as svalue=T), when "lfcThreshold" is presented. Is this suppose to?

Thanks!

Both DESeq2 1.36 and 1.34

Inconsistent contrast names

Hi,

I am trying to use DESeq2 for single cell RNA data, and following DESeq2 Vignette recommendation to use LRT test and glmGamPoi. Yet I notice some inconsistencies in the naming of contrast while running the DESeq function with and without parallelization as shown below. To my understanding, no matter the statistical tests used and parallelization, the contrast terms should be similarly named, isn't it?

Another problem is that when I run without parallelization, the resulting logFC could not be shrinked with apeglm and when I inspect the each contrast result before shrinkage, all lfcSE value is NA. Again, I don't think parallelization should have had any effect here.

I am using DESeq2 1.31.18, GithubSHA1: a5941b3.

Would love to here your opinion on this.

Best regards,
Vu

Without parallelization

dds1 <- DESeq2::DESeq(dds1, test="LRT", 
                        reduced=as.formula(arg$options$reduced_formula),
                        minmu=1e-6, minRep=Inf, 
                        betaPrior = FALSE,
                        quiet=FALSE,
                        fitType='glmGamPoi')

Wth Parallelization

dds1 <- DESeq2::DESeq(dds1, test="LRT", 
                      reduced=as.formula(arg$options$reduced_formula),
                      minmu=1e-6, minRep=Inf, 
                      betaPrior = FALSE,
                      quiet=FALSE,
                      useT=TRUE, BPPARAM=param.opt, parallel=TRUE,
                      fitType='glmGamPoi')

Identical dds1 object but with test ='Wald'

`results` function fails for interaction designs (v1.20)

Hi Mike

I looked like a bug, therefore, I am reporting here. The results command fails for the designs with interactions, complaining that the betaPrior slot is not full in the dds object (which is always set to FALSE for interaction designs anyway).

R version : 3.5.0
bioc version : 3.6.0
DESeq2 version : 1.20

It works with version 1.18 on the same data

About normalized counts of DESeq2

Hi,

I want to select a control gene in Real-time qPCR analysis, such as beta-actin. But I found it is a multiple copy gene in my studies species, and I compared their TPM abundance, finding their expression level are not stable. Later I thought maybe TPM value is not proper to judge that, because TPM is more like a proportion of a gene in all expressed genes. And later I found the normalization of DESeq2 is different. The normalized count were obtained by using :

dds <- DESeq(dds)
normalized_counts <- counts(dds, normalized=TRUE)

I though this normalized method can be more correct to obtain a real expression level, while the counts of beta-actin are still not stable. Maybe it's a real situation, or maybe this method can not be applied to do that? Could you tell me the answer ?

Best,
Wei

Incorrect number of arguments when using DESeq with glmGamPoi

Hi. I ran into very strange issue every time I use DESeq() on my scRNA-Seq data. There is information of the package version in the screenshot below. I installed them all using the latest version on github via devtools.

Please let me know what should I do to overcome this problem. I'm using Windows 10. My colleague ran the same script on his mac and it worked just fine...

Rownames of HTSeq counts not recognized with DESeqDataSetFromHTSeqCount

edit: using version 1.24.0

I am working with data from a collaboration that includes HTSeq output in the following format, without an explicit rownames column:

	sample_x
ENSMUSG00000000001	288
ENSMUSG00000000003	0
ENSMUSG00000000028	4
ENSMUSG00000000031	0
ENSMUSG00000000037	0
ENSMUSG00000000049	4
ENSMUSG00000000056	40
ENSMUSG00000000058	944
ENSMUSG00000000078	480
ENSMUSG00000000085	80

When I run DESeqDataSetFromHTSeqCount(), I get the following error:

Error in DESeqDataSet(se, design = design, ignoreRank) :
all samples have 0 counts for all genes. check the counting script.

The sample table, design formula, and directory are all as expected. It seems that the issue lies in this line of code in DESeqDataSetFromHTSeqCount():

rownames(tbl) <- l[[1]]$V1

In my case there is no V1, because the rownames are already set for each file. After the following line, specialRows is set to an empty vector, since none of the rows are named:

specialRows <- (substr(rownames(tbl), 1, 1) == "_") | rownames(tbl) %in% 
    oldSpecialNames

and, as such, !specialRows also evaluates to an empty vector. The subset then removes all rows from tbl:

tbl <- tbl[!specialRows, , drop = FALSE]

The following checkpoint also fails because V1 is always NULL:

if (!all(sapply(l, function(a) all(a$V1 == l[[1]]$V1)))) 
    stop("Gene IDs (first column) differ between files.")

Would it be possible to check if V1 exists, and, if not, use the rownames of each item in l? It may be that this is a nonstandard output format, in which case I can obviously just edit the HTSeq files in batch.

Thanks in advance!

Potential problem with independent filtering

Hello,

I was routinely using DESeq2 for some small RNA-seq datasets, and suddenly I noticed that FDR corrections are too aggressive. After some investigation, I found out that independent filtering is wrongly determining the optimal threshold:

dds <- DESeqDataSetFromMatrix(countData = counts, colData = groups, design = ~ Group)
sizeFactors(dds) <- size_factors
dds <- DESeq(dds)
res <- results(dds)

plot(metadata(res)$filterNumRej, 
     type="b", ylab="number of rejections",
     xlab="quantiles of filter")
lines(metadata(res)$lo.fit, col="red")
abline(v=metadata(res)$filterTheta)

Is there any explanation of such a behavior, or I found a bug? Thanks!

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
 [5] matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.4         IRanges_2.30.1             
 [9] S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] KEGGREST_1.36.3        genefilter_1.78.0      locfit_1.5-9.6         tidyselect_1.2.0       splines_4.2.1         
 [6] lattice_0.20-45        generics_0.1.3         colorspace_2.0-3       vctrs_0.5.0            utf8_1.2.2            
[11] blob_1.2.3             XML_3.99-0.11          survival_3.4-0         rlang_1.0.6            pillar_1.8.1          
[16] glue_1.6.2             DBI_1.1.3              BiocParallel_1.30.4    bit64_4.0.5            RColorBrewer_1.1-3    
[21] GenomeInfoDbData_1.2.8 lifecycle_1.0.3        zlibbioc_1.42.0        Biostrings_2.64.1      munsell_0.5.0         
[26] gtable_0.3.1           codetools_0.2-18       memoise_2.0.1          geneplotter_1.74.0     fastmap_1.1.0         
[31] parallel_4.2.1         fansi_1.0.3            AnnotationDbi_1.58.0   Rcpp_1.0.9             xtable_1.8-4          
[36] scales_1.2.1           cachem_1.0.6           DelayedArray_0.22.0    annotate_1.74.0        XVector_0.36.0        
[41] bit_4.0.4              ggplot2_3.3.6          png_0.1-7              dplyr_1.0.10           grid_4.2.1            
[46] cli_3.4.1              tools_4.2.1            bitops_1.0-7           magrittr_2.0.3         RCurl_1.98-1.9        
[51] RSQLite_2.2.18         tibble_3.1.8           pkgconfig_2.0.3        crayon_1.5.2           Matrix_1.5-1          
[56] httr_1.4.4             R6_2.5.1               compiler_4.2.1

lfcshrink "lessAbs" returns "greaterAbs"

This contrasts with the expected behavior when one calls the results method, which lfcshrink is built onto of.

Expected:

Reality:

ashr removed from CRAN

Hi @mikelove , looks like you're already aware of this, but I see that ashr was removed from CRAN recently: https://cran.r-project.org/web/packages/ashr/index.html

I see some changes in DESeq2 that appear to reflect this, but I wanted to post in case other users hit this issue: 6a4d125

vst blind

I think this might be considered a documentation issue.

In the vst() description, it states:

This wrapper for the VST is not blind to the experimental design: the sample covariate information is used to estimate the global trend of genes’ dispersion values over the genes’ mean normalized count. It can be made strictly blind to experimental design by first assigning a design of ~1 before running this function

However, vst() has the blind parameter (TRUE by default) with the description:

whether to blind the transformation to the experimental design (see varianceStabilizingTransformation)

Those sound somewhat contradictory. If the function is not blind, then what is the purpose of the blind parameter?

Inconsistent naming of interaction terms in some cases

Hi Mike,

I noticed when doing spline fitting that for some designs the naming of the interaction terms is inconsistent. In this example

library("fission")
library("splines")
data("fission")

dds <- DESeqDataSet(fission, design=~1)
dds$minute <- as.double(dds$minute)
spline_basis <- ns(dds$minute, df=3)
colnames(spline_basis) <- paste0("spline",colnames(spline_basis))
colData(dds) <- cbind(colData(dds), spline_basis)

design(dds) <- ~spline1:strain + spline2:strain + spline3:strain
dds <- DESeq(dds, test="LRT", reduced=~spline1 + spline2 + spline3, parallel=T)
colnames(mcols(dds))

the interaction terms are named spline1.strainwt, spline1.strainmut, strainwt.spline2, strainmut.spline2, strainwt.spline3, and strainmut.spline3.

I know that this is not the best full design to interpret, but it's the simplest case I could find. In contrast, chainging the design to ~spline1 + spline2 + spline3 + spline1:strain + spline2:strain + spline3:strain gives consistent names.

issue with installing the package on Apple M1 Max

Hi!
I get the following error when installing DESeq2 on my mac (M1 Max):

ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.6.0/12.0.1'
ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [DESeq2.so] Error 1
ERROR: compilation failed for package ‘DESeq2’

removing ‘/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/DESeq2’
Warning message:
In i.p(...) :
installation of package ‘/var/folders/3j/r82js80x57zcctgt9fnzs9fc0000gp/T//RtmpB1ivha/filed39748a058c/DESeq2_1.39.8.tar.gz’ had non-zero exit status

library(DESeq2)
Error in library(DESeq2) : there is no package called ‘DESeq2’

Thanks!
Sophie

DESeq2 Ignore Some Significant Differential Genes due to Extreme Count Outlier

or example, the read count of gene Z in condition A is 0, 0, 0, 123 and in condition B is 13000,13500,12500,14000. It is obvious that gene Z is significantly differential express between condition A and condition B, however, sometimes, DESeq2 set the pvalue to NA because of the outlier in condition A.

I suggest that before filtering the outliers, firstly testing whether the maximum value in condition A is far less than the minimum value in condition B. To do this, DESeq2 can reduce the false negative rate.

Please read this before submitting an issue or PR

If you have a question about usage of the software, do not post here, instead post to the Bioconductor support site:

https://support.bioconductor.org

This GitHub repo is to facilitate browsing of the code, and pull requests. For non-minor pull requests, it's a very good idea to first post to the support site link above, where we can discuss beforehand. I probably will not accept major PR that haven't been discussed beforehand, because I would need to have time to ensure that it won't cause unintended consequences on the user base.

row-subsetting gives error if rowData contains GRanges column names

Hi Mike,

here's an example of the problem

se <- SummarizedExperiment(assays = matrix(1:4, ncol = 2), rowData = DataFrame(start = c(1,1)))
dds <- DESeqDataSet(se, design = ~1)
validObject(dds)
# TRUE
dds[1,]

which returns Error in validObject(.Object) : invalid class "CompressedGRangesList" object: 'mcols(x)' cannot have columns named "seqnames", "ranges", "strand", "start", "end", "width", or "element".

My use-case is that I don't want to bother with genomic ranges and just store genomic information (chromosome, start and end position, biotype) in the rowData slot.

I think it would be best if arbitriary rowData columns are allowed if the DESeqData set is created from a non-RangedSummarizedExperiment or at least throw an error already during object creation, that some 'mcols'-names are not allowed.

Different result

Hi,
One read with same count number in two group but the contrast with different log.
The normal result is like that (A2_A4 A5 belong to BW group, N8 N7 belong to N group), the log value of line 1 is -7.98641"

"
But the other file (only some read and count are same) with differnet result like that, the log value of line 1 (same read and counts) is 3.86641 "

"
Could you please tell me is there anything wrong?
Thanks in advance.
Best wishes.
Liu

Archive 'many samples' tweetorial

Hi Mike,

since it's not clear atm how long Twitter will be around, maybe it's good to archive the really helpful tweetorial you mention in the experiments with many samples section in the DESeq2 vignette. I've made a standalone html page of it here. Feel free to copy it to this repo.

Best,
Frederik

DESeqDataSet() fails when the count matrix contains a value higher than .Machine$integer.max

Hi Mike,

when the count matrix contains a value larger than .Machine$integer.max, DESeqDataSet() fails because NA values are introduced when converting the matrix to integer mode. Seeing such a high count is not an unrealistc scenario (anymore), since I found the error by analyzing a recount3 data set:

library("recount3")
library("DESeq2")
library("tidyverse")

rse <- available_projects() %>%
  filter(project == "SRP074739") %>% 
  create_rse()

dds <- DESeqDataSet(rse, design = ~1)
# Warning: NAs introduced by coercion to integer rangeError in validObject(.Object) : 
#  invalid class “DESeqDataSet” object: NA values are not allowed in the count matrix

and just to show that it is related to the integer-mode conversion, we can run

sum(assay(rse) > .Machine$integer.max)
# 10

x <- c(.Machine$integer.max, .Machine$integer.max + 1)
mode(x) <- "integer"
x
# Warning: NAs introduced by coercion to integer range
# [1] 2147483647    NA

The "trouble-causing" counts are exceptionally high for this data set

assay(rse) %>% 
  as.vector()  %>%  
  qplot(bins = 100) + 
  scale_x_log10(breaks = 10^(0:10)) +
  scale_y_log10() +
  labs(x = "count", y = "# of observations") +
  geom_vline(xintercept = .Machine$integer.max, linetype = "dashed")

but the issue could be present for other data as well.

Parallel processing broken in R4.2

Hi Mike,

DESeq(makeExampleDESeqDataSet(), parallel=T) gives an error

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'args' in selecting a method for function 'do.call': $ operator is invalid for atomic vectors

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.1               stringr_1.4.0               dplyr_1.0.9                 purrr_0.3.4                 readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.7                ggplot2_3.3.6              
 [9] tidyverse_1.3.1             RNAseqQC_0.1.2              DESeq2_1.36.0               SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.0        matrixStats_0.62.0          GenomicRanges_1.48.0       
[17] GenomeInfoDb_1.32.1         IRanges_2.30.0              S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] fs_1.5.2               bitops_1.0-7           lubridate_1.8.0        bit64_4.0.5            RColorBrewer_1.1-3     httr_1.4.3             tools_4.2.0            backports_1.4.1        utf8_1.2.2            
[10] R6_2.5.1               DBI_1.1.2              colorspace_2.0-3       withr_2.5.0            tictoc_1.0.1           tidyselect_1.1.2       bit_4.0.4              compiler_4.2.0         rvest_1.0.2           
[19] cli_3.3.0              xml2_1.3.3             DelayedArray_0.22.0    scales_1.2.0           genefilter_1.78.0      digest_0.6.29          rmarkdown_2.14         XVector_0.36.0         pkgconfig_2.0.3       
[28] htmltools_0.5.2        dbplyr_2.1.1           fastmap_1.1.0          rlang_1.0.2            readxl_1.4.0           rstudioapi_0.13        RSQLite_2.2.13         generics_0.1.2         jsonlite_1.8.0        
[37] BiocParallel_1.30.0    RCurl_1.98-1.6         magrittr_2.0.3         GenomeInfoDbData_1.2.8 Matrix_1.4-1           Rcpp_1.0.8.3           munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.1       
[46] stringi_1.7.6          yaml_2.3.5             zlibbioc_1.42.0        grid_4.2.0             blob_1.2.3             parallel_4.2.0         crayon_1.5.1           lattice_0.20-45        Biostrings_2.64.0     
[55] haven_2.5.0            cowplot_1.1.1          splines_4.2.0          annotate_1.74.0        hms_1.1.1              KEGGREST_1.36.0        locfit_1.5-9.5         knitr_1.39             pillar_1.7.0          
[64] geneplotter_1.74.0     reprex_2.0.1           XML_3.99-0.9           glue_1.6.2             evaluate_0.15          BiocManager_1.30.17    modelr_0.1.8           png_0.1-7              vctrs_0.4.1           
[73] tzdb_0.3.0             cellranger_1.1.0       gtable_0.3.0           assertthat_0.2.1       cachem_1.0.6           xfun_0.30              xtable_1.8-4           broom_0.8.0            survival_3.3-1        
[82] AnnotationDbi_1.58.0   memoise_2.0.1          ellipsis_0.3.2

Best,
Frederik

DESeq2 error on analyzing microbiome data

Hi Dr. Love,

I am trying to do a differential abundance analysis with microbiome sequencing data suing DESeq2 package, but I keep getting errors after trying different approaches within the package. The data is a 50 by 501 matrix with each row being a sample and the first column being the group indicator and the other 500 columns are sequencing reads for 500 taxa. No missing data or zero-valued sequencing reads in the data set (see attached data set
data.xlsx
). Below is the R script for the analysis. Please help. Thanks a lot!

class(data)
[1] "data.frame"
countsData=t(data[,-1])+1
rownames(countsData)=colnames(data)[-1]
class(countsData)
[1] "matrix"
print(dim(countsData))
[1] 500 50
print(countsData[1:5,1:5]
[,1] [,2] [,3] [,4] [,5]
rawCount1 8 8 10 11 9
rawCount2 8 9 8 7 8
rawCount3 36 34 30 36 33
rawCount4 7 8 6 5 8
rawCount5 10 7 63 13 64

get the binary predictor data and annotate the predictor data

xData=as.data.frame(data[,1])
colnames(xData)=colnames(data)[1]
xData[,"x"]=as.factor(xData[,"x"])
class(xData)
[1] "data.frame"
print(dim(xData))
[1] 50 1
print(xData[1:5,])
[1] 0 0 1 0 1
Levels: 0 1

prepare for analysis

dds <- DESeqDataSetFromMatrix(countData=countsData,colData=xData,design=~x)

run analysis

suppressWarnings(runDeseq<- try(DESeq(dds, quiet = TRUE), silent = TRUE))
if (inherits(runDeseq, "try-error")) {

If the parametric fit failed, try the local.

suppressWarnings(runDeseq<- try(DESeq(dds, fitType = "local", quiet = TRUE),
silent = TRUE))
if (inherits(runDeseq, "try-error")) {

If local fails, try the mean

suppressWarnings(runDeseq<- try(DESeq(dds, fitType = "mean", quiet = TRUE),
silent = TRUE))
}
if (inherits(runDeseq, "try-error")) {
# If still bad, quit with error.
stop("DESeq1 fail")
}
}

When I dig into the error message, it says "simpleError in estimateDispersionsFit(object, fitType = fitType, quiet = quiet): all gene-wise dispersion estimates are within 2 orders of magnitude from the minimum value, and so the standard curve fitting techniques will not work. One can instead use the gene-wise estimates as final estimates: dds <- estimateDispersionsGeneEst(dds) dispersions(dds) <- mcols(dds)$dispGeneEst ...then continue with testing using nbinomWaldTest or nbinomLRT".

So I tried

dds <- DESeqDataSetFromMatrix(countData=countsData,colData=xData,design=~x)
dds <- estimateDispersionsGeneEst(dds)

but it gave me another error message: "Error in .local(object, ...) : first calculate size factors, add normalizationFactors, or set normalized=FALSE".

Here is some session info:

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

packageVersion("DESeq2")
[1] ‘1.24.0’

strategy to handle outliers in DESeq2 version 1.10 and 1.30

Dear Mike,

I want to confirm whether version 1.10 and 1.30 applied different strategy to handle outliers expression point? I find my result from version 1.10 can not be repeated with 1.30 which maybe caused by different strategy to handle outliers. Just want to confirm with you.

Thanks

Shicheng

weired dispersion and mean

Hi,
I am using DESeq2 to process my RNAseq data, it looks like there are something wrong with my data, here is the
Is the diagnostic plot right ? what does "mean of normalized counts" ? I have read the DESeq2 paper, however I don't quite understand it, and what kind of diagnostic plot is good ?

DESeq(fitType=)

(double posted by accident, sorry for the noise)

Error with SummarizedExperiment 1.17.2

Hi, the development version of SummarizedExperiment (1.17.2, 20-02-14) is causing a new error (below). I'm on Bioc 3.11 (and R devel 2020-01-10 r77651) and DESeq2 1.27.24. Rolling back to SummarizedExperiment 1.16.1 (latest official release) fixed the problem.

error: (unknown)
please use 'assay(x, withDimnames=FALSE)) <- value' or 'assays(x,
  withDimnames=FALSE)) <- value' when the dimnames on the supplied
  assay(s) are not identical to the dimnames on DESeqDataSet object 'x'

Backtrace:
  ...
  7. DESeq2::estimateDispersions(...)
  8. DESeq2:::.local(object, ...)
  9. DESeq2::estimateDispersionsGeneEst(...)
 11. SummarizedExperiment::`assays<-`(`*tmp*`, value = `*vtmp*`)

DESeq2 building with error in Bioc 3.11

The first version of DESeq2 in the new devel branch wasn't building (as of 11/1/2019 for example):

http://bioconductor.org/checkResults/devel/bioc-LATEST/DESeq2/

This was due to a dependency (Suggests) on IHW which depends in turn on lpsymphony. I've just pushed a new version (1.27.1) of DESeq2 that removes the IHW Suggests to get a successful build of DESeq2. IHW is only needed to one demonstration chunk in the vignette, which is now switched to unevaluated.

recover from failed zinbwave weights

Hi Mike,

I have a dataset where a single gene fails the weight check in getAndCheckWeights when using zinbwave. I just removed it, but would you take a pull request where it automatically removes those genes from the object and moves on, rather than failing?

Normalization of RNA seq data before DESeq2

i have a dataset having 4 rna seq healthy tissue samples prepared with unstranded Illumina library and another dataset with 8 rna seq healthy tissue samples prepared with reverse stranded Illumina library. I wish to combine these 2 datasets together and run DGE with a third dataset having 100 tumor tissue samples. But DESEQ2 takes in raw counts as input. Can I normalize each dataset seaprately and then merge them together to run DESeq2?

Why DESeq2 do not support the decimals for count matrix

I prepare a count matrix from the salmon results, but DESeq2 do not support this. While the documentation of tximport provides poor explainations for many steps and parameters.

lfcShrink error

Hi, I'm following the DESeq manual. In the section of lfcShrink, I used de command:

resLFC <- lfcShrink(dds, coef= "condition_carcinoma_vs_normal", type="apeglm")

but the result was the next error

Error in optimHess(par = init, fn = nbinomFn, gr = nbinomGr, x = x, y = y,  : 
  non-finite value supplied by optim

thelovelab / deseq2 Goto Github PK

deseq2's People

Contributors

Stargazers

Watchers

Forkers

deseq2's Issues

get the binary predictor data and annotate the predictor data

prepare for analysis

run analysis

If the parametric fit failed, try the local.

If local fails, try the mean

Recommend Projects

Recommend Topics

Recommend Org