spatialpca's People

Contributors

shangll123

spatialpca's Issues

Unable to create SpatialPCA object with SVG for Slide-Seq V2

Hi,

I am trying to run SpatialPCA on the Slide-Seq V2 dataset from the tutorial, but I am unable to create the SpatialPCA object using the spatially variable genes computed with SPARK-X.

Code:

load("./Puck_200115_08_count_location.RData")
dim(countmat)
dim(location)

slideseqv2 = CreateSpatialPCAObject(counts=countmat, location=location, project = "SpatialPCA",gene.type="spatial",sparkversion="sparkx",
numCores_spark=10, gene.number=3000, customGenelist=NULL,min.loctions = 20, min.features=20)

Error:

Use SCTransform function in Seurat to normalize data.
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set, setting model to use fixed slope and exclude poisson genes.
Total Step 1 genes: 14348
Total overdispersed genes: 14118
Excluding 230 genes from Step 1 because they are not overdispersed.
Variance stabilizing transformation of count matrix of size 16235 by 51398
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
|==============================================================================================================| 100%
Setting estimate of 406 genes to inf as theta_mm/theta_mle < 1e-3
# of step1 poisson genes (variance < mean): 0
# of low mean genes (mean < 0.001): 1924
Total # of Step1 poisson genes (theta=Inf; variance < mean): 470
Total # of poisson genes (theta=Inf; variance < mean): 2067
Calling offset model for all 2067 poisson genes
Found 568 outliers - those will be ignored in fitting/regularization step

Ignoring theta inf genes
Replacing fit params for 2067 poisson genes by theta=Inf
Setting min_variance based on median UMI: 0.04
Second step: Get residuals using fitted parameters for 16235 genes
|==============================================================================================================| 100%
Computing corrected count matrix for 16235 genes
|==============================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 1.487796 mins
Determine variable features
Centering data matrix
|==============================================================================================================| 100%
Getting residuals for block 1(of 11) for counts dataset
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'i' in selecting a method for function '[': object 'all.features' not found

Session Info:

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8
[6] LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: America/Chicago
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] patchwork_1.1.3 Seurat_4.9.9.9067 SeuratObject_4.9.9.9091 sp_2.1-0
[5] RSpectra_0.16-1 dplyr_1.1.3 ggplot2_3.4.3 BPCells_0.1.0
[9] SeuratDisk_0.0.0.9020 SpatialPCA_1.3.0

loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.21 splines_4.3.0 later_1.3.1 bitops_1.0-7
[5] tibble_3.2.1 polyclip_1.10-6 matlab_1.0.4 fastDummies_1.7.3
[9] lifecycle_1.0.3 doParallel_1.0.17 rprojroot_2.0.3 hdf5r_1.3.8
[13] globals_0.16.2 processx_3.8.2 lattice_0.21-9 MASS_7.3-60
[17] magrittr_2.0.3 plotly_4.10.2 remotes_2.4.2.1 httpuv_1.6.11
[21] glmGamPoi_1.12.2 sctransform_0.4.0 askpass_1.2.0 spam_2.9-1
[25] sessioninfo_1.2.2 pkgbuild_1.4.2 spatstat.sparse_3.0-2 reticulate_1.34.0
[29] cowplot_1.1.1 pbapply_1.7-2 RColorBrewer_1.1-3 abind_1.4-5
[33] pkgload_1.3.3 zlibbioc_1.46.0 Rtsne_0.16 GenomicRanges_1.52.0
[37] purrr_1.0.2 BiocGenerics_0.46.0 RCurl_1.98-1.12 pracma_2.4.2
[41] git2r_0.32.0 GenomeInfoDbData_1.2.10 IRanges_2.34.1 S4Vectors_0.38.2
[45] ggrepel_0.9.3 irlba_2.3.5.1 listenv_0.9.0 spatstat.utils_3.0-3
[49] umap_0.2.10.0 goftest_1.2-3 spatstat.random_3.1-6 fitdistrplus_1.1-11
[53] parallelly_1.36.0 DelayedMatrixStats_1.22.6 DelayedArray_0.26.7 leiden_0.4.3
[57] codetools_0.2-19 RcppRoll_0.3.0 tidyselect_1.2.0 farver_2.1.1
[61] matrixStats_1.0.0 stats4_4.3.0 spatstat.explore_3.2-3 jsonlite_1.8.7
[65] BiocNeighbors_1.18.0 ellipsis_0.3.2 progressr_0.14.0 ggridges_0.5.4
[69] survival_3.5-7 iterators_1.0.14 foreach_1.5.2 tools_4.3.0
[73] ica_1.0-3 Rcpp_1.0.11 glue_1.6.2 gridExtra_2.3
[77] MatrixGenerics_1.12.3 usethis_2.2.2 GenomeInfoDb_1.36.4 withr_2.5.1
[81] fastmap_1.1.1 bluster_1.10.0 pdist_1.2.1 fansi_1.0.4
[85] openssl_2.1.1 callr_3.7.3 digest_0.6.33 R6_2.5.1
[89] mime_0.12 colorspace_2.1-0 scattermore_1.2 tensor_1.5
[93] anndata_0.7.5.6 spatstat.data_3.0-1 utf8_1.2.3 tidyr_1.3.0
[97] generics_0.1.3 data.table_1.14.8 FNN_1.1.3.2 S4Arrays_1.2.0
[101] prettyunits_1.2.0 httr_1.4.7 htmlwidgets_1.6.2 uwot_0.1.16
[105] pkgconfig_2.0.3 gtable_0.3.4 lmtest_0.9-40 XVector_0.40.0
[109] htmltools_0.5.6 profvis_0.3.8 dotCall64_1.0-2 Biobase_2.60.0
[113] scales_1.2.1 png_0.1-8 rstudioapi_0.15.0 Signac_1.11.9000
[117] reshape2_1.4.4 nlme_3.1-163 curl_5.1.0 cachem_1.0.8
[121] zoo_1.8-12 stringr_1.5.0 KernSmooth_2.23-22 parallel_4.3.0
[125] miniUI_0.1.1.1 desc_1.4.2 pillar_1.9.0 grid_4.3.0
[129] vctrs_0.6.3 RANN_2.6.1 urlchecker_1.0.1 promises_1.2.1
[133] xtable_1.8-4 cluster_2.1.4 cli_3.6.1 compiler_4.3.0
[137] Rsamtools_2.16.0 rlang_1.1.1 crayon_1.5.2 future.apply_1.11.0
[141] labeling_0.4.3 ps_1.7.5 plyr_1.8.9 fs_1.6.3
[145] stringi_1.7.12 viridisLite_0.4.2 deldir_1.0-9 BiocParallel_1.34.2
[149] assertthat_0.2.1 munsell_0.5.0 Biostrings_2.68.1 lazyeval_0.2.2
[153] devtools_2.4.5 spatstat.geom_3.2-5 SPARK_1.1.1 CompQuadForm_1.4.3
[157] Matrix_1.6-1.1 RcppHNSW_0.5.0 sparseMatrixStats_1.12.2 bit64_4.0.5
[161] future_1.33.0 shiny_1.7.5 SummarizedExperiment_1.30.2 ROCR_1.0-11
[165] igraph_1.5.1 memoise_2.0.1 fastmatch_1.1-4 bit_4.0.5

Appreciate your help.

Thanks!

Error in t.default(PCvalues) : argument is not a matrix

Hi Lulu,

Thanks for developing such a great method! In my experience as a user, SpatialPCA is very robust, efficient, and accurate.

When running SpatialPCA on my data (with a large sample size), I followed the Slide-seq tutorial. At the step clusterlabel = louvain_clustering(clusternum=8,latent_dat=object@SpatialPCs,knearest=round(sqrt(dim(object@SpatialPCs)[2])) ), I encountered the error: Error in t.default(PCvalues) : argument is not a matrix.

I went through the function and noticed that PCvalues, which is equivalent to object@SpatialPCs, is not a base matrix but a dgeMatrix, so the following line fails:

info.spatial = as.data.frame(t(PCvalues))

Therefore, a potential workaround is to pass as.matrix(object@SpatialPCs) to the clustering function, which resolves the problem:

clusterlabel = louvain_clustering(clusternum=8,latent_dat=as.matrix(object@SpatialPCs),knearest=round(sqrt(dim(object@SpatialPCs)[2])) )

In case other people also encounter this issue, I decided to post it here. Please feel free to close the issue as this is a simple resolution and does not affect the overall procedure.

Sincerely,
Wenjing
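
The workaround above can be checked in isolation. The following is a minimal sketch, not SpatialPCA code: `pcs` is a hypothetical stand-in for object@SpatialPCs, built with the Matrix package so that it has the same dgeMatrix class, and the conversion mirrors the as.matrix() call suggested in the issue.

```r
library(Matrix)

# Hypothetical stand-in for object@SpatialPCs: 4 PCs (rows) by 5 spots (columns),
# stored as a dense Matrix-package object rather than a base R matrix.
set.seed(1)
pcs <- Matrix(rnorm(20), nrow = 4, ncol = 5, sparse = FALSE)

is_base_matrix_before <- is.matrix(pcs)   # Matrix-package objects are S4, not base matrices

# Convert to a base matrix before transposing, as suggested in the issue
pcs_base <- as.matrix(pcs)
info_spatial <- as.data.frame(t(pcs_base))  # one row per spot, one column per PC
```

Converting once and reusing `pcs_base` for both latent_dat and the knearest calculation keeps the call consistent.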

imputation

After filtering out certain genes and positions, the final clustering results in spatial transcriptomics data are missing some spots. For comparison using metrics like ARI (Adjusted Rand Index) or other operations, alignment between the clustering results and true labels is required. What methods can be used in the literature to impute missing spots in the clustering results? Looking forward to your assistance.
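
The question about imputation methods is left open here. As a simpler alternative that avoids imputing labels, the comparison can be restricted to the spots present in both label sets. The following is a minimal sketch under that assumption; `adjusted_rand_index`, `truth`, and `clusters` are hypothetical names, the labels are toy data, and both vectors are assumed to be named by spot ID.

```r
# Adjusted Rand Index computed from the contingency table of two labelings
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))            # pairs placed together in both labelings
  sum_a <- sum(choose(rowSums(tab), 2))    # pairs placed together in labeling a
  sum_b <- sum(choose(colSums(tab), 2))    # pairs placed together in labeling b
  expected <- sum_a * sum_b / choose(n, 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Toy data: spots s3 and s5 were filtered out before clustering
truth <- c(s1 = "A", s2 = "A", s3 = "B", s4 = "B", s5 = "C", s6 = "C")
clusters <- c(s1 = 1, s2 = 1, s4 = 2, s6 = 3)

# Align by spot ID and score only the spots that have both labels
shared <- intersect(names(truth), names(clusters))
ari <- adjusted_rand_index(truth[shared], clusters[shared])
```

Scores computed this way are only comparable across methods if every method is evaluated on the same set of shared spots.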

Error in x[seq_len(n)] : object of type 'S4' is not subsettable

Hi, I was creating a SpatialPCA object using the raw count data and location coordinates. I used the following code:

ST = CreateSpatialPCAObject(counts=rawcount, location=location, project = "SpatialPCA",gene.type="spatial",sparkversion="spark", gene.number=3000,customGenelist=NULL,min.loctions = 20, min.features=20)

and got the error below.

Using top 3000 significant spatially variable genes.
Error in x[seq_len(n)] : object of type 'S4' is not subsettable
Calls: head -> head.default
Execution halted

If you know the reason, please reply.

Thank you.

Multiple sample SpatialPCA

Hello, I'm interested in performing SpatialPCA on multiple spatial transcriptomes. Can SpatialPCA be applied to multiple samples? If I understand correctly, it uses spatial location information, but when I have multiple samples, spatial locations may overlap.
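
To make the overlap concern concrete, one possible way to keep samples apart is to shift each sample's coordinates before combining them. This is only a sketch under stated assumptions: `shift_locations` and its `gap` argument are hypothetical and not part of SpatialPCA, and the source does not confirm whether this is appropriate for the package's spatial kernel.

```r
# Shift each sample along the x axis so that samples occupy disjoint x ranges.
# location_list: list of n_i x 2 coordinate matrices, one per sample.
shift_locations <- function(location_list, gap = 1) {
  offset <- 0
  shifted <- vector("list", length(location_list))
  for (i in seq_along(location_list)) {
    loc <- as.matrix(location_list[[i]])
    loc[, 1] <- loc[, 1] - min(loc[, 1]) + offset  # start this sample at the current offset
    shifted[[i]] <- loc
    offset <- max(loc[, 1]) + gap                  # the next sample starts after a gap
  }
  do.call(rbind, shifted)
}

# Toy data: two samples whose x coordinates overlap before shifting
sample1 <- cbind(x = c(0, 1, 2), y = c(0, 1, 0))
sample2 <- cbind(x = c(0, 1), y = c(5, 6))
combined <- shift_locations(list(sample1, sample2), gap = 1)
```

The size of `gap` relative to the kernel bandwidth determines how much spots from different samples would still influence each other, so it would need to be chosen with that in mind.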

Error in 'SpatialPCA_buildKernel'

Hi! I was using SpatialPCA to analyze spatial data. I created the spatial object using the standard normalization method in Seurat instead of SCTransform, and I chose HVG as 'gene.selection'. When I ran SpatialPCA_buildKernel, it gave the error below:

Warning message in mclapply(1:dim(location)[1], fx_gaussian, mc.cores = ncores):
“scheduled cores 1, 2, 3 did not deliver results, all values of the jobs will be affected”
Error in vectbl_as_col_location2():
! Can't extract columns past the end.
Location 1 doesn't exist.
There are only 0 columns.

I would appreciate any insight into the cause. Thanks!

Unable to reproduce the example from the website

Hello

I am trying to reproduce the example from this page https://lulushang.org/SpatialPCA_Tutorial/DLPFC.html
I downloaded the data here https://drive.google.com/drive/folders/1mkXV3kQKqwxk42SW4Rb263FgFj2K8HhT?usp=sharing (as you indicated).

Please find below the error message I got

Please let me know what I need to modify.

best regards,
William


> library(SpatialPCA)
> library(ggplot2)
>
>
> sample_names=c("151507", "151508", "151509", "151510", "151669", "151670", "151671" ,"151672","151673", "151674" ,"151675" ,
> i=9 # Here we take the 9th sample as example, in total there are 12 samples (numbered as 1-12), the user can test on other s
> clusterNum=c(7,7,7,7,5,5,5,5,7,7,7,7) # each sample has different ground truth cluster number
>
>
> load( paste0("/home/wdenault/spatial_RNA_seq/data_spatial_PCA/DLPFC/LIBD_sample",i,".RData"))
> print(dim(count_sub)) # The count matrix
[1] 33538  3639
> print(dim(xy_coords)) # The x and y coordinates. We flipped the y axis for visualization.
[1] 3639    2
>
>
>
>
> # location matrix: n x 2, count matrix: g x n.
> # here n is spot number, g is gene number.
> xy_coords = as.matrix(xy_coords)
> rownames(xy_coords) = colnames(count_sub) # the rownames of location should match with the colnames of count matrix
> LIBD = CreateSpatialPCAObject(counts=count_sub,
+                               location=xy_coords,
+                               project = "SpatialPCA",
+                               gene.type="spatial",
+                               sparkversion="spark",
+                               numCores_spark=5,
+                               gene.number=3000,
+                               customGenelist=NULL,
+                               min.loctions = 20,
+                               min.features=20)
## Use SCTransform function in Seurat to normalize data.
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 15124 by 3639
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 3639 cells
Error: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.
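
This message comes from the matrixStats package, where useNames = NA became defunct in version 1.2.0; it tends to surface when an older package still passes useNames = NA. Whether that is the cause in this session is an assumption, since no session info was posted. A minimal sketch for collecting the relevant versions follows; `report_versions` is a hypothetical helper and the package list is only a guess at the likely candidates.

```r
# Report installed versions of packages likely involved in the error.
# Packages that are not installed are reported as NA instead of failing.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

versions <- report_versions(c("matrixStats", "sctransform", "Seurat", "SeuratObject"))
print(versions)
```

Posting this output alongside the error makes it easier to tell whether a version mismatch is involved.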

Using spatial PCA to project new data

Hello :)

I found your work fascinating, which is why I want to use it on a very different problem: applying your spatial PCA to encoded vectors that are the outputs of a neural network. Our number of points (i.e. n) is very large, so I applied your method to a subsample of our data set. Now I want to project the remaining elements onto the latent space defined by W. One solution would be:
$Z_{proj} = W^{-1}(Y_{proj} - (XB)^T)$
Another solution would be to reuse equation 13 of your supplementary methods. What would be your solution?
Many thanks in advance!
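
For reference, the first option can be sketched numerically. Since W is m by d and generally not square, the inverse is read here as a least-squares pseudo-inverse, $\hat{Z} = (W^T W)^{-1} W^T \tilde{Y}$, where $\tilde{Y}$ is the new data with covariate effects already removed. This is an assumption about what was intended: it ignores the spatial prior entirely and is not equation 13 of the supplementary methods. `project_onto_loadings` is a hypothetical name and the data are simulated.

```r
# Least-squares projection of residualized data onto the loading matrix W.
# W: m x d loadings; Y_resid: m x n_new data with covariate effects removed.
project_onto_loadings <- function(W, Y_resid) {
  solve(crossprod(W), crossprod(W, Y_resid))   # (W'W)^{-1} W'Y, a d x n_new matrix
}

# Simulated check: noiseless data generated from known factors is recovered exactly
set.seed(1)
W <- qr.Q(qr(matrix(rnorm(50 * 3), nrow = 50, ncol = 3)))  # 50 features, 3 components
Z_true <- matrix(rnorm(3 * 10), nrow = 3, ncol = 10)       # 10 new points
Y_new <- W %*% Z_true
Z_hat <- project_onto_loadings(W, Y_new)
```

When W has orthonormal columns, the projection reduces to t(W) %*% Y_resid.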

SpatialPCA with Stereo-Seq data

Hello! I'm working with Stereo-seq data, specifically with the MOSTA dataset.
I want to know which parameters you would recommend for running this kind of dataset.
I have been using the parameters from Slide-seq V2, but I'm unsure if they are the most accurate.
Thanks!

Question about SpatialPCA_EstimateLoading

Hi!
I recently read your paper and am interested in your work, but there is one point I did not understand in SpatialPCA_EstimateLoading.R.
In SpatialPCA_EstimateLoading.R, from what I understand, W_est_here is used to calculate $\widehat{W}$, where $\widehat{W}=LR$ and L is an m by d matrix of the first d eigenvectors of G_each.

W_est_here = eigs_sym(G_each, k=PCnum, which = "LM")$vectors
-(-sum_det -(k*(n-q))/2*log(params$tr_YMY+F_funct_sameG(W_est_here,G_each)))

I do not understand what the code (-sum_det -(k*(n-q))/2*log(params$tr_YMY+F_funct_sameG(W_est_here,G_each))) is for.
Could you please answer this question for me?
Thank you in advance!

Running Spatial PCA on Gaussian data

Hello,

I am interested in running spatial PCA on data from a spatial factor model to assess its de-noising properties.

The model is LF^T + E, where E is Gaussian noise. I am running different experiments with different noise levels. My understanding is that the current implementation assumes a certain level of variance for each row (1, I suspect).

Do you have any suggestions on how to run spatial PCA on Gaussian data? See the R example below. I would like to run spatial PCA on Z.

x <- runif(1000)
y <- runif(1000)
X <- cbind(x, y)
plot(x, y)
library(ggplot2)

set.seed(3) # problem for set.seed(1)

# Factor matrix: 3 factors by 200 features, each entry randomly switched on or off
f <- matrix(NA, nrow = 3, ncol = 200)
for (i in 1:ncol(f)) {
  t1 <- sample(c(0, 1), size = 1)
  t2 <- sample(c(0, 1), size = 1)

  f[1, i] <- t1 * rnorm(n = 1)
  f[2, i] <- t2 * rnorm(n = 1)
  f[3, i] <- t2 * rnorm(n = 1)
}

# Loading matrix: each location belongs to one of three factors,
# assigned by its cell in a 3 x 3 grid over the unit square
L <- matrix(NA, ncol = 3, nrow = length(x))
factor <- c()

for (i in 1:length(x)) {
  if ((x[i] < .33 & y[i] < .33) | (x[i] > .33 & y[i] > .33 & x[i] < .66 & y[i] < .66) | (x[i] > .66 & y[i] > .66)) {
    L[i, ] <- c(1, 0, 0)
    factor <- c(factor, 1)
  } else {
    if ((x[i] < .33 & y[i] > .66) | (x[i] > .33 & y[i] < .33 & x[i] < .66) | (x[i] > .66 & y[i] > .33 & y[i] < .66)) {
      L[i, ] <- c(0, 1, 0)
      factor <- c(factor, 2)
    } else {
      L[i, ] <- c(0, 0, 1)
      factor <- c(factor, 3)
    }
  }
}

df <- data.frame(x = x, y = y, Factor = as.factor(factor))

colors <- c("#D41159", "#1A85FF", "#40B0A6")
P1 <- ggplot(df, aes(x, y, col = Factor)) + geom_point(size = 3) +
  geom_hline(yintercept = 0.33) +
  geom_hline(yintercept = 0.66) +
  geom_vline(xintercept = 0.66) +
  geom_vline(xintercept = 0.33) +
  xlab("") + ylab("") +
  scale_color_manual(values = colors) +
  theme_minimal() + theme(axis.text.y = element_blank(),
                          axis.ticks.y = element_blank(),
                          axis.text.x = element_blank(),
                          axis.ticks.x = element_blank())

# Observed data: 1000 locations by 200 features, signal plus Gaussian noise
Z <- L %*% f + matrix(rnorm(nrow(L) * ncol(f), sd = 2), nrow = nrow(L))
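
If the suspicion about unit row variance is right, one way to prepare such data is to standardize each feature before handing it over. This is a sketch under that assumption only: the issue does not confirm what scaling SpatialPCA expects, and the features-by-locations orientation follows the "count matrix: g x n" convention mentioned in the DLPFC example above. `Z_demo` is a simulated stand-in with the same shape as Z.

```r
# Simulated stand-in for Z: 1000 locations (rows) by 200 features (columns)
set.seed(3)
Z_demo <- matrix(rnorm(1000 * 200, sd = 2), nrow = 1000, ncol = 200)

# scale() centers and scales each column (feature); transposing afterwards gives
# a features x locations matrix in which every row has mean 0 and variance 1.
Y_scaled <- t(scale(Z_demo))

row_means <- rowMeans(Y_scaled)
row_sds <- apply(Y_scaled, 1, sd)
```

Because the scaling removes the overall noise level from each feature, comparisons across noise levels would have to be made on the recovered factors rather than on raw variances.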
