cahanlab / singlecellnet

SingleCellNet: classify single cells across species and platforms

License: MIT License

R 100.00%


singlecellnet's Issues

corplot_sub layer display selection

Need to add a parameter that lets the plot be displayed at any desired level, not just the top one.

Vignette Error: differing number of rows

Hello,

I am trying to work through the vignette using the example data provided, and am running into an error during the "Classification Annotation assignment" step:

stPark <- get_cate(classRes = crParkall, sampTab = stPark, dLevel = "description1", sid = "sample_name", nrand = nqRand)

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 43745, 43795

stop(gettextf("arguments imply differing number of rows: %s", paste(unique(nrows), collapse = ", ")), domain = NA)
data.frame(..., check.names = FALSE)
cbind(deparse.level, ...)
cbind(sampTab, category = topCats, scn_score = topCat_score)
get_cate(classRes = crParkall, sampTab = stPark, dLevel = "description1", sid = "sample_name", nrand = nqRand)

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] singleCellNet_0.1.0 cowplot_1.0.0 reshape2_1.4.4
[4] pheatmap_1.0.12 dplyr_0.8.5 ggplot2_3.3.0

loaded via a namespace (and not attached):
[1] backports_1.1.6 sn_1.6-1 plyr_1.8.6
[4] igraph_1.2.5 lazyeval_0.2.2 splines_3.6.2
[7] listenv_0.8.0 usethis_1.6.0 TH.data_1.0-10
[10] digest_0.6.25 htmltools_0.4.0 gdata_2.18.0
[13] fansi_0.4.1 magrittr_1.5 memoise_1.1.0
[16] cluster_2.1.0 ROCR_1.0-7 remotes_2.1.1
[19] globals_0.12.5 sandwich_2.5-1 prettyunits_1.1.1
[22] colorspace_1.4-1 ggrepel_0.8.2 xfun_0.13
[25] callr_3.4.3 crayon_1.3.4 jsonlite_1.6.1
[28] survival_3.1-12 zoo_1.8-7 ape_5.3
[31] glue_1.4.0 gtable_0.3.0 leiden_0.3.3
[34] pkgbuild_1.0.6 future.apply_1.4.0 BiocGenerics_0.30.0
[37] scales_1.1.0 mvtnorm_1.1-0 bibtex_0.4.2.2
[40] Rcpp_1.0.4 metap_1.3 plotrix_3.7-7
[43] viridisLite_0.3.0 reticulate_1.15 rsvd_1.0.3
[46] stats4_3.6.2 htmlwidgets_1.5.1 httr_1.4.1
[49] gplots_3.0.3 RColorBrewer_1.1-2 TFisher_0.2.0
[52] ellipsis_0.3.0 ica_1.0-2 pkgconfig_2.0.3
[55] farver_2.0.3 tidyselect_1.0.0 labeling_0.3
[58] rlang_0.4.5 munsell_0.5.0 tools_3.6.2
[61] cli_2.0.2 devtools_2.3.0 ggridges_0.5.2
[64] stringr_1.4.0 yaml_2.2.1 npsurv_0.4-0
[67] processx_3.4.2 knitr_1.28 fs_1.4.1
[70] fitdistrplus_1.0-14 caTools_1.18.0 purrr_0.3.3
[73] randomForest_4.6-14 RANN_2.6.1 pbapply_1.4-2
[76] future_1.16.0 nlme_3.1-147 compiler_3.6.2
[79] rstudioapi_0.11 plotly_4.9.2.1 curl_4.3
[82] png_0.1-7 testthat_2.3.2 lsei_1.2-0
[85] tibble_3.0.0 DescTools_0.99.34 stringi_1.4.6
[88] ps_1.3.2 desc_1.2.0 lattice_0.20-41
[91] Matrix_1.2-18 multtest_2.40.0 vctrs_0.2.4
[94] mutoss_0.1-12 pillar_1.4.3 lifecycle_0.2.0
[97] Rdpack_0.11-1 lmtest_0.9-37 RcppAnnoy_0.0.16
[100] data.table_1.12.8 bitops_1.0-6 irlba_2.3.3
[103] gbRd_0.4-11 patchwork_1.0.0 R6_2.4.1
[106] KernSmooth_2.23-16 sessioninfo_1.1.1 codetools_0.2-16
[109] boot_1.3-24 MASS_7.3-51.5 gtools_3.8.2
[112] assertthat_0.2.1 pkgload_1.0.2 rprojroot_1.3-2
[115] withr_2.1.2 mnormt_1.5-6 multcomp_1.4-13
[118] expm_0.999-4 parallel_3.6.2 grid_3.6.2
[121] tidyr_1.0.2 Rtsne_0.15 numDeriv_2016.8-1.1
[124] Biobase_2.44.0

Any assistance would be greatly appreciated!
Thanks,

Julia

Functions to add to singleCellNet Namespace

sc_testPattern
sc_sampR_to_pattern
getTopGenes
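A minimal sketch of what this could look like, assuming the package's NAMESPACE file is edited by hand. If NAMESPACE is generated by roxygen2 instead, the equivalent is an `#' @export` tag above each function's definition.

```r
# NAMESPACE: export the helpers so users can call them
# without the singleCellNet::: prefix
export(sc_testPattern)
export(sc_sampR_to_pattern)
export(getTopGenes)
```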

Singlecellnet introduces "rand" as new cells in the prediction output

Hello,

I have been reviewing some of the previous issues on the "rand" annotation category, specifically #20 and #28.

However, I also noticed that when predicting on my query dataset with the RF model, singlecellnet introduced 50 new columns alongside my cell barcodes, each labelled "rand#", i.e. rand1, rand2, etc. This caused my prediction output data frame to contain more "cells" (6326) than there are actual cells in my Seurat object (6276), and these columns had to be removed before merging.

Is this expected behaviour, and if so, could you explain why the package is introducing "new" cells into the prediction? I understand why an individual cell would be annotated as "rand" in certain circumstances, but I do not understand why, in the output prediction frame, new columns are introduced that resemble cells that are named "rand".

extractSCE object "exp_type" not found

Hi, in the extractSCE function, I saw
@param sce_object
@param exp_type
However, the actual parameters are sce_object and exp_slot_name, whereas the code refers to exp_type.
As a result, when I run extractSCE, I always get the error "object 'exp_type' not found".

According to the @param documentation, exp_type does seem to be a parameter and not a value defined elsewhere.
Is this a bug, and can it be fixed?

Thanks in advance
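This is the error R raises when a function body refers to a name that is neither one of its arguments nor defined in any enclosing environment. The sketch reproduces it with hypothetical stand-ins for extractSCE (not the package's actual code) and shows the fix of using the declared argument name consistently:

```r
# Hypothetical stand-ins; the real extractSCE() body is not shown in this issue.
broken_extract <- function(sce_object, exp_slot_name) {
  sce_object[[exp_type]]        # refers to a name that is never defined
}

fixed_extract <- function(sce_object, exp_slot_name = "counts") {
  sce_object[[exp_slot_name]]   # uses the declared argument
}

obj <- list(counts = matrix(1:4, 2), logcounts = log1p(matrix(1:4, 2)))

msg <- tryCatch(broken_extract(obj, "counts"),
                error = function(e) conditionMessage(e))
res <- fixed_extract(obj, "logcounts")
```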

Error in combn: n < m

Hi singleCellNet devs. I am trying out this tool and receiving the following error:

Error in combn(1, 2) : n < m

It seems to happen whenever one of the cluster labels is the empty string or NA. You may want to add an error message for this case. Here's an example.

library(magrittr)  # provides %>%, set_rownames() and set_colnames()

singleCellNet::scn_train(
  stTrain = data.frame(cluster = c("", "A", "B"), id = 1:300),
  dLevel = "cluster", colName_samp = "id",
  expTrain = matrix(rnorm(26*300), ncol = 300) %>%
    set_rownames(LETTERS) %>%
    set_colnames(1:300)
)
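The message itself comes from base R: `combn(1, 2)` asks for 2-element combinations of a one-element set. A defensive pre-check along these lines (a sketch; `drop_unlabeled` is a hypothetical helper, not part of singleCellNet) removes cells whose label is empty or `NA` before training and keeps the expression matrix columns in step with the sample table:

```r
# Base R reproduces the message directly
err <- tryCatch(combn(1, 2), error = conditionMessage)

# Hypothetical helper: drop cells with empty or NA labels before scn_train()
drop_unlabeled <- function(st, exp, dLevel, colName_samp) {
  labs <- as.character(st[[dLevel]])
  keep <- !is.na(labs) & nzchar(trimws(labs))
  if (any(!keep)) {
    message(sum(!keep), " cells dropped for empty or NA '", dLevel, "' labels")
  }
  st <- st[keep, , drop = FALSE]
  # subset matrix columns by sample ID so they match the table rows
  exp <- exp[, as.character(st[[colName_samp]]), drop = FALSE]
  list(st = st, exp = exp)
}

st <- data.frame(cluster = c("", "A", "B"), id = 1:300)
exp <- matrix(rnorm(26 * 300), ncol = 300,
              dimnames = list(LETTERS, as.character(1:300)))
clean <- drop_unlabeled(st, exp, dLevel = "cluster", colName_samp = "id")
```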

Specify Clusters to show in the correlation heatmap

Reducing memory requirements

Hi SCN devs! I am running SCN and I keep running out of RAM. The training set has 32k genes, 5464 cells, and 60 cell types (so 60*59/2 = 1770 pairs of cell types). I'm using the parameters below. Is this something you would expect, and do you have any advice? I have already downsampled from a much larger number of cells, but maybe the issue is the high number of cell types?

Thank you!

nTopGenes = 10
nRand = 0
nTrees = 1000
nTopGenePairs = 25
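A rough back-of-the-envelope check on the numbers in this report. It assumes a dense double-precision matrix at 8 bytes per value. Actual peak usage during training will be higher because of intermediate copies.

```r
n_types <- 60
n_type_pairs <- choose(n_types, 2)    # unordered cell-type pairs: 60*59/2

n_genes <- 32000
n_cells <- 5464
dense_bytes <- n_genes * n_cells * 8  # one dense double matrix
dense_gb <- dense_bytes / 1e9

cat("cell-type pairs:", n_type_pairs, "\n")
cat(sprintf("dense expression matrix: %.2f GB\n", dense_gb))
```

Restricting the matrix to the genes shared with the query (as the vignette code does with `intersect()`) shrinks this figure in proportion to the number of genes kept.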

Issues for Installation

Hi SingleCellNet Team,

When I try to install the package, I get the following error and am not sure how to deal with it:
Error in i.p(...) :
(converted from warning) installation of package ‘C:/Users/~1/AppData/Local/Temp/Rtmp0S7yMI/file1128ee055cf/singleCellNet_0.1.0.tar.gz’ had non-zero exit status

Do you have any idea why this happens and how to fix it? I really appreciate all your help.

Best

Cell type with less than 100 cells

Hi everyone,

I'm trying to use SCN to build a cross-species classifier. The issue is that my dataset contains a few rare cell types with fewer than 100 cells (some with fewer than 10). I'm aware that I could change ncells in the splitCommon function. However, I'm not sure whether reducing ncells will cause the trained classifier to underfit. Has anyone else encountered this issue? Would you mind sharing how you tackled it?

Thanks!
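Before choosing `ncells`, it can help to see exactly which types fall short. A minimal base-R sketch, assuming a sample table with one cell-type column. The column name `cell_type` and the toy counts are hypothetical.

```r
st <- data.frame(cell_type = rep(c("T cell", "B cell", "rare A", "rare B"),
                                 times = c(500, 300, 40, 8)))

counts <- sort(table(st$cell_type))     # cells per type, smallest first
rare <- names(counts)[counts < 100]     # types below the 100-cell mark
print(counts)

# The smallest class shows how far ncells would have to drop
# to sample the same number of cells from every type.
min_count <- min(counts)
```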

Easily heatmap top x genes across all clusters

'mc.cores' > 1 is not supported on Windows

Hi SingleCellNet Team,

When I try to find the best pairs, transform the query data, and train the classifier on my Windows computer, I get the following error:

Error in mclapply(myPatternG, sc_testPattern, expDat = tmpPdat, mc.cores = mcCores) : 'mc.cores' > 1 is not supported on Windows

Is there any way to avoid the mc.cores > 1 case?

Thanks
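`parallel::mclapply()` relies on forking, which Windows does not support, so `mc.cores` has to be 1 there. One portable pattern is sketched here. It assumes the calling function lets you pass a core count; the traceback shows it forwarding a variable `mcCores`, but the argument name on the public function is not confirmed here. `safe_cores` is a hypothetical helper.

```r
library(parallel)

# Fork-based parallelism is unavailable on Windows; fall back to 1 core there.
safe_cores <- function(requested = detectCores()) {
  if (.Platform$OS.type == "windows") return(1L)
  max(1L, as.integer(requested), na.rm = TRUE)  # detectCores() may return NA
}

n_cores <- safe_cores()
res <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)
```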

Interoperability with AnnData or Loom

Hi,

Do you recommend a way to integrate this method with h5ad or loom objects?

For instance, in an h5ad/loom file I would have my UMAP clusters stored in the object, linked to the cells within those clusters and their expression values. To use SCN, would I need to export the expression matrix for each cluster as input? Or is there a better way?

Thanks in advance!
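Judging from the vignette code elsewhere on this page, SCN works from one genes × cells expression matrix plus one sample table (one row per cell, with an ID column and a label column), so per-cluster exports should not be necessary. A sketch of assembling those two objects from a cluster vector exported from any container; the object names and toy data are hypothetical.

```r
set.seed(1)
genes <- paste0("gene", 1:5)
cells <- paste0("cell", 1:6)
expDat <- matrix(rpois(30, 3), nrow = 5, dimnames = list(genes, cells))

# e.g. the UMAP cluster labels stored in the h5ad/loom cell metadata
clusters <- c("c1", "c1", "c2", "c2", "c3", "c3")

sampTab <- data.frame(cell = cells, cluster = clusters,
                      stringsAsFactors = FALSE)
rownames(sampTab) <- sampTab$cell

# matrix columns and table rows must refer to the same cells, in the same order
stopifnot(identical(colnames(expDat), rownames(sampTab)))
```

These two objects would then play the roles of `expTrain`/`stTrain` (or the query equivalents), with `dLevel = "cluster"` and `colName_samp = "cell"`.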

Forcing all cells to be assigned?

Hi all,

I was wondering if it's possible to classify cells such that all cells must be assigned a cell-type label (i.e. with no possibility of a cell falling into an "other"/"unassigned" category?) If so, how can I specify this in the classification function?

Thanks so much!
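If the goal is simply a label for every cell, one option outside the package is to take, for each cell, the highest-scoring category while ignoring the `rand` row. A sketch, assuming a categories × cells score matrix like the one `scn_predict()` returns, with a row named "rand" (toy values):

```r
scores <- matrix(c(0.10, 0.70, 0.20,
                   0.55, 0.15, 0.30,
                   0.25, 0.15, 0.60),
                 nrow = 3,
                 dimnames = list(c("B cell", "T cell", "rand"),
                                 c("cell1", "cell2", "cell3")))

# Drop the "rand" category, then pick the top remaining category per cell.
real <- scores[rownames(scores) != "rand", , drop = FALSE]
forced <- rownames(real)[apply(real, 2, which.max)]
names(forced) <- colnames(real)
```

Cells such as `cell3`, which scored highest for rand, still get a label this way, but a low-confidence one; keeping the score next to the label makes that visible.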

warnings and error related to scn_train: "In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced"

Hi! Could you help me with this? I am using singleCellNet to train a classifier using a single cell dataset with 15729 genes, 9 categories, and 3600 cells (400 cells per category).

dim(expTrain)
[1] 15729  3600

I wonder why I received this warning when running scn_train :

class_info <- scn_train(stTrain = stTrain, expTrain = expTrain, nTopGenes = 50, nRand = 400, nTrees = 1000, nTopGenePairs = 125, dLevel = "type", colName_samp = "cell")
Warning messages:
1: In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced
2: In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced
3: In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced
4: In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced
5: In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced
6: In sqrt(as.numeric(llfit$summary[, 2])) : NaNs produced

Also, when I set nTopGenes = 50 and nTopGenePairs = 150, scn_train additionally throws this error:

Error in if ((countList[tgp[1]] < maxPer) & (countList[tgp[2]] < maxPer)) { : 
  missing value where TRUE/FALSE needed

The error seems to be related to nTopGenePairs, as when I set it below 125, there is no error. The warning is always present.
Thank you for your help in advance!
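Both messages are base-R behaviours, which may help narrow this down. `sqrt()` of a negative number returns `NaN` with the warning "NaNs produced", and `if()` fails with "missing value where TRUE/FALSE needed" when its condition is `NA`. That happens, for instance, when a named vector is indexed by a name it does not contain. Whether this is what happens inside scn_train is not confirmed here; `countList`, `tgp` and `maxPer` only mirror the names in the error, and the values are made up.

```r
w <- tryCatch(sqrt(-1), warning = function(w) conditionMessage(w))

countList <- c(geneA = 1, geneB = 2)
maxPer <- 3
tgp <- c("geneA", "geneC")   # "geneC" is not in countList, so lookup gives NA
cond <- (countList[tgp[1]] < maxPer) & (countList[tgp[2]] < maxPer)  # TRUE & NA is NA

e <- tryCatch(if (cond) "ok", error = function(e) conditionMessage(e))
```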

An error in scn_train: new columns would leave holes after existing columns

Hello!
I'm using the provided MWS data (Microwell-seq mouse atlas) for training.
However, while running the scn_train function, I got the error below.

Error in `[<-.data.frame`(`*tmp*`, Matrix::which(tmpAns < dThresh), value = 0) :
new columns would leave holes after existing columns
Calls: system.time -> scn_train -> trans_prop -> [<- -> [<-.data.frame

Could you check this problem?
Thank you!
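The message comes from assigning into a data.frame with a single index vector. `[<-.data.frame` treats that vector as column numbers, so the cell positions returned by `which()` are read as (mostly nonexistent) columns. This suggests, though it is not confirmed, that `trans_prop()` received a data.frame rather than a matrix. A base-R reproduction and one workaround (convert to a matrix first):

```r
df <- data.frame(a = c(0.1, 5, 0.2), b = c(3, 0.05, 4))
dThresh <- 0.25

# which() returns element positions 1, 3, 5; a data.frame reads them as columns
err <- tryCatch({
  df[which(df < dThresh)] <- 0
  "no error"
}, error = function(e) conditionMessage(e))

m <- as.matrix(df)
m[which(m < dThresh)] <- 0   # element positions are valid for a matrix
```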

somesingleCellNet

How to increase font size in sc_violinClass and other SCN plots?

Hello SCN Team

Could you tell me how to increase the font size in SCN output images, e.g. sc_violinClass? I need them for manuscript submission, and the fonts get smaller as the number of cell types in the training model increases.

Thanks

-Biju
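SCN's plotting code appears to build on ggplot2 (ggplot2 and cowplot are attached in the session info on this page). If `sc_violinClass()` returns a ggplot object, which is an assumption not confirmed here, its fonts can be enlarged by adding a `theme()` layer. A self-contained sketch on a toy plot:

```r
library(ggplot2)

p <- ggplot(data.frame(type = c("A", "B"), score = c(0.4, 0.9)),
            aes(type, score)) +
  geom_col()

# Enlarge all text, with a separate size and angle for the x-axis tick labels
p_big <- p + theme(text = element_text(size = 16),
                   axis.text.x = element_text(size = 14, angle = 45, hjust = 1))

# Hypothetically, the same layer on SCN output:
# sc_violinClass(sampTab = stQuery, classRes = crHS, sid = "cell_name",
#                dLevel = "cluster") + theme(text = element_text(size = 16))
```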

add top enr genes to the side of the correlation heatmap

cor_plot

Show top-most nodes on right of correlation heatmap (and associated genes).

Error in makeNode... object 'Node' not found

Maybe a class definition is missing? This line in makeNode() is where it breaks:

tmpNode<-Node$new(nodeName)
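`Node$new()` is the constructor of the `Node` R6 class from the data.tree package, so this error is consistent with data.tree not being attached or imported when `makeNode()` runs. A sketch of the fix, assuming data.tree is installed; `makeNode_sketch` is a hypothetical stand-in. Inside the package itself, the equivalent would be listing data.tree under Imports and importing `Node` in NAMESPACE.

```r
library(data.tree)

makeNode_sketch <- function(nodeName) {
  Node$new(nodeName)   # resolves once data.tree is attached
}

root <- makeNode_sketch("root")
child <- root$AddChild("child")
```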

Other Training datasets and form of import dataset

Dear all,
Are there any other comprehensive mouse training datasets that you might be aware of?
Also, as far as I understand, any analysed single-cell dataset can be imported as a training set, and I just need the expression matrix and the cluster annotation, right?
Also, should my query expression data be raw, or is it better to use normalized, scaled, and otherwise processed data?

Thank you!

rand cluster

Dear developer,
I have been trying SingleCellNet on my data, mostly mouse immune cells, which I processed with scanpy and then converted to a loom file. I have around 12k genes in common with each of two different training datasets, and I limited them to the cell types I am interested in (10-15 immune cell types, depending on the training set).
In the results, most of my cells are assigned to the rand category and there is no clear pattern in the heatmap; the rand category shows a strong signal. What could be wrong? Do you have any suggestions? By design, my dataset should be fairly clean, and I can manually identify some clusters very well, so I am wondering what might be going wrong in the process.

P.S. When I try to assess the classifier on my actual data (not the test data), I get the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

I couldn't figure out what's going wrong, since I simply replaced the parameters with my query data. I would be thankful for your help!

Here is what I do basically:

stQuery = lfile$sampTab
dim(stQuery)
expQuery = lfile$expDat
dim(expQuery)
commong = rownames(expQuery)
tm = utils_loadObject("tm10x.rda")
tmexp=tm$expDat
tmst=tm$sampTab

cts = c("B cell", "alveolar macrophage","classical monocyte", "early pro-B cell",
"basophil", "dendritic cell","erythroblast","fibroblast","Fraction A pre-pro B cell",
"granulocyte", "hematopoietic precursor cell", "late pro-B cell", "granulocytopoietic cell",
"macrophage", "hepatocyte", "monocyte", "natural killer cell", "T cell", "immature B cell", "immature T cell",
"late pro-B cell","myeloid cell","promonocyte","proerythroblast","non-classical monocyte",
"proerythroblast","mast cell", "hematopoietic precursor cell")
tmst2 = filter(tmst, cell_ontology_class %in% cts)
tmst2 = droplevels(tmst2)
rownames(tmst2) = as.vector(tmst2$cell) # filter strips rownames

tmexp2 = tmexp[,rownames(tmst2)]
dim(tmexp2)

#filter genes
commonGenes = intersect(rownames(tmexp2), commong)
length(commonGenes)
tmexp2_filtered = tmexp2[commonGenes,]

#training
stList = splitCommon(tmst2, ncells=100, dLevel="cell_ontology_class")
stTrain = stList[[1]]
expTrain = tmexp2_filtered[,rownames(stTrain)]
system.time(class_info2<-scn_train(stTrain = stTrain, expTrain = expTrain,
nTopGenes = 10, nRand = 70, nTrees = 1000,
nTopGenePairs = 25, dLevel = "cell_ontology_class", colName_samp = "cell"))

#validate data
stTestList = splitCommon(stList[[2]], ncells=100, dLevel="cell_ontology_class")
stTest = stTestList[[1]]
expTest = tmexp2_filtered[,rownames(stTest)]

#predict
system.time(classRes_val_all2 <- scn_predict(class_info2[['cnProc']], expTest, nrand = 50))

#assess
tm_heldoutassessment = assess_comm(ct_scores = classRes_val_all2, stTrain = stTrain, stQuery = stTest,
dLevelSID = "cell", classTrain = "cell_ontology_class",
classQuery = "cell_ontology_class", nRand = 50)
plot_PRs(tm_heldoutassessment)

#apply to data
system.time(crHS <- scn_predict(class_info2[['cnProc']], expQuery, nrand=50))
tm_assessment = assess_comm(ct_scores = crHS, stTrain = stTrain,
stQuery = stQuery,
classTrain = "cell_ontology_class",
classQuery="cluster",dLevelSID ="cell_name")
plot_PRs(tm_assessment)
nqRand=50
sgrp = as.vector(stQuery$cluster)
names(sgrp) = as.vector(stQuery$cell_name)
grpRand = rep("rand", nqRand)
names(grpRand) = paste("rand_", 1:nqRand, sep='')
sgrp = append(sgrp, grpRand)
sc_hmClass(crHS, sgrp, max=5000, isBig=TRUE, cCol=F, font=8)
sc_violinClass(sampTab = stQuery,classRes = crHS, sid = "cell_name", dLevel = "cluster", ncol = 12)
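The "contrasts" message is base R's model-fitting error for a factor that ends up with only one level. One plausible route here, not confirmed from the package source, is that the query labels (`classQuery = "cluster"`) share few or no values with the training labels (`cell_ontology_class`), so that after matching only one level remains. A minimal reproduction and a quick overlap check (the label values are toy examples):

```r
# A one-level factor in a model formula triggers the same error
d <- data.frame(y = c(1, 2, 3), f = factor(c("a", "a", "a")))
err <- tryCatch(lm(y ~ f, data = d), error = function(e) conditionMessage(e))

# Compare query cluster labels with training cell types
train_labels <- c("B cell", "T cell", "macrophage")
query_labels <- c("0", "1", "2", "B cell")
shared <- intersect(query_labels, train_labels)
```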

error: package ‘ps’ does not have a namespace

Hi SingleCellNet,
I cannot install the package. Could you please help me? Thanks!

install.packages("devtools")
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/devtools_2.3.2.zip'
Content type 'application/zip' length 448584 bytes (438 KB)
downloaded 438 KB

package ‘devtools’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Yuanjian\AppData\Local\Temp\RtmpWa9dNR\downloaded_packages

devtools::install_github("pcahan1/singleCellNet")
Error: .onLoad failed in loadNamespace() for 'processx', details:
call: NULL
error: package ‘ps’ does not have a namespace

SCN score a probability?

Hello,

Is the SCN score a probability?

Best,
Alva
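randomForest appears among the loaded packages in the session info on this page. If, as an assumption, the SCN score is a random forest's class vote fraction, then it is the share of trees voting for each class. It lies in [0, 1] and sums to 1 across classes for each cell, but it is not necessarily a calibrated probability. A sketch on a built-in dataset:

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)

# each row: fraction of the 100 trees voting for each species
votes <- predict(rf, newdata = iris[c(1, 51, 101), ], type = "prob")
row_totals <- rowSums(votes)
```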

Different result of "Visualize average top pairs genes expression for training data"

Hello Yuqi,
When I run the "Visualize average top pairs genes expression for training data" part, I get a long list of missing genes

and a warning message:
In brewer.pal(n = 12, name = "Spectral") :
n too large, allowed maximum for palette Spectral is 11
Returning the palette you asked for with that many colors

The result does not look like what you show in the tutorial.

Could you please tell me how to fix this issue?
Thanks!
Best,
YJ
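The warning is from RColorBrewer: the "Spectral" palette has at most 11 colours, so a request for 12 returns only 11. When a plot needs more colours than a palette defines (here presumably one per group), interpolating is a common workaround. A sketch:

```r
library(RColorBrewer)

n_groups <- 12
base_cols <- brewer.pal(n = 11, name = "Spectral")  # the maximum for Spectral
cols <- colorRampPalette(base_cols)(n_groups)        # interpolate to 12 colours
```

This only addresses the palette warning; the missing-genes list is a separate matter.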

query_transform function error

Hi singleCellNet community,

I was trying to train a classifier on mouse gastrulation atlas data. At the binary matrix generation step, I got this error:

system.time(expQtransAll<-query_transform(expTMraw[cgenesA,rownames(stTest)], xpairs))
Error in intI(j, n = d[2], dn[[2]], give.dn = FALSE) :
no 'dimnames[[.]]': cannot use character indexing
Timing stopped at: 73.59 11.98 85.57

I'm pretty sure expTMraw keeps both rownames and colnames, but I don't understand why it says "no 'dimnames[[.]]': cannot use character indexing".

I sincerely hope someone can help me with this.

thanks!

Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : invalid character indexing

Following the tutorial in the ReadMe, it all goes well until
classRes_val_all = scn_predict(cnProc=class_info[['cnProc']], expDat=expTest, nrand = 50), which returns

Loaded in the cnProc
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : 
  invalid character indexing
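Both messages come from the Matrix package's indexing code (`intI()`). "no 'dimnames[[.]]'" means the sparse matrix has no row or column names at all, while "invalid character indexing" means names exist but some requested ones are missing, e.g. genes or cell IDs the classifier expects but the query lacks. The exact wording can vary between Matrix versions. A sketch of checking names before indexing:

```r
library(Matrix)

m <- Matrix(c(0, 1, 0, 2, 3, 0), nrow = 2, sparse = TRUE)
e1 <- tryCatch(m["geneA", ], error = function(e) conditionMessage(e))  # no dimnames

dimnames(m) <- list(c("geneA", "geneB"), c("c1", "c2", "c3"))
wanted <- c("geneA", "geneZ")                   # geneZ is not in the matrix
e2 <- tryCatch(m[wanted, ], error = function(e) conditionMessage(e))

missing_genes <- setdiff(wanted, rownames(m))   # "geneZ"
ok <- m[intersect(wanted, rownames(m)), , drop = FALSE]
```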

Elaboration on methods

Hi Authors,

Could you please elaborate on

how are top differentially expressed genes identified (findClassyGenes/getClassGenes/gnrAll)? Are you using a published DE analysis method? Or looking for the most variable genes (by SD)? Or something else?
how is the binary matrix formed? i.e. under what conditions would a particular value be assigned 0 or 1? I could not find other mentions of "Top-Pair transformation" on the internet.
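A common reading of the top-pair transformation, an assumption here that has not been checked against the package source, is that each selected gene pair "A_B" becomes one binary feature per cell: 1 if gene A is expressed more highly than gene B in that cell, otherwise 0. A sketch using the "geneA_geneB" pair naming seen in the hm_gpa_sel output on this page (`pair_transform` is a hypothetical helper):

```r
pair_transform <- function(expDat, pairs) {
  # split "geneA_geneB" into the two gene names
  # (assumes gene names themselves contain no underscore)
  parts <- strsplit(pairs, "_", fixed = TRUE)
  a <- vapply(parts, `[`, character(1), 1)
  b <- vapply(parts, `[`, character(1), 2)
  out <- (expDat[a, , drop = FALSE] > expDat[b, , drop = FALSE]) * 1
  rownames(out) <- pairs
  out
}

expDat <- matrix(c(5, 1, 0,
                   2, 3, 4),
                 nrow = 3,
                 dimnames = list(c("Ear2", "Ifitm3", "Ubb"), c("cell1", "cell2")))
bin <- pair_transform(expDat, c("Ear2_Ifitm3", "Ifitm3_Ubb"))
```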

Some singleCellNet error

Hi,
I get some errors when I run singleCellNet.
First, the plot_metrics() error:
plot_metrics(tm_heldoutassessment)
Error in p1 | p4 :
operations are possible only for numeric, logical or complex types
Second, the umap() error:
system.time(umPrep_HS<-prep_umap_class(crParkall, stQuery, nrand=50, dLevel="description1", sid="sample_name", topPC=5))
PCA
UMAP
Error in umap(pcRes$x[, 1:topPC], min_dist = 0.5) :
could not find function "umap"
Timing stopped at: 0.408 0.023 0.434

Thank U
NS
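Both errors look like packages that are not attached rather than problems with the data. `p1 | p4` is patchwork's operator for placing ggplots side by side; without patchwork, base R's `|` typically fails on plot objects with the "operations are possible only for numeric, logical or complex types" message. The `umap(x, min_dist = 0.5)` call matches the umap package, which appears in a later session info on this page. A sketch, assuming both packages are installed:

```r
library(ggplot2)
library(patchwork)   # provides | and / for combining ggplots
library(umap)        # provides umap()

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p4 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()
combined <- p1 | p4

emb <- umap(as.matrix(iris[, 1:4]), min_dist = 0.5)  # 2-D layout in emb$layout
```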

class_info$cgenes_list clarification

Hi,

Thank you for such a great package! I had a quick question about class_info$cgenes_list. Are these the top positively expressed genes? Or are they the top most variable (both positive and negatively expressed genes)?

Hope this makes sense.

Thank you!

how to create rda file for pbmc dataset from 10X?

Hello Yuqi,
I'm trying to run SingleCellNet on the public PBMC dataset from 10X, which I downloaded from the Seurat website (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html). It includes matrix.mtx, barcodes.tsv, and genes.tsv.
How can I convert it into an .rda file?
Do I need to do QC, normalization, scaling, PCA, and cell clustering before running SingleCellNet?
Thanks!
Best,
YJ
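A sketch of reading the three 10X files into a named sparse matrix and saving it with `save()`. It assumes the 10X v2 layout described in the question (genes.tsv with gene ID and symbol columns, one barcode per line in barcodes.tsv). To keep it runnable, it first writes a tiny fake dataset to a temporary directory; the file name `expPBMC.rda` is hypothetical. Whether to QC or normalize first is a separate question not settled here.

```r
library(Matrix)

tdir <- tempfile("pbmc_toy_")
dir.create(tdir)

# Toy stand-ins for matrix.mtx, genes.tsv and barcodes.tsv
writeMM(Matrix(c(0, 2, 1, 0, 0, 3), nrow = 3, sparse = TRUE),
        file.path(tdir, "matrix.mtx"))
write.table(data.frame(id = c("ENSG1", "ENSG2", "ENSG3"),
                       symbol = c("CD3E", "MS4A1", "LYZ")),
            file.path(tdir, "genes.tsv"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
writeLines(c("AAAC-1", "AAAG-1"), file.path(tdir, "barcodes.tsv"))

# Read the matrix and attach gene symbols / barcodes as dimnames
expDat <- as(readMM(file.path(tdir, "matrix.mtx")), "CsparseMatrix")
genes <- read.delim(file.path(tdir, "genes.tsv"), header = FALSE,
                    stringsAsFactors = FALSE)
barcodes <- readLines(file.path(tdir, "barcodes.tsv"))
dimnames(expDat) <- list(make.unique(genes$V2), barcodes)  # symbols can repeat

save(expDat, file = file.path(tdir, "expPBMC.rda"))
```

For the real download, point `tdir` at the folder that holds the three files and skip the toy-writing step.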

clean the TM10X(stList_tm10x_full, sampTab) data into a structure as same as the example data

@yuqiyuqitan Dear Yuqi, many thanks for developing this awesome tool!!

Could you show me how to prepare the stTM (metadata) for TM10X ('stList_tm10x_full.rda'), i.e. give it the same structure as the example data (sampTab_TM_053018)?

Thank you and best,

Guoliang

Can't classify cells

Hello, I'm using your package to annotate a dataset, with one of my own datasets as the reference.

I'm able to reproduce the whole tutorial, but I encounter the following error:

Loaded in the cnProc
Error in expVal[cgenes, ] : subscript out of bounds

I get this error when running:

crParkall<-scn_predict(class_info[['cnProc']], expTest, nrand=nqRand)

Any help?

Thanks!

hm_gpa_sel missing genes

In the tutorial, when exploring the variable gene pairs, the function reported a missing-genes warning. The seed is 100, and I am using version 0.1 (session info attached). Could you kindly look into it? Thanks!

hm_gpa_sel(gpTab, genes = class_info$cnProc$xpairs, grps = train, maxPerGrp = 50)
Missing genes:  Ear2_Ifitm3,Ear2_Ubb,Abcg1_Ubc,Ear2_Ifitm2,Abcg1_Ubb,Abcg1_Nfkbia,Mpeg1_Ifitm3,Mpeg1_Ubb,Il18_Nfkbia,Mpeg1_Ifitm2,Sirpa_Ifitm2,Il18_Ifitm3,Il1rn_Nfkbia,Nceh1_Ubc,Il18_Jun,Klhdc4_Ubc,Il1rn_S100a6,Il1rn_Jun,Sirpa_Jun,Sirpa_Xist,Nceh1_Xist,Pla2g15_Xist,Ccl6_S100a6,Nceh1_Cd63,Pla2g15_Cd63,H2-Ob_Txn1,Cd79b_Txn1,Cd79a_Txn1,H2-Eb1_Itm2b,H2-Aa_Itm2b,H2-Ob_Ifitm3,H2-Ob_Ifitm2,H2-Oa_S100a6,H2-Oa_Ifitm3,H2-Oa_Anxa2,Cd79a_S100a6,Cd79b_Ifitm2,Cd79b_Ifitm3,H2-Eb1_Ifitm2,H2-Eb1_Anxa2,H2-Aa_Dstn,H2-Aa_S100a6,Cd37_Anxa2,Cd37_Dstn,Cd19_Dstn,H2-Ab1_Cd63,Cd19_Cd63,H2-Ab1_Cd9,Cd19_Cd9,Upk1a_Lgals1,Upk1a_Vim,Ivl_Nfkbia,Ivl_Vim,Ivl_Aldh2,Upk1b_Ifitm3,Foxq1_Hspa8,Upk1b_Ald

Warning message:
In brewer.pal(n = 12, name = "Spectral") :
  n too large, allowed maximum for palette Spectral is 11
Returning the palette you asked for with that many colors

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Leap 15.2

Matrix products: default
BLAS:   /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] singleCellNet_0.1.0 cowplot_1.1.1       reshape2_1.4.4     
 [4] pheatmap_1.0.12     dplyr_1.0.6         data.table_1.14.0  
 [7] magrittr_2.0.1      patchwork_1.1.1     ggplot2_3.3.3      
[10] ll_0.1.0            colorout_1.2-2     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6          mvtnorm_1.1-1       lattice_0.20-41    
 [4] class_7.3-18        assertthat_0.2.1    digest_0.6.27      
 [7] utf8_1.2.1          RSpectra_0.16-0     R6_2.5.0           
[10] plyr_1.8.6          rootSolve_1.8.2.1   e1071_1.7-6        
[13] pillar_1.6.1        rlang_0.4.11        Exact_2.1          
[16] rstudioapi_0.13     Matrix_1.3-2        reticulate_1.18    
[19] labeling_0.4.2      stringr_1.4.0       munsell_0.5.0      
[22] umap_0.2.7.0        proxy_0.4-25        compiler_4.0.3     
[25] pkgconfig_2.0.3     askpass_1.1         DescTools_0.99.41  
[28] openssl_1.4.4       tidyselect_1.1.1    tibble_3.1.2       
[31] gridExtra_2.3       lmom_2.8            expm_0.999-6       
[34] randomForest_4.6-14 fansi_0.5.0         viridisLite_0.4.0  
[37] crayon_1.4.1        withr_2.4.2         MASS_7.3-53.1      
[40] grid_4.0.3          jsonlite_1.7.2      gtable_0.3.0       
[43] lifecycle_1.0.0     DBI_1.1.1           scales_1.1.1       
[46] gld_2.6.2           stringi_1.6.2       farver_2.1.0       
[49] viridis_0.5.1       ellipsis_0.3.2      generics_0.1.0     
[52] vctrs_0.3.8         boot_1.3-27         RColorBrewer_1.1-2 
[55] tools_4.0.3         glue_1.4.2          purrr_0.3.4        
[58] parallel_4.0.3      colorspace_2.0-1

An unexpected result from the sc_compAlpha function

Dear Professor,
I got an unexpected result from the sc_compAlpha function. With a sparse matrix, the function returns one alpha value per row, but with a 10 × 50 dense matrix it returns 500 values. What happened? Example code:
library(singleCellNet)
library(Matrix)  # for sparseMatrix()
matrix <- matrix(data = c(1:500), nrow = 10)
rownames(matrix) <- paste("gene",1:10,sep="")
colnames(matrix) <- paste("sample",1:50,sep="")
dThresh=0
matrix <- trans_prop(matrix,total=1e+05)
alphaAll<-sc_compAlpha(matrix,threshold=dThresh);
length(alphaAll)

[1] 500

x <- 7 * (1:7)
mat <- sparseMatrix(i=c(1,3:8), j=c(2,9,6:10), x = x)
rownames(mat) <- paste("gene",1:8,sep="")
colnames(mat) <- paste("sample",1:10,sep="")
alphaAll_dgCMatrix<-sc_compAlpha(mat,threshold=dThresh);
length(alphaAll_dgCMatrix)

[1] 8
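If alpha is meant to be, per gene, the fraction of cells with expression above the threshold (an assumption about what `sc_compAlpha()` computes, not confirmed here), then `Matrix::rowSums()` over a logical comparison returns one value per gene for both dense and sparse input. A sketch, where `comp_alpha_sketch` is a hypothetical helper:

```r
library(Matrix)

comp_alpha_sketch <- function(expDat, threshold = 0) {
  # logical gene x cell matrix; rowSums counts expressing cells per gene
  Matrix::rowSums(expDat > threshold) / ncol(expDat)
}

dense <- matrix(1:500, nrow = 10,
                dimnames = list(paste0("gene", 1:10), paste0("sample", 1:50)))
sparse <- sparseMatrix(i = c(1, 3:8), j = c(2, 9, 6:10), x = 7 * (1:7),
                       dimnames = list(paste0("gene", 1:8), paste0("sample", 1:10)))

a_dense <- comp_alpha_sketch(dense)    # one value per gene (10)
a_sparse <- comp_alpha_sketch(sparse)  # one value per gene (8)
```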

Avoid classifier re-training

Dear authors,

Thank you for developing singleCellNet.
I have a question regarding the common gene set used for training. Suppose I am training a classifier that will be used to predict multiple different datasets automatically - I will not know in advance which genes will be present there and which absent. Is there a way to train a generic model that will work on different datasets or will I need to train a new one for each combination of reference/query datasets? Thanks!

"rand" category

Hi!

I know you guys have mentioned in your preprint that "we created 100
randomized cell expression profiles (nrand = 100) to train up a “rand” category in the SCN-TP
classifier, which can help in cases where some cell types that are present in the query data are
not included in the training data (Step 2b)". However, would you mind explaining in more detail what you mean by "create randomized cell expression profiles"? Do you simulate the expression of 100 cells, randomly choose 100 cells from the dataset to form the category, or something else?

thanks!
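The issue doesn't settle which construction SCN uses. Purely as an illustration of one common way to build such "random" profiles (an assumption, not the package's confirmed method), each synthetic cell can be made by shuffling the gene values of a real cell. This keeps that cell's overall expression distribution but destroys gene-specific structure. `make_rand_profiles` is hypothetical.

```r
set.seed(7)
expDat <- matrix(rpois(5 * 4, 3), nrow = 5,
                 dimnames = list(paste0("g", 1:5), paste0("cell", 1:4)))

make_rand_profiles <- function(expDat, nrand) {
  src <- sample(ncol(expDat), nrand, replace = TRUE)  # pick source cells
  # permute gene values within each chosen cell
  out <- apply(expDat[, src, drop = FALSE], 2,
               function(x) x[sample.int(length(x))])
  dimnames(out) <- list(rownames(expDat), paste0("rand_", seq_len(nrand)))
  out
}

randProfiles <- make_rand_profiles(expDat, nrand = 3)
```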

require package umap

SCN with sparse matrices

Does SCN need to be adapted to accept a sparse matrix as input instead of a dense matrix?
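If a function turns out to expect an ordinary dense matrix, one workaround (memory permitting) is to densify only the genes and cells needed; the reverse conversion is equally short. A sketch with the Matrix package:

```r
library(Matrix)

dense <- matrix(c(0, 0, 3, 0, 1, 0), nrow = 2,
                dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
sp <- Matrix(dense, sparse = TRUE)   # dense -> sparse (dgCMatrix)
back <- as.matrix(sp)                # sparse -> dense base matrix

# Dense storage costs about 8 bytes per entry, zeros included
approx_bytes <- prod(dim(sp)) * 8
```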

theme_dviz_hgrid not defined

Add this function, and give credit where credit is due (theme_dviz_hgrid comes from Claus O. Wilke's dviz.supp package):

library(ggplot2)
library(cowplot)

theme_dviz_hgrid <- function(font_size = 14, font_family = "") {
  color <- "grey90"
  line_size <- 0.5

  # Starts with theme_cowplot and then modifies some parts
  theme_cowplot(font_size = font_size, font_family = font_family) %+replace%
    theme(
      # make horizontal grid lines
      panel.grid.major   = element_line(colour = color, size = line_size),
      panel.grid.major.x = element_blank(),

      # adjust axis tickmarks
      axis.ticks         = element_line(colour = color, size = line_size),

      # adjust x axis
      axis.line.x        = element_line(colour = color, size = line_size),
      # no y axis line
      axis.line.y        = element_blank()
    )
}

weighted_down dThresh?

system.time(tmpX<-weighted_down(expTrain, 1.5e3, dThresh=0.25))
Error in weighted_down(expTrain, 1500, dThresh = 0.25) :
unused argument (dThresh = 0.25)
Timing stopped at: 0 0 0
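"unused argument" means the installed version of the function has no parameter by that name, often because the vignette and the installed code are out of sync. Inspecting the installed signature settles it. The sketch uses a hypothetical stand-in function; for the real one, `args(singleCellNet::weighted_down)` or `names(formals(singleCellNet::weighted_down))` works the same way.

```r
# Hypothetical stand-in whose signature lacks dThresh
weighted_down_old <- function(expDat, total) expDat

has_arg <- "dThresh" %in% names(formals(weighted_down_old))

err <- tryCatch(weighted_down_old(matrix(1), 1.5e3, dThresh = 0.25),
                error = function(e) conditionMessage(e))
```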

dataset for the vignette

I downloaded the dataset from the link.

Where are stTM_raw_060719_demo.rda, expTM_raw_060719_demo.rda, stPBMC_demo.rda, and expPBMC_demo.rda? Of the downloaded files, only human_mouse_genes_Jul_24_2018.rda matches the vignette.

error in assign_cate

I'm applying the classifier to the query data from the tutorial, and I got this error:
system.time(crACL <- scn_predict(class_info[['cnProc']], expQuery, nrand = 50))
stQuery <- assign_cate(classRes = crACL, sampTab = stQuery, cThresh = 0.5)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 49471, 49521
I have 49471 cells, but the crACL matrix has 49521 columns, the last 50 of which are rand. Please advise how I should assign the cells based on score.

Thank you
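The row counts differ by exactly the 50 `rand` columns that `scn_predict()` appends when `nrand = 50`. One workaround is to keep only the columns that correspond to real cells, in the sample table's order, before joining. It assumes the sample table's row names are the cell IDs used as column names in the score matrix. A base-R sketch with toy data:

```r
set.seed(3)
cells <- paste0("cell", 1:4)
nrand <- 2
classRes <- matrix(runif(3 * (length(cells) + nrand)), nrow = 3,
                   dimnames = list(c("B cell", "T cell", "rand"),
                                   c(cells, paste0("rand_", 1:nrand))))
stQuery <- data.frame(cell_name = cells, row.names = cells)

# cbind() fails when the score matrix has extra columns (4 vs 6 rows)
err <- tryCatch(cbind(stQuery, top = colnames(classRes)),
                error = function(e) conditionMessage(e))

# Keep only real cells, in sample-table order, then take the top category
classRes_cells <- classRes[, rownames(stQuery), drop = FALSE]
top <- rownames(classRes_cells)[apply(classRes_cells, 2, which.max)]
stQuery <- cbind(stQuery, category = top)
```

With the package functions, the same idea would presumably be `assign_cate(classRes = crACL[, rownames(stQuery)], sampTab = stQuery, cThresh = 0.5)`, again assuming the row names of `stQuery` are the cell IDs.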

Issues in going through the provided data

When I went through the provided data with the code, I got stuck at assessing the classifier.
After entering the code " tm_heldoutassessment = assess_comm(ct_scores = classRes_val_all, stTrain = stTrain, stQuery = stTest, dLevelSID = "cell", classTrain = "newAnn", classQuery = "newAnn", nRand = 50)", the following error message popped up:

Error in AUC(tmp$recall, tmp$precision, method = AUCmethod) :
unused argument (method = AUCmethod)

Ortholog Table for Cross-Species Comparisons

Dear Developer,

Thank you for the comprehensive vignette; I've enjoyed learning this method, especially for cross-species analyses. Specifically, I am curious how you created the ortholog table. Would it be possible to post this method in the SCN repository (if the code exists)? If not, no worries. My motivation for asking primarily comes from the desire to quickly and efficiently create ortholog tables for (1) multiple species or (2) when new versions of references become available.

Thank you for your time.

functions not exported

plot_tsne

Error in classMat

Hi Yuqi,

Thanks so much for your help in troubleshooting yesterday. I have just about gone through the whole process with my own datasets, and I have run into this error when trying to generate a heatmap for classifying my query data to my training data:

sc_hmClass(crQueryall, sgrp, max=5000, isBig=TRUE, cCol=F, font=8)
Error in classMat[, cells2] : subscript out of bounds

The previous step ran fine:
sgrp = as.vector(stQuery$description)
names(sgrp) = as.vector(stQuery$sample_name)
grpRand =rep("rand", nqRand)
names(grpRand) = paste("rand_", 1:nqRand, sep='')
sgrp = append(sgrp, grpRand)

and the query data table has the same format as the vignette query data.

Any help with figuring out this error would be greatly appreciated!

Julia
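"subscript out of bounds" here means some names in `sgrp` are not column names of the score matrix. A quick diagnostic is to list the names on each side that have no partner. One possible culprit, not confirmed, is a naming mismatch for the random columns: `rand_1` in `sgrp` versus `rand1` as reported for the score matrix in another issue on this page. The toy objects reproduce that mismatch.

```r
crQueryall <- matrix(0, nrow = 2, ncol = 5,
                     dimnames = list(c("typeA", "rand"),
                                     c("s1", "s2", "s3", "rand1", "rand2")))
nqRand <- 2
sgrp <- c(s1 = "A", s2 = "A", s3 = "B")
grpRand <- rep("rand", nqRand)
names(grpRand) <- paste("rand_", 1:nqRand, sep = "")
sgrp <- append(sgrp, grpRand)

not_in_scores <- setdiff(names(sgrp), colnames(crQueryall))  # "rand_1" "rand_2"
not_in_sgrp   <- setdiff(colnames(crQueryall), names(sgrp))  # "rand1" "rand2"
```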

Plots.R not included when I 'install_github'

Functions from plots.R aren't in my environment

adapting gpa clustering to the existing classification pipeline

I am working on it.
