
sscclust's Introduction


sscClust

sscClust (simpler single-cell RNA-seq data clustering) is a package that implements multiple functionalities that are basic procedures in single-cell RNA-seq data analysis, including variable gene identification, dimension reduction, and clustering on the reduced data. It also proposes some new ideas: projecting data to a feature space using Spearman correlation, which makes visualization and clustering more efficient, and clustering with subsampling and classification, which makes it feasible to process data from thousands of cells.
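To illustrate the idea of a correlation-based feature space, here is a minimal base-R sketch. It is not the package's implementation: the toy matrix, the variable names (`expr`, `cor.mat`), and the choice of k-means with 3 centers are assumptions made only for this example.

```r
# Toy expression matrix: 50 genes (rows) x 12 cells (columns); values are simulated.
set.seed(1)
expr <- matrix(rpois(50 * 12, lambda = 5), nrow = 50,
               dimnames = list(paste0("G", 1:50), paste0("cell", 1:12)))

# Spearman correlation between cells (columns). Each cell is now described by
# its correlation to every cell rather than by raw gene expression.
cor.mat <- cor(expr, method = "spearman")

# Cluster cells in the correlation feature space (3 centers is arbitrary here).
km <- kmeans(cor.mat, centers = 3, nstart = 10)
table(km$cluster)
```

Because Spearman correlation works on ranks, this representation is less sensitive to the scale of individual genes than the raw expression values are.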

Installation

Computation becomes expensive as the dataset grows, so we strongly recommend using an R build linked to an optimized BLAS library, such as ATLAS or MKL. For Windows users, Microsoft R Open is recommended; for Unix-like users, please refer to this for how to compile R with an external BLAS library.
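One way to see which BLAS the current R session uses, and to get a rough feel for its speed, is sketched below. This uses only base R; the 500 x 500 matrix size is an arbitrary choice, and the `BLAS` and `LAPACK` fields of `sessionInfo()` are only populated in R 3.4 or later.

```r
# Report the BLAS/LAPACK libraries this R session is linked against.
# On R versions older than 3.4 these fields are absent and NULL is printed.
si <- sessionInfo()
print(si$BLAS)
print(si$LAPACK)

# Rough timing of a dense matrix cross-product; an optimized BLAS is
# typically noticeably faster than the reference BLAS on this kind of task.
set.seed(1)
m <- matrix(rnorm(500 * 500), nrow = 500)
elapsed <- system.time(xtx <- crossprod(m))[["elapsed"]]
cat("crossprod on a 500 x 500 matrix took", elapsed, "seconds\n")
```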

The sscVis package is required.

To install this package, simply run:

install.packages("devtools")
devtools::install_github("Japrin/sscVis")
devtools::install_github("Japrin/sscClust")

Example

Run the clustering pipeline using all data:

data("sce.Pollen")
sce.all <- ssc.run(sce.Pollen, subsampling=FALSE, k.batch=11)

For clustering with subsampling:

data("sce.Pollen")
sce.sub <- ssc.run(sce.Pollen, subsampling=TRUE, sub.frac = 0.8, k.batch=11)
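The general idea behind clustering with subsampling and classification can be sketched in base R as follows. This is an illustration under stated assumptions, not what `ssc.run` does internally: the data are simulated, k-means stands in for the clustering step, and a nearest-centroid rule stands in for the classifier.

```r
set.seed(42)
# Simulated reduced data: 300 cells x 5 dimensions, drawn around 3 centers.
centers <- matrix(c( 0, 0,  0, 0,  0,
                     5, 5,  5, 5,  5,
                    -5, 5, -5, 5, -5), nrow = 3, byrow = TRUE)
truth <- rep(1:3, each = 100)
dat <- centers[truth, ] + matrix(rnorm(300 * 5), nrow = 300)

# 1. Cluster a random 80% subsample (analogous in spirit to sub.frac = 0.8).
idx.sub <- sample(nrow(dat), size = floor(0.8 * nrow(dat)))
km <- kmeans(dat[idx.sub, ], centers = 3, nstart = 10)

# 2. Assign each held-out cell to the nearest cluster centroid
#    (squared Euclidean distance to each of the 3 centroids).
idx.rest <- setdiff(seq_len(nrow(dat)), idx.sub)
d2 <- sapply(seq_len(nrow(km$centers)), function(k) {
  rowSums(sweep(dat[idx.rest, , drop = FALSE], 2, km$centers[k, ])^2)
})

# 3. Combine labels from the clustered and the classified cells.
labels <- integer(nrow(dat))
labels[idx.sub] <- km$cluster
labels[idx.rest] <- apply(d2, 1, which.min)
table(labels, truth)
```

Only the subsample goes through the expensive clustering step; the remaining cells receive labels through the cheaper classification step, which is what makes larger datasets tractable.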

More information can be found in the vignettes:

  1. Clustering by various methods: see this vignette.

  2. Clustering using Spearman correlation and subsampling: see this vignette.

sscclust's People

Contributors

japrin


sscclust's Issues

Vignette Code ssc.run argument method.vgene: "sd" does not work

Hi Japrin

When running your vignette, the following call in the block "#All in One":

sce.Pollen <- ssc.run(sce.Pollen,method.vgene = "sd",sd.n = 1500,method.reduction = "pca",method.clust = "kmeans", k.batch=11,seed = 9997)

yields the error message:

Error in ssc.reduceDim(obj, assay.name = assay.name, method = method.reduction, : No variable genes identified by method sd !

This can be solved by changing the value of method.vgene from "sd" to "HVG.sd":

sce.Pollen <- ssc.run(sce.Pollen,method.vgene = "HVG.sd",sd.n = 1500,method.reduction = "pca",method.clust = "kmeans", k.batch=11,seed = 9997)

This works, which makes sense: in the invoked ssc.variableGene function, the method argument can be one of "HVG.sd", "HVG.mean.sd", "HVG.trendVar" (but not "sd").

Is this the correct way of running this vignette?

Kind regards
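For readers unfamiliar with what an SD-based variable gene selection does conceptually, here is a minimal base-R sketch. The helper `top_sd_genes` is hypothetical and is not part of sscClust; the simulated matrix and the choice of n = 10 are assumptions for illustration only.

```r
# Hypothetical helper (not part of sscClust): rank genes by their standard
# deviation across cells and return the names of the top n.
top_sd_genes <- function(expr, n) {
  gene.sd <- apply(expr, 1, sd)
  names(sort(gene.sd, decreasing = TRUE))[seq_len(min(n, length(gene.sd)))]
}

set.seed(7)
# Simulated matrix: 200 genes (rows) x 20 cells (columns).
expr <- matrix(rnorm(200 * 20), nrow = 200,
               dimnames = list(paste0("G", 1:200), paste0("cell", 1:20)))
# Inflate the variance of the first 10 genes so they stand out.
expr[1:10, ] <- expr[1:10, ] * 10

hv <- top_sd_genes(expr, n = 10)
str(hv)
```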

Confusion for density based clustering method

Hello, Japrin
According to the clustering method document, running dimension reduction with both PCA and tSNE, and then density-based clustering on the tSNE map, should use a script like:

sce.Pollen <- ssc.run(sce.Pollen,method.vgene = "sd",sd.n = 1500,method.reduction = "pca",
                      method.clust = "dpclust", 
                      parfile = system.file("extdata/Pollen.par.r",package = "sscClust"),
                      out.prefix = "./Pollen.dpclust", seed=9997)

But the argument method.reduction used here is "pca", not "tsne", so I'm confused about the actual method used here.
Which one of the following is right?

1. PCA -> tSNE -> use tSNE data to perform density clustering
2. PCA -> use PCA data to perform density clustering -> Mapping the result of clustering to a tSNE Map

By the way, the seed argument seems to make a difference in the clustering result. Do you have any recommendation for this?

How to display two groups of cells in one t-SNE map

Hi Japrin

Thanks again for yesterday's reply. A new question has come up:
There was a t-SNE map (section b) displaying the cell clustering result in a paper related to this package:
[plot]
It is described that the clustering analysis was performed separately for CD8+ and CD4+ cells, so I'm wondering how to put the two together, since I didn't find any function for this in the package.

Looking forward to your reply,
Thanks a lot!

Running "str(metadata(sce.Pollen)$ssc$variable.gene$sd)" returns "NULL"

Hi, Japrin
I am new to the R language. I ran the code following your guide, but I got
NULL instead of chr [1:1500] "G5596" "G945" "G244" "G496" "G3558" "G5598" "G1909..." when inspecting the variable gene selection in this part:

sce.Pollen <- ssc.variableGene(sce.Pollen,method="sd",sd.n=1500)
str(metadata(sce.Pollen)$ssc$variable.gene$sd)

Could you give me some suggestions? Thanks a lot!
