StemID and RaceID2 algorithms

New Functions

Reorder cluster labels in any tSNE dimensions with fix_kpart()

Differential gene expression analysis with diffexpnb()

Abel's version of plottsne (based on Lennart's version), with better graphics and configurable colors. See plottsne(); the original is available as plottsne.original().

RaceID2 is an advanced version of RaceID, an algorithm for the identification of rare and abundant cell types from single-cell transcriptome data. The method is based on transcript counts obtained with unique molecular identifiers.

StemID is an algorithm that derives cell lineage trees from RaceID2 results and predicts multipotent cell identities.

RaceID2 and StemID are written in the R computing language.

Methods

  • initialize. Creates a SCseq object.
    The input is a data frame of transcript counts, with cells as columns and genes as rows. Run as:

    • sc <- SCseq(inputdata)
  • filterdata. Filters data.
    Input parameters and default values are:

    1. mintotal=1000 (discards cells with fewer than mintotal transcripts)
    2. minexpr=5, minnumber=1 (discards genes that do not have at least minexpr transcripts in at least minnumber cells)
    3. maxexpr=Inf (discards genes with more than maxexpr transcripts in at least one cell)
    4. downsample=FALSE (logical; when TRUE data is downsampled to mintotal transcripts per cell, otherwise it is median normalized)
    5. dsn=1 (number of downsamplings; output is an average over dsn downsamplings)
    6. rseed=17000 (seed used for downsampling)
    7. dsversion="JCB" (downsampling function version)

    Input parameters are stored in slot sc@filterparameters. The method first median-normalizes or downsamples (depending on downsample) the transcript counts of cells with at least mintotal transcripts and stores the result in slot sc@ndata. It then removes genes according to minexpr, minnumber and maxexpr and stores the resulting data.frame in sc@fdata.

    • sc <- filterdata(sc, mintotal=1000, minexpr=5, minnumber=1, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000, dsversion = 'JCB')
    • sc <- filterdata(sc) -- runs function with default values.
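The filtering steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name filter_sketch and the toy counts are hypothetical, median normalization is assumed to mean rescaling every retained cell to the median total transcript count, and no pseudocount or downsampling is applied.

```r
# Hypothetical helper illustrating the filtering steps; not part of the package.
filter_sketch <- function(counts, mintotal = 1000, minexpr = 5,
                          minnumber = 1, maxexpr = Inf) {
  counts <- as.matrix(counts)
  # Keep cells (columns) with at least mintotal transcripts in total.
  kept <- counts[, colSums(counts) >= mintotal, drop = FALSE]
  # Median normalization (assumed): rescale every cell to the median total.
  kept_totals <- colSums(kept)
  ndata <- sweep(kept, 2, kept_totals, "/") * median(kept_totals)
  # Keep genes with >= minexpr transcripts in >= minnumber cells ...
  expressed <- rowSums(ndata >= minexpr) >= minnumber
  # ... and drop genes that exceed maxexpr in any cell.
  not_saturated <- apply(ndata, 1, max) <= maxexpr
  list(ndata = ndata,
       fdata = ndata[expressed & not_saturated, , drop = FALSE])
}

# Toy data: 4 genes (rows) x 3 cells (columns).
counts <- data.frame(
  cellA = c(600, 400, 0, 0),
  cellB = c(1500, 495, 3, 2),
  cellC = c(10, 5, 0, 0),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)
res <- filter_sketch(counts)
print(res$fdata)
```

In this toy example cellC is dropped for having too few transcripts, and gene3 and gene4 are dropped because they never reach minexpr in any retained cell.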
  • clustexp. Clusters data using kmedoids.
    Input parameters and default values are:

    1. clustnr=20 (Maximum number of clusters for the computation of the gap statistics or the derivation of the cluster number by saturation criterion. Must be greater than 1.)
    2. bootnr=50 (Number of bootstrapping runs for clusterboot.)
    3. metric="pearson" (Metric to compute distance between cells. Options are: "spearman","pearson","kendall","euclidean","maximum","manhattan","canberra","binary","minkowski". Check function dist.gen for more information. Distances are stored in sc@distances.)
    4. do.gap=TRUE (If set to TRUE, the number of clusters is determined using gap statistics. Default is TRUE.)
    5. sat=FALSE (incorporated in RaceID2, computes the number of clusters using saturation criterion.)
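    6. SE.method="Tibs2001SEmax" (Criterion for selecting the number of clusters from the gap statistics, as in maxSE from the cluster R package.)
    7. SE.factor=.25 (Multiple of the standard error used by that criterion.)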
    8. B.gap=50 (Number of bootstrap runs for the gap statistics.)
    9. cln=0 (Number of clusters for clustering. If it is 0, the number is determined by either gap statistics or saturation criterion.)
    10. rseed=17000 (Seed for random number generator used in case of gap statistics and for posterior clustering.)
    11. FUNcluster="kmeans" (incorporated in RaceID2, this can be kmeans, hclust or kmedoids. )

Input parameters are stored in slot sc@clusterpar. Defaults are used for any parameter that is not specified.
Data in sc@fdata is clustered using the clustfun function. First, the distance between cells is computed according to the metric with function dist.gen and stored in sc@distances as a matrix. Next, if required, the number of clusters is determined using either gap statistics or saturation criterion, using function clusGapExt. Finally, clustering is performed using function clusterboot from the fpc R package. Output is stored in sc@cluster and sc@fcol:

  • object@cluster$kpart: contains the cluster assignment of each cell before outlier detection (the next step in the analysis).
  • object@cluster$jaccard
  • object@cluster$gap
  • object@cluster$clb
  • object@fcol

Run as:

  • sc <- clustexp(sc, clustnr=20, bootnr=50, metric="pearson", do.gap=FALSE, sat=TRUE, SE.method="Tibs2001SEmax", SE.factor=0.25, B.gap=50, cln=0, rseed=17000, FUNcluster="kmedoids")
  • sc <- clustexp(sc) -- runs function with default values
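The clustering workflow described above (a distance between cells, followed by a partition into clusters) can be sketched with base R only. This is an illustration under stated assumptions rather than the package's code: it uses hclust, one of the FUNcluster options, with a fixed number of clusters, and it skips clusGapExt and clusterboot entirely. The toy data and variable names are hypothetical.

```r
set.seed(17000)

# Toy expression matrix: 50 genes x 20 cells, built from two groups of
# 10 cells that follow different expression profiles (hypothetical data).
profile1 <- rexp(50, rate = 0.1)
profile2 <- rexp(50, rate = 0.1)
cells1 <- sapply(1:10, function(i) rpois(50, profile1))
cells2 <- sapply(1:10, function(i) rpois(50, profile2))
expr <- cbind(cells1, cells2)
colnames(expr) <- paste0("cell", 1:20)

# Distance between cells: 1 - Pearson correlation of the columns.
d <- as.dist(1 - cor(expr, method = "pearson"))

# Hierarchical clustering, cut into a fixed number of clusters
# (the role played by cln when it is not 0).
hc <- hclust(d, method = "average")
kpart <- cutree(hc, k = 2)
print(table(kpart))
```

The named integer vector kpart plays the same role as object@cluster$kpart: one cluster label per cell.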
  • findoutliers. Finds outliers.
    Input parameters and default values are:
  1. outminc=5 ()
  2. outlg=2 ()
  3. probthr=1e-3 ()
  4. thr=2**-(1:40) ()
  5. outdistquant=.95 ()
  6. version = 2 (equal to 1 or 2, depending on RaceID version)

Run as:

  • sc <- findoutliers(sc, outminc=5,outlg=2,probthr=1e-3,thr=2**-(1:40), outdistquant=.95, version = 2)
  • sc <- findoutliers(sc) -- runs function with default values
  • comptsne. Computes tSNE map.
    Input parameters and default values are:
  1. rseed=15555 (seed for random numbers)
  2. sammonmap=FALSE ()
  3. initial_cmd=TRUE ()
  4. others ()

Run as:

  • sc <- comptsne(sc, rseed = 15555, sammonmap = FALSE)
  • sc <- comptsne(sc)

Plots

  • clustheatmap.

  • plottsne.

Functions

  • downsample. Downsamples inputdata.
    Transcript data is converted to integer data, random sampling is performed dsn times, and the results are averaged. A pseudocount of 0.1 is added to the resulting data.frame. There are two versions, DG and JCB, written by Dominic Grün and Jean-Charles Boisset respectively. By default the function uses the JCB version; to choose the other one, set dsversion in the filterdata method.
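The downsampling idea can be sketched as follows. This is a hedged illustration, not the DG or JCB code: the helper name downsample_sketch is hypothetical, and sampling is assumed to draw n transcripts per cell without replacement.

```r
# Hypothetical helper illustrating downsampling; not the package's DG or JCB code.
downsample_sketch <- function(counts, n, dsn = 1, rseed = 17000) {
  counts <- round(as.matrix(counts))        # convert to integer counts
  stopifnot(all(colSums(counts) >= n))      # every cell needs >= n transcripts
  set.seed(rseed)
  acc <- matrix(0, nrow(counts), ncol(counts), dimnames = dimnames(counts))
  for (run in seq_len(dsn)) {
    for (j in seq_len(ncol(counts))) {
      # One entry per transcript, labelled by gene index.
      pool <- rep(seq_len(nrow(counts)), times = counts[, j])
      # Draw n transcripts without replacement (assumed sampling scheme).
      drawn <- pool[sample.int(length(pool), n)]
      acc[, j] <- acc[, j] + tabulate(drawn, nbins = nrow(counts))
    }
  }
  acc / dsn + 0.1   # average over the dsn runs, then add the pseudocount
}

# Toy data: 3 genes x 2 cells, downsampled to 60 transcripts per cell.
toy <- cbind(cellA = c(50, 30, 20), cellB = c(10, 80, 110))
rownames(toy) <- c("gene1", "gene2", "gene3")
ds <- downsample_sketch(toy, n = 60, dsn = 5)
print(colSums(ds))
```

Each column of the result sums to n plus 0.1 per gene, because the pseudocount is added after averaging.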

  • clustfun. Clusters sc@fdata.
    Version 2, from RaceID2. Computes the distance between cells (using the dist.gen function) with the specified metric. If required, determines the cluster number using gap statistics or the saturation criterion. Then clusters the data (using the clusGapExt function) with the specified method: kmedoids, kmeans or hclust.

  • dist.gen. Distance between cells.
    Computes and returns the distance matrix between cells using the specified distance measure (metric). For the metrics "spearman", "pearson" and "kendall", the distance is 1 - correlation; for "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski", the distance is computed directly.
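The two branches described above can be sketched in base R. The function name dist_sketch is hypothetical and the package's dist.gen may differ in its signature and return type; the sketch assumes cells are columns, as in the SCseq input.

```r
# Hypothetical re-implementation for illustration; dist.gen itself may differ.
dist_sketch <- function(x, metric = "pearson") {
  x <- as.matrix(x)
  if (metric %in% c("spearman", "pearson", "kendall")) {
    # cor() compares columns, i.e. cells, so no transpose is needed.
    as.matrix(1 - cor(x, method = metric))
  } else {
    # dist() compares rows, so transpose to put cells in rows.
    as.matrix(dist(t(x), method = metric))
  }
}

# Toy data: 4 genes x 3 cells. Cell b is proportional to cell a,
# and cell c is perfectly anti-correlated with cell a.
x <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(4, 3, 2, 1))
dp <- dist_sketch(x, "pearson")
de <- dist_sketch(x, "euclidean")
print(round(dp, 3))
print(round(de, 3))
```

With a correlation-based metric, cells a and b are at distance 0 even though their absolute counts differ, whereas the Euclidean distance between them is non-zero.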

  • clusGapExt. Gap statistics and saturation criterion.

The following files are provided:

  • StemID/RaceID2 class definition: RaceID2_StemID_class.R
  • StemID/RaceID2 sample code: RaceID2_StemID_sample.R
  • StemID/RaceID2 reference manual: Reference_manual_RaceID2_StemID.pdf
  • StemID/RaceID2 sample data: transcript_counts_intestine_5days_YFP.xls

Contributors

  • anna-alemany
