scrattch.hicat's Introduction

scrattch.hicat: Hierarchical, Iterative Clustering for Analysis of Transcriptomics

(Image: a hicat)

Installation

scrattch.hicat has several dependencies, including two from Bioconductor and one from GitHub:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("limma", "WGCNA"))  # limma and WGCNA are the Bioconductor dependencies

devtools::install_github("JinmiaoChenLab/Rphenograph")

Once these dependencies are installed, scrattch.hicat can be installed with:

devtools::install_github("AllenInstitute/scrattch.hicat")

Vignettes

An overview of the main functions in scrattch.hicat

Tutorials

An interactive walkthrough of the major steps in clustering for scrattch.hicat.

Roadmap

The next few updates to scrattch.hicat will be aimed at getting code testing in place for major clustering functions:
0.0.22: Current version; Tests in place for de.genes.R functions.
0.0.23: Tests in place for cluster.R functions.
0.1.0: Vignette re-integrated; Adding pkgdown page; Update to Master branch.

Previous updates: 0.0.21: Added TravisCI and covr integration.

The scrattch suite

scrattch.hicat is one component of the scrattch suite of packages for Single Cell RNA-seq Analysis for Transcriptomic Type CHaracterization from the Allen Institute.

License

The license for this package is available on Github at: https://github.com/AllenInstitute/scrattch.hicat/blob/master/LICENSE

Level of Support

We plan to update this tool occasionally, with no fixed schedule. Community involvement is encouraged through both issues and pull requests.

Contribution Agreement

If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in full at: https://github.com/AllenInstitute/scrattch.hicat/blob/master/CONTRIBUTION

Image attribution:

By Internet Archive Book Images [No restrictions], via Wikimedia Commons

scrattch.hicat's Issues

Display_cl function

Hello scrattch.hicat team,

I am following the tutorial presented here: https://taxonomy.shinyapps.io/scrattch_tutorial with the tasic2016data. However, I am having problems when plotting the heatmaps with display.result = display_cl(onestep.result$cl, norm.dat, plot = TRUE, de.param = de.param); I get the following error:

Error in is.null(Rowv) || is.na(Rowv): 'length = 2' in coercion to 'logical(1)'
Traceback:

  1. display_cl(onestep.result$cl, norm.dat, plot = TRUE, de.param = de.param)
  2. plot_cl_heatmap(tmp.dat, cl, markers, ColSideColors = tmp.col,
    . prefix = prefix, labels = NULL, by.cl = TRUE, min.sep = min.sep,
    . main = main, height = height, width = width)
  3. heatmap.3(tmp.dat[, ord], Rowv = as.dendrogram(gene.hc), Colv = NULL,
    . col = col, trace = "none", dendrogram = "none", cexCol = cexCol,
    . cexRow = cexRow, ColSideColors = ColSideColors[, ord], breaks = breaks,
    . colsep = sep, sepcolor = "black", main = main, key = key,
    . density.info = "none")

I was wondering if you could help me with this issue.

Thank you

How to cite scrattch.hicat?

I've used some of your code for my own research and I'm trying to find a way to reference your package. Could you please point me to the correct reference?

Thanks!

install requirements

I think the install instructions are missing the installation of WGCNA from Bioconductor. They say "several dependencies, including two from BioConductor and one from Github", but only list limma from Bioconductor, and I get an error if I haven't already installed WGCNA.

fast_tsne.R breaks installation of dev branch

Warning in file(filename, "r", encoding = encoding) :
cannot open file '/home/trygveb/src/FIt-SNE/fast_tsne.R': No such file or directory
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
Error : unable to load R code in package ‘scrattch.hicat’
ERROR: lazy loading failed for package ‘scrattch.hicat’

Installation error

Hi,

I tried installing the scrattch.hicat package using remotes::install_github("AllenInstitute/scrattch.hicat"), but the error messages below were produced:

Downloading GitHub repo AllenInstitute/scrattch.hicat@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/AllenInstitute/scrattch.hicat/tarball/HEAD' failed

Any suggestions are appreciated!

Best regards,
Zhenyao

ERROR: dependency ‘qlcMatrix’ is not available for package ‘scrattch.hicat’

Hi,

When trying to install scrattch.hicat, I run into the following error:

ERROR: dependency ‘qlcMatrix’ is not available for package ‘scrattch.hicat’

It seems that qlcMatrix is no longer available on CRAN. Installing the package from its GitHub repo using devtools solved the problem.

Perhaps the DESCRIPTION file could be updated to retrieve the dependency automatically from GitHub upon installation?

Best,

Ángeles

Choosing the DE score threshold

Hi,
I am clustering snRNA-seq data and I was wondering how you chose the DE scores for different types of data (e.g. the source comment "# Recommended initial parameters for 10x Nuclei (> 1,000 genes per sample):"). Was there a set of statistical tests used to determine these numbers, or were they chosen based on trial and error with the different datasets?
I am also curious why the DE score contribution is based on p-values from chi-squared tests on a binary expressed/not-expressed metric rather than a test for continuous data. I do see that genes are filtered by log fold change before they are allowed to contribute to the DE score.
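
For concreteness, here is my (possibly mistaken) understanding of how a score of this form would be tallied, as a rough sketch; the per-gene cap is my assumption, not taken from the package:

# Illustrative sketch only, not the package's code: sum capped -log10 adjusted
# p-values over genes that pass the fold-change and p-value filters.
de_score_sketch <- function(padj, lfc, padj.th = 0.01, lfc.th = 1, cap = 20) {
  pass <- padj < padj.th & abs(lfc) > lfc.th
  sum(pmin(-log10(padj[pass]), cap))
}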
Thanks!

iter_clust keeps failing

Hi
I'm trying to run the scrattch.hicat module on some 10x data; however, the iter_clust function keeps failing with the error: Error in unique.default(cl) : unique() applies only to vectors

I am not sure what is causing this. It works fine on the toy tasic2016 dataset but seems to fail on mine.
I am using natural-log (not log2) normalised values, which were first corrected for library size and then scaled by a factor of 10,000 in Seurat, primarily because UMI counts are much lower, so this is more commonly used than CPM or FPKM. Could that be causing the error?
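
If the normalization scale is the problem, would converting back to log2(CPM + 1) along these lines be the right fix? (An untested sketch; it assumes Seurat's default scale factor of 10,000.)

# seurat.dat: ln(1 + counts / libsize * 1e4), e.g. Seurat's default LogNormalize output
cp10k    <- expm1(seurat.dat)        # back to counts per 10,000
norm.dat <- log2(cp10k * 100 + 1)    # counts per 10,000 * 100 = CPM, then log2(CPM + 1)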

Merge clusters based on number of marker genes regardless of de.score

Count the number of up and down markers that meet the de gene criteria (default p < 0.01 and log2FC > 1). Allow merging of cluster pairs based on the number of up and/or down markers regardless of the de score (i.e. summed -log10P). This can be used as a final curation of clusters to require bidirectional markers between all clusters.
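
A rough sketch of the proposed criterion; the function and column names below are illustrative, not the package's existing API:

# de: data frame of pairwise DE results with columns padj and lfc (log2 fold change)
count_markers <- function(de, padj.th = 0.01, lfc.th = 1) {
  c(up   = sum(de$padj < padj.th & de$lfc >  lfc.th),
    down = sum(de$padj < padj.th & de$lfc < -lfc.th))
}

# merge a cluster pair if either direction lacks enough markers, regardless of DE score
merge_by_marker_count <- function(de, min.markers = 2) {
  m <- count_markers(de)
  m["up"] < min.markers || m["down"] < min.markers
}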

Verbosity needs to be adjustable

Running iterative clustering dumps a lot of text to the console. We should be able to toggle or adjust this behavior throughout scrattch.hicat.
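
One possible pattern (illustrative only, not current package code) is to thread a verbose flag through the clustering functions:

cluster_step <- function(norm.dat, ..., verbose = FALSE) {
  if (verbose) message("Clustering ", ncol(norm.dat), " cells")
  # ... existing clustering code ...
}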

Dependencies need to be updated

qlcMatrix has been deprecated on CRAN (see reason here). The DESCRIPTION file should be updated to use the archived version or some other workaround.
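
In the meantime, possible user-side workarounds might look like the following; the archived version number and GitHub repository are assumptions and have not been verified here:

remotes::install_version("qlcMatrix", version = "0.9.7")  # archived CRAN release
# or
devtools::install_github("cysouw/qlcMatrix")              # author's repository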

Split findVG to computation and plots

Currently, findVG both computes statistics for variable genes and can optionally write plots to a file as a side effect. Instead, we should split this into three functions (see the sketch below):
  1. one that computes the statistics,
  2. one that generates the plots,
  3. one that saves the plots (or leave this to the user via ggsave() or cowplot).
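
A hypothetical interface for the split; compute_vg_stats() and plot_vg_stats() are illustrative names, not existing package functions:

vg.stats <- compute_vg_stats(norm.dat)     # 1. statistics only, no side effects
vg.plot  <- plot_vg_stats(vg.stats)        # 2. returns a ggplot object
ggplot2::ggsave("vg_stats.pdf", vg.plot)   # 3. saving is left to the user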

Plot River includes source files not included

The riverplot function includes two lines that source code not included in the repository; when you run the function, it gives this error:

In file(filename, "r", encoding = encoding) :
cannot open file '/home/bharris/zizhen/My_R/map_river_plot.R': No such file or directory

When I view the function, it appears that both of these lines will throw an error:

	source("~/zizhen/My_R/map_river_plot.R")
	source("~/zizhen/My_R/sankey_functions.R")

Broad types

Hello,

I was trying to use hicat on my own data. However, I can't seem to find the genes/markers that you use to classify cells into the broad types of GABAergic, glutamatergic, and non-neuronal (the first levels of the dendrogram). Could you please indicate where I can find more information on that?

Thank you so much in advance, I've been searching and searching

Best

Docs: Workflow of functions

Recommended by Fahimeh: We should build a workflow showing which functions are used for each step of analysis, analogous to our supplementary figure.

Details on the clustering method

Hello,

I used scrattch.hicat for my scRNA-seq analysis, and I thank you for developing this great tool.

However, regarding the pipeline used in the Tasic et al. 2018 paper, I would like clarification on some points.

  1. You performed the bootstrapping and consensus clustering on each of the broad classes that you identified beforehand. What is not clear to me is when you merged the co-clustering matrices. I understand that you merged the co-clustering matrices of the PCA and WGCNA modes for each broad class. Is that the case? So the steps up to the merging module are applied for each broad class, right?

  2. In the de_param function, to set de.score.th you recommend de.score.th = 40 for small datasets (#cells < 1,000) and de.score.th = 150 for large datasets (#cells > 10,000). But do we consider the whole dataset when setting de.score.th, or the number of cells in each broad class?
    For example, if I have a dataset of 8,000 cells with 3 classes (class1: 6,000 cells / class2: 1,500 cells / class3: 500 cells), do I have to set:

  • for class1: de.score.th= 105
  • for class2: de.score.th= 50
  • for class3: de.score.th= 40

or, do I have to set:
de.score.th=130 for all?

  3. Last question: when assigning core and intermediate cells, if you find that the best.cluster.score does not correspond to the cell's original cluster, do you reassign the cell to its best cluster or keep the original one?

Thank you in advance.
Best regards

Update column names for tutorial

Hello!

My name is Andrew Blair, a graduate student at UCSC in Josh Stuart's lab. First, congratulations and thank you for providing a clear and descriptive tutorial!

I ran into a few minor annotation 'bugs' during the tutorial though and wanted to let you guys know:

  1. Update from tutorial 'primary_type' to 'primary_type_label' and 'sample_id' to 'sample_name'
    select.cells <- tasic_2016_anno %>%
    filter(primary_type_label != "unclassified") %>%
    filter(grepl("Igtp|Ndnf|Vip|Sncg|Smad3",primary_type_label)) %>%
    select(sample_name) %>%
    unlist()

  2. The column 'primary_type' was not present in the data frame
    ref.cl.df <- as.data.frame(unique(anno[,c("primary_type_id", "primary_type_color", "broad_type")]))

  3. Update from anno$sample_id to anno$sample_name
    ref.cl <- setNames(factor(anno$primary_type_id), anno$sample_name)

Your tutorial worked after these updates.

Thanks Again!

Constellation plots full example

Hello,

Thank you so much for making this toolbox publicly available! I am trying to make a constellation plot like the ones you have in your papers, but I am having a hard time generating the correct input for the get_knn_graph() function. You provide great example csv files for the plot_constellation() function, but none for the knn_graph function. Would you mind providing example input for that function so that I can tailor my data accordingly?

Thanks,
Salwan

Clustering should be able to run without writing files

We need a mode for clustering that doesn't write files to the user's machine. Writing files as a side effect is not normal behavior for most R functions. The outputs aren't very large, so we should be able to return these results in a list object.

Constellation plots: calculating knn in reduced dimension space - pca or umap?

Hi scrattch.hicat team,

I've been trying to make some constellation plots of my own using the code in your package.

In your methods section for the Yao 2020 preprint, you describe the process for making the constellation plots:

For each cell its 15 nearest neighbors in reduced dimension space were determined and summarized by cluster. For each cluster, we then calculated the fraction of nearest neighbors that were assigned to other clusters.

Does "reduced dimension space" here refer to PCA, or UMAP?
And if PCA - how many PCs did you use?

I understand that the cluster nodes are derived from the UMAP coordinates (centroids), but it's not clear from the explanation or the code if you are getting the knn table from PCA or UMAP coordinates. My hunch is that you use PCA for this, following the workflow used for clustering. Am I right about this?

Thanks a lot!
Carmen
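
For reference, this is my attempt at the procedure as described in that methods text, assuming PCA coordinates and using RANN for the neighbor search; a sketch, not the package's code:

library(RANN)
# rd.dat: cells x dimensions matrix of reduced-dimension coordinates (assumed to be PCs)
# cl:     named factor of cluster assignments, names matching rownames(rd.dat)
cl <- cl[rownames(rd.dat)]
knn.idx <- RANN::nn2(rd.dat, k = 16)$nn.idx[, -1]   # 15 nearest neighbors, excluding self
knn.cl  <- matrix(as.character(cl)[knn.idx], nrow = nrow(knn.idx))
# for each cluster, the fraction of its cells' neighbors assigned to each cluster
frac <- sapply(split(seq_len(nrow(knn.idx)), cl), function(i) {
  tab <- table(factor(knn.cl[i, ], levels = levels(cl)))
  tab / sum(tab)
})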

after iter_clust, cl>49

Hi,

What do cluster IDs greater than 49 mean after iter_clust? Because the Tasic 2016 annotation only covers clusters 0-49, clusters with cl > 49 cannot be annotated.

My sample size is around 20k cells. These are the parameters that I used:
de.param <- de_param(padj.th = 0.05,
                     lfc.th = 1,
                     low.th = 1,
                     q1.th = 0.5,
                     q.diff.th = 0.7,
                     de.score.th = 150)

pca.clust.result <- iter_clust(norm.dat,
                               dim.method = "pca",
                               de.param = de.param)

Thank you.

library() order matters

If library(WGCNA) is called after library(Matrix), errors related to matrices crop up. We may need to always call sparse-matrix computations with an explicit Matrix:: namespace.
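
For example (illustrative only):

# with an explicit Matrix:: namespace, the sparse methods are used regardless of attach order
m <- Matrix::sparseMatrix(i = c(1, 3), j = c(2, 3), x = c(1, 1), dims = c(3, 3))
Matrix::colSums(m)
Matrix::crossprod(m)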

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.