veredkl / scrna_advancedembeddings

This project forked from sigalgrabois/scrna_advancedembeddings

Enhance K-means clustering for single-cell data analysis by integrating advanced embedding techniques like Universal Cell Embeddings (UCE) and Cell2Sentence. Compare with original data and PCA. Improve K-means++ by adjusting for cell density to maintain performance in high-dimensional, variable density, and biologically complex data.

Jupyter Notebook 100.00%

Enhancing K-means Clustering for Single-Cell Data Analysis

By Sigal Grabois and Vered Klein

Overview

This project aims to make the K-means clustering algorithm more robust and adaptable for single-cell RNA-sequencing (scRNA-seq) data analysis, particularly when assessing cellular heterogeneity. We integrate advanced embedding techniques such as Universal Cell Embeddings (UCE) and Cell2Sentence, and compare their performance with traditional PCA and the original data. We also propose an enhancement to the K-means++ algorithm that accounts for cell density in the feature space, so that it better handles high-dimensional, variable-density, and biologically complex data.

Background and Motivation

Analyzing scRNA-seq data is essential for advancing our understanding of cell biology, exploring disease mechanisms, and enhancing the drug development process. A fundamental step in scRNA-seq data analysis is cell type clustering, which organizes cells into groups based on gene expression patterns. This initial clustering significantly influences subsequent analyses and interpretations.

The Problem

Traditional methods for analyzing scRNA-seq data rely mainly on raw gene expression data, which can be noisy and incomplete. Other methods depend on cell type annotations, which limits their applicability to unseen cell types or datasets. Methods that are not zero-shot typically require model tuning for each new dataset, so their representations are not universal without retraining, which is inefficient and time-consuming.

Our Solution

Advanced Embedding Techniques

  • Universal Cell Embeddings (UCE): A self-supervised model that maps single-cell gene expression profiles into a universal embedding space, capturing the molecular diversity across different cell types and species.
  • Cell2Sentence: An embedding technique that represents each cell as a sequence of gene names ordered by expression level, so that language models can be applied to single-cell data.
  • PCA and Original Data: Used as benchmarks for comparison.
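As a rough illustration of the PCA benchmark, the sketch below projects a cells-by-genes matrix onto its top principal components using a plain NumPy SVD. The function name `pca_embed` and the default of 50 components are assumptions made for this example; the project notebooks may use a different library or settings.

```python
import numpy as np


def pca_embed(X, n_components=50):
    """Project rows of X (cells x genes) onto the top principal components.

    Hypothetical helper for illustration; not taken from the project code.
    """
    X = np.asarray(X, dtype=float)
    # Center each gene (column) so the components capture variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return X_centered @ Vt[:k].T
```

The resulting low-dimensional matrix can then be passed to the same clustering routine as the UCE or Cell2Sentence embeddings, so all inputs are compared under identical clustering settings.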

Enhanced K-means++ Algorithm

We propose modifying the K-means++ algorithm to incorporate cell density in the feature space. With this modification, denser regions have a greater influence on each centroid's updated location, which is intended to improve clustering accuracy on high-dimensional and variable-density data.
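One way to read this idea is sketched below: standard K-means++ seeding, followed by update steps in which each centroid is a density-weighted mean of its assigned cells. This is a minimal sketch under stated assumptions, not the project's actual implementation. In particular, the density estimate (inverse mean distance to the nearest neighbors) and the names `knn_density`, `density_weighted_kmeans`, and `n_neighbors` are hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding


def knn_density(X, n_neighbors=10):
    """Assumed density estimate: inverse mean distance to the nearest neighbors.

    Builds a full n x n distance matrix, so it only suits small inputs (n >= 2).
    """
    k = min(n_neighbors, len(X) - 1)
    # After sorting, column 0 is each point's distance to itself.
    nearest = np.sqrt(np.sort(pairwise_sq_dists(X, X), axis=1)[:, 1:k + 1])
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def kmeans_pp_init(X, n_clusters, rng):
    """Standard K-means++ seeding: sample centers proportionally to D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        d2 = pairwise_sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def density_weighted_kmeans(X, n_clusters, n_neighbors=10, n_iter=50, seed=0):
    """Lloyd iterations where each centroid is a density-weighted mean."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    weights = knn_density(X, n_neighbors)
    centers = kmeans_pp_init(X, n_clusters, rng)
    for _ in range(n_iter):
        labels = pairwise_sq_dists(X, centers).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(n_clusters):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster becomes empty
                # Cells in denser regions pull the centroid more strongly.
                new_centers[j] = np.average(X[mask], axis=0, weights=weights[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = pairwise_sq_dists(X, centers).argmin(axis=1)
    return labels, centers
```

Setting all weights to 1 recovers ordinary K-means, which makes it straightforward to compare the weighted and unweighted variants on the same embedding.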

Results

Data Sets Used

  • 10K PBMC Data Set: Includes 11,990 cell samples.
  • Mouse Kidney Data Set: Includes 35,833 cell samples.

Evaluation

We evaluate clustering performance with a majority-label accuracy score: each cluster is assigned the most common known label among its cells, and accuracy is the fraction of cells whose known label matches the majority label of their cluster. The results show that integrating advanced embedding techniques with the K-means++ algorithm improves clustering accuracy and robustness.
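The scoring described above can be sketched as follows, assuming it corresponds to what is often called cluster purity. The function name `majority_label_accuracy` and the toy labels are hypothetical and used only for illustration.

```python
from collections import Counter, defaultdict


def majority_label_accuracy(true_labels, cluster_ids):
    """Fraction of cells whose known label equals their cluster's majority label."""
    if len(true_labels) != len(cluster_ids) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Count how often each known label occurs inside each cluster.
    per_cluster = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster[cluster][label] += 1
    # Each cluster contributes the size of its most common label.
    matched = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return matched / len(true_labels)


# Toy example: cluster 0 holds B, B, T and cluster 1 holds T, T, NK,
# so 2 + 2 = 4 of the 6 cells match their cluster's majority label.
score = majority_label_accuracy(["B", "B", "T", "T", "T", "NK"], [0, 0, 0, 1, 1, 1])
print(round(score, 3))  # 0.667
```

A score of this kind cannot decrease when clusters are split further and reaches 1.0 when every cell is its own cluster, so it is most informative when methods are compared at the same number of clusters.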

Figure: Accuracy vs. Number of Clusters for Different Methods

Conclusion

The integration of advanced embedding techniques with enhanced clustering algorithms holds promise for improving the analysis of heterogeneous cellular data. Future studies should explore the scalability of these methods and their applicability to larger and more diverse datasets.

References

  1. UCE
  2. Cell2Sentence
  3. Lin et al.
  4. Li et al.
  5. Hua et al.
  6. Liu et al.

Requirements

  • numpy~=1.26.4
  • requests~=2.31.0
  • pandas~=2.2.2
  • anndata~=0.10.7
  • scipy~=1.13.0
