

contrastive-sc

This repository contains the PyTorch implementation of the paper "Contrastive self-supervised clustering of scRNA-seq data" by Madalina Ciortan, under the supervision of Matthieu Defrance (BMC Bioinformatics).

We adapted the self-supervised contrastive learning framework, initially proposed for image processing, to scRNA-seq data. An artificial neural network learns an embedding for each cell during a representation training phase. The embedding is then clustered with a general-purpose clustering algorithm (either KMeans or Leiden community detection). Our method, contrastive-sc, has been compared with ten other state-of-the-art techniques. A broad experimental study has been conducted on both simulated and real-world datasets, assessing multiple external and internal clustering performance metrics (ARI, NMI, silhouette and Calinski-Harabasz scores).
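The representation training phase can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, that two augmented views of each cell are created by randomly masking genes and that the views are compared with the NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR. The helper names mask_genes, nt_xent_loss and encode, the masking rate and the temperature are hypothetical and are not taken from this repository's st_loss.py. A fixed random projection stands in for the trained encoder, so the example computes a loss value but does not train anything.

```python
import numpy as np


def mask_genes(x, rate, rng):
    """Hypothetical augmentation: zero out a random fraction of genes in each cell."""
    keep = rng.random(x.shape) >= rate
    return x * keep


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two views z1, z2 (shape n x d) of the same n cells."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    # L2-normalise so that dot products become cosine similarities
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    sim = z @ z.T / temperature
    # A view must never be compared with itself
    np.fill_diagonal(sim, -np.inf)
    # The positive partner of row i is the other view of the same cell
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())


rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(8, 50)).astype(float)  # toy data: 8 cells x 50 genes
weights = rng.normal(size=(50, 16))                     # hypothetical stand-in for a trained encoder


def encode(x):
    """Hypothetical encoder: log-transform counts, then a fixed random projection."""
    return np.tanh(np.log1p(x) @ weights)


view1 = encode(mask_genes(counts, 0.2, rng))
view2 = encode(mask_genes(counts, 0.2, rng))
loss = nt_xent_loss(view1, view2)
print(f"NT-Xent loss on untrained embeddings: {loss:.3f}")
```

In a real pipeline the encoder would be a neural network whose weights are updated to minimise this loss; the embeddings it produces for the original cells would then be passed to KMeans or Leiden. The loss is low when the two views of a cell are more similar to each other than to the views of any other cell.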

Overview of the repository

  • notebooks folder contains all Jupyter notebooks to run the project, as detailed below.
  • others folder contains the code to reproduce all experiments with scanpy, scziDesk and scDeepCluster
  • R folder contains the scripts to generate the simulated data in folder R/simulated_data (both balanced and imbalanced)
  • output contains model dumps and the results of running all experiments, needed to reproduce the plots
  • docker contains the Dockerfile to create the image used to run all python experiments
  • real_data contains the biological scRNA-seq data, downloaded from scDeepCluster, as detailed below
  • train.py contains the main functionalities for training and evaluating the model results
  • model.py contains the network definition
  • st_loss.py contains the implementation of the loss functions
  • utils.py contains various utility functions

Overview of notebooks

  • Main.ipynb is the main entry point; it contains code snippets to train the model on scRNA-seq data
  • Benchmark_real_data and Benchmark_simulated_data contain the code to reproduce all experiments on contrastive-sc
  • Plots_simulated_data and Plots_real_scRNAseq contain the code to reproduce all figures
  • Grid_search* comprise all ablation studies on network architecture, learning rate, data augmentation strategies, gene selection strategy

Environment Setup

We have employed a docker container to facilitate reproducing the paper results.

Python environment

The image can be built by running the following:

cd docker  
docker build -t contrastive-sc .

The image has been created for GPU usage. To run it on CPU, replace the base image "pytorch/pytorch:1.4-cuda10.1-cudnn7-runtime" in the Dockerfile with a CPU-only version.

The command above builds a Docker image tagged contrastive-sc. Assuming the project has been cloned into a local parent folder named notebooks (~/notebooks), a container can be started from the image with:

docker run -it --runtime=nvidia -v ~/notebooks:/workspace/notebooks -p 8888:8888 contrastive-sc

This starts up a jupyter notebook server, which can be accessed at http://localhost:8888/tree/notebooks

R environment

We followed the instructions in this tutorial to create an R Docker container that comes with most single-cell libraries already installed. To launch it on port 8787, run:

docker run -d -p 8787:8787 -e USER='rstudio' -e PASSWORD='rstudioSC' -e ROOT=TRUE -v ~/notebooks/deep_clustering:/home/rstudio/projects vbarrerab/rstudio_singlecell

Data

The simulated datasets can be downloaded from this Google Drive link (~400MB). Alternatively, they can be generated by running R/all_balanced.r or R/all_imbalanced.R.

The single-cell data has been collected from the scDeepCluster and scziDesk repositories. It should be saved to the real_data folder.

Reproducing the competing methods' results

To benchmark the methods implemented in R, we used the script made available by scziDesk, which can be found in R/run_methods.r. We extended it to also compute the silhouette and Calinski-Harabasz scores.
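For readers who prefer to compute these two internal metrics in Python, a minimal NumPy sketch follows. It uses the standard definitions: the silhouette of a cell compares its mean distance to the other members of its own cluster (a) with its mean distance to the nearest other cluster (b), and the Calinski-Harabasz score is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The function names are hypothetical and this is not the code in R/run_methods.r; scikit-learn's silhouette_score and calinski_harabasz_score are the usual library equivalents.

```python
import numpy as np


def silhouette(x, labels):
    """Mean silhouette coefficient (Euclidean distance); singleton clusters score 0."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # the silhouette of a singleton is defined as 0
        a = dist[i, same].sum() / (same.sum() - 1)  # the zero self-distance is excluded
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def calinski_harabasz(x, labels):
    """Between-cluster over within-cluster dispersion, scaled by (n - k) / (k - 1)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n, clusters = len(x), np.unique(labels)
    k = len(clusters)
    centre = x.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        members = x[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - centre) ** 2)
        within += np.sum((members - centroid) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))


# Two well-separated toy clusters
points = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])
labels = np.array([0, 0, 1, 1])
print(round(silhouette(points, labels), 3), round(calinski_harabasz(points, labels), 1))  # 0.9 200.0
```

Both scores need only the embedding and the predicted labels, so, unlike ARI and NMI, they can be reported when ground-truth cell types are unknown.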

The code for the remaining Python methods is available in the others folder.
