A new version of the celligner package using VAEs. Inspired by the Theis lab's scArches.

License: The Unlicense

celligner2

Created by Jérémie Kalfon @jkobject (Broad Institute). Celligner2 is a new version of the celligner tool for aligning cancer transcriptomics data across tumors and models. Find out more about celligner1 here: Global computational alignment of tumor and cell line transcriptional profiles

This method is based on the trVAE/scArches method from the Theis Lab and adds multiple features to improve its performance for our needs. Amongst those:

  • Semi-supervision to classify cell type and any other feature provided. This improves the latent space and makes the model focus on what the researcher is interested in.
  • Improved surgery: the model size can be increased while the trained weights are frozen.
  • Multi-dataset MMD on the latent space, together with better batch mixing. These are improvements to methods already present, and allow the user to:
    • use multiple datasets at once.
    • perform better correction when large batch effects exist (e.g. between cancer cell lines and frozen tumor tissues).
  • Explainable AI tools such as LRP with GSEA, to look at pathway enrichment and understand which features the model uses to make a prediction.
  • QC methods: assessing quality (using scIB), making interactive UMAP plots, looking at reconstruction, classifications and more.
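As a rough illustration of the multi-dataset MMD idea, the sketch below computes a biased estimate of the squared maximum mean discrepancy between latent vectors from different datasets and sums it over all dataset pairs. It is a minimal NumPy sketch and not the Celligner2 implementation: the function names (`rbf_kernel`, `mmd2`, `multi_dataset_mmd`), the single RBF kernel and the `gamma` parameter are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    """RBF kernel matrix between the rows of x and the rows of y."""
    # Squared Euclidean distance between every row of x and every row of y.
    sq_dist = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2 * x @ y.T
    # Clamp tiny negative values caused by floating-point error.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two samples (hypothetical helper)."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2 * k_xy


def multi_dataset_mmd(latent, dataset_ids, gamma=1.0):
    """Sum of squared MMD over every pair of datasets (hypothetical helper)."""
    groups = [latent[dataset_ids == d] for d in np.unique(dataset_ids)]
    total = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            total += mmd2(groups[i], groups[j], gamma)
    return total
```

A value near zero suggests the datasets overlap in the latent space, while larger values indicate a remaining batch effect. In a training loop a term like this would typically be added to the loss with a weighting factor.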

The next phase of development concerns the addition of the expimap_mode. In this mode, code has been copied from expimap so that the model can use a different latent space, based on gene sets, with a decoder replaced by a linear model masked by the genes in each gene set. References to the expimap mode are marked in the code with the #expimap comment. Only a partial implementation exists: some arguments and functions have been copied from the expimap code and have started to be used and adapted to the Celligner2 codebase. Running it currently would produce bugs, as it is not finished. Some references to the graph NN model or to improvements to the architectural surgery also appear in the code but have no functional implications yet.
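To make the idea of a linear decoder masked by gene sets more concrete, here is a minimal NumPy sketch. It assumes one latent dimension per gene set and a binary mask that zeroes every decoder weight linking a gene set to a gene outside it. The function names, gene identifiers and gene sets are hypothetical and do not come from the Celligner2 or expimap code.

```python
import numpy as np


def build_mask(genes, gene_sets):
    """Binary (n_gene_sets, n_genes) mask: 1 where the gene belongs to the set."""
    index = {gene: i for i, gene in enumerate(genes)}
    mask = np.zeros((len(gene_sets), len(genes)))
    for row, members in enumerate(gene_sets.values()):
        for gene in members:
            if gene in index:  # ignore set members missing from the expression matrix
                mask[row, index[gene]] = 1.0
    return mask


def masked_linear_decode(z, weights, mask):
    """Linear decoder whose weights are zeroed outside each gene set."""
    return z @ (weights * mask)


# Toy example with made-up identifiers (assumptions, not real annotations).
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"set_a": ["G1", "G2"], "set_b": ["G3"]}
mask = build_mask(genes, gene_sets)
weights = np.ones((len(gene_sets), len(genes)))
z = np.array([[2.0, 3.0]])  # one sample, one latent value per gene set
reconstruction = masked_linear_decode(z, weights, mask)
```

Because each latent dimension can only influence the genes in its own set, a latent value can be read as the activity of that gene set. In this toy example, `G4` belongs to no set and is therefore always reconstructed as zero.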

More about the model in this presentation: Celligner2.0 Update

Install it

git clone https://github.com/broadinstitute/celligner2.git
cd celligner2
pip install -e .

pypi

/!\ not functional yet

pip install celligner2

Usage

For usage information, see the notebooks in runs/. A general demo notebook is not yet available. The latest version of the run is in -v4.ipynb.

For information about data generation please see the data/ folder.

from celligner2 import BaseClass
from celligner2 import base_function

BaseClass().base_method()
base_function()

/!\ not functional yet

$ python -m celligner2
#or
$ celligner2

About the Code

The code structure follows the one used by pytorch and the Theis lab. More can be understood by looking at the code and its usage in the notebooks.

  • Some base model functions are implemented as separate classes (othermodels/base/_base.py), which are extended by model/celligner2model.py. This file contains the full definition of the model (training, data management and some usage).
  • The model architecture itself is defined in model/celligner2.py. Additional key model functions are in model/modules and model/losses.
  • The training definition is in trainers/celligner2/trainer.py, which is extended by trainers/celligner2/semisupervised.py.
  • Dataset management (encoding, preprocessing, etc.) is defined in dataset/ and dataset/celligner2/_ .
  • Finally, plotting/ contains plotting/celligner2_eval.py, the evaluator of the model. It expects a trained celligner model and can produce many plots and evaluations of the model, including some post-training functionality that would be better placed in the model/celligner2_model.py file.

The code is split into /base and /celligner2 because scArches was originally a reimplementation of many models, each reusing and reimplementing base modules and tools. We decided to keep it this way for ease of use and collaboration with the Theis lab.

Development

Read the CONTRIBUTING.md file.

Current ongoing tasks are in the Asana project: Celligner in the Celligner2 section.
