cantinilab / hummus

Molecular interactions inference from single-cell multi-omics data

Home Page: https://cantinilab.github.io/HuMMuS/

License: GNU Affero General Public License v3.0

R 43.25% Python 54.79% CSS 1.96%

biological-networks epigenomics multi-omics multilayer-networks single-cell transcriptomics

hummus's Introduction

HuMMuS

Heterogeneous Multilayer network for Multi-omics Single-cell data

HuMMuS exploits multi-omics single-cell measurements to infer numerous regulatory mechanisms. Inter-omics (e.g. peak-gene, TF-peak) and intra-omics interactions (e.g. peak-peak, gene-gene, TF-TF) are considered to capture both regulatory interactions and cooperation between macromolecules.

Overview

HuMMuS currently provides the following outputs:

gene regulatory networks (GRNs)
enhancers
TF - DNA binding regions
TF - target genes.

Read our publication for more details!

scRNA + scATAC

Like most current state-of-the-art GRN inference methods, we provide a minimal version of HuMMuS that runs on scRNA-seq + scATAC-seq data (paired or unpaired).

Use of additional modalities

HuMMuS is designed to be extensible to any additional biological modality of interest. You can add a network to an existing modality (e.g. both a prior-knowledge network and a data-driven network of genes), or introduce a new modality (e.g. epigenetic or proteomic networks).
For now, such customisation requires calling some hummuspy (Python package) functions directly at the end of the pipeline and writing some configuration files manually. It will be simplified soon!
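
As an illustration of preparing an additional network before it is handed to the pipeline, the sketch below writes a small edge list to disk using only the Python standard library. The helper name `write_edge_list`, the file name, the placeholder node names, and above all the layout (tab-separated source, target, weight, no header) are assumptions made for this example; the exact format and configuration hummuspy expects is not described here and should be checked in its documentation.

```python
import csv
from pathlib import Path


def write_edge_list(edges, path):
    """Write (source, target, weight) tuples as a tab-separated file, no header.

    NOTE: this layout is an assumption for illustration, not a documented
    hummuspy requirement.
    """
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for source, target, weight in edges:
            writer.writerow([source, target, weight])
    return path


# Placeholder edges (hypothetical node names), for illustration only
example_edges = [
    ("PROTEIN_A", "PROTEIN_B", 1.0),
    ("PROTEIN_B", "PROTEIN_C", 0.5),
]

if __name__ == "__main__":
    out = write_edge_list(example_edges, "extra_network.tsv")
    print(out.read_text())
```

Keeping each network in its own plain edge-list file makes it easy to inspect before it is referenced from a manually written configuration file.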

Tutorials/Vignettes

The vignette "Infer a gene regulatory network and other outputs from unpaired/paired scRNA+scATAC data" applies HuMMuS to the Chen dataset, which was used in the benchmark of the HuMMuS publication.

Installation

HuMMuS is currently available only as an R package, but it requires a Python dependency (hummuspy).

HuMMuS python dependency

The Python package hummuspy should preferably be installed with pip (e.g. from the terminal, in a conda environment):

conda create -n hummuspy_env python
conda activate hummuspy_env
pip install hummuspy

Alternatively, you can also install it directly from R using the reticulate package:

library(reticulate)
py_install("hummuspy", envname = "r-reticulate", method="auto")
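
Whichever installation route is used, it can be worth confirming that the package is visible to the interpreter of the environment it was installed in. The following is a minimal sketch that relies only on the Python standard library; the helper name `is_installed` is hypothetical and not part of hummuspy.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run with the interpreter of the environment where hummuspy was installed
    print("hummuspy available:", is_installed("hummuspy"))
```

If this reports that the package is unavailable, the interpreter being run most likely belongs to a different environment from the one used for the installation.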

HuMMuS R package

The core R package can be installed directly from R:

devtools::install_github("cantinilab/HuMMuS", ref="dev_SeuratV5")

# If you only work with Seurat v4, you can also use the main branch, which will soon be deprecated
#devtools::install_github("cantinilab/HuMMuS")

Before running HuMMuS, if you use multiple conda environments, make sure that reticulate points to the one where hummuspy is installed. You can specify it at the beginning of your code:

library(reticulate)
# Using a specific conda environment
envname <- "hummuspy_env"  # or "r-reticulate", for example
use_condaenv(envname, required = TRUE)

For more details on how to set up the reticulate connection, see: https://rstudio.github.io/reticulate

scATAC processing

To process the scATAC data directly with HuMMuS, Cicero is currently the only supported option. It requires the Cicero version that runs with Monocle3, so you need to install both Monocle3 and Cicero:

devtools::install_github("cole-trapnell-lab/monocle3")
devtools::install_github("cole-trapnell-lab/cicero-release", ref = "monocle3")

If you run into trouble installing Monocle3 on Ubuntu, you can try running: sudo apt-get install libgdal-dev libgeos-dev libproj-dev. You can also check their GitHub page for more help. Having an earlier version of Monocle (1 or 2) still loaded in your R session can also cause problems. If problems persist even after restarting your R session, try remove.packages("monocle") before reinstalling both Monocle3 and Cicero.

Data accessibility

To reproduce the HuMMuS results presented in the manuscript, preprocessed data are accessible here.
For quick tests, the preprocessed Chen dataset is available directly through the package as a Seurat object: load(chen_dataset), along with a subset version: load(chen_dataset_subset).

Cite us

Trimbour R., Deutschmann I. M., Cantini L. Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS. Bioinformatics (2024), btae143. doi: https://doi.org/10.1093/bioinformatics/btae143

hummus's Issues

Adding PPI information

Hello, thank you for this tool! It would be very useful for me as I have scRNA-seq, scATAC-seq and proteomic data.

I would like to incorporate the PPI data into the GRN.
I saw this information in the readme "For now, such personalisation requires to use directly some hummuspy (python package) functions at the end of the pipeline and write some configuration files manually. It will be simplified soon !", however if you could please give me some more information about how to write the config files in that case it would be great!

Thank you so much
Best regards,
Maria

Error in add_network() function within Running Cicero for Hummus Object

Hi Rémi,

I hope you're doing well!

I'm mostly through your documentation, but I encountered an error today while trying to run Cicero via this script:

# Compute ATAC peak networks
hummus_case <- compute_atac_peak_network(hummus_case,
                                         atac_assay = "peaks",
                                         verbose = 1,
                                         genome = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38,
                                         store_network = FALSE)

hummus_control <- compute_atac_peak_network(hummus_control,
                                            atac_assay = "peaks",
                                            verbose = 1,
                                            genome = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38,
                                            store_network = FALSE)

Here's the output and error message I received:
[1] "Starting Cicero"
[1] "Calculating distance_parameter value"
[1] "Running models"
[1] "Assembling connections"
[1] "Successful cicero models: 10131"
[1] "Other models: "

Zero or one element in range
3454
[1] "Models with errors: 0"
[1] "Done"

236663 peak edges with a coaccess score > 0 were found.
Peak network construction time: 5.47283
Error in add_network(object = hummus, network = atac_peak_network, network_name = network_name, :
Object is not a multiplex, a multilayer nor an hummus object.

And, here's what my Hummus objects look like:

> hummus_case
An object of class "Hummus_Object"
<S4 Type Object>
attr(,"assays")
attr(,"assays")$RNA
Assay (v5) data with 36601 features for 15621 cells
First 10 features:
 MIR1302-2HG, FAM138A, OR4F5, AL627309.1, AL627309.3, AL627309.2, AL627309.5, AL627309.4, AP006222.2, AL732372.1 
Layers:
 counts.Gene Expression.RNA_72, counts.Gene Expression.RNA_203, counts.Gene Expression.RNA_271, counts.Gene
Expression.RNA_294 

attr(,"assays")$SCT
SCTAssay data with 27672 features for 15621 cells, and 4 SCTModel(s) 
First 10 features:
 AL627309.1, AL627309.5, AL627309.4, LINC01409, FAM87B, LINC01128, LINC00115, FAM41C, SAMD11, NOC2L 

attr(,"assays")$integrated
SCTAssay data with 3000 features for 15621 cells, and 1 SCTModel(s) 
Top 10 variable features:
 IGKC, VCAN, IGHA1, IGLC2, AL136456.1, LINC02694, IGLC3, TCF7L2, BANK1, GNLY 

attr(,"assays")$prediction.score.celltype.l1
Assay data with 8 features for 15621 cells
First 8 features:
 other T, CD8 T, B, CD4 T, DC, NK, Mono, other 

attr(,"assays")$prediction.score.celltype.l2
Assay data with 30 features for 15621 cells
First 10 features:
 gdT, CD8 TEM, CD8 TCM, dnT, B intermediate, CD4 TCM, pDC, NK, B naive, CD14 Mono 

attr(,"assays")$prediction.score.celltype.l3
Assay data with 57 features for 15621 cells
First 10 features:
 gdT-3, CD8 TEM-2, CD8 TCM-1, dnT-2, B intermediate lambda, CD4 TCM-3, B intermediate kappa, CD8 TEM-1, CD4 TCM-1, pDC 

attr(,"assays")$ATAC
ChromatinAssay data with 170277 features for 15621 cells
Variable features: 170277 
Genome: 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 6 

attr(,"assays")$peaks
ChromatinAssay data with 86762 features for 15621 cells
Variable features: 0 
Genome: 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 6 

attr(,"active.assay")
[1] "ATAC"
attr(,"multilayer")
Multilayer network containing  2  bipartite networks and  2  multiplex networks.
 
- Multiplex names:  TF, SCT 
- Bipartite names:  tf_peak, atac_rna 
attr(,"motifs_db")
Motifs database object with :
-  1503 motifs
-  914  TFs
-  1540 TF to motif names mapping
> hummus_control
An object of class "Hummus_Object"
<S4 Type Object>
attr(,"assays")
attr(,"assays")$RNA
Assay (v5) data with 36601 features for 8363 cells
First 10 features:
 MIR1302-2HG, FAM138A, OR4F5, AL627309.1, AL627309.3, AL627309.2, AL627309.5, AL627309.4, AP006222.2, AL732372.1 
Layers:
 counts.Gene Expression.RNA_280, counts.Gene Expression.RNA_302 

attr(,"assays")$SCT
SCTAssay data with 27672 features for 8363 cells, and 2 SCTModel(s) 
First 10 features:
 AL627309.1, AL627309.5, AL627309.4, LINC01409, FAM87B, LINC01128, LINC00115, FAM41C, SAMD11, NOC2L 

attr(,"assays")$integrated
SCTAssay data with 3000 features for 8363 cells, and 0 SCTModel(s) 
Top 10 variable features:
 IGKC, VCAN, IGHA1, IGLC2, AL136456.1, LINC02694, IGLC3, TCF7L2, BANK1, GNLY 

attr(,"assays")$prediction.score.celltype.l1
Assay data with 8 features for 8363 cells
First 8 features:
 other T, CD8 T, B, CD4 T, DC, NK, Mono, other 

attr(,"assays")$prediction.score.celltype.l2
Assay data with 30 features for 8363 cells
First 10 features:
 gdT, CD8 TEM, CD8 TCM, dnT, B intermediate, CD4 TCM, pDC, NK, B naive, CD14 Mono 

attr(,"assays")$prediction.score.celltype.l3
Assay data with 57 features for 8363 cells
First 10 features:
 gdT-3, CD8 TEM-2, CD8 TCM-1, dnT-2, B intermediate lambda, CD4 TCM-3, B intermediate kappa, CD8 TEM-1, CD4 TCM-1, pDC 

attr(,"assays")$ATAC
ChromatinAssay data with 170277 features for 8363 cells
Variable features: 170277 
Genome: 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 6 

attr(,"assays")$peaks
ChromatinAssay data with 86762 features for 8363 cells
Variable features: 0 
Genome: 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 6 

attr(,"active.assay")
[1] "ATAC"
attr(,"multilayer")
Multilayer network containing  2  bipartite networks and  2  multiplex networks.
 
- Multiplex names:  TF, SCT 
- Bipartite names:  tf_peak, atac_rna 
attr(,"motifs_db")
Motifs database object with :
-  1503 motifs
-  914  TFs
-  1540 TF to motif names mapping

I would really appreciate any suggestions on what may be worth trying from here! And, thanks for your help!

Best,
Daniel

Troubleshooting Converting Seurat Object to Hummus Object with 10X-Multiome Dataset

Hello,

Thank you for creating this wonderful package. I'm excited to give it a test run with my 10X-Multiome dataset. As I started to follow along with your vignette using my own dataset, I encountered an error message when transitioning into a hummus object: "hummus <- as(pbmc, 'hummus_object')". The error is as follows: Error in validObject(object = .Object) : invalid class "Seurat" object: 'assays' must be a named list.

Here's what my "pbmc" object looks like:
An object of class Seurat
324407 features across 23619 samples within 8 assays
Active assay: RNA (36601 features, 0 variable features)
6 layers present: counts.Gene Expression.RNA_280, counts.Gene Expression.RNA_302, counts.Gene Expression.RNA_72, counts.Gene Expression.RNA_203, counts.Gene Expression.RNA_271, counts.Gene Expression.RNA_294
7 other assays present: SCT, integrated, prediction.score.celltype.l1, prediction.score.celltype.l2, prediction.score.celltype.l3, ATAC, peaks
5 dimensional reductions calculated: pca, integrated_dr, ref.umap, lsi, umap

Could you give me some suggestions on what may be worth trying from here?

Thanks,
Daniel

no 'Initiate_Hummus_Object' function

Hi!
Don't forget to add this part haha :)
hummus_object function results in ' no method for coercing this S4 class to a vector'.