
st_simulation's Introduction

Simulation of Spatial Transcriptomics spots from single-cell reference

This repo provides a collection of scripts used to generate simulated spatial transcriptomics data as a mixture of single-cell transcriptomics profiles (adapting code and model from Andersson et al. 2019). Parameters are chosen to simulate the characteristics of 10X Genomics Visium chips.

Run simulation

Initial input:

  • AnnData object of raw counts per single cell (saved as an h5ad file)
  • A table of cell type annotations per cell, for the cell types that we want to deconvolve (saved as a csv file)

Step 1: Split single-cell dataset: we split the cells in the single-cell dataset into a 'generation' set, which will be used to simulate the ST spots, and a 'validation' set, which will be used to train the deconvolution models that we want to benchmark. From the command line:

python split_sc.py <counts_h5ad> <annotation_csv> --annotation_col annotation_1  --out_dir <output_directory>

Output: generation and validation count matrices and cell type annotations are saved as pickle files, with a random seed identifying the split.
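
The split in step 1 can be pictured with the minimal sketch below. It is not the code of split_sc.py: the function name `split_cells`, the 50/50 `generation_fraction`, and the example cell IDs are assumptions made for illustration. It only shows how a random seed makes the partition reproducible.

```python
import numpy as np

def split_cells(cell_ids, seed, generation_fraction=0.5):
    """Randomly partition cell IDs into a generation set and a validation set.

    generation_fraction is a hypothetical parameter; the fraction used by
    split_sc.py is not stated in this README.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(cell_ids))
    n_generation = int(round(generation_fraction * len(shuffled)))
    return shuffled[:n_generation], shuffled[n_generation:]

# Hypothetical cell IDs, for illustration only
cells = [f"cell_{i}" for i in range(10)]
generation, validation = split_cells(cells, seed=42)
```

Reusing the same seed reproduces the same split, which is why the seed is a convenient identifier for the output files.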

Step 2: Build design matrix: in this step we define which cell types are (A) low or high density and (B) uniformly present in all spots or localized in a few spots (regional). To generate synthetic spots with ~10 cells per spot (as seen with nuclear segmentation on Visium spots), we recommend setting the mean number of cells per spot per cell type below 5.

n_spots=100
seed=$(ls labels_generation* | sed 's/.*_//' | sed 's/\.p$//')
python ST_simulation/assemble_design.py \
    $seed \
  --tot_spots $n_spots --mean_high 3 --mean_low 1 \
  --out_dir <output_directory>

Output: synthetic_ST_seed${seed}_${assemble_id}_design.csv contains the design used for the simulation:

Column         Data
uniform        whether the cell type is uniformly located across spots (1) or localized in a small subset of spots (0)
density        whether the cell type is present in a spot at low density (1) or high density (0)
nspots         total number of spots in which the cell type is located
mean_ncells    mean number of cells per spot
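
To make the design columns concrete, here is a hedged sketch of how such a design could be drawn at random. It is not the logic of assemble_design.py: the function `build_design`, the probabilities `p_uniform` and `p_low`, and the `regional_fraction` are all assumptions. Only the column names, their 0/1 encoding, and the `tot_spots`, `mean_high` and `mean_low` parameters come from the text above.

```python
import numpy as np

def build_design(cell_types, tot_spots, mean_high, mean_low, seed,
                 p_uniform=0.5, p_low=0.5, regional_fraction=0.1):
    """Draw one design row per cell type (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    design = {}
    for cell_type in cell_types:
        uniform = int(rng.random() < p_uniform)  # 1 = uniform, 0 = regional
        density = int(rng.random() < p_low)      # 1 = low density, 0 = high density
        # Regional cell types occupy only a small subset of spots (assumed fraction)
        nspots = tot_spots if uniform else max(1, round(regional_fraction * tot_spots))
        mean_ncells = mean_low if density else mean_high
        design[cell_type] = {
            "uniform": uniform,
            "density": density,
            "nspots": nspots,
            "mean_ncells": mean_ncells,
        }
    return design

# Hypothetical cell type names
design = build_design(["T_cell", "B_cell", "Fibroblast"],
                      tot_spots=100, mean_high=3, mean_low=1, seed=0)
```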

Step 3: Assemble cell type composition per spot: based on the design matrix, we define the cell type composition of each spot, i.e. how many cells per cell type are in each spot. An assemble ID identifies each assembly, since we assemble many composition matrices with the same design.

id=1
python cell2location/pycell2location/ST_simulation/assemble_composition.py \
    $seed \
    --tot_spots $n_spots --assemble_id $id

Output: synthetic_ST_seed${seed}_${assemble_id}_composition.csv contains the number of cells per cell type in each spot, for benchmarking deconvolution models.
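
As a rough illustration of step 3, the sketch below turns a design into a spots-by-cell-types matrix of cell numbers. The Poisson sampling, the function name `assemble_composition`, and the example design values are assumptions; the distribution actually used by assemble_composition.py is not stated in this README.

```python
import numpy as np

def assemble_composition(design, tot_spots, seed):
    """Sample the number of cells per cell type in each spot (assumed Poisson)."""
    rng = np.random.default_rng(seed)
    cell_types = list(design)
    composition = np.zeros((tot_spots, len(cell_types)), dtype=int)
    for j, cell_type in enumerate(cell_types):
        row = design[cell_type]
        # Choose which spots contain this cell type
        spots = rng.choice(tot_spots, size=row["nspots"], replace=False)
        # Draw a cell number for each chosen spot around the design mean
        composition[spots, j] = rng.poisson(row["mean_ncells"], size=row["nspots"])
    return cell_types, composition

# Hypothetical design: one uniform high-density type, one regional low-density type
example_design = {
    "T_cell": {"uniform": 1, "density": 0, "nspots": 100, "mean_ncells": 3},
    "B_cell": {"uniform": 0, "density": 1, "nspots": 10, "mean_ncells": 1},
}
cell_types, composition = assemble_composition(example_design, tot_spots=100, seed=1)
```

Running the function again with a different seed gives another composition matrix for the same design, which is the role the assemble ID plays in the file names.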

Step 4: Assemble simulated ST spots

python assemble_st.py ${seed} --assemble_id $id

Output:

  • synthetic_ST_seed${seed}_${assemble_id}_counts.csv contains the count matrix for the simulated ST spots
  • synthetic_ST_seed${seed}_${assemble_id}_umis.csv contains the number of UMIs per cell type in each spot, for benchmarking deconvolution methods that model number of UMIs
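
The relationship between the two outputs can be sketched as follows: a spot's counts are the sum of the counts of the cells drawn for it, and the UMIs per cell type are the per-type subtotals. This is a simplified sketch under stated assumptions, not the code of assemble_st.py. The function `assemble_spot`, sampling with replacement, and the toy data are hypothetical, and any downsampling or noise the actual script may apply is not modelled here.

```python
import numpy as np

def assemble_spot(counts, labels, n_cells_per_type, rng):
    """Sum the counts of randomly drawn cells to form one synthetic spot.

    counts: cells x genes array of raw counts (generation set)
    labels: cell type label per cell
    n_cells_per_type: dict of cell type -> number of cells in this spot
    """
    spot_counts = np.zeros(counts.shape[1], dtype=counts.dtype)
    umis = {}
    for cell_type, n_cells in n_cells_per_type.items():
        if n_cells == 0:
            umis[cell_type] = 0
            continue
        candidates = np.flatnonzero(labels == cell_type)
        chosen = rng.choice(candidates, size=n_cells, replace=True)
        type_counts = counts[chosen].sum(axis=0)
        spot_counts += type_counts
        umis[cell_type] = int(type_counts.sum())  # UMIs contributed by this type
    return spot_counts, umis

# Toy data: 4 cells x 3 genes, two hypothetical cell types
toy_counts = np.arange(12).reshape(4, 3)
toy_labels = np.array(["A", "A", "B", "B"])
spot_counts, umis = assemble_spot(toy_counts, toy_labels, {"A": 2, "B": 1},
                                  np.random.default_rng(0))
```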

Speeding up the simulation

The current implementation is not optimized for speed: it takes ~2 minutes to assemble 100 spots. For now, my suggestion for simulating thousands of spots is to build the design matrix once (step 2 above), then run steps 3 and 4 many times using the wrapper:

run_simulation2.sh <seed> <n_spots> <id>

then merge the results into one object:

python merge_synthetic_ST.py . $seed
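
Conceptually, merging amounts to stacking the spots from each run while keeping spot names unique. The sketch below illustrates that idea only. The function `merge_runs` and the spot naming scheme are assumptions, and it does not reproduce what merge_synthetic_ST.py does or the object it writes.

```python
import numpy as np

def merge_runs(runs):
    """Stack count matrices from several assemble IDs into one matrix.

    runs: dict of assemble ID -> (spots x genes) count array
    Returns unique spot names and the stacked matrix.
    """
    spot_names, blocks = [], []
    for assemble_id, counts in sorted(runs.items()):
        # Prefix spot names with the assemble ID so they stay unique after merging
        spot_names += [f"id{assemble_id}_spot{i}" for i in range(counts.shape[0])]
        blocks.append(counts)
    return spot_names, np.vstack(blocks)

# Toy runs: assemble ID 1 with 2 spots, assemble ID 2 with 3 spots, 3 genes each
toy_runs = {1: np.ones((2, 3), dtype=int), 2: np.zeros((3, 3), dtype=int)}
merged_names, merged_counts = merge_runs(toy_runs)
```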

st_simulation's People

Contributors

vitkl
