Giter Site home page Giter Site logo

Comments (11)

martinjzhang avatar martinjzhang commented on June 8, 2024 1

Thank you @yyoshiaki for sharing the results w/ and w/o --adj-prop. These results are helpful for informing our future investigation!

Best,
Martin

from scdrs.

maryellenlynall avatar maryellenlynall commented on June 8, 2024 1

Helpful to see this exchange - thank you.
I too am having issues with not seeing an expected signal in 10X data so keen to try imputing - can I ask if either of you have a tool / settings you recommend for imputation given you got good improvement with imputation here. Thanks!

from scdrs.

martinjzhang avatar martinjzhang commented on June 8, 2024

Hi,

Thank you for the question. The imputation results look great!

Another possibility for the low power is that the background cells are also moderately relevant to the disease (PBMCs) and may kill the signal. We are working on a feature that could potentially address this issue (see #30 (comment)).

Have you tried adjusting for cell type proportions using scdrs compute-score --adj-prop? Monocyte count seems to have
a signal in the raw data results. So maybe the cell type proportions are not very balanced and the disease signals associated to the major cell types are killed because scDRS uses all cells from the data set as the background (which are mostly from the major cell types).

from scdrs.

yyoshiaki avatar yyoshiaki commented on June 8, 2024

Thank you very much for your quick response.
Regarding #30, I'm looking forward to seeing the new feature or tutorials.

I also tried --adj-prop, but the following error occurred on the pbmc3k dataset...
I'm attaching commands and codes for reproducing the issue.

scdrs compute-score --h5ad-file ../scanpy/pbmc/pbmc.h5ad --h5ad-species human --gs-file ../data/gs_file/magma_10kb_top1000_zscore.75_traits.batch/batch0.gs --gs-species human --cov-file ../scanpy/pbmc/pbmc.cov.tsv --flag-filter-data False --flag-raw-count True --flag-return-ctrl-raw-score False --flag-return-ctrl-norm-score True --adj-prop cell_type --out-folder ../scanpy/pbmc/scDRS_nofilter_adj-prop
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file ../scanpy/pbmc/pbmc.h5ad \
--h5ad-species human \
--cov-file ../scanpy/pbmc/pbmc.cov.tsv \
--gs-file ../data/gs_file/magma_10kb_top1000_zscore.75_traits.batch/batch0.gs \
--gs-species human \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop cell_type \
--flag-filter-data False \
--flag-raw-count True \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score True \
--out-folder ../scanpy/pbmc/scDRS_nofilter_adj-prop

Loading data:
--h5ad-file loaded: n_cell=2638, n_gene=32738 (sys_time=0.1s)
First 3 cells: ['AAACATACAACCAC-1', 'AAACATTGAGCTAC-1', 'AAACATTGATCAGC-1']
First 5 genes: ['MIR1302-10', 'FAM138A', 'OR4F5', 'RP11-34P13.7', 'RP11-34P13.8']
--cov-file loaded: covariates=['n_genes', 'const'] (sys_time=0.1s)
First 5 values for 'n_genes': [781, 1352, 1131, 960, 522]
First 5 values for 'const': [1, 1, 1, 1, 1]
--gs-file loaded: n_trait=5 (sys_time=0.1s)
Print info for first 3 traits:
First 3 elements for 'PASS_ADHD_Demontis2018': ['ST3GAL3', 'KDM4A', 'PTPRF'], [6.4588, 6.2164, 5.8681]
First 3 elements for 'PASS_Alzheimers_Jansen2019': ['BCAM', 'CEACAM16', 'PVR'], [8.2095, 7.7059, 7.3852]
First 3 elements for 'PASS_AtrialFibrillation_Nielsen2018': ['KCNN3', 'DCST1', 'CAV2'], [7.911, 7.8622, 7.8516]

Preprocessing:
/home/yyasumizu/Programs/scDRS/scdrs/pp.py:378: RuntimeWarning: invalid value encountered in log10
  x = np.log10(df_gene["ct_mean"].values[not_const])
Traceback (most recent call last):
  File "/home/yyasumizu/anaconda3/envs/scDRS/bin/scdrs", line 740, in <module>
    fire.Fire()
  File "/home/yyasumizu/anaconda3/envs/scDRS/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/yyasumizu/anaconda3/envs/scDRS/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/yyasumizu/anaconda3/envs/scDRS/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/yyasumizu/anaconda3/envs/scDRS/bin/scdrs", line 211, in compute_score
    scdrs.preprocess(
  File "/home/yyasumizu/Programs/scDRS/scdrs/pp.py", line 260, in preprocess
    df_gene, df_cell = compute_stats(
  File "/home/yyasumizu/Programs/scDRS/scdrs/pp.py", line 380, in compute_stats
    model.fit()
  File "_loess.pyx", line 899, in _loess.loess.fit
ValueError: b'Extrapolation not allowed with blending'

Installation of scDRS

conda create -n scDRS python=3.9
conda activate scDRS

git clone https://github.com/martinjzhang/scDRS.git
cd scDRS
git checkout -b v102 v1.0.2
pip install -e .

Preps of datasets

from scdrs.

martinjzhang avatar martinjzhang commented on June 8, 2024

Hi,

Could you check that adata.obs.columns contains the column cell_type (input to adj-prop) and that your data doesn't contain negative values? If so and the error persists, could you send us a minimal reproducible example?

from scdrs.

yyoshiaki avatar yyoshiaki commented on June 8, 2024

Thank you, it contains the cell_type column.

In [8]: adata.obs
Out[8]: 
                        cell_type  n_genes
cell_id                                   
AAACATACAACCAC-1      CD4 T cells      781
AAACATTGAGCTAC-1          B cells     1352
AAACATTGATCAGC-1      CD4 T cells     1131
AAACCGTGCTTCCG-1  CD14+ Monocytes      960
AAACCGTGTATGCG-1         NK cells      522

Here is the notebook for the preparation https://github.com/yyoshiaki/scDRS_example/blob/master/notebook/prep_data.ipynb and h5ad https://github.com/yyoshiaki/scDRS_example/blob/master/scanpy/pbmc/pbmc.h5ad file.

from scdrs.

martinjzhang avatar martinjzhang commented on June 8, 2024

Hi @yyoshiaki ,

Thank you for the information. The error corresponds to a numerical issue due to the inclusion of many genes with 0 expression (16159/32738 of genes in the data). This is a boundary case we are not handling perfectly right now, so scDRS may or may not give errors when you try different things. This probably explains why the error didn't occur before. I will add it to my to-do list.

I suggest turning on the --flag-filter-data True flag to avoid this boundary case. This flag filters out genes expressed in <50 cells and cells expressing <250 genes.

from scdrs.

martinjzhang avatar martinjzhang commented on June 8, 2024

Hi @yyoshiaki ,

Thank you again for reporting the issue. We just pushed a new version v1.0.3 to fix this issue. Now you can run scDRS without turning on --flag-filter-data.

To use this version, please pull info from the latest git repo and update using pip install -e .. We will upload this version to PyPI after we have made a good number of changes to the software.

from scdrs.

yyoshiaki avatar yyoshiaki commented on June 8, 2024

Hi Martin,

Thank you very much for the quick response and fixation.
I could confirm --adj-prop without --flag-filter-data used v1.0.3.
I've actually turned off the filter option because for genes only expressed in small clusters, 50 cells seems a bit stringent. It resulted in generating the boundary case.

Anyway, adj-prop slightly changed the result.
I'm not sure whether it is an improvement, but I will play around using several datasets.

Thank you again for making such a nice tool and maintaining it.

Yoshi


left : no --adj-prop, right : --adj-prop
image

cf. # cells for each cluster

CD4 T cells          1144
CD14+ Monocytes       480
B cells               342
CD8 T cells           316
NK cells              154
FCGR3A+ Monocytes     150
Dendritic cells        37
Megakaryocytes         15

from scdrs.

martinjzhang avatar martinjzhang commented on June 8, 2024

Hi @maryellenlynall

Try MAGIC van Dijk Cell 2018. Our preliminary results show that it improves power and provides conservative estimates.

from scdrs.

maryellenlynall avatar maryellenlynall commented on June 8, 2024

Super, thanks!

from scdrs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.