gqi / daesc


R 88.86% Shell 11.14%


daesc's Issues

Pre-processing the data from 10x Chromium

Hello,

I am struggling to generate the base quality score table for the BAM file produced by the Cell Ranger pipeline. Because Cell Ranger mapped the reads to the GRCh38 genome, the VCF file from the GATK resource bundle is not compatible. The VCF files from the 1000 Genomes Project are usable, but they are split by chromosome and each file takes up a fair amount of storage. Is there a way around this problem, or a way to merge the base quality score tables generated for each chromosome into one?

Identifying imprinted genes from reciprocal cross and scRNA-seq data

I am impressed by your tool and would like to leverage its capabilities in my research. Using a reciprocal mouse strain cross, I have generated stem cell lines from two distinct rat strains with characterized SNPs. I have performed scRNA-seq on these cell lines from both reciprocal crosses. I am specifically interested in identifying imprinted genes within this dataset using your DAESC algorithm. Can you please advise on the best approach to integrate my SNP and scRNA-seq data into your tool for this purpose?

Thank you

Implicit phasing

I'm using this tool and I noticed that it supports implicit phasing. Could you please clarify how implicit phasing differs from the standard (explicit) phasing? Additionally, is it possible for me to obtain the phased values using this tool, and if so, how?

Thank you!

Skewed p-value distribution

Hello DAESC team,

I have a question about interpreting results from the DAESC-MIX model. I tested ~10000 genes for allelic imbalance among three conditions and then plotted a histogram of the p-values.

The p-value distribution is very skewed, and almost all genes are identified as significant.
Can you share your insights on this?
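
A skew like this can be easier to judge from the binned p-value proportions and from multiple-testing-adjusted values than from the plot alone. The following is a minimal sketch using base R only; the p-values are simulated stand-ins (hypothetical, not DAESC output), with most genes drawn from a uniform null so the expected flat shape is visible.

```r
# Simulated p-values standing in for per-gene test results (hypothetical):
# 9000 null genes (uniform) plus 1000 genes with a true effect.
set.seed(7)
pvals <- c(runif(9000), rbeta(1000, 0.2, 5))

# Bin the raw p-values; under the null the bins should be roughly equal.
h <- hist(pvals, breaks = seq(0, 1, length.out = 21), plot = FALSE)
round(h$counts / length(pvals), 3)

# Fraction of genes below 0.05 before and after Benjamini-Hochberg adjustment.
padj <- p.adjust(pvals, method = "BH")
mean(pvals < 0.05)
mean(padj < 0.05)
```

If nearly every bin outside the first one is empty for real data, comparing against a permuted or shuffled condition label is one way to check whether the test itself is calibrated.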

The dataset used here is 10X Visium.

Thank you!

Specifying the design matrix for differential ASE

Hello DAESC dev team!

First of all, thank you for developing such a flexible framework to test for allelic imbalance.

I am trying to incorporate DAESC into my analysis pipeline, which tests for differential ASE across multiple individuals while accounting for disease status (condition), spatial location (cortical layers), and cell type.

To begin with, I applied DAESC on a toy dataset from one gene, two conditions, and two individuals per condition. Here, I fit the baseline model (DAESC-BB) specifying the design matrix x as a binary numeric array denoting the condition.

Now, I want to extend this by also taking into account the spatial information and the interaction terms. Since both condition and spatial location (cortical layer) are categorical variables, I tried using the model.matrix function to encode them as a one-hot matrix. Here is the structure of the data frame I am working with and how I am invoking the daesc_bb function.

str(cur.df)
'data.frame': 3046 obs. of 8 variables:
$ gene : chr "HES6" "HES6" "HES6" "HES6" ...
$ barcode : chr "CTCTCTAACTGCCTAG" "TCGGCGTACTGCACAA" "GGGCAGGATTTCTGTG" "CGGTTCCGGCTTCTTG" ...
$ allele1_count: num 1 1 1 1 1 1 0 3 0 1 ...
$ allele2_count: num 1 0 0 0 1 1 1 0 1 0 ...
$ total_count : num 2 1 1 1 2 2 1 3 1 1 ...
$ condition : Factor w/ 2 levels "HC","PD": 1 1 1 1 1 1 1 1 1 1 ...
$ sample_id : chr "BN0339" "BN0339" "BN0339" "BN0339" ...
$ layer : Factor w/ 7 levels "Layer 1","Layer 2",..: 7 4 7 4 5 3 6 6 5 6 ...

myformula = ~ cur.df$condition + cur.df$layer + cur.df$condition:cur.df$layer + 0
one_hot = model.matrix(myformula)

str(one_hot)
num [1:3046, 1:14] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:3046] "1" "2" "3" "4" ...
..$ : chr [1:14] "cur.df$conditionHC" "cur.df$conditionPD" "cur.df$layerLayer 2" "cur.df$layerLayer 3" ...
- attr(*, "assign")= int [1:14] 1 1 2 2 2 2 2 2 3 3 ...
- attr(*, "contrasts")=List of 2
..$ cur.df$condition: chr "contr.treatment"
..$ cur.df$layer : chr "contr.treatment"

res = daesc_bb(y=cur.df$allele1_count, n=cur.df$total_count, subj=cur.df$sample_id, x=one_hot)

When I run this as is, I get the following error:

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
cur.df.conditionPD
NA
Error in aod::betabin(cbind(y, n - y) ~ ., random = ~1, data = data.frame(y, :
Initial values for the fixed effects contain at least one missing value.

I think the cause is the design matrix, where the first two columns carry essentially the same information. After dropping the first column of one_hot, it runs without error, but I want to double-check whether this is the intended use of the x argument when supplying multiple categorical variables.
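
The rank deficiency described above can be reproduced on simulated data. This is a minimal sketch, assuming (as the `cbind(y, n - y) ~ .` call in the error message suggests) that an intercept is added to whatever is passed as x. The data frame `toy` is hypothetical, and whether dropping a column is the usage the DAESC authors intend is not confirmed here.

```r
# Hypothetical toy data: 2 conditions x 7 layers, every combination observed.
toy <- expand.grid(
  condition = factor(c("HC", "PD")),
  layer     = factor(paste("Layer", 1:7)),
  rep       = 1:5
)

# Design built as in the post: "+ 0" removes the intercept, so BOTH
# condition levels get their own indicator column (14 columns in total).
x_no_int <- model.matrix(~ condition + layer + condition:layer + 0, data = toy)

# If the fitting function adds its own intercept (assumption), the intercept
# equals conditionHC + conditionPD, so the combined matrix is rank deficient.
with_int <- cbind(Intercept = 1, x_no_int)
ncol(with_int)       # number of columns
qr(with_int)$rank    # one less than the number of columns

# Alternative: keep the intercept while building the matrix (treatment
# coding), then drop the intercept column before passing the matrix on.
x_treat <- model.matrix(~ condition * layer, data = toy)[, -1]
ncol(x_treat)
qr(cbind(Intercept = 1, x_treat))$rank   # equals ncol(x_treat) + 1
```

Building the matrix with the intercept and removing it afterwards yields the same columns as dropping the first column of the "+ 0" version, while making the reference level (here HC, Layer 1) explicit.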

Finally, I also want to add cell type information as another independent variable. In my case, cell type is not a categorical variable, since the data were generated using Visium, so for each cell type the input will be its estimated abundance. If I were to include all of (1) condition, (2) layer, and (3) cell type in x, how should I structure it?
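
One way the three kinds of covariates could be combined into a single numeric matrix is sketched below. The column names `abund_A`, `abund_B`, `abund_C` and all values are hypothetical and simulated. The sketch also assumes the abundances are proportions that sum to 1 per spot, in which case one of them has to be left out to avoid collinearity with an intercept. It only illustrates the matrix construction and is not a confirmed DAESC usage.

```r
# Hypothetical spot-level data: categorical condition and layer plus
# estimated cell-type proportions (simulated so that each row sums to 1).
set.seed(42)
n_spots <- 140
spots <- data.frame(
  condition = factor(rep(c("HC", "PD"), each = n_spots / 2)),
  layer     = factor(rep(paste("Layer", 1:7), times = n_spots / 7))
)
raw   <- matrix(rexp(n_spots * 3), ncol = 3)
props <- raw / rowSums(raw)
colnames(props) <- c("abund_A", "abund_B", "abund_C")
spots <- cbind(spots, props)

# Categorical terms are dummy coded; continuous abundances enter as plain
# numeric columns. abund_C is omitted because the proportions sum to 1.
x <- model.matrix(~ condition * layer + abund_A + abund_B, data = spots)[, -1]

dim(x)
# Full column rank check once an intercept is added back (assumption as above).
qr(cbind(Intercept = 1, x))$rank == ncol(x) + 1
```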

Thank you!
