kuanhao-chao / rnaseqrdata

The experiment package for RNASeqWorkflow

rnaseqrdata's Introduction

👋 Hi, I'm Kuan-Hao Chao

🎓 I'm currently a third-year Ph.D. candidate in Computer Science at the Center for Computational Biology, Johns Hopkins University, working with Steven Salzberg and Mihaela Pertea. My academic journey started in Electrical Engineering at National Taiwan University (NTU) and shifted toward computer science in my final year at the College of Engineering & Computer Science at the Australian National University (ANU) 🦘🐨

🧬 My research lies at the intersection of deep learning, genomics, and transcriptomics:

In transcriptional regulatory networks, I use sequence models to decode DNA patterns and to understand how cis-regulatory DNA sequences and trans-regulators interact. I am developing a yeast large language model (LLM) to better understand how yeast gene expression is regulated.
In splice site prediction, I built a deep dilated residual convolutional neural network to model RNA splicing, alternative splicing, and the impact of genetic variants on cryptic splicing (Learn more).
In genome assembly, I assembled and annotated the first gapless Southern Chinese Han genome, Han1, using PacBio HiFi and Oxford Nanopore long reads, with T2T-CHM13 as a guide (Learn more).
In pangenome indexing, I applied new renaming heuristics and an SMT solver to make the Wheeler graph recognition problem computationally feasible (Learn more).
In genome annotation, I used graph-based methods to stitch fragmented DNA and protein alignments together into more accurate annotations (Learn more).

💻 I am an advocate for open-source software, embracing the philosophy of “build what you need, use what you build”. I invite you to explore my NEWS page for the latest updates on my projects.

💬 Feel free to reach out to me for collaborations, discussions, or just to say hi! Coffee chat! ☕️

🔍 Discover more about my work on my personal website.

rnaseqrdata's People

Contributors

Stargazers

Forkers

rnaimehaom

rnaseqrdata's Issues

Bug 4 vs 3 - RNASeqDifferentialAnalysis(exp)

I get this error when I compare 4 A samples against 3 B samples with the function RNASeqDifferentialAnalysis(exp):

"Error : Aesthetics must be either length 1 or the same as the data (7): fill"

With 3 A versus 3 B samples, I don't get this error.
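ggplot2 raises this error when a fill vector has neither length 1 nor one value per row of the plotted data (7 rows here, one per sample). One plausible cause, not confirmed from the package source, is a per-sample colour vector built as if both groups had the same number of samples, which would happen to work for 3 vs 3. The sketch below contrasts that assumption with colours looked up from each sample's own condition; the variable names and colour values are hypothetical.

```r
# Hypothetical 4-vs-3 design from the report: one condition per sample.
condition <- c("A", "A", "A", "A", "B", "B", "B")
group.colours <- c(A = "steelblue", B = "tomato")  # hypothetical colours

# Assumption: a fill vector built as if every group had 3 samples.
fill.equal.groups <- rep(group.colours, each = 3)
length(fill.equal.groups)  # 6, one short of the 7 samples

# Looking up each sample's colour from its own condition always yields
# exactly one value per sample, whatever the group sizes are.
fill.by.sample <- unname(group.colours[condition])
length(fill.by.sample)     # 7
```

Within ggplot2 itself, mapping fill = condition inside aes() and choosing the colours with scale_fill_manual() avoids building this vector by hand.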

Bug : sample name in RNASeqDifferentialAnalysis

Hello,

The names of the samples in the result files from the 3 tools: "ballgown", "deseq2" and "edgeR" ("DESeq2_normalized_result.csv", ...) may not match the real name of the sample.

For example, if we have samples "B1", "B2" and "B3", the result file may show "B1.B", "B2.B" and "B3.B", but the column labelled "B1.B" actually corresponds to sample B3.

I think this is due to the row order in "phenodata.csv". For example, if "phenodata.csv" lists:
A1 - A
A2 - A
B3 - B
A3 - A
B1 - B
B2 - B
then sample B3 becomes sample B1, B1 becomes sample B2, and so on.

To be sure of each sample's identity, the library should rely on the names of the input files (the .bam files, for example) instead of on the row order in "phenodata.csv".

Sincerely,
Jérémy Tournayre
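The fix suggested in the report is to attach labels by sample name rather than by position. The package's internals are not shown here, so this is only a sketch under assumptions: a hypothetical phenodata table with ids and condition columns in the row order from the report, and a hypothetical count matrix whose columns follow the BAM file names in alphabetical order. match() reorders the phenodata rows to follow the count-matrix columns before labels such as "B1.B" are built.

```r
# Hypothetical phenodata in the row order given in the report.
phenodata <- data.frame(
  ids = c("A1", "A2", "B3", "A3", "B1", "B2"),
  condition = c("A", "A", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical count matrix: columns named after the BAM files,
# in alphabetical order.
bam.files <- c("A1.bam", "A2.bam", "A3.bam", "B1.bam", "B2.bam", "B3.bam")
counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("gene1", "gene2"),
                                 sub("\\.bam$", "", bam.files)))

# Positional labelling: phenodata row i is assumed to describe column i,
# so the third column (sample A3) is labelled "B3.B".
labels.by.position <- paste(phenodata$ids, phenodata$condition, sep = ".")

# Name-based labelling: look each column name up in phenodata$ids.
idx <- match(colnames(counts), phenodata$ids)
stopifnot(!anyNA(idx))  # every sample must appear in phenodata
aligned <- phenodata[idx, ]
labels.by.name <- paste(aligned$ids, aligned$condition, sep = ".")
colnames(counts) <- labels.by.name
```

The stopifnot() guard makes a sample that is missing from phenodata fail loudly instead of silently receiving the wrong label.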

bug : check files - identical : sort

I had to wrap both sides of these comparisons in sort() to avoid a bug:

gtf.check <- identical(target.gtf.files, actual.found.gtf.files)
should be
gtf.check <- identical(sort(target.gtf.files), sort(actual.found.gtf.files))

bam.check <- identical(target.bam.files, actual.found.bam.files)
should be
bam.check <- identical(sort(target.bam.files), sort(actual.found.bam.files))

sam.check <- identical(target.sam.files, actual.found.sam.files)
should be
sam.check <- identical(sort(target.sam.files), sort(actual.found.sam.files))
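identical() compares two vectors element by element, so two lists holding the same file names in a different order compare as unequal. The sketch below assumes, hypothetically, that the expected names follow the row order of phenodata.csv while the found names are in alphabetical order, as list.files() returns them; the values are illustrative and not taken from the package.

```r
# Illustrative values only. The expected names follow a hypothetical
# phenodata.csv row order; the found names are in alphabetical order.
target.bam.files <- c("A1.bam", "A2.bam", "B3.bam", "A3.bam", "B1.bam", "B2.bam")
actual.found.bam.files <- c("A1.bam", "A2.bam", "A3.bam", "B1.bam", "B2.bam", "B3.bam")

# Element-by-element comparison: fails only because of the order.
identical(target.bam.files, actual.found.bam.files)              # FALSE

# Sorting both sides compares the contents while still requiring the
# same length and the same number of copies of each name.
identical(sort(target.bam.files), sort(actual.found.bam.files))  # TRUE
```

setequal() would also ignore the order, but it ignores duplicated names as well, so the sorted comparison is the stricter check.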
