
rnaseqrdata's Introduction

👋 Hi, I'm Kuan-Hao Chao

🎓 I'm currently a third-year Ph.D. candidate in Computer Science at the Center for Computational Biology, Johns Hopkins University, working with Steven Salzberg and Mihaela Pertea. My academic journey started in Electrical Engineering at National Taiwan University (NTU) and shifted towards computer science in my final year at the College of Engineering & Computer Science at Australian National University (ANU) 🦘🐨


🧬 My research interest intersects deep learning with genomics and transcriptomics:

  • In transcriptional regulatory networks, my work uses sequence models to decode DNA patterns, aiming to uncover how cis-regulatory DNA sequences and trans-regulators interact. I am developing a yeast large language model (LLM) to better understand the mechanisms of yeast gene expression regulation.
  • In splice site prediction, I built a deep dilated residual convolutional neural network to decode the complexities of RNA splicing, alternative splicing, and the impact of genetic variants on cryptic splicing.
  • In genome assembly, I assembled and annotated the first gapless Southern Chinese Han genome, Han1, using PacBio HiFi and Oxford Nanopore long reads, with T2T-CHM13 as a guide.
  • In pangenome indexing, I applied new renaming heuristics and an SMT solver to make the Wheeler graph recognition problem computationally feasible.
  • In genome annotation, I used graph-based methods to stitch together fragmented DNA and protein alignments, assembling them into more accurate annotations.

💻 I am an advocate for open-source software, embracing the philosophy of “build what you need, use what you build”. I invite you to explore my NEWS page for the latest updates on my projects.

💬 Feel free to reach out to me for collaborations, discussions, or just to say hi! Coffee chat! ☕️

🔍 Discover more about my work on my personal website.

rnaseqrdata's People

Contributors

kuanhao-chao

Forkers

rnaimehaom

rnaseqrdata's Issues

Bug 4 vs 3 - RNASeqDifferentialAnalysis(exp)

I get this error when I compare 4 samples of group A against 3 samples of group B with the function RNASeqDifferentialAnalysis(exp):

"Error : Aesthetics must be either length 1 or the same as the data (7): fill"

With 3 samples of A against 3 samples of B, I don't get this error.
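
The report does not identify the cause. As a general pattern, ggplot2 raises this message when a vector mapped to `fill` has neither length 1 nor one entry per data row (here 7 = 4 + 3). The base R sketch below shows how a colour vector built under an equal-group-size assumption ends up with the wrong length, and how deriving it from a per-sample group column avoids that. The `pheno` data frame, its column names, the `replicates` value, and the colours are all hypothetical; this is not taken from the package's code.

```r
# Hypothetical phenodata: 4 samples in group A, 3 in group B.
pheno <- data.frame(
  ids   = c("A1", "A2", "A3", "A4", "B1", "B2", "B3"),
  group = c("A", "A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Fragile (assumed pattern): one replicate count is reused for every group.
replicates <- 3
fill.fragile <- rep(c("A", "B"), each = replicates)
length(fill.fragile) == nrow(pheno)  # FALSE: 6 entries for 7 samples

# Robust: look up one colour per sample from the group column itself.
palette <- c(A = "#F8766D", B = "#00BFC4")
fill.robust <- unname(palette[pheno$group])
length(fill.robust) == nrow(pheno)   # TRUE: always one entry per sample
```

A vector like `fill.robust` stays valid for any group sizes because its length is tied to the number of rows rather than to a replicate count.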

Bug : sample name in RNASeqDifferentialAnalysis

Hello,

The sample names in the result files from the 3 tools "ballgown", "deseq2" and "edgeR" ("DESeq2_normalized_result.csv", ...) may not match the real sample names.

For example, with samples "B1", "B2" and "B3", the result file may contain "B1.B", "B2.B", "B3.B", but "B1.B" actually corresponds to sample B3.

I think this is due to the row order in the file "phenodata.csv". For example, if I write in "phenodata.csv":

A1 - A
A2 - A
B3 - B
A3 - A
B1 - B
B2 - B

then sample B3 becomes sample B1, B1 becomes sample B2, and so on.

The library should rely on the names of the .bam files, for example, to be sure of each sample's identity instead of relying on the row order of "phenodata.csv".

Sincerely,
Jérémy Tournayre
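
The suggestion above can be sketched in base R: derive sample names from the file names, then align the phenodata rows to them with `match()` so that labels are assigned by name and never by position. The column names `ids` and `group`, the file names, and the toy matrix are hypothetical and only illustrate the idea; this is not the package's implementation.

```r
# Hypothetical phenodata, in the row order from the report.
pheno <- data.frame(
  ids   = c("A1", "A2", "B3", "A3", "B1", "B2"),
  group = c("A", "A", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical sample names recovered from file names such as "A1.bam".
bam.files <- c("A1.bam", "A2.bam", "A3.bam", "B1.bam", "B2.bam", "B3.bam")
sample.names <- sub("\\.bam$", "", basename(bam.files))

# Toy expression matrix whose columns follow the file order.
counts <- matrix(seq_len(12), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), sample.names))

# Align phenodata rows to the matrix columns by name.
idx <- match(colnames(counts), pheno$ids)
stopifnot(!anyNA(idx))  # every column must have a phenodata row
pheno.aligned <- pheno[idx, ]

# Build the "<sample>.<group>" labels from the aligned rows.
colnames(counts) <- paste(pheno.aligned$ids, pheno.aligned$group, sep = ".")
colnames(counts)
```

With this alignment, the column labelled "B3.B" still holds the data that came from "B3.bam", whatever order the rows of "phenodata.csv" are in.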

bug : check files - identical : sort

I had to add sort() calls to these lines to avoid a bug:

gtf.check <- identical(target.gtf.files, actual.found.gtf.files)
-> gtf.check <- identical(sort(target.gtf.files), sort(actual.found.gtf.files))

bam.check <- identical(target.bam.files, actual.found.bam.files)
-> bam.check <- identical(sort(target.bam.files), sort(actual.found.bam.files))

sam.check <- identical(target.sam.files, actual.found.sam.files)
-> sam.check <- identical(sort(target.sam.files), sort(actual.found.sam.files))
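
A small base R illustration of why the sort matters: `identical()` compares character vectors element by element, so the same file names in a different order compare as different. The variable names come from the lines above; the file names are made-up values.

```r
# Same three files, listed in a different order (hypothetical values).
target.bam.files <- c("A1.bam", "A2.bam", "B1.bam")
actual.found.bam.files <- c("B1.bam", "A1.bam", "A2.bam")

# Order-sensitive comparison fails even though the files are the same.
check.unsorted <- identical(target.bam.files, actual.found.bam.files)

# Sorting both sides makes the comparison order-independent.
check.sorted <- identical(sort(target.bam.files), sort(actual.found.bam.files))

# setequal() also ignores order, but it ignores duplicates too.
check.set <- setequal(target.bam.files, actual.found.bam.files)

c(unsorted = check.unsorted, sorted = check.sorted, set = check.set)
```

Sorting both vectors keeps the check strict about duplicated entries, which `setequal()` would silently accept.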
