Comments (13)

bobGSmith commented on June 1, 2024

@bschilder Ok nice. Let me know if there's anything you think I should change. I made the parallel EWCE function save each completed analysis as a separate rds file, as that makes it easier to run the analysis in chunks and come back to it later. I made another function that reads through the results output directory to check which gene lists have already been analysed. And finally there's a function for merging all of the small rds files into one data frame with an extra column for the gene list names (e.g. the phenotype). I should probably add an option to do BH corrections in the merge results function.
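
For illustration only, here is a minimal sketch of that workflow: one result file per gene list, a check for what is already done, and a merge step with an optional BH correction. The real functions are written in R and save rds files; this sketch uses Python with JSON files as a stand-in. All names here (`run_pending`, `analysed_lists`, `merge_results`, `bh_adjust`, the `analyse` callback, the `"p"` and `"list"` keys) are hypothetical, and it assumes gene list names are safe to use as file names.

```python
import json
from pathlib import Path


def analysed_lists(results_dir):
    """Names of gene lists that already have a saved result file."""
    return {p.stem for p in Path(results_dir).glob("*.json")}


def run_pending(gene_lists, results_dir, analyse):
    """Run `analyse` on every gene list without a saved result.

    `analyse` is a placeholder for the real analysis and is assumed to
    return a list of dicts, each with a "p" key. Returns the names of
    the gene lists that failed, so they can be logged and retried.
    """
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = analysed_lists(out)
    failed = []
    for name, genes in gene_lists.items():
        if name in done:
            continue  # finished in an earlier run
        try:
            rows = analyse(genes)
        except Exception:
            failed.append(name)  # record the failure and carry on
            continue
        (out / f"{name}.json").write_text(json.dumps(rows))
    return failed


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def merge_results(results_dir, correct=True):
    """Merge all result files into one list of rows with a "list" column."""
    merged = []
    for path in sorted(Path(results_dir).glob("*.json")):
        for row in json.loads(path.read_text()):
            merged.append({**row, "list": path.stem})
    if correct and merged:
        # Assumption: the correction is applied across all merged rows.
        for row, q in zip(merged, bh_adjust([r["p"] for r in merged])):
            row["q"] = q
    return merged
```

Because every gene list gets its own file, rerunning `run_pending` after an interruption only processes the lists that are still missing.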

from rare_disease_celltyping.

bobGSmith commented on June 1, 2024

Made a repo for all the functions as a package here: github.com/ovrhuman/RareDiseaseEWCE @NathanSkene

bobGSmith commented on June 1, 2024

Changed the Rmd file for my report to call functions from the RareDiseaseEWCE package. I just need to finish off the parallel EWCE results-generating function and add that to the package. I'm running it on the unfinished phenotypes now, but I need to tweak some things for logging failed phenotypes and merging all results into one data frame at the end.

NathanSkene commented on June 1, 2024

Just having a look through the functions here... many of them seem related to processing the HPO ontology. Might be worth pulling them out into a separate repo. What percentage would you say are HPO specific? Could have an "HPO_explorer" package?

If you were to group the functions into "HPO ontology related functions" and "other", what would "other" be?

Ideally you want the functions within a package to be a somewhat coherent set which can have a vignette describing their general function.

bobGSmith commented on June 1, 2024

The results-generating stuff (not included yet) is for running EWCE in parallel and merging results into one data frame. Then, as you say, there are functions for doing stuff with the HPO, like finding ontology levels or identifying connected components within a subset. Most of those are then used in another bunch of functions which are for plotting the HPO EWCE results. I could split it up into generating EWCE results from multiple gene lists, generic HPO_explorer stuff, and specific HPO EWCE results visualisation and analysis?

It might be possible to make some of the stuff for plotting HPO EWCE results more generic, for cases when you're working with multiple gene lists as well. I would have to take out the bits where it does something HPO specific and make those into their own functions for the other package, I suppose?
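
As a rough illustration of the two HPO helpers mentioned above (ontology levels and connected components within a subset), here is a minimal sketch on a toy ontology. It is not the package's code, which is written in R. The term IDs, the `PARENTS` mapping, and both function names are hypothetical. "Level" is taken here to mean the shortest distance from a term up to a root, which is only one possible definition.

```python
from collections import deque

# Toy ontology: each term maps to its parent terms (hypothetical IDs).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}


def level(term, parents):
    """Shortest number of steps from `term` up to a term with no parents."""
    if not parents[term]:
        return 0
    return 1 + min(level(p, parents) for p in parents[term])


def components(subset, parents):
    """Connected components of the subset, using only parent-child edges
    whose two ends are both inside the subset (treated as undirected)."""
    subset = set(subset)
    neighbours = {t: set() for t in subset}
    for t in subset:
        for p in parents[t]:
            if p in subset:
                neighbours[t].add(p)
                neighbours[p].add(t)
    seen, result = set(), []
    for start in sorted(subset):
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            t = queue.popleft()
            group.add(t)
            for n in neighbours[t] - seen:
                seen.add(n)
                queue.append(n)
        result.append(group)
    return result
```

In this toy example, the subset `{"A", "A1", "A2", "B1"}` splits into two components, because `B1` only connects to the rest through terms that are outside the subset.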

NathanSkene commented on June 1, 2024

I'd say the stuff for parallelising EWCE to run on multiple gene lists should be incorporated into the EWCE package. Or, given that there are functions for plotting etc. as well, you could set up a new MultiEWCE package?

I'd defo lean towards separating the EWCE related stuff from the HPO specific stuff though.

bobGSmith commented on June 1, 2024

Ok yeah, makes sense. I will try to split it into two separate packages for now then.

NathanSkene commented on June 1, 2024

For the HPO-focused package, @bschilder figured out a neat trick for storing data files as release files within a GitHub repo. Probably worth using that to store the HPO ontology data. Would save worrying about the upstream source changing its URL or file structure.

bschilder commented on June 1, 2024

I used a package called piggyback which lets you store large files as "Assets". Here's some example code of how I implemented this:
https://github.com/neurogenomics/phenomix/blob/main/R/get_data.R

For the HPO itself, there are several ways you could do this. You could either use the ontologyIndex package, which includes the HPO as a dataset, or you could download it in tabular format like I did here. In this case, I converted it to a binary matrix and then put it, with all its metadata, in a Seurat object:

https://github.com/neurogenomics/phenomix/blob/main/R/prepare_HPO.R

I've uploaded the processed Seurat object here:
https://github.com/neurogenomics/phenomix/releases/tag/latest
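
To illustrate the "tabular format to binary matrix" step in general terms, here is a minimal sketch. It is not the code in prepare_HPO.R, which is written in R. It assumes the table boils down to (phenotype, gene) pairs, and the function name and IDs are hypothetical placeholders.

```python
def binary_matrix(pairs):
    """Turn (phenotype, gene) pairs into a 0/1 matrix.

    Returns the row labels (phenotypes), the column labels (genes), and
    a list of rows where 1 marks an annotated phenotype-gene pair.
    """
    phenotypes = sorted({p for p, _ in pairs})
    genes = sorted({g for _, g in pairs})
    row = {p: i for i, p in enumerate(phenotypes)}
    col = {g: j for j, g in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in phenotypes]
    for p, g in pairs:
        matrix[row[p]][col[g]] = 1
    return phenotypes, genes, matrix


# Hypothetical placeholder annotations, not real HPO data.
pairs = [("PHENO_1", "GENE_A"), ("PHENO_1", "GENE_B"), ("PHENO_2", "GENE_B")]
phenotypes, genes, matrix = binary_matrix(pairs)
```

For a full-size ontology a sparse matrix would be a better fit than nested lists. The dense form is used here only to keep the sketch short.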

bobGSmith commented on June 1, 2024

Thanks @bschilder, at the moment I've just got ontologyIndex as a dependency. Do you think it would be better to store it like that?

I've also finished splitting the packages up into HPOExplorer (general HPO functions), MultiEWCE (stuff related to EWCE on multiple gene lists), and HPOEWCE, which is really just plotting functions fairly specific to this project.

bschilder commented on June 1, 2024

@ovrhuman I think keeping ontologyIndex as a dep is fine bc then you can benefit from any updates that might occur later.

bschilder commented on June 1, 2024

Regarding MultiEWCE, I'll take a look and see if there are some functions that would be useful to add to EWCE 2.0, which I'm working on.

bschilder commented on June 1, 2024

Packages now passing CRAN and Bioc checks here:
https://github.com/neurogenomics/HPOExplorer
https://github.com/neurogenomics/MultiEWCE
