
Explore mode for recipes (ml4h, closed)

erikr commented on July 22, 2024
Explore mode for recipes

from ml4h.

Comments (6)

erikr commented on July 22, 2024

@paolodi You made this work with interpretations and tested it on UKBB tensors. What else must be done before the PR?

Also we should merge my Partners branch with master soon so I can revise my tmaps and explore to use interpretations. Can we discuss next week with the team?


erikr commented on July 22, 2024

I need to:

  • Resolve difference between intersect and union counts for test cohort.
  • Add missing_fraction to summary_stats_string
  • Add missing_fraction to summary_stats_continuous
  • Add variance to summary_stats_continuous
  • Parallelize


erikr commented on July 22, 2024

@paolodi note the latest version of explore is in recipes.py in my branch er_tensorize_partners_ecgs rather than er_explore_tensormaps. I want to delete the latter branch as I have not used it for anything. What do you think?


paolodi commented on July 22, 2024

I'd leave it there, as that's the branch the PR refers to. Deleting it would close the PR again. I'll take care of merging it with er_tensorize_partners_ecgs, and once it works again with master we can ask for an additional pair of eyes.


erikr commented on July 22, 2024

Thanks!


erikr commented on July 22, 2024

@paolodi I finished more modifications (378a4ab), so please check out recipes.py from the latest er_tensorize_partners_ecgs branch.

  • Resolve difference between intersect and union counts for test cohort.

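Differences are expected behavior: the intersection (∩) across tmaps can never contain more samples than the union, and it is smaller whenever any sample is missing a tensor for at least one tmap. Duh.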
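To make the intersect versus union point concrete, here is a minimal sketch using plain Python sets. The tmap names, sample IDs, and the `cohort_counts` helper are all hypothetical and only illustrate the counting; this is not how explore computes it.

```python
# Hypothetical data: the sample IDs that have a tensor for each tmap.
ids_by_tmap = {
    "tmap_a": {"s1", "s2", "s3", "s4"},
    "tmap_b": {"s2", "s3", "s4", "s5"},
    "tmap_c": {"s3", "s4", "s6"},
}


def cohort_counts(ids_by_tmap):
    """Return (intersect_count, union_count) over all tmaps.

    Assumes at least one tmap is given.
    """
    id_sets = list(ids_by_tmap.values())
    intersect = set.intersection(*id_sets)  # samples present for every tmap
    union = set.union(*id_sets)             # samples present for any tmap
    return len(intersect), len(union)


intersect_count, union_count = cohort_counts(ids_by_tmap)
print(intersect_count, union_count)
```

Only s3 and s4 appear under every tmap, so the intersect count is 2 while the union count is 6.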

  • Add missing_fraction to summary_stats_string
  • Add missing_fraction to summary_stats_continuous
  • Add variance to summary_stats_continuous
  • Parallelize

Iterating through many hd5s is I/O bound; parallelizing slowed down the overall run time.

I also now calculate summary stats using pandas functions whenever possible, since they are fast and work on a pd.Series of np.arrays. I had trouble using NumPy functions with our nested data structure.
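As a rough illustration of the pandas approach, the sketch below computes missing_fraction and variance with built-in Series methods. The function names are borrowed from the checklist above, but the bodies, signatures, and column names are my assumptions, not the code in recipes.py, and the columns hold scalars rather than nested np.arrays to keep it short.

```python
import numpy as np
import pandas as pd


def summary_stats_continuous(values: pd.Series) -> dict:
    """Summarize a continuous column; NaN marks a missing tensor."""
    return {
        "count": int(values.count()),                     # non-missing entries
        "missing_fraction": float(values.isna().mean()),  # share of NaN entries
        "mean": float(values.mean()),                     # NaN is skipped by default
        "variance": float(values.var()),                  # sample variance (ddof=1)
    }


def summary_stats_string(values: pd.Series) -> dict:
    """Summarize a string column; None/NaN marks a missing tensor."""
    return {
        "count": int(values.count()),
        "missing_fraction": float(values.isna().mean()),
        "n_unique": int(values.nunique()),                # distinct non-missing values
        "mode": values.mode().iloc[0] if values.count() else None,
    }


# Hypothetical columns, one row per sample.
df = pd.DataFrame({
    "qt_interval": [400.0, 420.0, np.nan, 440.0],
    "sex": ["F", "M", "F", None],
})
continuous = summary_stats_continuous(df["qt_interval"])
string = summary_stats_string(df["sex"])
print(continuous)
print(string)
```

Both columns have one missing entry out of four, so missing_fraction is 0.25 in each case.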

The most important change is that tensor_to_df is no longer called once per interpretation; it is now called a single time.

tensor_to_df iterates through every tensor and consolidates them into a big dataframe.

Now, for each interpretation, we only iterate through the keys (of the df) that belong to that interpretation.

Calling it once instead of three times cuts the run time to about 33% of what it was!
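The build-once pattern can be sketched roughly as follows. Everything here is hypothetical: `build_table` is a stand-in for tensor_to_df, plain dictionaries stand in for the dataframe, and the interpretation and key names are made up for illustration.

```python
# Hypothetical mapping from interpretation to the keys that belong to it.
KEYS_BY_INTERPRETATION = {
    "continuous": ["qt_interval", "heart_rate"],
    "categorical": ["rhythm"],
    "language": ["read_text"],
}

build_calls = 0  # counts how often the expensive pass runs


def build_table(tensors):
    """Stand-in for tensor_to_df: one pass over every tensor."""
    global build_calls
    build_calls += 1
    table = {}
    for sample_id, tensor in tensors.items():
        for key, value in tensor.items():
            table.setdefault(key, {})[sample_id] = value
    return table


def explore(tensors):
    """Build the table once, then visit only each interpretation's keys."""
    table = build_table(tensors)  # single expensive pass
    report = {}
    for interpretation, keys in KEYS_BY_INTERPRETATION.items():
        # Count the samples that have a value for each key.
        report[interpretation] = {key: len(table.get(key, {})) for key in keys}
    return report


tensors = {
    "s1": {"qt_interval": 400.0, "heart_rate": 60.0,
           "rhythm": "sinus", "read_text": "normal ecg"},
    "s2": {"qt_interval": 420.0, "rhythm": "afib"},
}
report = explore(tensors)
print(report, build_calls)
```

The expensive pass runs once no matter how many interpretations are listed, which is where the saving comes from when that pass dominates the run time.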

Other changes for future PRs are:

  • Improving performance with JIT compilation via Numba
  • Modularizing most of explore into functions, and housing those functions inside of explore.py
  • Plotting histograms for some of the summary stats (especially continuous)
