
joshspeagle / frankenz


A photometric redshift monstrosity

License: MIT License

Python 100.00%
pure-python bayesian-inference mit-license photo-z nearest-neighbors self-organizing-map growing-neural-gas template-fitting

frankenz's People

Contributors

joshspeagle


frankenz's Issues

workflow reorganization

This is part of a larger initiative to reorganize the workflow so that most of the computation is internal. This way a user should just be able to do something like:

import object
obj = object(data, params)
obj.fit(new_data)
obj.compute_pdf(params)
zpdfs = obj.zpdfs

This should enable a very clean workflow for general users, while advanced users can still access internal quantities for more advanced analyses.

grid-based object

In line with the SOM and KMCkNN objects (and their utilities), I should also establish an object to deal with a grid-based approach.

som utility?

My pre-existing SOM code should probably be incorporated into this package as a helpful way to visualize and analyze data. There should be associated plotting utilities, as well as an easy way to turn those results into photo-z predictions for users who are interested in a comparison.

better nn searches

Internal tests have found that the biases involved with using a simple k nearest-neighbor search (rather than a radius nearest-neighbor search) can be more severe than I had thought for objects with noisy photometry (i.e. when objects have a lot of potential neighbors). Tests that switch over to a ball tree and use the Mahalanobis distance when searching appear to perform much better, so that should be the method of choice.

Also, I should increase the default K to ~100, since 25 appears to be too small when comparing results for individual objects.

This will also require moving over to sklearn for the nearest neighbor searches.
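
The difference between the two search types can be illustrated with a brute-force NumPy sketch. This is not the frankenz implementation: the function names, the mock fluxes, and the covariance are all hypothetical, and the brute-force distance computation stands in for a tree. scikit-learn's `BallTree` supports a Mahalanobis metric and could replace it.

```python
import numpy as np

def mahalanobis_sq(x, X, cov):
    """Squared Mahalanobis distances from one point x to every row of X."""
    diff = X - x
    # Solve cov @ y = diff.T rather than forming the explicit inverse.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))

def knn_search(x, X, cov, k=100):
    """Indices of the k closest rows of X (always returns exactly k)."""
    d2 = mahalanobis_sq(x, X, cov)
    return np.argsort(d2)[:k]

def radius_search(x, X, cov, r=2.0):
    """Indices of every row of X within Mahalanobis radius r of x."""
    d2 = mahalanobis_sq(x, X, cov)
    return np.flatnonzero(d2 <= r**2)

# Mock data (assumption): 3-band training fluxes and one target object.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 3))
cov = np.diag([0.25, 1.0, 4.0])   # assumed flux covariance of the target
x_obj = np.zeros(3)

idx_k = knn_search(x_obj, X_train, cov, k=100)
idx_r = radius_search(x_obj, X_train, cov, r=2.0)
```

The k-NN search always returns the same number of neighbors regardless of how noisy the object is, while the radius search returns however many training points fall inside the error ellipsoid.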

hierarchical bayes

Either spin this off as a standalone package or integrate it into frankenz.

stretch goal: selection effects (general overlap integral support)

One of the results I really liked deriving was the nice way that hard selection boundaries modify the naive Gaussian PDF. I'd like to add in some way to account for this effect in the code.

The process of doing so will probably entail adding some type of functionality to deal with overlap integrals for arbitrary PDFs. This should be doable (in theory) using path sampling if we phrase the problem as starting from
q_0 = N(F_g|F,C_g) N(F_h|F,C_h)
with
z_0 = \int q_0 dF = N(F_g|F_h,C_g+C_h)
and evolving to
q_1 = P(F_g|F,C_g) P(F_h|F,C_h) P(F)
with
z_1 = \int q_1 dF
since we know z_0 and path sampling gives us a way to estimate z_1/z_0. The big challenge is getting an MCMC sampler to draw independent samples along the path, although that could be a fun problem to solve.
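
As a toy illustration of the path-sampling idea, the sketch below estimates log(z_1/z_0) for two one-dimensional Gaussians where the answer is known analytically. Everything here is an assumption made for illustration: the densities are not the flux likelihoods above, the geometric path q_t = q_0^(1-t) q_1^t is just one possible choice, and exact Gaussian draws replace the MCMC sampler that the real problem would need.

```python
import numpy as np

def log_ratio_path_sampling(s=2.0, n_temps=21, n_samples=20000, seed=0):
    """Estimate log(z_1/z_0) for q_0 = exp(-x^2/2), q_1 = exp(-x^2/(2 s^2))."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(0.0, 1.0, n_temps)
    means = np.empty(n_temps)
    for i, t in enumerate(ts):
        # On the geometric path, q_t is a zero-mean Gaussian with precision
        # (1 - t) + t / s^2, so it can be sampled exactly in this toy case.
        prec = (1.0 - t) + t / s**2
        x = rng.normal(scale=1.0 / np.sqrt(prec), size=n_samples)
        # For a geometric path, d/dt log q_t = log q_1 - log q_0.
        dlogq = -0.5 * x**2 / s**2 + 0.5 * x**2
        means[i] = dlogq.mean()
    # Integrate the expectation over t with the trapezoid rule.
    return np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(ts))

est = log_ratio_path_sampling(s=2.0)
exact = np.log(2.0)   # z_1 / z_0 = s for these two densities
```

In this toy case the estimate should land close to log(s); the hard part noted above, drawing good samples at each point along the path, is sidestepped entirely.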

smoothing kernel

I currently don't have one implemented even though it should be straightforward to add in theory. I should probably add this in to whatever I code up for #5.
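
One simple form this could take is a Gaussian kernel convolved with a PDF evaluated on a uniform redshift grid. The sketch below assumes exactly that; the function name, the grid, and the kernel width are hypothetical, and the kernel is truncated at 4 sigma.

```python
import numpy as np

def smooth_pdf(pdf, zgrid, sigma):
    """Convolve a gridded PDF with a Gaussian kernel and renormalize."""
    dz = zgrid[1] - zgrid[0]                 # assumes a uniform grid
    half = int(np.ceil(4 * sigma / dz))      # truncate the kernel at 4 sigma
    offsets = np.arange(-half, half + 1) * dz
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # mode="same" keeps the grid length as long as the kernel is shorter.
    smoothed = np.convolve(pdf, kernel, mode="same")
    # Renormalize so the PDF integrates to 1 (simple Riemann sum).
    return smoothed / (smoothed.sum() * dz)

zgrid = np.linspace(0.0, 6.0, 601)
dz = zgrid[1] - zgrid[0]
spike = np.zeros_like(zgrid)
spike[100] = 1.0 / dz                        # all probability at z = 1
smoothed = smooth_pdf(spike, zgrid, sigma=0.05)
```

Probability near the grid edges is clipped by the convolution and then restored by the renormalization, which slightly distorts PDFs that pile up at the boundaries.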

plotting utilities

Currently, I have a lot of plots which I'm making by hand. I should add a lot of these into the plotting module so I don't have to do that as much anymore. These should include:

  • PDF v CDF_vals
  • ECDF v CDF_vals
  • 2-D network colormap
  • network node plots (SEDs)

assign brown seds types?

Might be worth it so they can work within the BPZ-style prior, even if it is ultimately arbitrary.

outlier modeling

Add in outlier modeling to the samplers, where the outlier model is/is not allowed to float. This should allow users to have 5 possible combinations of options:

  • population
  • population w/ outliers
  • hierarchical
  • hierarchical w/ outliers
  • hierarchical w/ outliers + outlier model

gng utility

I want to implement a growing neural gas to deal with some of the issues the SOM has. I'll need a way to de-project it though so users can plot it in 2 dimensions. Maybe I can use other sklearn manifold-learning algorithms for that.

add post-nn utilities

Right now the scheme I've constructed for turning KMCkNN results into predictions involves mostly manual operations. I should probably automate this so that users can just call a function and convert their results into redshift PDFs. This should ideally take place through some type of object so that other intermediate quantities such as posterior values, etc. actually get saved internally.
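
A minimal sketch of what such an object might look like is given below. It is not the frankenz API: the class name, attributes, chi-square likelihood, and Gaussian redshift kernel are all assumptions chosen to show how intermediate quantities could be kept on the object.

```python
import numpy as np

class NeighborPDF:
    """Hypothetical helper that turns a neighbor list into a redshift PDF."""

    def __init__(self, train_flux, train_z, zgrid, bandwidth=0.05):
        self.train_flux = np.asarray(train_flux, dtype=float)
        self.train_z = np.asarray(train_z, dtype=float)
        self.zgrid = np.asarray(zgrid, dtype=float)
        self.bandwidth = bandwidth

    def compute_pdf(self, flux, flux_err, neighbors):
        # Chi-square log-likelihood of each neighbor given the object's fluxes.
        nbr_flux = self.train_flux[neighbors]
        chi2 = np.sum(((nbr_flux - flux) / flux_err) ** 2, axis=1)
        self.lnlike = -0.5 * chi2
        # Normalized weights, saved so they can be inspected later.
        w = np.exp(self.lnlike - self.lnlike.max())
        self.weights = w / w.sum()
        # Stack one Gaussian kernel per neighbor redshift, weighted accordingly.
        dz = self.zgrid[:, None] - self.train_z[neighbors][None, :]
        pdf = np.exp(-0.5 * (dz / self.bandwidth) ** 2) @ self.weights
        step = self.zgrid[1] - self.zgrid[0]
        self.zpdf = pdf / (pdf.sum() * step)
        return self.zpdf

# Mock training set (assumption): three bands that vary smoothly with redshift.
train_z = np.linspace(0.0, 3.0, 301)
train_flux = np.column_stack([1 + train_z, 2 - 0.5 * train_z, train_z**2])
zgrid = np.linspace(0.0, 3.0, 301)

model = NeighborPDF(train_flux, train_z, zgrid)
obj_flux = np.array([2.5, 1.25, 2.25])       # noiseless fluxes at z = 1.5
zpdf = model.compute_pdf(obj_flux, 0.1, neighbors=np.arange(len(train_z)))
```

After the call, `model.lnlike` and `model.weights` remain available alongside `model.zpdf`, which is the kind of internal bookkeeping described above.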

no impute data module

Currently there is no way to impute missing data. I need to re-add this feature.
