This is intended as a discussion issue where we can hash out an initial design for the package. The goal is to
- tie down the structure of the package, including how the internals should be designed so that implementing a pleasant user-facing API is straightforward,
- determine what flavours of approximate inference we want to target first / what is feasible for @sharanry to target over the summer.
None of this is set in stone, so please feel free to chime in with any thoughts you might have on the matter. In particular, if you think I've missed something obvious in the design that could restrict us down the line, now would be a good time to bring it up.
## Background
In an ideal world, the API for GPs with non-Gaussian likelihoods would be "Turing" or "Soss", in the sense that we would just put a GP into a probabilistic programme and figure everything out from there. This package, however, is not aiming for that level of generality. Rather, it is aiming for the tried-and-tested GP + likelihood function API, and for providing a robust, well-defined API and collection of approximate inference algorithms to deal with this.
## API
Taking a bottom-up approach to design, my thinking is that the following basic structure should be sufficient for our needs:
```julia
f = GP(m, k)
fx = LatentGP(f, x, ϕ)
log_density(fx, f)
```
where
- `f` is some GP whose inputs are of type `Tx`,
- `x` is some subtype of `AbstractVector{Tx}`,
- `ϕ` is a function from `AbstractVector{<:Real}` to `Real` that computes the log likelihood of a particular sample from `f` at `x`, and
- `log_density(fx, f) := logpdf(fx, f) + ϕ(f)` (it's not clear to me whether this function is ever non-trivial).
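To make this concrete, here's a minimal sketch of what the type might look like, assuming an AbstractGPs-style `f(x)` for constructing the finite-dimensional marginal; the field and function names here are provisional, not a settled design:

```julia
# Provisional sketch -- names and fields are up for discussion.
struct LatentGP{Tfx, Tϕ}
    fx::Tfx  # the finite-dimensional marginal f(x), a multivariate normal
    ϕ::Tϕ    # log likelihood of the data given a sample from f at x
end

LatentGP(f, x, ϕ) = LatentGP(f(x), ϕ)

# log_density(fx, f) := logpdf(fx, f) + ϕ(f), as above.
log_density(lgp::LatentGP, f::AbstractVector{<:Real}) = logpdf(lgp.fx, f) + lgp.ϕ(f)
```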
This structure encompasses all of the standard things that you'll see in ML, but is a little more general, as the likelihood function isn't restricted to be independent over outputs. To make things convenient for users, we can set up a couple of common cases of `ϕ`, such as factorised likelihoods: a type that specifies that `ϕ(f) = sum(n -> ϕ[n](f[n]), eachindex(x))` (see the sketch below). We can also provide special cases of likelihoods for classification etc. (the various things implemented in GPML). I've not figured out exactly which special cases we want here, so we need to put some thought into that.
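A minimal sketch of such a factorised-likelihood type follows; the name `Factorised` and its field are hypothetical (the usage comment assumes Distributions and StatsFuns):

```julia
# Hypothetical wrapper implementing ϕ(f) = sum(n -> ϕs[n](f[n]), eachindex(f)).
struct Factorised{Tϕs}
    ϕs::Tϕs  # one per-output log likelihood function per observation
end

(ϕ::Factorised)(f::AbstractVector{<:Real}) = sum(n -> ϕ.ϕs[n](f[n]), eachindex(f))

# e.g. Bernoulli classification with a logistic link, given observations y:
# ϕ = Factorised([fn -> logpdf(Bernoulli(logistic(fn)), yn) for yn in y])
```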
This interface obviously precludes expressing that the likelihood is a function of entire sample paths from `f` -- see e.g. [1] for an instance of this kind of thing. I can't imagine this being too much of an issue, as all of the techniques for actually working with such likelihoods necessarily involve discretising the function, which we can handle. This means that they can still be implemented, just in a slightly uglier manner. If this does turn out to be an actual issue for a number of users, we can always generalise the likelihood a bit.
Note that this approach feels quite Stan-like, in that it just requires the user to specify a likelihood function.
## Approximate Inference + Approximate Inference Interface
This is the bit of the design that I'm least comfortable with. I think that we should focus on getting NUTS / ESS working in the first instance, but it's not at all clear to me what the appropriate interface for approximate inference with MCMC is, given that we're working outside of a PPL. To begin with, I would propose simply providing well-documented examples that show how to leverage the above structure in conjunction with e.g. AdvancedHMC to perform approximate inference -- see the sketch below. It's possible that we really only want to provide this functionality at the GPML.jl level, since you really need to include all of the parameters of the model, both the function `f` and any kernel parameters, to do anything meaningful.
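The sort of example I have in mind might look something like the following. To be clear about what's hypothetical here: `LatentGP` and `log_density` are the proposed interface from above, the data and Bernoulli likelihood are purely illustrative, and the AdvancedHMC calls follow its README-style API:

```julia
using AbstractGPs, KernelFunctions, AdvancedHMC, Distributions, StatsFuns, Zygote

# Toy binary-classification data.
x = rand(50)
y = rand(Bool, 50)

# Zero-mean GP prior -- `GP(m, k)` in the notation above.
f = GP(Matern52Kernel())

# Factorised Bernoulli log likelihood with a logistic link.
ϕ(v) = sum(n -> logpdf(Bernoulli(logistic(v[n])), y[n]), eachindex(x))
fx = LatentGP(f, x, ϕ)

# Unnormalised log posterior over the latent function values at x, plus its gradient.
ℓ(v) = log_density(fx, v)
∂ℓ(v) = (ℓ(v), first(Zygote.gradient(ℓ, v)))

# Standard AdvancedHMC setup: NUTS with Stan-style adaptation.
D = length(x)
metric = DiagEuclideanMetric(D)
h = Hamiltonian(metric, ℓ, ∂ℓ)
v0 = randn(D)
integrator = Leapfrog(find_good_stepsize(h, v0))
proposal = NUTS{MultinomialTS, GeneralisedNoUTurn}(integrator)
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))
samples, stats = sample(h, proposal, v0, 1_000, adaptor, 500)
```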
The variational inference setting is probably a bit clearer, because you can meaningfully talk about ELBOs etc. without saying much about any kernel parameters. For example, we might implement a function along the lines of `elbo(fx, q)`, where `q` is some approximate posterior over `f(x)` -- see the sketch below. It's going to be a little way down the line before we start looking at this, though, and possibly we won't get to it at all over the summer; still, it would definitely be good to look at how to get some of the stuff from AugmentedGaussianProcesses into this package. @theogf do you have any thoughts on the kinds of things that would be necessary from an interface perspective to make this feasible?
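For concreteness, here's a sketch of what such an `elbo` might look like for the `LatentGP` sketched above, assuming `q` is an `MvNormal` and that `mean` / `cov` are defined for the prior marginal; the simple Monte Carlo estimator and the `num_samples` keyword are illustrative assumptions rather than design decisions:

```julia
using Distributions, LinearAlgebra, Statistics

function elbo(lgp::LatentGP, q::MvNormal; num_samples::Int=10)
    # Monte Carlo estimate of E_q[ϕ(f)] ...
    fs = rand(q, num_samples)
    expected_ϕ = mean(lgp.ϕ(fs[:, n]) for n in 1:num_samples)
    # ... minus the KL divergence from q to the prior marginal f(x).
    return expected_ϕ - gauss_kl(q, lgp.fx)
end

# Analytic KL(q ‖ p) between multivariate Gaussians, assuming mean/cov work on both.
function gauss_kl(q, p)
    μq, μp, Σq, Σp = mean(q), mean(p), cov(q), cov(p)
    Δ = μp - μq
    return (logdet(Σp) - logdet(Σq) - length(μq) + tr(Σp \ Σq) + Δ' * (Σp \ Δ)) / 2
end
```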
## Summary
In short, this package is likely to be quite small for a while -- more or less just a single new type and some corresponding documentation while we consider MCMC. I would envisage that this package will come into its own when we really start going for variational inference a little bit further down the line.
@yebai @sharanry @devmotion @theogf -- I would appreciate your input.
[1] Cotter, Simon L., et al. "MCMC methods for functions: modifying old algorithms to make them faster." Statistical Science (2013): 424-446.