JuliaGaussianProcesses / GPLikelihoods.jl

Provides likelihood functions for Gaussian Processes.
License: MIT License
Please edit directly to add others.
Do we want to include sparse Gaussian processes in this package, or create a separate one for them?
In my experience, a lot of things can be written once for both sparse and non-sparse models, so having them in the same package might simplify things.
I'm trying to implement a GP with a Poisson likelihood whose parameter may vary with the observations[^1]:

$$y_i \sim \mathrm{Poisson}(e_i \exp(f_i))$$

where $f_i$ is the GP. Is this possible?

[^1]: Vanhatalo, Jarno, Ville Pietiläinen, and Aki Vehtari. "Approximate inference for disease mapping with sparse Gaussian processes." (Eq. 1)
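For illustration, one way such a likelihood could look (a minimal sketch; `ScaledPoissonLikelihood` is a hypothetical name, not part of the package):

```julia
using Distributions

# Hypothetical exposure-scaled Poisson likelihood: y_i ~ Poisson(e_i * exp(f_i)).
struct ScaledPoissonLikelihood{T<:AbstractVector{<:Real}}
    exposure::T  # e_i, one known exposure per observation
end

# Map a vector of latent GP values to a product of independent Poissons.
(l::ScaledPoissonLikelihood)(fs::AbstractVector{<:Real}) =
    Product(Poisson.(l.exposure .* exp.(fs)))
```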
Complement to JuliaGaussianProcesses/ApproximateGPs.jl#53
The expectation of the log-likelihood under q(f), ∫ q(f) log p(y | f) df, isn't specific to ApproximateGPs itself, and if someone adds a new likelihood here, it'd be great if they had visibility into also providing e.g. an analytic implementation of `expected_loglik`, or at least thinking about what `_default_quadrature` should be.
`expected_loglik` currently has the following signature:

expected_loglik(quadrature::QuadratureMethod, y::AbstractVector, q_f::AbstractVector{<:Normal}, lik)
Some questions on the design:

To me it would seem more natural to order it as either `expected_loglik(quadrature, q_f, lik, y)` or `expected_loglik(quadrature, lik, q_f, y)` -- we're unlikely to dispatch on `y` (it's implicitly defined by what the likelihood is already), and the first form matches the common maths notation much more closely.
Currently, `expected_loglik` takes a Vector `y`, a Vector `q_f`, and a `lik` function that maps from a scalar `f` to a UnivariateDistribution; `expected_loglik` then does the broadcasting internally. Is that what we want?
- Would it be better (cleaner) to have `expected_loglik` handle scalars only (but this might result in some performance loss from e.g. recomputing the Gauss-Hermite points and weights for each data point)?
- Should we expect `lik` to take the full Vector `fs` and return e.g. a `Product()` of UnivariateDistributions (though this might make the expectation code more complicated)?
- Should we use `pdf` instead of `loglikelihood` (NB: should that be `logpdf`? the difference is in summing over the `y`s...)? Should we have an `expected_pdf` separately (maybe re-using internal quadrature methods)?

#63 added an implementation of a negative binomial likelihood. This prescribed the parametrisation in terms of (number of successes r, failure probability p), where the number of successes is a scalar likelihood parameter (equal for all observations) and the failure probability is modelled with a latent GP f, squashed through the logistic link function by default, so p = logistic(f). This parametrisation appears to be common e.g. in intermittent count time series modelling. However, in other areas different parametrisations are more common, e.g. directly using the latent GP to parametrise the mean of the observations, with a scalar parameter measuring overdispersion. Moreover, there are multiple variants of the latter parametrisation (Type I and II being the most common). This is somewhat problematic because, unlike with the "bare-bones" Distributions object, a user cannot re-use the `NegativeBinomialLikelihood` object for one of the other parametrisations. I'm not quite sure what we should do here; I'm just noting that it seems potentially rather confusing to give a specific name to one meaning of something that actually has multiple meanings in different areas (with no easy way of converting)... 🤔
Parametrisations of `NegativeBinomialLikelihood`:
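One way to disambiguate is to make the parametrisation part of the type, roughly along the lines of the NBParam types that #80 introduced. A sketch, not the package's actual definitions (the names `NBParamSuccess` and `NBParamMean` are made up here):

```julia
# Tag the likelihood with its parametrisation so one name can cover
# multiple conventions without ambiguity.
abstract type NBParam end
struct NBParamSuccess <: NBParam end  # (r, p): fixed successes r, p = logistic(f)
struct NBParamMean <: NBParam end     # mean μ = exp(f), scalar overdispersion

struct NegativeBinomialLikelihood{P<:NBParam,T<:Real}
    param::P
    scalar::T  # r for NBParamSuccess, overdispersion for NBParamMean
end
```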
Looks like there was a typo in the @JuliaRegistrator command: 4b33f7a#commitcomment-86096069
With `NBParamII`, for sufficiently large values of `f` (something like 800), `p` will be equal to zero and many estimators start to break down. This might be something to be dealt with in Distributions.jl, though.
I have the feeling that LikelihoodFunctions is a bit too general a name for this package. If I understand correctly, this package aims to provide models with Gaussian and non-Gaussian likelihoods. Shouldn't the name then be GPLikelihoods, NonConjugateGPs, or something like that?
https://juliagaussianprocesses.github.io/GPLikelihoods.jl/ (whether stable/ or dev/) gives a 404
@devmotion Would it be okay to copy the code of `BernoulliLogit` and `BinomialLogit` from Turing, to have a special case on `Bernoulli{<:LogisticLink}`?
For convenience, the old version of `expected_loglikelihood` (Gauss-Hermite quadrature method) looked like this:

GPLikelihoods.jl/src/expectations.jl, lines 83 to 109 in e9b7da9
#90 introduced a work-around/hack for two (possibly interrelated) issues of that implementation:
- an issue with `NegativeBinomialLikelihood`,
- the type-instability of the broadcast, which could be related to JuliaLang/julia#45748 (see also JuliaGaussianProcesses/KernelFunctions.jl#458, which documents a strange behavior in which inference depends on previous evaluations; I also observed this in the `Broadcast` construction but could not resolve it).
I therefore tried to get rid of the `Broadcast` entirely, hoping that type stability would improve performance. First I tried a custom implementation of pairwise summation for two function arguments (i.e. I implemented `sum(f, X, Y)`, which is equivalent to `mapreduce(f, +, X, Y)`, based on the implementation of `mapreduce(f, +, X)` in Base; see #77 for the reason behind this). That implementation can be found here.
Although that implementation makes the function type-stable, it was still not very performant. For this reason, in #90 I chose an implementation which allocates an explicit array, which I believed to be more Zygote-friendly. The large improvements over the old versions seen in the benchmarks confirmed this intuition (see #90 (comment)).

However, the solution is not very clean, as it pays for this performance improvement with additional allocations in the forward pass. It is also unclear whether this implementation will be as beneficial for, or generalize to, other AD backends. The potentially better approach would be to define an `rrule` for the broadcasted sum, as suggested in #90 (comment) (although even then it would be nice to have the function be type-stable anyway).
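As a rough illustration of that suggestion (a sketch under assumptions, not the package's code: `sum_map2` is a hypothetical helper, and the pullback here differentiates the scalar summand with ForwardDiff):

```julia
using ChainRulesCore, ForwardDiff

# Hypothetical helper: type-stable sum of a two-argument function over two vectors.
sum_map2(f, xs::AbstractVector, ys::AbstractVector) =
    sum(f(x, y) for (x, y) in zip(xs, ys))

# Custom rrule so reverse-mode AD bypasses the generic broadcast pullback.
# Assumes `f` closes over nothing differentiable.
function ChainRulesCore.rrule(::typeof(sum_map2), f, xs::AbstractVector, ys::AbstractVector)
    value = sum_map2(f, xs, ys)
    function sum_map2_pullback(Δ)
        # The adjoint of each x_i is Δ * ∂f/∂x evaluated at (x_i, y_i); same for y_i.
        dxs = [Δ * ForwardDiff.derivative(x -> f(x, y), x) for (x, y) in zip(xs, ys)]
        dys = [Δ * ForwardDiff.derivative(y -> f(x, y), y) for (x, y) in zip(xs, ys)]
        return NoTangent(), NoTangent(), dxs, dys
    end
    return value, sum_map2_pullback
end
```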
Implement a likelihood with the Student-t distribution. The input GP corresponds to the mean in the following definition of the non-standardised Student-t distribution.
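A minimal sketch of what this could look like, assuming ν (degrees of freedom) and σ (scale) are fixed likelihood parameters and the GP sets the location (the struct and its fields are made up for illustration):

```julia
using Distributions

# Hypothetical Student-t likelihood: f shifts a scaled TDist, giving the
# non-standardised Student-t with location f, scale σ, ν degrees of freedom.
struct StudentTLikelihood{T<:Real}
    ν::T
    σ::T
end

(l::StudentTLikelihood)(f::Real) = f + l.σ * TDist(l.ν)
```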
Right now, to compute `logpdf` or `loglikelihood` given a pair of a GP sample `f` and observations `y`, one needs to do:

logpdf(likelihood(f), y)
Would it make sense to have a small wrapper

function logpdf(lik::Likelihood, f_y::NamedTuple)
    # the likelihood object is callable: it maps f to a distribution
    return logpdf(lik(f_y.f), f_y.y)
end
since this is what `rand` would return for a `LatentGP`:

https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/blob/e7c67bd24c454f0b8179032ecbb82284887159c0/src/latent_gp.jl#L37

(and similarly for `loglikelihood`)?
Open question from #42:
> `expected_loglik` takes a Vector `y`, a Vector `q_f`, and a `lik` function that maps from a scalar `f` to a UnivariateDistribution; `expected_loglik` then does the broadcasting internally. Is that what we want?
> - Would it be better (cleaner) to have `expected_loglik` handle scalars only (but this might result in some performance loss from e.g. recomputing the Gauss-Hermite points and weights for each data point)?
> - Should we expect `lik` to take the full Vector `fs` and return e.g. a `Product()` of UnivariateDistributions (though this might make the expectation code more complicated)?
> - How would we want to handle the heteroskedastic case, where we do want to include the correlations across the two outputs at each data point, but independence between different data points (not sure how we would handle that on the AbstractGPs side, given that multi-output is all "output as input")?
Could extend it so you can directly pass a `FiniteGP` (which is an `AbstractMvNormal`) as follows:

function expected_loglikelihood(
    quad, lik::AbstractFactorizingLikelihood, q_f::AbstractMvNormal, y::AbstractVector
)
    # extract the per-point marginal means and standard deviations of q(f)
    m, s = mean(q_f), sqrt.(var(q_f))
    return expected_loglikelihood(quad, lik, Normal.(m, s), y)
end
Would have to think about how to handle likelihoods depending on multiple function evaluations (e.g. a heteroskedastic likelihood with a vector-valued GP).
All I need to specify a likelihood is the mapping from `f` to a distribution, and in many cases the only degree of freedom is the inverse link function. It would be super convenient if I could just write `PoissonLikelihood(log1pexp)` to try out a softplus inverse link, without having to code up a subtype of AbstractLink and having to remember what I have to do to implement it.
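As a sketch of how little machinery this needs (the standalone `PoissonLikelihood` below is illustrative only and shadows the package's type):

```julia
using Distributions, LogExpFunctions

# Generic single-parameter likelihood: any callable inverse link,
# no AbstractLink subtype required.
struct PoissonLikelihood{Tinv}
    invlink::Tinv
end

(l::PoissonLikelihood)(f::Real) = Poisson(l.invlink(f))

lik = PoissonLikelihood(log1pexp)  # softplus inverse link
lik(0.3)                           # Poisson(log1pexp(0.3))
```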
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details. If you'd like for me to do this for you, comment `TagBot fix` on this issue. I'll open a PR within a few hours; please be patient!
I think before implementing any more likelihoods we should think carefully about what standardised tests we want to run on them, similarly to what I've just done in this PR: JuliaGaussianProcesses/KernelFunctions.jl#159
Looking at the various open PRs (#17, #16, #15, and maybe #14) and the already-implemented likelihoods (`Gaussian` and `Poisson`), there appears to be quite a lot of repeated code in the tests. It would be great to get a handle on this before it becomes too large a job to refactor easily.

Pulling out some common themes from these, it looks like we want to be testing that:

@sharanry you've done most of the work on this so far. Is there anything else that's obvious to you that we're missing?

There's also quite a lot of code that manually generates input and output data / constructs appropriate "latent" `AbstractGP`s at the minute -- in my opinion it would be a very good idea to factor this code out as well.
Right now it's quite inconsistent throughout the code.
I was studying the implementation of the expected likelihood and stumbled upon this:
GPLikelihoods.jl/src/expectations.jl, lines 84 to 96 in 498af31

This uses `Broadcast.broadcasted`, which is not documented. If I'm not mistaken, this means that it is not part of Julia's public API and could break at any point in time. Besides that, I wonder what the advantage is over this:
function expected_loglikelihood(
gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector
)
# Compute the expectation via Gauss-Hermite quadrature
# using a reparameterisation by change of variable
# (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature)
return sum(expected_loglikelihood(gh, lik, q_fᵢ, yᵢ) for (q_fᵢ, yᵢ) in zip(q_f, y))
end
This also seems to be type-stable, while the current implementation isn't.
JuliaStats/Distributions.jl#1536 proposes different parametrizations of the NegativeBinomial likelihood, in particular the scale-shape parametrization that corresponds to the NBParamMean types introduced in #80. Once JuliaStats/Distributions.jl#1536 is released as part of Distributions.jl, we should update our code accordingly (specifically, we should be able to remove the `_nb_mean_excessvar_to_r_p` code).
Do we even need to depend on AbstractGPs in this package? AFAICT, there's no functionality which actually depends on it...
Right now the nodes and weights are recomputed every time `expected_loglik` is called. The constructor of `GaussHermite` could simply store the two as vectors.
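A minimal sketch of that proposal, assuming FastGaussQuadrature supplies the nodes and weights (the field names are made up):

```julia
using FastGaussQuadrature

# Precompute Gauss-Hermite nodes and weights once, at construction time.
struct GaussHermite{T<:Real}
    xs::Vector{T}  # quadrature nodes
    ws::Vector{T}  # quadrature weights
end
GaussHermite(n::Integer) = GaussHermite(gausshermite(n)...)
```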
This issue, or rather discussion of the desired API, came up in #48.
With this PR, `ExpLink` etc. is now a type alias of `Link{typeof(exp)}`, and `ExpLink()` calls `Link(exp)`. Intentionally, the PR was non-breaking. However, it leads to the following questions:
Personally, I am in favour of 2 and probably of 1:
- Some might prefer writing `ExpLink` over `Link(exp)`, but there would be fewer definitions and the API would be cleaner if the former were removed.
- Downstream code could still dispatch on `Link{typeof(exp)}` instead of `ExpLink`. However, I think it is questionable whether this use case is a strong enough argument to include type aliases in GPLikelihoods. Instead, in my opinion, it would be sufficient if users and downstream packages define (possibly internal) type aliases where useful for their implementation. So currently I think type aliases should only be included, but not necessarily exported, if they are useful and used in GPLikelihoods.
- @theogf suggested adding a macro `@link CosLink cos` that would define both the type alias and the constructor (hence it assumes that we don't deprecate either of them). I'm not in favour of a macro, both because I think we should probably deprecate both definitions, and because the definitions are already quite short and simple, so the macro feels too magical given these limited benefits.
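For reference, the pattern under discussion is roughly the following alias/constructor duo (a sketch, not the package's exact code):

```julia
# `Link` wraps an arbitrary function; a named link is just a type alias
# plus a zero-argument constructor, which is what the PR made non-breaking.
struct Link{F}
    f::F
end
(l::Link)(x) = l.f(x)

const ExpLink = Link{typeof(exp)}  # type alias, usable for dispatch
ExpLink() = Link(exp)              # constructor convenience
```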
I've always wanted to have support for ordinal regression, presumably implemented along the lines of [1]. No particular reason -- just because it's fun.

[1] - Chu, Wei, and Zoubin Ghahramani. "Gaussian processes for ordinal regression." Journal of Machine Learning Research 6 (2005).
This issue is to discuss the design concerning variational inference methods.

So far, in AugmentedGaussianProcesses.jl, things are done this way: `∇E_μ` and `∇E_Σ` return the gradient of the expected log-likelihood given μ and Σ (the second one returns an array, as you usually don't need the covariance terms of Σ).

I cannot count how many times I have lost my nerves debugging issues in my script, only to find out that I had used categorical variables instead of 0/1 labels for my `BernoulliLikelihood`.

Therefore, here is my proposal:
Currently, `CategoricalLikelihood` is defined to take a vector of C-1 inputs to produce a distribution with C classes, by appending a 0 to the input vector before going through the softmax. This is fine for simple stuff, but I think there are some cases where you'd want to give all C inputs; e.g., if you're doing multi-class classification with a different kernel for each class, I don't think that would be possible with the current version? Should I make a PR to change it, or is there a reason to keep it as is? (It would still always be possible to recover the current behaviour by appending a zero yourself.)
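The two conventions side by side, as a sketch (using `softmax` from LogExpFunctions; neither function is the package's actual implementation):

```julia
using Distributions, LogExpFunctions

# Current behaviour: C-1 latent inputs, zero appended before the softmax.
categorical_cminus1(f::AbstractVector{<:Real}) = Categorical(softmax(vcat(f, 0)))

# Proposed behaviour: all C latent inputs go through the softmax directly.
categorical_c(f::AbstractVector{<:Real}) = Categorical(softmax(f))
```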
"GPLikelihoods.jl" is currently not a registered package. This makes it more difficult than necessary for users of this package such as https://github.com/rossviljoen/SparseGPs.jl. It is overdue to register the code inside this repo.
Can't we have a single definition, `(l::AbstractLikelihood)(fs::AbstractVector{<:Real}) = Product(map(l, fs))`? I think we need Julia 1.6 for this feature, but we are already bounding to this version, so that seems like a valid thing.

Originally posted by @theogf in #72 (review)
The package InverseFunctions.jl will be available in the registry soon and contains `inverse` plus definitions for functions in Base. If other packages such as LogExpFunctions implement this interface, we could replace `GPLikelihoods.inverse` with `InverseFunctions.inverse` and remove our definitions.
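What the swap would look like in practice (assuming LogExpFunctions implements the interface for its link-style functions):

```julia
using InverseFunctions, LogExpFunctions

inverse(exp)    # returns log: shipped by InverseFunctions for Base functions
inverse(logit)  # returns logistic, provided LogExpFunctions supports the interface
```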
This is intended as a discussion issue where we can hash out an initial design for the package. The goal is to:
None of this is set in stone, so please feel free to chime in with any thoughts you might have on the matter. In particular if you think that I've missed something obvious from the design that could restrict us down the line, now would be a good time to bring it up.
In an ideal world, the API for GPs with non-Gaussian likelihoods would be "Turing" or "Soss", in the sense that we would just put a GP into a probabilistic programme, and figure out everything from there. This package, however, is not aiming for that level of generality. Rather it is aiming for the tried-and-tested GP + likelihood function API, and providing a robust and well-defined API + collection of approximate inference algorithms to deal with this.
Taking a bottom-up approach to design, my thinking is that the following basic structure should be sufficient for our needs:

f = GP(m, k)
fx = LatentGP(f, x, ϕ)
log_density(fx, f)

where
- `f` is some GP whose inputs are of type `Tx`,
- `x` is some subtype of `AbstractVector{Tx}`,
- `ϕ` is a function from `AbstractVector{<:Real}` to `Real` that computes the log likelihood of a particular sample from `f` at `x`, and
- `log_density(fx, f) := logpdf(fx, f) + ϕ(f)` (it's not clear to me whether this function is ever non-trivial).

This structure encompasses all of the standard things that you'll see in ML, but is a little more general, as the likelihood function isn't restricted to be independent over outputs. To make things convenient for users, we can set up a couple of common cases of ϕ, such as factorised likelihoods (a type that specifies that `ϕ(f) = sum(n -> ϕ[n](f[n]), eachindex(x))`) and special cases of likelihoods for classification etc. (the various things implemented in GPML). I've not figured out exactly what special cases we want here, so we need to put some thought into that.
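For instance, a factorised Bernoulli classification ϕ could look like this (a sketch; `y` holds the observed 0/1 labels and is assumed to be in scope):

```julia
using Distributions, LogExpFunctions

y = rand(Bool, 10)  # assumed observations

# Factorised likelihood: independent Bernoulli terms, one per output.
ϕ(f) = sum(logpdf(Bernoulli(logistic(fn)), yn) for (fn, yn) in zip(f, y))
```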
This interface obviously precludes expressing that the likelihood is a function of entire sample paths from `f` -- see e.g. [1] for an instance of this kind of thing. I can't imagine this being too much of an issue, as all of the techniques for actually working with such likelihoods necessarily involve discretising the function, which we can handle. This means that they can still be implemented, in an only slightly more ugly manner. If this does turn out to be an actual issue for a number of users, we can always generalise the likelihood a bit.
Note that this approach feels quite Stan-like, in that it just requires the user to specify a likelihood function.
This is the bit of the design that I'm least comfortable with. I think that we should focus on getting NUTS / ESS working in the first instance, but it's not at all clear to me what the appropriate interface is for approximate inference with MCMC, given that we're working outside of a PPL. In the first instance I would propose simply providing well-documented examples that show how to leverage the above structure in conjunction with e.g. AdvancedHMC to perform approximate inference. It's possible that we really only want to provide this functionality at the GPML.jl level, since you really need to include all of the parameters of the model, both the function `f` and any kernel parameters, to do anything meaningful.
The variational inference setting is probably a bit clearer about what to do, because you can meaningfully talk about ELBOs etc. without talking too much about any kernel parameters; e.g. we might implement a function along the lines of `elbo(fx, q)`, where `q` is some approximate posterior over `f(x)`. It's going to be a little while before we start looking at this, though -- possibly we won't get to it at all over the summer -- although it would definitely be good to look at how to get some of the stuff from AugmentedGaussianProcesses into this package. @theogf do you have any thoughts on the kinds of things that would be necessary from an interface perspective to make this feasible?
In short, this package is likely to be quite small for a while -- more or less just a single new type and some corresponding documentation while we consider MCMC. I would envisage that this package will come into its own when we really start going for variational inference a little bit further down the line.
@yebai @sharanry @devmotion @theogf -- I would appreciate your input.
[1] - Cotter, Simon L., et al. "MCMC methods for functions: modifying old algorithms to make them faster." Statistical Science (2013): 424-446.
Following issue #231 in AbstractGPs.jl, I would like to propose an interface that allows us to automatically use Distributions.jl's large library of distributions. Something like this (I haven't tested it yet):
struct GPLikelihood{T,Tl <: AbstractLink} <: AbstractLikelihood
    d::T         # a Distribution constructor, e.g. `Bernoulli`
    invlink::Tl  # takes a value of the latent variable and outputs a parameter tuple for `d`
end

(l::GPLikelihood)(f::Real) = l.d(l.invlink(f)...)

function (l::GPLikelihood)(fs::AbstractVector{<:Real})
    return Product(map(l, fs))  # reuse the scalar method; broadcasting the splat doesn't work
end
and then have convenience constructors for special cases:

BernoulliLikelihood(invlink) = GPLikelihood(Bernoulli, invlink)
CategoricalLikelihood(invlink) = GPLikelihood(Categorical, invlink)
...
Would this be useful?
Should we have a global abstract type for the `Likelihood` objects? That would be practical (at least for me).
I am planning to write an `nlatent` function for the package, returning the dimensionality of the input needed by the likelihood. It is very practical for initializing things with a given likelihood. However, the current version of `CategoricalLikelihood` has no way of knowing the number of latents needed. What do you think about enforcing the number of outputs in the likelihood?
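A sketch of the proposed interface (the `nclasses` field is hypothetical; `CategoricalLikelihood` does not currently store it):

```julia
using GPLikelihoods

# One latent function suffices for single-parameter likelihoods...
nlatent(::BernoulliLikelihood) = 1
nlatent(::PoissonLikelihood) = 1

# ...but Categorical needs the class count stored on the likelihood itself.
nlatent(l::CategoricalLikelihood) = l.nclasses - 1  # hypothetical field
```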