juliagaussianprocesses / gplikelihoods.jl Goto Github PK

Provides likelihood functions for Gaussian Processes.

License: MIT License

Julia 100.00%

gaussian-processes likelihood likelihood-functions

gplikelihoods.jl's Introduction

GPLikelihoods

GPLikelihoods.jl provides a collection of likelihoods to be used as building blocks for defining non-Gaussian problems. It is intended to be mainly consumed by other packages within the JuliaGaussianProcesses ecosystem (such as ApproximateGPs.jl) for approximate Gaussian process regression, classification, event counting, etc.

gplikelihoods.jl's People

Contributors

Stargazers

Watchers

Forkers

isgasho hendersontrent filipepfarias waweros song921012

gplikelihoods.jl's Issues

Question on implementation and type stability

I was studying the implementation of the expected likelihood and stumbled upon this:

GPLikelihoods.jl/src/expectations.jl

Lines 84 to 96 in 498af31

    
           function expected_loglikelihood( 
        
               gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector 
        
           ) 
        
               # Compute the expectation via Gauss-Hermite quadrature 
        
               # using a reparameterisation by change of variable 
        
               # (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature) 
        
               return sum(Broadcast.instantiate( 
        
                   Broadcast.broadcasted(y, q_f) do yᵢ, q_fᵢ  # Loop over every pair 
        
                       # of marginal distribution q(fᵢ) and observation yᵢ 
        
                       expected_loglikelihood(gh, lik, q_fᵢ, yᵢ) 
        
                   end, 
        
               )) 
        
           end

This uses Broadcast.broadcasted, which is not documented. If I'm not mistaken, this means that it is not part of Julia's public API, and could break at any point in time. Besides that, I wonder what is the advantage over this:

function expected_loglikelihood(
    gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector
)
    # Compute the expectation via Gauss-Hermite quadrature
    # using a reparameterisation by change of variable
    # (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature)
    return sum(expected_loglikelihood(gh, lik, q_fᵢ, yᵢ) for (q_fᵢ, yᵢ) in zip(q_f, y))
end

This also seems to be type-stable, while the current implementation isn't.

Precompute the nodes and weights for `GaussHermite`

Right now the nodes and weight are recomputed every time expected_loglik is called. The constructor of GaussHermite could simply store the two as vectors.

Move expected_loglik here & additional design discussion

Complement to JuliaGaussianProcesses/ApproximateGPs.jl#53

The expectation of the log-likelihood under q(f), ∫ q(f) log p(y | f) df, isn't specific to ApproximateGPs themselves, and if someone added a new likelihood here, it'd be great if they had visibility into also providing e.g. analytic implementation of expected_loglik or at least thinking about what _default_quadrature should be.

expected_loglik currently has the following signature:

expected_loglik(quadrature::QuadratureMethod, y::AbstractVector, q_f::AbstractVector{<:Normal}, lik)

Some questions on the design:

To me it would seem more natural to either order it as
expected_loglik(quadrature, q_f, lik, y) or expected_loglik(quadrature, lik, q_f, y) -- we're unlikely to dispatch on y (it's implicitly defined by what the likelihood is already), and the first form matches the common maths notation much more closely.
expected_loglik takes a Vector y, a Vector q_f, and a lik function that maps from scalar f to UnivariateDistribution; expected_loglik then does the broadcasting internally. Is that what we want?

Would it be better (cleaner) to have expected_loglik handle scalars only (but this might result in some performance loss for e.g. recomputing the Gauss-Hermite points and weights for each data point)?
Should we expect lik to take the full Vector fs and return e.g. a Product() of UnivariateDistributions (though this might make the expectation code more complicated)?
How would we want to handle the heteroskedastic case, where we do want to include the correlations across the two outputs at each data point, but independence between different data points (not sure how we would handle that on the AbstractGPs side given that multi-output is all "output as input")?

If we want to support e.g. Expectation Propagation as well, we should also provide ∫ q(f) p(y | f) df: the quadrature fallbacks are going to be exactly the same, just evaluating pdf instead of loglikelihood (NB- should that be logpdf? the difference is in summing over ys...). Should we have expected_pdf separately (maybe re-using internal quadrature methods)?

TestUtils

I think before implementing any more likelihoods we should think carefully about what standardised tests we want to run on them, similarly to what I've just done in this PR: JuliaGaussianProcesses/KernelFunctions.jl#159

Looking at the various open PRs (#17, #16 , #15, and maybe #14), and the already implemented likelihoods (Gaussian and Poisson) there appears to be quite a lot of repeated code in the tests. It would be great to get a handle on this before it becomes too large a job to easily refactor.

Pulling out some common themes from these, it looks like we want to be testing that

the API is satisfied
the length is correct
functor works properly

@sharanry you've done most of the work on this so far. Is there anything else that's obvious to you that we're missing?

There's also quite a lot of code that manually generates input and output data / constructs appropriate "latent" AbstractGPs at the minute -- in my opinion it would be a very good idea to factor this code out as well.

Missing Likelihoods

Please edit directly to add others.

Sparse GPs

Do we want to include Sparse Gaussian Processes in this package or create an extra one for it?
In my experience a lot of things can be written once for both sparse and non-sparse models, so having them in the same package might simplify things

Update NegativBinomial parametrizations with more stable Distributions.jl backend

JuliaStats/Distributions.jl#1536 is proposing different parametrizations of the NegativeBinomial likelihood, particularly the scale-shape parametrization that corresponds to the NBParamMean types introduced in #80. Once JuliaStats/Distributions.jl#1536 is released as part of Distributions.jl, we should update our code accordingly (specifically, we should be able to remove the _nb_mean_excessvar_to_r_p code).

Using `ScientificTypes` for safe use

I cannot count how many times I lost nerves on debugging issues on my script to only find out that I used categorical variables instead of 0-1 for my BernoulliLikelihood.

Therefore here is my proposal:

We use ScientificTypes.jl to enforce that the outputs are restricted to be in a certain form (binary, categorical, continuous, positive, etc...).
We dispatch the different likelihoods to their appropriate domain.
We still allow for any kind of input anyway, but we perform a check on the type and domain of the inputs.

unify definitions of likelihood called on vector

Can't we have a unique

(l::AbstractLikelihood)(fs::AbstractVector{<:Real}) = Product(map(l, fs))

I think we need 1.6 for this feature but we are already bounding to this version so that seems like a valid thing.

Originally posted by @theogf in #72 (review)

AbstractLikelihood supertype

Should we have a global abstract type for the Likelihood objects? That would be practical (at least for me)

`NegativeBinomial` likelihood enforces single parametrization

#63 added an implementation of a negative binomial likelihood. This prescribed the parametrisation in terms of (number of successes r, failure probability p), where the number of successes is a (scalar, equal for all observations) likelihood parameter and the failure probability is modelled with a latent GP f, squashed through the logistic link function by default, so p = logistic(f). This parametrisation appears to be common e.g. in intermittent count time series modelling. However, in other areas different parametrisations are more common, e.g. directly using the latent GP to parametrise the mean of the observations, and a scalar parameter to measure overdispersion. Moreover, there are multiple variants of the latter parametrisation (Type I and II being the most common). This is somewhat problematic as, unlike with the "bare-bones" Distributions object, a user cannot re-use the NegativeBinomialLikelihood object for one of the other parametrisations. I'm not quite sure what we should do here, just noting that it seems potentially rather confusing to give a specific name to one meaning of something that actually has multiple meanings in different areas (with no easy way of converting)... 🤔

Parametrisations:

current NegativeBinomialLikelihood:
- number of successes r = likelihood parameter, failure probability p = logistic(f)
- mean = exp(f) / r
- var = mean * p = mean^2 * r / (1 + mean * r) (?)
alternative parametrisations:
- Type I / Quasi-Poisson:
  - latent GP f, likelihood parameter \phi
  - mean = invlink(f) e.g. exp(f)
  - var = mean * \phi
- Type II:
  - latent GP f, likelihood parameter k
  - mean = invlink(f) e.g. exp(f)
  - var = mean + mean^2 * k

Implement the Student-T Likelihood

Implement a likelihood with the Student-T distribution.
The input GP corresponds to the mean in the following definition of the non-standardized Student-t distribution.

Ordinal Regression

I've always wanted to have support for ordinal regression, presumably implemented along the lines of [1]. Not particular reason -- just because it's fun.

[1] - Chu, Wei and Zoubin Ghahramani. "Gaussian processes for ordinal regression." Journal of machine learning research 6.7 (2005).

Use InverseFunctions

The package InverseFunctions.jl will be available in the registry soon and contain inverse + definitions for functions in base. If other packages such as LogExpFunctions implement this interface we could replace GPLikelihoods.inverse with InverseFunctions.inverse and remove our definitions.

Package Design

This is intended as a discussion issue where we can hash out an initial design for the package. The goal is to

tie down the structure of the package, including how the internals should be designed to lead to make implementing a pleasant user-facing API
determine what flavours of approximate inference we want to target first / what is feasible for @sharanry to target over the summer.

None of this is set in stone, so please feel free to chime in with any thoughts you might have on the matter. In particular if you think that I've missed something obvious from the design that could restrict us down the line, now would be a good time to bring it up.

Background

In an ideal world, the API for GPs with non-Gaussian likelihoods would be "Turing" or "Soss", in the sense that we would just put a GP into a probabilistic programme, and figure out everything from there. This package, however, is not aiming for that level of generality. Rather it is aiming for the tried-and-tested GP + likelihood function API, and providing a robust and well-defined API + collection of approximate inference algorithms to deal with this.

API

Taking a bottom-up approach to design, my thinking is that the following basic structure should be sufficient for our needs:

f = GP(m, k)
fx = LatentGP(f, x, ϕ)
log_density(fx, f)

where

f is some GP whose inputs are of type Tx,
x is some subtype of AbstractVector{Tx},
ϕ is a function from AbstractVector{<:Real} to Real that computes the log likelihood a particular sample from f at x, and
log_density(fx, f) := logpdf(fx, f) + ϕ(f) (it's not clear to me whether this function is ever non-trivial)

This structure encompasses all of the standard things that you'll see in ML, but is a little more general, as the likelihood function isn't restricted to be independent over outputs. To make things convenient for users, we can set up a couple of common cases of ϕ such as factorised likelihoods: a type that specifies that ϕ(f) = sum(n -> ϕ[n](f[n]), eachindex(x)), and special cases of likelihoods for classification etc (the various things implemented in GPML). I've not figured out exactly what special cases we want here, so we need to put some thought into that.

This interface obviously precludes expressing that the likelihood is a function of entire sample paths from f -- see e.g. [1] for an instance of this kind of thing. I can't imagine this being too much of an issue as all of the techniques for actually working with such likelihoods necessarily involve discretising the function, which we can handle. This means that they can still be implemented in an only slightly more ugly manner. If this does turn out to be an actual issue for a number of users, we can always generalise the likelihood a bit.

Note that this approach feels quite stan-like, in that it just requires the user to specify a likelihood function.

Approximate Inference + Approximate Inference Interface

This is the bit of the design that I'm least comfortable with. I think that we should focus on getting NUTS / ESS working in the first instance, but it's not at all clear to me what the appropriate interface is for approximate inference with MCMC, given that we're working outside of a PPL. In the first instance I would propose to simply provide well documented examples that show how to leverage the above structure in conjunction with e.g. AdvancedHMC to perform approximate inference. It's possible that we really only want to provide this functionality at the GPML.jl level, since you really need to include all of the parameters of the model, both the function f and any kernel parameters, to do anything meaningful.

The variational inference setting is probably a bit clearer what to do, because you can meaningfully talk about ELBOs etc without talking too much about any kernel parameters. e.g. we might implement function along the lines of elbo(fx, q), where q is some approximate posterior over f(x). It's going to be a little bit down the line before we start looking at this though, possibly we won't get to it at all over the summer, although it would definitely be good to look at how to get some of the stuff from AugmentedGaussianProcesses into this package. @theogf do you have any thoughts on the kinds of things that would be necessary from an interface-perspective to make this feasible?

Summary

In short, this package is likely to be quite small for a while -- more or less just a single new type and some corresponding documentation while we consider MCMC. I would envisage that this package will come into its own when we really start going for variational inference a little bit further down the line.

@yebai @sharanry @devmotion @theogf -- I would appreciate your input.

[1] - Cotter, Simon L., et al. "MCMC methods for functions: modifying old algorithms to make them faster." Statistical Science (2013): 424-446.

`expected_loglikelihood` design

Open question from #42:

expected_loglik takes a Vector y, a Vector q_f, and a lik function that maps from scalar f to UnivariateDistribution; expected_loglik then does the broadcasting internally. Is that what we want?

Would it be better (cleaner) to have expected_loglik handle scalars only (but this might result in some performance loss for e.g. recomputing the Gauss-Hermite points and weights for each data point)?

Should we expect lik to take the full Vector fs and return e.g. a Product() of UnivariateDistributions (though this might make the expectation code more complicated)?

How would we want to handle the heteroskedastic case, where we do want to include the correlations across the two outputs at each data point, but independence between different data points (not sure how we would handle that on the AbstractGPs side given that multi-output is all "output as input")?

Could extend it so you can directly pass a FiniteGP as follows:

function expected_loglikelihood(
    quad, lik::AbstractFactorizingLikelihood, q_f::AbstractMvNormal, y::AbstractVector
)
    mean, std = mean_and_std(q_f)
    return expected_loglikelihood(quad, lik, Normal.(mean, std), y)
end

Would have to think how to handle likelihoods depending on multiple function evaluations (e.g. heteroskedastic likelihood with a vector-valued GP).

Abandon `AbstractLink` type

All that I need to specify a likelihood is the mapping from f to a distribution, and in many cases the only degree of freedom is the inverse link function: it would be super convenient if I could just write PoissonLikelihood(log1pexp) to try out a softplus invlink, without having to code up a subtype of AbstractLink and having to remember what I have to do to implement it.

Improve `expected_loglikelihood` differentiability and type-stability

For convenience, the old version of expected_loglikelihood (Gauss-Hermite quadrature method) looked like this:

GPLikelihoods.jl/src/expectations.jl

Lines 83 to 109 in e9b7da9

    
           # Compute the expected_loglikelihood over a collection of observations and marginal distributions 
        
           function expected_loglikelihood( 
        
               gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector 
        
           ) 
        
               # Compute the expectation via Gauss-Hermite quadrature 
        
               # using a reparameterisation by change of variable 
        
               # (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature) 
        
               return sum(Broadcast.instantiate( 
        
                   Broadcast.broadcasted(y, q_f) do yᵢ, q_fᵢ  # Loop over every pair 
        
                       # of marginal distribution q(fᵢ) and observation yᵢ 
        
                       expected_loglikelihood(gh, lik, q_fᵢ, yᵢ) 
        
                   end, 
        
               )) 
        
           end 
        
           # Compute the expected_loglikelihood for one observation and a marginal distributions 
        
           function expected_loglikelihood(gh::GaussHermiteExpectation, lik, q_f::Normal, y) 
        
               μ = mean(q_f) 
        
               σ̃ = sqrt2 * std(q_f) 
        
               return invsqrtπ * sum(Broadcast.instantiate( 
        
                   Broadcast.broadcasted(gh.xs, gh.ws) do x, w # Loop over every 
        
                       # pair of Gauss-Hermite point x with weight w 
        
                       f = σ̃ * x + μ 
        
                       loglikelihood(lik(f), y) * w 
        
                   end, 
        
               )) 
        
           end

#90 introduced a work-around/hack for two (possibly interrelated) issues of that implementation:

Computing the gradient with Zygote was 1-3 orders of magnitude slower than expected, when called with NegativeBinomialLikelihood,
The function was generally type-unstable #77.

The type-instability of the broadcast could be related to JuliaLang/julia#45748 (see also JuliaGaussianProcesses/KernelFunctions.jl#458, which documents a strange behavior in which inference depends on previous evaluations, which I also observed in the Broadcast construction, but could not resolve).

I therefore tried to get rid of the Broadcast entirely, hoping that type stability would improve performance. First I tried a custom implementation of pairwise sum for two function arguments (i.e. I implemented sum(f, X, Y), which is equivalent to mapreduce(f, +, X, Y) based on the implementation of mapreduce(f, +, X) in Base, see #77 for the reason behind this). That implementation can be found in here.

Although that implementation makes the function type stable, it was still not very performant. For this reason, in #90 I chose an implementation which allocates an explicit array, which I believed to be more Zygote-friendly. The large improvements over the old versions seen in the benchmarks confirmed this intuition (see #90 (comment)).

However, the solution is not very clean, as it pays for this performance improvement with additional allocations in the forward pass. It is also unclear whether this implementation will be as beneficial or generalize to other AD backends. The potentially better approach would be to define an rrule for the broadcasted sum, as suggested in #90 (comment) (although even then it would be nice to have the function be type-stable anyway).

Package name

I have the feeling that LikelihoodFunctions is a bit too general for this package. If I understand correctly, this package aims at propose models with Gaussian and non-Gaussian likelihoods. Shouldn't the name be then GPLikelihoods, NonConjugateGPs or something like this?

Variational Inference Design

Here is to discuss about the design concerning variational inference methods.

So far in AugmentedGaussianProcesses.jl things are done this way:

There are two functions ∇E_μ and ∇E_Σ which return the gradient of the expected log likelihood given mu and Sigma (the second one returns an arrays as usually you don't need the covariance terms of Sigma).
Then the gradient given the ELBO is actually given the natural gradient because there is the relationship nat. grad. of ELBO given natural parameters (eta_1 (Sigma^-1 mu), eta_2 (-0.5Sigma^-1)) is equal to normal grad. of ELBO given normal parameters (mu, Sigma).
- With the augmentation procedure it's possible to get analytical gradients and set the gradient to 0 and it becomes a coordinate ascent update (or a partial one with a learning rate for the stochastic case)
- Without it, I just use a natural grad. ascent scheme.
Then natural parameters are mapped back to normal parameters.

Categorical takes C-1 inputs for C classes

Currently, the CategoricalLikelihood is defined to take a vector of C-1 inputs to produce a distribution with C classes by appending a 0 to the input vector before going through the softmax.

This is fine for simple stuff, but I think there are some cases where you'd want to give all C inputs - e.g. if you're doing multi-class classification with a different kernel for each class, I don't think this would be possible with the current version?

Should I make a PR to change it or is there a reason to keep it as is? (would always still be possible to use the current version by appending a zero yourself)

@willtebbutt

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

BernoulliLogit for Bernoulli{<:LogisticLink}

@devmotion Would that be okay to copy the code of BernoulliLogit and BinomialLogit from Turing to have a special case on Bernoulli{<:LogisticLink} ?

Interface with Distributions

Following issue #231 in AbstractGPs.jl, I would like to propose an interface that allows to automatically use Distributions.jl's large library of distributions. Something like (I haven't tested this yet):

struct GPLikelihood{T <: Distribution, Tl <: AbstractLink} <: AbstractLikelihood
    d::T
    invlink::Tl # takes a value of the latent variable and outputs a parameter vector for `d`
end

(l::GPLikelihood)(f::Real) = l.d(l.invlink(f)...)

function (l::GPLikelihood)(fs::AbstractVector{<:Real})
    return Product((l.d).(l.invlink.(fs)...)) # might need different notation to broadcast correctly
end

and then have convenience structs for special cases:

const BernoulliLikelihood(invlink) =  GPLikelihood(Bernoulli, invlink)
const CategoricalLikelihood(invlink) = GPLikelihood(Categorical, invlink)
....

Would this be useful?

Docs not published

https://juliagaussianprocesses.github.io/GPLikelihoods.jl/ (whether stable/ or dev/) gives a 404

Package registration, name, interaction with rest of ecosystem

"GPLikelihoods.jl" is currently not a registered package. This makes it more difficult than necessary for users of this package such as https://github.com/rossviljoen/SparseGPs.jl. It is overdue to register the code inside this repo.

There are concerns about too many small packages - we could move this code into a separate package - what package would that be?
This package currently has no dependency in either direction with AbstractGPs.jl - it is used by SparseGPs.jl, but could also be used e.g. by GLM implementations (such as https://github.com/JuliaStats/GLM.jl) - can we combine forces, and perhaps rename to "just" LikelihoodFunctions.jl (as proposed in #4)?
Update: GLM.jl defines link functions in https://github.com/JuliaStats/GLM.jl/blob/master/src/glmtools.jl

Release 0.4.5 has not been tagged

Looks like there was a typo in the @JuliaRegistrator command: 4b33f7a#commitcomment-86096069

Have a `logpdf` and `loglikelihood` for `NamedTuple` of f and y

Right now to compute logpdf or loglikelihood given a pair of GP samples f and observations y one needs to do:

logpdf(likelihood(f), y)

would that make sense to have a small wrapper

function logpdf(like::Likelihood, f_y::NamedTuple)
  logpdf(likelihood(f_y.f), f_y.y)
end

since this is what rand would return for a LatentGP :
https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/blob/e7c67bd24c454f0b8179032ecbb82284887159c0/src/latent_gp.jl#L37

(and similarly for loglikelihood)

`CategoricalLikelihood` should contain the input dimensionality

I am planning to write a nlatent function for the package, returning the dimensionality of the input needed for the likelihood.
It is very practical to initialize things with a given likelihood.

However, the current version of CategoricalLikelihood has no way of knowing the number of latent needed. what do you think about enforcing the number of outputs in the likelihood?

Some parametrizations of `NegativeBinomial` can underflow `p`

With the NBParamII for sufficiently large values of f (something like 800), p will be equal to zero and many estimator starts to break down. This might be something to be dealt in Distributions.jl though.

Unify `link/invlink` in the docs and code

Right now it's quite inconsistent through the code

Remove type aliases for `ExpLink` etc. and/or the constructors?

This issue, or rather discussion of the desired API, came up in #48.

With this PR now ExpLink etc. is a type alias of Link{typeof(exp)} and ExpLink() calls Link(exp). Intentionally, the PR was non-breaking. However, it leads to the following questions:

Should we deprecate the type aliases and remove them eventually?
Should we deprecate the constructors and remove them eventually?

Personally, I am favour of 2 and probably of 1:

I don't see a clear advantage of eg. ExpLink over Link(exp) but there would be fewer definitions and the API would be cleaner if the former would be removed.
It is less convenient to dispatch on Link{typeof(exp)} instead of ExpLink. However, I think it is questionable if this use case is a strong enough argument to include type aliases in GPLikelihoods. Instead in my opinion it would be sufficient if users and downstream packages define (possibly internal) type aliases if it is useful for their implementation. So currently I think type aliases should only be included but not necessarily exported if they are useful and used in GPLikelihoods.

@theogf suggested to add a macro @link CosLink cos that would define both the type alias and constructor (hence it assumes that we don't deprecate any of them). I'm not in favour of a macro, both because I think we should deprecate probably both definitions and because I think the definitions are already quite short and simple and so the macro feels too magical given these limited benefits.

AbstractGPs Dependency

Do we even need to depend on AbstractGPs in this package? AFAICT, there's no functionality which actually depends on it...

Question: Scaled Poisson expectation

I'm trying to implement the GP with a Poisson likelihood whose parameter may vary with the observations¹:
$$ y_i \sim Poisson(e_i exp(f_i))$$
where f_i is the GP. Is this possible?

Approximate inference for disease mapping with sparse Gaussian processes, Jarno Vanhatalo, Ville Pietiläinen, Aki Vehtari (Eq. 1) ↩

	function expected_loglikelihood(
	gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector
	)
	# Compute the expectation via Gauss-Hermite quadrature
	# using a reparameterisation by change of variable
	# (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature)
	return sum(Broadcast.instantiate(
	Broadcast.broadcasted(y, q_f) do yᵢ, q_fᵢ # Loop over every pair
	# of marginal distribution q(fᵢ) and observation yᵢ
	expected_loglikelihood(gh, lik, q_fᵢ, yᵢ)
	end,
	))
	end

	# Compute the expected_loglikelihood over a collection of observations and marginal distributions
	function expected_loglikelihood(
	gh::GaussHermiteExpectation, lik, q_f::AbstractVector{<:Normal}, y::AbstractVector
	)
	# Compute the expectation via Gauss-Hermite quadrature
	# using a reparameterisation by change of variable
	# (see e.g. en.wikipedia.org/wiki/Gauss%E2%80%93Hermite_quadrature)
	return sum(Broadcast.instantiate(
	Broadcast.broadcasted(y, q_f) do yᵢ, q_fᵢ # Loop over every pair
	# of marginal distribution q(fᵢ) and observation yᵢ
	expected_loglikelihood(gh, lik, q_fᵢ, yᵢ)
	end,
	))
	end

	# Compute the expected_loglikelihood for one observation and a marginal distributions
	function expected_loglikelihood(gh::GaussHermiteExpectation, lik, q_f::Normal, y)
	μ = mean(q_f)
	σ̃ = sqrt2 * std(q_f)
	return invsqrtπ * sum(Broadcast.instantiate(
	Broadcast.broadcasted(gh.xs, gh.ws) do x, w # Loop over every
	# pair of Gauss-Hermite point x with weight w
	f = σ̃ * x + μ
	loglikelihood(lik(f), y) * w
	end,
	))
	end