
gpd's Introduction

gpd - Gaussian process derivatives for marginal effects

Package Dependencies:

  • GNU Octave version 3.2.x+ or MATLAB version 7.x+
  • gpml-matlab-v3.6-2015-07-07

Package Summary: This package adds functionality that allows calculating marginal effects from Gaussian process regression models trained in the MATLAB package gpml. To my knowledge, no other MATLAB package provides this functionality. Only {@meanZero} mean functions and {@covSEiso} or {@covSEard} covariance functions are supported at this time. Future functionality will include support for:

  1. {@meanConst}, {@meanLinear}, and {@meanSum, {@meanLinear, @meanConst}} mean functions,
  2. sums of {@covSEiso} and {@covSEard} covariance functions, and
  3. covariate masking for mean and covariance functions.
    The first feature to be added will be sums of {@covSEiso} and {@covSEard}; the second will be support for masking in the covariance functions. Mean functions and mean function masking will be added last.

Package Description: This package implements a routine to estimate derivatives of Gaussian process regression models based on notes by Andrew McHutchon, a former Ph.D. student associated with Carl Edward Rasmussen's machine learning group. Andrew McHutchon's Cambridge website is no longer operational, so these notes were accessed through the most recent archived version available from the Wayback Machine. The notes are dated April 17, 2013, and the most recent Wayback Machine archive is from February 25, 2021. The purpose of the notes appears to be calculating Taylor series approximations to help improve GP estimates, rather than calculating marginal effects for general inference with GPs. A number of applications in engineering and the natural sciences cite this unpublished working paper; citations of Andrew McHutchon's "Differentiating Gaussian Processes" working paper can be found on Semantic Scholar.

This method has been implemented in political science in a published paper and a working paper by JBrandon Duck-Mayr. JB has implemented routines from these two GP papers in the R packages gpirt and gpmss.

  1. GPIRT: A Gaussian Process Model for Item Response Theory
  2. Inference in Gaussian Process Models for Political Science

Yehu Chen has GP derivatives implemented in some of his repositories, including an ordinal extension to the GPIRT model made by JB and his co-authors.

Herb Susmann also has applications of derivatives of Gaussian processes on his blog.

  1. July 7, 2020 - Derivatives of a Gaussian Process
  2. December 1, 2020 - Conditioning on Gaussian Process Derivative Observations
  3. December 11, 2020 - Derivative Gaussian Processes in Stan

Table of Contents

1. Using the gpd package

This section describes how to use package functions and provides a demo for general usage.

1.1 Package functions

This section details each function in the package: the Inputs it requires, the Outputs it returns, and a Description of what it does.

There are five functions in the package gpd.

  1. [ mean_vec, var_vec ] = me(hyp, meanfunc, covfunc, X, y, xs)
  2. [ MEs, VARs ] = pme(hyp, meanfunc, covfunc, X, y, Xs)
  3. [gmm_mean, gmm_mean_var, cred95] = ame(hyp, meanfunc, covfunc, X, y, Xs)
  4. plt = plotme(d, hyp, meanfunc, covfunc, X, y, Xs, ~)
  5. [ plt, gridX ] = gridme(d, numsteps, hyp, meanfunc, covfunc, X, y, interaction_indices)

All functions rely on having already trained a Gaussian process regression model in gpml. In notation below, the input hyp is a struct of the learned hyperparameters from training a Gaussian process regression model. For example,

hyp = minimize_v2(initial_hyp, @gp, p, inffunc, meanfunc, covfunc, likfunc, normalize(X), normalize(y));

This package assumes MAP estimates are used, but nothing precludes the user from passing HMC estimates for the model hyperparameters as long as they have the appropriate struct format.
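For reference, a minimal sketch of the struct layout the package expects, assuming {@meanZero}, {@covSEard}, and a Gaussian likelihood, and following gpml's hyperparameter conventions (the variable X is assumed to exist from your own data):

```matlab
D = size(X, 2);                      % number of covariates
initial_hyp = struct( ...
    'mean', [], ...                  % {@meanZero} takes no hyperparameters
    'cov',  zeros(D + 1, 1), ...     % covSEard: D log length scales + log signal std
    'lik',  0);                      % likGauss: log noise std
% For {@covSEiso}, 'cov' would instead be zeros(2, 1):
% one shared log length scale plus the log signal std.
```

The learned hyp returned by minimize_v2 has this same field structure, which is what the gpd functions below consume.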

1.1.1 [ mean_vec, var_vec ] = me(hyp, meanfunc, covfunc, X, y, xs)

Inputs:

  • hyp - Model hyperparameters learned from the parent gpml model
  • meanfunc - The meanfunc of the corresponding gpml model (only {@meanZero} is supported at this time)
  • covfunc - The covfunc of the parent gpml model (only {@covSEiso} and {@covSEard} are supported at this time)
  • X - The NxD non-normalized training inputs used when training the parent gpml model
  • y - The Nx1 non-normalized training outputs used when training the parent gpml model
  • xs - A 1xD non-normalized test point

Outputs: Two Dx1 vectors, where D is the number of columns in X. The first output is the Dx1 vector of expected marginal effects w.r.t. explanatory variables k = 1, ..., D evaluated at test input xs. The second output is the Dx1 vector of diagonal entries of the variance-covariance matrix associated with the first output. Since only diagonal entries are reported, me(hyp, meanfunc, covfunc, X, y, xs) returns the marginal distribution of the expected marginal effect of each explanatory variable evaluated at test point xs.

Description: The function me() calculates the marginal effect at a single test point. Core functionality of the package relies on the me() function. For any test point xs, it calculates the Dx1 vector of expected marginal effects with respect to each explanatory variable k = 1, ..., D and the Dx1 vector of variances associated with those expected marginal effects. The vector of variances (the second output) is pulled from the diagonal of the variance-covariance matrix associated with the first output.
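A hypothetical call, assuming hyp, meanfunc, covfunc, X, and y come from an already trained gpml model as described above:

```matlab
xs = mean(X, 1);                                  % 1xD test point at the covariate means
[mean_vec, var_vec] = me(hyp, meanfunc, covfunc, X, y, xs);
% mean_vec(k) is the expected marginal effect of covariate k at xs.
% A pointwise 95% interval for each covariate (Dx2, via implicit expansion):
ci95 = mean_vec + [-1.96, 1.96] .* sqrt(var_vec);
```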

1.1.2 [ MEs, VARs ] = pme(hyp, meanfunc, covfunc, X, y, Xs)

Inputs:

  • hyp - Model hyperparameters learned from the parent gpml model
  • meanfunc - The meanfunc of the corresponding gpml model (only {@meanZero} is supported at this time)
  • covfunc - The covfunc of the parent gpml model (only {@covSEiso} and {@covSEard} are supported at this time)
  • X - The NxD non-normalized training inputs used when training the parent gpml model
  • y - The Nx1 non-normalized training outputs used when training the parent gpml model
  • Xs (optional) - A MxD non-normalized matrix of test points

Outputs: Two DxM matrices. The jth column of the first output is the Dx1 vector of expected marginal effects associated with the jth test point Xs(j,:). The jth column of the second output is the Dx1 vector of variances associated with the expected marginal effect of each of the D explanatory variables evaluated at test point Xs(j,:).

Description: The function pme() predicts marginal effects for a set of test points. It calls me(hyp, meanfunc, covfunc, X, y, Xs(j,:)) for each row j = 1, ..., M of Xs. Since calling me() on a single test point xs produces two Dx1 vectors, calling pme() on MxD test data Xs produces two DxM matrices. When Xs is omitted from the function call, the function assumes that marginal effect calculations are made with respect to the training inputs (i.e., Xs = X).
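A hypothetical batch call over the training sample (again assuming a trained gpml model):

```matlab
% Marginal effects at every training input (omitting Xs defaults to Xs = X).
[MEs, VARs] = pme(hyp, meanfunc, covfunc, X, y);
% MEs and VARs are DxN here. The trace of covariate k's marginal effect
% across the sample, with a rough pointwise 95% band:
k = 1;
upper = MEs(k, :) + 1.96 * sqrt(VARs(k, :));
lower = MEs(k, :) - 1.96 * sqrt(VARs(k, :));
```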

1.1.3 [gmm_mean, gmm_mean_var, cred95] = ame(hyp, meanfunc, covfunc, X, y, Xs)

Inputs:

  • hyp - Model hyperparameters learned from the parent gpml model
  • meanfunc - The meanfunc of the corresponding gpml model (only {@meanZero} is supported at this time)
  • covfunc - The covfunc of the parent gpml model (only {@covSEiso} and {@covSEard} are supported at this time)
  • X - The NxD non-normalized training inputs used when training the parent gpml model
  • y - The Nx1 non-normalized training outputs used when training the parent gpml model
  • Xs (optional) - A MxD non-normalized matrix of test points

Outputs: A Dx1 vector whose kth entry is the expected marginal effect of the kth explanatory variable averaged across the M test inputs in Xs (i.e., a mean of marginal effects); a Dx1 vector of the corresponding sample variances (i.e., the variance of marginal effects); and a Dx2 matrix of 95% credible intervals, one row per dimension. When Xs is omitted from the input, calculations are made on the training sample so that Xs = X.

Description: The function ame() calculates average marginal effects for all explanatory variables with respect to a set of test points. It calls pme(hyp, meanfunc, covfunc, X, y, Xs) and aggregates the results across the test inputs using the method of moments. When Xs is omitted from the function call, the function assumes that marginal effect calculations are made with respect to the training inputs (i.e., Xs = X).
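A hypothetical call summarizing effects over the training sample:

```matlab
% Average marginal effects over the training sample (Xs omitted, so Xs = X).
[gmm_mean, gmm_mean_var, cred95] = ame(hyp, meanfunc, covfunc, X, y);
% gmm_mean(k) plays a role loosely analogous to a regression coefficient
% for covariate k; cred95(k, :) gives its 95% credible interval.
disp([gmm_mean, sqrt(gmm_mean_var), cred95]);   % Dx4 summary table
```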

1.1.4 plt = plotme(d, hyp, meanfunc, covfunc, X, y, Xs, ~)

Inputs:

  • d - The explanatory variable d in { 1, ..., D } for which plots will be made
  • hyp - Model hyperparameters learned from the parent gpml model
  • meanfunc - The meanfunc of the corresponding gpml model (only {@meanZero} is supported at this time)
  • covfunc - The covfunc of the parent gpml model (only {@covSEiso} and {@covSEard} are supported at this time)
  • X - The NxD non-normalized training inputs used when training the parent gpml model
  • y - The Nx1 non-normalized training outputs used when training the parent gpml model
  • Xs (optional) - A MxD non-normalized matrix of test points
  • ~ (optional) - A placeholder used by the function gridme() that denotes whether to omit some plot information; any non-empty input in the eighth position activates this feature

Outputs: A plot object.

Description: The function plotme() plots marginal effects for explanatory variable d. It is the main plotting function of the package; other plotting functions such as gridme() (see below) depend on it. It returns a plot object that is already labeled appropriately. When the final two inputs are omitted, calculations are made with respect to the training sample so that Xs = X. When Xs is specified but the final input is omitted, calculations are made with respect to predictions over the test inputs Xs. When any value is passed to the final (eighth) input, plotme() omits some plot information so the plot can be customized for plotting interactions; gridme() uses this input whenever it is passed inputs indicating interactions are of interest.
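Hypothetical calls covering the three input patterns described above (hyp, meanfunc, covfunc, X, y from a trained gpml model; Xs an MxD test matrix you supply):

```matlab
% Plot the marginal effect of the first covariate over the training sample.
plt = plotme(1, hyp, meanfunc, covfunc, X, y);
% The same plot over explicit test points Xs (MxD):
plt = plotme(1, hyp, meanfunc, covfunc, X, y, Xs);
% Pass anything non-empty in the eighth position to strip plot details,
% as gridme() does internally when plotting interactions:
plt = plotme(1, hyp, meanfunc, covfunc, X, y, Xs, true);
```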

1.1.5 [ plt, gridX ] = gridme(d, numsteps, hyp, meanfunc, covfunc, X, y, interaction_indices)

Inputs:

  • d - The explanatory variable d in { 1, ..., D } for which plots will be made. Gridding on dimension d creates numsteps observations between min(X(:,d)) - 2*sqrt(var(X(:,d))) and max(X(:,d)) + 2*sqrt(var(X(:,d))), with all other explanatory variables held at their means.
  • numsteps - The number of grid points to create along each gridded dimension
  • hyp - Model hyperparameters learned from the parent gpml model
  • meanfunc - The meanfunc of the corresponding gpml model (only {@meanZero} is supported at this time)
  • covfunc - The covfunc of the parent gpml model (only {@covSEiso} and {@covSEard} are supported at this time)
  • X - The NxD non-normalized training inputs used when training the parent gpml model
  • y - The Nx1 non-normalized training outputs used when training the parent gpml model
  • interaction_indices (optional) - A vector with length between 2 and D, with unique integer entries between 1 and D, that specifies which dimensions of the explanatory variables are to be gridded. Each gridded dimension k is covered by numsteps observations between min(X(:,k)) - 2*sqrt(var(X(:,k))) and max(X(:,k)) + 2*sqrt(var(X(:,k))). All dimensions not listed in interaction_indices are held at their means.

Outputs: A plot object and the gridded data used to generate the plot.

Description: The function gridme() creates gridded plots of marginal effects for explanatory variable d, automating part of the prediction process. It calls plotme() on gridded data for the dth dimension, with all other explanatory variables held at their means; the grid has numsteps points. When interaction_indices is specified, all dimensions listed in interaction_indices = [k1, k2, ...] are gridded when making predictions, and all other dimensions are held at their means.
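Hypothetical calls, with and without interactions (hyp, meanfunc, covfunc, X, y from a trained gpml model):

```matlab
% Grid covariate 2 with 100 steps, holding other covariates at their means.
[plt, gridX] = gridme(2, 100, hyp, meanfunc, covfunc, X, y);
% Grid covariates 1 and 2 jointly to inspect their interaction:
[plt, gridX] = gridme(1, 100, hyp, meanfunc, covfunc, X, y, [1 2]);
```

The returned gridX can be reused, e.g., as the Xs input to pme() for further analysis of the same grid.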

1.2 Package demo

In this subsection, I demonstrate general usage of the gpd package. I create data having known properties and demonstrate each function's usage.

Links to the following repo code correspond to the figures in the Simulations section of this README. Working step by step through a variety of simple data generating processes, and estimating their anticipated effects, is a useful way to build intuition for how to use the gpd package.

2. Simulations

This section provides plots from gridme() which demonstrate the method more than the gpd package itself. In particular, the plots for data generating processes with interactions between explanatory variables show that the method is extremely flexible when fitting marginal effects that may vary with respect to other explanatory variables.

2.1 Univariate functions

Gaussian process regression models applied to functions with a single input are equivalent when specified with either the isotropic squared exponential covariance function ({@covSEiso} in gpml) or the automatic relevance determination squared exponential covariance function ({@covSEard} in gpml).

$$X \sim N(0,1)$$

2.1.1 Linear, quadratic, and cubic expansions

[Figures: linear, quadratic, and cubic data generating processes]

2.2 Bivariate functions with independent normal covariates

With two function inputs, models trained using the isotropic squared exponential covariance function ({@covSEiso} in gpml) and the automatic relevance determination squared exponential covariance function ({@covSEard} in gpml) are distinct. In the former, covariates share a length scale. In the latter, each covariate has its own length scale. Independent normal covariates in the data generating process mean that there isn't covariance between inputs to help with learning. Model performance in this section will not be as good as in Section 2.3 when the bivariate functions are generated with jointly normal covariates.

$$X \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right)$$

2.2.1 A linear bivariate function with independent normal covariates

[Figures: covSEiso and covSEard marginal effects for X1 and X2]

2.2.2 A linear bivariate function with interactions between independent normal covariates

[Figures: covSEiso and covSEard marginal effects for X1 and X2]

2.3 Bivariate functions with jointly normal covariates

Jointly normal covariates in the data generating process should imply that covariance between inputs can help with learning a better model fit. Model performance in this section should be superior to Section 2.2 where the bivariate functions are generated with independent normal covariates.

$$X \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{bmatrix} 1 & \pm 0.9 \\ \pm 0.9 & 1 \end{bmatrix} \right)$$

2.3.1 A linear bivariate function with jointly normal covariates

2.3.1.1 Positive correlation

[Figures: covSEiso and covSEard marginal effects for X1 and X2]

2.3.1.2 Negative correlation

[Figures: covSEiso and covSEard marginal effects for X1 and X2]

2.3.2 A linear bivariate function with interactions between jointly normal covariates

2.3.2.1 Positive correlation

[Figures: covSEiso and covSEard marginal effects for X1 and X2]

2.3.2.2 Negative correlation

[Figures: covSEiso and covSEard marginal effects for X1 and X2]

gpd's People

Contributors

johnsontr


gpd's Issues

Insufficient gridding for gridme

Default usage of gridme suggests that holding other explanatory variables at their means is a useful representation of the marginal effect of interest. I think this is suppressing useful information from the training data and making the standard errors around predicted marginal effects too rigid and perhaps too large. Test whether this is the case by specifying all dimensions of the data as interactions so that everything is gridded. Do plots look more interesting? If so, branch a fix for gridme functionality.

Extraneous interaction plot information

Interaction plots have some leftover information presented in them, like the mean of sample marginal effects as a labeled number on the Y axis. Verify interaction plots don't contain any more information than necessary.

Interaction plots shifted for interactions

I think I am plotting interaction effects wrong. For the linear bivariate function with interactions between jointly normal covariates, the mean function plot is shifted compared to the predicted marginal effects. For the same DGP using independent normal covariates, the mean function plot seems to be rotated. I think I am plotting w.r.t. the wrong axis for these interactions, and the predicted marginal effects should exactly overlap the mean function. I might want two plotting functions: one for plotting marginal effects without interactions, and another for plotting marginal effects where I need more control over the axes than plotme() currently allows when nargin is 8.
