Giter Site home page Giter Site logo

cpd's Introduction

CPD: Algorithms for Conditioned Symmetric Positive Definite Matrix Estimation Under Constraints

This code implements several methods for solving the problem of finding a symmetric positive definite (SPD) matrix that minimizes an objective functional under the constraint of a bounded condition number. This work was motivated by a specific application: estimating the covariance matrix of the Empirical Best Linear Prediction (EBLP) model. Specific API calls are provided for this use case, but also provided for the general "near-PD" case.

CPD: the problem of finding a symmetric positive definite (SPD) matrix that satisfies a condition (e.g., method of moments equations for a covariance matrix satisfied to a given tolerance) and is well-conditioned. Typically, one cannot obtain both, and we need to find a point of optimal tradeoff between accuracy (small constraint residual) and stability (small condition number).

Near PD: a special case of CPD where the functional is the Frobenius norm between a given matrix and the desired SPD matrix with bounded condition number. That is, the task is to find a conditioned SPD matrix nearest to a matrix. The regularization parameter is either the condition number bound or a trade-off parameter as in the CPD formulation.

RCO (Regularized Cholesky Optimization): an algorithm for solving CPD that minimizes a regularized functional of the matrix that balances minimizing constraint satisfaction plus the (approximate) condition number of the matrix. The minimization is performed for the entries of the Cholesky factor. The algorithm:

  • Using a continuation method to compute the entire ROC curve fast.
  • Each continuation step optimization is solved by Newton-CG.
  • The optimal regularization parameter is picked around the knee of the curve (the default is by a given residual tolerance).

** TN (Tanaka-Nakata): implementation of the paper Tanaka, M., Nakata, N., Positive definite matrix approximation with condition number constraint", Opt. Let. 8(3), 2014 for finding a bounded condition number nearest SPD matrix. Uses a new O(n) dynamic programming algorithm to solve the eigenvalue optimization problem.

Installation

  • Install conda.
  • Create a conda environment from the attached environment.yml: conda env create -f environment.yml.
  • If you'd also like to develop/run unit tests, use the full environment file instead: conda env create -f environment.full.yml.
  • Add src to your PYTHONPATH.

Full Environment for Testing & Development

  • Complete the installation steps above, but create a conda environment using conda env create -f environment_full.yml. instead of environment.yml.
  • Install nodejs (e.g., on mac, brew install nodejs).
  • Run jupyter labextension install @jupyter-widgets/jupyterlab-manager plotlywidget

Testing

The project contains Pytest unit tests for the main modules. To run all tests, run pytest test.

Examples

Near PD

Given an n x n matrix a, call near_pd() to get a conditioned matrix near a in the Frobeius norm norm(b - a)/norm(a).

  b, info = cpd.near_pd.near_pd(a, "rco")      # Regularized Cholesky Optimization
  b, info = cpd.near_pd.near_pd(a, "tn")       # Tanaka-Nakata
  b, info = cpd.near_pd.near_pd(a, "higham")   # Higham near-PD with diagonal perturbation.
  b, info = cpd.near_pd.near_pd(a, "tn", leeway_factor=1.05)       # Stay within 5% Frobenius norm error

EBLP

import cpd
import os

# Load data from files; or crate p, w, c directly.
DATA_DIR = "/path/to/input_files"
a_file_name = os.path.join(DATA_DIR, "A.txt")
c_file_name = os.path.join(DATA_DIR, "C.txt")
w_file_name = os.path.join(DATA_DIR, "Weights_BxB.txt")
pi_file_name = os.path.join(DATA_DIR, "Pi_s.txt")
_, c, w, _, p = cpd.data.load_data(a_file_name, c_file_name, w_file_name, pi_file_name)

# Create optimizer; pass in an ArrayList of arrays p and an array c if not loading the
# data from files as above. Only needs to be called once per RHS term list p.
# Note: p is an cpd.linalg.ArrayList, not a list of matrices.
optimizer = cpd.eblp.create_optimizer("rco", p, w)
# Now call the optimizer fo ra particular LHS matrix c.
g, info = optimizer.optimize(c)

# Objective function value^(1/2) = Relative error in satisfying the moment equations f(G) = C.
print(info[1] ** 0.5)  # 0.0855678

To use the T-N method instead, use

optimizer = cpd.eblp.create_optimizer("tn", p, w)
g, info = optimizer.optimize(c)

General Constraints

Inputs:

  • Constraint functional.
  • Gradient of the constraint functional (for BFGS minimization).
  • Solution quality metric (e.g., the matrix condition number).

Output:

  • Optimal covariance matrix.

Contents

  • data - test data.
  • notebooks: Juypter notebooks.
  • src: source code.
  • test: unit tests.

References

  • Dan, Katherine, JR, Improving Accuracy and Stability of Aggregate Student Growth Measures Using Empirical Best Linear Prediction (JEBStats, in review).

TODO

  • Test another data set.
  • Integrate into R simulation - Katherine.
  • Write research memo/paper.

cpd's People

Contributors

orenlivne avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.