Giter Site home page Giter Site logo

fcit's Introduction

License

A Fast Conditional Independence Test (FCIT).

Introduction

Let x, y, z be random variables. Then deciding whether P(y | x, z) = P(y | z) can be difficult, especially if the variables are continuous. This package implements a simple yet efficient and effective conditional independence test, described in [link to arXiv when we write it up!]. Important features that differentiate this test from competition:

  • It is fast. Worst-case speed scales as O(n_data * log(n_data) * dim), where dim is max(x_dim + z_dim, y_dim). However, amortized speed is O(n_data * log(n_data) * log(dim)).
  • It applies to cases where some of x, y, z are continuous and some are discrete, or categorical (one-hot-encoded).
  • It is very simple to understand and modify.
  • It can be used for unconditional independence testing with almost no changes to the procedure.

We have applied this test to tens of thousands of samples of thousand-dimensional datapoints in seconds. For smaller dimensionalities and sample sizes, it takes a fraction of a second. The algorithm is described in [arXiv link coming], where we also provide detailed experimental results and comparison with other methods. However for now, you should be able to just look through the code to understand what's going on -- it's only 90 lines of Python, including detailed comments!

Usage

Basic usage is simple, and the default settings should work in most cases. To perform an unconditional test, use dtit.test(x, y):

import numpy as np
from fcit import fcit

x = np.random.rand(1000, 1)
y = np.random.randn(1000, 1)

pval_i = fcit.test(x, y) # p-value should be uniform on [0, 1].
pval_d = fcit.test(x, x + y) # p-value should be very small.

To perform a conditional test, just add the third variable z to the inputs:

import numpy as np
from fcit import fcit

# Generate some data such that x is indpendent of y given z.
n_samples = 1000
z = np.random.dirichlet(alpha=np.ones(2), size=n_samples)
x = np.vstack([np.random.multinomial(20, p) for p in z]).astype(float)
y = np.vstack([np.random.multinomial(20, p) for p in z]).astype(float)

# Check that x and y are dependent (p-value should be uniform on [0, 1]).
pval_d = fcit.test(x, y)
# Check that z d-separates x and y (the p-value should be small).
pval_i = fcit.test(x, y, z)

Installation

pip install fcit

Requirements

Tested with Python 3.6 and

  • joblib >= 0.11
  • numpy >= 1.12
  • scikit-learn >= 0.18.1
  • scipy >= 0.16.1

fcit's People

Contributors

kjchalup avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.