Causing: CAUSal INterpretation using Graphs

Home Page: https://github.com/realrate/Causing

License: MIT License

Python 98.58% Makefile 1.42%
causality-analysis mediation-analysis effects-modeling causal-networks latent-variables dag graph-theory do-calculus structural-equation-modeling structural-analysis

License: MIT Python 3.7

Causing is a multivariate graphical analysis tool helping you to interpret the causal effects of a given equation system.

Get a nice colored graph and immediately understand the causal effects between the variables.

Input: Provide a dataset and an equation system in the form of a Python function. The endogenous variables on the left-hand side are assumed to be caused by the variables on the right-hand side of each equation. In this way, you specify the causal structure as a directed acyclic graph (DAG).

Output: As an output, you will get a colored graph of quantified effects acting between the model variables. You can immediately interpret mediation chains for every individual observation - even for highly complex nonlinear systems.

Here is a table relating Causing to other approaches:

Causing is                                | Causing is NOT
------------------------------------------|-------------------------------------------
✅ causal model given                     | ❌ causal search
✅ DAG directed acyclic graph             | ❌ cyclic, undirected, or bidirected graph
✅ latent variables                       | ❌ just observed / manifest variables
✅ individual effects                     | ❌ just average effects
✅ direct, total, and mediation effects   | ❌ just total effects
✅ structural model                       | ❌ reduced model
✅ small and big data                     | ❌ big data requirement
✅ graphical results                      | ❌ just numerical results
✅ XAI explainable AI                     | ❌ black box neural network

The Causing approach is quite flexible. It can be applied to highly latent models with many of the modeled endogenous variables being unobserved. Exogenous variables are assumed to be observed and deterministic. The most severe restriction certainly is that you need to specify the causal model / causal ordering.

Causal Effects

Causing combines total effects and mediation effects in one single graph that is easy to explain.

The total effect of each variable on the final variable is shown in the corresponding node of the graph. Each total effect is split over the node's outgoing edges, yielding the mediation effects shown on the edges. In the example graph, only education has more than one outgoing edge, so it is the only node whose total effect is split in this way.

The effects differ from individual to individual. To emphasize this, we speak of individual effects. The corresponding graph, which combines total and mediation effects, is called the Individual Mediation Effects (IME) graph.

Software

Causing is free software written in Python 3. Graphs are generated using Graphviz. See the dependencies in setup.py. Causing is available under the MIT license. See LICENSE.

The software is developed by RealRate, an AI rating agency aiming to re-invent the rating market by using AI, interpretability, and avoiding any conflict of interest. See www.realrate.ai.

After cloning or downloading the Causing repository, run python -m causing.examples example. The results are written to the output folder as SVG files. The IME files show the individual mediation effects graph for the respective individual.

See causing/examples for the code generating some examples.

Start your Model

To start your model, you have to provide the following information, as done in the example code below:

  • Define all your model variables as SymPy symbols.
  • Note that in SymPy some operators are special, e.g. Max() instead of max().
  • Provide the model equations in topological order, that is, in order of computation.
  • Then the model is specified with:
    • xvars: exogenous variables
    • yvars: endogenous variables in topological order
    • equations: previously defined equations
    • final_var: the final variable of interest used for mediation effects

1. A Simple Example

Assume a model defined by the equation system:

Y1 = X1

Y2 = X2 + 2 * Y1^2

Y3 = Y1 + Y2.
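As a quick illustration, the equation system can be written as an ordinary Python function that evaluates the equations in topological order. This is only a sketch: it reads the second equation as Y2 = X2 + 2 * Y1^2 (an assumption about the intended exponent), and the function name simple_model is hypothetical rather than taken from the Causing code base.

```python
def simple_model(x1, x2):
    """Evaluate the example equations in order of computation."""
    y1 = x1
    y2 = x2 + 2 * y1**2  # assumes the squared reading of the equation
    y3 = y1 + y2
    return y1, y2, y3


# Evaluate at illustrative inputs X1 = 3 and X2 = 2.
y1, y2, y3 = simple_model(3, 2)
```

With these inputs the function returns Y1 = 3, Y2 = 20, and Y3 = 23.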

This gives the following graphs. Some notes to understand them:

  • The data used consists of 200 observations. They are available for the x variables X1 and X2 with mean(X1) = 3 and mean(X2) = 2. Variables Y1 and Y2 are assumed to be latent / unobserved. Y3 is assumed to be manifest / observed. Therefore, 200 observations are available for Y3.

  • To allow for benchmark comparisons, each individual effect is measured with respect to the mean of all observations.

  • Nodes and edges are colored, showing positive (green) and negative (red) effects they have on the final variable Y3.

  • Individual effects are based on the given model. For each individual, however, its own exogenous data is put into the given graph function to yield the corresponding endogenous values. The effects are computed at this individual point. Individual effects are shown below just for individual no. 1 out of the 200 observations.

  • Total effects are shown below in the nodes. They are split up over the outgoing edges, yielding the mediation effects shown on the edges. Note, however, that only the outgoing edges sum up to the node value; the incoming edges do not. All effects are effects on the final variable of interest, assumed here to be Y3.
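The notes above can be made concrete with a small finite-difference sketch. It is not Causing's implementation: it assumes the second model equation reads Y2 = X2 + 2 * Y1^2, uses the stated means as the base point, and invents a hypothetical individual with X1 = 4 and X2 = 1. Each node effect is the change in Y3 when one variable takes the individual's value while everything else stays at the base point; the edge effects of Y1 split its node effect over its two outgoing edges.

```python
def solve(x1, x2, do=None):
    """Evaluate the model; `do` optionally fixes endogenous values by name."""
    do = do or {}
    y1 = do.get("Y1", x1)
    y2 = do.get("Y2", x2 + 2 * y1**2)
    y3 = y1 + y2
    return {"Y1": y1, "Y2": y2, "Y3": y3}


base_x = {"X1": 3.0, "X2": 2.0}   # means given in the example
indiv_x = {"X1": 4.0, "X2": 1.0}  # hypothetical individual (assumption)

base = solve(base_x["X1"], base_x["X2"])
indiv = solve(indiv_x["X1"], indiv_x["X2"])

# Node effects on Y3: move one variable to the individual's value.
effect_x1 = solve(indiv_x["X1"], base_x["X2"])["Y3"] - base["Y3"]
effect_x2 = solve(base_x["X1"], indiv_x["X2"])["Y3"] - base["Y3"]
effect_y1 = solve(base_x["X1"], base_x["X2"], do={"Y1": indiv["Y1"]})["Y3"] - base["Y3"]

# Edge effects of Y1: let the individual's Y1 act through one edge only.
edge_y1_y3 = (indiv["Y1"] + base["Y2"]) - base["Y3"]      # direct edge Y1 -> Y3
y2_via_y1 = base_x["X2"] + 2 * indiv["Y1"] ** 2            # Y2 driven by Y1
edge_y1_y2 = (base["Y1"] + y2_via_y1) - base["Y3"]         # edge Y1 -> Y2 -> Y3
```

In this sketch the two outgoing edge effects of Y1 (1 and 14) add up to its node effect (15), while the effect of X2 is negative (-1), mirroring the green and red coloring described above.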

Individual Mediation Effects (IME)

As you can see in the right-most graph for the individual mediation effects (IME), there is one green path starting at X1, passing through Y1 and Y2, and finally ending in Y3. This means that X1 is the main cause for Y3 taking on a value above average, with its effect on Y3 being +29.81. However, this positive effect is slightly reduced by X2. In total, accounting for all exogenous and endogenous effects, Y3 is +27.07 above average. You can see at a glance why Y3 is above average for individual no. 1.

You can find the full source code for this example here.

2. Application to Education and Wages

To dig a bit deeper, here we have a real-world example from social sciences. We analyze how the wage earned by young American workers is determined by their educational attainment, family characteristics, and test scores.

This 5-minute introductory video gives a short overview of Causing and includes this real data example: See Causing Introduction Video.

See here for a detailed analysis of the Education and Wages example: An Application of Causing: Education and Wages.

3. Application to Insurance Ratings

The Causing approach and its formulas together with an application are given in:

Bartel, Holger (2020), "Causal Analysis - With an Application to Insurance Ratings" DOI: 10.13140/RG.2.2.31524.83848 https://www.researchgate.net/publication/339091133

Note that in this early paper the mediation effects on the final variable of interest are called final effects. Also, while the current Causing version just uses numerically computed effects, that paper uses closed formulas.

The paper proposes simple linear algebra formulas for the causal analysis of equation systems. The effect of one variable on another is the total derivative. It is extended to endogenous system variables. These total effects are identical to the effects used in graph theory and its do-calculus. Further, mediation effects are defined, decomposing the total effect of one variable on a final variable of interest over all its directly caused variables. This allows for an easy but in-depth causal and mediation analysis.
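To illustrate the idea of a total derivative that is decomposed over directly caused variables, the sketch below uses the simple example model from above, reading its second equation as Y2 = X2 + 2 * Y1^2 (an assumption). It is not the paper's closed formula: it applies the chain rule by hand and checks the result against a numerical central difference. All function names are hypothetical.

```python
def y3_given_y1(y1, x2):
    """Y3 as a function of Y1, with Y2 recomputed from Y1 and X2."""
    y2 = x2 + 2 * y1**2
    return y1 + y2


def mediation_split(y1):
    """Chain-rule decomposition of dY3/dY1 over Y1's outgoing edges."""
    direct = 1.0               # edge Y1 -> Y3: dY3/dY1 with Y2 held fixed
    via_y2 = 1.0 * (4.0 * y1)  # edge Y1 -> Y2: dY3/dY2 * dY2/dY1
    return direct, via_y2


y1, x2, h = 3.0, 2.0, 1e-6
direct, via_y2 = mediation_split(y1)
total = direct + via_y2

# Numerical total derivative for comparison.
numeric = (y3_given_y1(y1 + h, x2) - y3_given_y1(y1 - h, x2)) / (2 * h)
```

At Y1 = 3 the analytic total effect is 1 + 12 = 13, and the numerical estimate agrees up to rounding error.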

The equation system provided by the user is represented as a structural neural network (SNN). The network's nodes are represented by the model variables and its edge weights are given by the effects. Unlike classical deep neural networks, we follow a sparse and 'small data' approach. This new methodology is applied to the financial strength ratings of insurance companies.

Keywords: total derivative, graphical effect, graph theory, do-Calculus, structural neural network, linear Simultaneous Equations Model (SEM), Structural Causal Model (SCM), insurance rating

Award

RealRate's AI software Causing is a winner of the PyTorch AI Hackathon.

We are excited to be a winner of the PyTorch AI Hackathon 2020 in the Responsible AI category. This is quite an honor given that more than 2,500 teams submitted their projects.

devpost.com/software/realrate-explainable-ai-for-company-ratings.

Contact

Dr. Holger Bartel
RealRate
Cecilienstr. 14, D-12307 Berlin
[email protected]
Phone: +49 160 957 90 844
www.realrate.ai

Contributors

bbkchdhry, bbkrr, holger-bartel, holgerbartel, karlb, salistha-shakya

causing's Issues

Create GitHub Action to upload releases to PyPI

We want to update our PyPI Causing release whenever we make meaningful changes to Causing. This should happen automatically whenever we create a release on GitHub. The PyPI secrets have already been saved in the Causing GitHub repo.

Cyclic Models

Check if the current Causing implementation 2.x can handle cyclic models.

These ideas are imported from:

Cyclic example model:

  1. Y1 <- Y2
  2. Y2 <- Y1

Effect of Y2 on Y1: As in the acyclic case, simply intervene on Y2 and compute Y1_new:
effect = Y1_new - Y1_old
So do not iterate around the cycle until convergence.

  • Check if consistent with our closed formula (should be)
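The intervention idea for the cyclic case can be sketched as follows. The two linear equations and their coefficients are hypothetical, chosen only so that Y1 = Y2 = 2 solves the system. This is an illustration of the proposal, not a statement about what Causing 2.x currently does.

```python
def f1(y2):
    """Hypothetical structural equation Y1 <- Y2."""
    return 0.5 * y2 + 1.0


def f2(y1):
    """Hypothetical structural equation Y2 <- Y1."""
    return 0.5 * y1 + 1.0


# Old state: a solution of both equations.
y1_old, y2_old = 2.0, 2.0

# Intervene on Y2: its own equation f2 is cut and not re-evaluated.
y2_new = 3.0
y1_new = f1(y2_new)  # single pass, no iteration around the cycle

effect = y1_new - y1_old
```

Raising Y2 by one unit changes Y1 by 0.5, which is the coefficient of Y2 in the equation for Y1; the feedback from Y1 back to Y2 is deliberately ignored.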

Estimation: compare SSE

The estimated SSE of the simple linear model must be lower than the model's SSE. Otherwise, no suggestions can be read from the NN's results.

Causing: Rename Node-Effects and Edge-Effects

This issue is for Causing.

Because of a historical notation, see https://realrate.ai/download/publications/RealRate%20Kausalanalyse%20mit%20NN.pdf, we use names and variable names that are too complicated from our point of view today. See Sections 3.3 Exogene finale Effekte, 3.4 Endogene finale Effekte, and 4.3 Partieller, totaler und finaler Graph. So we should rename the variables in the public Causing repository https://github.com/realrate/Causing like this:

Node effects

exj_indiv -> xnodeeffect
eyj_indiv -> ynodeeffect

exj_indivs -> xnodeeffects
eyj_indivs -> ynodeeffects

Edge effects

eyx_indiv -> xedgeeffect
eyy_indiv -> yedgeeffect

eyx_indivs -> xedgeeffects
eyy_indivs -> yedgeeffects

Also, please check, if we use those names in the RealRate-Private repository.

Causing Test Cases

Check the numerical output of the simple example model and the wage example.

Normalize bias for better interpretation

Because yvars have vastly different scales, the bias terms which are generated during the estimation also do. This makes it hard to decide if the biases are unusually high or not.

  • normalize bias by dividing by the yvars stddev (yielding a so-called statistical t-value)
  • highlight high normalized biases in the output with t-values being larger than 2 in absolute value (and maybe hide less important/less unusual output)
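The two steps above could look roughly like this. It is a sketch in plain Python using the standard library; the function name normalized_biases, the data layout, and the numbers are all hypothetical.

```python
from statistics import stdev


def normalized_biases(biases, y_data, threshold=2.0):
    """Divide each bias by the standard deviation of its yvar and
    flag values whose absolute size exceeds the threshold."""
    result = {}
    for name, bias in biases.items():
        t_value = bias / stdev(y_data[name])
        result[name] = (t_value, abs(t_value) > threshold)
    return result


# Hypothetical biases and observations on very different scales.
biases = {"Y1": 0.5, "Y2": 30.0}
y_data = {"Y1": [1.0, 2.0, 3.0], "Y2": [10.0, 20.0, 30.0]}

flags = normalized_biases(biases, y_data)
```

After normalization the bias of Y2 (about 3 standard deviations) is flagged, while the bias of Y1 (about 0.5) is not, even though the raw values alone would be hard to compare.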

Eliminate Identification Matrices

Instead of using the identification matrices idx and idy, we should use mx and my with None where there is no edge. Then we have all the information available without redundancies.
I will create an issue for that.

The identification matrices have already been eliminated in the generation of the dot file #19 but without making use of None yet
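A sketch of why None entries make separate identification matrices redundant: the edge pattern can be derived from the effect matrix on demand. The nested-list layout of mx and the helper name identification are assumptions made for this illustration, not the repository's actual data structures.

```python
# Hypothetical effect matrix: None marks "no edge", numbers are effects.
mx = [
    [1.0, None],
    [None, 0.0],  # an existing edge whose effect happens to be zero
]


def identification(matrix):
    """Derive the boolean edge pattern from a matrix with None entries."""
    return [[value is not None for value in row] for row in matrix]


idx = identification(mx)
```

Using None rather than 0 keeps an existing edge with a zero effect distinguishable from a missing edge, which a purely numeric matrix could not express.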

Check and update documentation

To make Causing usable by people outside of RealRate, we provide documentation at

This documentation was written a while ago and has only been updated sporadically. Read the documentation and look for the following problems (on the develop branch):

  • Is it understandable to new users (being able to use it is sufficient, no deep math understanding required)?
  • Is it up to date with the code base (general content should be ok, but check code samples and code references)?
  • Does the code yield the same results as shown in the documentation?
