sap-archive / contextual-ai

Home Page: https://contextual-ai.readthedocs.io/en/latest

License: Apache License 2.0

Topics: machine-learning, explainability, report-generator

contextual-ai's Introduction

Contextual AI


Contextual AI adds explainability to different stages of machine learning pipelines - data, training, and inference - thereby addressing the trust gap between such ML systems and their users. It does not refer to a specific algorithm or ML method; instead, it takes a human-centric view and approach to AI.

🖥 Installation

Contextual AI has been tested with Python 3.6, 3.7, and 3.8. You can install it using pip:

$ pip install contextual-ai

Building locally

$ sh build.sh
$ pip install dist/*.whl

⚡️ Quickstart 1 - Explain the predictions of a model

In this simple example, we generate explanations for an ML model trained on 20newsgroups, a text classification dataset. In particular, we want to find out which words were most important for a particular prediction.

from pprint import pprint
from sklearn import datasets
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer

# Main Contextual AI imports
import xai
from xai.explainer import ExplainerFactory

# Train on a subset of the 20newsgroups dataset (text classification)
categories = [
    'rec.sport.baseball',
    'soc.religion.christian',
    'sci.med'
]

# Fetch and preprocess data
raw_train = datasets.fetch_20newsgroups(subset='train', categories=categories)
raw_test = datasets.fetch_20newsgroups(subset='test', categories=categories)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(raw_train.data)
y_train = raw_train.target
X_test = vectorizer.transform(raw_test.data)
y_test = raw_test.target

# Train a model
clf = MultinomialNB(alpha=0.1)
clf.fit(X_train, y_train)

############################
# Main Contextual AI steps #
############################
# Instantiate the text explainer via the ExplainerFactory interface
explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TEXT)

# Build the explainer
def predict_fn(instance):
    vec = vectorizer.transform(instance)
    return clf.predict_proba(vec)

explainer.build_explainer(predict_fn)

# Generate explanations
exp = explainer.explain_instance(
    labels=[0, 1, 2], # which classes to produce explanations for?
    instance=raw_test.data[9],
    num_features=5 # how many words to show?
)

print('Label', raw_train.target_names[raw_test.target[9]], '=>', raw_test.target[9])
pprint(exp)

Input text:

From: [email protected] (Stephen A. Creps)\nSubject: Re: The doctrine of Original Sin\nOrganization: Indiana University\nLines: 63\n\nIn article <[email protected]> [email protected] writes:\n>>If babies are not supposed to be baptised then why doesn\'t the Bible\n>>ever say so.  It never comes right and says "Only people that know\n>>right from wrong or who are taught can be baptised."\n>\n>This is not a very sound argument for baptising babies
...

Output explanations:

Label soc.religion.christian => 2
{0: {'confidence': 6.79821e-05,
     'explanation': [{'feature': 'Bible', 'score': -0.0023500809763485468},
                     {'feature': 'Scripture', 'score': -0.0014344577715211986},
                     {'feature': 'Heaven', 'score': -0.001381196356886895},
                     {'feature': 'Sin', 'score': -0.0013723724408794883},
                     {'feature': 'specific', 'score': -0.0013611914394935848}]},
 1: {'confidence': 0.00044,
     'explanation': [{'feature': 'Bible', 'score': -0.007407412195931125},
                     {'feature': 'Scripture', 'score': -0.003658367757678809},
                     {'feature': 'Heaven', 'score': -0.003652181996607397},
                     {'feature': 'immoral', 'score': -0.003469502264458387},
                     {'feature': 'Sin', 'score': -0.003246609821338066}]},
 2: {'confidence': 0.99948,
     'explanation': [{'feature': 'Bible', 'score': 0.009736539971486623},
                     {'feature': 'Scripture', 'score': 0.005124375636024145},
                     {'feature': 'Heaven', 'score': 0.005053514624616295},
                     {'feature': 'immoral', 'score': 0.004781252244149238},
                     {'feature': 'Sin', 'score': 0.004596128058053568}]}}
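
Note how each word carries a positive score for the predicted class soc.religion.christian (label 2, confidence 0.99948) and negative scores for the other two labels: the sign indicates whether a word pushes the prediction toward or away from that class.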

⚡️ Quickstart 2 - Generate an explainability report

contextual-ai can also generate PDF reports that compile the results of data analysis, model training, feature importance, error analysis, and more. Here's a simple example where we generate an explainability report for the Titanic dataset. The full tutorial can be found here.

from pprint import pprint
from xai.compiler.base import Configuration, Controller

json_config = 'basic-report-explainer.json'

controller = Controller(config=Configuration(json_config))
pprint(controller.config)

The Controller is responsible for ingesting the configuration file basic-report-explainer.json and parsing the specifications of the report. The configuration file looks like this:

{'content_table': True,
 'contents': [{'desc': 'This section summarized the training performance',
               'sections': [{'component': {'attr': {'labels_file': 'labels.json',
                                                    'y_pred_file': 'y_conf.csv',
                                                    'y_true_file': 'y_true.csv'},
                                           'class': 'ClassificationEvaluationResult',
                                           'module': 'compiler',
                                           'package': 'xai'},
                             'title': 'Training Result'}],
               'title': 'Training Result'},
              {'desc': 'This section provides the analysis on feature',
               'sections': [{'component': {'_comment': 'refer to document '
                                                       'section xxxx',
                                           'attr': {'train_data': 'train_data.csv',
                                                    'trained_model': 'model.pkl'},
                                           'class': 'FeatureImportanceRanking'},
                             'title': 'Feature Importance Ranking'}],
               'title': 'Feature Importance Analysis'},
              {'desc': 'This section provides a model-agnostic explainer',
               'sections': [{'component': {'attr': {'domain': 'tabular',
                                                    'feature_meta': 'feature_meta.json',
                                                    'method': 'lime',
                                                    'num_features': 5,
                                                    'predict_func': 'func.pkl',
                                                    'train_data': 'train_data.csv'},
                                           'class': 'ModelAgnosticExplainer',
                                           'module': 'compiler',
                                           'package': 'xai'},
                             'title': 'Result'}],
               'title': 'Model-Agnostic Explainer'},
              {'desc': 'This section provides the analysis on data',
               'sections': [{'component': {'_comment': 'refer to document '
                                                       'section xxxx',
                                           'attr': {'data': 'titanic.csv',
                                                    'label': 'Survived'},
                                           'class': 'DataStatisticsAnalysis'},
                             'title': 'Simple Data Statistic'}],
               'title': 'Data Statistics Analysis'}],
 'name': 'Report for Titanic Dataset',
 'overview': True,
 'writers': [{'attr': {'name': 'titanic-basic-report'}, 'class': 'Pdf'}]}

The Controller also triggers the rendering of the report:

controller.render()

This produces a PDF report that visualizes data distributions, training results, feature importances, local prediction explanations, and more!


🚀 What else can it do?

Contextual AI spans three pillars, or scopes, of explainability, each addressing a different stage of a machine learning solution's lifecycle. We provide several features and functionalities for each:

Pre-training (Data)

  • Distributional analysis of data and features
  • Data validation
  • Tutorial

Training evaluation (Model)

  • Training performance
  • Feature importance
  • Per-class explanations
  • Simple error analysis
  • Tutorial

Inference (Prediction)

  • Explanations per prediction instance
  • Tutorial

We currently support several explainers for tabular and text data; see the documentation for the full list.

Looking to integrate your own explainer into Contextual AI? See the documentation to learn how you can use our AbstractExplainer class to create your own explainer that satisfies our interface!
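
As a rough illustration, a custom explainer might look like the sketch below. The import path and the hook names (build_explainer, explain_instance) are assumptions based on the quickstart above; check the AbstractExplainer documentation for the exact set of abstract methods you must implement.

from xai.explainer.abstract_explainer import AbstractExplainer  # assumed path

class KeywordExplainer(AbstractExplainer):
    """Toy explainer that reports a fixed keyword list (illustration only)."""

    def build_explainer(self, predict_fn, keywords=None, **kwargs):
        # Keep a handle to the black-box prediction function, as in Quickstart 1
        self.predict_fn = predict_fn
        self.keywords = keywords or []

    def explain_instance(self, instance, num_features=5, **kwargs):
        probs = self.predict_fn([instance])[0]
        # Emit the same {label: {'confidence', 'explanation'}} structure
        # that the built-in text explainer returns
        return {
            label: {
                'confidence': float(p),
                'explanation': [{'feature': w, 'score': 0.0}
                                for w in self.keywords[:num_features]],
            }
            for label, p in enumerate(probs)
        }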

Formatter/Compiler

  • Produce PDF/HTML reports of outputs from the above using only a few lines of code (see the sketch below)
  • Tutorial
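
In essence, report generation reduces to the same calls as in Quickstart 2; a minimal sketch, assuming basic-report-explainer.json and the artifacts it references are in place:

from xai.compiler.base import Configuration, Controller

# Compile the report specification and render it to the PDF named in the
# config's 'writers' section
controller = Controller(config=Configuration('basic-report-explainer.json'))
controller.render()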

🤝 Contributing

We welcome contributions of all kinds!

  • Reporting bugs
  • Requesting features
  • Creating pull requests
  • Providing discussions/feedback

Please refer to CONTRIBUTING.md for more.


Licensing

Copyright 2020-2021 SAP SE or an SAP affiliate company and contextual-ai contributors. Please see our LICENSE for copyright and license information. Detailed information including third-party components and their licensing/copyright information is available via the REUSE tool.

contextual-ai's People

Contributors

ajinkyapatil8190, dependabot[bot], ngpgn, postalc, seansaito, sebastianwolf-sap, wangjin1024


contextual-ai's Issues

array index error in quickstart 1

While printing the target label, the quickstart code prints the label at index 0:

raw_test.target[0]

But shouldn't it actually be raw_test.target[9] since our instance is at index 9 here:

# Generate explanations

exp = explainer.explain_instance(
    labels=[0, 1, 2], # which classes to produce explanations for?
    instance=raw_test.data[9],
    num_features=5 # how many words to show?
)

I ran the code with index 0 first, and my label was shown as

rec.sport.baseball => 0 (which is the wrong classification)

but the rest of the explanation (the words and confidence scores) was the same as in your tutorial result. Then I changed the index to 9, and it showed the right label:

soc.religion.christian => 2

Explainer / model decoupling

Today, the explainers expect a predict_fn at the build step, which gives them access to the black-box model for generating explanations. However, this introduces a dependency between the explainer and the black-box model that can make it difficult to deploy explainer artifacts into a production environment.

We should come up with an option that allows users to properly decouple the explainer artifact and the predict_fn to enable different kinds of deployment strategies. Inspiration comes from https://github.com/SeldonIO/seldon-core, which allows production-level deployment of ML models (kudos to @vishalmour and @sinhadebarchan for the share).
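
For illustration only (this is a sketch of the proposal, not the library's current API): one direction is to persist the model and the explainer as separate artifacts and re-bind the prediction function at serving time. The names clf, vectorizer, and explainer below refer to Quickstart 1; save_explainer/load_explainer and the re-binding step are assumptions.

import pickle

# Training environment: persist the model and the explainer separately
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)                    # the black-box model
explainer.save_explainer('explainer.pkl')  # the explainer, without the model

# Serving environment: load both, then re-bind a predict_fn, e.g. via a
# hypothetical re-binding step that replaces today's build_explainer coupling
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
served = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TEXT)
served.load_explainer('explainer.pkl')
served.predict_fn = lambda texts: model.predict_proba(vectorizer.transform(texts))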

Failing step 3. Text in data_explorer.ipynb

When executing the following cell in data_explorer.ipynb

labelled_analyzer = LabelledTextDataAnalyzer(preprocess_fn=preprocess,
                                             stop_words_by_languages=['english'],
                                             predefined_pattern=predefined_pattern)
labelled_analyzer.feed_all(texts, labels)
labelled_stats, all_stats = labelled_analyzer.get_statistics()
plotter.plot_labelled_text_stats(labelled_stats, all_stats)

the following error is thrown:

...
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/home/jovyan/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/lib/python3.7/site-packages/xai/data'
**********************************************************************

It seems a cell is missing with:

import nltk
nltk.download('stopwords')

Cannot use contextual-ai in a Python 3.9-based application due to a hard dependency on the scikit-learn version

Hi Team,

contextual-ai has a hard dependency on scikit-learn 0.21.x, so we cannot use this library beyond Python 3.7. We are using contextual-ai==0.0.2. Consider the error log below:

File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 363, in render
09:22:36 self.render_contents(report=report,
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 264, in render_contents
09:22:36 Controller.render_contents(report=report,
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 260, in render_contents
09:22:36 Controller.render_component(report=report,
File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 325, in render_component
09:22:36 Controller.factory(report=report, package=package,
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 299, in factory
09:22:36 imported_module = import_module('.' + module, package=package)
09:22:36 File "/usr/local/lib/python3.9/importlib/init.py", line 127, in import_module
09:22:36 return _bootstrap._gcd_import(name[level:], package, level)
09:22:36 File "", line 1030, in _gcd_import
09:22:36 File "", line 1007, in _find_and_load
09:22:36 File "", line 986, in _find_and_load_unlocked
09:22:36 File "", line 680, in _load_unlocked
09:22:36 File "", line 850, in exec_module
09:22:36 File "", line 228, in _call_with_frames_removed
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/init.py", line 13, in 09:22:36 from .evaluation import ClassificationEvaluationResult
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/evaluation.py", line 19, in 09:22:36 from xai.model.evaluation.result_compiler import ResultCompiler
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/model/evaluation/result_compiler.py", line 11, in 09:22:36 from sklearn.metrics.classification import precision_recall_fscore_support, confusion_matrix
09:22:36 ModuleNotFoundError: No module named 'sklearn.metrics.classification'

This module is only available up to scikit-learn version 0.21.x. See: https://github.com/scikit-learn/scikit-learn/tree/0.22.X/sklearn/metrics and https://github.com/scikit-learn/scikit-learn/tree/0.21.X/sklearn/metrics

Is there any plan to make this library compatible with newer versions of Python (3.9 and 3.10)?

Although the requirements file pins the scikit-learn version to 0.22.2, the classification module is already deprecated in that version:

Python 3.8.0 (default, Nov 6 2019, 15:49:01)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn.metrics.classification import precision_recall_fscore_support, confusion_matrix
/usr/local/homebrew/anaconda3/envs/py38/lib/python3.8/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.classification module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
  warnings.warn(message, FutureWarning)
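
Until the pin is updated, one possible stopgap (an unofficial sketch, not a maintained fix; other legacy imports inside xai may still break) is to alias the removed module before using xai:

import sys
import types

import sklearn.metrics as metrics

# Recreate the removed sklearn.metrics.classification module and point the
# two symbols this traceback needs at their new homes in sklearn.metrics
shim = types.ModuleType('sklearn.metrics.classification')
shim.precision_recall_fscore_support = metrics.precision_recall_fscore_support
shim.confusion_matrix = metrics.confusion_matrix
sys.modules['sklearn.metrics.classification'] = shim

# Any later `from sklearn.metrics.classification import ...` inside xai now
# resolves against the shim instead of the filesystem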

Error after Instantiating LimeTabularExplainer via the Explainer interface

After instantiating LimeTabularExplainer via the Explainer interface, I get an error message:

TypeError: __init__() got an unexpected keyword argument 'sample_around_instance'

I guess this is because of the latest release; I don't see this error message on previous versions. Any suggestions?

The whitepaper is gated

The whitepaper mentioned in the tutorial notebooks is stored at SAP's SharePoint and is not publicly accessible without authorization.
