sap-archive / contextual-ai

Home Page: https://contextual-ai.readthedocs.io/en/latest

License: Apache License 2.0

Topics: machine-learning, explainability, report-generator

contextual-ai's Introduction

Contextual AI


Contextual AI adds explainability to different stages of machine learning pipelines - data, training, and inference - thereby addressing the trust gap between such ML systems and their users. It does not refer to a specific algorithm or ML method; instead, it takes a human-centric view and approach to AI.

🖥 Installation

Contextual AI has been tested with Python 3.6, 3.7, and 3.8. You can install it using pip:

$ pip install contextual-ai

Building locally

$ sh build.sh
$ pip install dist/*.whl

⚡️ Quickstart 1 - Explain the predictions of a model

In this simple example, we generate explanations for an ML model trained on 20newsgroups, a text classification dataset. In particular, we want to find out which words were most important for a particular prediction.

from pprint import pprint
from sklearn import datasets
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer

# Main Contextual AI imports
import xai
from xai.explainer import ExplainerFactory

# Train on a subset of the 20newsgroups dataset (text classification)
categories = [
    'rec.sport.baseball',
    'soc.religion.christian',
    'sci.med'
]

# Fetch and preprocess data
raw_train = datasets.fetch_20newsgroups(subset='train', categories=categories)
raw_test = datasets.fetch_20newsgroups(subset='test', categories=categories)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(raw_train.data)
y_train = raw_train.target
X_test = vectorizer.transform(raw_test.data)
y_test = raw_test.target

# Train a model
clf = MultinomialNB(alpha=0.1)
clf.fit(X_train, y_train)

############################
# Main Contextual AI steps #
############################
# Instantiate the text explainer via the ExplainerFactory interface
explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TEXT)

# Build the explainer
def predict_fn(instance):
    vec = vectorizer.transform(instance)
    return clf.predict_proba(vec)

explainer.build_explainer(predict_fn)

# Generate explanations
exp = explainer.explain_instance(
    labels=[0, 1, 2], # which classes to produce explanations for?
    instance=raw_test.data[9],
    num_features=5 # how many words to show?
)

print('Label', raw_train.target_names[raw_test.target[9]], '=>', raw_test.target[9])
pprint(exp)

Input text:

From: [email protected] (Stephen A. Creps)\nSubject: Re: The doctrine of Original Sin\nOrganization: Indiana University\nLines: 63\n\nIn article <[email protected]> [email protected] writes:\n>>If babies are not supposed to be baptised then why doesn\'t the Bible\n>>ever say so.  It never comes right and says "Only people that know\n>>right from wrong or who are taught can be baptised."\n>\n>This is not a very sound argument for baptising babies
...

Output explanations:

Label soc.religion.christian => 2
{0: {'confidence': 6.79821e-05,
     'explanation': [{'feature': 'Bible', 'score': -0.0023500809763485468},
                     {'feature': 'Scripture', 'score': -0.0014344577715211986},
                     {'feature': 'Heaven', 'score': -0.001381196356886895},
                     {'feature': 'Sin', 'score': -0.0013723724408794883},
                     {'feature': 'specific', 'score': -0.0013611914394935848}]},
 1: {'confidence': 0.00044,
     'explanation': [{'feature': 'Bible', 'score': -0.007407412195931125},
                     {'feature': 'Scripture', 'score': -0.003658367757678809},
                     {'feature': 'Heaven', 'score': -0.003652181996607397},
                     {'feature': 'immoral', 'score': -0.003469502264458387},
                     {'feature': 'Sin', 'score': -0.003246609821338066}]},
 2: {'confidence': 0.99948,
     'explanation': [{'feature': 'Bible', 'score': 0.009736539971486623},
                     {'feature': 'Scripture', 'score': 0.005124375636024145},
                     {'feature': 'Heaven', 'score': 0.005053514624616295},
                     {'feature': 'immoral', 'score': 0.004781252244149238},
                     {'feature': 'Sin', 'score': 0.004596128058053568}]}}
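
Note how each word carries a positive score for the predicted class soc.religion.christian (label 2, confidence 0.99948) and negative scores for the other two labels: the sign indicates whether a word pushes the prediction toward or away from that class.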

⚡️ Quickstart 2 - Generate an explainability report

contextual-ai can also generate PDF reports that compile the results of data analysis, model training, feature importance, error analysis, and more. Here's a simple example where we generate an explainability report for the Titanic dataset. The full tutorial can be found here.

from pprint import pprint
from xai.compiler.base import Configuration, Controller

json_config = 'basic-report-explainer.json'

controller = Controller(config=Configuration(json_config))
pprint(controller.config)

The Controller is responsible for ingesting the configuration file basic-report-explainer.json and parsing the specifications of the report. The configuration file looks like this:

{'content_table': True,
 'contents': [{'desc': 'This section summarized the training performance',
               'sections': [{'component': {'attr': {'labels_file': 'labels.json',
                                                    'y_pred_file': 'y_conf.csv',
                                                    'y_true_file': 'y_true.csv'},
                                           'class': 'ClassificationEvaluationResult',
                                           'module': 'compiler',
                                           'package': 'xai'},
                             'title': 'Training Result'}],
               'title': 'Training Result'},
              {'desc': 'This section provides the analysis on feature',
               'sections': [{'component': {'_comment': 'refer to document '
                                                       'section xxxx',
                                           'attr': {'train_data': 'train_data.csv',
                                                    'trained_model': 'model.pkl'},
                                           'class': 'FeatureImportanceRanking'},
                             'title': 'Feature Importance Ranking'}],
               'title': 'Feature Importance Analysis'},
              {'desc': 'This section provides a model-agnostic explainer',
               'sections': [{'component': {'attr': {'domain': 'tabular',
                                                    'feature_meta': 'feature_meta.json',
                                                    'method': 'lime',
                                                    'num_features': 5,
                                                    'predict_func': 'func.pkl',
                                                    'train_data': 'train_data.csv'},
                                           'class': 'ModelAgnosticExplainer',
                                           'module': 'compiler',
                                           'package': 'xai'},
                             'title': 'Result'}],
               'title': 'Model-Agnostic Explainer'},
              {'desc': 'This section provides the analysis on data',
               'sections': [{'component': {'_comment': 'refer to document '
                                                       'section xxxx',
                                           'attr': {'data': 'titanic.csv',
                                                    'label': 'Survived'},
                                           'class': 'DataStatisticsAnalysis'},
                             'title': 'Simple Data Statistic'}],
               'title': 'Data Statistics Analysis'}],
 'name': 'Report for Titanic Dataset',
 'overview': True,
 'writers': [{'attr': {'name': 'titanic-basic-report'}, 'class': 'Pdf'}]}

The Controller also triggers the rendering of the report:

controller.render()

This produces a PDF report that visualizes data distributions, training results, feature importances, local prediction explanations, and more!


🚀 What else can it do?

Contextual AI spans three pillars, or scopes, of explainability, each addressing a different stage of a machine learning solution's lifecycle. We provide several features and functionalities for each:

Pre-training (Data)

  • Distributional analysis of data and features
  • Data validation
  • Tutorial

Training evaluation (Model)

  • Training performance
  • Feature importance
  • Per-class explanations
  • Simple error analysis
  • Tutorial

Inference (Prediction)

  • Explanations per prediction instance
  • Tutorial

We currently support several explainers for tabular and text data; see the documentation for the full list.

Looking to integrate your own explainer into Contextual AI? See the documentation to learn how you can use our AbstractExplainer class to create your own explainer that satisfies our interface!
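
As a rough illustration, a custom explainer might look like the sketch below. The import path and the hook names (build_explainer, explain_instance) are assumptions based on the quickstart above; check the AbstractExplainer documentation for the exact set of abstract methods you must implement.

from xai.explainer.abstract_explainer import AbstractExplainer  # assumed path

class KeywordExplainer(AbstractExplainer):
    """Toy explainer that reports a fixed keyword list (illustration only)."""

    def build_explainer(self, predict_fn, keywords=None, **kwargs):
        # Keep a handle to the black-box prediction function, as in Quickstart 1
        self.predict_fn = predict_fn
        self.keywords = keywords or []

    def explain_instance(self, instance, num_features=5, **kwargs):
        probs = self.predict_fn([instance])[0]
        # Emit the same {label: {'confidence', 'explanation'}} structure
        # that the built-in text explainer returns
        return {
            label: {
                'confidence': float(p),
                'explanation': [{'feature': w, 'score': 0.0}
                                for w in self.keywords[:num_features]],
            }
            for label, p in enumerate(probs)
        }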

Formatter/Compiler

  • Produce PDF/HTML reports of outputs from the above using only a few lines of code (see the sketch below)
  • Tutorial
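
In essence, report generation reduces to the same calls as in Quickstart 2; a minimal sketch, assuming basic-report-explainer.json and the artifacts it references are in place:

from xai.compiler.base import Configuration, Controller

# Compile the report specification and render it to the PDF named in the
# config's 'writers' section
controller = Controller(config=Configuration('basic-report-explainer.json'))
controller.render()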

🤝 Contributing

We welcome contributions of all kinds!

  • Reporting bugs
  • Requesting features
  • Creating pull requests
  • Providing discussions/feedback

Please refer to CONTRIBUTING.md for more.


Licensing

Copyright 2020-2021 SAP SE or an SAP affiliate company and contextual-ai contributors. Please see our LICENSE for copyright and license information. Detailed information including third-party components and their licensing/copyright information is available via the REUSE tool.

contextual-ai's People

Contributors

ajinkyapatil8190, dependabot[bot], ngpgn, postalc, seansaito, sebastianwolf-sap, wangjin1024


contextual-ai's Issues

array index error in quickstart 1

While printing the target label, the quickstart code prints the label at index 0:

raw_test.target[0]

But shouldn't it actually be raw_test.target[9] since our instance is at index 9 here:

# Generate explanations

exp = explainer.explain_instance(
    labels=[0, 1, 2], # which classes to produce explanations for?
    instance=raw_test.data[9],
    num_features=5 # how many words to show?
)

I ran the code with index 0 first, and my label was shown as

rec.sport.baseball => 0 (which is the wrong classification)

but the rest of the explanation (the words and confidence scores) was the same as in your tutorial result. Then I changed the index to 9, and it showed the right label:

soc.religion.christian => 2

Explainer / model decoupling

Today, the explainers expect a predict_fn at the build step, which gives them access to the black-box model for generating explanations. However, this introduces a dependency between the explainer and the black-box model that can make it difficult to deploy explainer artifacts into a production environment.

We should come up with an option that allows users to properly decouple the explainer artifact and the predict_fn to enable different kinds of deployment strategies. Inspiration comes from https://github.com/SeldonIO/seldon-core, which allows production-level deployment of ML models (kudos to @vishalmour and @sinhadebarchan for the share).
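
For illustration only (this is a sketch of the proposal, not the library's current API): one direction is to persist the model and the explainer as separate artifacts and re-bind the prediction function at serving time. The names clf, vectorizer, and explainer below refer to Quickstart 1; save_explainer/load_explainer and the re-binding step are assumptions.

import pickle

# Training environment: persist the model and the explainer separately
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)                    # the black-box model
explainer.save_explainer('explainer.pkl')  # the explainer, without the model

# Serving environment: load both, then re-bind a predict_fn, e.g. via a
# hypothetical re-binding step that replaces today's build_explainer coupling
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
served = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TEXT)
served.load_explainer('explainer.pkl')
served.predict_fn = lambda texts: model.predict_proba(vectorizer.transform(texts))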

Failing step 3. Text in data_explorer.ipynb

When executing the following cell in data_explorer.ipynb

labelled_analyzer = LabelledTextDataAnalyzer(preprocess_fn=preprocess,
                                             stop_words_by_languages=['english'],
                                             predefined_pattern=predefined_pattern)
labelled_analyzer.feed_all(texts, labels)
labelled_stats, all_stats = labelled_analyzer.get_statistics()
plotter.plot_labelled_text_stats(labelled_stats, all_stats)

the following error is thrown:

...
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/home/jovyan/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/opt/conda/lib/python3.7/site-packages/xai/data'
**********************************************************************

It seems a cell is missing with:

import nltk
nltk.download('stopwords')

Cannot use contextual-ai in a Python 3.9-based application due to a hard dependency on the scikit-learn version

Hi Team,

contextual-ai has a hard dependency on scikit-learn 0.21.x, so we cannot use this library beyond Python 3.7. We are using contextual-ai==0.0.2. Consider the error log below:

File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 363, in render
09:22:36 self.render_contents(report=report,
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 264, in render_contents
09:22:36 Controller.render_contents(report=report,
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 260, in render_contents
09:22:36 Controller.render_component(report=report,
File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 325, in render_component
09:22:36 Controller.factory(report=report, package=package,
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/base.py", line 299, in factory
09:22:36 imported_module = import_module('.' + module, package=package)
09:22:36 File "/usr/local/lib/python3.9/importlib/init.py", line 127, in import_module
09:22:36 return _bootstrap._gcd_import(name[level:], package, level)
09:22:36 File "", line 1030, in _gcd_import
09:22:36 File "", line 1007, in _find_and_load
09:22:36 File "", line 986, in _find_and_load_unlocked
09:22:36 File "", line 680, in _load_unlocked
09:22:36 File "", line 850, in exec_module
09:22:36 File "", line 228, in _call_with_frames_removed
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/init.py", line 13, in 09:22:36 from .evaluation import ClassificationEvaluationResult
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/compiler/evaluation.py", line 19, in 09:22:36 from xai.model.evaluation.result_compiler import ResultCompiler
09:22:36 File "/home/.../.local/lib/python3.9/site-packages/xai/model/evaluation/result_compiler.py", line 11, in 09:22:36 from sklearn.metrics.classification import precision_recall_fscore_support, confusion_matrix
09:22:36 ModuleNotFoundError: No module named 'sklearn.metrics.classification'

This module is only available up to scikit-learn version 0.21.x. See: https://github.com/scikit-learn/scikit-learn/tree/0.22.X/sklearn/metrics and https://github.com/scikit-learn/scikit-learn/tree/0.21.X/sklearn/metrics

Is there any plan to make this library compatible with newer versions of Python (3.9 and 3.10)?

Although the requirements file pins the scikit-learn version to 0.22.2, the classification module is already deprecated in that version:

Python 3.8.0 (default, Nov 6 2019, 15:49:01)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn.metrics.classification import precision_recall_fscore_support, confusion_matrix
/usr/local/homebrew/anaconda3/envs/py38/lib/python3.8/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.classification module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
  warnings.warn(message, FutureWarning)
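
Until the pin is updated, one possible stopgap (an unofficial sketch, not a maintained fix; other legacy imports inside xai may still break) is to alias the removed module before using xai:

import sys
import types

import sklearn.metrics as metrics

# Recreate the removed sklearn.metrics.classification module and point the
# two symbols this traceback needs at their new homes in sklearn.metrics
shim = types.ModuleType('sklearn.metrics.classification')
shim.precision_recall_fscore_support = metrics.precision_recall_fscore_support
shim.confusion_matrix = metrics.confusion_matrix
sys.modules['sklearn.metrics.classification'] = shim

# Any later `from sklearn.metrics.classification import ...` inside xai now
# resolves against the shim instead of the filesystem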

Error after Instantiating LimeTabularExplainer via the Explainer interface

After instantiating LimeTabularExplainer via the Explainer interface, I get an error message:

TypeError: __init__() got an unexpected keyword argument 'sample_around_instance'

I guess this is because of the latest release; I don't see this error message on previous versions. Any suggestions?

The whitepaper is gated

The whitepaper mentioned in the tutorial notebooks is stored at SAP's SharePoint and is not publicly accessible without authorization.
