microsoft / responsible-ai-toolbox-mitigations

Python library for implementing Responsible AI mitigations.

Home Page: https://responsible-ai-toolbox-mitigations.readthedocs.io/en/latest/

License: MIT License

Python 7.65% Jupyter Notebook 92.35%
data-analysis data-science machine-learning python responsible-ai responsible-ml

responsible-ai-toolbox-mitigations's Introduction

Responsible AI Mitigations

This Responsible-AI-Toolbox-Mitigations repo consists of a Python library that aims to empower data scientists and ML developers to measure their dataset balance and representation of different dataset cohorts, while having access to mitigation techniques they could incorporate to mitigate errors and fairness issues in their datasets. Together with the measurement and mitigation steps, ML professionals are empowered to build more accurate and fairer models.

This repo is a part of the Responsible AI Toolbox, a suite of tools providing a collection of model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.


The Responsible AI Mitigations Library helps AI practitioners explore different measurements and mitigation steps that may be most appropriate when the model underperforms for a given data cohort. The library currently has three modules:

  • DataProcessing offers mitigation techniques for improving model performance for specific cohorts.
  • DataBalanceAnalysis provides metrics for diagnosing errors that originate from data imbalance either on class labels or feature values.
  • Cohort provides classes for handling and managing cohorts, which allows the creation of custom pipelines for each cohort in an easy and intuitive interface. The module also provides techniques for learning different decoupled estimators (models) for different cohorts and combining them in a way that optimizes different definitions of group fairness.

In this library, we take a targeted approach to mitigating errors in Machine Learning models. This approach is complementary to, and different from, the traditional blanket approaches that aim to maximize a single-score performance number, such as overall accuracy, merely by increasing the size of the training data or the model architecture. Blanket approaches are often costly and yet ineffective at improving the model in its areas of poorest performance, so with targeted approaches we focus the improvement efforts on areas previously identified as having more errors, and on the underlying diagnoses of those errors. For example, if a practitioner has identified that the model is underperforming for a cohort of interest by using Error Analysis in the Responsible AI Dashboard, they may continue the debugging process with Data Balance Analysis and find that there is class imbalance for this particular cohort. To mitigate the issue, they then focus on improving class imbalance for the cohort of interest by using the Responsible AI Mitigations library. This and several other examples in the documentation of each mitigation function illustrate how targeted approaches may help practitioners mitigate errors more effectively by giving them more control over the model improvement process.
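The diagnosis step in the example above, checking whether a cohort of interest suffers from class imbalance, can be sketched in plain Python. This is only an illustration and does not use the raimitigations API: the toy records, the column names `cohort` and `label`, and the helpers `label_counts_by_cohort` and `minority_share` are all hypothetical.

```python
from collections import Counter, defaultdict

def label_counts_by_cohort(rows, cohort_key, label_key):
    """Count how often each label occurs inside each cohort."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[cohort_key]][row[label_key]] += 1
    return {cohort: dict(c) for cohort, c in counts.items()}

def minority_share(label_counts):
    """Fraction of a cohort's rows that carry its rarest observed label."""
    total = sum(label_counts.values())
    return min(label_counts.values()) / total

# Hypothetical toy data: cohort "B" is heavily imbalanced, cohort "A" is not.
rows = (
    [{"cohort": "A", "label": 0}] * 5
    + [{"cohort": "A", "label": 1}] * 5
    + [{"cohort": "B", "label": 0}] * 9
    + [{"cohort": "B", "label": 1}] * 1
)

counts = label_counts_by_cohort(rows, "cohort", "label")
for cohort, label_counts in sorted(counts.items()):
    print(cohort, label_counts, round(minority_share(label_counts), 2))
```

A low minority share for one cohort, next to a balanced share for the others, is the kind of signal that would motivate rebalancing only that cohort. Note that a label that never occurs in a cohort does not appear in its counts, so a real check would also compare against the full label set.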

Installation

Use the following pip commands to install the Responsible AI Mitigations library (the raimitigations package). Make sure you are using Python 3.7, 3.8, 3.9 or 3.10. If running in Jupyter, make sure to restart the Jupyter kernel after installing. There are three installation options for the raimitigations package:

  • To install the minimum dependencies, use:
pip install raimitigations
  • To install the minimum dependencies + the packages required to run all of the notebooks in the notebooks/ folder:
pip install raimitigations[all]
  • To install all the dependencies used for development (such as pytest, for example), use:
pip install raimitigations[dev]
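
Before installing, it can help to confirm that the active interpreter is one of the versions listed above. The snippet below is a small sketch using only the standard library; the helper name `is_supported` and the constant `SUPPORTED_VERSIONS` are hypothetical and are not part of raimitigations.

```python
import sys

# Versions listed as supported in the installation notes above.
SUPPORTED_VERSIONS = {(3, 7), (3, 8), (3, 9), (3, 10)}

def is_supported(version_info=None):
    """Return True if the (major, minor) pair is one of the listed versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS

if not is_supported():
    print("Warning: this Python version is outside the range listed for raimitigations.")
```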

Documentation

To learn more about the supported dataset measurements and mitigation techniques covered in the raimitigations package, please check out the documentation at https://responsible-ai-toolbox-mitigations.readthedocs.io/en/latest/.

Data Balance Analysis: Examples

Data Processing/Mitigations: Examples

Here is a set of tutorial notebooks that aim to explain how to use each one of the mitigation methods offered in the dataprocessing module.

Here is a set of case study scenarios where we use the transformations available in the dataprocessing module in order to train a model for a real-world dataset.

Handling Cohorts

Here is a set of tutorial notebooks that aim to explain how to manage cohorts.

Here is a set of case study notebooks showing how creating customized dataprocessing pipelines for each cohort can help in some scenarios.

Dependencies

RAI Toolbox Mitigations uses several libraries internally. The direct dependencies are the following:

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Installing Using dev Mode

After cloning this repo and moving to its root folder, install the package in editable mode with the development dependencies using:

> pip install -e .[dev]

Pre-Commit

This repository uses pre-commit hooks to guarantee that the code format is kept consistent. For development, make sure to activate pre-commit before creating a pull request. Any code pushed to this repository is checked for code consistency using GitHub Actions, so a commit made without pre-commit may fail the format check workflow. Using pre-commit avoids this.

To use pre-commit with this repository, first install pre-commit (NOTE: when installing the package with the [dev] tag, the pre-commit package will already be installed):

> pip install pre-commit

Once it is installed, navigate to the root directory of this repository and activate pre-commit with the following command:

> pre-commit install

With pre-commit installed and activated, whenever you make a new commit, pre-commit will check all new code using the hooks configured in the .pre-commit-config.yaml file, located in the root of the repository. Some of the hooks might make formatting changes to some of the committed files. If any file is changed or if any other hook fails, the commit will fail. If that happens, make the necessary modifications, add the files to the commit and try committing again. Repeat this until all hooks are successful. Note that these same checks will run after pushing anything, so if your commit was successful while using pre-commit, it will pass the format check workflow as well.

Updating the Docs

The documentation is built using Sphinx, Pandoc, and Graphviz (to build the class diagrams). Graphviz and Pandoc must be installed separately (detailed instructions here for Graphviz and here for Pandoc). On Linux, this can be done with apt or yum (depending on your distribution):

> sudo apt install graphviz pandoc
> sudo yum install graphviz pandoc

Make sure Graphviz and Pandoc are installed before recompiling the docs. After that, update the documentation files, which are all located inside the docs/ folder. Finally, use:

> cd docs/
> make html

To view the documentation, open the file docs/_build/html/index.html in your browser.

Note for Windows users: if you are trying to update the docs in a Windows environment, you might get an error regarding the _sqlite3 module:

ImportError: DLL load failed while importing _sqlite3: The specified module could not be found.

To fix this, follow the instructions found in this link.

Support

How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing issues before filing new issues to avoid duplicates. For new issues, file your bug or feature request as a new Issue.

For help and questions about using this project, please post your question in Stack Overflow using the raimitigations tag.

Microsoft Support Policy

Support for this package is limited to the resources listed above.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Research and Acknowledgements

Current Maintainers: Marah Abdin, Matheus Mendonça, Dany Rouhana, Mark Encarnación

Past Maintainers: Akshara Ramakrishnan, Irina Spiridonova

Research Contributors: Besmira Nushi, Rahee Ghosh Peshawaria, Ece Kamar


responsible-ai-toolbox-mitigations's Issues

Code coverage in codecov website

Once this repo becomes public, the code coverage could be added to a Codecov account. Currently the coverage is just printed out in the workflow.

Configure pre-commit

This is not an issue, but it's something we should discuss, since there are some benefits to using pre-commit hooks in a repo. I believe it makes sense, since it will force all future pushes (either internal pushes or pushes from the community) to be consistent in some respects. For example, I like to use the autopep8 pre-commit hook, since it enforces that the code conforms to the PEP 8 standard. In some cases, it even changes the code automatically when you commit.

Case3.ipynb: Invalid column name `sick-euthyroid`

When running the CTGAN section of case3.ipynb in 5 - Synthetic Data, I receive the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3.ipynb Cell 45 in <cell line: 11>()
      8 synth.fit()
     10 conditions = {label_col:1}  # create more of the undersampled class
---> 11 syn_train_x, syn_train_y = synth.transform(X=train_x_sel, y=train_y, n_samples=200, conditions=conditions)
     13 syn_train_y.value_counts()

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\sampler\synthesizer.py:570, in Synthesizer.transform(self, df, X, y, n_samples, conditions, strategy)
    568 if n_samples is not None:
    569     print(df.columns, conditions)
--> 570     samples = self.model.sample(n_samples, conditions=conditions)
    571 else:
    572     samples = self._generate_samples_strategy(df, strategy)

File c:\Users\morrissharp\Miniconda3\envs\rai\lib\site-packages\sdv\tabular\base.py:451, in BaseTabularModel.sample(self, num_rows, max_retries, max_rows_multiplier, conditions, float_rtol, graceful_reject_sampling)
    449 for column in conditions.columns:
    450     if column not in self._metadata.get_fields():
--> 451         raise ValueError(f'Invalid column name `{column}`')
    453 try:
    454     transformed_conditions = self._metadata.transform(conditions, on_missing_column='drop')

ValueError: Invalid column name `sick-euthyroid`

I am not sure what is going on. sick-euthyroid appears to be the name of the pandas Series that is passed in (train_y)

TODO: fill out support.md

Support.md needs to be filled out as per these instructions.

# TODO: The maintainer of this repo has not yet edited this file
**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/spot](https://aka.ms/spot). CSS will work with/help you to determine next steps. More details also available at [aka.ms/onboardsupport](https://aka.ms/onboardsupport).
- **Not sure?** Fill out a SPOT intake as though the answer were "Yes". CSS will help you decide.

Rebalance class status progress message

The Rebalance class status message contains a string related to imputation instead:

No columns specified for imputation. These columns have been automatically identified:
[]
Running oversampling...

ValueError: case1_stat.ipynb error in CTGAN Section

I am receiving the following error when running the CTGAN section of case1_stat.ipynb:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in <cell line: 5>()
      2 result_base = test_base(df, label_col, N_EXEC, MODEL_NAME)
      3 result_df = add_results_df(None, result_base, "Baseline")
----> 5 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=False, feat_sel_type=None, art_str=0.6, savefile="1_1.pkl")
      6 result_df = add_results_df(result_df, restult_fs, "CTGAN 0.6")
      8 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=False, feat_sel_type=None, art_str=0.9, savefile="1_2.pkl")

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in test_ctgan_first(df, label_col, n_exec, model_name, rcorr, feat_sel_type, art_str, savefile)
    245 if art_str is not None:
    246     train_x, train_y = artificial_ctgan(train_x, train_y, art_str, savefile)
--> 247 train_x, test_x = encode_case1_train_test(train_x, test_x)
    248 train_x, test_x = impute_case1_train_test(train_x, test_x)
    249 if feat_sel_type is not None:

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in encode_case1_train_test(train_x, test_x)
     55 def encode_case1_train_test(train_x, test_x):
     56     enc_ord, enc_ohe = get_encoders(df)
---> 57     enc_ord.fit(train_x)
     58     train_x_enc = enc_ord.transform(train_x)
     59     test_x_enc = enc_ord.transform(test_x)

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\encoder\encoder.py:84, in DataEncoding.fit(self, df, y)
     82 self._set_column_to_encode()
...
    120         + "the order of the existing values of the column col_encode[i]. If a value is not given, "
    121         + "it will be assigned a None value."
    122     )

ValueError: ERROR: the value '24-26' provided to the the list of values for the key 'inv-nodes' in the 'categories' parameter does not match any of the unique values found in the column 'inv-nodes' of the dataset provided.

Not sure exactly what the cause is for this yet.

Additionally, while investigating this, I noticed a couple of other issues as well:

  • Ordinal Encoding is performed on age, tumor size, and inv-nodes. But because the values are strings, they are sorted lexicographically rather than numerically.
age_order, ['20-29' '30-39' '40-49' '50-59' '60-69' '70-79']
tumor_size_order, ['0-4' '10-14' '15-19' '20-24' '25-29' '30-34' '35-39' '40-44' '45-49'
 '5-9' '50-54']
inv_nodes_order, ['0-2' '12-14' '15-17' '24-26' '3-5' '6-8' '9-11']
  • get_encoders is called on df, but encode_case1_train_test() does not take in df as a parameter:
	enc_ord, enc_ohe = get_encoders(df)
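
The ordering problem in the first bullet can be reproduced and corrected with a custom sort key. This is a minimal sketch, independent of the raimitigations encoders: it sorts the range strings reported above by their numeric lower bound, and the helper name `range_start` is hypothetical.

```python
def range_start(value):
    """Return the numeric lower bound of a 'low-high' range string."""
    return int(value.split("-")[0])

# Values copied from the lexicographic orders reported above.
tumor_size_lexicographic = ['0-4', '10-14', '15-19', '20-24', '25-29', '30-34',
                            '35-39', '40-44', '45-49', '5-9', '50-54']
inv_nodes_lexicographic = ['0-2', '12-14', '15-17', '24-26', '3-5', '6-8', '9-11']

# Sorting by the lower bound puts '5-9' after '0-4' instead of after '45-49'.
tumor_size_numeric = sorted(tumor_size_lexicographic, key=range_start)
inv_nodes_numeric = sorted(inv_nodes_lexicographic, key=range_start)

print(tumor_size_numeric)
print(inv_nodes_numeric)
```

Lists ordered this way could then be supplied as the explicit category order for each column. The error message above suggests the encoder accepts such lists through a 'categories' parameter keyed by column name, but the exact call is not shown here and should be checked against the library documentation.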

feat_sel_sequential.ipynb example throwing key error in section 2 (no column names)

I am receiving the following error when running SeqFeatSelection on the example with no column names.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\module_tests\feat_sel_sequential.ipynb Cell 31 in <cell line: 2>()
      1 feat_sel = SeqFeatSelection(n_jobs=1)
----> 2 feat_sel.fit(df=dataset, label_col=11)
      3 feat_sel.get_selected_features()

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\feat_selection\selector.py:174, in FeatureSelection.fit(self, X, y, df, label_col)
    172 if self.in_place:
    173     self.df_org = self.df
--> 174 self._fit()
    175 self.set_selected_features()
    176 self.fitted = True

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\feat_selection\sequential_select.py:381, in SeqFeatSelection._fit(self)
    379 self._check_n_feat()
    380 self._check_fixed_columns()
--> 381 self._run_feat_selection()
    382 self._save_json()

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\feat_selection\sequential_select.py:345, in SeqFeatSelection._run_feat_selection(self)
    333     verbose = 2
    334 self.selector = SFS(
    335     self.estimator,
    336     k_features=self.n_feat,
...
--> 568 k_idx = self.subsets_[best_subset]['feature_idx']
    570 if self.k_features == 'parsimonious':
    571     for k in self.subsets_:

KeyError: None

Update the version for the SDV library

Currently, this repo is using SDV v0.13.1, but as of today (August 2022), SDV v0.16.0 is already available. Also, v0.13.1 is causing some security issues with numpy. Therefore, this repo should update the SDV version used.

Balancing on multiple columns workaround

Currently there is a workaround in the end-to-end Jupyter notebook so that we can balance on two columns at the same time using any of the rebalancing techniques. While making API design changes to the Rebalance API, we should allow the user to specify multiple columns that they are interested in balancing on, rather than creating a single combined column and balancing on that single-column cohort.
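
For readers unfamiliar with the workaround, the idea of merging several columns into one and balancing on the merged column can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the notebook's code or the Rebalance API: the column names `region` and `label`, the helpers `add_combined_column` and `oversample_to_balance`, and the use of simple random oversampling are all hypothetical choices.

```python
import random
from collections import Counter

def add_combined_column(rows, columns, new_key="combined"):
    """Return copies of rows with a tuple-valued column joining the given columns."""
    return [{**row, new_key: tuple(row[c] for c in columns)} for row in rows]

def oversample_to_balance(rows, key, seed=0):
    """Randomly duplicate rows so that every value of `key` is equally frequent."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw the missing rows (possibly zero) with replacement from the group.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Hypothetical toy data with uneven (region, label) combinations.
rows = (
    [{"region": "north", "label": 1}] * 4
    + [{"region": "north", "label": 0}] * 2
    + [{"region": "south", "label": 1}] * 1
    + [{"region": "south", "label": 0}] * 3
)

combined = add_combined_column(rows, ["region", "label"])
balanced = oversample_to_balance(combined, "combined")
print(Counter(row["combined"] for row in balanced))
```

An API that accepted a list of columns directly could perform the combining step internally, which is what the issue proposes.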

Correlated Features examples

I noticed a couple of issues here.

  1. There are two different examples feat_sel_corr_tutorial.ipynb and feat_sel_corr.ipynb. feat_sel_corr.ipynb does not contain any explanatory comments in the notebook. Maybe this one is extra and not needed?
  2. Both of these notebooks write the same json files to ./corr_json_examples/. The files that are part of the git repo are the ones belonging to feat_sel_corr.ipynb .

If the 2nd notebook is not needed, then the json files should be replaced with the ones belonging to the 1st example.

Update/fix/reorganize the documentation files for the databalanceanalysis package

I noted the following problems:

  • some of the classes don't have a documentation for the constructor class
  • some bullet points aren't formatted properly
  • the entire documentation of this package is contained in a single page. Maybe it would make sense to spread it out into multiple pages and create some sort of hierarchy among them

I believe it would be a good idea to just go over the docs for this package and make sure that everything is in order.

No module named 'seaborn' when importing the "cohort" module

After installing the package using pip install raimitigations, when I import any class from the cohort module, I get the following error:

> from .utils import fetch_cohort_results, plot_value_counts_cohort
> import seaborn as sns
No module named 'seaborn'

The error doesn't occur if I install the library using pip install raimitigations[all].

To fix, add seaborn to the base set of dependencies, not only to the [all] dependency group.
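
Until the dependency list is updated, a user can check whether the optional package is present before importing the cohort module. The sketch below uses only the standard library; the helpers `is_installed` and `require_optional` are hypothetical and are not part of raimitigations.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

def require_optional(module_name, extra):
    """Raise a descriptive ImportError when an optional dependency is missing."""
    if not is_installed(module_name):
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install raimitigations[{extra}]"
        )

print("seaborn available:", is_installed("seaborn"))
```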

Seed cannot be set

"As we can see, this transformation had some impact in the results (depends on the seed used) when we use KNN. Let's check how this data transformation impacts the XGBoost model:"

The case2.ipynb notebook references the ability to set a seed, but this is not available for split_data(), train_model_plot_results() or train_model_fetch_results(). Additionally, I have noticed that there is no way to pass any parameters to the model itself for instantiation or fitting (e.g. setting the number of neighbors for KNN).

I am not sure whether you expect these functions to be used outside of the example notebooks. But if so, you should consider allowing the user to set a random seed, as well as pass in model parameters, possibly using something like *args and **kwargs.
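
A rough sketch of what such an interface could look like is given below. It is not the library's implementation: `split_data_seeded`, `build_model` and `ToyKNN` are hypothetical stand-ins that only illustrate a `seed` argument and keyword forwarding with `**model_kwargs`.

```python
import random

def split_data_seeded(rows, test_fraction=0.2, seed=None):
    """Shuffle and split rows; the same seed always yields the same split."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def build_model(model_class, **model_kwargs):
    """Instantiate any estimator class, forwarding keyword arguments to it."""
    return model_class(**model_kwargs)

class ToyKNN:
    """Stand-in estimator used only to show the keyword forwarding."""
    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

# Two calls with the same seed produce identical train/test splits.
train_a, test_a = split_data_seeded(range(10), seed=42)
train_b, test_b = split_data_seeded(range(10), seed=42)

# Model parameters are passed through without the helper knowing about them.
model = build_model(ToyKNN, n_neighbors=3)
print(train_a == train_b, model.n_neighbors)
```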

Case3_stat.ipynb: ValueError. mismatched shapes

In the Case3_stat.ipynb case study notebook, in the cell `Artificial Instances - CTGAN`, I receive the following error:

ValueError                                Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3_stat.ipynb Cell 20 in <cell line: 8>()
      5 result_tr = test_corr_transf(df, label_col, N_EXEC, dp.DataStandardScaler, MODEL_NAME, num_col)
      6 result_df = add_results_df(result_df, result_tr, "Std.")
----> 8 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=True, scaler_ref=dp.DataStandardScaler, feat_sel_type=None, art_str=0.2, savefile="3_1.pkl")
      9 result_df = add_results_df(result_df, restult_fs, "CTGAN 0.2 Std.")
     11 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=True, scaler_ref=dp.DataStandardScaler, feat_sel_type=None, art_str=0.6, savefile="3_2.pkl")

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3_stat.ipynb Cell 20 in test_ctgan_first(df, label_col, n_exec, model_name, rcorr, scaler_ref, num_col, feat_sel_type, art_str, savefile)
    260 if art_str is not None:
    261     train_x, train_y = artificial_ctgan(train_x, train_y, art_str, savefile)
--> 262 train_x, test_x = encode_case3_train_test(train_x, test_x)
    263 train_x, test_x = impute_case3_train_test(train_x, test_x)
    264 if feat_sel_type is not None:

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3_stat.ipynb Cell 20 in encode_case3_train_test(train_x, test_x)
     33 enc_ohe = dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
     34 enc_ohe.fit(train_x)
---> 35 train_x_enc = enc_ohe.transform(train_x)
     36 test_x_enc = enc_ohe.transform(test_x)
     37 return train_x_enc, test_x_enc

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\encoder\encoder.py:108, in DataEncoding.transform(self, df)
    106 self._check_if_fitted()
...
    391 passed = values.shape
    392 implied = (len(index), len(columns))
--> 393 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (2582, 30), indices imply (2582, 29)

I am not sure if this is related to issue #31. But potentially there is an issue with the label_col not being included in the df when necessary.

No installation instructions

I have noticed that there are no installation instructions in either the main README.md or the documentation.
