feature-importance

Feature interpretability analysis. This project is a fork of boulderds/feature-importance.

Introduction

To save training time, all models used for the three datasets are provided under /data/<dataset_name>/models, e.g., /data/deception/models. BERT parameters should be stored under data/<dataset_name>/bert_fine_tune. Please download the folders from this link: https://tinyurl.com/bert-fine-tune-folder. Note that the folders can be large and may take time to download.
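As a quick sanity check after downloading, a minimal sketch along these lines (hypothetical, not part of the repository; the dataset names are taken from the steps below) can confirm that the folders are in place:

    # Minimal sanity check (hypothetical, not from the repo): verify that the
    # downloaded model and BERT folders exist where the steps below expect them.
    import os

    DATASETS = ["deception", "yelp", "sst"]  # the three datasets used in the paper

    for name in DATASETS:
        for folder in (f"data/{name}/models", f"data/{name}/bert_fine_tune"):
            status = "ok" if os.path.isdir(folder) else "MISSING"
            print(f"{folder}: {status}")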

1. Generate top 10 features and their feature importance for svm, svm_l1, xgb, and lstm.

  1. To save svm, svm_l1, xgb, and lstm features and their feature importance, run save_combinations.py.
    • Note that save_combinations.py is the only script that uses the downloaded (pip-installed) shap package instead of the local one. Before running it, either set the package path so that Python imports the installed shap package, or temporarily rename the local shap folder so that save_combinations.py does not read from it. If you rename the local folder, remember to restore the original name after running save_combinations.py so the other files are not affected (see the sketch after this list).
  2. To save lstm attention weights, run get_lstm_att_weights.py.
  3. To save lstm SHAP, run python get_lstm_shap.py <dataset_name>.
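As an alternative to renaming the local shap folder, the following minimal sketch (hypothetical, not code from the repository) shows one way to make Python resolve the installed shap package instead of the repo-local shap/ folder, e.g. when placed at the top of save_combinations.py:

    # Sketch (hypothetical): prefer the pip-installed shap package over the
    # repo-local shap/ folder by dropping the repo root from sys.path.
    import os
    import sys

    # sys.path[0] is the directory of the running script; if that is the repo
    # root, the local shap/ folder would shadow the installed package.
    repo_root = os.path.dirname(os.path.abspath(__file__))
    sys.path = [p for p in sys.path if os.path.abspath(p or ".") != repo_root]

    import shap  # now resolves to the installed package
    print(shap.__file__)  # sanity check: should point into site-packages

Note that filtering the repo root out of sys.path also hides any other local modules the script imports, so renaming the folder as described above may be the simpler option in practice.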

2. Generate top 10 features and their feature importance for bert.

  1. Generate tsv files for bert:
    1. deception: run python data_retrieval.py deception
    2. yelp: run python data_retrieval.py yelp
    3. sst: run python data_retrieval.py sst
  2. To save bert attention weights:
    1. deception: run python bert_att_weight_retrieval.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  3. To save bert LIME:
    1. deception: run python bert_lime.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  4. To save bert SHAP:
    1. deception: run python bert_shap.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  5. Generate bert spans and white spans:
    1. deception: run python tokenizer_alignment.py --data_dir data/deception --bert_model data/deception/bert_fine_tune --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  6. To align all bert features/tokens with the correct weights, run python get_bert.py. Note: to generate bert-related features and their feature importance, it is important to follow the above steps in order.
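Since steps 2.2 through 2.4 use the same flags for every dataset except max_seq_length, a small driver script can cut down the repetition. The following is a minimal sketch (hypothetical, not part of the repository) that assumes the flags shown above are the only ones needed:

    # Hypothetical driver: run the bert attention-weight, LIME, and SHAP
    # scripts for all three datasets with the flags documented above.
    import subprocess

    # Per-dataset max_seq_length values, taken from the commands above.
    MAX_SEQ_LENGTH = {"deception": 300, "yelp": 512, "sst": 128}

    SCRIPTS = ["bert_att_weight_retrieval.py", "bert_lime.py", "bert_shap.py"]

    for dataset, seq_len in MAX_SEQ_LENGTH.items():
        for script in SCRIPTS:
            cmd = [
                "python", script,
                "--data_dir", f"data/{dataset}",
                "--bert_model", f"data/{dataset}/bert_fine_tune/",
                "--task_name", "sst-2",
                "--output_dir", f"/data/temp_output_dir/{dataset}/",
                "--do_eval",
                "--max_seq_length", str(seq_len),
                "--eval_batch_size", "1",
            ]
            subprocess.run(cmd, check=True)  # stop if any step fails

Step 2.1 (data_retrieval.py) must still be run first, and tokenizer_alignment.py in step 2.5 takes the same flags minus --eval_batch_size.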

3. Recreate analysis plots.

  1. To generate the plots in the paper, refer to the interactive notebook main.ipynb.

If you run into any problems, please send an email to [email protected] and [email protected].

4. Paper and citation

Paper: https://arxiv.org/abs/1910.08534

@article{lai2019many,
  title={Many Faces of Feature Importance: Comparing Built-in and Post-hoc Feature Importance in Text Classification},
  author={Lai, Vivian and Cai, Jon Z and Tan, Chenhao},
  journal={arXiv preprint arXiv:1910.08534},
  year={2019}
}
