feature-importance

Feature interpretability analysis. This project is a fork of boulderds/feature-importance.

Introduction

To save training time, all models used for the three datasets are provided under /data/<dataset_name>/models, e.g., /data/deception/models. BERT parameters should be stored under data/<dataset_name>/bert_fine_tune. Please download the folders from this link: https://tinyurl.com/bert-fine-tune-folder. Note that the folders can be large and may take time to download.
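As a quick sanity check after downloading, a minimal sketch along these lines (hypothetical, not part of the repository; the dataset names are taken from the steps below) can confirm that the folders are in place:

    # Minimal sanity check (hypothetical, not from the repo): verify that the
    # downloaded model and BERT folders exist where the steps below expect them.
    import os

    DATASETS = ["deception", "yelp", "sst"]  # the three datasets used in the paper

    for name in DATASETS:
        for folder in (f"data/{name}/models", f"data/{name}/bert_fine_tune"):
            status = "ok" if os.path.isdir(folder) else "MISSING"
            print(f"{folder}: {status}")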

1. Generate top 10 features and their feature importance for svm, svm_l1, xgb, and lstm.

  1. To save svm, svm_l1, xgb, and lstm features and their feature importance, run save_combinations.py.
    • Note that save_combinations.py is the only script that uses the downloaded (pip-installed) shap package instead of the local one. Before running it, either set the package path so that Python imports the installed shap package, or temporarily rename the local shap folder so that save_combinations.py does not read from it. If you rename the local folder, remember to restore the original name after running save_combinations.py so the other files are not affected (see the sketch after this list).
  2. To save lstm attention weights, run get_lstm_att_weights.py.
  3. To save lstm SHAP, run python get_lstm_shap.py <dataset_name>.
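As an alternative to renaming the local shap folder, the following minimal sketch (hypothetical, not code from the repository) shows one way to make Python resolve the installed shap package instead of the repo-local shap/ folder, e.g. when placed at the top of save_combinations.py:

    # Sketch (hypothetical): prefer the pip-installed shap package over the
    # repo-local shap/ folder by dropping the repo root from sys.path.
    import os
    import sys

    # sys.path[0] is the directory of the running script; if that is the repo
    # root, the local shap/ folder would shadow the installed package.
    repo_root = os.path.dirname(os.path.abspath(__file__))
    sys.path = [p for p in sys.path if os.path.abspath(p or ".") != repo_root]

    import shap  # now resolves to the installed package
    print(shap.__file__)  # sanity check: should point into site-packages

Note that filtering the repo root out of sys.path also hides any other local modules the script imports, so renaming the folder as described above may be the simpler option in practice.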

2. Generate top 10 features and their feature importance for bert.

  1. Generate tsv files for bert:
    1. deception: run python data_retrieval.py deception
    2. yelp: run python data_retrieval.py yelp
    3. sst: run python data_retrieval.py sst
  2. To save bert attention weights:
    1. deception: run python bert_att_weight_retrieval.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  3. To save bert LIME:
    1. deception: run python bert_lime.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  4. To save bert SHAP:
    1. deception: run python bert_shap.py --data_dir data/deception --bert_model data/deception/bert_fine_tune/ --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300 --eval_batch_size 1
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  5. Generate bert spans and white spans:
    1. deception: run python tokenizer_alignment.py --data_dir data/deception --bert_model data/deception/bert_fine_tune --task_name sst-2 --output_dir /data/temp_output_dir/deception/ --do_eval --max_seq_length 300
    2. yelp: run the above command, but replace deception with yelp, and change max_seq_length to 512
    3. sst: run the above command, but replace deception with sst, and change max_seq_length to 128
  6. To align all bert features/tokens with the correct weights, run python get_bert.py. Note: to generate bert-related features and their feature importance, it is important to follow the above steps in order.
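Since steps 2.2 through 2.4 use the same flags for every dataset except max_seq_length, a small driver script can cut down the repetition. The following is a minimal sketch (hypothetical, not part of the repository) that assumes the flags shown above are the only ones needed:

    # Hypothetical driver: run the bert attention-weight, LIME, and SHAP
    # scripts for all three datasets with the flags documented above.
    import subprocess

    # Per-dataset max_seq_length values, taken from the commands above.
    MAX_SEQ_LENGTH = {"deception": 300, "yelp": 512, "sst": 128}

    SCRIPTS = ["bert_att_weight_retrieval.py", "bert_lime.py", "bert_shap.py"]

    for dataset, seq_len in MAX_SEQ_LENGTH.items():
        for script in SCRIPTS:
            cmd = [
                "python", script,
                "--data_dir", f"data/{dataset}",
                "--bert_model", f"data/{dataset}/bert_fine_tune/",
                "--task_name", "sst-2",
                "--output_dir", f"/data/temp_output_dir/{dataset}/",
                "--do_eval",
                "--max_seq_length", str(seq_len),
                "--eval_batch_size", "1",
            ]
            subprocess.run(cmd, check=True)  # stop if any step fails

Step 2.1 (data_retrieval.py) must still be run first, and tokenizer_alignment.py in step 2.5 takes the same flags minus --eval_batch_size.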

3. Recreate analysis plots.

  1. To generate the plots in the paper, refer to the interactive notebook main.ipynb.

If you run into any problems, please send an email to [email protected] and [email protected].

4. Paper and citation

Paper: https://arxiv.org/abs/1910.08534

@article{lai2019many,
  title={Many Faces of Feature Importance: Comparing Built-in and Post-hoc Feature Importance in Text Classification},
  author={Lai, Vivian and Cai, Jon Z and Tan, Chenhao},
  journal={arXiv preprint arXiv:1910.08534},
  year={2019}
}
