All of our commands and examples expect the raw data to be downloaded into the following folder: ../target_data/
The training data can be found here and the test data here
Below we have created a number of notebooks to show how the package works and to explore some of the datasets that are commonly used.
The following notebook shows how to load two datasets that are commonly used in the literature and explores them with respect to the task of target extraction:
- SemEval 2014 task 4 -- Laptop domain.
- SemEval 2016 task 5 -- Restaurant domain. Training data can be found here and Test Gold data can be found here
- SemEval 2014 task 4 - Laptop 1, 2, 3, 4, 5
- SemEval 2016 task 5 - Restaurant 1, 2, 4, 5
- SemEval 2014 task 4 - Restaurant 3, 5
- SemEval 2015 task 12 - Restaurant 3, 5
The numbers above refer to the following papers, in order, that used each dataset:
- https://www.aclweb.org/anthology/N19-1242
- https://www.aclweb.org/anthology/P18-2094
- https://www.aaai.org/Conferences/AAAI/2017/PreliminaryPapers/15-Wang-W-14441.pdf
- https://www.aclweb.org/anthology/D17-1310
- https://www.ijcai.org/proceedings/2018/0583.pdf
From what I gather, in the SemEval 2014 data a sentence can have categories and no targets, but I have not seen the reverse. I have also not seen, but I assume it is possible, a sentence that has neither categories nor targets. There are four sentiments: positive, negative, neutral, and conflict. I think we want the following flags: not_conflict and sentiment_to_nums.
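One way to check the observations above is to count how many sentences have targets, categories, both, or neither, and which sentiments occur. The sketch below is only an illustration: it uses Python's standard `xml.etree.ElementTree` on a small made-up sample that follows the SemEval 2014 XML layout (`aspectTerms` and `aspectCategories`). The `summarise` function is a hypothetical helper, not part of this package.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the SemEval 2014 task 4 XML layout (not real data).
SAMPLE_XML = """<sentences>
  <sentence id="1">
    <text>The pasta was great but the waiter was rude.</text>
    <aspectTerms>
      <aspectTerm term="pasta" polarity="positive" from="4" to="9"/>
      <aspectTerm term="waiter" polarity="negative" from="28" to="34"/>
    </aspectTerms>
    <aspectCategories>
      <aspectCategory category="food" polarity="positive"/>
      <aspectCategory category="service" polarity="negative"/>
    </aspectCategories>
  </sentence>
  <sentence id="2">
    <text>Overall a mixed experience.</text>
    <aspectCategories>
      <aspectCategory category="anecdotes/miscellaneous" polarity="conflict"/>
    </aspectCategories>
  </sentence>
  <sentence id="3">
    <text>We went there on Tuesday.</text>
  </sentence>
</sentences>"""


def summarise(xml_text):
    """Count sentence structures and target sentiments (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    structure = Counter()   # keyed by (has_targets, has_categories)
    sentiments = Counter()  # keyed by target polarity
    for sentence in root.iter("sentence"):
        targets = sentence.findall("./aspectTerms/aspectTerm")
        categories = sentence.findall("./aspectCategories/aspectCategory")
        structure[(bool(targets), bool(categories))] += 1
        for target in targets:
            sentiments[target.get("polarity")] += 1
    return structure, sentiments


structure, sentiments = summarise(SAMPLE_XML)
for (has_targets, has_categories), count in sorted(structure.items()):
    print(f"targets={has_targets} categories={has_categories}: {count}")
print(dict(sentiments))
```

To run it on the real data, read the downloaded XML file and pass its contents to `summarise` in place of `SAMPLE_XML`. A non-zero count for targets without categories would contradict the observation above.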
SemEval 2014 Laptop
python create_splits.py ../original_target_datasets/semeval_2014/SemEval\'14-ABSA-TrainData_v2\ \&\ AnnotationGuidelines/Laptop_Train_v2.xml ../original_target_datasets/semeval_2014/ABSA_Gold_TestData/Laptops_Test_Gold.xml semeval_2014 ../original_target_datasets/semeval_2014/laptop_json/train.json ../original_target_datasets/semeval_2014/laptop_json/val.json ../original_target_datasets/semeval_2014/laptop_json/test.json
SemEval 2014 Restaurant
python create_splits.py ../original_target_datasets/semeval_2014/SemEval\'14-ABSA-TrainData_v2\ \&\ AnnotationGuidelines/Restaurants_Train_v2.xml ../original_target_datasets/semeval_2014/ABSA_Gold_TestData/Restaurants_Test_Gold.xml semeval_2014 ../original_target_datasets/semeval_2014/restaurant_json/train.json ../original_target_datasets/semeval_2014/restaurant_json/val.json ../original_target_datasets/semeval_2014/restaurant_json/test.json
SemEval 2016 Restaurant
python create_splits.py ../original_target_datasets/semeval_2016/ABSA16_Restaurants_Train_SB1_v2.xml ../original_target_datasets/semeval_2016/EN_REST_SB1_TEST.xml.gold semeval_2016 ../original_target_datasets/semeval_2016/restaurant_json/train.json ../original_target_datasets/semeval_2016/restaurant_json/val.json ../original_target_datasets/semeval_2016/restaurant_json/test.json
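Each command above takes a training file and a test file and writes three JSON files. This suggests that the validation split is carved out of the training data. The sketch below illustrates that idea only: the 80/20 ratio, the fixed seed, the `split_train_val` helper, and the dictionary fields are all assumptions made for illustration and are not taken from create_splits.py.

```python
import json
import random


def split_train_val(sentences, val_fraction=0.2, seed=42):
    """Shuffle with a fixed seed and hold out a fraction for validation.

    Hypothetical helper: the ratio and seed are assumptions, not the
    values used by create_splits.py.
    """
    shuffled = list(sentences)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    val_size = int(len(shuffled) * val_fraction)
    return shuffled[val_size:], shuffled[:val_size]


# Made-up stand-ins for parsed training sentences.
sentences = [{"text_id": str(i), "text": f"sentence {i}"} for i in range(10)]
train, val = split_train_val(sentences)

# One JSON object per line is one possible on-disk layout (an assumption).
train_lines = "\n".join(json.dumps(sentence) for sentence in train)
print(len(train), len(val))
```

Fixing the seed keeps the split reproducible, so models trained at different times are validated on the same sentences.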
allennlp train config_char.json -s /tmp/something --include-package target_extraction
They can be found within the following folder.