AI2's Mosaic Team has created benchmark datasets for a variety of commonsense understanding tasks. To track progress, each dataset has its own leaderboard, all linked here: https://leaderboard.allenai.org/

This repository provides baseline implementations and evaluation scripts for each dataset:
- αNLI: Abductive Natural Language Inference
  - Evaluator
  - Random Baseline
- VCR: Visual Commonsense Reasoning
  - Evaluator
  - Random Baseline
- HellaSwag: Can a Machine Really Finish Your Sentence?
  - Evaluator
  - Random Baseline
- Social IQA: Commonsense Reasoning about Social Interactions
  - Evaluator
  - Random Baseline
- Physical IQA: Commonsense Reasoning about Physical Interactions
  - Evaluator
  - Random Baseline