This tutorial aims to identify geographical biases propagated by LLMs. To this end, it proposes four indicators:
- Spatial disparities in geographical knowledge.
- Spatial information coverage in training datasets.
- Correlation between geographic distance and semantic distance.
- Anomalies between geographic distance and semantic distance.
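As an illustration of the third indicator, the sketch below correlates great-circle distance with cosine distance between city embeddings. It is a minimal example under stated assumptions, not the tutorial's actual implementation: the function names (`haversine_km`, `cosine_distance`, `pearson`, `distance_pairs`) are hypothetical, and the three-dimensional vectors in `TOY_CITIES` are made-up stand-ins for embeddings that would normally come from a language model.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def cosine_distance(u, v):
    """Semantic distance taken as 1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# ASSUMPTION: toy data. Coordinates are approximate; the embeddings are
# invented placeholders, not the output of any real model.
TOY_CITIES = {
    "Paris": ((48.8566, 2.3522), [0.9, 0.1, 0.0]),
    "Lyon": ((45.7640, 4.8357), [0.8, 0.2, 0.1]),
    "Tokyo": ((35.6762, 139.6503), [0.1, 0.9, 0.2]),
    "Osaka": ((34.6937, 135.5023), [0.2, 0.8, 0.3]),
}


def distance_pairs(cities):
    """Return parallel lists of geographic and semantic distances per city pair."""
    geo, sem = [], []
    for (_, (coord_a, emb_a)), (_, (coord_b, emb_b)) in combinations(cities.items(), 2):
        geo.append(haversine_km(*coord_a, *coord_b))
        sem.append(cosine_distance(emb_a, emb_b))
    return geo, sem


if __name__ == "__main__":
    geo, sem = distance_pairs(TOY_CITIES)
    print(f"Pearson r between geographic and semantic distance: {pearson(geo, sem):.3f}")
```

With real embeddings, a weak or negative correlation for a given region could hint that the model's representation of that region does not track geography, which is the kind of signal the indicators above are meant to surface.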
Fig. 1: Average semantic distances between the three most populous cities in a country compared to other cities worldwide.
Rémy Decoupes | Maguelonne Teisseire | Mathieu Roche
Acknowledgement:
This study was partially funded by EU grant 874850 MOOD and is catalogued as MOOD099. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
If you find this work helpful or refer to it in your research, please consider citing:
- Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations, Rémy Decoupes, Roberto Interdonato, Mathieu Roche, Maguelonne Teisseire, Sarah Valentin.
- PAKDD'24: See slides.