G^3: Geolocation via Guidebook Grounding

Findings of EMNLP 2022 | Paper: https://arxiv.org/abs/2211.15521

For inquiries and requests, please contact [email protected].

This repository contains the code accompanying our paper, which proposes a new method that uses human-written guides to improve image geolocation, the task of predicting the location where an image was taken. Our method draws on explicit knowledge from human-written guidebooks created to help players improve at GeoGuessr; these guidebooks describe the salient, class-discriminative visual features people have curated from experience. We evaluate on a novel dataset of StreetView images from a diverse set of locations. This repository provides the code and scripts to reproduce the results from our paper. We would also like to acknowledge the work Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification, on which this repository builds.

Geolocation via Guidebook Grounding Dataset

The StreetView panorama ids and guidebook are available for download here. You can download all necessary files by running ./scripts/get_data_files.sh.

In our dataset, train/val/test contain the StreetView panorama ids, guidebook.json contains the guidebook text, and s2_cells/countries.csv contains the mapping from our predicted class labels to human-readable country names. From the panorama ids you can download the panoramas and then cut them into images, as described further in the section "Getting StreetView Images".

  • train:
    • train.csv: A csv file where each row corresponds to an image in the train set with metadata IMG_ID. For an image with IMG_ID YYcmZ_mdbshez6STxSxmRQ_0.png, the 22 characters before the final underscore (YYcmZ_mdbshez6STxSxmRQ) correspond to a pano_id in the StreetView API, and the digit after it (_0.png) identifies which of the four disjoint pieces the original panorama was split into.
  • val: Folder containing the val data in the same structure as train.
  • test: Folder containing the test data in the same structure as train.
  • guidebook.json: A json file containing a list of CLUE_ID, text, geoparsed (which countries were geoparsed from the text) corresponding to each guidebook clue mined from a human-written GeoGuessr guide.
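The IMG_ID naming scheme above can be parsed with a small helper. This is a minimal sketch under the convention described above; parse_img_id is our own name, not a function provided by the repository:

```python
def parse_img_id(img_id: str) -> tuple[str, int]:
    """Split an IMG_ID like 'YYcmZ_mdbshez6STxSxmRQ_0.png' into
    (pano_id, piece_index). pano_ids may themselves contain
    underscores, so split on the *last* underscore."""
    stem = img_id.rsplit(".", 1)[0]        # drop the .png extension
    pano_id, piece = stem.rsplit("_", 1)   # split on the final underscore
    return pano_id, int(piece)

pano_id, piece = parse_img_id("YYcmZ_mdbshez6STxSxmRQ_0.png")
# pano_id == "YYcmZ_mdbshez6STxSxmRQ", piece == 0
```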

We also include the following files which are used during training.

  • s2_cells/countries.csv: A csv file derived from the GeoNames database where each row corresponds to a ground truth country with metadata class_label (equivalent to hex_id), country, geonameid, latitude_mean, longitude_mean (the mean lat/lon of the country's landmass), geoJSON (a polygon representing the country's borders).
  • pseudo_labels/countries.json: A json file indexed by IMG_ID, mapping each StreetView image to the list of CLUE_IDs of the guidebook clues that correspond to it.
  • loss_weight/countries.json: A json file containing a list of weights for each country class used in the training country classification loss to account for data imbalance, where each index corresponds to class_label in s2_cells/countries.csv.
  • features:
    • streetview_clip_rn50x16.pkl: A pickle file indexed by IMG_ID containing features for each StreetView image as generated by CLIP RN50x16 off-the-shelf.
    • guidebook_roberta_base.pkl: A pickle file indexed by CLUE_ID containing features for each guidebook clue as generated by RoBERTa base off-the-shelf.
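The feature files above can be read with Python's pickle module. This sketch assumes each file deserializes to a dict keyed by IMG_ID or CLUE_ID (the exact value type, e.g. a numpy array per id, is an assumption):

```python
import pickle

def load_features(path):
    """Load a features pickle, assumed to be a dict keyed by
    IMG_ID (StreetView images) or CLUE_ID (guidebook clues)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage with the provided files:
# image_feats = load_features("features/streetview_clip_rn50x16.pkl")
# clue_feats = load_features("features/guidebook_roberta_base.pkl")
# vec = image_feats["YYcmZ_mdbshez6STxSxmRQ_0.png"]
```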

Getting StreetView Images

We provide the panorama ids for all images in our dataset at dataset/${split}/${split}.csv, where ${split} denotes either train, val, or test.

  1. Download the panoramas using the API, and save in the folder dataset/${split}/panos with the file name <pano_id>.jpg. If you encounter issues downloading the panoramas for your research, please contact us.

  2. Run all data preparation using the bash script ./scripts/process_streetview_images.sh. This script cuts the panoramas into images using scripts/panocutter.py, saves the images in the msgpack format used during training using scripts/image_to_msgpack.py, and infers the image-to-label mapping using scripts/image_to_country.py. The images are stored in the compressed msgpack format in shards, indexed by IMG_ID.
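Before running step 2, it can help to verify that every panorama referenced by a split csv was actually downloaded in step 1. This is a sketch using our own helper name (missing_panos), assuming the csv has an IMG_ID header column and that pano_ids are derived by stripping the piece suffix:

```python
import csv
from pathlib import Path

def missing_panos(split_csv, pano_dir):
    """Return the pano_ids referenced by a split csv whose
    <pano_id>.jpg has not been downloaded into pano_dir yet."""
    pano_dir = Path(pano_dir)
    missing = set()
    with open(split_csv, newline="") as f:
        for row in csv.DictReader(f):
            # IMG_ID is '<pano_id>_<piece>.png'; drop extension, then suffix
            pano_id = row["IMG_ID"].rsplit(".", 1)[0].rsplit("_", 1)[0]
            if not (pano_dir / f"{pano_id}.jpg").exists():
                missing.add(pano_id)
    return sorted(missing)
```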

Training G^3

To run each row from our main table, run the following shell scripts.

Experiment                      Attn Supervision   File
ISN                             N/A                ./quickstart/isn.sh
ISN + Random Text               N/A                ./quickstart/isn_random_text.sh
ISN + Guidebook                 No                 ./quickstart/isn_guidebook_no-attn-sup.sh
ISN + Guidebook                 Yes                ./quickstart/isn_guidebook.sh
ISN + CLIP                      N/A                ./quickstart/isn_clip.sh
ISN + CLIP + Random Text        N/A                ./quickstart/isn_clip_random_text.sh
ISN + CLIP + Guidebook          No                 ./quickstart/isn_clip_guidebook_no-attn-sup.sh
G^3 = ISN + CLIP + Guidebook    Yes                ./quickstart/isn_clip_guidebook.sh

You can also customize arguments via our OmegaConf inheritance scheme: add the field includes: [<parent_config.yml>] to a config to inherit its fields, or override fields from the command line when running python -m classification.train.train_classification, for example passing model_params.name=<name> to override the default name in the provided config.
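As a hedged illustration of the inheritance scheme, a child config might look like the following (the file names and field values here are hypothetical, not files shipped with the repo):

```yaml
# child_config.yml: inherits all fields from the parent config,
# then overrides a single field.
includes: [parent_config.yml]   # hypothetical parent config
model_params:
  name: my_experiment           # overrides the inherited default name
```

The same override could instead be passed as a command-line flag, e.g. model_params.name=my_experiment.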

Evaluating G^3

To evaluate trained weights on the StreetView images, run the cells of notebooks/inference.ipynb.

Citing

If you find our dataset useful for your research, please cite the following paper:

@article{luo2022geolocation,
  title={G^3: Geolocation via Guidebook Grounding},
  author={Luo, Grace and Biamby, Giscard and Darrell, Trevor and Fried, Daniel and Rohrbach, Anna},
  journal={Findings of EMNLP},
  year={2022}
}

geolocation_via_guidebook_grounding's People

Contributors

g-luo

geolocation_via_guidebook_grounding's Issues

ask for dataset

Thank you very much for the excellent work. Do you have the annotation file mapping each image id to its geolocation? Also, when I used the API to fetch panoramic images, I found that Google's API does not support direct download of panoramas; do you have the corresponding panorama files? Thanks!
