G^3: Geolocation via Guidebook Grounding

Findings of EMNLP 2022 | Paper: https://arxiv.org/abs/2211.15521

For inquiries and requests, please contact [email protected].

This repository contains the code accompanying our paper, which proposes a new method that uses human-written guides to improve image geolocation, the task of predicting the location where an image was taken. Our method draws on explicit knowledge from human-written guidebooks created to help players improve at GeoGuessr; these guidebooks describe the salient, class-discriminative visual features people have curated from experience. We evaluate on a novel dataset of StreetView images from a diverse set of locations. This repository provides the code and scripts to reproduce the results from our paper. We would also like to acknowledge the work Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification, on which this repository builds.

Geolocation via Guidebook Grounding Dataset

The StreetView panorama ids and guidebook are available for download here. You can download all necessary files by running ./scripts/get_data_files.sh.

In our dataset, train/val/test contain the StreetView panorama ids, guidebook.json contains the guidebook text, and s2_cells/countries.csv contains the mapping from our predicted class labels to human-readable country names. From the panorama ids you can download the panoramas and then cut them into images, as described further in the section "Getting StreetView Images".

  • train:
    • train.csv: A csv file where each row corresponds to an image in the train set with metadata IMG_ID. For an image with IMG_ID YYcmZ_mdbshez6STxSxmRQ_0.png, the 22 characters before the final underscore (YYcmZ_mdbshez6STxSxmRQ) correspond to a pano_id in the StreetView API, and the digit after it (_0.png) identifies which of the four disjoint pieces the original panorama was split into.
  • val: Folder containing the val data in the same structure as train.
  • test: Folder containing the test data in the same structure as train.
  • guidebook.json: A json file containing a list of CLUE_ID, text, geoparsed (which countries were geoparsed from the text) corresponding to each guidebook clue mined from a human-written GeoGuessr guide.
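The IMG_ID naming scheme above can be parsed with a small helper. This is a minimal sketch under the convention described above; parse_img_id is our own name, not a function provided by the repository:

```python
def parse_img_id(img_id: str) -> tuple[str, int]:
    """Split an IMG_ID like 'YYcmZ_mdbshez6STxSxmRQ_0.png' into
    (pano_id, piece_index). pano_ids may themselves contain
    underscores, so split on the *last* underscore."""
    stem = img_id.rsplit(".", 1)[0]        # drop the .png extension
    pano_id, piece = stem.rsplit("_", 1)   # split on the final underscore
    return pano_id, int(piece)

pano_id, piece = parse_img_id("YYcmZ_mdbshez6STxSxmRQ_0.png")
# pano_id == "YYcmZ_mdbshez6STxSxmRQ", piece == 0
```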

We also include the following files which are used during training.

  • s2_cells/countries.csv: A csv file derived from the GeoNames database where each row corresponds to a ground truth country with metadata class_label (equivalent to hex_id), country, geonameid, latitude_mean, longitude_mean (the mean lat/lon of the country's landmass), geoJSON (a polygon representing the country's borders).
  • pseudo_labels/countries.json: A json file indexed by IMG_ID, mapping each StreetView image to the list of CLUE_IDs of the guidebook clues that correspond to it.
  • loss_weight/countries.json: A json file containing a list of weights for each country class used in the training country classification loss to account for data imbalance, where each index corresponds to class_label in s2_cells/countries.csv.
  • features:
    • streetview_clip_rn50x16.pkl: A pickle file indexed by IMG_ID containing features for each StreetView image as generated by CLIP RN50x16 off-the-shelf.
    • guidebook_roberta_base.pkl: A pickle file indexed by CLUE_ID containing features for each guidebook clue as generated by RoBERTa base off-the-shelf.
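The feature files above can be read with Python's pickle module. This sketch assumes each file deserializes to a dict keyed by IMG_ID or CLUE_ID (the exact value type, e.g. a numpy array per id, is an assumption):

```python
import pickle

def load_features(path):
    """Load a features pickle, assumed to be a dict keyed by
    IMG_ID (StreetView images) or CLUE_ID (guidebook clues)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage with the provided files:
# image_feats = load_features("features/streetview_clip_rn50x16.pkl")
# clue_feats = load_features("features/guidebook_roberta_base.pkl")
# vec = image_feats["YYcmZ_mdbshez6STxSxmRQ_0.png"]
```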

Getting StreetView Images

We provide the panorama ids for all images in our dataset at dataset/${split}/${split}.csv, where ${split} denotes either train, val, or test.

  1. Download the panoramas using the API, and save in the folder dataset/${split}/panos with the file name <pano_id>.jpg. If you encounter issues downloading the panoramas for your research, please contact us.

  2. Run all data preparation using the bash script ./scripts/process_streetview_images.sh. This script cuts the panoramas into images using scripts/panocutter.py, saves the images in the msgpack format used during training using scripts/image_to_msgpack.py, and infers the image-to-label mapping using scripts/image_to_country.py. The images are stored in the compressed msgpack format in shards, indexed by IMG_ID.
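Before running step 2, it can help to verify that every panorama referenced by a split csv was actually downloaded in step 1. This is a sketch using our own helper name (missing_panos), assuming the csv has an IMG_ID header column and that pano_ids are derived by stripping the piece suffix:

```python
import csv
from pathlib import Path

def missing_panos(split_csv, pano_dir):
    """Return the pano_ids referenced by a split csv whose
    <pano_id>.jpg has not been downloaded into pano_dir yet."""
    pano_dir = Path(pano_dir)
    missing = set()
    with open(split_csv, newline="") as f:
        for row in csv.DictReader(f):
            # IMG_ID is '<pano_id>_<piece>.png'; drop extension, then suffix
            pano_id = row["IMG_ID"].rsplit(".", 1)[0].rsplit("_", 1)[0]
            if not (pano_dir / f"{pano_id}.jpg").exists():
                missing.add(pano_id)
    return sorted(missing)
```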

Training G^3

To run each row from our main table, run the following shell scripts.

Experiment                      Attn Supervision   File
ISN                             N/A                ./quickstart/isn.sh
ISN + Random Text               N/A                ./quickstart/isn_random_text.sh
ISN + Guidebook                 No                 ./quickstart/isn_guidebook_no-attn-sup.sh
ISN + Guidebook                 Yes                ./quickstart/isn_guidebook.sh
ISN + CLIP                      N/A                ./quickstart/isn_clip.sh
ISN + CLIP + Random Text        N/A                ./quickstart/isn_clip_random_text.sh
ISN + CLIP + Guidebook          No                 ./quickstart/isn_clip_guidebook_no-attn-sup.sh
G^3 = ISN + CLIP + Guidebook    Yes                ./quickstart/isn_clip_guidebook.sh

You can also customize arguments via our OmegaConf inheritance scheme: add the field includes: [<parent_config.yml>] to a config to inherit its fields, or override fields from the command line when running python -m classification.train.train_classification, for example passing model_params.name=<name> to override the default name in the provided config.
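As a hedged illustration of the inheritance scheme, a child config might look like the following (the file names and field values here are hypothetical, not files shipped with the repo):

```yaml
# child_config.yml: inherits all fields from the parent config,
# then overrides a single field.
includes: [parent_config.yml]   # hypothetical parent config
model_params:
  name: my_experiment           # overrides the inherited default name
```

The same override could instead be passed as a command-line flag, e.g. model_params.name=my_experiment.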

Evaluating G^3

To evaluate trained weights on the StreetView images, run the cells of notebooks/inference.ipynb.

Citing

If you find our dataset useful for your research, please cite the following paper:

@article{luo2022geolocation,
  title={G^3: Geolocation via Guidebook Grounding},
  author={Luo, Grace and Biamby, Giscard and Darrell, Trevor and Fried, Daniel and Rohrbach, Anna},
  journal={Findings of EMNLP},
  year={2022}
}

geolocation_via_guidebook_grounding's People

Contributors

g-luo

geolocation_via_guidebook_grounding's Issues

ask for dataset

Thank you very much for the excellent work. Do you have the annotation file mapping each image id to its geolocation? Also, when I used the API to fetch panoramic images, I found that Google's API does not support direct download of panoramas; do you have the corresponding panorama files? Thanks!
