
(Accepted by T-PAMI) CCVPE: Convolutional Cross-View Pose Estimation

[Paper] [Demo Video] [BibTeX]

This work is an extension of "Visual Cross-View Metric Localization with Dense Uncertainty Estimates" (ECCV 2022).

Demo video of per-frame pose estimation on Oxford RobotCar traversals under different weather and lighting conditions.

Pose estimation (localization + orientation estimation) on images with different horizontal fields of view (HFoV). From left to right: HFoV = $360^\circ$, $180^\circ$, $108^\circ$.

Abstract

We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained by using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36% in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity, and enables rejecting possible erroneous predictions. Without re-training, the model can infer on ground images with different fields of view and utilize orientation priors if available. On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.
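
To make the matching step in the abstract concrete, here is a minimal, hypothetical sketch. All tensor names, feature sizes, and the softmax normalization are illustrative assumptions, not the released implementation: a query descriptor is compared against descriptors at every aerial location, the score map is normalized into a dense localization probability distribution, and the orientation vector field is read out at the most likely location.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes and tensors; the real encoder/decoder outputs come from
# the models in this repo. This only illustrates the matching idea.
ground_desc = torch.randn(1, 64)             # descriptor of the ground-level query
aerial_desc = torch.randn(1, 64, 128, 128)   # descriptors of local aerial regions

# Compare the query descriptor with every aerial location (inner product),
# then normalize the score map into a dense localization probability distribution.
scores = (aerial_desc * ground_desc[:, :, None, None]).sum(dim=1)  # (1, 128, 128)
prob_map = F.softmax(scores.flatten(1), dim=1).view_as(scores)

# Localization: take the most likely cell (the full map also exposes ambiguity,
# which is what enables rejecting possibly erroneous predictions).
flat_idx = int(prob_map.flatten().argmax())
row, col = flat_idx // 128, flat_idx % 128

# Orientation: read the predicted 2D vector field at the estimated location,
# so the orientation estimate is conditioned on the localization.
vector_field = torch.randn(1, 2, 128, 128)   # placeholder Orientation Decoder output
vec = vector_field[0, :, row, col]           # shape (2,)
yaw = torch.atan2(vec[1], vec[0])            # orientation estimate in radians
```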

Datasets

The VIGOR dataset can be found at https://github.com/Jeff-Zilence/VIGOR. We use the revised ground truth from https://github.com/tudelft-iv/SliceMatch
The KITTI dataset can be found at https://github.com/shiyujiao/HighlyAccurate
For Oxford RobotCar, the aerial image is provided by https://github.com/tudelft-iv/CrossViewMetricLocalization, and the ground images are from https://robotcar-dataset.robots.ox.ac.uk/datasets/

Models

Our trained models are available at: https://surfdrive.surf.nl/files/index.php/s/cbyPn7NQoOOzlqp

Training and testing

Training or testing on the VIGOR dataset:
samearea split: python train_VIGOR.py --area samearea
crossarea split: python train_VIGOR.py --area crossarea
For testing, add the argument --training False
For testing with an orientation prior that contains up to $\pm X^\circ$ noise, e.g. $\pm 72^\circ$, add the argument --ori_noise 72. $X = 0$ corresponds to testing with known orientation
For testing on images with a limited HFoV, e.g. $180^\circ$, add the argument --FoV 180 (see the sketch below)
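
To make these two test-time arguments concrete, the sketch below shows what they correspond to for a $360^\circ$ equirectangular panorama. It is a minimal illustration under stated assumptions: the NumPy placeholders, the centered-crop convention, and the variable names are not taken from the repo's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# --ori_noise 72: the orientation prior is the true yaw corrupted by
# up to +/-72 degrees of noise (X = 0 would mean known orientation).
true_yaw_deg = 30.0
ori_noise = 72.0
prior_yaw_deg = true_yaw_deg + rng.uniform(-ori_noise, ori_noise)

# --FoV 180: keep only a 180-degree horizontal slice of the 360-degree panorama.
pano = np.zeros((512, 2048, 3), dtype=np.uint8)  # placeholder equirectangular image
hfov = 180.0
keep = int(pano.shape[1] * hfov / 360.0)         # width of the visible slice
start = (pano.shape[1] - keep) // 2              # assumed: crop centered on the heading
limited = pano[:, start:start + keep]            # shape (512, 1024, 3)
```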

Training on the KITTI dataset: python train_KITTI.py
For testing, add the argument --training False
For training or testing with an orientation prior, e.g. $\pm 10^\circ$, add the argument --rotation_range 10
We also provide a model trained with a $\pm 10^\circ$ orientation prior; to use it, change test_model_path in train_KITTI.py (see the sketch below)
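
A sketch of that edit: the variable name test_model_path comes from this README, but the checkpoint path below is only a placeholder for wherever you saved the downloaded model.

```python
# In train_KITTI.py: point test_model_path at the downloaded checkpoint.
# The path below is illustrative, not the actual checkpoint location.
test_model_path = 'models/KITTI/rotation_range_10/model.pt'
```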

Training or testing on the Oxford RobotCar dataset:
python train_OxfordRobotCar.py (training) or python train_OxfordRobotCar.py --training False (testing)

Visualize qualitative results

Visualize qualitative results on the VIGOR same-area or cross-area test set:
python visualize_qualitative_results_VIGOR.py --area samearea --ori_prior 180 --idx 0
idx: image index in the VIGOR test set
ori_prior: $X$ means an orientation prior with up to $\pm X^\circ$ noise; $180$ means no orientation prior

Citation

@article{xia2023convolutional,
  title={Convolutional Cross-View Pose Estimation},
  author={Xia, Zimin and Booij, Olaf and Kooij, Julian FP},
  journal={arXiv preprint arXiv:2303.05915},
  year={2023}
}
