VSFormer

Official implementation of VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning.

The paper has been accepted by AAAI 2024.

Introduction

VSFormer identifies inliers and recovers camera poses accurately. First, highly abstract visual cues of a scene are obtained through cross-attention between the local features of the two-view images. These visual cues and the correspondences are then modeled by a joint visual-spatial fusion module, which embeds the visual cues into the correspondences for pruning. Additionally, to mine the consistency of correspondences, a novel module combines a KNN-based graph with a transformer, capturing both local and global contexts.
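To make the KNN-based graph idea concrete, here is a minimal sketch of building a k-nearest-neighbor graph over a set of correspondences. It is only an illustration, not the repository's implementation: it assumes each correspondence is a 4D vector (x1, y1, x2, y2) and measures distance directly in coordinate space, whereas the actual model may build the graph on learned features. The function name `knn_graph` and the sample data are hypothetical.

```python
import math

def knn_graph(points, k):
    """Return, for each point, the indices of its k nearest other points.

    Illustrative brute-force version (O(N^2)); not the repository's code.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point (self excluded).
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Hypothetical correspondences (x1, y1, x2, y2) in normalized image coordinates.
correspondences = [
    (0.10, 0.10, 0.12, 0.11),
    (0.11, 0.09, 0.13, 0.10),
    (0.12, 0.11, 0.14, 0.12),
    (0.90, 0.80, 0.10, 0.95),  # far from the others, e.g. a likely outlier
]

graph = knn_graph(correspondences, k=2)
for idx, nbrs in enumerate(graph):
    print(idx, nbrs)
```

The three mutually consistent correspondences pick each other as neighbors, which is the kind of local context a graph module can aggregate before a transformer models global context.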

Requirements

Installation

We recommend using Anaconda or Miniconda. To set up the environment, follow the instructions below.

conda create -n vsformer python=3.8 --yes
conda activate vsformer
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=11.0 -c pytorch --yes
python -m pip install -r requirements.txt

Dataset

Follow the instructions provided here to download and preprocess the datasets. Put the packaged dataset in data_dump/, so that the directory structure is:

$VSFormer
    |----data_dump
      |----yfcc-sift-2000-train.hdf5
      |----yfcc-sift-2000-val.hdf5
      |----yfcc-sift-2000-test.hdf5
      ...
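
Before training, it can help to confirm that the files are where the scripts expect them. The following is a small sketch that checks the layout shown above; the helper name `missing_dataset_files` is hypothetical and not part of the repository, and it only checks the three YFCC files listed here.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "yfcc-sift-2000-train.hdf5",
    "yfcc-sift-2000-val.hdf5",
    "yfcc-sift-2000-test.hdf5",
]

def missing_dataset_files(root="."):
    """Return the expected files that are absent from <root>/data_dump."""
    dump_dir = Path(root) / "data_dump"
    return [name for name in EXPECTED_FILES if not (dump_dir / name).is_file()]

# Run from the repository root.
missing = missing_dataset_files(".")
if missing:
    print("Missing from data_dump/:", ", ".join(missing))
else:
    print("All expected dataset files found.")
```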

Training & Evaluation

  1. Training. If you have multiple GPUs, we recommend training with train_multi_gpu.py.
# train on multiple GPUs
CUDA_VISIBLE_DEVICES=0,1 nohup python -u -m torch.distributed.launch --nproc_per_node=2 --use_env train_multi_gpu.py >./logs/vsformer_yfcc.txt 2>&1 &

# train on a single GPU
nohup python -u train_single_gpu.py >./logs/vsformer_yfcc.txt 2>&1 &
  2. Evaluation
python test.py
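
A note on the multi-GPU command: with `--use_env`, `torch.distributed.launch` passes each process its rank through environment variables (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`) instead of a `--local_rank` argument. The sketch below shows how a training script might read them. It is an assumption about the general pattern, not a copy of what train_multi_gpu.py does, and the helper name `read_distributed_env` is hypothetical.

```python
import os

def read_distributed_env(env=None):
    """Read the rank settings exported by the launcher.

    Falls back to single-process defaults when the variables are unset.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global process index
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this machine
    }

print(read_distributed_env())
```

With `--nproc_per_node=2` on one machine, the two processes would see `local_rank` 0 and 1 and a `world_size` of 2.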

Acknowledgment

This repo benefits from OANet and CLNet. Thanks to the authors for their wonderful work.

Citation

If you find this work useful, please cite our paper:

@inproceedings{liao2024vsformer,
  title={VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning},
  author={Liao, Tangfei and Zhang, Xiaoqin and Zhao, Li and Wang, Tao and Xiao, Guobao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={4},
  pages={3369--3377},
  year={2024}
}
