This project forked from facebookresearch/vq2d_cvpr

This repo contains the code for the recipe of the winning entry to the Ego4d VQ2D challenge at CVPR 2022.

License: MIT License

Methods for Visual Queries 2D Localization

This repo is a codebase for our submission to the VQ2D task in Ego4D Challenge (CVPR22 and ECCV22). The aim of this repo is to help other researchers and challenge practitioners:

  • reproduce some of our experiment results and
  • leverage our pre-trained detection model for other tasks.

Currently, this codebase supports the following methods:

Updates

Introduction

We deal with the problem of localizing objects in image and video datasets from visual exemplars. In particular, we focus on the challenging problem of egocentric visual query localization. We first identify grave implicit biases in current query-conditioned model design and visual query datasets. Then, we directly tackle these biases at both the frame level and the object-set level. Concretely, our method expands the limited annotations and dynamically drops object proposals during training. Additionally, we propose a novel transformer-based module that takes the context of the object-proposal set into account while incorporating query information. We name our module Conditioned Contextual Transformer, or CocoFormer. Our experiments show that the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations. Specifically, we improve frame-level detection performance from 26.28% to 31.26% AP, which correspondingly improves the VQ2D and VQ3D localization scores by significant margins. Our improved context-aware query object detector ranked first and second in the VQ2D and VQ3D tasks, respectively, in the 2nd Ego4D challenge.
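To make the idea of dropping object proposals during training more concrete, here is a minimal sketch of one possible form of it: each proposal is removed independently with some probability, and a minimum number is always kept. This is an illustration only, not the implementation in this repo. The function name `drop_proposals` and the parameters `drop_prob` and `min_keep` are hypothetical.

```python
import random


def drop_proposals(proposals, drop_prob, rng=None, min_keep=1):
    """Randomly drop proposals, keeping at least `min_keep` of them.

    Hypothetical helper for illustration; not part of this codebase.
    `proposals` is any list (e.g. of boxes), `drop_prob` is in [0, 1].
    """
    rng = rng or random.Random()
    # Keep each proposal independently with probability 1 - drop_prob.
    kept = [p for p in proposals if rng.random() >= drop_prob]
    # Guard against dropping too many proposals.
    if len(kept) < min_keep:
        kept = rng.sample(proposals, min(min_keep, len(proposals)))
    return kept


# Example: a fresh random subset would be drawn at every training step.
boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40), (1, 2, 3, 4)]
subset = drop_proposals(boxes, drop_prob=0.5, rng=random.Random(0))
```

Because the subset is redrawn at every call, the model would see a different proposal set for the same frame across iterations, which is what "dynamically" refers to in this sketch.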

Visualization

[easy] frying pan [hard] blue bin

Installation

Please find installation instructions in INSTALL.md. It covers system requirements, the installation guide, and dataset preparation.

Quick Start

Run evaluate_vq2d_one_query.py with our released checkpoint to quickly see the result.

    python evaluate_vq2d_one_query.py \
        model.config_path=$PWD/checkpoint/train_log/slurm_8gpus_4nodes_cocoformer/output/config.yaml \
        model.checkpoint_path=$PWD/checkpoint/train_log/slurm_8gpus_4nodes_cocoformer/output/model_0064999.pth \
        data.split=val logging.visualize=True logging.save_dir=$PWD/visualizations
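
If you prefer to launch the command above from Python, the following sketch assembles the same arguments and reports which of the expected files are missing before you run it. The helper names `build_eval_command` and `missing_files` are hypothetical and not part of this repo. The paths and `key=value` overrides are copied from the command above.

```python
from pathlib import Path

# Run directory used in the command above, relative to the repo root.
RUN_DIR = "checkpoint/train_log/slurm_8gpus_4nodes_cocoformer/output"


def build_eval_command(root, split="val", save_dir="visualizations"):
    """Return the evaluation command as an argument list (hypothetical helper)."""
    root = Path(root)
    out = root / RUN_DIR
    return [
        "python", "evaluate_vq2d_one_query.py",
        f"model.config_path={out / 'config.yaml'}",
        f"model.checkpoint_path={out / 'model_0064999.pth'}",
        f"data.split={split}",
        "logging.visualize=True",
        f"logging.save_dir={root / save_dir}",
    ]


def missing_files(root):
    """List the config and checkpoint files that do not exist under `root`."""
    out = Path(root) / RUN_DIR
    expected = (out / "config.yaml", out / "model_0064999.pth")
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    import subprocess

    repo_root = Path.cwd()  # plays the role of $PWD in the shell command
    missing = missing_files(repo_root)
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        subprocess.run(build_eval_command(repo_root), check=True)
```

Checking for the files first gives a clear message when the checkpoint has not been downloaded yet, instead of a failure inside the evaluation script.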

Bibtex

Our CVPR22 Challenge report is available on arXiv.

@article{xu2022negative,
  title={Negative Frames Matter in Egocentric Visual Query 2D Localization},
  author={Xu, Mengmeng and Fu, Cheng-Yang and Li, Yanghao and Ghanem, Bernard and Perez-Rua, Juan-Manuel and Xiang, Tao},
  journal={arXiv preprint arXiv:2208.01949},
  year={2022}
}

Our ECCV22 Challenge report is available on arXiv.

@article{xu2022where,
  doi = {10.48550/ARXIV.2211.10528},
  url = {https://arxiv.org/abs/2211.10528},
  author = {Xu, Mengmeng and Li, Yanghao and Fu, Cheng-Yang and Ghanem, Bernard and Xiang, Tao and Perez-Rua, Juan-Manuel},
  title = {Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization},  
  journal={arXiv preprint arXiv:2211.10528},
  year={2022}
}

License

Improved Baseline for Visual Queries 2D Localization is released under the MIT license.

Acknowledgements

This codebase relies on detectron2, Ego4d, and episodic-memory repositories.
