Official PyTorch Implementation of Learning Affordance Grounding from Exocentric Images, CVPR 2022

License: MIT License


Learning Affordance Grounding from Exocentric Images

PyTorch implementation of our Cross-view-AG models. This repository contains the PyTorch training and evaluation code.

  1. 📎 Paper Link
  2. 💡 Abstract
  3. 📖 Method
  4. 📂 Dataset
  5. 📊 Experimental Results
  6. ✉️ Statement
  7. ✨ Other Relevant Works
  8. 🔍 Citation

📎 Paper Link

  • Learning Affordance Grounding from Exocentric Images (CVPR 2022)

Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

  • Grounded Affordance from Exocentric View (Extended version) [pdf]

Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

💡 Abstract

Affordance grounding, the task of grounding (i.e., localizing) the action possibility regions of objects, faces the challenge of establishing an explicit link with object parts due to the diversity of interactive affordances. Humans have the ability to transform various exocentric interactions into invariant egocentric affordances, thereby countering the impact of interaction diversity. To empower an agent with such an ability, this paper proposes the task of affordance grounding from the exocentric view: given exocentric human-object interaction images and an egocentric object image, learn the affordance knowledge of the object and transfer it to the egocentric image using only the affordance label as supervision. To this end, we devise a cross-view knowledge transfer framework that extracts affordance-specific features from exocentric interactions and enhances the perception of affordance regions by preserving affordance correlations. Specifically, an Affordance Invariance Mining module is devised to extract affordance-specific cues by minimizing the intra-class differences that originate from interaction habits in exocentric images. Furthermore, an Affordance Co-relation Preserving strategy is presented to perceive and localize affordances by aligning the co-relation matrices of the predicted results of the two views. In addition, an affordance grounding dataset named AGD20K is constructed by collecting and labeling over 20K images from 36 affordance categories. Experimental results demonstrate that our method outperforms representative methods in terms of objective metrics and visual quality.
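
The intra-class difference minimization mentioned above can be illustrated with a minimal sketch, assuming pooled features and a mean-feature "prototype" per affordance class. This is not the repository's actual AIM module; the function name, tensor shapes, and loss form are assumptions for illustration only.

```python
import torch

def intra_class_invariance_loss(exo_feats: torch.Tensor) -> torch.Tensor:
    """Toy intra-class invariance loss (illustrative sketch, not the paper's AIM module).

    exo_feats: (N, C) pooled features of N exocentric images that share the same
    affordance label. Pulling each feature toward the per-class mean suppresses
    person-specific interaction habits while keeping the affordance-specific component.
    """
    prototype = exo_feats.mean(dim=0, keepdim=True)           # (1, C) class-mean feature
    return ((exo_feats - prototype) ** 2).sum(dim=1).mean()   # mean squared distance to the mean

# Example: 4 exocentric images of the same affordance, 512-d features
loss = intra_class_invariance_loss(torch.randn(4, 512))
```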


Observation. By observing diverse exocentric interactions, humans learn affordance knowledge determined by the object's intrinsic properties and transfer it to the egocentric view.


Motivation. (a) Exocentric interactions can be decomposed into affordance-specific features M and differences in individual habits E. (b) There are co-relations between affordances, e.g., "Cut with" inevitably accompanies "Hold" and is independent of the object category (knife or scissors). Such co-relations are common across objects. In this paper, we mainly consider extracting affordance-specific cues M from diverse interactions while preserving the affordance co-relations to enhance the perceptual capability of the network.

📖 Method


Overview of the proposed cross-view knowledge transfer affordance grounding framework. It mainly consists of an Affordance Invariance Mining (AIM) module and an Affordance Co-relation Preservation (ACP) strategy. The AIM module (Sec. 3.1) aims to obtain invariant affordance representations from diverse exocentric interactions. The ACP strategy (Sec. 3.2) enhances the network's affordance perception by aligning the co-relations of the outputs of the two views.
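
To make the ACP idea concrete, here is a minimal, hypothetical sketch of aligning the affordance co-relation matrices of the two views' predictions. The function names, the use of per-image class scores, the cosine-similarity co-relation, and the MSE alignment are assumptions for illustration; the repository's actual implementation may differ.

```python
import torch
import torch.nn.functional as F

def co_relation_matrix(scores: torch.Tensor) -> torch.Tensor:
    """Class-by-class co-relation matrix from prediction scores (illustrative only).

    scores: (B, K) per-image scores over K affordance classes. Each class is
    represented by its column of scores across the batch; cosine similarity
    between columns yields a (K, K) co-relation matrix.
    """
    cols = F.normalize(scores.t(), dim=1)   # (K, B), each class vector unit-normalized
    return cols @ cols.t()                  # (K, K) pairwise cosine similarities

def acp_alignment_loss(exo_scores: torch.Tensor, ego_scores: torch.Tensor) -> torch.Tensor:
    """Align the co-relation matrices of the exocentric and egocentric views."""
    return F.mse_loss(co_relation_matrix(exo_scores), co_relation_matrix(ego_scores))

# Example: a batch of 8 image pairs and 36 affordance classes (as in AGD20K)
loss = acp_alignment_loss(torch.randn(8, 36), torch.randn(8, 36))
```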

📂 Dataset


The properties of the AGD20K dataset. (a) Some examples from the dataset. (b) The distribution of categories in AGD20K. (c) The word cloud distribution of affordances in AGD20K. (d) Confusion matrix between the affordance category and the object category in AGD20K, where the horizontal axis denotes the object category and the vertical axis denotes the affordance category.

📊 Experimental Results


The results of different methods on AGD20K. The best results are in bold. "Seen" means that the training and test sets contain the same object categories, while "Unseen" means that the object categories in the training and test sets do not overlap. The * denotes the relative improvement of our method over the other methods. "Dark red", "Orange", and "Purple" represent saliency detection, weakly supervised object localization, and affordance grounding models, respectively.


Visual affordance heatmaps on the AGD20K dataset. We show the predictions of representative methods for affordance grounding (Hotspots [33]), weakly supervised object localization (EIL [30]), and saliency detection (DeepGazeII [21]).

โœ‰๏ธ Statement

This project is for research purposes only; please contact us for a commercial-use license. For any other questions, please contact [email protected] or [email protected].

✨ Other Relevant Works

1. The paper "One-Shot Affordance Detection" was accepted by IJCAI 2021; the paper and code are available at https://github.com/lhc1224/OSAD_Net.

2. The paper "Phrase-Based Affordance Detection via Cyclic Bilateral Interaction" was accepted by IEEE Transactions on Artificial Intelligence (T-AI). The language-annotated PAD-L dataset is available for download via [ link ], and the related paper and code can be downloaded from the [link].

3. The paper "Grounding 3D Object Affordance from 2D Interactions in Images" and the corresponding code are available at https://github.com/yyvhang/IAGNet.

๐Ÿ” Citation

@inproceedings{Learningluo,
  title={Learning Affordance Grounding from Exocentric Images},
  author={Luo, Hongchen and Zhai, Wei and Zhang, Jing and Cao, Yang and Tao, Dacheng},
  booktitle={CVPR},
  year={2022}
}
