Home Page: https://sapeirone.github.io/EgoPack/

License: MIT License

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives (CVPR 2024)

Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Giuseppe Averta

Politecnico di Torino

This is the official PyTorch implementation of our work "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2024.

Abstract: Human comprehension of a video stream is naturally broad: in a few instants, we are able to understand what is happening, the relevance and relationship of objects, and forecast what will follow in the near future, all at once. We believe that - to effectively transfer such a holistic perception to intelligent machines - an important role is played by learning to correlate concepts and to abstract knowledge coming from different tasks, to synergistically exploit them when learning novel skills. To accomplish this, we seek a unified approach to video understanding which combines shared temporal modelling of human actions with minimal overhead, to support multiple downstream tasks and enable cooperation when learning novel skills. We then propose EgoPack, a solution that creates a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights, as a backpack of skills that a robot can carry around and use when needed. We demonstrate the effectiveness and efficiency of our approach on four Ego4D benchmarks, outperforming current state-of-the-art methods.

Overview

The EgoPack architecture is built around a two-phase training procedure. First, a multi-task model is trained on a subset of the tasks, e.g., Action Recognition (AR), Long-Term Anticipation (LTA) and Point of No Return (PNR). Then, the multi-task model is used to bootstrap the EgoPack architecture to learn a novel task, e.g., Object State Change Classification (OSCC).
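To make the two-phase structure concrete, the sketch below shows a heavily simplified, purely illustrative version of the idea (using random NumPy matrices in place of real trained modules; all names and dimensions are hypothetical and not taken from this repository): a shared backbone with per-task heads is trained first, then the frozen heads provide additional "perspectives" when learning a novel task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
FEAT_DIM, HIDDEN_DIM, N_CLASSES = 16, 8, 4

# Phase 1: a shared temporal backbone plus one head per known task
# (e.g., AR, LTA, PNR) is trained jointly (training loop omitted here).
backbone = rng.normal(size=(FEAT_DIM, HIDDEN_DIM))
task_heads = {t: rng.normal(size=(HIDDEN_DIM, N_CLASSES)) for t in ("AR", "LTA", "PNR")}

def forward(video_feats, head):
    """Project video features through the shared backbone and one task head."""
    return video_feats @ backbone @ head

# Phase 2: the frozen task heads act as extra "perspectives" that augment
# the shared representation when learning a novel task (e.g., OSCC).
def forward_novel(video_feats, novel_head):
    shared = video_feats @ backbone
    perspectives = [shared @ h for h in task_heads.values()]  # frozen views
    combined = np.concatenate([shared] + perspectives, axis=-1)
    return combined @ novel_head

novel_head = rng.normal(size=(HIDDEN_DIM + 3 * N_CLASSES, 2))  # binary novel task
feats = rng.normal(size=(5, FEAT_DIM))  # features for 5 video clips
print(forward_novel(feats, novel_head).shape)  # (5, 2)
```

The actual model uses learned temporal graph modules and a more elaborate interaction between perspectives; this only illustrates the training split.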

Getting started

  1. Download all submodules of this repository:

     ```sh
     git submodule update --init --recursive
     ```

  2. Create a conda environment for the project with all the dependencies listed in the `environment.yaml` file:

     ```sh
     conda env create -f environment.yaml
     conda activate egopack-env
     ```

  3. Download the Ego4D annotations and Omnivore features from https://ego4d-data.org/ under the `data/` directory:

     • `data/ego4d/raw/annotations/v1`: `*.json` and `*.csv` annotations from Ego4D.
     • `data/ego4d/raw/features/omnivore_video_swinl`: Omnivore features as `*.pt` files.

  4. Some example scripts are provided in the `experiments/` directory and can be executed using the `wandb sweep path/to/config.yaml` command.

WandB Integration

The code relies heavily on WandB to run experiments and save checkpoints. In particular, experiments are defined as WandB sweeps, with the random seed as one of the experiment's parameters. This lets you group experiments by name on the WandB dashboard and average metrics over three runs with different seeds, for more consistent results.
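As an illustration, a sweep configured this way might look like the sketch below (the file name, entry point and parameter names are hypothetical, not taken from this repository):

```yaml
# experiments/example_sweep.yaml -- illustrative only, not an actual repo file
program: train.py          # assumed training entry point
method: grid
parameters:
  seed:
    values: [0, 1, 2]      # three runs per configuration, averaged on the dashboard
  lr:
    value: 0.001
```

Running `wandb sweep experiments/example_sweep.yaml` registers the sweep and prints a `wandb agent <entity>/<project>/<sweep_id>` command that launches the individual runs.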

NOTE: A refactored version of this code, with more experiment configs and no dependency on WandB will be released soon (no timeline yet).

Cite Us

@inproceedings{peirone2024backpack,
    title={A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives},
    author={Simone Alberto Peirone and Francesca Pistilli and Antonio Alliegro and Giuseppe Averta},
    year={2024},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}
}

