2D-MapFormer

Source code for my master's thesis "2D-MapFormer: 2D-Map Transformer for Audio-Visual Scene-Aware Dialogue and Reasoning" (currently unpublished).

The source code is derived from:

  • AVSD-DSTC10 Baseline: Link
  • 2D-TAN module: Link

Usage

  1. Requirements
    • conda
    • wandb
  2. Environment setup
    . ./setup.sh
    
  3. Download I3D and VGGish pretrained features
    . ./download_data.sh
    python3 utils/combine_files.py # combine feature files into ./data/features/train.pkl and ./data/features/test.pkl
    
  4. Train the model
    1. Set exp_name in run.sh. The trained model and its outputs will be stored in ./log/{exp_name}/, and exp_name is also used as the wandb experiment name.
    2. Set procedure='train_test'.
    3. Set the other hyperparameters; see run.sh and main.py for details.
    4. Run . ./run.sh.
      1. This runs training and testing automatically.
      2. Training progress is printed to the command line, for example:
        train 15, tan:0.125, dig:2.272: 100%|█████| 4787/4787 [21:15<00:00,  3.75it/s]
        train 15, tan:0.112, dig:2.153
        val   15, tan:0.087, dig:1.985: 100%|█████| 1117/1117 [06:12<00:00,  3.00it/s]
        val   15, tan:0.109, dig:2.295
        The best metric was  for 0 epochs.
        Expected early stop @ 19
        train 16, tan:0.094, dig:2.097: 100%|█████| 4787/4787 [21:10<00:00,  3.77it/s]
        train 16, tan:0.112, dig:2.136
        val   16, tan:0.088, dig:2.005: 100%|█████| 1117/1117 [06:11<00:00,  3.01it/s]
        val   16, tan:0.109, dig:2.298
        
      3. The test results are printed to the command line, for example:
        DSTC10_beam_search result:
        | Bleu_1: 68.7000
        | Bleu_2: 55.5832
        | Bleu_3: 45.4938
        | Bleu_4: 37.5887
        | METEOR: 24.3038
        | ROUGE_L: 53.4955
        | CIDEr: 86.9928
        | IoU-1: 54.7007
        | IoU-2: 57.6148
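
Because the console output above follows a fixed line format, it can be scraped into Python structures, e.g. to compare runs or plot losses. The following is a minimal sketch and not part of this repository: the helper names parse_epoch_summaries and parse_test_metrics are hypothetical, and the regular expressions assume the exact formats shown above (epoch summaries such as "train 15, tan:0.112, dig:2.153" and metric lines such as "| Bleu_4: 37.5887"). If run.sh prints a different format, the patterns need adjusting.

```python
import re

# Hypothetical helpers (not part of this repository). The patterns assume the
# console formats printed by run.sh exactly as shown in this README.
EPOCH_RE = re.compile(
    r"^\s*(?P<split>train|val)\s+(?P<epoch>\d+),\s*"
    r"tan:(?P<tan>[\d.]+),\s*dig:(?P<dig>[\d.]+)\s*$"
)
METRIC_RE = re.compile(r"^\s*\|\s*(?P<name>[\w-]+):\s*(?P<value>[\d.]+)\s*$")


def parse_epoch_summaries(lines):
    """Return (split, epoch, tan, dig) tuples for per-epoch summary lines.

    Progress-bar lines are skipped because the pattern requires the line
    to end right after the dig value.
    """
    rows = []
    for line in lines:
        m = EPOCH_RE.match(line)
        if m:
            rows.append(
                (m["split"], int(m["epoch"]), float(m["tan"]), float(m["dig"]))
            )
    return rows


def parse_test_metrics(lines):
    """Return {metric_name: value} for lines such as '| Bleu_4: 37.5887'."""
    metrics = {}
    for line in lines:
        m = METRIC_RE.match(line)
        if m:
            metrics[m["name"]] = float(m["value"])
    return metrics


if __name__ == "__main__":
    # A few lines copied from the example output above.
    sample = """\
train 15, tan:0.125, dig:2.272: 100%|█████| 4787/4787 [21:15<00:00,  3.75it/s]
train 15, tan:0.112, dig:2.153
val   15, tan:0.109, dig:2.295
DSTC10_beam_search result:
| Bleu_4: 37.5887
| CIDEr: 86.9928
| IoU-1: 54.7007
""".splitlines()
    print(parse_epoch_summaries(sample))
    print(parse_test_metrics(sample))
```

Only the per-epoch summary lines are collected, so the loss values come from the line printed after each progress bar rather than from the bar itself.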
        

Model Architecture

Figure: Model Overview
Figure: Audio Visual Encoder
Figure: Sentence Cross Attention
Figure: Update Gate

Contributors

  • axotzero