WLML

This is the code for the paper "Temporal Plot and Semantic Mining for Weakly-Supervised Language Moment Localization".

Framework

[Figure: framework overview]

Main Results

Main results on Charades-STA

Method           [label lost]  [label lost]  [label lost]  [label lost]
LOC-PSE (C3D)    38.41         21.24         83.09         49.25
LOC-PSE (VGG)    41.32         23.74         82.82         49.22

Main results on ActivityNet Captions

Method                 [label lost]  [label lost]  [label lost]  [label lost]
LOC-PSE (C3D)          56.50         34.38         83.75         71.01
LOC-PSE (ResNet152)    57.11         34.11         83.13         70.08

Prerequisites

  • python 3.7
  • pytorch 1.7.1
  • torchvision 0.8.2
  • torchtext 0.8.1
  • pattern 3.6
  • easydict
  • terminaltables

Testing

Please download the data from Google Drive. Because the data is large, it is split into four files. Use the following commands to merge them and extract the archive into the WLML/ folder.

cd WLML
cat data.tar.gz.* > data.tar.gz
tar -zxvf data.tar.gz

We also provide the VGG video features for Charades-STA and the ResNet152 video features for ActivityNet Captions. They are in data2.tar.gz.*.
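On systems without cat and tar, the same merge-and-extract step can be done in Python. The following is a minimal sketch, not part of the repository: the function name merge_and_extract is hypothetical, and it assumes the parts share the archive name as a prefix (data.tar.gz.* or data2.tar.gz.*) and sort into the correct order by name, as the shell glob above also assumes.

```python
import glob
import shutil
import tarfile


def merge_and_extract(prefix: str, dest: str = ".") -> str:
    """Concatenate the files matching `prefix.*` into `prefix`, then extract it.

    Python equivalent of:
        cat prefix.* > prefix
        tar -zxvf prefix
    """
    # Sort the parts by name so they are concatenated in order.
    parts = sorted(glob.glob(glob.escape(prefix) + ".*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {prefix}.*")

    # Byte-wise concatenation of all parts into a single archive.
    with open(prefix, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, merged)

    # Extract the merged gzip-compressed tarball into the destination folder.
    with tarfile.open(prefix, "r:gz") as archive:
        archive.extractall(dest)
    return prefix
```

Run from inside the WLML/ folder, merge_and_extract("data.tar.gz") mirrors the two shell commands above, and merge_and_extract("data2.tar.gz") does the same for the additional features.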

Our trained models are provided on Google Drive. Please download them and extract them into the WLML/ folder as well.

Then run the following commands to reproduce the results in our paper. Each block assumes you start in the WLML/ folder:

# Evaluate CT-SP-16L
cd CT_SP
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-16L.yaml --verbose --split test
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-16L.yaml --verbose --split test

# Evaluate CT-SP-32L
cd CT_SP
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-32L.yaml --verbose --split test
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-32L.yaml --verbose --split test

# Evaluate CT-SP-MS
cd CT_SP_MS
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-ms.yaml --verbose --split test
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-ms.yaml --verbose --split test

# Evaluate LOC-PSE (Ours)
cd LOC_PSE
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-loc-pse.yaml --verbose --split test
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-loc-pse.yaml --verbose --split test

To reproduce the results using VGG and ResNet152 video features, replace charades-*.yaml with charades-*-vgg.yaml and activitynet-*.yaml with activitynet-*-resnet152.yaml. In addition, before evaluating LOC-PSE, first evaluate CT-SP-MS to produce the final score curve, which the LOC_PSE dataloader loads. Note that the score curve is not used at inference time; it is only used to generate pseudo ground truths in the training stage of LOC_PSE.
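The renaming rule for the alternative video features can be sketched as a small helper. This is only an illustration, not part of the repository: the function name feature_variant and the FEATURE_SUFFIX mapping are hypothetical, and the sketch assumes the variant files sit next to the default ones and follow the charades-*-vgg.yaml and activitynet-*-resnet152.yaml pattern described above.

```python
from pathlib import PurePosixPath

# Suffix per dataset, taken from the naming pattern in this README.
# The mapping itself is a hypothetical helper, not part of the WLML code.
FEATURE_SUFFIX = {"charades": "vgg", "activitynet": "resnet152"}


def feature_variant(cfg_path: str) -> str:
    """Return the config path for the alternative video feature.

    experiments/charades/charades-16L.yaml
        -> experiments/charades/charades-16L-vgg.yaml
    """
    path = PurePosixPath(cfg_path)
    # The dataset name is the part of the file name before the first dash.
    dataset = path.stem.split("-", 1)[0]
    if dataset not in FEATURE_SUFFIX:
        raise ValueError(f"unknown dataset prefix in {cfg_path!r}")
    new_name = f"{path.stem}-{FEATURE_SUFFIX[dataset]}{path.suffix}"
    return str(path.with_name(new_name))


print(feature_variant("experiments/charades/charades-16L.yaml"))
print(feature_variant("experiments/activitynet/activitynet-loc-pse.yaml"))
```

The returned path can be passed to --cfg in place of the default config.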

Training

To re-train the models, use the following commands. Each block assumes you start in the WLML/ folder:

# train the model for generating the matching score curve at the scale of 16
cd CT_SP
CUDA_VISIBLE_DEVICES=0 python moment_localization/train.py --cfg experiments/charades/charades-16L.yaml --verbose
CUDA_VISIBLE_DEVICES=0 python moment_localization/train.py --cfg experiments/activitynet/activitynet-16L.yaml --verbose

# train the model for generating the matching score curve at the scale of 32
cd CT_SP
CUDA_VISIBLE_DEVICES=0 python moment_localization/train.py --cfg experiments/charades/charades-32L.yaml --verbose
CUDA_VISIBLE_DEVICES=0 python moment_localization/train.py --cfg experiments/activitynet/activitynet-32L.yaml --verbose

# fuse the matching score curves of the two scales
cd CT_SP_MS
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-ms.yaml --verbose --split train
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-ms.yaml --verbose --split val
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/charades/charades-ms.yaml --verbose --split test

CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-ms.yaml --verbose --split train
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-ms.yaml --verbose --split val
CUDA_VISIBLE_DEVICES=0 python moment_localization/test.py --cfg experiments/activitynet/activitynet-ms.yaml --verbose --split test

# train the final localization model
cd LOC_PSE
CUDA_VISIBLE_DEVICES=0 python moment_localization/train.py --cfg experiments/charades/charades-loc-pse.yaml --verbose
CUDA_VISIBLE_DEVICES=0 python moment_localization/train.py --cfg experiments/activitynet/activitynet-loc-pse.yaml --verbose

Note that after training the models that generate the matching score curves at the two scales, you need to set the MODEL.CHECKPOINT_2 and MODEL.CHECKPOINT_3 options in charades-ms.yaml or activitynet-ms.yaml so that the correct models are loaded when fusing the score curves.
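As a rough illustration of what setting these two options amounts to, the sketch below updates them in a config mapping. Several things here are assumptions rather than facts about the repository: the dotted names are taken to mean nested keys (MODEL, then CHECKPOINT_2 and CHECKPOINT_3), the helper set_option is hypothetical, and the checkpoint paths are placeholders. Which option belongs to which scale is not stated here, so check the provided YAML files before editing.

```python
def set_option(cfg: dict, dotted_key: str, value) -> None:
    """Set a nested option such as 'MODEL.CHECKPOINT_2' in a config mapping."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        # Walk down the nesting, creating intermediate mappings if missing.
        node = node.setdefault(name, {})
    node[leaf] = value


# Stand-in for the parsed contents of charades-ms.yaml (assumed nesting).
cfg = {"MODEL": {"CHECKPOINT_2": "", "CHECKPOINT_3": ""}}

# Placeholder paths: replace them with the checkpoints written by your own
# CT_SP training runs.
set_option(cfg, "MODEL.CHECKPOINT_2", "path/to/first_scale_checkpoint")
set_option(cfg, "MODEL.CHECKPOINT_3", "path/to/second_scale_checkpoint")
print(cfg["MODEL"])
```

In practice it is usually simpler to edit the two entries directly in the YAML file.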

To re-train the models using VGG and ResNet152 video features, likewise replace charades-*.yaml with charades-*-vgg.yaml and activitynet-*.yaml with activitynet-*-resnet152.yaml.
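The order of the training stages for one dataset can be summarised in a short script. This is a sketch under stated assumptions: the names STAGES and training_commands are hypothetical, and the script only builds and prints the commands listed in this section rather than running them, because the checkpoint options in the *-ms.yaml file have to be set by hand before the fusion stage.

```python
from typing import List, Optional, Tuple

# (working directory, script, config suffix, splits) for each stage, in the
# order given in this section. A split of None means no --split flag.
STAGES: List[Tuple[str, str, str, List[Optional[str]]]] = [
    ("CT_SP", "train.py", "16L", [None]),
    ("CT_SP", "train.py", "32L", [None]),
    ("CT_SP_MS", "test.py", "ms", ["train", "val", "test"]),
    ("LOC_PSE", "train.py", "loc-pse", [None]),
]


def training_commands(dataset: str, gpu: int = 0) -> List[Tuple[str, str]]:
    """Return (working directory, command) pairs for one dataset."""
    commands = []
    for workdir, script, suffix, splits in STAGES:
        for split in splits:
            cmd = (
                f"CUDA_VISIBLE_DEVICES={gpu} python moment_localization/{script} "
                f"--cfg experiments/{dataset}/{dataset}-{suffix}.yaml --verbose"
            )
            if split is not None:
                cmd += f" --split {split}"
            commands.append((workdir, cmd))
    return commands


# Print the commands for Charades-STA; each one is run from its own folder.
for workdir, cmd in training_commands("charades"):
    print(f"(in {workdir}) {cmd}")
```

Passing "activitynet" instead of "charades" yields the corresponding ActivityNet Captions commands.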
