This project forked from miha-skalic/youtube8mchallenge

1st place solution to Kaggle's 2018 YouTube-8M Video Understanding Challenge

License: Apache License 2.0

Next top GB model solution to The 2nd YouTube-8M Video Understanding Challenge

This repository contains all code used by the first-place team "Next top GB model" (David and Miha Skalic) in Kaggle's competition The 2nd YouTube-8M Video Understanding Challenge.

The repository is a fork of Google's repository and borrows from Wang et al., Miech et al. and Skalic et al. The code is released under the Apache License, Version 2.0.

This readme walks through a specific example that reproduces training, evaluation, distillation, quantization and graph combination for a single model type.

Background

All models here were trained in single-GPU mode, and the instructions that follow reproduce that setup. The overall flow for training each model is as follows:

  1. Train model
  2. Evaluate model
  3. (Optional) Apply an exponentially weighted moving average (EMA) to the weights
  4. Quantize model
  5. Perform inference on quantized model
  6. (Optional) If running distillation, create distillation dataset, then repeat from step 1.
  7. Combine multiple graphs into single graph

This readme walks through all commands needed to train both a stand-alone model and a distillation model. For details on the other models, see all_models.txt.

Requirements

All code was run using Python 2.7 and TensorFlow 1.8.0. All models were trained on GPUs. The requirements.txt file lists every library installed in the environment used for training and testing. Not all of these libraries are required, but installing the full list should ensure complete compatibility with all code.

Training Models

Set the training-data and save paths to match your local environment:

export CUDA_VISIBLE_DEVICES=0
SAVEPATH="../trained_models"
RECORDPAT="../data/frame/train"


python train.py \
  --train_data_pattern="$RECORDPAT/*.tfrecord" \
  --model=NetVLADModelLF \
  --train_dir="$SAVEPATH//NetVLAD" \
  --frame_features=True --feature_names="rgb,audio" \
  --feature_sizes="1024,128" \
  --batch_size=160 --base_learning_rate=0.0002 \
  --netvlad_cluster_size=256 \
  --netvlad_hidden_size=1024 \
  --moe_l2=1e-6 --iterations=300 \
  --learning_rate_decay=0.8 \
  --netvlad_relu=False \
  --gating=True \
  --moe_prob_gating=True \
  --lightvlad=False \
  --num_gpu 1 \
  --num_epochs=10
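
The --base_learning_rate and --learning_rate_decay flags above interact with the batch size. The following is a minimal sketch of how such a schedule could evolve, assuming the staircase exponential decay used by the upstream YouTube-8M starter code and a decay interval of 4,000,000 examples. Both the staircase behaviour and the interval are assumptions; check train.py for the values actually used.

```python
def decayed_learning_rate(base_lr, decay, global_step, batch_size,
                          decay_examples=4000000):
    """Staircase exponential decay keyed on the number of examples seen.

    decay_examples is an assumed interval, not a value taken from this README.
    """
    examples_seen = global_step * batch_size
    num_decays = examples_seen // decay_examples  # whole intervals completed
    return base_lr * decay ** num_decays


# Values taken from the training command: lr 0.0002, decay 0.8, batch 160.
for step in (0, 25000, 50000, 100000):
    lr = decayed_learning_rate(0.0002, 0.8, step, 160)
    print("step %6d  learning rate %.6g" % (step, lr))
```

Under these assumptions the rate drops by a factor of 0.8 every 25,000 steps at batch size 160.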

Eval model

Once training is complete, eval is performed as follows:

RECORDPATVAL="../data/frame/train"

python eval.py \
  --eval_data_pattern="$RECORDPATVAL/*.tfrecord" \
  --model=NetVLADModelLF \
  --train_dir="$SAVEPATH//NetVLAD" \
  --frame_features=True --feature_names="rgb,audio" \
  --feature_sizes="1024,128" \
  --batch_size=160 --base_learning_rate=0.0002 \
  --netvlad_cluster_size=256 \
  --netvlad_hidden_size=1024 \
  --moe_l2=1e-6 --iterations=300 \
  --learning_rate_decay=0.8 \
  --netvlad_relu=False \
  --gating=True \
  --moe_prob_gating=True \
  --lightvlad=False \
  --num_gpu 1 \
  --num_epochs=10 \
  --run_once \
  --build_only \
  --sample_all

Perform EMA

python train.py \
  --train_data_pattern="$RECORDPAT/*.tfrecord" \
  --model=NetVLADModelLF \
  --train_dir="$SAVEPATH//NetVLAD_ema" \
  --video_level_classifier_model="LogisticModel" \
  --frame_features \
  --feature_names="rgb, audio" \
  --feature_sizes="1024, 128" \
  --batch_size=160 \
  --base_learning_rate=0.00008 \
  --lstm_cells=1024 \
  --num_epochs=2 \
  --num_gpu 1 \
  --num_readers 8 \
  --loss_lambda 0.5 \
  --ema_halflife 2000 \
  --ema_source "$SAVEPATH//NetVLAD/inference_model"

python eval.py \
  --eval_data_pattern="$RECORDPATVAL/*.tfrecord" \
  --model=NetVLADModelLF \
  --train_dir="$SAVEPATH//NetVLAD_ema" \
  --video_level_classifier_model="LogisticModel" \
  --frame_features \
  --feature_names="rgb, audio" \
  --feature_sizes="1024, 128" \
  --batch_size=160 \
  --base_learning_rate=0.00008 \
  --lstm_cells=1024 \
  --num_epochs=2 \
  --num_gpu 1 \
  --num_readers 8 \
  --build_only \
  --run_once \
  --sample_all
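
For intuition about --ema_halflife, here is a minimal sketch of an exponential moving average over weights. It assumes the half-life is the number of update steps after which an old value's contribution is halved; the conversion actually used by train.py is not shown in this readme, so treat the function names and the formula as illustrative.

```python
def ema_decay_from_halflife(halflife):
    """Per-step decay such that decay ** halflife == 0.5 (assumed meaning)."""
    return 0.5 ** (1.0 / halflife)


def ema_update(shadow, weights, decay):
    """One EMA step: move each shadow value a little towards the live weight."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


decay = ema_decay_from_halflife(2000)  # value from the command above
shadow = [0.0]
for _ in range(2000):
    shadow = ema_update(shadow, [1.0], decay)
# After one half-life the shadow has covered half the distance to the weight.
print("decay per step: %.6f, shadow after 2000 steps: %.4f" % (decay, shadow[0]))
```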

Quantize Model and copy model_flags.json

Change --savefile to your own save path.


python quantize.py \
  --transform_type quant_uniform \
  --model "$SAVEPATH//NetVLAD_ema/inference_model" \
  --savefile ../trained_models/quants/your_model/inference_model

cp "$SAVEPATH//NetVLAD_ema/model_flags.json" ../trained_models/quants/your_model
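
To illustrate what a uniform quantization transform does in general, the sketch below maps float weights onto 256 evenly spaced levels and back. It is a generic 8-bit example with hypothetical function names, not the implementation in quantize.py, whose bit width and rounding scheme may differ.

```python
def quantize_uniform(values, num_bits=8):
    """Map floats to integer codes on an evenly spaced grid between min and max."""
    lo, hi = min(values), max(values)
    levels = float((1 << num_bits) - 1)  # 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [int(round((v - lo) / scale)) for v in values]
    return codes, lo, scale


def dequantize_uniform(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-0.5, 0.0, 0.25, 0.5]
codes, lo, scale = quantize_uniform(weights)
restored = dequantize_uniform(codes, lo, scale)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print("codes:", codes)
print("largest round-trip error: %.6f (half a step is %.6f)" % (worst, scale / 2))
```

Each weight is stored as one byte instead of four, at the cost of a rounding error of at most half a quantization step.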

Combine multiple graphs into single graph

graph_ensemble.py takes in 2 or more trained models and combines them into a single graph. Sample usage:

python graph_ensemble.py \
--models ../trained_models/quants/74/inference_model \
        ../trained_models/model_1/inference_model \
        ../trained_models/model_2/inference_model \
        ../trained_models/model_3/inference_model \
--weights 0.3333 0.3333 0.3334  \
--save_folder ../train_models/your_combined_output_graph
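
The effect of --weights can be pictured as a weighted average of per-class scores. The sketch below assumes that graph_ensemble.py combines model outputs linearly and expects one weight per model; neither assumption is confirmed by this readme, and the function name is hypothetical.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-class scores from several models using one weight per model."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    combined = [0.0] * len(predictions[0])
    for model_scores, weight in zip(predictions, weights):
        for i, score in enumerate(model_scores):
            combined[i] += weight * score
    return combined


# Three hypothetical models scoring two classes for a single video.
scores = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
print(weighted_ensemble(scores, [0.3333, 0.3333, 0.3334]))
```

Weights that sum to 1 keep the combined scores in the same range as the individual model scores.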

Perform Inference


RECORDPATTEST="../data/frame/test"

python inference_gpu.py \
  --train_dir "../trained_models/quants/your_model"  \
  --output_file="./output.csv" \
  --input_data_pattern="$RECORDPATTEST/*.tfrecord" \
  --batch_size 200 \
  --sample_all
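
To inspect the resulting output.csv, a small parser such as the one below may help. It assumes the Kaggle submission layout for this competition: a header line `VideoId,LabelConfidencePairs` followed by rows of a video id and space-separated label/confidence pairs. Verify the layout against your own file, since inference_gpu.py may write it differently.

```python
def parse_predictions(lines):
    """Parse rows of the form 'video_id,label conf label conf ...' (assumed layout)."""
    predictions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("VideoId,"):
            continue  # skip blank lines and the header
        video_id, _, pair_text = line.partition(",")
        fields = pair_text.split()
        predictions[video_id] = [
            (int(fields[i]), float(fields[i + 1]))
            for i in range(0, len(fields) - 1, 2)
        ]
    return predictions


# Made-up sample rows, for illustration only.
sample = ["VideoId,LabelConfidencePairs", "abcd,3 0.97 10 0.5"]
print(parse_predictions(sample))
```

For a real file, pass the open file handle instead of the sample list, for example `parse_predictions(open("./output.csv"))`.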

Create Distillation Set and Train on it.

WARNING: Large dataset creation! Creating a new distillation set consumes about 1.4 TB of disk space, so make sure that much storage is available.

python prepare_distill_dataset.py \
  --batch_size 128 \
  --file_size 512 \
  --input_data_pattern "$RECORDPATVAL/*.tfrecord" \
  --output_dir "output_folder/train_distill/" \
  --model_file "../train_models/your_ensemble_model/inference_model"

Training on a distillation dataset is done with the train_distill.py script. Use the same flags as for train.py.
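
As background, distillation typically trains the student on a blend of ground-truth labels and the teacher ensemble's predictions. The sketch below shows one common formulation with a hypothetical mixing coefficient `alpha`; it is not taken from train_distill.py, whose loss may be defined differently.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Mean per-class binary cross-entropy; targets may be soft (between 0 and 1)."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def distillation_targets(labels, teacher_scores, alpha=0.5):
    """Blend hard labels with teacher scores; alpha is a hypothetical weight."""
    return [alpha * y + (1.0 - alpha) * t for y, t in zip(labels, teacher_scores)]


labels = [1.0, 0.0]          # ground truth for two classes
teacher = [0.8, 0.2]         # made-up teacher predictions
student = [0.7, 0.1]         # made-up student predictions
soft = distillation_targets(labels, teacher)
print("soft targets:", soft)
print("student loss: %.4f" % binary_cross_entropy(soft, student))
```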

Model configurations

The file model_configs.xlsx contains the architectures of the models used in this work.

Trained Model

A trained model can be downloaded as a .tar.gz archive from here. See inference.py for sample usage of the model. The feature_extractor folder contains information on preprocessing custom videos.
