
AutoVideo: An Automated Video Action Recognition System

Home Page: https://autoedge.ai/

License: MIT License

Topics: automl, video, deep-learning, video-recognition, automated

autovideo's Introduction


AutoVideo is a system for automated video analysis. It is built on the D3M infrastructure, which describes machine learning pipelines in a generic pipeline language. Currently, it focuses on video action recognition, supporting a complete training pipeline consisting of data processing, video processing, video transformation, and action recognition. It also supports automated tuners for pipeline search. AutoVideo is developed by DATA Lab at Rice University.

There are other video analysis libraries out there, but this one is designed to be highly modular. AutoVideo is highly extensible thanks to the pipeline language: each module is wrapped as a primitive with its own hyperparameters, which makes it easy to develop new modules and convenient to perform pipeline search. We welcome contributions that enrich AutoVideo with more primitives. You can find instructions in the Contributing Guide.


Cite this work

If you find this repo useful, you may cite:

Zha, Daochen, et al. "AutoVideo: An Automated Video Action Recognition System." arXiv preprint arXiv:2108.04212 (2021).

@inproceedings{zha2021autovideo,
  title={Autovideo: An automated video action recognition system},
  author={Zha, Daochen and Bhat, Zaid Pervaiz and Chen, Yi-Wei and Wang, Yicheng and Ding, Sirui and Chen, Jiaben and Lai, Kwei-Herng and Bhat, Mohammad Qazim and Jain, Anmoll Kumar and Reyes, Alfredo Costilla and Zou, Na and Hu, Xia},
  booktitle={IJCAI},
  year={2022}
}

Installation

Make sure that you have Python 3.6+ and pip installed. Currently, the code is only tested on Linux. First, install torch and torchvision with

pip3 install torch
pip3 install torchvision

To use automated searching, you need to install ray-tune and hyperopt with

pip3 install 'ray[tune]' hyperopt

We recommend installing the stable version of autovideo with pip:

pip3 install autovideo

Alternatively, you can clone the latest version with

git clone https://github.com/datamllab/autovideo.git

Then install with

cd autovideo
pip3 install -e .

Quick Start

To try the examples, you may download the hmdb6 dataset, which is a subset of hmdb51 with only 6 classes. All the datasets can be downloaded from Google Drive. Then, unzip a dataset and put it in the datasets folder. You may also try STGCN for skeleton-based action recognition on kinetics36, which is a subset of the Kinetics dataset with 36 classes.

Fitting and saving a pipeline

python3 examples/fit.py

Some important hyperparameters are as follows.

  • --alg: the supported algorithm. Currently we support tsn, tsm, i3d, eco, eco_full, c3d, r2p1d, r3d, stgcn.
  • --pretrained: whether to load pre-trained weights and fine-tune.
  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
  • --log_dir: the path for saving the log
  • --save_path: the path for saving the fitted pipeline
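
For example, to fit TSN on the hmdb6 dataset with pre-trained weights (the log and save paths below are placeholders):

python3 examples/fit.py --alg tsn --pretrained --gpu 0 --data_dir datasets/hmdb6/ --log_dir logs/ --save_path fitted_pipeline/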

In AutoVideo, all the pipelines can be described as Python dictionaries. In examples/fit.py, the default pipeline is defined as below.

config = {
	"transformation":[
		("RandomCrop", {"size": (128,128)}),
		("Scale", {"size": (128,128)}),
	],
	"augmentation": [
		("meta_ChannelShuffle", {"p": 0.5} ),
		("blur_GaussianBlur",),
		("flip_Fliplr", ),
		("imgcorruptlike_GaussianNoise", ),
	],
	"multi_aug": "meta_Sometimes",
	"algorithm": "tsn",
	"load_pretrained": False,
	"epochs": 50,
}

This pipeline describes which transformation and augmentation primitives are used and how the multiple augmentation primitives are combined. It also specifies that TSN is trained for 50 epochs from scratch. The hyperparameters can be flexibly configured based on the hyperparameters defined in each primitive.
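
For reference, examples/fit.py turns such a dictionary into a runnable pipeline roughly as sketched below. The helper names build_pipeline and fit are exported by autovideo.utils (they appear in an import trace in the Issues section further down); apart from the pipeline= keyword, the argument names here are assumptions, so treat this as a sketch rather than the exact API and consult examples/fit.py for the authoritative usage.

# Minimal sketch, assuming the build_pipeline/fit helpers from autovideo.utils;
# argument names other than pipeline= are illustrative placeholders.
from autovideo.utils import build_pipeline, fit

config = {
	"algorithm": "tsn",
	"load_pretrained": False,
	"epochs": 50,
}
pipeline = build_pipeline(config)
# train_dataset, train_media_dir and target_index would describe a
# d3m-format dataset (see "Preparing datasets and benchmarking" below);
# these names are assumptions:
# fit(train_dataset=train_dataset, train_media_dir=train_media_dir,
#     target_index=target_index, pipeline=pipeline)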

Loading a fitted pipeline and producing predictions

After fitting a pipeline, you can load a pipeline and make predictions.

python3 examples/produce.py

Some important hyperparameters are as follows.

  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
  • --log_dir: the path for saving the log
  • --load_path: the path for loading the fitted pipeline
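
For example, producing predictions with the pipeline saved by the fit example above (paths are placeholders):

python3 examples/produce.py --gpu 0 --data_dir datasets/hmdb6/ --log_dir logs/ --load_path fitted_pipeline/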

Loading a fitted pipeline and recognizing actions

After fitting a pipeline, you can also make predictions on a single video. As a demo, you may download the fitted pipeline and the demo video from Google Drive. Then, you can use the following command to recognize the action in the video:

python3 examples/recogonize.py

Some important hyperparameters are as follows.

  • --gpu: which gpu device to use. Empty string for CPU.
  • --video_path: the path of video file
  • --log_dir: the path for saving the log
  • --load_path: the path for loading the fitted pipeline
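
For example, with the demo video and fitted pipeline downloaded from Google Drive (file names are placeholders):

python3 examples/recogonize.py --gpu 0 --video_path demo.avi --log_dir logs/ --load_path fitted_pipeline/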

Fitting and producing a pipeline

Alternatively, you can do fit and produce without saving the model with

python3 examples/fit_produce.py

Some important hyperparameters are as follows.

  • --alg: the supported algorithm.
  • --pretrained: whether to load pre-trained weights and fine-tune.
  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
  • --log_dir: the path for saving the log
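
For example (paths are placeholders):

python3 examples/fit_produce.py --alg tsn --pretrained --gpu 0 --data_dir datasets/hmdb6/ --log_dir logs/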

Automated searching

In addition to configuring pipelines yourself, we also support automated model selection and hyperparameter tuning:

python3 examples/search.py

Some important hyperparameters are as follows.

  • --alg: the searching algorithm. Currently, we support random and hyperopt.
  • --num_samples: the number of samples to be tried
  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
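
For example, running a hyperopt search with 20 trials on hmdb6 (the sample count is arbitrary):

python3 examples/search.py --alg hyperopt --num_samples 20 --gpu 0 --data_dir datasets/hmdb6/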

The search space can also be specified as a Python dictionary. An example:

# ray-tune provides the sampling primitives used below
from ray import tune

search_space = {
	"augmentation": {
		"aug_0": tune.choice([
			("arithmetic_AdditiveGaussianNoise",),
			("arithmetic_AdditiveLaplaceNoise",),
		]),
		"aug_1": tune.choice([
			("geometric_Rotate",),
			("geometric_Jigsaw",),
		]),
	},
	"multi_aug": tune.choice([
		"meta_Sometimes",
		"meta_Sequential",
	]),
	"algorithm": tune.choice(["tsn"]),
	"learning_rate": tune.uniform(0.0001, 0.001),
	"momentum": tune.uniform(0.9, 0.99),
	"weight_decay": tune.uniform(5e-4, 1e-3),
	"num_segments": tune.choice([8, 16, 32]),
}

Supported Action Recognition Algorithms

Algorithms Primitive Path Paper
TSN autovideo/recognition/tsn_primitive.py Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
TSM autovideo/recognition/tsm_primitive.py TSM: Temporal Shift Module for Efficient Video Understanding
R2P1D autovideo/recognition/r2p1d_primitive.py A Closer Look at Spatiotemporal Convolutions for Action Recognition
R3D autovideo/recognition/r3d_primitive.py Learning spatio-temporal features with 3d residual networks for action recognition
C3D autovideo/recognition/c3d_primitive.py Learning Spatiotemporal Features with 3D Convolutional Networks
ECO-Lite autovideo/recognition/eco_primitive.py ECO: Efficient Convolutional Network for Online Video Understanding
ECO-Full autovideo/recognition/eco_full_primitive.py ECO: Efficient Convolutional Network for Online Video Understanding
I3D autovideo/recognition/i3d_primitive.py Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
STGCN autovideo/recognition/stgcn_primitive.py Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Supported Augmentation Primitives

We have adapted all the augmentation methods in imgaug to videos and wrapped them as primitives. Some examples are listed below.

Augmentation Method Primitive Path
AddElementwise autovideo/augmentation/arithmetic/AddElementwise_primitive.py
Cartoon autovideo/augmentation/artistic/Cartoon_primitive.py
BlendAlphaBoundingBoxes autovideo/augmentation/blend/BlendAlphaBoundingBoxes_primitive.py
AverageBlur autovideo/augmentation/blur/AverageBlur_primitive.py
AddToBrightness autovideo/augmentation/color/AddToBrightness_primitive.py
AllChannelsCLAHE autovideo/augmentation/contrast/AllChannelsCLAHE_primitive.py
DirectedEdgeDetect autovideo/augmentation/convolutional/DirectedEdgeDetect_primitive.py
SaveDebugImageEveryNBatches autovideo/augmentation/debug/SaveDebugImageEveryNBatches_primitive.py
Canny autovideo/augmentation/edges/Canny_primitive.py
Fliplr autovideo/augmentation/flip/Fliplr_primitive.py
Affine autovideo/augmentation/geometric/Affine_primitive.py
Brightness autovideo/augmentation/imgcorruptlike/Brightness_primitive.py
ChannelShuffle autovideo/augmentation/meta/ChannelShuffle_primitive.py
Autocontrast autovideo/augmentation/pillike/Autocontrast_primitive.py
AveragePooling autovideo/augmentation/pooling/AveragePooling_primitive.py
RegularGridVoronoi autovideo/augmentation/segmentation/RegularGridVoronoi_primitive.py
CenterCropToAspectRatio autovideo/augmentation/size/CenterCropToAspectRatio_primitive.py
Clouds autovideo/augmentation/weather/Clouds_primitive.py

See the Full List of Augmentation Primitives

Advanced Usage

Beyond the above examples, you can also customize the configurations.

Configuring the hyperparameters

Each model in AutoVideo is wrapped as a primitive, which contains some hyperparameters. An example of TSN is here. All the hyperparameters can be specified when building the pipeline by passing a config dictionary. See examples/fit.py.
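
As a hedged illustration, primitive-level hyperparameters such as num_segments or learning_rate (both appear in the search-space example above) could be set in the same config dictionary; whether a given name is accepted depends on the primitive's definition:

config = {
	"algorithm": "tsn",
	"load_pretrained": True,
	"epochs": 10,
	# assumed hyperparameter names; check the primitive definition
	# (e.g. autovideo/recognition/tsn_primitive.py) for the exact names
	"num_segments": 8,
	"learning_rate": 0.001,
}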

Configuring the search space

The tuner searches for the best hyperparameter combination within a search space to improve the performance. The search space can be defined with ray-tune. See examples/search.py.

Preparing datasets and benchmarking

The datasets must follow the D3M format, which consists of a csv file and a media folder. The csv file should have three columns specifying the instance indices, video file names, and labels. An example is shown below.

d3mIndex,video,label
0,Aussie_Brunette_Brushing_Hair_II_brush_hair_u_nm_np1_ri_med_3.avi,0
1,brush_my_hair_without_wearing_the_glasses_brush_hair_u_nm_np1_fr_goo_2.avi,0
2,Brushing_my_waist_lenth_hair_brush_hair_u_nm_np1_ba_goo_0.avi,0
3,brushing_raychel_s_hair_brush_hair_u_cm_np2_ri_goo_2.avi,0
4,Brushing_Her_Hair__[_NEW_AUDIO_]_UPDATED!!!!_brush_hair_h_cm_np1_le_goo_1.avi,0
5,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_0.avi,0
6,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_1.avi,0
7,Prelinger_HabitPat1954_brush_hair_h_nm_np1_fr_med_26.avi,0
8,brushing_hair_2_brush_hair_h_nm_np1_ba_med_2.avi,0

The media folder should contain the video files. You may refer to our example hmdb6 dataset on Google Drive. We have also prepared hmdb51 and ucf101 on Google Drive for benchmarking. Please read benchmark for more details. For some of the algorithms (TSN, TSM, C3D, R2P1D and R3D), if you want to load the pre-trained weights and fine-tune, you need to download the weights from Google Drive and put them in the weights folder.
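
For orientation, a sketch of the expected layout based on the hmdb6 example; the csv and media folder names here are assumptions and may differ per dataset:

datasets/
	hmdb6/
		hmdb6_dataset.csv   (the csv with d3mIndex, video, label columns; name assumed)
		media/              (the folder containing the .avi video files; name assumed)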

autovideo's People

Contributors

daochenzha, huaizhengzhang, lhenry15, yiwei-chen, zaidbhat1234


autovideo's Issues

Problem with generating fitted timelines

Hi all!

I'm running into some problems with generating fitted pipelines for the different algorithms available. So I was trying to run the following command:

python3 examples/fit.py --alg tsn --pretrained --gpu 0,1 --data_dir datasets/hmdb6/ --log_path logs/tsn.txt --save_path fittted_timelines/TSN/

And I got the following output.

--> Running on the GPU

Initializing TSN with base model: resnet50.
TSN Configurations:
input_modality: RGB
num_segments: 3
new_length: 1
consensus_module: avg
dropout_ratio: 0.8

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /home/myuser/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|##########| 97.8M/97.8M [00:02<00:00, 40.4MB/s]
Downloading: "https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/tsn2d_kinetics400_rgb_r50_seg3_f1s1-b702e12f.pth" to /home/myuser/.cache/torch/hub/checkpoints/tsn2d_kinetics400_rgb_r50_seg3_f1s1-b702e12f.pth
Traceback (most recent call last):
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1008, in _do_run_step
    self._run_step(step)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 998, in _run_step
    self._run_primitive(step)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 873, in _run_primitive
    multi_call_result = self._call_primitive_method(primitive.fit_multi_produce, fit_multi_produce_arguments)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 974, in _call_primitive_method
    raise error
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 970, in _call_primitive_method
    result = method(**arguments)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/primitive_interfaces/base.py", line 532, in fit_multi_produce
    return self._fit_multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, inputs=inputs, outputs=outputs)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/primitive_interfaces/base.py", line 559, in _fit_multi_produce
    fit_result = self.fit(timeout=timeout, iterations=iterations)
  File "/home/myuser/autovideo/autovideo/base/supervised_base.py", line 54, in fit
    self._init_model(pretrained = self.hyperparams['load_pretrained'])
  File "/home/myuser/autovideo/autovideo/recognition/tsn_primitive.py", line 206, in _init_model
    model_data = load_state_dict_from_url(pretrained_url)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/torch/hub.py", line 553, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/torch/hub.py", line 419, in download_url_to_file
    u = urlopen(req)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "examples/fit.py", line 61, in <module>
    run(args)
  File "examples/fit.py", line 49, in run
    pipeline=pipeline)
  File "/home/myuser/autovideo/autovideo/utils/axolotl_utils.py", line 55, in fit
    raise pipeline_result.error
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1039, in _run
    self._do_run()
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1025, in _do_run
    self._do_run_step(step)
  File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1017, in _do_run_step
    ) from error
d3m.exceptions.StepFailedError: Step 5 for pipeline e61792eb-f54b-44ae-931c-f0f965c5e9de failed.

As you can see, I'm getting an Access Denied error for the .pth files hosted on Amazon S3. Do you have any ideas on how to fix this?

d3m exceptions StepFailedError

d3m.exceptions.StepFailedError: Step 7 for pipeline c43355b7-0e87-499f-a9f2-defc56b6713a failed

I trained this model using fit.py on your given dataset and saved the weights in the weights directory; then I ran produce.py. These two scripts run smoothly, but when I try to run recognize.py it gives me this exception.

Does not work with latest torch

The repo works with torch==1.9.0 and torchvision==0.10.0. torchvision has deprecated Scale in favour of Resize, but d3m does not support Resize yet, so you need to downgrade to torchvision<0.12.0 for this repo to work.

AssertionError: assert os.path.exists(NO_SPLIT_TABULAR_SPLIT_PIPELINE_PATH)

I am trying to run the given example of hmdb6 but am getting this error:

Traceback (most recent call last):
  File "examples/fit.py", line 56, in <module>
    run(args)
  File "examples/fit.py", line 20, in run
    from autovideo.utils import set_log_path, logger
  File "/content/autovideo/autovideo/__init__.py", line 4, in <module>
    from .utils import build_pipeline, fit, produce, fit_produce, produce_by_path, compute_accuracy_with_preds
  File "/content/autovideo/autovideo/utils/__init__.py", line 2, in <module>
    from .axolotl_utils import *
  File "/content/autovideo/autovideo/utils/axolotl_utils.py", line 12, in <module>
    from axolotl.backend.simple import SimpleRunner
  File "/usr/local/lib/python3.7/dist-packages/axolotl/backend/simple.py", line 5, in <module>
    from d3m import runtime as runtime_module
  File "/usr/local/lib/python3.7/dist-packages/d3m/runtime.py", line 23, in <module>
    from d3m.contrib import pipelines as contrib_pipelines
  File "/usr/local/lib/python3.7/dist-packages/d3m/contrib/pipelines/__init__.py", line 13, in <module>
    assert os.path.exists(NO_SPLIT_TABULAR_SPLIT_PIPELINE_PATH)
AssertionError

Running on Google Colab.
Code:

!git clone https://github.com/datamllab/autovideo.git

%cd autovideo
!pip3 install -e .

!gdown --id 1nLTjp6l6UucXEy8_eOM5Zj4Q1m79OhmT
!unzip hmdb6.zip -d datasets

!python3 examples/fit.py --alg tsn --data_dir datasets/hmdb6/ --gpu "cuda"

How to resolve it?

Reading an RTSP Link

How can we read an RTSP link and get predictions in this repo? I checked the code, but I think RTSP links are not handled. Alternatively, how can I modify the code in your action recognition repository to accept frames as input instead of a complete video file?

from autovideo import extract_frames is not working

When I ran

"from autovideo import extract_frames"

I got the following error:

"ImportError: cannot import name 'extract_frames' from 'autovideo' (/Volumes/Disk-Data/pose estimation/autovideo-main/autovideo/__init__.py)"

Where is the UI?

Hi, in your demo video and your paper you mentioned a nice and fancy UI based on Orange, but I cannot seem to find it in the code. Can you point me to it? Also, how do I bring up the UI after I install it? Thank you.

Doubt about TSM temporal shift

Hi,

First of all, I'd like to congratulate you on this repo; we've found it very useful. While training TSM, we discovered that the parameter is_shift is false by default. Also, the import there cannot be resolved, since the original make_temporal_shift code is not integrated into this repo.

Without is_shift enabled, does that mean we're using a vanilla 2D ResNet-50 and averaging the outputs over every input image in the sequence? Am I missing anything? The original contribution of TSM was the special temporal shift applied to the internal feature maps of any 2D CNN model.

Thanks in advance.

examples/recogonize.py does not work out of the box.

The minimum dataset size is 4. I have the following hack in produce_by_path that works:

# minimum size is 4
dataset = {
    'd3mIndex': [0,1,2,3],
    'video': [video_name,video_name,video_name,video_name],
    'label': [0,0,0,0]
}

Running Predictions with pretrained weights

Hi,

I'm trying to benchmark the hmdb51 and ucf101 datasets with the pretrained weights available on Google Drive. I'm unfamiliar with the axolotl library and am a little confused about how to populate fitted_pipeline['runtime'] if I don't fit using examples/fit.py. Do you have any suggestions on how to accomplish this?

Thank you,
Rohita
