open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Home Page: https://mmaction2.readthedocs.io

License: Apache License 2.0

Python 83.68% Dockerfile 0.09% Shell 1.27% Jupyter Notebook 14.96%
action-recognition temporal-action-localization pytorch video-understanding tsn i3d slowfast ava spatial-temporal-action-detection benchmark

mmaction2's Introduction

English | 简体中文


🥳 🚀 What's New 🔝

The default branch has been switched to main (the previous 1.x branch) from master (the current 0.x branch), and we encourage users to migrate to the latest version, which has more supported models, stronger pre-training checkpoints and simpler coding. Please refer to the Migration Guide for more details.

Release (2023.10.12): v1.2.0 with the following new features:

  • Support the VindLU multi-modality algorithm and training of ActionCLIP
  • Support the lightweight MobileOne TSN/TSM models
  • Support the MSVD video retrieval dataset
  • Support SlowOnly K700 features for training localization models
  • Support video and audio demos

📖 Introduction 🔝

MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.

Action Recognition on Kinetics-400 (left) and Skeleton-based Action Recognition on NTU-RGB+D-120 (right)


Skeleton-based Spatio-Temporal Action Detection and Action Recognition Results on Kinetics-400


Spatio-Temporal Action Detection Results on AVA-2.1

🎁 Major Features 🔝

  • Modular design: We decompose a video understanding framework into different components. One can easily construct a customized video understanding framework by combining different modules.

  • Support five major video understanding tasks: MMAction2 implements various algorithms for multiple video understanding tasks, including action recognition, action localization, spatio-temporal action detection, skeleton-based action recognition and video retrieval.

  • Well tested and documented: We provide detailed documentation and API reference, as well as unit tests.

🛠️ Installation 🔝

MMAction2 depends on PyTorch, MMCV, MMEngine, MMDetection (optional) and MMPose (optional).

Please refer to install.md for detailed instructions.

Quick instructions
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
conda install pytorch torchvision -c pytorch  # This command installs the latest PyTorch and cudatoolkit; please check that they match your environment.
pip install -U openmim
mim install mmengine
mim install mmcv
mim install mmdet  # optional
mim install mmpose  # optional
git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -v -e .
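
To sanity-check the installation, a minimal inference sketch (the config and checkpoint paths below are placeholders; use any pair from the model zoo):

# A minimal sketch to verify the installation; the config/checkpoint paths are
# placeholders and should be replaced with a real pair from the model zoo.
from mmaction.apis import inference_recognizer, init_recognizer

config_file = 'path/to/some_config.py'           # placeholder
checkpoint_file = 'path/to/some_checkpoint.pth'  # placeholder

model = init_recognizer(config_file, checkpoint_file, device='cpu')  # or 'cuda:0'
result = inference_recognizer(model, 'demo/demo.mp4')  # sample video shipped with the repo
print(result)  # the prediction result with per-class scores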

👀 Model Zoo 🔝

Results and models are available in the model zoo.

Supported model

Action Recognition
C3D (CVPR'2014), TSN (ECCV'2016), I3D (CVPR'2017), C2D (CVPR'2018), I3D Non-Local (CVPR'2018), R(2+1)D (CVPR'2018), TRN (ECCV'2018), TSM (ICCV'2019), TSM Non-Local (ICCV'2019), SlowOnly (ICCV'2019), SlowFast (ICCV'2019), CSN (ICCV'2019), TIN (AAAI'2020), TPN (CVPR'2020), X3D (CVPR'2020), MultiModality: Audio (ArXiv'2020), TANet (ArXiv'2020), TimeSformer (ICML'2021), ActionCLIP (ArXiv'2021), VideoSwin (CVPR'2022), VideoMAE (NeurIPS'2022), MViT V2 (CVPR'2022), UniFormer V1 (ICLR'2022), UniFormer V2 (ArXiv'2022), VideoMAE V2 (CVPR'2023)

Action Localization
BSN (ECCV'2018), BMN (ICCV'2019), TCANet (CVPR'2021)

Spatio-Temporal Action Detection
ACRN (ECCV'2018), SlowOnly+Fast R-CNN (ICCV'2019), SlowFast+Fast R-CNN (ICCV'2019), LFB (CVPR'2019), VideoMAE (NeurIPS'2022)

Skeleton-based Action Recognition
ST-GCN (AAAI'2018), 2s-AGCN (CVPR'2019), PoseC3D (CVPR'2022), STGCN++ (ArXiv'2022), CTRGCN (CVPR'2021), MSG3D (CVPR'2020)

Video Retrieval
CLIP4Clip (ArXiv'2022)

Supported dataset

Action Recognition
HMDB51 (Homepage) (ICCV'2011), UCF101 (Homepage) (CRCV-IR-12-01), ActivityNet (Homepage) (CVPR'2015), Kinetics-[400/600/700] (Homepage) (CVPR'2017), SthV1 (ICCV'2017), SthV2 (Homepage) (ICCV'2017), Diving48 (Homepage) (ECCV'2018), Jester (Homepage) (ICCV'2019), Moments in Time (Homepage) (TPAMI'2019), Multi-Moments in Time (Homepage) (ArXiv'2019), HVU (Homepage) (ECCV'2020), OmniSource (Homepage) (ECCV'2020), FineGYM (Homepage) (CVPR'2020), Kinetics-710 (Homepage) (ArXiv'2022)

Action Localization
THUMOS14 (Homepage) (THUMOS Challenge 2014), ActivityNet (Homepage) (CVPR'2015), HACS (Homepage) (ICCV'2019)

Spatio-Temporal Action Detection
UCF101-24* (Homepage) (CRCV-IR-12-01), JHMDB* (Homepage) (ICCV'2015), AVA (Homepage) (CVPR'2018), AVA-Kinetics (Homepage) (ArXiv'2020), MultiSports (Homepage) (ICCV'2021)

Skeleton-based Action Recognition
PoseC3D-FineGYM (Homepage) (ArXiv'2021), PoseC3D-NTURGB+D (Homepage) (ArXiv'2021), PoseC3D-UCF101 (Homepage) (ArXiv'2021), PoseC3D-HMDB51 (Homepage) (ArXiv'2021)

Video Retrieval
MSRVTT (Homepage) (CVPR'2016)

👨‍🏫 Get Started 🔝

For tutorials, we provide user guides for basic usage; please refer to the documentation.

Research works built on MMAction2 by users from the community:
  • Video Swin Transformer. [paper][github]
  • Evidential Deep Learning for Open Set Action Recognition, ICCV 2021 Oral. [paper][github]
  • Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective, ICCV 2021 Oral. [paper][github]

🎫 License 🔝

This project is released under the Apache 2.0 license.

🖊️ Citation 🔝

If you find this project useful in your research, please consider citing:

@misc{2020mmaction2,
    title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
    author={MMAction2 Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
    year={2020}
}

🙌 Contributing 🔝

We appreciate all contributions to improve MMAction2. Please refer to CONTRIBUTING.md in MMCV for more details about the contributing guidelines.

🤝 Acknowledgement 🔝

MMAction2 is an open-source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features and users who give valuable feedback. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their new models.

🏗️ Projects in OpenMMLab 🔝

  • MMEngine: OpenMMLab foundational library for training deep learning models.
  • MMCV: OpenMMLab foundational library for computer vision.
  • MIM: MIM installs OpenMMLab packages.
  • MMEval: A unified evaluation library for multiple machine learning libraries.
  • MMPreTrain: OpenMMLab pre-training toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
  • MMYOLO: OpenMMLab YOLO series toolbox and benchmark.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
  • MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
  • MMRazor: OpenMMLab model compression toolbox and benchmark.
  • MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMFlow: OpenMMLab optical flow toolbox and benchmark.
  • MMagic: OpenMMLab Advanced, Generative and Intelligent Creation toolbox.
  • MMGeneration: OpenMMLab image and video generative models toolbox.
  • MMDeploy: OpenMMLab model deployment framework.
  • Playground: A central hub for gathering and showcasing amazing projects built upon OpenMMLab.

mmaction2's People

Contributors

cir7, congee524, dai-wenxun, dreamerlin, gengenkai, hellock, hukkai, innerlee, irvingzhang0512, jackytown, jamiechoi1995, jin-s13, joannalxy, kennymckormick, ly015, magicdream2222, makecent, michael-camilleri, rlleshi, sczwangxiao, shoufachen, sunnyxiaohu, wangruohui, willoscar, xwen99, yaochaorui, yrquni, yuta1125tp, yzfly, zheng-linxiao


mmaction2's Issues

Demo.py

When I try to run demo.py to test my own video, I use:

python demo/demo.py configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py demo/checkpoints/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200618-9a124260.pth demo/test1.mp4 demo/label_map.txt

but it fails with:
Traceback (most recent call last):
File "demo/demo.py", line 35, in
main()
File "demo/demo.py", line 27, in main
results = inference_recognizer(model, args.video, args.label)
File "/dat01/wangbo2/ZT/mmaction2/mmaction/apis/inference.py", line 63, in inference_recognizer
data = test_pipeline(data)
File "/dat01/wangbo2/ZT/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in call
data = t(data)
File "/dat01/wangbo2/ZT/mmaction2/mmaction/datasets/pipelines/loading.py", line 582, in call
directory = results['frame_dir']
KeyError: 'frame_dir'

my environment:
python 3.6
pytorch 1.3
others followed the requirements

need your help!

Inconsistent variable type in generate_labels

In generate_labels

    def generate_labels(self, gt_bbox):
        """Generate training labels."""
        match_score_confidence_list = []
        match_score_start_list = []
        match_score_end_list = []
        for every_gt_bbox in gt_bbox:
            gt_iou_map = []
            for start, end in every_gt_bbox:
                start = start.numpy()
                end = end.numpy()
         ......

The variables start and end are of type numpy.float64 instead of tensor, so obviously they have no method named numpy(), which leads to an error.
A simple solution would be to just comment out these last two lines (which may conflict with your design?).
Another solution, found in your training log, is to just add gt_bbox to the keys argument of ToTensor:

train_pipeline = [
    dict(type='LoadLocalizationFeature'),
    dict(type='GenerateLocalizationLabels'),
    dict(
        type='Collect',
        keys=['raw_feature', 'gt_bbox'],
        meta_name='video_meta',
        meta_keys=['video_name']),
    # dict(type='ToTensor', keys=['raw_feature']),
    dict(type='ToTensor', keys=['raw_feature', 'gt_bbox']),
    dict(type='ToDataContainer', fields=[dict(key='gt_bbox', stack=False)])
]

Maybe it's just a simple mistake when uploading the config file :)

Give training dataset resolution info in modelzoo

Currently, some typically used resolutions for action recognition include:

  • 340x256 (I guess it is a legacy of UCF101)
  • short-side 256
  • height 256
  • short-side 320
  • height 320
  • height 331

Obviously, different resolutions might or might not influence the accuracy, so it would be good to mark the resolution of the training data.

Edit:

Forgot about the video format.

MEVA or Virat dataset

This is probably a silly question. I am interested in aerial and surveillance action recognition. For example, in the recently completed ActivityNet challenge at CVPR, there was a surveillance challenge with MEVA/VIRAT data. As the actions there are not as nuanced as in Kinetics/AVA, and some are also taken from an aerial perspective, can we still apply mmaction2 to that type of data?

will bmn support classification with start time point and end time point?

Hi,

Thanks for the great repo!

I know that BMN supports predicting a video's start and end time points. But will it support classification of the video snippet between the start and end points? If not, how can the classification be done in an end-to-end way? Any suggestions?

Will this feature be added to the repo in the near future?

Thanks in advance!

Useless forward Test code

mmaction\models\recognizers\recognizer3d.py def forward_test(self, imgs):

Loss is not calculated, and accuracy is not calculated, so why would I use it? I recommend printing the accuracy after the evaluation.

the results of the experiment

Hi, I did not find the results of the experiment. I would like to ask how the UCF101 dataset performs with SlowFast in your experiments, and have you achieved the accuracy reported in the original paper?

Roadmap of MMAction2

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

  1. Suggest a new feature by leaving a comment.
  2. Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for your most favorable one!)
  3. Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear about!)

video classification

Can I use mmaction2 to classify videos? I've just come into contact with this framework, and I'm not very familiar with it. Please tell me which model in the model zoo would be good at video classification.

re:error

tsn_r50_1x1x3_80e_ucf101_rgb.py
Traceback (most recent call last):
File "D:\Anaconda\envs\mmaction2-master\lib\sre_parse.py", line 1015, in parse_template
this = chr(ESCAPES[this][1])
KeyError: '\U'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/11987/Desktop/小论文素材/model/mmaction2-master/tools/train.py", line 167, in
main()
File "C:/Users/11987/Desktop/小论文素材/model/mmaction2-master/tools/train.py", line 83, in main
cfg = Config.fromfile(args.config)
File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 204, in fromfile
use_predefined_variables)
File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 127, in _file2dict
temp_config_file.name)
File "D:\Anaconda\envs\mmaction2-master\lib\site-packages\mmcv-1.0.2-py3.7-win-amd64.egg\mmcv\utils\config.py", line 108, in _substitute_predefined_vars
config_file = re.sub(regexp, value, config_file)
File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 309, in _subx
template = _compile_repl(template, pattern)
File "D:\Anaconda\envs\mmaction2-master\lib\re.py", line 300, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "D:\Anaconda\envs\mmaction2-master\lib\sre_parse.py", line 1018, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \U at position 2

When I run train.py, I encounter this problem. I tried to change re to regex and I also got this:

regex._regex_core.error: incomplete escape \U at position 5

Can anyone help me? Thanks!!!
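
The traceback suggests re.sub is treating a Windows path as a replacement template; a minimal sketch of the mechanism and a workaround, outside of mmcv (the pattern and path here are only illustrative):

# A minimal reproduction of the error above: re.sub parses the replacement
# string as a template, so a Windows path containing "\U" (from C:\Users\...)
# is read as a bad escape. Escaping the backslashes or using a callable
# replacement avoids it. This is a sketch of the mechanism, not a patch to mmcv.
import re

path = r'C:\Users\11987\Desktop'
text = 'work_dir = "{{ fileDirname }}"'

# re.sub(r'\{\{ fileDirname \}\}', path, text)   # raises re.error: bad escape \U
safe = re.sub(r'\{\{ fileDirname \}\}', path.replace('\\', '/'), text)
also_safe = re.sub(r'\{\{ fileDirname \}\}', lambda m: path, text)
print(safe)
print(also_safe)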

about predict different labels in a video

I have a long video that contains a lot of labels, and I want to create a 300-frame detection window. I want to change demo.py to do this, but it seems that demo.py has to read the path of a short video and produce one prediction. My temporary treatment plan is: when I have collected 300 frames, I save them as a temporary video file, call the demo.py program, then delete the temporary file. It's clumsy and inefficient. How can I read a long video and get continuous predictions? Please help me.
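
A minimal sketch of a sliding-window loop that avoids temporary files, assuming a hypothetical classify_window(frames) helper that wraps the recognizer (it is not part of demo.py; the video path is a placeholder):

# Read the long video once with OpenCV and classify every 300-frame window.
import cv2

def classify_window(frames):
    """Placeholder: run the recognizer on a list of decoded frames."""
    raise NotImplementedError

cap = cv2.VideoCapture('long_video.mp4')   # placeholder path
window, window_size, stride = [], 300, 300
predictions = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    window.append(frame)
    if len(window) == window_size:
        predictions.append(classify_window(window))
        window = window[stride:]   # non-overlapping windows; lower the stride to overlap
cap.release()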

FileNotFoundError: [Errno 2] No such file or directory: 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'

Hi mmaction2, I have met this problem several times. I use the UCF101 dataset for training, with the default config file:

$ CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py

Referring to #101, I added start_index=0 to the data dict, but there was a problem:

2020-08-12 09:20:54,393 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 125, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/zj/zhonglian/mmcv/mmcv/utils/registry.py", line 167, in build_from_cfg
    return obj_cls(**args)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 93, in __init__
    multi_class, num_classes, start_index, modality)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 63, in __init__
    self.video_infos = self.load_annotations()
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 98, in load_annotations
    with open(self.ann_file, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'

When I modified this file path in the config file

# ann_file_train = 'data/ucf101/ucf101_train_split_{1,2,3}_rawframes.txt'
ann_file_train = 'data/ucf101/ucf101_train_split_1_rawframes.txt'

OK, everything is fine. I wonder if it needs to be changed every time (after training with split 1, change to split 2 to continue training), because there is this config:

work_dir = './work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/'

Method of RGB frame extraction

Thanks for your elegant implementation of this toolbox.

The doc says denseflow installation is unnecessary for RGB frame extraction, but I find this script still uses denseflow for both RGB and flow extraction. I am wondering which one should be trusted.

Could you provide the classification component for Temporal Action Localization task to get the mAP?

Thanks for your awesome job.
Now you provide the code of BSN and BMN for Temporal Action Localization, but it only contains the Temporal Proposal Generation part. I note that many works apply UntrimmedNet (CUHK & ETHZ & SIAT Submission to ActivityNet Challenge) to get the classification results, but I have not found the classification results file or an easy way to get the classification results.
Do you have a plan to provide the code for classifying the proposals to get the final metric, mAP?

Test video from URL

There are lots of videos on the web, so it is common to test a video directly from its URL, rather than downloading the video to disk as a temp file and then running the testing pipeline.

Would you consider implementing such testing pipeline?

Thanks.

How to add NECK module

I want to reimplement TPN in mmaction2. TPN registers the 'TPN' NECK module in the original mmaction; how can I implement this function in mmaction2?
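
Not an official answer, but a minimal sketch of the registry pattern used across OpenMMLab code, assuming a NECKS registry and a toy TPN-like module (the real TPN and its wiring into the recognizer are more involved):

# A minimal sketch: register a neck module with an mmcv Registry and build it
# from a config dict. NECKS and the toy TPN below are assumptions for
# illustration, not mmaction2's actual implementation.
import torch.nn as nn
from mmcv.utils import Registry, build_from_cfg

NECKS = Registry('neck')

@NECKS.register_module()
class TPN(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.proj = nn.Conv3d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.proj(x)

def build_neck(cfg):
    return build_from_cfg(cfg, NECKS)

# Inside a custom recognizer, the neck could then be built from the config and
# called between the backbone and the cls_head, e.g.:
# self.neck = build_neck(dict(type='TPN', in_channels=2048, out_channels=1024))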

TSM temporal_pool=True bug

Thanks for your awesome codebase.
I'm trying to train TSM with temporal_pool=True (adding temporal_pool=True in both TSMHead and ResNetTSM) but get some errors.
After some debugging, I think ResNetTSM forgets to do the actual temporal pooling between layer1 and layer2,
which means the feature map shape before layer2 should be N * num_segments/2, C, H, W instead of N * num_segments, C, H, W.
In the original TSM codebase, when temporal_pool=True, there is a max_pool3d that does the actual temporal pooling before layer2, which is missing in mmaction2.
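
A minimal sketch of the temporal pooling step described above, following the original TSM code (the exact placement between layer1 and layer2 is taken from the issue, not verified against mmaction2):

# Reshape (N*T, C, H, W) -> (N, C, T, H, W), halve T with max_pool3d, then
# reshape back to (N*T/2, C, H, W) before feeding layer2.
import torch.nn.functional as F

def temporal_pool(x, num_segments):
    nt, c, h, w = x.shape
    n = nt // num_segments
    x = x.view(n, num_segments, c, h, w).transpose(1, 2)   # (N, C, T, H, W)
    x = F.max_pool3d(x, kernel_size=(3, 1, 1), stride=(2, 1, 1), padding=(1, 0, 0))
    return x.transpose(1, 2).contiguous().view(-1, c, h, w)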

sth v1 preparation

The original Something-Something V1 dataset already contains frames after extraction, so the preparation process probably needs refactoring. What is needed is just renaming the extracted frames to follow the naming convention "img_%05d.jpg".
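
A minimal sketch of that renaming step (the directory layout and root path are assumptions):

# Rename the frames shipped with Something-Something V1 to the img_%05d.jpg
# convention expected by the rawframe pipelines. Assumes one directory per video.
import os

src_root = 'data/sthv1/rawframes'   # placeholder root
for video_dir in sorted(os.listdir(src_root)):
    d = os.path.join(src_root, video_dir)
    for idx, name in enumerate(sorted(os.listdir(d)), start=1):
        os.rename(os.path.join(d, name), os.path.join(d, f'img_{idx:05d}.jpg'))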

Test picture

I successfully used demo.py to test a video, but how can I test a picture from my own data?
If I can, I want to compare the results between a video and pictures from the same data.
I have seen " dataset_type = 'RawframeDataset' " and " dataset_type = 'VideoDataset' " in the configs.

BSN README problem

training command in BSN README

python tools/train.py configs/localization/bsn/bsn_400x100_1x16_20e_activitynet_feature.py

cannot run (the filename has not been updated).

use TSN to train HMDB51, then errors happen

Hi mmaction2, first of all, thank you for your contribution. I want to train on the hmdb51 dataset using TSN; following the tutorials, I did the following:

  1. Data processing, following preparing_hmdb51.md:
$ cd mmaction2/tools/data/hmdb51

# process data and annotation
$ bash download_annotations.sh
$ bash download_videos.sh

# extract data
$ bash extract_rgb_frames.sh

# generate label
$ bash  generate_rawframes_filelist.sh
$ bash generate_videos_filelist.sh

While doing this, the log shows that some .avi files don't work:

...
...
rgb 6568 talk/The_Matrix_3_talk_h_nm_np1_fr_goo_13.avi None done       <------------- here
"../../data/hmdb51/videos/talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi", frames ≈ 96
extracted frames of video "../../data/hmdb51/videos/talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi", 95 frames
1 videos (95 frames, 0 tvl1 flows) processed, using 0.276s, decoding speed 344.203fps, flow speed 0fps
rgb 6569 talk/Hamlet_(1996)_Fencing_Scenes_talk_u_cm_np1_fr_med_0.avi None done              <-------------------here
"../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi", frames ≈ 59
extracted frames of video "../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi", 58 frames
1 videos (58 frames, 0 tvl1 flows) processed, using 0.327s, decoding speed 177.37fps, flow speed 0fps
rgb 6570 talk/Fellowship_6_talk_h_cm_np1_fr_goo_14.avi None done
"../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi", frames ≈ 73
extracted frames of video "../../data/hmdb51/videos/talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi", 72 frames
1 videos (72 frames, 0 tvl1 flows) processed, using 0.502s, decoding speed 143.426fps, flow speed 0fps
rgb 6571 talk/Fellowship_6_talk_h_cm_np1_fr_goo_13.avi None done
Genearte raw frames (RGB only)
  2. Prepare the config

I copied mmaction2/configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py to tsn_hmdb51_config.py and modified some places:

# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False),
    cls_head=dict(
        type='TSNHead',
        # num_classes=101,
        # change the number of classes
        num_classes=51,                 <--------------- here
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.8,
        init_std=0.001))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/hmdb51/rawframes/'                                    <--- here
data_root_val = 'data/hmdb51/rawframes/'
ann_file_train = 'data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt'
ann_file_val = 'data/hmdb51/hmdb51_val_split_{1,2,3}_rawframes.txt'
ann_file_test = 'data/hmdb51/hmdb51_val_split_{1,2,3}_rawframes.txt'       <----- here
img_norm_cfg = dict(mean=[104, 117, 128], std=[1, 1, 1], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='FrameSelector'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=3,
        test_mode=True),
    dict(type='FrameSelector'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=25,
        test_mode=True),
    dict(type='FrameSelector'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='TenCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.001, momentum=0.9,
    weight_decay=0.0005)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[30, 60])
total_epochs = 80
checkpoint_config = dict(interval=5)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsn_r50_1x1x3_80e_hmdb51_rgb/'
# use a pre-trained model
load_from = './checkpoints/tsn_r50_1x1x3_80e_ucf101_rgb_20200613-d6ad9c48.pth'           <- here
# load_from = None
resume_from = None
workflow = [('train', 1)]
  3. Begin training, using the following statement:
$ CUDA_VISIBLE_DEVICES=3 python tools/train.py configs/recognition/tsn/tsn_hmdb51_config.py 
2020-08-11 15:19:47,841 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
...
...
...
# load_from = None
resume_from = None
workflow = [('train', 1)]

2020-08-11 15:19:49,517 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'}
Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 125, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/zj/zhonglian/mmcv/mmcv/utils/registry.py", line 167, in build_from_cfg
    return obj_cls(**args)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 92, in __init__
    multi_class, num_classes, modality)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 57, in __init__
    self.video_infos = self.load_annotations()
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 97, in load_annotations
    with open(self.ann_file, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/hmdb51/hmdb51_train_split_{1,2,3}_rawframes.txt'

OK, I modified the path to ann_file_train = 'data/hmdb51/hmdb51_train_split_1_rawframes.txt', then another error happens:

...
...
2020-08-11 15:36:29,645 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
2020-08-11 15:36:41,862 - mmaction - INFO - load checkpoint from ./checkpoints/tsn_r50_1x1x3_80e_ucf101_rgb_20200613-d6ad9c48.pth
2020-08-11 15:36:44,408 - mmaction - WARNING - The model and loaded state dict do not match exactly

size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([101, 2048]) from checkpoint, the shape in current model is torch.Size([51, 2048]).
size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([51]).
2020-08-11 15:36:44,409 - mmaction - INFO - Start running, host: zj@user-SYS-7049GP-TRT, work_dir: /home/zj/zhonglian/mmaction2/work_dirs/tsn_r50_1x1x3_80e_hmdb51_rgb
2020-08-11 15:36:44,409 - mmaction - INFO - workflow: [('train', 1)], max: 80 epochs
Traceback (most recent call last):
  File "tools/train.py", line 146, in <module>
    main()
  File "tools/train.py", line 142, in main
    meta=meta)
  File "/home/zj/zhonglian/mmaction2/mmaction/apis/train.py", line 111, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zj/zhonglian/mmcv/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zj/zhonglian/mmcv/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 103, in __getitem__
    return self.prepare_train_frames(idx)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/rawframe_dataset.py", line 137, in prepare_train_frames
    return self.pipeline(results)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/pipelines/loading.py", line 848, in __call__
    img_bytes = self.file_client.get(filepath)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/file_client.py", line 294, in get
    return self.client.get(filepath)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/file_client.py", line 185, in get
    with open(filepath, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zj/zhonglian/mmaction2/data/hmdb51/rawframes/shake_hands/Concourse_d_elegance_Sofia_2009___PRICE_GIVING_CEREMONY_shake_hands_f_cm_np2_le_med_0/img_00066.jpg'

I checked the img dir:

$ ls
img_00000.jpg  img_00006.jpg  img_00012.jpg  img_00018.jpg  img_00024.jpg  img_00030.jpg  img_00036.jpg  img_00042.jpg  img_00048.jpg  img_00054.jpg  img_00060.jpg
img_00001.jpg  img_00007.jpg  img_00013.jpg  img_00019.jpg  img_00025.jpg  img_00031.jpg  img_00037.jpg  img_00043.jpg  img_00049.jpg  img_00055.jpg  img_00061.jpg
img_00002.jpg  img_00008.jpg  img_00014.jpg  img_00020.jpg  img_00026.jpg  img_00032.jpg  img_00038.jpg  img_00044.jpg  img_00050.jpg  img_00056.jpg  img_00062.jpg
img_00003.jpg  img_00009.jpg  img_00015.jpg  img_00021.jpg  img_00027.jpg  img_00033.jpg  img_00039.jpg  img_00045.jpg  img_00051.jpg  img_00057.jpg  img_00063.jpg
img_00004.jpg  img_00010.jpg  img_00016.jpg  img_00022.jpg  img_00028.jpg  img_00034.jpg  img_00040.jpg  img_00046.jpg  img_00052.jpg  img_00058.jpg  img_00064.jpg
img_00005.jpg  img_00011.jpg  img_00017.jpg  img_00023.jpg  img_00029.jpg  img_00035.jpg  img_00041.jpg  img_00047.jpg  img_00053.jpg  img_00059.jpg  img_00065.jpg

There is no img_00066.jpg. Why does this happen and how can I solve it? Looking forward to your help.

decord SampleFrames start_index=1 bug

I try to train Kinetics with this config, but get an index out of range error.
After some debugging, I find that this bug is caused by the default setting start_index=1 in SampleFrames.
I think start_index should be 0 for decord.
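
A minimal sketch of the workaround this suggests, as a config fragment (paths are placeholders and train_pipeline is assumed to be defined earlier in the config):

# Pass start_index=0 to the video dataset so frame indices stay within
# decord's 0-based range.
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type='VideoDataset',
        ann_file='data/kinetics400/kinetics400_train_list_videos.txt',   # placeholder
        data_prefix='data/kinetics400/videos_train',                     # placeholder
        start_index=0,   # decord frame indices are 0-based
        pipeline=train_pipeline))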

Train custom data

I wrote a slowfast_custom_config.py. It reads:

model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowFast',
        pretrained=None,
        resample_rate=8,  # tau
        speed_ratio=8,  # alpha
        channel_ratio=8,  # beta_inv
        slow_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=True,
            conv1_kernel=(1, 7, 7),
            dilations=(1, 1, 1, 1),
            conv1_stride_t=1,
            pool1_stride_t=1,
            inflate=(0, 0, 1, 1),
            norm_eval=False),
        fast_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=False,
            base_channels=8,
            conv1_kernel=(5, 7, 7),
            conv1_stride_t=1,
            pool1_stride_t=1,
            norm_eval=False)),
    cls_head=dict(
        type='SlowFastHead',
        in_channels=2304,  # 2048+256
        num_classes=400,
        spatial_type='avg',
        dropout_ratio=0.5))
train_cfg = None
test_cfg = dict(average_clips=None)
dataset_type = 'VideoDataset'
data_root = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/videos_train'
data_root_val = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/videos_val'
ann_file_train = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/train_list_videos.txt'
ann_file_val = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/val_list_videos.txt'
ann_file_test = '/dat01/liuzhixiong/zt/mmaction2/data/fortest/val_list_videos.txt'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))

optimizer = dict(
    type='SGD', lr=0.1, momentum=0.9,
    weight_decay=0.0001)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))

lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0,
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=34)
total_epochs = 256
checkpoint_config = dict(interval=4)
workflow = [('train', 1)]
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/slowfast_r50_video_3d_4x16x1_256e_fortest_rgb'
load_from = None
resume_from = None
find_unused_parameters = False

The train_list_videos.txt follows the tips. It reads:

data/fortest/videos_train/01_trian.mp4 1
data/fortest/videos_train/02_trian.mp4 1
data/fortest/videos_train/03_trian.mp4 1
data/fortest/videos_train/04_trian.mp4 2
data/fortest/videos_train/05_trian.mp4 3
......

But when I use:

python tools/train.py configs/recognition/slowfast/slowfast_custom_config.py \
    --work-dir work_dirs/slowfast_r50_4x16x1_256e_fortest_rgb \
    --validate --seed 0 --deterministic

to sbatch my job, it reports:

Traceback (most recent call last):
File "tools/train.py", line 146, in
main()
File "tools/train.py", line 125, in main
datasets = [build_dataset(cfg.data.train)]
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/builder.py", line 39, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/dat01/liuzhixiong/anaconda3/envs/mmaction/lib/python3.6/site-packages/mmcv/utils/registry.py", line 167, in build_from_cfg
return obj_cls(**args)
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/video_dataset.py", line 43, in init
super().init(ann_file, pipeline, start_index=start_index, **kwargs)
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/base.py", line 63, in init
self.video_infos = self.load_annotations()
File "/dat01/liuzhixiong/zt/mmaction2/mmaction/datasets/video_dataset.py", line 58, in load_annotations
filename, label = line_split
ValueError: not enough values to unpack (expected 2, got 0)

I don't know why it can't read my list.txt.

Support multi_class in TSM-Head


You should just add one line in tsm_head.py to support multi_class.

experiment results on UCF-101 and HMDB-51 for R(2+1)D and I3D backbone.

Hi,

For experiments using R(2+1)D and I3D backbone
(https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/r2plus1d/README.md),
(https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/i3d/README.md),
do you have experiment results on UCF-101 and HMDB-51? If yes, would you mind sharing your experimental results with me and giving me more information about the model initialization (random init or ImageNet pre-trained)?

Thanks!

AttributeError: 'EpochBasedRunner' object has no attribute 'data_loader'

python tools/train.py configs/recognition/slowfast/slowfast_r50_4x8x1_256e_jester_rgb.py --validate

error info:
Traceback (most recent call last):
File "/export/mmaction2/tools/train.py", line 146, in
main()
File "/export/mmaction2/tools/train.py", line 142, in main
meta=meta)
File "/export/mmaction2/mmaction/apis/train.py", line 111, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 103, in run
self.call_hook('before_run')
File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 298, in call_hook
getattr(hook, fn_name)(self)
File "/home/zhanglu/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/hooks/lr_updater.py", line 114, in before_run
epoch_len = len(runner.data_loader)
AttributeError: 'EpochBasedRunner' object has no attribute 'data_loader'

The same error will occur with the csn model.

mmaction vs mmaction2

Hello! Thanks for the new repo! I just want to ask: why mmaction2? Why not reorganize the mmaction codebase? What's the difference between mmaction and mmaction2?

reproducing TSM_R50_1x1x16_50e_sthv2 issue

Notice

There are several common situations in the reimplementation issues as below

  1. Reimplement a model in the model zoo using the provided configs

Checklist

  1. I have searched related issues but cannot get the expected help.

Describe the issue

When I tested tsm_r50_1x1x16_50e_sthv2_rgb with this checkpoint, the result is lower than the reported accuracy (57.68/83.65).

I used the sthv2 dataset in its original webm video format.

Reproduction

  1. What command or script did you run?
 bash tools/dist_test.sh configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb_20200621-60ff441a.pth 8 --eval top_k_accuracy mean_class_accuracy
  2. What config did you run?
configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py
  3. Did you make any modifications to the code or config? Did you understand what you have modified?

To use the Something-Something V2 original video dataset, I just made sthv2_{train, val}_list_videos.txt files.

I also modified the config file to use this video format:

# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNetTSM',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False,
        shift_div=8),
    cls_head=dict(
        type='TSMHead',
        num_classes=339,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.5,
        init_std=0.001,
        is_shift=True))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
# dataset_type = 'RawframeDataset'
# data_root = 'data/sthv2/rawframes'
# data_root_val = 'data/sthv2/rawframes'
# ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
# ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
# ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'
dataset_type = 'VideoDataset'
data_root = 'data/sthv2/videos'
data_root_val = 'data/sthv2/videos'
ann_file_train = 'data/sthv2/sthv2_train_list_videos.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_videos.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_videos.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='DecordInit'),    
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=6,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    lr=0.0075,  # this lr is used for 8 gpus
    momentum=0.9,
    weight_decay=0.0005)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50
checkpoint_config = dict(interval=1)
evaluation = dict(
    interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]

  4. What dataset did you use?

--> Something-Something-V2

Environment

  1. Please run PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3

  2. You may add additional information that may be helpful for locating the problem, such as:
    • How you installed PyTorch [e.g., pip, conda, source]
      --> by conda

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

Evaluating top_k_accuracy...

top1_acc        0.4162
top5_acc        0.7047

Evaluating mean_class_accuracy...

mean_acc        0.3648
top1_acc: 0.4162
top5_acc: 0.7047
mean_class_accuracy: 0.3648


fail to run webcam demo with r2plus1d

Hi, thanks for providing this awesome tool first.

I trained on my own dataset and it works in the webcam demo with TSN.

I tried to run the webcam demo with r2plus1d, but it failed.

Here are the error messages:

Traceback (most recent call last):
File "demo/webcam_demo.py", line 161, in
main()
File "demo/webcam_demo.py", line 157, in main
predict_webcam_video()
File "demo/webcam_demo.py", line 83, in predict_webcam_video
cur_data = test_pipeline(cur_data)
File "/home/ubuntu/Desktop/YHWang/mmaction2/mmaction/datasets/pipelines/compose.py", line 41, in call
data = t(data)
File "/home/ubuntu/Desktop/YHWang/mmaction2/mmaction/datasets/pipelines/formating.py", line 248, in call
num_clips = results['num_clips']
KeyError: 'num_clips'

The config I modified is num_classes (in r2plus1d_r34_video_inference_8x8x1_180e_kinetics400_rgb.py). I changed it from 400 to 12 (the number of classes in my dataset).
After a little testing, I found that it fails to get clip_len and num_clips in the test_pipeline dict.
I tried to comment out some code in formating.py:

if self.input_format == 'NCTHW':
    # num_clips = results['num_clips']
    # clip_len = results['clip_len']
    imgs = imgs.reshape((-1, num_clips, clip_len) + imgs.shape[1:])

and I changed num_clips and clip_len to some numbers, then it works.
But the predicted label doesn't change over time, so maybe the result is wrong.

Sorry for my poor English.
Could you give me some ideas? Thanks for your help!
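
A minimal sketch of the workaround described in this issue, rather than commenting code out: supply the keys that FormatShape(input_format='NCTHW') reads before calling the test pipeline. The values below (1 clip of 8 frames) mirror an 8x8x1 r2plus1d setting and are assumptions; frames stands for the images collected from the webcam.

# Hypothetical adaptation of the webcam loop: add num_clips/clip_len so the
# NCTHW reshape in FormatShape has the values it needs.
cur_data = dict(
    imgs=frames,
    num_clips=1,
    clip_len=8,
    modality='RGB',
    label=-1)
cur_data = test_pipeline(cur_data)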

Tables in the docs of TIN are not correctly displayed

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug
In the document of TIN under the Modelzoo section, some tables are not correctly displayed. [Link]
But it seems fine in the README of TIN, therefore a re-compilation of the document may be required.

Could you share your Kinetics400 dataset?

I cannot download the Kinetics400 dataset. When I train your tsn model, it's hard to reproduce your released accuracy. I don't know the problem. Please, could you share your used kinetics400 data set?

TypeError: Object of type ndarray is not JSON serializable

Hi mmaction2, I trained a model for UCF101 using the config file configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py.

Now I want to test it, using the following command:

$ CUDA_VISIBLE_DEVICES=1 python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/latest.pth --eval top_k_accuracy mean_class_accuracy     --out result.json

It works fine, but when saving the results into a JSON file, an error happens:

$ CUDA_VISIBLE_DEVICES=1 python tools/test.py configs/recognition/tsn/tsn_r50_1x1x3_80e_ucf101_rgb.py work_dirs/tsn_r50_1x1x3_80e_ucf101_rgb/latest.pth --eval top_k_accuracy mean_class_accuracy     --out result.json
2020-08-12 14:40:58,082 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4/4, 1.6 task/s, elapsed: 3s, ETA:     0s
writing results to result.json
Traceback (most recent call last):
  File "tools/test.py", line 139, in <module>
    main()
  File "tools/test.py", line 131, in main
    dataset.dump_results(outputs, **output_config)
  File "/home/zj/zhonglian/mmaction2/mmaction/datasets/base.py", line 86, in dump_results
    return mmcv.dump(results, out)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/io.py", line 80, in dump
    handler.dump_to_path(obj, file, **kwargs)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/handlers/base.py", line 25, in dump_to_path
    self.dump_to_fileobj(obj, f, **kwargs)
  File "/home/zj/zhonglian/mmcv/mmcv/fileio/handlers/json_handler.py", line 13, in dump_to_fileobj
    json.dump(obj, file, **kwargs)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/zj/anaconda3/envs/zhonglian/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
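
The scores are numpy arrays, which json cannot encode; a minimal sketch of a workaround (not the official fix; the array shape is a placeholder) is to convert them to lists first, or to write a pickle instead:

# Convert the per-video score arrays to plain lists before dumping as JSON
# (or simply pass --out result.pkl so mmcv dumps with pickle).
import json
import numpy as np

outputs = [np.random.rand(101).astype(np.float32)]   # stand-in for the real scores
with open('result.json', 'w') as f:
    json.dump([o.tolist() for o in outputs], f)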

In tools/test.py, the relevant code is:

    if rank == 0:
        if output_config:
            out = output_config['out']
            print(f'\nwriting results to {out}')
            dataset.dump_results(outputs, **output_config)

I printed the output_config and outputs; the info is:

{'out': 'result.json'}

[array([ 1.9529229 ,  2.923142  ,  0.26469332, -0.12839134, -2.1167572 ,
       -0.7340729 , -1.3261667 ,  0.4541236 ,  0.94828093,  1.155941  ,
        0.23628543, -0.78831387,  2.2801087 ,  1.0793906 ,  0.31419927,
        0.30997226,  0.5425246 , -0.70942116, -1.1134925 ,  2.236816  ,
        3.9390984 , -2.1505275 , -1.085769  , -2.8008654 , -1.3788043 ,
       -0.35550973,  0.6128084 ,  0.97523236, -1.4105709 , -1.2038826 ,
       -2.1797624 , -1.4052689 , -0.67973197,  1.7024329 , -0.7162529 ,
       -1.2531643 , -1.405829  , -1.7755532 , -0.9127121 , -0.52495575,
        0.5702051 , -0.54499656,  0.9248879 , -1.0198474 ,  1.8331637 ,
       -0.5963148 , -1.2978854 ,  1.1907437 , -1.4260625 ,  0.20374985,
        1.7188393 ,  0.9811421 , -1.6228783 ,  0.58338284, -0.7557665 ,
       -1.0928499 , -0.7617161 , -0.65688896,  4.0263968 , -0.09345046,
        0.07987386,  0.73330057, 13.416785  , -0.31503808,  3.6180706 ,
        1.4577851 ,  1.4350643 ,  0.21168658, -0.19559935, -1.103691  ,
        0.7532946 ,  1.5955294 , -1.1590674 , -1.2700799 , -0.32934734,
       -0.52962774, -0.747167  ,  0.18337195, -0.2666077 ,  0.717041  ,
       -0.6293016 , -0.6326269 , -0.17059498, -2.4983056 ,  0.0488462 ,
       -1.161425  ,  0.13799725, -1.8053738 , -1.6930958 ,  1.2327036 ,
       -1.2348598 , -0.18195666,  1.3208578 , -3.0858784 ,  1.1431783 ,
       -0.9411551 , -0.7087368 , -1.15071   , -3.0066304 , -1.8325434 ,
        2.1851883 ], dtype=float32), array([-3.1554081 , -1.8086557 , -1.2189873 , -1.3863541 , -3.4624038 ,
       -3.7378008 ,  7.6724052 ,  5.97018   , -4.2795534 , -2.5561104 ,
        2.2037597 ,  1.2032832 , -5.6521015 , -3.200562  ,  0.06564808,
       -3.1106699 , -0.22693926, -4.557994  , -0.9784015 , -2.8301358 ,
       -0.26256648, -1.9581242 , -0.75423837,  3.251859  , -3.7698638 ,
        2.8235092 , -2.9476943 ,  0.75258267, 10.651768  ,  1.6277269 ,
       -0.08898169, -1.1676219 ,  4.3143296 , -4.4079895 , -4.0753226 ,
        2.1783433 , -4.154809  , -1.7371117 , -2.4756253 ,  6.97458   ,
       -1.4465613 ,  3.5330255 , -1.9635652 , -1.0765982 ,  3.4709496 ,
       -0.44178772, -0.5041221 ,  2.493868  , -0.25774002, -2.910048  ,
        1.3306173 ,  3.3166916 , -1.9219271 , -1.5394036 , -1.2261659 ,
       -1.2541034 , -0.9439164 , -0.20131937, -2.7909422 , -1.7844346 ,
       -0.31215718, -2.2882266 , -1.4200875 , -2.3059387 , -1.2107593 ,
       -2.174218  , -3.193241  ,  2.251296  , -2.9217339 ,  2.1830683 ,
        0.09082523,  0.70335275, -3.5495253 , -5.4326572 , -2.9788358 ,
        0.7502857 , -2.0108578 , -3.704027  ,  2.679557  , -0.8924122 ,
        0.39617965,  2.2738085 , -3.2832923 ,  7.1167126 ,  3.3312867 ,
       -0.20836425, -3.8255863 , -0.7380201 ,  2.5008836 ,  5.836446  ,
        3.9049966 , 16.540073  ,  9.489449  ,  6.8317823 , -2.6105278 ,
        0.0635196 , -0.18466364,  2.4365137 , -0.29589617, -0.49789888,
        2.5412517 ], dtype=float32), array([-3.40932107e+00, -1.06936395e-02, -1.73499656e+00, -1.59915805e+00,
        5.71720302e-02, -1.26235354e+00,  1.75313354e+00,  1.82909936e-01,
       -2.73504066e+00, -8.32203209e-01,  1.33741820e+00,  1.22894943e+00,
       -3.33747673e+00, -2.82331657e+00, -6.27151072e-01, -5.35833001e-01,
        7.28152394e-02, -3.50825024e+00,  2.36635065e+00,  1.20436706e-01,
        1.99636745e+00,  1.94954121e+00,  1.54881507e-01,  3.04111511e-01,
       -2.20299864e+00,  4.68201256e+00, -3.32769918e+00,  1.58799827e+00,
        2.00522804e+00,  4.28090960e-01,  1.21267533e+00, -3.45705330e-01,
        2.38831758e+00, -2.96614265e+00, -1.35263073e+00,  1.28939712e+00,
       -1.74022067e+00, -1.94155240e+00, -3.36226821e+00,  7.63379526e+00,
        4.00016403e+00,  4.05345821e+00, -4.05784190e-01,  1.22065210e+00,
        3.96605849e-01, -3.39757466e+00,  1.67164028e+00,  6.65977716e-01,
        3.89114916e-01, -1.13685560e+00,  1.78429723e+00,  1.66959250e+00,
        8.51574957e-01, -1.33695388e+00, -3.62328577e+00, -2.20936608e+00,
       -4.98263955e-01, -1.52075148e+00, -1.68073058e+00, -3.47000551e+00,
       -4.68902290e-03,  9.44112360e-01, -2.32742310e+00, -7.69852519e-01,
       -2.74959385e-01, -1.03926265e+00, -1.83813047e+00,  3.34748793e+00,
       -3.22042465e-01, -4.92838115e-01,  2.63888419e-01,  3.05683446e+00,
        1.63758367e-01, -4.02872753e+00, -2.33594084e+00,  1.09016666e+01,
       -2.16153765e+00, -2.93059349e+00,  3.17019510e+00,  1.59995222e+00,
       -7.56023049e-01,  7.05853367e+00, -1.75534749e+00, -9.27645862e-02,
       -7.87818313e-01, -1.31494510e+00, -5.49836457e-02,  7.27982521e-01,
       -9.21023250e-01,  2.67443925e-01,  1.25793505e+00,  1.52883315e+00,
        2.56475949e+00,  9.29922283e-01, -1.78127527e+00, -6.23938262e-01,
       -6.67548358e-01,  1.15025485e+00, -2.27030230e+00,  2.42970988e-01,
       -1.11846581e-01], dtype=float32), array([ 9.7881667e-02,  6.3227153e-01, -1.8561482e+00, -2.1571205e+00,
        1.4059830e+01,  4.8399657e-01, -1.8275721e+00, -2.1536226e+00,
        2.0527697e+00, -2.3162837e+00, -3.0728564e+00,  4.5147705e-01,
       -2.2566085e+00,  9.0172809e-01,  9.2773736e-01,  3.4005036e+00,
       -2.4779036e+00, -1.9556541e+00, -4.0643939e-01, -1.2113328e+00,
        1.0615828e+00,  1.8980796e+00,  8.0910289e-01, -3.4260190e+00,
        1.6985834e-02,  1.8681365e+00, -1.6745995e+00,  3.1297741e+00,
        4.9533206e-01,  7.7088308e+00, -9.4858694e-01,  1.6952250e+00,
       -3.3255212e+00, -9.6397811e-01,  2.0618695e-01,  3.0011529e-01,
        1.3867394e+00,  2.7509351e+00, -1.8679692e+00,  1.8175439e+00,
       -1.7074220e+00, -3.3053722e+00,  4.2096773e-01,  3.0590990e+00,
       -3.0134280e+00, -4.1446114e+00,  1.4162828e+00, -1.3907127e+00,
       -2.8771629e+00,  9.5357203e-01,  1.0698979e+00, -3.5089359e+00,
       -4.6066377e-01, -2.0315270e+00, -2.4641752e+00, -1.7112375e+00,
        7.7639780e+00, -7.3515660e-01, -1.6210897e+00, -1.6490629e+00,
       -1.4550496e+00,  8.2967222e-01, -2.4997182e+00, -3.0694556e-01,
        7.3129952e-01, -7.7849364e-01, -8.0653977e-01, -2.7814975e-01,
        6.9563894e+00, -2.2368103e-02,  1.2655897e+00,  1.0192424e-02,
        1.6345310e+00, -2.7512756e-01, -1.4516522e+00, -1.3889271e-01,
       -7.7020127e-01, -1.5020751e+00,  1.9333646e+00, -4.9428000e+00,
       -1.9338553e+00, -2.5300448e+00,  4.6418971e-01, -5.1236825e+00,
        2.4116956e-01,  7.5193768e+00,  5.8947573e+00, -5.9647286e-01,
       -3.0245688e+00,  1.1701695e+00, -2.1766311e-01, -2.4784267e+00,
       -2.7892220e+00,  1.5604091e-01,  2.2785933e+00,  7.8045473e+00,
       -1.1207641e+00, -2.6828754e+00,  1.1542189e+00,  3.2799768e-01,
       -9.7450703e-01], dtype=float32)]

I hope you can help me solve this problem.
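For illustration, here is a minimal workaround sketch (my own suggestion, not the repo's fix), assuming outputs is a list of NumPy score vectors as printed above: convert each array to a plain Python list before the JSON dump.

import numpy as np
import mmcv

# Stand-in for the per-video score vectors printed above (values are hypothetical).
outputs = [np.random.randn(101).astype(np.float32) for _ in range(4)]

# json cannot serialize ndarray, so convert each array to a plain Python list first.
serializable = [o.tolist() if isinstance(o, np.ndarray) else o for o in outputs]
mmcv.dump(serializable, 'result.json')  # the JSON handler can now write the file

Alternatively, dumping to a .pkl file sidesteps the issue entirely, since the pickle handler serializes ndarrays directly.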

mmcv error when extracting frames

Describe the bug

When extracting RGB frames using tools/data/sthv2/extract_rgb_frames_opencv.sh, an OpenCV resize error happened.

From the error traceback, it may be caused by mmcv.

Reproduction

  1. What command or script did you run?

sh extract_rgb_frames_opencv.sh in tools/data/sthv2

  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    --> No

  3. What dataset did you use?
    --> sthv2

Environment

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3

  1. You may add anything that may be helpful for locating the problem, such as
    --> PyTorch installed by conda

Error traceback
If applicable, paste the error traceback here.


Traceback (most recent call last):
  File "build_rawframes.py", line 226, in <module>
    len(vid_list) * [args.task]))
  File "/home/lsrock1/anaconda3/envs/pytorch1.6/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/lsrock1/anaconda3/envs/pytorch1.6/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-nzyrw1vf/opencv/modules/imgproc/src/resize.cpp:3932: error: (-215:Assertion failed) inv_scale_x > 0 in function 'resize'

Genearte raw frames (RGB only)
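For context, a minimal sketch of what I assume is the failure mode (not a confirmed diagnosis): the 'inv_scale_x > 0' assertion in cv2.resize fires when the requested target width ends up being 0 and no fx/fy scale is given.

import cv2
import numpy as np

img = np.zeros((240, 320, 3), dtype=np.uint8)  # dummy frame

try:
    # A zero target dimension (with fx/fy left at their defaults) reproduces the assertion:
    #   error: (-215:Assertion failed) inv_scale_x > 0 in function 'resize'
    cv2.resize(img, (0, 180))
except cv2.error as err:
    print(err)

If that is the cause here, my first guess would be to double-check the resize-related arguments passed to build_rawframes.py (the new width/height/short-side values) and that the source videos decode to non-empty frames.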


"workers_per_gpu" settings do not work

My CPU is AMD ThreadRipper 2990wx and GPU is Titan RTX.

No matter what I set workers_per_gpu to, the code only uses one CPU thread and cannot use all 64 threads of the CPU.

Can anyone help me? Thanks!
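For reference, a minimal sketch of where the worker count lives in an MMAction2 0.x config (values below are placeholders); workers_per_gpu only sets the number of DataLoader worker processes used for data loading, not the number of CPU threads used by the training computation itself.

# Sketch of the data section in an MMAction2 0.x config (placeholder values).
data = dict(
    videos_per_gpu=8,    # batch size per GPU
    workers_per_gpu=8,   # DataLoader worker processes per GPU, data loading only
    # train=..., val=..., test=...  (dataset settings omitted here)
)

If CPU usage stays at one thread even with a large workers_per_gpu, it may be worth checking whether the real bottleneck is elsewhere, or whether an environment variable such as OMP_NUM_THREADS is capping the thread count.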

Do you plan to add a person detection function?

Hello. Thank you very much for your implementation.
Do you have any plan to support independent action recognition inference for each person appearing in a video, like the SlowFast implementation does? Their implementation can also train and run inference on the AVA dataset.
We are working on action recognition for surveillance cameras, where more than one person appears in the scene, so we hope independent per-person inference can be supported.

misleading settings in README

I got the suggestion "The gpus indicates the number of gpu we used to get the checkpoint. It is noteworthy that the configs we provide are used for 8 gpus as default. According to the Linear Scaling Rule, you may set the learning rate proportional to the batch size if you use different GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs * 2 video/gpu and lr=0.08 for 16 GPUs * 4 video/gpu." when I tried to use the SlowFast configs.

Yet the lr and videos_per_gpu in these config files are different from those in the README. For example, in https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py, 'lr' is 0.1 and 'videos_per_gpu' is 8.

So, which one is the correct setting to reproduce the performance mentioned in the README?
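For what it's worth, a sketch of the linear-scaling arithmetic implied by that README note (illustrative only, not an answer to which setting is authoritative):

def scaled_lr(base_lr, base_batch, gpus, videos_per_gpu):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    return base_lr * (gpus * videos_per_gpu) / base_batch

# README example: lr=0.01 at 4 GPUs * 2 videos/GPU (batch 8) is consistent with
# lr=0.08 at 16 GPUs * 4 videos/GPU (batch 64), i.e. lr/batch = 0.00125.
print(scaled_lr(0.01, 4 * 2, 16, 4))  # 0.08
# The slowfast_r50_4x16x1 config (lr=0.1 at 8 GPUs * 8 videos/GPU, batch 64) implies
# a different ratio, 0.1 / 64 = 0.0015625, which is the apparent inconsistency.
print(0.1 / (8 * 8))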

test.py

I want to know whether test.py can output the predicted values (if I have 3 classes, can it output 0, 1, or 2?) or labels.
I followed your guide and added '--out result.json', but I can't understand what the values in result.json mean.
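For illustration, a minimal sketch of how the values could be read, assuming each entry in result.json is a vector of raw per-class scores for one test video; the predicted label would then be the argmax:

import json

import numpy as np

with open('result.json') as f:
    results = json.load(f)           # assumed: one score vector per test video

scores = np.array(results[0])        # raw (unnormalized) class scores for the first video
pred_label = int(np.argmax(scores))  # predicted class index, e.g. 0, 1 or 2 for 3 classes
print(pred_label, scores[pred_label])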

Also, I want to ask a question about the model:
As a 3D model, can SlowFast predict with only one image as input?
I have tried this, but it doesn't work; maybe I prepared the rawframes dataset incorrectly.

how to set a custom lr updater?

Hi,
I want to use 'StepLrUpdaterHook', but I do not want it to decrease to 0.1 * lr at every step I specify. What I want is base_lr = 0.1, and then the following decreased lrs: 0.5 * base_lr, 0.1 * base_lr, 0.05 * base_lr, 0.001 * base_lr.
How can I do it?

Thanks in advance!
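For illustration, a minimal sketch of one way to do this with mmcv 1.x's hook registry (the class and parameter names below are hypothetical and the exact interface may differ between versions): subclass LrUpdaterHook and return an explicit factor per milestone.

from mmcv.runner import HOOKS
from mmcv.runner.hooks import LrUpdaterHook


@HOOKS.register_module()
class MultiFactorLrUpdaterHook(LrUpdaterHook):
    """Multiply base_lr by an explicit factor once each milestone is reached."""

    def __init__(self, steps, factors, **kwargs):
        assert len(steps) == len(factors)
        self.steps = steps
        self.factors = factors
        super().__init__(**kwargs)

    def get_lr(self, runner, base_lr):
        progress = runner.epoch if self.by_epoch else runner.iter
        factor = 1.0
        for step, f in zip(self.steps, self.factors):
            if progress >= step:
                factor = f
        return base_lr * factor

A config could then select it with something like lr_config = dict(policy='MultiFactor', steps=[40, 80, 120, 160], factors=[0.5, 0.1, 0.05, 0.001]), since mmcv resolves the policy name to <policy>LrUpdaterHook; the step values here are placeholders, and this should be treated as a sketch rather than a drop-in solution.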

Unexpected keyword 'use_frames'

When following demo.py in the documentation, I got an error like this:

TypeError: init_recognizer() got an unexpected keyword argument 'use_frames'

Has anything changed about the recognizer?
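For reference, a minimal sketch of the call without that keyword, assuming a version of the demo API in which use_frames was removed and the input type is inferred instead (the paths below are placeholders):

from mmaction.apis import inference_recognizer, init_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'  # placeholder
checkpoint_file = 'checkpoints/tsn.pth'  # placeholder

model = init_recognizer(config_file, checkpoint_file, device='cuda:0')
# Some versions also expect a label file as an extra argument; check the demo docs of
# the installed version.
results = inference_recognizer(model, 'demo/demo.mp4')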

Feature extraction of BMN using TSN

In the BMN Model Zoo there are results on features extracted by MMAction, but I found no details in Data Preparation about how to extract the features using TSN.

After referring to the BMN paper and some issues, I am still confused about the details.


Assume the video has 16,000 frames.

  1. Divide all frames into 1000 continuous non-overlapping snippets, each with 16 frames. Decode the video into raw frames and calculate optical flow.

  2. Select the 8th RGB frame and the 6th-10th optical flow frames of each snippet to represent that snippet.

  3. For one snippet:

  • RGB: initialize the TSN network with the corresponding ActivityNet RGB config and checkpoint from the TSN Model Zoo. Input one RGB frame (the 8th), simply resized to 224x224 without any crop; the cls_score returned by tsn_head will be a tensor with shape [1, 200].

  • Flow: initialize the TSN network with the flow config and checkpoint, and input the five optical flow frames; the consensus module will "average" them, so cls_score will also be a tensor with shape [1, 200].

  • Concatenate the two tensors above -> get the feature of this snippet.

  4. Apply the same process to all 1000 snippets, so the feature shape of a video is [1000, 400]; then use this script to rescale it to [100, 400].

Are the above the right steps (a rough sketch follows below)? Or could you add your feature extraction script to this repo?

Thank you!
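To make the shape bookkeeping concrete, a rough NumPy sketch of steps 1-4 above (the score extraction is only stubbed out; this reflects my reading of the procedure, not a confirmed recipe):

import numpy as np

num_snippets, num_classes = 1000, 200

def rgb_score(frame_id):
    # Stub: TSN cls_score (RGB config/ckpt) for the snippet's 8th frame -> shape (200,).
    return np.zeros(num_classes, dtype=np.float32)

def flow_score(flow_frame_ids):
    # Stub: TSN cls_score (flow config/ckpt) averaged over 5 flow frames -> shape (200,).
    return np.zeros(num_classes, dtype=np.float32)

features = []
for i in range(num_snippets):                          # step 1: 1000 non-overlapping 16-frame snippets
    rgb = rgb_score((i, 8))                            # steps 2-3: the snippet's 8th RGB frame
    flow = flow_score([(i, j) for j in range(6, 11)])  # steps 2-3: the 6th-10th flow frames
    features.append(np.concatenate([rgb, flow]))       # step 3: concat -> (400,)

video_feature = np.stack(features)                     # step 4: (1000, 400), later rescaled to (100, 400)
print(video_feature.shape)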
