
efficient-3dcnns's Introduction

Efficient-3DCNNs

PyTorch implementation of the article "Resource Efficient 3D Convolutional Neural Networks", including code and pretrained models.

Update!

3D ResNet and 3D ResNeXt models have been added! The details of these models can be found at the link.

Requirements

Pre-trained models

Pretrained models can be downloaded from here.
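
For reference, a minimal loading sketch is shown below. This is an assumption on my part, pieced together from the snippets in the issues further down: the checkpoints store the weights under 'state_dict' with a 'module.' prefix added by nn.DataParallel, and the exact get_model keyword arguments may differ per model.

import torch
from models import shufflenetv2  # or any other model module in the models folder

# Illustrative file name; use whichever pretrained checkpoint you downloaded.
checkpoint = torch.load('kinetics_shufflenetv2_1.0x_RGB_16_best.pth', map_location='cpu')

# The checkpoints were saved from nn.DataParallel, so keys carry a 'module.' prefix.
state_dict = {k.replace('module.', '', 1): v for k, v in checkpoint['state_dict'].items()}

model = shufflenetv2.get_model(num_classes=600, sample_size=112, width_mult=1.0)
model.load_state_dict(state_dict)
model.eval()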

Implemented models:

  • 3D SqueezeNet
  • 3D MobileNet
  • 3D ShuffleNet
  • 3D MobileNetv2
  • 3D ShuffleNetv2

For state-of-the-art comparison, the following models are also evaluated:

  • ResNet-18
  • ResNet-50
  • ResNet-101
  • ResNeXt-101

All models (except SqueezeNet) are evaluated at 4 different complexity levels by adjusting their 'width_multiplier', on 2 different hardware platforms.

Results

Dataset Preparation

Kinetics

  • Download videos using the official crawler.

    • Locate test set in video_directory/test.
  • Unlike the other datasets, we do not extract frames from the videos. Instead, frames are read directly from the videos with OpenCV during training. If you want to extract frames for the Kinetics dataset, please follow the preparation steps in Kensho Hara's codebase. You will also need to modify the kinetics.py file in the datasets folder.

  • Generate annotation file in json format similar to ActivityNet using utils/kinetics_json.py

    • The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.
python utils/kinetics_json.py train_csv_path val_csv_path video_dataset_path dst_json_path

Jester

  • Download videos here.
  • Generate n_frames files using utils/n_frames_jester.py
python utils/n_frames_jester.py dataset_directory
  • Generate annotation file in json format similar to ActivityNet using utils/jester_json.py
    • annotation_dir_path includes classInd.txt, trainlist.txt, vallist.txt
python utils/jester_json.py annotation_dir_path

UCF-101

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using utils/video_jpg_ucf101_hmdb51.py
python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory
  • Generate n_frames files using utils/n_frames_ucf101_hmdb51.py
python utils/n_frames_ucf101_hmdb51.py jpg_video_directory
  • Generate annotation file in json format similar to ActivityNet using utils/ucf101_json.py
    • annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python utils/ucf101_json.py annotation_dir_path

Running the code

Model configurations are given as follows:

ShuffleNetV1-1.0x : --model shufflenet   --width_mult 1.0 --groups 3
ShuffleNetV2-1.0x : --model shufflenetv2 --width_mult 1.0
MobileNetV1-1.0x  : --model mobilenet    --width_mult 1.0
MobileNetV2-1.0x  : --model mobilenetv2  --width_mult 1.0 
SqueezeNet	  : --model squeezenet --version 1.1
ResNet-18	  : --model resnet  --model_depth 18  --resnet_shortcut A
ResNet-50	  : --model resnet  --model_depth 50  --resnet_shortcut B
ResNet-101	  : --model resnet  --model_depth 101 --resnet_shortcut B
ResNeXt-101	  : --model resnext --model_depth 101 --resnet_shortcut B --resnext_cardinality 32

Please check all the resource-efficient 3D CNN models in the models folder and run the code with the necessary parameters. Example runs are given below:

  • Training from scratch:
python main.py --root_path ~/ \
	--video_path ~/datasets/jester \
	--annotation_path Efficient-3DCNNs/annotation_Jester/jester.json \
	--result_path Efficient-3DCNNs/results \
	--dataset jester \
	--n_classes 27 \
	--model mobilenet \
	--width_mult 0.5 \
	--train_crop random \
	--learning_rate 0.1 \
	--sample_duration 16 \
	--downsample 2 \
	--batch_size 64 \
	--n_threads 16 \
	--checkpoint 1 \
	--n_val_samples 1 \
  • Resuming training from a checkpoint:
python main.py --root_path ~/ \
	--video_path ~/datasets/jester \
	--annotation_path Efficient-3DCNNs/annotation_Jester/jester.json \
	--result_path Efficient-3DCNNs/results \
	--resume_path Efficient-3DCNNs/results/jester_shufflenet_0.5x_G3_RGB_16_best.pth \
	--dataset jester \
	--n_classes 27 \
	--model shufflenet \
	--groups 3 \
	--width_mult 0.5 \
	--train_crop random \
	--learning_rate 0.1 \
	--sample_duration 16 \
	--downsample 2 \
	--batch_size 64 \
	--n_threads 16 \
	--checkpoint 1 \
	--n_val_samples 1 \
  • Training from a pretrained model. Use '--ft_portion' to select either 'complete' or 'last_layer' fine-tuning:
python main.py --root_path ~/ \
	--video_path ~/datasets/jester \
	--annotation_path Efficient-3DCNNs/annotation_UCF101/ucf101_01.json \
	--result_path Efficient-3DCNNs/results \
	--pretrain_path Efficient-3DCNNs/results/kinetics_shufflenet_0.5x_G3_RGB_16_best.pth \
	--dataset ucf101 \
	--n_classes 600 \
	--n_finetune_classes 101 \
	--ft_portion last_layer \
	--model shufflenet \
	--groups 3 \
	--width_mult 0.5 \
	--train_crop random \
	--learning_rate 0.1 \
	--sample_duration 16 \
	--downsample 1 \
	--batch_size 64 \
	--n_threads 16 \
	--checkpoint 1 \
	--n_val_samples 1 \

Augmentations

There are several augmentation techniques available. Please check spatial_transforms.py and temporal_transforms.py for the details of the augmentation methods.

Note: Do not use "RandomHorizontalFlip" when training on the Jester dataset, as it changes the labels of some classes (e.g. Swipe_Left --> RandomHorizontalFlip() --> Swipe_Right).
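
As a rough illustration, the training transforms can be composed as below. This is only a sketch; the class names and signatures are assumed to follow Kensho Hara's codebase that this repository builds on, so please verify them against spatial_transforms.py and temporal_transforms.py, and treat the values as placeholders.

from spatial_transforms import Compose, MultiScaleRandomCrop, RandomHorizontalFlip, ToTensor, Normalize
from temporal_transforms import TemporalRandomCrop

# Spatial: random multi-scale crop, horizontal flip, tensor conversion, normalization.
spatial_transform = Compose([
    MultiScaleRandomCrop([1.0, 0.84, 0.71], 112),
    RandomHorizontalFlip(),   # leave this out for Jester (see the note above)
    ToTensor(1),              # norm_value=1 keeps pixels in the [0, 255] range
    Normalize([114.7748, 107.7354, 99.475], [1, 1, 1]),
])

# Temporal: randomly crop 16 frames, keeping every 2nd frame (downsample=2).
temporal_transform = TemporalRandomCrop(16, 2)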

Calculating Video Accuracy

To calculate video accuracy, first run the model in '--test' mode to create 'val.json'. Then run 'video_accuracy.py' in the utils folder to calculate video accuracies.
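
An illustrative sequence of commands is given below, assembled from the flags used elsewhere in this README and in the issues; the checkpoint name is only an example, and utils/video_accuracy.py should be checked for the paths it expects to 'val.json' and the annotation file.

python main.py --root_path ~/ \
	--video_path ~/datasets/jester \
	--annotation_path Efficient-3DCNNs/annotation_Jester/jester.json \
	--result_path Efficient-3DCNNs/results \
	--resume_path Efficient-3DCNNs/results/jester_mobilenet_0.5x_RGB_16_best.pth \
	--dataset jester \
	--n_classes 27 \
	--model mobilenet \
	--width_mult 0.5 \
	--sample_duration 16 \
	--downsample 2 \
	--no_train \
	--no_val \
	--test
python utils/video_accuracy.py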

Calculating FLOPs

To calculate FLOPs, run 'calculate_FLOP.py'. You first need to uncomment the desired model in the file.
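
A minimal sketch of what the script does is given below, based on the snippet in the "Modified THOP" issue further down; the thop bundled with this repository takes an input_size argument, and the get_model keyword arguments may differ per model.

from thop import profile
from models import mobilenet  # swap in the model you want to profile

model = mobilenet.get_model(num_classes=600, sample_size=112, width_mult=1.0)

# Count trainable parameters and profile FLOPs for a single 16-frame 112x112 clip.
params = sum(p.numel() for p in model.parameters() if p.requires_grad)
flops, _ = profile(model, input_size=(1, 3, 16, 112, 112))
print("Params:", params, "FLOPs:", flops)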

Citation

Please cite the following article if you use this code or pre-trained models:

@inproceedings{kopuklu2019resource,
  title={Resource efficient 3d convolutional neural networks},
  author={K{\"o}p{\"u}kl{\"u}, Okan and Kose, Neslihan and Gunduz, Ahmet and Rigoll, Gerhard},
  booktitle={2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)},
  pages={1910--1919},
  year={2019},
  organization={IEEE}
}

Acknowledgement

We thank Kensho Hara for releasing his codebase, on top of which we built our work.

efficient-3dcnns's People

Contributors

ahmetgunduz, okankop, tdh512194


efficient-3dcnns's Issues

I need your help

Hello, thanks for your code. I get an error when I train on the Jester dataset: "invalid argument 0: sizes of tensors must match except in dimension 0, got 16 and 7 in dimension 2". Can you give me some advice? Thank you.

Jester eval script

Hello, I want to evaluate the video accuracy on the Jester dataset, but there isn't any script for that; I only found ones for UCF and Kinetics.

Another question: during training, what is the meaning of the top-1 accuracy vs. the top-1 accuracy during validation? Why are they different? For example, on the Jester dataset the validation accuracy is typically higher (by 15% to 20%) than the training top-1 accuracy.

Low accuracy using ResNet-50 on UCF-101 without a pre-trained model

python main.py --root_path ./ \
	--video_path ./ucf101_dataset/ucf101_jpg \
	--annotation_path Efficient-3DCNNs/annotation_UCF101/ucf101_01.json \
	--result_path ./results/ucf101/resnet50/ \
	--dataset ucf101 \
	--n_classes 101 \
	--ft_portion complete \
	--model resnet --model_depth 50 --resnet_shortcut B \
	--groups 3 \
	--width_mult 0.5 \
	--train_crop random \
	--learning_rate 0.1 \
	--sample_duration 16 \
	--downsample 1 \
	--batch_size 64 \
	--n_threads 16 \
	--checkpoint 1 \
	--n_val_samples 1 \
        --test

After running the code to train and validate ResNet-50 without a pretrained model, the top-1 accuracy is about 40% and the top-5 accuracy about 65%; but in the paper, the accuracy of ResNet-50 on UCF-101 is 88.92.
How can I reach the accuracy reported in the paper? Or do I have to load the pre-trained model?

Reproduction issues for pretrained Kinetics models

Hi, thanks for sharing the project codebase! 😊

I'm having issues reproducing any sort of results on Kinetics using your pretrained models.

So far, I have:

  • Downloaded Kinetics600 using the crawler
  • Generated the annotation file using utils/kinetics_json.py
  • Run run-kinetics.sh:
python main.py --root_path '' \
 	--video_path ~/datasets/Kinetics \
 	--annotation_path Efficient-3DCNNs/annotation_Kinetics/kinetics.json \
 	--result_path Efficient-3DCNNs/results \
 	--resume_path Efficient-3DCNNs/results/kinetics_mobilenetv2_0.45x_RGB_16_best.pth \
	--dataset kinetics \
 	--sample_size 112 \
 	--n_classes 600 \
 	--model mobilenetv2 \
 	--version 1.1 \
 	--groups 3 \
 	--width_mult 0.45 \
 	--train_crop random \
 	--learning_rate 0.1 \
 	--sample_duration 16 \
 	--batch_size 16 \
 	--n_threads 16 \
 	--checkpoint 1 \
 	--n_val_samples 1 \
	--no_train \
 	--no_val \
 	--test
  • Gathered result using python utils/video_accuracy.py
  • Repeated the above steps for other models (e.g. kinetics_shufflenet_1.0x_G3_RGB_16_best.pth with --model shufflenet and width_mult 1.0)
  • Repeated the above steps using python test_models.py
    • fixing a few bugs in lines
      • 56: correct_k = correct[:k].float().sum().item(),
      • 104: temporal_transform = TemporalCenterCrop(opt.sample_duration, opt.downsample)
      • 121: assert opt.model == checkpoint['arch']
  • Added in combinations of --std_norm, --norm_value 255, and --no_mean_norm.

However, all I get are random predictions (~0.001 acc).

Do you have a working example for the pretrained Kinetics models that you could share?

Thanks in advance 😊

fine tuning

How many epochs did it take to reach decent accuracy, and what was the initial accuracy?

Attribute Error: 'int' object has no attribute 'item'

When I run the network with the Kinetics dataset format, an error is raised from PyTorch in validation.py saying that an int does not have the attribute item(). According to the PyTorch documentation, only a tensor can use Tensor.item(). However, in validation.py the variable 'losses' is an AverageMeter() and 'losses.avg' is an int, neither of which has an item() attribute. We sincerely hope you can help us out of this. Thanks.

Below is the part of the code that raises the error:
logger.log({'epoch': epoch, 'loss': losses.avg.item(), 'prec1': top1.avg.item(), 'prec5': top5.avg.item()})
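
A minimal defensive workaround (just a sketch, not the authors' fix) is to call .item() only when the value is actually a tensor, e.g. in validation.py:

import torch

def to_scalar(value):
    # AverageMeter.avg starts as a plain Python number and only becomes a tensor
    # once it has been updated with tensor values, so handle both cases.
    return value.item() if torch.is_tensor(value) else value

logger.log({'epoch': epoch,
            'loss': to_scalar(losses.avg),
            'prec1': to_scalar(top1.avg),
            'prec5': to_scalar(top5.avg)})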

ValueError: num_samples should be a positive integer value, but got num_samples=0

dataset loading [0/4795]
dataset loading [1000/4795]
dataset loading [2000/4795]
dataset loading [3000/4795]
dataset loading [4000/4795]
Traceback (most recent call last):
File "main.py", line 95, in
pin_memory=True)
File "/home/yashbhambhu_18je0949/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 213, in init
sampler = RandomSampler(dataset)
File "/home/yashbhambhu_18je0949/.local/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 94, in init
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

Please help!

Jester Pretrained model loading

Dear @okankop,
For the pretrained Jester models, must I use BGR images/frames in the range [0, 255] with the mean & std normalization below (i.e., the Kinetics mean)?
mean=[110.63666788, 103.16065604, 96.29023126], std=[1, 1, 1]

Modified THOP

Hi, thanks for your excellent work! I have cloned your repo and run calculate_FLOP.py. It goes wrong. Here is my code:

import os
import torch
import torch.nn as nn
from thop import profile
from models import squeezenet, shufflenetv2, shufflenet, mobilenet, mobilenetv2, c3d, resnext, resnet

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

model = c3d.get_model(num_classes=600, sample_size=112, sample_duration=16)
model = model.cuda()
model = nn.DataParallel(model, device_ids=[0,1])

pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("Total number of trainable parameters: ", pytorch_total_params)

flops, prms = profile(model, input_size=(1, 3, 16, 112, 112))
print("Total number of FLOPs: ", flops)

which returns:

Total number of trainable parameters:  80459480
Traceback (most recent call last):
  File "calculate_FLOP.py", line 18, in <module>
    flops, prms = profile(model, input_size=(1, 3, 16, 112, 112))
  File "/home/caomengqi/Efficient-3DCNNs-master/thop/utils.py", line 54, in profile
    model(x)
  File "/home/caomengqi/anaconda3/envs/slowfast/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/caomengqi/anaconda3/envs/slowfast/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

So I am confused now. Could you please help me figure it out?
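
For what it's worth, one sketch that usually avoids this particular error is to profile the bare model before wrapping it in nn.DataParallel or moving it to the GPU, since the profiler feeds a CPU tensor into the model. This is only an assumption based on the error message, not a confirmed fix:

from thop import profile
from models import c3d

# Profile on CPU first; wrap in nn.DataParallel and call .cuda() afterwards for training.
model = c3d.get_model(num_classes=600, sample_size=112, sample_duration=16)
flops, params = profile(model, input_size=(1, 3, 16, 112, 112))
print("Total number of FLOPs: ", flops)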

I inflated Google's latest lightweight 2D CNN model, MobileNetV3, to 3D and got a higher precision of 52% on UCF-101 from scratch

Here is the code, modified from the 2D MobileNetV3 at https://github.com/d-li14/mobilenetv3.pytorch/blob/master/mobilenetv3.py:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


class SELayer(nn.Module):
    def __init__(self, channel, reduction=4):
        super(SELayer, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(channel, _make_divisible(channel // reduction, 8)),
            nn.ReLU6(inplace=True),
            nn.Linear(_make_divisible(channel // reduction, 8), channel),
            h_sigmoid()
        )

    def forward(self, x):
        b, c, f, h, w = x.size()
        y = F.avg_pool3d(x, x.data.size()[-3:]).view(b, c)
        y = self.fc(y).view(b, c, 1, 1, 1)
        return (x * y)


def conv_3x3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv3d(inp, oup, kernel_size=3, stride=stride, padding=(1,1,1), bias=False),
        nn.BatchNorm3d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv3d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm3d(oup),
        nn.ReLU6(inplace=True)
    )


class InvertedResidual(nn.Module):
    def __init__(self, inp, hidden_dim, oup, kernel_size, stride, use_se, use_hs):
        super(InvertedResidual, self).__init__()
        self.stride = stride

        self.use_res_connect = self.stride == (1,1,1) and inp == oup

        if inp == hidden_dim:
            self.conv = nn.Sequential(
                # dw
                nn.Conv3d(hidden_dim, hidden_dim, kernel_size, stride, (kernel_size - 1) // 2, groups=hidden_dim, bias=False),
                nn.BatchNorm3d(hidden_dim),
                h_swish() if use_hs else nn.ReLU6(inplace=True),
                # Squeeze-and-Excite
                SELayer(hidden_dim) if use_se else nn.Identity(),
                # pw-linear
                nn.Conv3d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm3d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv3d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm3d(hidden_dim),
                h_swish() if use_hs else nn.ReLU6(inplace=True),
                # dw
                nn.Conv3d(hidden_dim, hidden_dim, kernel_size, stride, (kernel_size - 1) // 2, groups=hidden_dim, bias=False),
                nn.BatchNorm3d(hidden_dim),
                # Squeeze-and-Excite
                SELayer(hidden_dim) if use_se else nn.Identity(),
                h_swish() if use_hs else nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv3d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm3d(oup),
            )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV3(nn.Module):
    def __init__(self, cfgs, mode, num_classes=1000, sample_size=224, width_mult=1.):
        super(MobileNetV3, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = cfgs
        assert mode in ['large', 'small']

        # building first layer
        assert sample_size % 16 == 0.
        input_channel = _make_divisible(16 * width_mult, 8)
        self.features = [conv_3x3x3_bn(3, input_channel, (1, 2, 2))]

        # building inverted residual blocks
        block = InvertedResidual
        for k, t, c, se, hs, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 8)
            hidden_dim = _make_divisible(input_channel * t, 8)
            self.features.append(block(input_channel, hidden_dim, output_channel, k, s, se, hs))
            input_channel = output_channel

        self.features.append(conv_1x1x1_bn(input_channel, hidden_dim))
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)
        output_channel = {'large': 1280, 'small': 1024}
        output_channel = _make_divisible(output_channel[mode] * width_mult, 8) if width_mult > 1.0 else output_channel[
            mode]

        # building classifier
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, output_channel),
            h_swish(),
            nn.Dropout(0.2),
            nn.Linear(output_channel, num_classes),
        )

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = F.avg_pool3d(x, x.data.size()[-3:])
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv3d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm3d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def get_fine_tuning_parameters(model, ft_portion):
    if ft_portion == "complete":
        return model.parameters()

    elif ft_portion == "last_layer":
        ft_module_names = []
        ft_module_names.append('classifier')

        parameters = []
        for k, v in model.named_parameters():
            for ft_module in ft_module_names:
                if ft_module in k:
                    parameters.append({'params': v})
                    break
            else:
                parameters.append({'params': v, 'lr': 0.0})
        return parameters

    else:
        raise ValueError("Unsupported ft_portion: 'complete' or 'last_layer' expected")


def mobilenetv3_large(**kwargs):
    """
    Constructs a MobileNetV3-Large model
    """
    cfgs = [
        # k, t, c, SE, HS, s
        [3,   1,  16, 0, 0, (1, 1, 1)],
        [3,   4,  24, 0, 0, (2, 2, 2)],
        [3,   3,  24, 0, 0, (1, 1, 1)],
        [5,   3,  40, 1, 0, (2, 2, 2)],
        [5,   3,  40, 1, 0, (1, 1, 1)],
        [5,   3,  40, 1, 0, (1, 1, 1)],
        [3,   6,  80, 0, 1, (2, 2, 2)],
        [3, 2.5,  80, 0, 1, (1, 1, 1)],
        [3, 2.3,  80, 0, 1, (1, 1, 1)],
        [3, 2.3,  80, 0, 1, (1, 1, 1)],
        [3,   6, 112, 1, 1, (1, 1, 1)],
        [3,   6, 112, 1, 1, (1, 1, 1)],
        [5,   6, 160, 1, 1, (2, 2, 2)],
        [5,   6, 160, 1, 1, (1, 1, 1)],
        [5,   6, 160, 1, 1, (1, 1, 1)]
    ]
    return MobileNetV3(cfgs, mode='large', **kwargs)


def mobilenetv3_small(**kwargs):
    """
    Constructs a MobileNetV3-Small model
    """
    cfgs = [
        # k, t, c, SE, HS, s
        [3,    1,  16, 1, 0, (2, 2, 2)],
        [3,  4.5,  24, 0, 0, (2, 2, 2)],
        [3, 3.67,  24, 0, 0, (1, 1, 1)],
        [5,    4,  40, 1, 1, (2, 2, 2)],
        [5,    6,  40, 1, 1, (1, 1, 1)],
        [5,    6,  40, 1, 1, (1, 1, 1)],
        [5,    3,  48, 1, 1, (1, 1, 1)],
        [5,    3,  48, 1, 1, (1, 1, 1)],
        [5,    6,  96, 1, 1, (2, 2, 2)],
        [5,    6,  96, 1, 1, (1, 1, 1)],
        [5,    6,  96, 1, 1, (1, 1, 1)],
    ]

    return MobileNetV3(cfgs, mode='small', **kwargs)


def get_model(**kwargs):
    """
    Returns the model.
    """
    return mobilenetv3_large(**kwargs)


if __name__ == "__main__":
    model = get_model(num_classes=600, sample_size=112, width_mult=1.)
    model = model.cuda()
    model = nn.DataParallel(model, device_ids=None)
    print(model)

    input_var = Variable(torch.randn(8, 3, 16, 112, 112))
    output = model(input_var)
    print(output.shape)

from utils import *

Hi, thank you for your code. utils.py easily causes name conflicts when doing 'from utils import *'. I don't know if you have encountered the same problem; you may want to rename the utils file.
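
One low-effort mitigation (just a suggestion) is to avoid the wildcard and import only the helpers that are actually needed, for example:

# Instead of `from utils import *`:
from utils import AverageMeter  # import other helpers explicitly as needed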

FLOPs mismatch problems

out_w = y.size(3) // m.stride[1]

As in the code above, I wonder why the calculation differs from other FLOP-counting projects such as https://github.com/sovrasov/flops-counter.pytorch (for a 3D ResNet-18 with the same input and output, that one reports ~8.32 G, but your implementation gives about 5 G). The reason seems to be that you divide the output size by the stride, as in the line above. Could you please explain this? Appreciate it! @ahmetgunduz @tdh512194 @okankop
P.S. (screenshot of the formula omitted) Shouldn't it be calculated in this way?

Low accuracy on Jester dataset with pre-trained model

Hi, I downloaded your repository and the Jester dataset, followed the instructions in the README.md to preprocess the dataset and obtain the files required by the framework, and finally ran the system in test mode on both the validation and test sets with the pre-trained models ResNeXt-101 (jester_resnext_101_RGB_16_best.pth) and MobileNetv2 1.0x (jester_mobilenetv2_1.0x_RGB_16_best.pth). But the performance is completely different: around 3% on the validation set and 10% on the test set. So, I wondered if you could share how to reproduce your results.

Why not evaluate the model with the entire validation set during training?

Hi Okan,

Thanks for your great work!
I'm currently trying to reproduce the results reported in your paper. I noticed that your code has a training option '--n_val_samples' which defines the number of samples per category used during validation. This is a bit confusing because usually we'd like to evaluate the model on the whole validation set. Could you let me know the reasoning behind this option? It would also be super helpful if your code could handle the case where all validation samples are covered during training.

Does your code support multi-gpu training?

Hi Okan,

Thanks for sharing your code!
I'm trying to run your code on multiple GPUs, but it seems the code/PyTorch only uses one of my 8 GPUs. I tried setting the environment variable as in 'CUDA_VISIBLE_DEVICES=gpu_ids python mycode', but with no luck. The output of torch.cuda.current_device() is always '0', and 'nvidia-smi' also confirms that only GPU 0 is used.

Do you have any idea of how to run your codebase on multiple GPUs?

Edit: It turns out the problem is that the 'batch_size' is too large so that the CPU memory is not enough.

Can't calculate_FLOP

I installed PyTorch 1.4.
When I swap c3d for resnet.resnet50, it says: "profile() got an unexpected keyword argument 'input_size'".
I want to calculate a 3D model's FLOPs. Did I make a mistake somewhere?

Temporal dimension annotations

What do begin_index and end_index signify in the annotations of the Kinetics dataset in the CSV file? I have observed that it is neither the frame number nor the time in seconds, so what exactly is it, and do you temporally cut the videos?
Thank you

Failed to finetune ResNet50 on UCF101 split-1

Hi! There are some implementation details on training 3DCNNs on UCF101 in your paper[1], one of which is "While dropout ratio is kept at 0.2 for Kinetics600 and Jester, it is increased to 0.9 for UCF-101". However, I cannot see any dropout modules in your resnet.py.

So far, I haven't produced the results (88.92% in Tab.8) in your paper. Could you give me an example run of resnet50-ucf101-pretrainK600?

Besides, what is your codebase for I3D?

Regards.

[1] 《Resource Efficient 3D Convolutional Neural Networks》

TensorBoard Support

Hi! Is there any way to add TensorBoard support to visualize and finetune models?
This would be great.

question about training on jester dataset

Hello! I found the project via https://github.com/ahmetgunduz/Real-time-GesRec, where the model must be pretrained on the Jester dataset and fine-tuned on the EgoGesture dataset. The training code is the same as your code. So I want to pretrain ResNeXt-101 on the Jester dataset. The original code is set up for MobileNet and runs well, so I made the modification but got a bug. Here is my run-jester.sh:

python main.py --root_path /Lun4/fdh/Real-time-GesRec
--video_path ../jester-dataset/20bn-jester-v1
--annotation_path annotation_Jester/jester.json
--result_path results
--dataset jester
--n_classes 27
--width_mult 2
--n_finetune_classes 27
--model resnext
--model_depth 101
--resnet_shortcut B
--train_crop random
--learning_rate 0.01
--sample_duration 32
--modality RGB
--downsample 1
--batch_size 48
--n_threads 16
--checkpoint 1
--n_val_samples 1
--test
--n_epochs 100
--ft_portion complete \

I found that this code runs on the EgoGesture dataset, but on the Jester dataset I get a bug.

File "/Lun4/fdh/Real-time-GesRec/train.py", line 30, in train_epoch
outputs = model(inputs)
File "/home/great57/.conda/envs/fdh/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/great57/.conda/envs/fdh/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
"them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

Can you help me fix it? Thank you!

Using mean and std on jester dataset

First of all, awesome repository!

I wanted to ask whether you use the mean and std calculated on the whole Jester dataset to normalize images during training on Jester?

jester dataset

at jester part:
Generate annotation file in json format similar to ActivityNet using utils/jester_json.py
annotation_dir_path includes classInd.txt, trainlist.txt, vallist.txt

Where do I get classInd.txt, trainlist.txt, and vallist.txt? I downloaded the Jester dataset from the official website, but I didn't find these txt files.

Could you provide a pretrained model on the UCF-101 dataset?

Thank you for publishing the source code! I tried to train MobileNet 0.5x on the UCF-101 dataset, but I cannot reach the top-1 precision of 62.17; I only reach about 40% top-1. What should I do when training this model? I have already used the pretrained model trained on Kinetics-600.

When we talk about accuracy, does it mean top-1 accuracy or top-5 accuracy?

Hi @okankop ,

Thanks very much for sharing such a wonderful repo!

I am a little bit confused about the metric "video classification accuracy" in your paper. I don't know whether it means top-1 or top-5 accuracy.

The confusion comes from my experiment results based on your repo.

Results for the MobileNetV1 model on the UCF-101 dataset

Using the pre-trained model on Kinetics-600: Top1: 52.26%, Top5: 78.95%, Reported in your paper: 70.95% 

{"modality": "RGB", "dataset": "ucf101", "n_classes": 600, "n_finetune_classes": 101, "sample_size": 112, "sample_duration": 16, "downsample": 2, "initial_scale": 1.0, "n_scales": 5, "scale_step": 0.84089641525, "train_crop": "random", "learning_rate": 0.1, "lr_steps": [40, 55, 65, 70, 200, 250], "momentum": 0.9, "dampening": 0.9, "weight_decay": 0.001, "mean_dataset": "activitynet", "no_mean_norm": false, "std_norm": false, "nesterov": false, "optimizer": "sgd", "lr_patience": 10, "batch_size": 64, "n_epochs": 250, "begin_epoch": 1, "n_val_samples": 1, "resume_path": "", "pretrain_path": "~/Documents/proj_3dcnn/Efficient-3DCNNs/Pretrained-Models/kinetics_mobilenet_1.0x_RGB_16_best.pth", "ft_portion": "last_layer", "no_train": false, "no_val": false, "test": false, "test_subset": "val", "scale_in_test": 1.0, "crop_position_in_test": "c", "no_softmax_in_test": false, "no_cuda": false, "n_threads": 16, "checkpoint": 1, "no_hflip": false, "norm_value": 1, "model": "mobilenet", "version": 1.1, "model_depth": 18, "resnet_shortcut": "B", "wide_resnet_k": 2, "resnext_cardinality": 32, "groups": 3, "width_mult": 1.0, "manual_seed": 1, "scales": [1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], "arch": "mobilenet", "mean": [114.7748, 107.7354, 99.475], "std": [38.7568578, 37.88248729, 40.02898126]}

Training-from-scratch: Top1: 38.51%, Top5: 64.02%

{"modality": "RGB", "dataset": "ucf101", "n_classes": 101, "n_finetune_classes": 400, "sample_size": 112, "sample_duration": 16, "downsample": 2, "initial_scale": 1.0, "n_scales": 5, "scale_step": 0.84089641525, "train_crop": "random", "learning_rate": 0.1, "lr_steps": [40, 55, 65, 70, 200, 250], "momentum": 0.9, "dampening": 0.9, "weight_decay": 0.001, "mean_dataset": "activitynet", "no_mean_norm": false, "std_norm": false, "nesterov": false, "optimizer": "sgd", "lr_patience": 10, "batch_size": 64, "n_epochs": 250, "begin_epoch": 1, "n_val_samples": 1, "resume_path": "", "pretrain_path": "", "ft_portion": "complete", "no_train": false, "no_val": false, "test": false, "test_subset": "val", "scale_in_test": 1.0, "crop_position_in_test": "c", "no_softmax_in_test": false, "no_cuda": false, "n_threads": 16, "checkpoint": 1, "no_hflip": false, "norm_value": 1, "model": "mobilenet", "version": 1.1, "model_depth": 18, "resnet_shortcut": "B", "wide_resnet_k": 2, "resnext_cardinality": 32, "groups": 3, "width_mult": 1.0, "manual_seed": 1, "scales": [1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], "arch": "mobilenet", "mean": [114.7748, 107.7354, 99.475], "std": [38.7568578, 37.88248729, 40.02898126]}

Source Code Availability

Hello !

Are you still planning on posting an implementation of this paper? It would be a tremendous help if that's the case! :D

UCF101 pretrained models

Hello,
Can you please provide your pretrained models on UCF101? It seems there are only those for Kinetics and Jester for now.
Thank you.

To follow the pandas 0.20 update, change the DataFrame.ix indexer to DataFrame.loc

Hi! Thank you for your code.
I run this.

python utils/ucf101_json.py annotation_dir_path

then caught this error.

'DataFrame' object has no attribute 'ix'

In pandas >= 0.20, the DataFrame.ix indexer is deprecated.
https://pandas.pydata.org/pandas-docs/version/0.20/whatsnew.html#deprecate-ix

So I suggest changing DataFrame.ix -> DataFrame.loc.
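
Assuming the JSON scripts index columns positionally (the variable names below are illustrative, not the actual ones in utils/ucf101_json.py), the change would look roughly like this:

import pandas as pd

data = pd.read_csv('trainlist01.txt', delimiter=' ', header=None)

# Deprecated since pandas 0.20:
# first_column = data.ix[:, 0]

# Use the label-based or purely positional indexers instead:
first_column = data.loc[:, 0]     # column label 0 (header=None yields integer labels)
# first_column = data.iloc[:, 0]  # column position 0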

Corresponding Parts

Slow Training on Kinetics700

Hello, I'm fine-tuning from the MobileNetV2 1.0x checkpoint to Kinetics-700 (I checked that the JSON format is as similar as possible to yours).

However, the model seems to not really learn much with pretty terrible accuracy, as you see here:

epoch	loss	prec1	prec5	lr
1	6.704854488372803	0.16046573221683502	0.6605217456817627	0.1
2	6.569755554199219	0.20524686574935913	1.0113072395324707	0.1
3	6.504281520843506	0.27615031599998474	1.2576035261154175	0.1
4	6.451953887939453	0.2798820734024048	1.3620928525924683	0.1
5	6.399944305419922	0.33959028124809265	1.6904878616333008	0.1
6	6.353832244873047	0.3806396424770355	2.1233720779418945	0.1
7	6.317000389099121	0.5261783003807068	2.22039794921875	0.1

What I expect usually is for there to be some significant meaningful change from epoch 1 to 10. However, I'm not really sure if my deduction is correct. For fine-tuning, I used the exact same code provided in the repo:

--dataset kinetics \
--n_classes 600 \
--n_finetune_classes 700 \
--ft_portion last_layer \
--model mobilenetv2 \
--groups 3 \
--lr_steps  20\
--width_mult 1 \
--train_crop random \
--learning_rate 0.1 \	
--sample_duration 16 \
--downsample 1 \
--batch_size 64 \
--n_threads 32 \
--checkpoint 5 \
--n_val_samples 1 \
--n_epochs 20 \

I'm trying to fine-tune just the last layer, but will try the full model as well; however, I doubt it will make much difference.

onnx export problem

I tried to convert the 3D CNN model to ONNX, but the inference results from ONNX and PyTorch are different. I tried the ResNet-18 and MobileNet backbones. Has anybody met the same problem?

Environment:
CUDA 10.0
CUDNN 7
TENSORRT 7
PYTORCH 1.2.0

CODE:

model, parameters = generate_model(opt)
checkpoint = torch.load('my_mobilenet_1.0x_RGB_10_checkpoint.pth')

model.load_state_dict(checkpoint['state_dict'])
print('load checkpoint')
if isinstance(model, torch.nn.DataParallel):
    model = model.module

x = torch.ones((1, 3, 10, 128, 128)).cuda()

y = model(x)
print(y)

torch.onnx.export(model, x, '3dcnn.onnx', verbose=True)
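
One common cause of ONNX/PyTorch output mismatch (a guess, not a confirmed fix for this issue) is exporting while BatchNorm and Dropout are still in training mode; putting the model into eval mode before both the reference forward pass and the export usually helps:

model.eval()  # freeze BatchNorm running statistics and disable Dropout

with torch.no_grad():
    y = model(x)  # PyTorch reference output
print(y)

torch.onnx.export(model, x, '3dcnn.onnx', verbose=True)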

Path of dataset

Can you describe the expected storage layout for the dataset?
such as:
data
├── classname1
│ ├── videoname1
│ │ ├── 0001.jpg
│ │ └── 0002.jpg
│ └── n_frames
├── classname2
│ ├── videoname2
│ │ ├── 0001.jpg
│ │ └── 0002.jpg
│ ├── videoname3
│ │ ├── 0001.jpg
│ │ └── 0002.jpg
│ └── n_frames

but this doesn't seem to be correct, and I don't know how to set it up.

Low accuracy rate

I used the pre-trained model to fine-tune on UCF-101 and evaluated with video_accuracy.py, but the video accuracy is only 60.2%. The accuracy in your paper is 76%. What did I miss? Here is my opts.json. Thanks!

{"root_path": "/3DCNN/3DCNN/",
"video_path": "/3DCNN/3DCNN/data/jpg/", "annotation_path": "/3DCNN/3DCNN/data/UCF101TrainTestSplits-RecognitionTask/ucfTrainTestlist/ucf101_01.json"
"result_path": "/3DCNN/3DCNN/results",
"store_name": "ucf101_shufflenet_1x_RGB_16", "modality": "RGB",
"dataset": "kinetics", "n_classes": 600, "n_finetune_classes": 101, "sample_size": 112, "sample_duration": 16, "downsample": 1, "initial_scale": 1.0, "n_scales": 5, "scale_step": 0.84089641525, "train_crop": "random", "learning_rate": 0.1, "lr_steps": [40, 55, 65, 70, 200, 250], "momentum": 0.9, "dampening": 0.9, "weight_decay": 0.001, "mean_dataset": "activitynet", "no_mean_norm": false, "std_norm": false, "nesterov": false, "optimizer": "sgd", "lr_patience": 10, "batch_size": 128, "n_epochs": 250, "begin_epoch": 1, "n_val_samples": 1, "resume_path": "", "pretrain_path": "kinetics_shufflenet_1.0x_G3_RGB_16_best.pth", "ft_portion": "last_layer", "no_train": false, "no_val": false, "test": false, "test_subset": "val", "scale_in_test": 1.0, "crop_position_in_test": "c", "no_softmax_in_test": false, "no_cuda": false, "n_threads": 16, "checkpoint": 1, "no_hflip": false, "norm_value": 1, "model": "shufflenet", "version": 1.1, "model_depth": 50, "resnet_shortcut": "B", "wide_resnet_k": 2, "resnext_cardinality": 32, "groups": 3, "width_mult": 1, "manual_seed": 1, "scales": [1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], "arch": "shufflenet", "mean": [114.7748, 107.7354, 99.475], "std": [38.7568578, 37.88248729, 40.02898126]}

Extremely low accuracy on UCF with pre-trained model

Hi there,
thanks for the work. We are trying to reproduce the experiments; however, both training and validation top-1 accuracy were close to 0. We followed the configuration in another issue #3, with batch_size adjusted:
python main.py --root_path ~/ --video_path ~/datasets/jester --annotation_path Efficient-3DCNNs/annotation_UCF101/ucf101_01.json --result_path Efficient-3DCNNs/results --pretrain_path Efficient-3DCNNs/results/kinetics_shufflenet_0.5x_G3_RGB_16_best.pth --dataset ucf101 --n_classes 600 --n_finetune_classes 101 --ft_portion last_layer --model shufflenet --groups 3 --width_mult 0.5 --train_crop random --learning_rate 0.1 --sample_duration 16 --batch_size 64 --n_threads 16 --checkpoint 1 --n_val_samples 1
Could anyone provide any thoughts? Thanks

Poor Result for Different Input Channel

Hi! Thank you so much for the awesome paper and source code. I was trying to apply resnext3d on my custom datasets.

I tested with RGB image input on the pretrained model and the output is quite nice (prec@1 87-88 with 2 classes). However, I also tested with different input channels by saving npy files of the optical flow (2 channels) and of the optical flow stacked with RGB (5 channels), all normalized to 0-255. I changed the input channels of ResNeXt to 2 and 5 respectively, and the performance is significantly worse than with 3-channel input (no pretraining), stopping at only about prec@1 60 after around 25-30 epochs. I wonder if further adjustments to the model architecture are needed. Right now I only changed the input channels of the first 3D conv to match the input data and nothing else. Is there anything I can change to make it better? Thank you a lot for your time!

Pretrained model loading

Thanks for the repo, it's very useful. I have a couple of questions hoping you can give some insight.

I would like to use your model for a transfer learning study, from action recognition to another video task. I'm comparing the transferability of the features learned by your model and those of the TorchVision ResNet (2+1)D. I'm using the features just before the fully connected layer.

Since ShuffleNetV2 is way smaller, it's normal to expect ResNet (2+1)D to have higher results on the target task. However, the gap between the two (63.04 vs. 70.88 Precision-Recall AUC) is larger than I expected. From my experiments with other models, I expect at most 3 points of difference. So, I have the following questions.

  1. Are the following lines correct to load your pretrained model?
# assuming pretrained model is in the current dir
path = 'kinetics_shufflenetv2_1.0x_RGB_16_best.pth'
cp = torch.load(path, map_location=lambda storage, loc: storage)
# remove parallel dataloader prefix
state_dict = {k[7:]: v for k, v in cp['state_dict'].items()}
model = ShuffleNetV2()
model.load_state_dict(state_dict)
  2. Is it enough to normalize input frames with the Kinetics mean/var from the mean module? Or is there another preprocessing step?

  3. Any thoughts on why this could be happening?

Thank you!
