cypw / pytorch-mfnet
License: MIT License
Hi, I cannot download the Kinetics pretrained models at https://goo.gl/QdE85T.
While executing python evaluate_video_ucf101_split1.py:
File "evaluate_video_ucf101_split1.py", line 107, in
net.load_checkpoint(epoch=args.load_epoch)
File "../train/model.py", line 62, in load_checkpoint
assert os.path.exists(load_path), "Failed to load: {} (file not exist)".format(load_path)
AssertionError: Failed to load: ./../exps/<your_tesk_name>_ep-0000.pth (file not exist)
Kindly help
I've noticed the higher memory usage of MFNet compared to that of ResNet in image processing.
Settings:
Framework: pytorch
resnet model: resnet18 from torchvision
mfnet model: modification of /network/mfnet_3d.py for 2d processing
input size: 128x3x224x224
with enabled gradients computation.
Results:
GPU memory consumption observed: 8GB for mfnet vs 3.4GB for resnet
The number of parameters matches the paper for each model, but the actual memory consumption of MFNet doesn't reflect the reduced FLOPs. Have you observed the same behaviour, or could this be caused by my 2D conversion of the model? I'm confident that the modifications I made follow the description of the 2D architecture in Table 2, and it shouldn't be tricky. Any idea?
(By the way I still appreciate if you could release an official 2D version of MFNet, even though that's not the main point of your work.)
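For reference, a minimal sketch (assuming a reasonably recent PyTorch; not from this repo) of how peak training memory can be compared between the two models under the settings above:

import torch
import torchvision

def peak_mem_mb(model, batch_size=128):
    # peak GPU memory (MB) for one forward + backward pass at 3x224x224
    model = model.cuda().train()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    model(x).sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

print(peak_mem_mb(torchvision.models.resnet18()))  # compare against your 2D MFNet

Note that training memory is usually dominated by activations rather than parameters, so a model with fewer FLOPs and parameters can still hold larger intermediate feature maps.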
I didn't find that you use 'softmax' after the last layer (after 'classifier' in 'mfnet_3d.py') or in 'model.fit' (in 'model.py'), yet you still use 'CrossEntropyLoss'. However, in 'evaluate_video_ucf101_split1.py' I found that 'softmax' was used. Did I overlook anything? If you really did not use 'softmax' in the training process, why is that? Thanks!
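For context, nn.CrossEntropyLoss already applies log-softmax internally, so an explicit softmax before the loss during training would be redundant (and numerically worse); softmax is only useful at evaluation time to turn logits into comparable probabilities before averaging clip scores. A quick check:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 101)              # raw classifier outputs
target = torch.randint(0, 101, (4,))

ce = F.cross_entropy(logits, target)                    # no softmax in front
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)  # what it does internally
assert torch.allclose(ce, nll)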
VideoIter:: >> frame [] is error & backup is inavailable. [./dataset/UCF101/raw/data/CricketBowling/v_CricketBowling_g22_c07.avi]'
2019-03-09 22:30:13: >> I/O error(None): None
2019-03-09 22:30:13: VideoIter:: ERROR!! (Force using another index:
3279)
VideoIter:: >> frame [] is error & backup is inavailable. [./dataset/UCF101/raw/data/TableTennisShot/v_TableTennisShot_g11_c05.avi]'
2019-03-09 22:30:13: VideoIter:: ERROR!! (Force using another index:
2801)
VideoIter:: >> frame [] is error & backup is inavailable. [./dataset/UCF101/raw/data/ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c05.avi]'
Why do I get these errors?
Hi Yunpeng, just read your paper and have a couple of quick questions:
Why do you implement batchnorm + relu before the convolution in the BN_AC_CONV3D class here?
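For readers unfamiliar with that ordering: BN -> ReLU -> Conv is the "pre-activation" design from the identity-mappings ResNet paper. A simplified sketch of such a unit (not the repo's exact class):

import torch.nn as nn

class BNACConv3d(nn.Module):
    # pre-activation unit: BatchNorm -> ReLU -> Conv3d
    def __init__(self, num_in, num_out, kernel=(1, 1, 1), pad=(0, 0, 0),
                 stride=(1, 1, 1), groups=1):
        super(BNACConv3d, self).__init__()
        self.bn = nn.BatchNorm3d(num_in)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv3d(num_in, num_out, kernel_size=kernel, padding=pad,
                              stride=stride, groups=groups, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))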
In your paper you say: we set the number of the first-layer output channels to be 4 times smaller than its input channels, ...
Hence, shouldn't this line be written as shown below?
# current version: num_ix = int(num_mid/4)
num_ix = int(num_in/4)
I believe self.conv_i1 and self.conv_i2 are the layers for the multiplexer. Or am I getting it wrong?
Lastly, a small suggestion: since you are using PyTorch v0.4, you need not use Variable anymore. Hence you can write this line as:
data = torch.randn(1,3,16,224,224)
Thank you.
Hi, great code!
I've noticed that GPU usage is a bit low (around 40%) and have been trying to optimize.
HLSTransform in particular seems to be very CPU-intensive.
Are you aware of any way to have it executed on the GPU instead of the CPU?
Do you think that could help?
Thanks
Hello,
do you have a real-time video recognition implementation of MFNet?
Thanks.
Hi.
I see in the paper that the accuracy on the Kinetics dataset is 72.8%,
as seen in this table.
But in the graph below it, the results seem to be reported on the training set rather than the validation set.
So I wanted to know whether I misunderstood something: is the aforementioned result on the training set or the validation set? And if it is training-set accuracy, what accuracy did you get on the validation set?
Thanks in advance.
2018-11-01 12:03:35 WARNING: >> Failed to load: ['module.conv1.bn.num_batches_tracked', 'module.conv2.B01.conv_i1.bn.num_batches_tracked', 'module.conv2.B01.conv_i2.bn.num_batches_tracked', 'module.conv2.B01.conv_m1.bn.num_batches_tracked', 'module.conv2.B01.conv_m2.bn.num_batches_tracked', 'module.conv2.B01.conv_w1.bn.num_batches_tracked', 'module.conv2.B02.conv_i1.bn.num_batches_tracked', 'module.conv2.B02.conv_i2.bn.num_batches_tracked', 'module.conv2.B02.conv_m1.bn.num_batches_tracked', 'module.conv2.B02.conv_m2.bn.num_batches_tracked', 'module.conv2.B03.conv_i1.bn.num_batches_tracked', 'module.conv2.B03.conv_i2.bn.num_batches_tracked', 'module.conv2.B03.conv_m1.bn.num_batches_tracked', 'module.conv2.B03.conv_m2.bn.num_batches_tracked', 'module.conv3.B01.conv_i1.bn.num_batches_tracked', 'module.conv3.B01.conv_i2.bn.num_batches_tracked', 'module.conv3.B01.conv_m1.bn.num_batches_tracked', 'module.conv3.B01.conv_m2.bn.num_batches_tracked', 'module.conv3.B01.conv_w1.bn.num_batches_tracked', 'module.conv3.B02.conv_i1.bn.num_batches_tracked', 'module.conv3.B02.conv_i2.bn.num_batches_tracked', 'module.conv3.B02.conv_m1.bn.num_batches_tracked', 'module.conv3.B02.conv_m2.bn.num_batches_tracked', 'module.conv3.B03.conv_i1.bn.num_batches_tracked', 'module.conv3.B03.conv_i2.bn.num_batches_tracked', 'module.conv3.B03.conv_m1.bn.num_batches_tracked', 'module.conv3.B03.conv_m2.bn.num_batches_tracked', 'module.conv3.B04.conv_i1.bn.num_batches_tracked', 'module.conv3.B04.conv_i2.bn.num_batches_tracked', 'module.conv3.B04.conv_m1.bn.num_batches_tracked', 'module.conv3.B04.conv_m2.bn.num_batches_tracked', 'module.conv4.B01.conv_i1.bn.num_batches_tracked', 'module.conv4.B01.conv_i2.bn.num_batches_tracked', 'module.conv4.B01.conv_m1.bn.num_batches_tracked', 'module.conv4.B01.conv_m2.bn.num_batches_tracked', 'module.conv4.B01.conv_w1.bn.num_batches_tracked', 'module.conv4.B02.conv_i1.bn.num_batches_tracked', 'module.conv4.B02.conv_i2.bn.num_batches_tracked', 'module.conv4.B02.conv_m1.bn.num_batches_tracked', 'module.conv4.B02.conv_m2.bn.num_batches_tracked', 'module.conv4.B03.conv_i1.bn.num_batches_tracked', 'module.conv4.B03.conv_i2.bn.num_batches_tracked', 'module.conv4.B03.conv_m1.bn.num_batches_tracked', 'module.conv4.B03.conv_m2.bn.num_batches_tracked', 'module.conv4.B04.conv_i1.bn.num_batches_tracked', 'module.conv4.B04.conv_i2.bn.num_batches_tracked', 'module.conv4.B04.conv_m1.bn.num_batches_tracked', 'module.conv4.B04.conv_m2.bn.num_batches_tracked', 'module.conv4.B05.conv_i1.bn.num_batches_tracked', 'module.conv4.B05.conv_i2.bn.num_batches_tracked', 'module.conv4.B05.conv_m1.bn.num_batches_tracked', 'module.conv4.B05.conv_m2.bn.num_batches_tracked', 'module.conv4.B06.conv_i1.bn.num_batches_tracked', 'module.conv4.B06.conv_i2.bn.num_batches_tracked', 'module.conv4.B06.conv_m1.bn.num_batches_tracked', 'module.conv4.B06.conv_m2.bn.num_batches_tracked', 'module.conv5.B01.conv_i1.bn.num_batches_tracked', 'module.conv5.B01.conv_i2.bn.num_batches_tracked', 'module.conv5.B01.conv_m1.bn.num_batches_tracked', 'module.conv5.B01.conv_m2.bn.num_batches_tracked', 'module.conv5.B01.conv_w1.bn.num_batches_tracked', 'module.conv5.B02.conv_i1.bn.num_batches_tracked', 'module.conv5.B02.conv_i2.bn.num_batches_tracked', 'module.conv5.B02.conv_m1.bn.num_batches_tracked', 'module.conv5.B02.conv_m2.bn.num_batches_tracked', 'module.conv5.B03.conv_i1.bn.num_batches_tracked', 'module.conv5.B03.conv_i2.bn.num_batches_tracked', 'module.conv5.B03.conv_m1.bn.num_batches_tracked', 
'module.conv5.B03.conv_m2.bn.num_batches_tracked', 'module.tail.bn.num_batches_tracked']
2018-11-01 12:03:35 INFO: Only model state resumed from: ././../exps/models/MFNet3D_UCF-101_Split-1_96.3.pth_ep-0000.pth'
2018-11-01 12:03:35 WARNING: >> Epoch information inconsistant: 30 vs 0
2018-11-01 12:03:35 WARNING: VideoIter:: >> `check_video' is off, `tolerant_corrupted_video' is automatically activated.
Should I ignore these warnings when loading the model?
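For context (an assumption about the cause): the num_batches_tracked buffers were only added to BatchNorm in PyTorch 0.4.1, so checkpoints saved with earlier versions simply lack those keys. A hedged sketch of tolerant loading, not the repo's own loader (the "state_dict" key is an assumption about the checkpoint layout):

import torch
from network.mfnet_3d import MFNET_3D

net = MFNET_3D(num_classes=101)
checkpoint = torch.load("MFNet3D_UCF-101_Split-1_96.3.pth", map_location="cpu")
net.load_state_dict(checkpoint["state_dict"], strict=False)  # skip missing/extra keys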
Hi, my friend!
Your code is well written. I'm a novice in action recognition.
I want to ask why 'for i_round in range(total_round):' is used in the test.py file.
Isn't testing once enough?
Although I don't understand why, I ran the test with this loop
and found that the final accuracy was affected by the number of rounds.
This makes me even more confused: for fixed weights, how can the final accuracy differ?
Is it influenced by 'duplication = 0.92 * duplication + 0.08 * avg_score[video_subpath_i][3]'?
I'm ashamed to say that I don't understand this line of code.
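For what it's worth, that update has the shape of an exponential moving average: each round's score enters with weight 0.08 while the running value decays by 0.92, so additional rounds keep shifting the accumulated scores, which would also explain why accuracy depends on the number of rounds. A tiny illustration (the exact semantics of 'duplication' in test.py are an assumption here):

ema = 1.0
for new_score in [3.0, 3.0, 3.0, 3.0]:
    ema = 0.92 * ema + 0.08 * new_score
    print(round(ema, 3))  # 1.16, 1.307, 1.443, 1.567 -> slowly approaches 3.0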
How do I solve this problem?
File "MFNet/train/metric.py", line122, in update
self.num_inst += loss.shape[0]
IndexError: tuple index out of range
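For context, a guess at the cause (not a confirmed fix): since PyTorch 0.4, reduced losses are 0-dimensional tensors, so loss.shape is the empty tuple () and loss.shape[0] raises exactly this IndexError. A guarded version of the update:

# in metric.py, update(): tolerate 0-dim (scalar) losses from PyTorch >= 0.4
if loss.dim() > 0:
    self.num_inst += loss.shape[0]
else:
    self.num_inst += 1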
I ran your pre-trained model and tested it on some videos from HMDB51. I want to get the output features for the videos I feed into it. Where can I get them (the resulting feature vector, or anything like that)?
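One generic PyTorch way to do this is a forward hook on the layer just before the classifier. A hedged sketch (the attribute name 'globalpool' and the feature size are assumptions about the model definition):

import torch

features = {}

def save_features(module, inputs, output):
    # flatten the pooled activations into one vector per clip
    features["clip"] = output.detach().view(output.size(0), -1)

# assumes `net` is the loaded MFNET_3D model; the layer name is an assumption
handle = net.globalpool.register_forward_hook(save_features)
clip_tensor = torch.randn(1, 3, 16, 224, 224)
with torch.no_grad():
    net(clip_tensor)
print(features["clip"].shape)  # e.g. (1, 768)
handle.remove()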
2019-03-15 15:54:10 INFO: VideoIter:: found 32 videos in `../dataset/HMDB51/raw/list_cvt/testlist01.txt'
2019-03-15 15:54:10 INFO: VideoIter:: iterator initialized (phase: 'test', num: 32)
2019-03-15 15:54:10 INFO: round #0/5
2019-03-15 15:54:24 INFO: 0.0%, 1.0 | Batch [0,0] Avg: loss-ce = 4.13500, top1 = 0.00000, top5 = 0.12500
2019-03-15 15:54:25 INFO: round #1/5
2019-03-15 15:54:35 INFO: 0.0%, 1.5 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.06250
2019-03-15 15:54:36 INFO: round #2/5
2019-03-15 15:54:46 INFO: 0.0%, 2.5 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.03125
2019-03-15 15:54:47 INFO: round #3/5
2019-03-15 15:54:57 INFO: 0.0%, 3.4 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.09375
2019-03-15 15:54:59 INFO: round #4/5
2019-03-15 15:55:07 INFO: 0.0%, 4.4 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.06250
2019-03-15 15:55:09 INFO: Evaluation Finished!
2019-03-15 15:55:09 INFO: Total time cost: 7.2 sec
2019-03-15 15:55:09 INFO: Speed: 22.3596 samples/sec
2019-03-15 15:55:09 INFO: Accuracy:
2019-03-15 15:55:09 INFO: [
[
[
"loss-ce",
4.672654986381531
]
],
[
[
"top1",
0.0
]
],
[
[
"top5",
0.0625
]
]
]
Which versions of Python and OpenCV are required?
Do you have any training tricks for UCF101? When I train MFNet on UCF101, it overfits easily. Is pretraining on a large dataset like Kinetics necessary, and what are your key tricks for improving accuracy?
Model: the MF-Net (3D) model(split1) provided on the google drive
Data: UCF101 split1 testlist01.txt
Code: python evaluate_video_ucf101_split1.py --task-name ./../exps/models/MFNet3D_UCF-101_Split-1_96.3.pth
I tested the model on that data, but got a lower top-1 accuracy:
[
[
"top1",
0.9333862014274386
]
],
[
[
"top5",
0.9949775310600053
]
The top-1 accuracy is 0.933, much lower than the average accuracy of 0.96 reported in the paper.
What crop method is used in the paper?
Is there something I missed that causes the lower accuracy?
Hi, where can I find the 2D model? I don't mean the weights, I mean the code.
I see only the mfnet_3d.py file...
Thanks.
Hi, I'm trying to understand your video level accuracy calculations.
Generally, and please correct me if I'm wrong: you take a video, sample N clips, accumulate the outputs, and then take the top-1, right? Let's say we have 2 classes and the results are:
0.3 , 0.7
1 , 0
0.4, 0.6
The predicted label will be 1?
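For concreteness, here is how the example above plays out under two common aggregation rules (which rule test.py actually uses is exactly what's being asked):

import numpy as np

clip_scores = np.array([[0.3, 0.7],
                        [1.0, 0.0],
                        [0.4, 0.6]])   # rows: clips, cols: class scores

summed = clip_scores.sum(axis=0)       # [1.7, 1.3]
print(summed.argmax())                 # 0 -> summing/averaging scores picks class 0

votes = clip_scores.argmax(axis=1)     # [1, 0, 1]
print(np.bincount(votes).argmax())     # 1 -> per-clip majority vote picks class 1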
And now, trying to understand your calculations, I couldn't figure out why you multiply one part by 0.92 and another part by 0.08 in the test file.
I'd be happy to get a short explanation.
Thanks!
Hi Yunpeng, I am new to video recognition tasks. I ran the code and have some questions about the whole procedure.
For training, do you randomly sample 16 frames from the whole video for classification? And might it be a different 16 frames for the same video each time?
When I tried to run train_hmdb51, there were many logs like 'frame[30] is error, use backup item XXX.avi'. What does this mean? Does it mean there are errors in my video data? (I downloaded it from the official website.)
It seems that train_hmdb51 does both training and evaluation after each epoch. So why do we need separate evaluation code like evaluate_video.py for testing?
Thanks a lot for your help!
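On the first question above: a typical random clip sampler (an assumption about this repo's exact behavior) picks a random start and then takes clip_length frames at a fixed frame_interval, so the same video can indeed yield different frames each epoch:

import random

def sample_clip(num_frames, clip_length=16, interval=2):
    # indices of a random 16-frame clip; assumes the video is long enough
    span = (clip_length - 1) * interval + 1
    start = random.randint(0, max(0, num_frames - span))
    return [start + i * interval for i in range(clip_length)]

print(sample_clip(300))  # a different window on each call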
Hi! How do I save the fine-tuned model as a .pth file?
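For reference, the generic PyTorch way is a sketch like this, where net and epoch are your own training variables (the repo's trainer also appears to save checkpoints itself, cf. load_checkpoint in train/model.py):

import torch

# save: write the weights (and optionally the epoch) to a .pth file
torch.save({"epoch": epoch, "state_dict": net.state_dict()}, "finetuned_ep-0030.pth")

# load them back later
checkpoint = torch.load("finetuned_ep-0030.pth", map_location="cpu")
net.load_state_dict(checkpoint["state_dict"])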
These are some of the main configs:
"batch_size": 8,
"clip_length": 16,
"dataset": "Kinetics",
"debug_mode": true,
"frame_interval": 2,
"model_prefix": "././../exps/models/MFNet3D_Kinetics-400_72.8.pth",
"network": "mfnet_3d",
"task_name": "./../exps/models/MFNet3D_Kinetics-400_72.8.pth"
I use 4 Tesla P40s and 48 CPU cores to test the model on the Kinetics val set (19,761 videos) with 10 rounds.
It takes about 27 hours (10 rounds).
Is this a normal speed?
Are there any tricks to speed up the dataloader?
Hi, as mentioned in the paper (and in most papers in the action recognition field), the result is averaged over the 3 splits. Because only the split-1 fine-tuning was shared, I'm trying to fine-tune the other splits myself, but I can't get the same results; the training is very "noisy". I'm using the pretrained Kinetics model that was uploaded. Is there anything else I need to pay attention to?
I'm using your environment, of course. (By the way, in the annotation files, I believe 'others' means the validation set?)
Thanks!
What an outstanding job! I think you could get even higher accuracy if you used the (4 corner crops + 1 center crop) x 2 flips transform at test time, as used in TSN.
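For reference, a sketch of that TSN-style oversampling: 4 corner crops plus 1 center crop, each also horizontally flipped, giving 10 views per clip (the input size here is an assumption):

import torch

def ten_crop(frames, size=224):
    # frames: (C, T, H, W) with H, W >= size; returns 10 cropped views
    _, _, h, w = frames.shape
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [frames[..., y:y + size, x:x + size] for y, x in offsets]
    crops += [c.flip(-1) for c in crops]   # add horizontal flips
    return crops

views = ten_crop(torch.randn(3, 16, 256, 340))
print(len(views), views[0].shape)  # 10 torch.Size([3, 16, 224, 224])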
I am new to video classification and am trying to reproduce MF-Net's results. However, I am stuck downloading the Kinetics-400 dataset; it would be a great help if anyone is willing to share it. Thanks.
@cypw
Hello!
Out-of-memory errors occur during training and testing on the UCF-101 data. I only have 32 GB of RAM. Is this a normal phenomenon? How much memory is needed to train the network properly?
Thank you very much!
In train_ucf101, the default pretrained model for fine-tuning on UCF101 is
'./network/pretrained/vY5_866M_Kinetics_v50_fm16-it123_ep-0019.pth'
Is that model the same as MFNet3D_Kinetics-400_72.8.pth?
Hi, Yunpeng, have you tried inserting your MF block into bigger networks, e.g. ResNet-152?
I think training a flow MFNet on Kinetics could probably further boost performance. Then we could also use MFNet as the backbone of a two-stream method for other tasks like action detection.
I counted the parameter number and FLOPs of MFNet. The parameter number is computed by the code
model = MFNET_3D(num_classes=101)
params = sum(p.numel() for p in model.parameters())
which outputs 7996368, matching the 8.0 M shown in the paper.
But the FLOPs I got differ from the result in the paper, so could you show me the code you used to compute the FLOPs, especially for the nn.Conv3d layers? I think I made a mistake computing the FLOPs of the nn.Conv3d layer.
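Not the authors' script, but a common convention for counting Conv3d FLOPs is shown below. Note that papers differ on whether one multiply-add counts as one or two FLOPs (which alone can cause a 2x discrepancy), and that grouped convolutions, as in the multiplexer, divide the per-output cost by the number of groups:

import torch.nn as nn

def conv3d_flops(conv, out_t, out_h, out_w, ops_per_mac=1):
    # FLOPs of an nn.Conv3d given its output size;
    # one multiply-add is counted as `ops_per_mac` operations
    kt, kh, kw = conv.kernel_size
    macs_per_output = (conv.in_channels // conv.groups) * kt * kh * kw
    num_outputs = conv.out_channels * out_t * out_h * out_w
    flops = ops_per_mac * macs_per_output * num_outputs
    if conv.bias is not None:
        flops += num_outputs   # one add per output element
    return flops

# example: a hypothetical 1x1x1 group conv on an 8x28x28 output grid
print(conv3d_flops(nn.Conv3d(96, 96, kernel_size=1, groups=16), 8, 28, 28))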
Traceback (most recent call last):
File "train_ucf101.py", line 146, in
train_model(sym_net=net, **kwargs)
File "/PyTorch-MFNet/train_model.py", line 117, in train_model
epoch_end=end_epoch,)
File "/PyTorch-MFNet/train/model.py", line 328, in fit
self.callback_kwargs['sample_elapse'] = sum_sample_elapse / sum_sample_inst
ZeroDivisionError: float division by zero
The error content is as above. When I use train_ucf101.py for model training, the model-evaluation loop in /train/model.py is never entered, so sum_sample_inst stays zero and the subsequent division raises an error. I checked the loop body and the dataset-loading code, and all variables have corresponding values. How can I modify the code to make the error disappear?
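A hedged guard that treats the symptom (the underlying issue is presumably that the evaluation loop processes zero batches, e.g. an empty eval dataloader):

# in train/model.py, fit(): skip the update when no samples were processed
if sum_sample_inst > 0:
    self.callback_kwargs['sample_elapse'] = sum_sample_elapse / sum_sample_inst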
Hi.
I see that the code requires initialisation weights from Kinetics (vY5_866M_Kinetics_v50_fm16-it123_ep-0019.pth).
Where can we find it?
Thanks
How many epochs do I need to train the Kinetics-400 model using train_kinetics.py to get 72.8 accuracy?
https://github.com/cypw/PyTorch-MFNet/blob/master/train_kinetics.py#L67 says 10000?
Isn't 10000 too large?
Hi,
I am a beginner in the ML field. So far I have just created my own models in Jupyter notebooks and run them, but using a baseline model is new to me and I don't know where to start.
I understand I need to compile the models first, but could someone lay out some beginner steps for getting started with this code?