cypw / pytorch-mfnet
License: MIT License
Hi, I cannot download the Kinetics pretrained models at https://goo.gl/QdE85T.
While executing python evaluate_video_ucf101_split1.py:
File "evaluate_video_ucf101_split1.py", line 107, in
net.load_checkpoint(epoch=args.load_epoch)
File "../train/model.py", line 62, in load_checkpoint
assert os.path.exists(load_path), "Failed to load: {} (file not exist)".format(load_path)
AssertionError: Failed to load: ./../exps/<your_tesk_name>_ep-0000.pth (file not exist)
Kindly help
I've noticed the higher memory usage of MFNet compared to that of ResNet in image processing.
Settings:
Framework: pytorch
resnet model: resnet18 from torchvision
mfnet model: modification of /network/mfnet_3d.py for 2d processing
input size: 128x3x224x224
with enabled gradients computation.
Results:
GPU memory consumption observed: 8GB for mfnet vs 3.4GB for resnet
The number of parameters matches the paper for each model, but the actual memory consumption of MFNet doesn't reflect the reduced FLOPs. Have you observed the same behaviour, or could this be caused by my 2D conversion of the model? I'm confident that the modifications I made follow the description of the 2D architecture in Table 2, and it shouldn't be tricky. Any idea?
(By the way I still appreciate if you could release an official 2D version of MFNet, even though that's not the main point of your work.)
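For reference, a minimal sketch (assuming a reasonably recent PyTorch; not from this repo) of how peak training memory can be compared between the two models under the settings above:

import torch
import torchvision

def peak_mem_mb(model, batch_size=128):
    # peak GPU memory (MB) for one forward + backward pass at 3x224x224
    model = model.cuda().train()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    model(x).sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

print(peak_mem_mb(torchvision.models.resnet18()))  # compare against your 2D MFNet

Note that training memory is usually dominated by activations rather than parameters, so a model with fewer FLOPs and parameters can still hold larger intermediate feature maps.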
I didn't find that you use 'softmax' after the last layer (after 'classifier' in 'mfnet_3d.py') or in 'model.fit' (in 'model.py'), yet you still use 'CrossEntropyLoss'. However, in 'evaluate_video_ucf101_split1.py' I found that 'softmax' was used. Did I overlook anything? If you really did not use 'softmax' in the training process, why is that? Thanks!
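For context, nn.CrossEntropyLoss already applies log-softmax internally, so an explicit softmax before the loss during training would be redundant (and numerically worse); softmax is only useful at evaluation time to turn logits into comparable probabilities before averaging clip scores. A quick check:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 101)              # raw classifier outputs
target = torch.randint(0, 101, (4,))

ce = F.cross_entropy(logits, target)                    # no softmax in front
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)  # what it does internally
assert torch.allclose(ce, nll)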
VideoIter:: >> frame [] is error & backup is inavailable. [./dataset/UCF101/raw/data/CricketBowling/v_CricketBowling_g22_c07.avi]'
2019-03-09 22:30:13: >> I/O error(None): None
2019-03-09 22:30:13: VideoIter:: ERROR!! (Force using another index:
3279)
VideoIter:: >> frame [] is error & backup is inavailable. [./dataset/UCF101/raw/data/TableTennisShot/v_TableTennisShot_g11_c05.avi]'
2019-03-09 22:30:13: VideoIter:: ERROR!! (Force using another index:
2801)
VideoIter:: >> frame [] is error & backup is inavailable. [./dataset/UCF101/raw/data/ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c05.avi]'
Why do I get these errors?
Hi Yunpeng, just read your paper and have a couple of quick questions:
Why do you implement batchnorm + relu before the convolution in the BN_AC_CONV3D class here?
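For readers unfamiliar with that ordering: BN -> ReLU -> Conv is the "pre-activation" design from the identity-mappings ResNet paper. A simplified sketch of such a unit (not the repo's exact class):

import torch.nn as nn

class BNACConv3d(nn.Module):
    # pre-activation unit: BatchNorm -> ReLU -> Conv3d
    def __init__(self, num_in, num_out, kernel=(1, 1, 1), pad=(0, 0, 0),
                 stride=(1, 1, 1), groups=1):
        super(BNACConv3d, self).__init__()
        self.bn = nn.BatchNorm3d(num_in)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv3d(num_in, num_out, kernel_size=kernel, padding=pad,
                              stride=stride, groups=groups, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))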
In your paper you say: we set the number of the first-layer output channels to be 4 times smaller than its input channels, ...
Hence, shouldn't this line be written as shown below?
# current version: num_ix = int(num_mid/4)
num_ix = int(num_in/4)
I believe self.conv_i1 and self.conv_i2 are the layers for the multiplexer. Or am I getting it wrong?
Lastly, a small suggestion: since you are using PyTorch v0.4, you need not use Variable anymore. Hence you can write this line as:
data = torch.randn(1,3,16,224,224)
Thank you.
Hi, great code!
I've noticed that GPU usage is a bit low (around 40%) and have been trying to optimize.
HLSTransform in particular seems to be very CPU-intensive.
Are you aware of any way to have it executed on the GPU instead of the CPU?
Do you think that could help?
Thanks
Hello,
do you have a real-time video recognition implementation of MFNet?
Thanks.
Hi.
I see in the paper that the accuracy on the Kinetics dataset is 72.8%,
as seen in this table.
But in the graph below it, the results seem to be reported on the training set rather than the validation set.
So I wanted to know whether I misunderstood something: is the aforementioned result on the training set or the validation set? And if it is training-set accuracy, what accuracy did you get on the validation set?
Thanks in advance.
2018-11-01 12:03:35 WARNING: >> Failed to load: ['module.conv1.bn.num_batches_tracked', 'module.conv2.B01.conv_i1.bn.num_batches_tracked', 'module.conv2.B01.conv_i2.bn.num_batches_tracked', 'module.conv2.B01.conv_m1.bn.num_batches_tracked', 'module.conv2.B01.conv_m2.bn.num_batches_tracked', 'module.conv2.B01.conv_w1.bn.num_batches_tracked', 'module.conv2.B02.conv_i1.bn.num_batches_tracked', 'module.conv2.B02.conv_i2.bn.num_batches_tracked', 'module.conv2.B02.conv_m1.bn.num_batches_tracked', 'module.conv2.B02.conv_m2.bn.num_batches_tracked', 'module.conv2.B03.conv_i1.bn.num_batches_tracked', 'module.conv2.B03.conv_i2.bn.num_batches_tracked', 'module.conv2.B03.conv_m1.bn.num_batches_tracked', 'module.conv2.B03.conv_m2.bn.num_batches_tracked', 'module.conv3.B01.conv_i1.bn.num_batches_tracked', 'module.conv3.B01.conv_i2.bn.num_batches_tracked', 'module.conv3.B01.conv_m1.bn.num_batches_tracked', 'module.conv3.B01.conv_m2.bn.num_batches_tracked', 'module.conv3.B01.conv_w1.bn.num_batches_tracked', 'module.conv3.B02.conv_i1.bn.num_batches_tracked', 'module.conv3.B02.conv_i2.bn.num_batches_tracked', 'module.conv3.B02.conv_m1.bn.num_batches_tracked', 'module.conv3.B02.conv_m2.bn.num_batches_tracked', 'module.conv3.B03.conv_i1.bn.num_batches_tracked', 'module.conv3.B03.conv_i2.bn.num_batches_tracked', 'module.conv3.B03.conv_m1.bn.num_batches_tracked', 'module.conv3.B03.conv_m2.bn.num_batches_tracked', 'module.conv3.B04.conv_i1.bn.num_batches_tracked', 'module.conv3.B04.conv_i2.bn.num_batches_tracked', 'module.conv3.B04.conv_m1.bn.num_batches_tracked', 'module.conv3.B04.conv_m2.bn.num_batches_tracked', 'module.conv4.B01.conv_i1.bn.num_batches_tracked', 'module.conv4.B01.conv_i2.bn.num_batches_tracked', 'module.conv4.B01.conv_m1.bn.num_batches_tracked', 'module.conv4.B01.conv_m2.bn.num_batches_tracked', 'module.conv4.B01.conv_w1.bn.num_batches_tracked', 'module.conv4.B02.conv_i1.bn.num_batches_tracked', 'module.conv4.B02.conv_i2.bn.num_batches_tracked', 'module.conv4.B02.conv_m1.bn.num_batches_tracked', 'module.conv4.B02.conv_m2.bn.num_batches_tracked', 'module.conv4.B03.conv_i1.bn.num_batches_tracked', 'module.conv4.B03.conv_i2.bn.num_batches_tracked', 'module.conv4.B03.conv_m1.bn.num_batches_tracked', 'module.conv4.B03.conv_m2.bn.num_batches_tracked', 'module.conv4.B04.conv_i1.bn.num_batches_tracked', 'module.conv4.B04.conv_i2.bn.num_batches_tracked', 'module.conv4.B04.conv_m1.bn.num_batches_tracked', 'module.conv4.B04.conv_m2.bn.num_batches_tracked', 'module.conv4.B05.conv_i1.bn.num_batches_tracked', 'module.conv4.B05.conv_i2.bn.num_batches_tracked', 'module.conv4.B05.conv_m1.bn.num_batches_tracked', 'module.conv4.B05.conv_m2.bn.num_batches_tracked', 'module.conv4.B06.conv_i1.bn.num_batches_tracked', 'module.conv4.B06.conv_i2.bn.num_batches_tracked', 'module.conv4.B06.conv_m1.bn.num_batches_tracked', 'module.conv4.B06.conv_m2.bn.num_batches_tracked', 'module.conv5.B01.conv_i1.bn.num_batches_tracked', 'module.conv5.B01.conv_i2.bn.num_batches_tracked', 'module.conv5.B01.conv_m1.bn.num_batches_tracked', 'module.conv5.B01.conv_m2.bn.num_batches_tracked', 'module.conv5.B01.conv_w1.bn.num_batches_tracked', 'module.conv5.B02.conv_i1.bn.num_batches_tracked', 'module.conv5.B02.conv_i2.bn.num_batches_tracked', 'module.conv5.B02.conv_m1.bn.num_batches_tracked', 'module.conv5.B02.conv_m2.bn.num_batches_tracked', 'module.conv5.B03.conv_i1.bn.num_batches_tracked', 'module.conv5.B03.conv_i2.bn.num_batches_tracked', 'module.conv5.B03.conv_m1.bn.num_batches_tracked', 
'module.conv5.B03.conv_m2.bn.num_batches_tracked', 'module.tail.bn.num_batches_tracked']
2018-11-01 12:03:35 INFO: Only model state resumed from: ././../exps/models/MFNet3D_UCF-101_Split-1_96.3.pth_ep-0000.pth'
2018-11-01 12:03:35 WARNING: >> Epoch information inconsistant: 30 vs 0
2018-11-01 12:03:35 WARNING: VideoIter:: >> `check_video' is off, `tolerant_corrupted_video' is automatically activated.
Should I ignore these warnings when loading the model?
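For context (an assumption about the cause): the num_batches_tracked buffers were only added to BatchNorm in PyTorch 0.4.1, so checkpoints saved with earlier versions simply lack those keys. A hedged sketch of tolerant loading, not the repo's own loader (the "state_dict" key is an assumption about the checkpoint layout):

import torch
from network.mfnet_3d import MFNET_3D

net = MFNET_3D(num_classes=101)
checkpoint = torch.load("MFNet3D_UCF-101_Split-1_96.3.pth", map_location="cpu")
net.load_state_dict(checkpoint["state_dict"], strict=False)  # skip missing/extra keys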
Hi, my friend!
Your code is well written. I'm a novice in action recognition.
I want to ask why 'for i_round in range(total_round):' is used in the test.py file.
Isn't testing once enough?
Although I don't understand why, I ran the test with this loop
and found that the final accuracy was affected by the number of rounds.
This makes me even more confused: for fixed weights, how can the final accuracy differ?
Is it influenced by 'duplication = 0.92 * duplication + 0.08 * avg_score[video_subpath_i][3]'?
I'm ashamed to say that I don't understand this line of code.
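For what it's worth, that update has the shape of an exponential moving average: each round's score enters with weight 0.08 while the running value decays by 0.92, so additional rounds keep shifting the accumulated scores, which would also explain why accuracy depends on the number of rounds. A tiny illustration (the exact semantics of 'duplication' in test.py are an assumption here):

ema = 1.0
for new_score in [3.0, 3.0, 3.0, 3.0]:
    ema = 0.92 * ema + 0.08 * new_score
    print(round(ema, 3))  # 1.16, 1.307, 1.443, 1.567 -> slowly approaches 3.0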
How do I solve this problem?
File "MFNet/train/metric.py", line122, in update
self.num_inst += loss.shape[0]
IndexError: tuple index out of range
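For context, a guess at the cause (not a confirmed fix): since PyTorch 0.4, reduced losses are 0-dimensional tensors, so loss.shape is the empty tuple () and loss.shape[0] raises exactly this IndexError. A guarded version of the update:

# in metric.py, update(): tolerate 0-dim (scalar) losses from PyTorch >= 0.4
if loss.dim() > 0:
    self.num_inst += loss.shape[0]
else:
    self.num_inst += 1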
I ran your pre-trained model and tested it on some videos from HMDB51. I want to get the output features for the videos I feed into it. Where can I get them (the resulting feature vector, or anything like that)?
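One generic PyTorch way to do this is a forward hook on the layer just before the classifier. A hedged sketch (the attribute name 'globalpool' and the feature size are assumptions about the model definition):

import torch

features = {}

def save_features(module, inputs, output):
    # flatten the pooled activations into one vector per clip
    features["clip"] = output.detach().view(output.size(0), -1)

# assumes `net` is the loaded MFNET_3D model; the layer name is an assumption
handle = net.globalpool.register_forward_hook(save_features)
clip_tensor = torch.randn(1, 3, 16, 224, 224)
with torch.no_grad():
    net(clip_tensor)
print(features["clip"].shape)  # e.g. (1, 768)
handle.remove()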
2019-03-15 15:54:10 INFO: VideoIter:: found 32 videos in `../dataset/HMDB51/raw/list_cvt/testlist01.txt'
2019-03-15 15:54:10 INFO: VideoIter:: iterator initialized (phase: 'test', num: 32)
2019-03-15 15:54:10 INFO: round #0/5
2019-03-15 15:54:24 INFO: 0.0%, 1.0 | Batch [0,0] Avg: loss-ce = 4.13500, top1 = 0.00000, top5 = 0.12500
2019-03-15 15:54:25 INFO: round #1/5
2019-03-15 15:54:35 INFO: 0.0%, 1.5 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.06250
2019-03-15 15:54:36 INFO: round #2/5
2019-03-15 15:54:46 INFO: 0.0%, 2.5 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.03125
2019-03-15 15:54:47 INFO: round #3/5
2019-03-15 15:54:57 INFO: 0.0%, 3.4 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.09375
2019-03-15 15:54:59 INFO: round #4/5
2019-03-15 15:55:07 INFO: 0.0%, 4.4 | Batch [0,0] Avg: loss-ce = 4.67265, top1 = 0.00000, top5 = 0.06250
2019-03-15 15:55:09 INFO: Evaluation Finished!
2019-03-15 15:55:09 INFO: Total time cost: 7.2 sec
2019-03-15 15:55:09 INFO: Speed: 22.3596 samples/sec
2019-03-15 15:55:09 INFO: Accuracy:
2019-03-15 15:55:09 INFO: [
[
[
"loss-ce",
4.672654986381531
]
],
[
[
"top1",
0.0
]
],
[
[
"top5",
0.0625
]
]
]
Which versions of Python and OpenCV are required?
Do you have any training tricks for UCF101? When I train MFNet on UCF101, it overfits easily. Is pretraining on a large dataset like Kinetics necessary, and what are your key tricks for improving accuracy?
Model: the MF-Net (3D) model(split1) provided on the google drive
Data: UCF101 split1 testlist01.txt
Code: python evaluate_video_ucf101_split1.py --task-name ./../exps/models/MFNet3D_UCF-101_Split-1_96.3.pth
I tested the model on that data, but got a lower top-1 accuracy:
[
[
"top1",
0.9333862014274386
]
],
[
[
"top5",
0.9949775310600053
]
The top-1 accuracy is 0.933, much lower than the average accuracy of 0.96 reported in the paper.
What crop method is used in the paper?
Is there something I missed that causes the lower accuracy?
Hi, where can I find the 2D model? I don't mean the weights, I mean the code.
I see only the mfnet_3d.py file...
Thanks.
Hi, I'm trying to understand your video level accuracy calculations.
Generally, and please correct me if I'm wrong: you take a video, sample N clips, accumulate the outputs, and then take the top-1, right? Let's say we have 2 classes and the results are:
0.3 , 0.7
1 , 0
0.4, 0.6
The predicted label will be 1?
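For concreteness, here is how the example above plays out under two common aggregation rules (which rule test.py actually uses is exactly what's being asked):

import numpy as np

clip_scores = np.array([[0.3, 0.7],
                        [1.0, 0.0],
                        [0.4, 0.6]])   # rows: clips, cols: class scores

summed = clip_scores.sum(axis=0)       # [1.7, 1.3]
print(summed.argmax())                 # 0 -> summing/averaging scores picks class 0

votes = clip_scores.argmax(axis=1)     # [1, 0, 1]
print(np.bincount(votes).argmax())     # 1 -> per-clip majority vote picks class 1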
And now, trying to understand your calculations, I couldn't figure out why you multiply one part by 0.92 and another part by 0.08 in the test file.
I'd be happy to get a short explanation.
Thanks!
Hi Yunpeng, I am new to video recognition tasks. I ran the code and have some questions about the whole procedure.
For training, do you randomly sample 16 frames from the whole video for classification? And might it be a different 16 frames for the same video each time?
When I tried to run train_hmdb51, there were many logs like 'frame[30] is error, use backup item XXX.avi'. What does this mean? Does it mean there are errors in my video data? (I downloaded it from the official website.)
It seems that train_hmdb51 does both training and evaluation after each epoch. So why do we need separate evaluation code like evaluate_video.py for testing?
Thanks a lot for your help!
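On the first question above: a typical random clip sampler (an assumption about this repo's exact behavior) picks a random start and then takes clip_length frames at a fixed frame_interval, so the same video can indeed yield different frames each epoch:

import random

def sample_clip(num_frames, clip_length=16, interval=2):
    # indices of a random 16-frame clip; assumes the video is long enough
    span = (clip_length - 1) * interval + 1
    start = random.randint(0, max(0, num_frames - span))
    return [start + i * interval for i in range(clip_length)]

print(sample_clip(300))  # a different window on each call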
Hi! How do I save the fine-tuned model as a .pth file?
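For reference, the generic PyTorch way is a sketch like this, where net and epoch are your own training variables (the repo's trainer also appears to save checkpoints itself, cf. load_checkpoint in train/model.py):

import torch

# save: write the weights (and optionally the epoch) to a .pth file
torch.save({"epoch": epoch, "state_dict": net.state_dict()}, "finetuned_ep-0030.pth")

# load them back later
checkpoint = torch.load("finetuned_ep-0030.pth", map_location="cpu")
net.load_state_dict(checkpoint["state_dict"])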
These are some of the main configs:
"batch_size": 8,
"clip_length": 16,
"dataset": "Kinetics",
"debug_mode": true,
"frame_interval": 2,
"model_prefix": "././../exps/models/MFNet3D_Kinetics-400_72.8.pth",
"network": "mfnet_3d",
"task_name": "./../exps/models/MFNet3D_Kinetics-400_72.8.pth"
I use 4 Tesla P40s and 48 CPU cores to test the model on the Kinetics val set (19,761 videos) with 10 rounds.
It takes about 27 hours (10 rounds).
Is this a normal speed?
Are there any tricks to speed up the dataloader?
Hi, as mentioned in the paper (and in most papers in the action recognition field), the result is averaged over the 3 splits. Because only the split-1 fine-tuning was shared, I'm trying to fine-tune the other splits myself, but I can't get the same results; the training is very "noisy". I'm using the pretrained Kinetics model that was uploaded. Is there anything else I need to pay attention to?
I'm using your environment, of course. (By the way, in the annotation files, I believe 'others' means the validation set?)
Thanks!
What an outstanding job! I think you could get even higher accuracy if you used the (4 corner crops + 1 center crop) x 2 flips transform at test time, as used in TSN.
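For reference, a sketch of that TSN-style oversampling: 4 corner crops plus 1 center crop, each also horizontally flipped, giving 10 views per clip (the input size here is an assumption):

import torch

def ten_crop(frames, size=224):
    # frames: (C, T, H, W) with H, W >= size; returns 10 cropped views
    _, _, h, w = frames.shape
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [frames[..., y:y + size, x:x + size] for y, x in offsets]
    crops += [c.flip(-1) for c in crops]   # add horizontal flips
    return crops

views = ten_crop(torch.randn(3, 16, 256, 340))
print(len(views), views[0].shape)  # 10 torch.Size([3, 16, 224, 224])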
I am new to video classification and am trying to reproduce MF-Net's results. However, I am stuck downloading the Kinetics-400 dataset; it would be a great help if anyone is willing to share it. Thanks.
@cypw
Hello!
Out-of-memory errors occur during training and testing on the UCF-101 data. I only have 32 GB of RAM. Is this a normal phenomenon? How much memory is needed to train the network properly?
Thank you very much!
In train_ucf101, the default pretrained model for fine-tuning on UCF101 is
'./network/pretrained/vY5_866M_Kinetics_v50_fm16-it123_ep-0019.pth'
Is that model the same as MFNet3D_Kinetics-400_72.8.pth?
Hi, Yunpeng, have you tried inserting your MF block into bigger networks, e.g. ResNet-152?
I think training a flow MFNet on Kinetics could probably further boost performance. Then we could also use MFNet as the backbone of a two-stream method for other tasks like action detection.
I counted the parameter number and FLOPs of MFNet. The parameter number is computed by the code
model = MFNET_3D(num_classes=101)
params = sum(p.numel() for p in model.parameters())
which outputs 7996368, matching the 8.0 M shown in the paper.
But the FLOPs I got differ from the result in the paper, so could you show me the code you used to compute the FLOPs, especially for the nn.Conv3d layers? I think I made a mistake computing the FLOPs of the nn.Conv3d layer.
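Not the authors' script, but a common convention for counting Conv3d FLOPs is shown below. Note that papers differ on whether one multiply-add counts as one or two FLOPs (which alone can cause a 2x discrepancy), and that grouped convolutions, as in the multiplexer, divide the per-output cost by the number of groups:

import torch.nn as nn

def conv3d_flops(conv, out_t, out_h, out_w, ops_per_mac=1):
    # FLOPs of an nn.Conv3d given its output size;
    # one multiply-add is counted as `ops_per_mac` operations
    kt, kh, kw = conv.kernel_size
    macs_per_output = (conv.in_channels // conv.groups) * kt * kh * kw
    num_outputs = conv.out_channels * out_t * out_h * out_w
    flops = ops_per_mac * macs_per_output * num_outputs
    if conv.bias is not None:
        flops += num_outputs   # one add per output element
    return flops

# example: a hypothetical 1x1x1 group conv on an 8x28x28 output grid
print(conv3d_flops(nn.Conv3d(96, 96, kernel_size=1, groups=16), 8, 28, 28))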
Traceback (most recent call last):
File "train_ucf101.py", line 146, in
train_model(sym_net=net, **kwargs)
File "/PyTorch-MFNet/train_model.py", line 117, in train_model
epoch_end=end_epoch,)
File "/PyTorch-MFNet/train/model.py", line 328, in fit
self.callback_kwargs['sample_elapse'] = sum_sample_elapse / sum_sample_inst
ZeroDivisionError: float division by zero
The error content is as above. When I use train_ucf101.py for model training, the model-evaluation loop in /train/model.py is never entered, so sum_sample_inst stays zero and the subsequent division raises an error. I checked the loop body and the dataset-loading code, and all variables have corresponding values. How can I modify the code to make the error disappear?
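A hedged guard that treats the symptom (the underlying issue is presumably that the evaluation loop processes zero batches, e.g. an empty eval dataloader):

# in train/model.py, fit(): skip the update when no samples were processed
if sum_sample_inst > 0:
    self.callback_kwargs['sample_elapse'] = sum_sample_elapse / sum_sample_inst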
Hi.
I see that the code requires initialisation weights from Kinetics (vY5_866M_Kinetics_v50_fm16-it123_ep-0019.pth).
Where can we find it?
Thanks
How many epochs do I need to train the Kinetics-400 model using train_kinetics.py to get 72.8 accuracy?
https://github.com/cypw/PyTorch-MFNet/blob/master/train_kinetics.py#L67 says 10000?
Isn't 10000 too large?
Hi,
I am a beginner in the ML field. So far I have just created my own models in Jupyter notebooks and run them, but using a baseline model is new to me and I don't know where to start.
I understand I need to compile the models first, but could someone lay out some beginner steps for getting started with this code?