DCFNet: Discriminant Correlation Filters Network for Visual Tracking
License: MIT License
In the training stage, DCFNet uses the target and the search image as input, with a Gaussian label centered on the map (then moved to the top-left). Would it be better to generate the Gaussian label according to the actual position of the target in the search image? That is, if the target is not centered, generate a correspondingly shifted label.
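For what it's worth, the shifted label described above can be built by circularly shifting a centered Gaussian, which keeps the circular-shift assumption of correlation filters intact. A minimal NumPy sketch (the function name and parameters are mine, not from the repo):

```python
import numpy as np

def gaussian_label(sz, sigma, center):
    """Gaussian response label for an (h, w) map, peaked at `center` (row, col).

    Built by rolling a centered Gaussian; illustrative only, not DCFNet code.
    """
    h, w = sz
    rs, cs = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2,
                         indexing='ij')
    g = np.exp(-0.5 * (rs ** 2 + cs ** 2) / sigma ** 2)
    # circularly shift so the peak sits at `center` instead of the middle
    return np.roll(g, (center[0] - h // 2, center[1] - w // 2), axis=(0, 1))
```

The same roll with a `(0, 0)` target reproduces the "move to the top-left" label the current code uses.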
Hi, there:
You did a good job, and thanks for sharing the code!
When I tried to replicate the work in the preprint, I encountered two issues:
When loading UAV123, the variable (or function?) named configSeqs
is not defined when it is called at this line.
When loading NUS_PRO, the file named seq_list_with_gt.csv
does not exist in this repo, as referenced at this line.
These two issues block us from following your paper, i.e., training DCFNet on the three datasets.
So, could you please check them out and fix them?
Thanks.
Hello, when I run train_DCFNet.m, the command window displays content like this:
train: epoch 08: 226/572: 84.8 (84.8) HZ objective: 44.632
train: epoch 08: 227/572: 84.8 (84.8) HZ objective: 44.628
train: epoch 08: 228/572: 84.8 (84.8) HZ objective: 44.549
train: epoch 08: 229/572: 84.8 (84.8) HZ objective: 44.570
......
The objective values stay between 44 and 46, with no obvious increase or decrease. Is this normal?
I use a subset of VID: the first 100 sequences in the first folder.
Thanks for your attention.
Excuse me, I want to ask how to implement a pooling layer or other functions we define ourselves? @foolwood
Hi,
I noticed that during training, the CNN output (before the DCF layer) is not multiplied by a cosine (Hann) window, in contrast to the tracking pipeline, which does multiply the CNN output by it.
Why?
Thank you~~~
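For reference, the cosine-window multiplication done in the tracking pipeline amounts to the following (an illustrative NumPy sketch, not the repo's code):

```python
import numpy as np

def apply_hann(feat):
    """feat: (h, w, c) feature map; returns the map tapered by a 2-D Hann window.

    The window suppresses boundary responses, matching the circular-shift
    assumption of the correlation filter. Illustrative only.
    """
    h, w = feat.shape[:2]
    win = np.outer(np.hanning(h), np.hanning(w))  # separable 2-D Hann window
    return feat * win[:, :, None]                 # broadcast over channels
```

During training the crops are already centered on the target, which is presumably why the window can be omitted there; only the authors can confirm the reasoning.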
Hi,
It seems that your work is closely related to CFNet, yet your performance is much better: 0.624 vs. 0.568 on OTB100. Can you elaborate on what makes such a big difference, given that CFNet also uses VID as the training set and an exponential-decay learning-rate schedule?
Thanks
In your tracking process, the output feature size is 125x125x32 before the DCF layer.
However, in your training code, networkType is set to 12, and the output size after the two convolutional layers is smaller than the input image size by 4 (feature_sz = input_size - [4,4]).
I wonder why the saved network (.mat file) can generate a different feature size after training?
Looking forward to your early reply, and thank you very much.
Hi,
Thank you for the great job.
I have a few questions:
Thank you
Hello author, I'd like to ask which model trained under your training directory should be used for testing. I got model/DCFNet-net-12-125-2.mat, plus model_mulit, plus data/DCFNet-net-12-125-2.0/net-epoch-1 through net-epoch-50. I think it is the one under model/DCFNet, but I am not sure. Thanks.
Hello, does the training use VID's latest 86 GB training set? Why do I get out-of-memory errors when running train_DCFNet.m?
What is the function of the LRN layer? I notice that some Siamese trackers use a BatchNorm layer instead.
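For context, a cross-channel LRN of the AlexNet style divides each activation by an energy term over neighboring channels, so unusually strong channels are damped. A rough NumPy sketch with illustrative parameter values (not the ones DCFNet uses):

```python
import numpy as np

def lrn(x, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel local response normalization for x of shape (h, w, c).

    Each channel i is divided by (k + alpha * sum of squares over a window of
    n neighboring channels) ** beta. Parameter values are illustrative.
    """
    h, w, c = x.shape
    sq = x ** 2
    out = np.empty_like(x)
    for i in range(c):
        lo, hi = max(0, i - n // 2), min(c, i + n // 2 + 1)
        out[..., i] = x[..., i] / (k + alpha * sq[..., lo:hi].sum(-1)) ** beta
    return out
```

Unlike BatchNorm, this has no learned parameters and no batch statistics, which is one reason later Siamese trackers swapped it out.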
Hi,
I noticed that the current train_DCFNet.m
script trains networkType = 12, while the released model (DCFNet-net-7-125-2.mat) uses networkType = 7. So my question is: how do I retrain this model?
For now, the meta data of the released model is:
train_DCFNet.m
, then am I good to go? Thanks
Hello, I'd like to ask how to test on VOT. Is there an evaluation script? Thanks. This is great work, thank you very much.
Thanks for your work and code.
When I use VID2015 as training data, it generates 464,873 training pairs; even with the batch size set to 32, that is still more than 10,000 iterations per epoch (464,873 / 32 ≈ 14,527). How did you handle this when training the network?
Thanks~
Why is loading so slow when I want to run the demo?
Thanks for the good work as usual~
Take the type-7 and type-12 networks for example. I find that the padding of the conv layers is all 0 during training (input size: 125, output size: 121), but during tracking the padding is set to 1 (input size: 125, output size: 125) while using the same conv parameters. Can you explain why you do this? Is it a theoretical choice (better in theory) or just an experimental result (better performance in experiments)?
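For reference, standard convolution output-size arithmetic reproduces both numbers, assuming 3x3 kernels with stride 1 (a sketch, not code from the repo):

```python
# Standard convolution output-size formula (not code from the repo):
#   out = (in + 2*pad - kernel) // stride + 1
def conv_out(size, kernel=3, pad=0, stride=1):
    return (size + 2 * pad - kernel) // stride + 1

size = 125
for _ in range(2):        # two 3x3 conv layers with pad 0, as in training
    size = conv_out(size, pad=0)
# size is now 121, matching the 121x121 training feature map

size2 = 125
for _ in range(2):        # the same convs with pad 1, as in tracking
    size2 = conv_out(size2, pad=1)
# size2 is now 125, matching the 125x125 tracking feature map
```

Since the conv weights are the same either way, only the border pixels of the feature map differ between the two settings.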
I use this network for training:
elseif networkType == 31
%% target
conv1 = dagnn.Conv('size', [5 5 3 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1', conv1, {'target'}, {'conv1'}, {'conv1f', 'conv1b'}) ;
net.addLayer('relu1', dagnn.ReLU(), {'conv1'}, {'conv1r'});
conv2 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv2', conv2, {'conv1r'}, {'conv2'}, {'conv2f', 'conv2b'}) ;
net.addLayer('relu2', dagnn.ReLU(), {'conv2'}, {'conv2r'});
conv3 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv3', conv3, {'conv2r'}, {'conv3'}, {'conv3f', 'conv3b'}) ;
conv4 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv4', conv4, {'conv3'}, {'conv4'}, {'conv4f', 'conv4b'}) ;
conv5 = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv5', conv5, {'conv4'}, {'conv5'}, {'conv5f', 'conv5b'}) ;
net.addLayer('conv5_dropout' ,dagnn.DropOut('rate', 0.2),{'conv5'},{'x'});
%% search
conv1s = dagnn.Conv('size', [5 5 3 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv1s', conv1s, {'search'}, {'conv1s'}, {'conv1f', 'conv1b'}) ;
net.addLayer('relu1s', dagnn.ReLU(), {'conv1s'}, {'conv1sr'});
conv2s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv2s', conv2s, {'conv1sr'}, {'conv2s'}, {'conv2f', 'conv2b'}) ;
net.addLayer('relu2s', dagnn.ReLU(), {'conv2s'}, {'conv2sr'});
conv3s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv3s', conv3s, {'conv2sr'}, {'conv3s'}, {'conv3f', 'conv3b'}) ;
conv4s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv4s', conv4s, {'conv3s'}, {'conv4s'}, {'conv4f', 'conv4b'}) ;
conv5s = dagnn.Conv('size', [5 5 32 32],'pad', 2, 'stride', 1, 'dilate', 1, 'hasBias', true) ;
net.addLayer('conv5s', conv5s, {'conv4s'}, {'conv5s'}, {'conv5f', 'conv5b'}) ;
net.addLayer('conv5s_dropout' ,dagnn.DropOut('rate', 0.2),{'conv5s'},{'z'});
window_sz = [125,125];
end
Question: the CLE and the objective (both train and val) decrease during training. But when I test the saved models, I am surprised to find that the tracker fails to track the target as the epoch increases, while an earlier model from a lower epoch tracks successfully. So I suspected overfitting and tested one video from the training set: if the model were overfitting, the tracker should perform excellently on the training set, yet it fails to track the target at an early stage. Is there a bug in the DCF layer?
Another question is about getlmdbRAM.m: imdb.images.set(randperm(num_all_frame,100)) = int8(2); randomly selects 100 frames from the whole frame sequence as the validation set, but some validation frames may be adjacent to training frames. Would it be better to build the validation set from videos that do not appear in the training set?
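A video-level split along the lines suggested above could look like this (an illustrative Python sketch with hypothetical names, not code from the repo):

```python
import random

def split_by_video(frame_video_ids, val_fraction=0.1, seed=0):
    """Hold out whole videos for validation instead of random frames,
    so no validation frame is adjacent to a training frame.

    frame_video_ids: list mapping each frame index to its video id.
    Returns one flag per frame: 1 = train, 2 = val (MatConvNet convention).
    """
    videos = sorted(set(frame_video_ids))
    rng = random.Random(seed)
    rng.shuffle(videos)
    n_val = max(1, int(len(videos) * val_fraction))
    val_videos = set(videos[:n_val])
    return [2 if v in val_videos else 1 for v in frame_video_ids]
```

All frames of a held-out video get flag 2, so the validation loss is measured on sequences the network has never seen.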
In one of the issues, #5, you mentioned that the resolution of the response map is an important factor in why DCFNet performs better than CFNet. Could you elaborate on this? How much AUC would be lost if the resolution of the feature map were smaller, such as the 33 or 63 mentioned there?
I met the same problem in the same environment. After changing the default setting to nonrecursive, the problem still exists; although the message changes, the underlying problem seems unchanged.
Error Message below:
Error using vl_argparse (line 108)
Expected either a param-value pair or a structure.
Error in run_DCFNet>DCFNet_initialize (line 54)
state = vl_argparse(state, param, 'nonrecursive');
Error in run_DCFNet (line 27)
[state, ~] = DCFNet_initialize(im{1}, init_rect, param);
Error in demo_DCFNet (line 15)
res = run_DCFNet(subS,0,0,param);
It seems that param must be a structure; otherwise the function would recurse, but since I do not allow it to recurse ('nonrecursive'), it has to stop because of the lack of a structured parameter. Can you help me?
Thanks a lot.
Originally posted by @tzjtatata in #9 (comment)
Hey,
Is there any newer implementation of the Discriminant Correlation Filters that works with recent PyTorch versions?
PyTorch changed a lot in its complex-number calculations, so the code does not work with newer versions.
Thanks a lot.
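For reference, the core single-channel correlation-filter step can be written with NumPy's FFT, independent of any PyTorch complex-number API. This is a sketch of the standard ridge-regression formulation, not a drop-in replacement for the DCFNet layer:

```python
import numpy as np

def dcf_response(z, x, y, lam=1e-4):
    """z: template patch, x: search patch, y: desired label (all (h, w)).

    Learns the closed-form ridge-regression filter from z in the Fourier
    domain, then evaluates it on x. Single-channel sketch only.
    """
    Z, X, Y = np.fft.fft2(z), np.fft.fft2(x), np.fft.fft2(y)
    W = np.conj(Z) * Y / (np.conj(Z) * Z + lam)   # filter learned from z
    return np.real(np.fft.ifft2(W * X))           # response map on x
```

When x equals z, the response reproduces the label y almost exactly, which is a handy sanity check when porting the complex arithmetic across framework versions.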
Dear Qiang Wang:
Can you explain how imgcrop_multiscale.m works?
I cannot grasp your idea, though DCFNet indeed achieves better results.
All the best,
Heng.
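My own guess at what a multi-scale crop routine does (purely illustrative, not the actual imgcrop_multiscale.m logic; the scale factors are hypothetical): extract patches of size scale times the base size around a center, replicating edge pixels when the window leaves the image:

```python
import numpy as np

def crop_multiscale(img, center, base_sz, scales=(0.985, 1.0, 1.015)):
    """Crop one patch per scale factor around `center` (row, col).

    Out-of-bounds coordinates are clamped to the image border, which
    replicates edge pixels. Names and scale values are illustrative.
    """
    h, w = img.shape[:2]
    crops = []
    for s in scales:
        ch = int(round(base_sz[0] * s))
        cw = int(round(base_sz[1] * s))
        rows = np.clip(np.arange(ch) - ch // 2 + center[0], 0, h - 1)
        cols = np.clip(np.arange(cw) - cw // 2 + center[1], 0, w - 1)
        crops.append(img[np.ix_(rows, cols)])
    return crops
```

In a tracker, each crop would then be resized to the fixed network input size and the scale with the highest response kept.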
Hi, thanks for your sharing.
When I run the code in MATLAB 2016a with MatConvNet beta25, there is always an error:
error using vl_argparse (line 63)
OPTS must be a structure
error vl_argparse (line 160)
opts.(field) = vl_argparse(opts.(field), value,
'merge') ;
error vl_argparse (line 97)
opts = vl_argparse(opts, vertcat(params,values),
varargin{:}) ;
error DCFNet_initialize (line 16)
state = vl_argparse(state, param);
error demo_DCFNet (line 43)
[state, ~] = DCFNet_initialize(im{1}, init_rect, opts);
After checking the relevant code, the error can be avoided by changing 'state.net = [];' to 'state.net = param.net;' in DCFNet_initialize.m.
Does the error occur because of my MatConvNet version, or something else?
Thanks!
Dear @foolwood,
Is there any Python equivalent of your code?