wei-tim / YOWO
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
Hello, thank you for your excellent work.
I have some problems when training YOWO on the JHMDB-21 dataset:
1. How can I convert the JHMDB-21 ground-truth (joint_positions) .mat files to .txt files like those in ucf24/labels/? (See the sketch at the end of this issue.)
2. How can I compute video_mAP? I have looked at the video_mAP.py source code and found that I don't have a "testlist_video.txt" file. How can I generate such a file, or can you share yours?
testlist = os.path.join(base_path, 'testlist_video.txt')
3. In train.py, should it be "locolization" or "localization"? Which one is correct?
locolization_recall = 1.0 * total_detected / (total + eps)
print("Classification accuracy: %.3f" % classification_accuracy)
print("Locolization recall: %.3f" % locolization_recall)
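Regarding question 1, a minimal hedged sketch of one possible conversion, assuming the joint_positions.mat files contain a pos_img array of shape (2, 15, n_frames) and that a per-frame bounding box can be approximated by the min/max of the joint coordinates plus a margin; the output format "class x_min y_min x_max y_max" is also an assumption, not the repository's official layout:
# Hypothetical conversion sketch -- NOT the official script.
import os
import numpy as np
import scipy.io

def convert_video(mat_path, out_dir, class_idx, margin=10):
    pos = scipy.io.loadmat(mat_path)['pos_img']   # assumed shape (2, 15, n_frames)
    os.makedirs(out_dir, exist_ok=True)
    for f in range(pos.shape[2]):
        xs, ys = pos[0, :, f], pos[1, :, f]
        box = (xs.min() - margin, ys.min() - margin, xs.max() + margin, ys.max() + margin)
        with open(os.path.join(out_dir, '%05d.txt' % (f + 1)), 'w') as fh:
            fh.write('%d %.2f %.2f %.2f %.2f\n' % ((class_idx,) + box))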
Dear Sir:
When I evaluate the mAP on the UCF dataset, I get the following error:
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 236, in evaluate_videoAP
pred_videos_format = imagebox_to_videts(all_boxes, CLASSES)
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 213, in imagebox_to_videts
preVideo = os.path.dirname(keys[0])
IndexError: list index out of range
I want to test the code on a webcam or on a chosen video to generate videos like the ones shown in the examples folder.
How can I do that?
I got the following error:
(base) yu@64:~/cv2/YOWO$ python video_mAP.py --dataset ucf101-24 --data_cfg cfg/ucf24.data --cfg_file cfg/ucf24.cfg --n_classes 24 --backbone_3d resnext101 --backbone_2d darknet --resume_path /home/yuyonod/cv2/YOWO/models/yowo_ucf101-24_16f_best_fmap_08749.pth
DataParallel(
(module): YOWO(
(backbone_2d): Darknet(
(models): ModuleList()
/home/yuyonod/cv2/YOWO/test.avi
iou is: 0.05
Traceback (most recent call last):
File "video_mAP.py", line 334, in <module>
video_mAP_ucf()
File "video_mAP.py", line 235, in video_mAP_ucf
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 236, in evaluate_videoAP
pred_videos_format = imagebox_to_videts(all_boxes, CLASSES)
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 213, in imagebox_to_videts
preVideo = os.path.dirname(keys[0])
IndexError: list index out of range
Did you fix this problem? Also, is it possible to run action localization on an original video outside the dataset?
Hi, I want to know whether the training annotation label format is [class, x, y, w, h] or [class, xmin, ymin, xmax, ymax].
Thankfully, I have successfully run the training demo, but I want to know how I can see the remaining training time.
I tried feeding in a frame with shape torch.Size([1, 3, 224, 224])
and got this error:
output = model(data)
File "E:\apps\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "D:\\YOWO\model.py", line 81, in forward
x_2d = input[:, :, -1, :, :] # Last frame of the clip that is read
IndexError: too many indices for tensor of dimension 4
I also tried permuting to make the channel dimension last, but I still got this error.
Does the input need to concatenate the current frame with the preceding frames at inference time?
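For what it's worth, the forward pass that fails here slices input[:, :, -1, :, :], which only works on a 5-dimensional tensor. A minimal sketch of building such an input for a single clip at inference time, assuming a buffer of the last 16 frames already resized to 224x224 (the clip length and preprocessing are assumptions):
# Hypothetical single-clip inference sketch: the model appears to expect
# a 5-D tensor of shape (batch, channels, clip_duration, height, width).
import torch

frames = [torch.rand(3, 224, 224) for _ in range(16)]   # last 16 RGB frames, oldest first
clip = torch.stack(frames, dim=1)                        # (3, 16, 224, 224)
clip = clip.unsqueeze(0)                                 # (1, 3, 16, 224, 224) -- add batch dim
# model(clip) would then see clip[:, :, -1, :, :] as the current key frame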
Can I make my own dataset, and how do I do it?
Hi, can you share the scales you used for training on JHMDB and UCF24?
For example:
jhmdb21: [112, 224, 448]
ucf24: [224, 256, 320]
I downloaded the dataset annotations from the link in the README, but I found that there are only testing annotations and no annotations for trainlist.txt. Can you check it? Thank you.
I have trained YOWO on the UCF101-24 and J-HMDB-21 datasets; the best frame_mAP I can achieve is 84.3% and
70.9% for UCF101-24 and J-HMDB-21, respectively, which is well behind the accuracy in your report. And I see the following note:
NOTE: With some extra tricks, YOWO achieves 87.5% and 76.7% frame_mAP for UCF101-24 and J-HMDB-21 datasets, respectively.
Can you share the extra tricks you use?
First of all, thanks for your project. I am interested in it; please share your labels if it's convenient for you.
Hello, I appreciate your nice work and the shared code.
It would be nice if you could share the pretrained model as well.
It would also be wonderful if your code accepted extra inputs, such as an arbitrary video or a webcam, for action localization.
Thank you in advance.
I want to ask how you make the GIF visualization of the results.
I am wondering how you calculate the loss function.
I know it is the sum of six terms, but I am a bit confused about how it is calculated.
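For reference, in YOLOv2-style region losses (which this implementation appears to follow), the six terms are typically the box-coordinate losses (x, y, w, h), an objectness/confidence loss, and a classification loss. A rough sketch under that assumption, with term names taken from the usual pytorch-yolo2 layout rather than confirmed from this repository:
# Rough sketch of how a YOLOv2-style region loss sums its six terms.
# Each term is usually already weighted by its own scale (coord_scale,
# object_scale/noobject_scale, class_scale) before this sum.
def total_region_loss(loss_x, loss_y, loss_w, loss_h, loss_conf, loss_cls):
    return loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls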
1. When I wanted to calculate video_mAP, I did not find the testlist_video.txt file (a sketch of one way to generate it appears at the end of this issue).
2. I generated one myself; the entries look like walk/50_FIRST_DATES_walk_f_cm_np1_le_med_33. It starts running, but fails with the following error:
iou is: 0.05
Traceback (most recent call last):
File "video_mAP.py", line 327, in <module>
video_mAP_jhmdb()
File "video_mAP.py", line 320, in video_mAP_jhmdb
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/chase/YOWO/eval_results.py", line 244, in evaluate_videoAP
ap = video_ap_one_class(gt, pred_cls, iou_thresh, bTemporal, cls_len)
File "/home/chase/YOWO/eval_results.py", line 155, in video_ap_one_class
iou = np.array([iou3dt(np.array(g), boxes[:, :5]) for g in gt_this])
File "/home/chase/YOWO/eval_results.py", line 155, in <listcomp>
iou = np.array([iou3dt(np.array(g), boxes[:, :5]) for g in gt_this])
File "/home/chase/YOWO/utils.py", line 195, in iou3dt
return iou3d( b1[np.where(b1[:,0]==tmin)[0][0]:np.where(b1[:,0]==tmax)[0][0]+1,:] , b2[np.where(b2[:,0]==tmin)[0][0]:np.where(b2[:,0]==tmax)[0][0]+1,:] ) * temporal_inter / temporal_union
File "/home/chase/YOWO/utils.py", line 184, in iou3d
assert b1.shape[0] == b2.shape[0]
AssertionError
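In case it helps with question 1, testlist_video.txt appears to need one entry per test video (class_folder/video_name). A minimal hedged sketch for deriving it from a frame-level test list, assuming each line of that list looks like class/video/00001.png (the exact path layout and file names are assumptions):
# Hypothetical sketch: derive per-video entries from a frame-level testlist.txt.
import os

with open('testlist.txt') as f:
    frame_paths = [line.strip() for line in f if line.strip()]

videos = sorted({os.path.dirname(p) for p in frame_paths})   # unique class/video dirs
with open('testlist_video.txt', 'w') as f:
    f.write('\n'.join(videos) + '\n')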
When I train on UCF24, the following error occurs. Can you give me some advice? Thank you.
Traceback (most recent call last):
File "train.py", line 322, in <module>
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/zc/anaconda3/envs/torch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/zc/coode/YOWO/region_loss.py", line 230, in forward
nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
File "/home/zc/coode/YOWO/region_loss.py", line 127, in build_targets
iou = bbox_iou(gt_box, pred_box, x1y1x2y2=False) # best_iou
File "/home/zc/coode/YOWO/utils.py", line 74, in bbox_iou
cw = w1 + w2 - uw
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.FloatTensor for argument #3 'other'
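For what it's worth, the error points to a double/float dtype mix inside bbox_iou (the ground-truth boxes appear to arrive in double precision while the predictions are float). A hedged workaround sketch, casting the targets to single precision in the training loop before they reach region_loss; where exactly to cast is an assumption, not the repository's own fix:
# Hypothetical workaround fragment inside train() (names taken from the traceback above):
for batch_idx, (data, target) in enumerate(train_loader):
    data = data.cuda()
    target = target.float()        # ensure single precision before region_loss / bbox_iou
    output = model(data)
    loss = region_loss(output, target)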
Thanks for your awesome project. I want to evaluate the frame mAP performance as reported in the paper. Could you please tell me which function is used for frame mAP evaluation?
Thank you for your open-source code!
I want to use J-HMDB-21 to train the model. However, from your link here, I can only get the rgb-images, without labels. I wonder if I missed anything or if the link is incorrect.
Hope to get your answer, thanks a lot!
Hi! Thanks for open sourcing the implementation. This is the first paper I have read on spatio-temporal action localization.
I am aware that this might not be the best place to ask about the paper, but I can't find another way to contact you (e.g., an email address) in it, so I decided to open an issue.
So I've read:
And from that, I draw one conclusion:
that the whole linking procedure is about associating detections of one or more actions in one frame with detections of one or more actions in subsequent frames.
So does this mean it is possible for multiple actions of different classes to be detected at once?
I am sorry if this is a silly question; I can't find this explicitly stated anywhere.
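Not an authoritative answer, but frame-level detections can indeed contain boxes of several classes at once; the linking step then builds one tube per class. A rough, hypothetical sketch of the common greedy linking idea (score plus overlap between consecutive frames), which may differ from the exact procedure used in the paper:
# Rough sketch of greedy tube linking for ONE class (hypothetical).
def iou(a, b):
    # a, b: (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def link_tube(dets_per_frame):
    # dets_per_frame: list over frames, each a list of (box, score) for one class;
    # assumes the first frame has at least one detection.
    tube = [max(dets_per_frame[0], key=lambda d: d[1])]
    for dets in dets_per_frame[1:]:
        if not dets:
            continue
        last_box = tube[-1][0]
        tube.append(max(dets, key=lambda d: d[1] + iou(last_box, d[0])))
    return tube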
I wanted to reduce the number of classes to 8, but after starting training I get this error.
void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [1,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
File "train.py", line 329, in <module>
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/nkise/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nkise/Documents/neuron/YOWO/region_loss.py", line 259, in forward
loss_cls = self.class_scale * FL(cls, tcls)
File "/home/nkise/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nkise/Documents/neuron/YOWO/FocalLoss.py", line 58, in forward
self.alpha = self.alpha.cuda()
RuntimeError: CUDA error: device-side assert triggered
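That device-side assert usually means an index fed to a CUDA scatter/one-hot operation is outside [0, n_classes); after cutting the class list down to 8, any label file that still contains a class id >= 8 would trigger it. A quick hedged sanity check, assuming each label line starts with the class id and the directory layout shown (both assumptions):
# Hypothetical check: scan label .txt files for class ids outside [0, n_classes).
import glob

n_classes = 8
for path in glob.glob('labels/**/*.txt', recursive=True):
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and not (0 <= int(float(parts[0])) < n_classes):
                print('out-of-range class id', parts[0], 'in', path)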
Hi,
I have been trying to run inference on a video and I found myself lost. I have read that it is necessary to run inference on frames instead of videos; is that true?
Where is the script to run inference?
Thank you for your time,
Miguel.
Thank you very much for sharing your research!
I am trying to follow your README.md to run training.
But I get this error. I am guessing maybe I have a different version of a library?
I am really sorry to disturb you, but could you please give me some help?
2020-02-05 21:44:41 training at epoch 1, lr 0.000100
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\Thesis\Master\YOWO-master\train.py", line 322, in <module>
train(epoch)
File "F:\Thesis\Master\YOWO-master\train.py", line 170, in train
for batch_idx, (data, target) in enumerate(train_loader):
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
w.start()
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
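As the error message itself explains, on Windows the DataLoader worker processes are started with spawn, so the code that launches training in train.py must sit under a __main__ guard (setting num_workers=0 is another quick workaround). A minimal sketch of the idiom; the loop bounds and variable names inside train.py are assumptions:
# Minimal sketch of the Windows-safe entry point suggested by the error message.
if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()                        # only matters if the script is frozen into an executable
    for epoch in range(1, num_epochs + 1):  # loop/variable names are assumptions
        train(epoch)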
In the YOWO/evaluation/Object-Detection-Metrics/pascalvoc.py file, there are the imports
from BoundingBox import BoundingBox
from BoundingBoxes import BoundingBoxes
but there is no BoundingBox or BoundingBoxes module available.
Hi, in region_loss.py, in the function 'build_targets', there is a condition 'if seen < 12800' that sets the ground truth differently. I really do not understand its purpose. Can you give me some suggestions? Thank you.
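Not an official answer, but this looks like the standard YOLOv2 warm-up trick: for the first 12800 samples seen, the target box offsets are pushed toward the anchor priors (cell center 0.5, 0.5 and zero width/height offset) with a small coordinate weight, so early predictions stay close to the anchors before the real boxes take over. A rough sketch of the idea; the variable names are assumptions based on the usual pytorch-yolo2 build_targets:
# Rough sketch of the YOLOv2-style warm-up inside build_targets (names assumed):
if seen < 12800:
    tx.fill_(0.5)          # target center offsets at the cell center
    ty.fill_(0.5)
    tw.zero_()             # zero log-scale offsets -> predict the anchor width/height
    th.zero_()
    coord_mask.fill_(1)    # apply this weak supervision everywhere, with a small scale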
Hi,
I get an error after changing the clip_size in the cfg:
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 5 at /opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/THC/generic/THCTensorMath.cu:62
Is there any other configuration that I need to edit?
Thank You
Thanks for the open-source code.
I wonder, can YOWO run on the AVA dataset?
@okankop @wei-tim Thanks for open-sourcing the code base. I have a few queries.
Thanks for sharing the nice work!
I modified the script 'train.py' and use it as a demo for inference. Now I can get frame-level results, but I still have some problems:
When using the command stated in the README file, I got the following error:
2020-01-28 19:01:54 training at epoch 1, lr 0.000100
Traceback (most recent call last):
File "train.py", line 322, in <module>
train(epoch)
File "train.py", line 170, in train
for batch_idx, (data, target) in enumerate(train_loader):
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/bt/Desktop/YOWO/dataset.py", line 53, in __getitem__
clip, label = load_data_detection(self.base_path, imgpath, self.train, self.clip_duration, self.shape, self.dataset_use, jitter, hue, saturation, exposure)
File "/home/bt/Desktop/YOWO/clip.py", line 161, in load_data_detection
im_ind = int(im_split[num_parts-1][0:5])
ValueError: invalid literal for int() with base 10: 'v_Flo'
Do you have any idea why this occurs?
I have been digging into the code and I just can't get it to work. I understand the error itself, but not its origin.
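For what it's worth, the failing line in clip.py takes the last component of the image path and parses its first five characters as the frame index, so it expects frame files named like 00012.jpg. Getting 'v_Flo' suggests the last component of the path was a video folder name (e.g. v_FloorGymnastics_...) rather than a frame file, i.e. the paths in the train list probably don't match the layout the loader expects. A tiny hedged illustration; the directory layout shown is an assumption:
# Hypothetical illustration of the parsing that raises the ValueError above:
imgpath = 'ucf24/rgb-images/FloorGymnastics/v_FloorGymnastics_g01_c01/00012.jpg'
im_split = imgpath.split('/')
num_parts = len(im_split)
print(int(im_split[num_parts - 1][0:5]))   # 12 -> fine when the last part is a frame file

bad_path = 'ucf24/rgb-images/FloorGymnastics/v_FloorGymnastics_g01_c01'
# bad_path.split('/')[-1][0:5] == 'v_Flo', and int('v_Flo') raises
# "invalid literal for int() with base 10: 'v_Flo'"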
Thank you for your time.
Why don't you use YOLOv3 or YOLOv3-tiny as the 2D backbone? Can the 2D backbone network be replaced with them?
In the test function, do you do any optimization or training?
If not, can I use it to try out the model I have saved?
@okankop @wei-tim First of all, thanks for open-sourcing your work and doing a great job. I have a few queries.
Thanks in advance.
Does the model save checkpoints during training? It seems that I need to run evaluation for that...
Thank you in advance for sharing your research!
I want to download the UCF101-24 data to look at it, but I can't get it from the link. Could you also provide it via Baidu Yun Disk?
Is YOWO's input a set of consecutive frames rather than a whole video clip? How is the key frame for the 2D network input selected? Is it the last frame of the clip?
I use the labelme tool to label my own video action clips. First, the video is cut into consecutive frames, and these frames are used as input to the network. I annotate each frame to get the corresponding JSON file and then convert it to txt format as the label. Is that right?
I am sorry for asking so many questions. I didn't understand this at the beginning of my research; I hope to get your answer, thanks!
Hi, your work is amazing and thank you for sharing it.
If the action prediction considers a "clip duration" number of frames, how can the network be trained using a batch size different from the clip duration?
Thank you for your time.
The dataset annotations and pre-trained models can't be downloaded. Can anyone share them?
Thanks!
Is there a way to predict only for one video and to get the respective annotations?
What steps do I need to take if I want to detect a certain action in my own video data?
@okankop @wei-tim First of all, thanks for open-sourcing your great work. I had a few queries when trying to run inference on cameras.
1. A little error needs to be fixed:
https://github.com/wei-tim/YOWO/blob/master/model.py#L73
nn.Conv2d(1024, 5*(opt.n_classes+4+1), kernel_size=1, bias=False)
should be
nn.Conv2d(1024, 5*(int(opt.n_classes)+4+1), kernel_size=1, bias=False)
2. In the dataloader
https://github.com/wei-tim/YOWO/blob/master/dataset.py#L63
# (self.duration, -1) + self.shape = (8, -1, 224, 224)
clip = torch.cat(clip, 0).view((self.clip_duration, -1) + self.shape).permute(1, 0, 2, 3)
the input image clip is a 4-dimensional tensor with shape (3, D, H, W).
However, in the model's forward pass, the last frame of the clip is sliced from the input with input[:, :, -1, :, :],
which assumes the clip is a 5-dimensional tensor with shape (batch_size, 3, D, H, W).
https://github.com/wei-tim/YOWO/blob/master/model.py#L81,
def forward(self, input):
x_3d = input # Input clip
x_2d = input[:, :, -1, :, :] # Last frame of the clip that is read
Thank you for your open-source code!
I'm interested in the activation maps for the 2D and 3D backbones of the trained model, but the relevant code is not released. Can you share the CAM code?
Thanks for sharing your code. When I run
python train.py --dataset ucf101-24 --data_cfg cfg/ucf24.data --cfg_file cfg/ucf24.cfg --n_classes 24 --backbone_3d resnext101 --backbone_2d darknet --backbone_2d_weights weights/yolo.weights --resume_path weights/yowo_ucf101-24_16f_best.pth
the terminal shows:
RuntimeError: Error(s) in loading state_dict for YOWO:
Missing key(s) in state_dict: "backbone_2d.models.0.conv1.weight", "backbone_2d.models.0.bn1.weight",
In the code, why is the confidence (detection confidence times class confidence) subtracted from 1 before doing NMS? Why is this flipping necessary?
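Not an authoritative answer, but the subtraction is most likely just a sorting trick rather than a semantic flip: sorting (1 - conf) in ascending order gives exactly the same ordering as sorting conf in descending order, which is what NMS needs (process the highest-confidence boxes first). A tiny sketch of the equivalence:
# Tiny check that sorting by (1 - conf) ascending == sorting by conf descending.
import torch

conf = torch.tensor([0.9, 0.2, 0.75, 0.4])
_, idx_flip = torch.sort(1 - conf)                  # ascending (default)
_, idx_desc = torch.sort(conf, descending=True)
print(idx_flip.tolist(), idx_desc.tolist())         # both: [0, 2, 3, 1]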
Thank you for sharing your code!
But I found there are no labels for the J-HMDB-21 dataset in the link.
Could you release the J-HMDB-21 labels in the form 'labels/video_name/img_idx.txt' (which is what the training code needs)?
Then we could run the training code for J-HMDB-21 directly!
Thanks for open-sourcing this, but I'm confused about the environment requirements.
Could you please tell me what environment is needed to run YOWO, especially the PyTorch version?
I didn't find anything about it in the README.
Thanks a lot
When I train ucf24, I get the following error:
Traceback (most recent call last):
File "train.py", line 322, in <module>
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/chase/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/chase/YOWO/region_loss.py", line 229, in forward
nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
File "/home/chase/YOWO/region_loss.py", line 126, in build_targets
iou = bbox_iou(gt_box, pred_box, x1y1x2y2=False) # best_iou
File "/home/chase/YOWO/utils.py", line 72, in bbox_iou
cw = w1 + w2 - uw
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.FloatTensor for argument #3 'other'