wei-tim / YOWO
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
Hello, thank you for your excellent work.
I have some problems when training YOWO on the JHMDB-21 dataset:
1. How can I convert the JHMDB-21 ground-truth (joint_positions) .mat files to .txt files like those in ucf24/labels/? (See the sketch at the end of this issue.)
2. How can I compute video_mAP? I have looked at the video_mAP.py source code and found that I don't have a "testlist_video.txt" file. How can I generate such a file, or can you share yours?
testlist = os.path.join(base_path, 'testlist_video.txt')
3. In train.py, should it be "locolization" or "localization"? Which one is correct?
locolization_recall = 1.0 * total_detected / (total + eps)
print("Classification accuracy: %.3f" % classification_accuracy)
print("Locolization recall: %.3f" % locolization_recall)
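Regarding question 1, a minimal hedged sketch of one possible conversion, assuming the joint_positions.mat files contain a pos_img array of shape (2, 15, n_frames) and that a per-frame bounding box can be approximated by the min/max of the joint coordinates plus a margin; the output format "class x_min y_min x_max y_max" is also an assumption, not the repository's official layout:
# Hypothetical conversion sketch -- NOT the official script.
import os
import numpy as np
import scipy.io

def convert_video(mat_path, out_dir, class_idx, margin=10):
    pos = scipy.io.loadmat(mat_path)['pos_img']   # assumed shape (2, 15, n_frames)
    os.makedirs(out_dir, exist_ok=True)
    for f in range(pos.shape[2]):
        xs, ys = pos[0, :, f], pos[1, :, f]
        box = (xs.min() - margin, ys.min() - margin, xs.max() + margin, ys.max() + margin)
        with open(os.path.join(out_dir, '%05d.txt' % (f + 1)), 'w') as fh:
            fh.write('%d %.2f %.2f %.2f %.2f\n' % ((class_idx,) + box))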
Dear Sir:
When I evaluate the mAP on the UCF dataset, I get the following error:
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 236, in evaluate_videoAP
pred_videos_format = imagebox_to_videts(all_boxes, CLASSES)
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 213, in imagebox_to_videts
preVideo = os.path.dirname(keys[0])
IndexError: list index out of range
I want to test the code on a webcam or on a chosen video to generate videos like the ones shown in the examples folder.
How can I do that?
I got the following error:
(base) yu@64:~/cv2/YOWO$ python video_mAP.py --dataset ucf101-24 --data_cfg cfg/ucf24.data --cfg_file cfg/ucf24.cfg --n_classes 24 --backbone_3d resnext101 --backbone_2d darknet --resume_path /home/yuyonod/cv2/YOWO/models/yowo_ucf101-24_16f_best_fmap_08749.pth
DataParallel(
(module): YOWO(
(backbone_2d): Darknet(
(models): ModuleList()
/home/yuyonod/cv2/YOWO/test.avi
iou is: 0.05
Traceback (most recent call last):
File "video_mAP.py", line 334, in <module>
video_mAP_ucf()
File "video_mAP.py", line 235, in video_mAP_ucf
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 236, in evaluate_videoAP
pred_videos_format = imagebox_to_videts(all_boxes, CLASSES)
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 213, in imagebox_to_videts
preVideo = os.path.dirname(keys[0])
IndexError: list index out of range
Did you fix this problem? Also, is it possible to run action localization on an original video outside the dataset?
Hi, I want to know whether the training annotation label format is [class, x, y, w, h] or [class, xmin, ymin, xmax, ymax].
Thankfully, I have successfully run the training demo, but I want to know how I can see the remaining training time.
I tried feeding in a frame with shape torch.Size([1, 3, 224, 224])
and got this error:
output = model(data)
File "E:\apps\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "D:\\YOWO\model.py", line 81, in forward
x_2d = input[:, :, -1, :, :] # Last frame of the clip that is read
IndexError: too many indices for tensor of dimension 4
I also tried permuting to make the channel dimension last, but I still got this error.
Does the input need to concatenate the current frame with the preceding frames at inference time?
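For what it's worth, the forward pass that fails here slices input[:, :, -1, :, :], which only works on a 5-dimensional tensor. A minimal sketch of building such an input for a single clip at inference time, assuming a buffer of the last 16 frames already resized to 224x224 (the clip length and preprocessing are assumptions):
# Hypothetical single-clip inference sketch: the model appears to expect
# a 5-D tensor of shape (batch, channels, clip_duration, height, width).
import torch

frames = [torch.rand(3, 224, 224) for _ in range(16)]   # last 16 RGB frames, oldest first
clip = torch.stack(frames, dim=1)                        # (3, 16, 224, 224)
clip = clip.unsqueeze(0)                                 # (1, 3, 16, 224, 224) -- add batch dim
# model(clip) would then see clip[:, :, -1, :, :] as the current key frame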
Can I make my own dataset, and how do I do it?
Hi, can you share the scales you used for training on JHMDB and UCF24?
For example:
jhmdb21: [112, 224, 448]
ucf24: [224, 256, 320]
I downloaded the dataset annotations from the link in the README, but I found that there are only testing annotations and no annotations for trainlist.txt. Can you check it? Thank you.
I have trained YOWO on the UCF101-24 and J-HMDB-21 datasets; the best frame_mAP I can achieve is 84.3% and
70.9% for UCF101-24 and J-HMDB-21, respectively, which is well behind the accuracy in your report. And I see the following note:
NOTE: With some extra tricks, YOWO achieves 87.5% and 76.7% frame_mAP for UCF101-24 and J-HMDB-21 datasets, respectively.
Can you share the extra tricks you use?
First of all, thanks for your project. I am interested in it; please share your labels if it's convenient for you.
Hello, I appreciate your nice work and the shared code.
It would be nice if you could share the pretrained model as well.
It would also be wonderful if your code accepted extra inputs, such as an arbitrary video or a webcam, for action localization.
Thank you in advance.
I want to ask how you make the GIF visualization of the results.
I am wondering how you calculate the loss function.
I know it is the sum of six terms, but I am a bit confused about how it is calculated.
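For reference, in YOLOv2-style region losses (which this implementation appears to follow), the six terms are typically the box-coordinate losses (x, y, w, h), an objectness/confidence loss, and a classification loss. A rough sketch under that assumption, with term names taken from the usual pytorch-yolo2 layout rather than confirmed from this repository:
# Rough sketch of how a YOLOv2-style region loss sums its six terms.
# Each term is usually already weighted by its own scale (coord_scale,
# object_scale/noobject_scale, class_scale) before this sum.
def total_region_loss(loss_x, loss_y, loss_w, loss_h, loss_conf, loss_cls):
    return loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls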
1. When I wanted to calculate video_mAP, I did not find the testlist_video.txt file (a sketch of one way to generate it appears at the end of this issue).
2. I generated one myself; the entries look like walk/50_FIRST_DATES_walk_f_cm_np1_le_med_33. It starts running, but fails with the following error:
iou is: 0.05
Traceback (most recent call last):
File "video_mAP.py", line 327, in <module>
video_mAP_jhmdb()
File "video_mAP.py", line 320, in video_mAP_jhmdb
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/chase/YOWO/eval_results.py", line 244, in evaluate_videoAP
ap = video_ap_one_class(gt, pred_cls, iou_thresh, bTemporal, cls_len)
File "/home/chase/YOWO/eval_results.py", line 155, in video_ap_one_class
iou = np.array([iou3dt(np.array(g), boxes[:, :5]) for g in gt_this])
File "/home/chase/YOWO/eval_results.py", line 155, in <listcomp>
iou = np.array([iou3dt(np.array(g), boxes[:, :5]) for g in gt_this])
File "/home/chase/YOWO/utils.py", line 195, in iou3dt
return iou3d( b1[np.where(b1[:,0]==tmin)[0][0]:np.where(b1[:,0]==tmax)[0][0]+1,:] , b2[np.where(b2[:,0]==tmin)[0][0]:np.where(b2[:,0]==tmax)[0][0]+1,:] ) * temporal_inter / temporal_union
File "/home/chase/YOWO/utils.py", line 184, in iou3d
assert b1.shape[0] == b2.shape[0]
AssertionError
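In case it helps with question 1, testlist_video.txt appears to need one entry per test video (class_folder/video_name). A minimal hedged sketch for deriving it from a frame-level test list, assuming each line of that list looks like class/video/00001.png (the exact path layout and file names are assumptions):
# Hypothetical sketch: derive per-video entries from a frame-level testlist.txt.
import os

with open('testlist.txt') as f:
    frame_paths = [line.strip() for line in f if line.strip()]

videos = sorted({os.path.dirname(p) for p in frame_paths})   # unique class/video dirs
with open('testlist_video.txt', 'w') as f:
    f.write('\n'.join(videos) + '\n')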
When I train on UCF24, the following error occurs. Can you give me some advice? Thank you.
Traceback (most recent call last):
File "train.py", line 322, in <module>
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/zc/anaconda3/envs/torch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/zc/coode/YOWO/region_loss.py", line 230, in forward
nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
File "/home/zc/coode/YOWO/region_loss.py", line 127, in build_targets
iou = bbox_iou(gt_box, pred_box, x1y1x2y2=False) # best_iou
File "/home/zc/coode/YOWO/utils.py", line 74, in bbox_iou
cw = w1 + w2 - uw
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.FloatTensor for argument #3 'other'
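For what it's worth, the error points to a double/float dtype mix inside bbox_iou (the ground-truth boxes appear to arrive in double precision while the predictions are float). A hedged workaround sketch, casting the targets to single precision in the training loop before they reach region_loss; where exactly to cast is an assumption, not the repository's own fix:
# Hypothetical workaround fragment inside train() (names taken from the traceback above):
for batch_idx, (data, target) in enumerate(train_loader):
    data = data.cuda()
    target = target.float()        # ensure single precision before region_loss / bbox_iou
    output = model(data)
    loss = region_loss(output, target)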
Thanks for your awesome project. I want to evaluate the frame mAP performance as reported in the paper. Could you please tell me which function is used for frame mAP evaluation?
Thank you for your open-source code!
I want to use J-HMDB-21 to train the model. However, from your link here, I can only get the rgb-images, without labels. I wonder if I missed anything or if the link is incorrect.
Hope to get your answer, thanks a lot!
Hi! Thanks for open sourcing the implementation. This is the first paper I have read on spatio-temporal action localization.
I am aware that this might not be the best place to ask about the paper, but I can't find another way to contact you (e.g., an email address) in it, so I decided to open an issue.
So I've read:
And from that, I draw one conclusion:
that the whole linking procedure is about associating detections of one or more actions in one frame with detections of one or more actions in subsequent frames.
So does this mean it is possible for multiple actions of different classes to be detected at once?
I am sorry if this is a silly question; I can't find this explicitly stated anywhere.
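Not an authoritative answer, but frame-level detections can indeed contain boxes of several classes at once; the linking step then builds one tube per class. A rough, hypothetical sketch of the common greedy linking idea (score plus overlap between consecutive frames), which may differ from the exact procedure used in the paper:
# Rough sketch of greedy tube linking for ONE class (hypothetical).
def iou(a, b):
    # a, b: (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def link_tube(dets_per_frame):
    # dets_per_frame: list over frames, each a list of (box, score) for one class;
    # assumes the first frame has at least one detection.
    tube = [max(dets_per_frame[0], key=lambda d: d[1])]
    for dets in dets_per_frame[1:]:
        if not dets:
            continue
        last_box = tube[-1][0]
        tube.append(max(dets, key=lambda d: d[1] + iou(last_box, d[0])))
    return tube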
I wanted to reduce the number of classes to 8, but after starting training I get this error.
void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [1,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
File "train.py", line 329, in <module>
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/nkise/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nkise/Documents/neuron/YOWO/region_loss.py", line 259, in forward
loss_cls = self.class_scale * FL(cls, tcls)
File "/home/nkise/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/nkise/Documents/neuron/YOWO/FocalLoss.py", line 58, in forward
self.alpha = self.alpha.cuda()
RuntimeError: CUDA error: device-side assert triggered
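That device-side assert usually means an index fed to a CUDA scatter/one-hot operation is outside [0, n_classes); after cutting the class list down to 8, any label file that still contains a class id >= 8 would trigger it. A quick hedged sanity check, assuming each label line starts with the class id and the directory layout shown (both assumptions):
# Hypothetical check: scan label .txt files for class ids outside [0, n_classes).
import glob

n_classes = 8
for path in glob.glob('labels/**/*.txt', recursive=True):
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and not (0 <= int(float(parts[0])) < n_classes):
                print('out-of-range class id', parts[0], 'in', path)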
Hi,
I have been trying to run inference on a video and I found myself lost. I have read that it is necessary to run inference on frames instead of videos; is that true?
Where is the script to run inference?
Thank you for your time,
Miguel.
Thank you very much for sharing your research!
I am trying to follow your README.md to run training.
But I get this error. I am guessing maybe I have a different version of a library?
I am really sorry to disturb you, but could you please give me some help?
2020-02-05 21:44:41 training at epoch 1, lr 0.000100
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\Thesis\Master\YOWO-master\train.py", line 322, in <module>
train(epoch)
File "F:\Thesis\Master\YOWO-master\train.py", line 170, in train
for batch_idx, (data, target) in enumerate(train_loader):
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
w.start()
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
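As the error message itself explains, on Windows the DataLoader worker processes are started with spawn, so the code that launches training in train.py must sit under a __main__ guard (setting num_workers=0 is another quick workaround). A minimal sketch of the idiom; the loop bounds and variable names inside train.py are assumptions:
# Minimal sketch of the Windows-safe entry point suggested by the error message.
if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()                        # only matters if the script is frozen into an executable
    for epoch in range(1, num_epochs + 1):  # loop/variable names are assumptions
        train(epoch)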
In the YOWO/evaluation/Object-Detection-Metrics/pascalvoc.py file, there are the imports
from BoundingBox import BoundingBox
from BoundingBoxes import BoundingBoxes
but there is no BoundingBox or BoundingBoxes module available.
Hi, in region_loss.py, in the function 'build_targets', there is a condition 'if seen < 12800' that sets the ground truth differently. I really do not understand its purpose. Can you give me some suggestions? Thank you.
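Not an official answer, but this looks like the standard YOLOv2 warm-up trick: for the first 12800 samples seen, the target box offsets are pushed toward the anchor priors (cell center 0.5, 0.5 and zero width/height offset) with a small coordinate weight, so early predictions stay close to the anchors before the real boxes take over. A rough sketch of the idea; the variable names are assumptions based on the usual pytorch-yolo2 build_targets:
# Rough sketch of the YOLOv2-style warm-up inside build_targets (names assumed):
if seen < 12800:
    tx.fill_(0.5)          # target center offsets at the cell center
    ty.fill_(0.5)
    tw.zero_()             # zero log-scale offsets -> predict the anchor width/height
    th.zero_()
    coord_mask.fill_(1)    # apply this weak supervision everywhere, with a small scale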
Hi,
I get an error after changing the clip_size in the cfg:
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 5 at /opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/THC/generic/THCTensorMath.cu:62
Is there any other configuration that I need to edit?
Thank You
Thanks for the open-source code.
I wonder, can YOWO run on the AVA dataset?
@okankop @wei-tim Thanks for open-sourcing the code base. I have a few queries.
Thanks for sharing the nice work!
I modified the script 'train.py' and use it as a demo for inference. Now I can get frame-level results, but I still have some problems:
When using the command stated in the README file, I got the following error:
2020-01-28 19:01:54 training at epoch 1, lr 0.000100
Traceback (most recent call last):
File "train.py", line 322, in <module>
train(epoch)
File "train.py", line 170, in train
for batch_idx, (data, target) in enumerate(train_loader):
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/bt/Desktop/YOWO/dataset.py", line 53, in __getitem__
clip, label = load_data_detection(self.base_path, imgpath, self.train, self.clip_duration, self.shape, self.dataset_use, jitter, hue, saturation, exposure)
File "/home/bt/Desktop/YOWO/clip.py", line 161, in load_data_detection
im_ind = int(im_split[num_parts-1][0:5])
ValueError: invalid literal for int() with base 10: 'v_Flo'
Do you have any idea why this occurs?
I have been digging into the code and I just can't get it to work. I understand the error itself, but not its origin.
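For what it's worth, the failing line in clip.py takes the last component of the image path and parses its first five characters as the frame index, so it expects frame files named like 00012.jpg. Getting 'v_Flo' suggests the last component of the path was a video folder name (e.g. v_FloorGymnastics_...) rather than a frame file, i.e. the paths in the train list probably don't match the layout the loader expects. A tiny hedged illustration; the directory layout shown is an assumption:
# Hypothetical illustration of the parsing that raises the ValueError above:
imgpath = 'ucf24/rgb-images/FloorGymnastics/v_FloorGymnastics_g01_c01/00012.jpg'
im_split = imgpath.split('/')
num_parts = len(im_split)
print(int(im_split[num_parts - 1][0:5]))   # 12 -> fine when the last part is a frame file

bad_path = 'ucf24/rgb-images/FloorGymnastics/v_FloorGymnastics_g01_c01'
# bad_path.split('/')[-1][0:5] == 'v_Flo', and int('v_Flo') raises
# "invalid literal for int() with base 10: 'v_Flo'"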
Thank you for your time.
Why don't you use YOLOv3 or YOLOv3-tiny as the 2D backbone? Can the 2D backbone network be replaced with them?
In the test function, do you do any optimization or training?
If not, can I use it to try out the model I have saved?
@okankop @wei-tim First of all, thanks for open-sourcing your work and doing a great job. I have a few queries.
Thanks in advance.
Does the model save checkpoints during training? It seems that I need to run evaluation for that...
Thank you in advance for sharing your research!
I want to download the UCF101-24 data to look at it, but I can't get it from the link. Could you also provide it via Baidu Yun Disk?
Is YOWO's input a set of consecutive frames rather than a whole video clip? How is the key frame for the 2D network input selected? Is it the last frame of the clip?
I use the labelme tool to label my own video action clips. First, the video is cut into consecutive frames, and these frames are used as input to the network. I annotate each frame to get the corresponding JSON file and then convert it to txt format as the label. Is that right?
I am sorry for asking so many questions. I didn't understand this at the beginning of my research; I hope to get your answer, thanks!
Hi, your work is amazing and thank you for sharing it.
If the action prediction considers a "clip duration" number of frames, how can the network be trained using a batch size different from the clip duration?
Thank you for your time.
The dataset annotations and pre-trained models can't be downloaded. Can anyone share them?
Thanks!
Is there a way to predict only for one video and to get the respective annotations?
What steps do I need to take if I want to detect a certain action in my own video data?
@okankop @wei-tim First of all, thanks for open-sourcing your great work. I had a few queries when trying to run inference on cameras.
1. A little error needs to be fixed:
https://github.com/wei-tim/YOWO/blob/master/model.py#L73
nn.Conv2d(1024, 5*(opt.n_classes+4+1), kernel_size=1, bias=False)
should be
nn.Conv2d(1024, 5*(int(opt.n_classes)+4+1), kernel_size=1, bias=False)
2. In the dataloader
https://github.com/wei-tim/YOWO/blob/master/dataset.py#L63
# (self.duration, -1) + self.shape = (8, -1, 224, 224)
clip = torch.cat(clip, 0).view((self.clip_duration, -1) + self.shape).permute(1, 0, 2, 3)
the input image clip is a 4-dimensional tensor with shape (3, D, H, W).
However, in the model's forward pass, the last frame of the clip is sliced from the input with input[:, :, -1, :, :],
which assumes the clip is a 5-dimensional tensor with shape (batch_size, 3, D, H, W).
https://github.com/wei-tim/YOWO/blob/master/model.py#L81,
def forward(self, input):
x_3d = input # Input clip
x_2d = input[:, :, -1, :, :] # Last frame of the clip that is read
Thank you for your open-source code!
I'm interested in the activation maps for the 2D and 3D backbones of the trained model, but the relevant code is not released. Can you share the CAM code?
Thanks for sharing your code. When I run
python train.py --dataset ucf101-24 --data_cfg cfg/ucf24.data --cfg_file cfg/ucf24.cfg --n_classes 24 --backbone_3d resnext101 --backbone_2d darknet --backbone_2d_weights weights/yolo.weights --resume_path weights/yowo_ucf101-24_16f_best.pth
the terminal shows:
RuntimeError: Error(s) in loading state_dict for YOWO:
Missing key(s) in state_dict: "backbone_2d.models.0.conv1.weight", "backbone_2d.models.0.bn1.weight",
In the code, why is the confidence (detection confidence times class confidence) subtracted from 1 before doing NMS? Why is this flipping necessary?
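Not an authoritative answer, but the subtraction is most likely just a sorting trick rather than a semantic flip: sorting (1 - conf) in ascending order gives exactly the same ordering as sorting conf in descending order, which is what NMS needs (process the highest-confidence boxes first). A tiny sketch of the equivalence:
# Tiny check that sorting by (1 - conf) ascending == sorting by conf descending.
import torch

conf = torch.tensor([0.9, 0.2, 0.75, 0.4])
_, idx_flip = torch.sort(1 - conf)                  # ascending (default)
_, idx_desc = torch.sort(conf, descending=True)
print(idx_flip.tolist(), idx_desc.tolist())         # both: [0, 2, 3, 1]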
Thank you for sharing your code!
But I found there are no labels for the J-HMDB-21 dataset in the link.
Could you release the J-HMDB-21 labels in the form 'labels/video_name/img_idx.txt' (which is what the training code needs)?
Then we could run the training code for J-HMDB-21 directly!
Thanks for open-sourcing this, but I'm confused about the environment requirements.
Could you please tell me what environment is needed to run YOWO, especially the PyTorch version?
I didn't find anything about it in the README.
Thanks a lot
When I train ucf24, I get the following error:
Traceback (most recent call last):
File "train.py", line 322, in <module>
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/chase/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/chase/YOWO/region_loss.py", line 229, in forward
nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
File "/home/chase/YOWO/region_loss.py", line 126, in build_targets
iou = bbox_iou(gt_box, pred_box, x1y1x2y2=False) # best_iou
File "/home/chase/YOWO/utils.py", line 72, in bbox_iou
cw = w1 + w2 - uw
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.FloatTensor for argument #3 'other'