gurkirt / realtime-action-detection

This repository hosts the code for our real-time action detection paper.

License: Other

Python 36.19% MATLAB 63.81%
action-recognition action-detection ssd pytorch online real-time ucf101 detection

realtime-action-detection's Introduction

Real-time online Action Detection: ROAD

NEWS Feb 2021: I released 3D-RetinaNet, which is purely PyTorch and Python code. Linking and trimming are also implemented in Python, although on actionness scores. Hop over to the 3D-RetinaNet repo; for the UCF24 dataset it might be more useful than this one in some cases.

An implementation of our work (Online Real-time Multiple Spatiotemporal Action Localisation and Prediction) published in ICCV 2017.

Originally, we used the Caffe implementation of SSD-V2 for the publication. I have forked the version of SSD-CAFFE that I used to generate the results in the paper; you can try that repo if you prefer Caffe, otherwise I would recommend this version. This implementation deviates a bit from the original work: it performs slightly better at lower IoU thresholds and slightly worse at higher ones, while the tube generation part is the same as in the original implementation. In particular, I found this SSD implementation to be slightly worse at IoU >= 0.5 on the UCF24 dataset.

I decided to release the code with a PyTorch implementation of SSD, because it is easier to reuse than the Caffe version (where installation alone can be a big issue). We build on the PyTorch implementation of SSD by Max deGroot and Ellis Brown. We made a few changes (e.g., different learning rates for biases and weights during optimisation) and simplified some parts to accommodate the UCF24 dataset.

Upgrade to pytorch 1.2

The previous version was written for PyTorch 0.2; the current one works with PyTorch 1.2. Only single-stream RGB training and testing have been verified with the current version, but everything else should work as well. Here is the link to the previous version.

Table of Contents

Installation

  • Install PyTorch by selecting your environment on the website and running the appropriate command (the current code targets PyTorch 1.2; see the upgrade note above).
  • Please install cv2 as well. I recommend using Anaconda with Python 3.6 and its OpenCV package.
  • You will also need MATLAB. If you have a distributed computing licence it will be faster; otherwise it is still fine, just replace parfor with a plain for in the MATLAB scripts. I would be happy to accept a PR for a Python version of this part.
  • Clone this repository.
    • Note: we currently only support Python 3+ on Linux (originally with PyTorch 0.2; see the PyTorch 1.2 upgrade note above).
  • We currently only support UCF24 with the revised annotations released with our paper. We will try to add JHMDB21 as soon as possible (no promises); in the meantime, you can check out our BMVC 2016 code to get started with experiments on JHMDB21.
  • To reproduce the same training and evaluation setup, we provide RGB images extracted from the videos along with optical-flow images (both Brox flow and real-time flow) computed for the UCF24 dataset. You can download them from my Google Drive link.
  • We also support Visdom for visualising the loss and frame mean AP on a subset during training (a minimal plotting sketch follows this bullet).
    • To use Visdom in the browser:
    # First install Python server and client 
    pip install visdom
    # Start the server (probably in a screen or tmux)
    python -m visdom.server --port=8097
    • Then (during training) navigate to http://localhost:8097/ (see the Training section below for more details).
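
For reference, here is a minimal sketch of how training code could push a loss value to the Visdom server started above; the window title and the dummy loss values are purely illustrative and are not taken from train-ucf24.py.

    # Minimal Visdom sketch: append scalar losses to a line plot.
    # Assumes the server started above is listening on port 8097;
    # the dummy losses and window title are illustrative only.
    import numpy as np
    import visdom

    vis = visdom.Visdom(port=8097)
    win = None
    for iteration, loss in enumerate([2.3, 1.9, 1.6, 1.4]):  # dummy loss values
        X, Y = np.array([iteration]), np.array([loss])
        if win is None:
            win = vis.line(X=X, Y=Y, opts=dict(title='training loss'))
        else:
            vis.line(X=X, Y=Y, win=win, update='append')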

Dataset

To make things easy, we provide RGB images extracted from the videos along with optical-flow images (both Brox flow and real-time flow) computed for the UCF24 dataset; you can download them from my Google Drive link. It is an almost 6 GB tarball; download it and extract it wherever you are going to store your experiments.

UCF24Detection is a dataset loader class in data/ucf24.py that inherits torch.utils.data.Dataset, making it fully compatible with the torchvision.datasets API.
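
As a rough usage sketch only: the constructor arguments below are placeholders (check data/ucf24.py for the real signature), and variable-size detection targets typically need a custom collate_fn.

    # Rough sketch, not the repo's exact training code: because the dataset
    # class inherits torch.utils.data.Dataset, it plugs into a DataLoader.
    # Constructor arguments are placeholders -- see data/ucf24.py for the
    # real signature; variable-size box targets usually need a collate_fn.
    from torch.utils.data import DataLoader
    from data.ucf24 import UCF24Detection  # class referenced in this README

    dataset = UCF24Detection(root='/home/user/ucf24/')  # placeholder arguments
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

    for batch in loader:
        pass  # images and frame-level box/label annotations feed SSD training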

Training SSD

  • Requires the fc-reduced VGG-16 model weights; they are already included in the dataset tarball under the train_data subfolder.
  • By default, we assume that you have downloaded that dataset.
  • To train SSD using the training script, simply specify the parameters listed in train-ucf24.py as flags or change them manually.

Let's assume that you extracted the dataset into the /home/user/ucf24/ directory; then your training command from the root directory of this repo would be:

CUDA_VISIBLE_DEVICES=0 python3 train-ucf24.py --data_root=/home/user/ucf24/ --save_root=/home/user/ucf24/ 
--visdom=True --input_type=rgb --stepvalues=30000,60000,90000 --max_iter=120000

To train on flow inputs:

CUDA_VISIBLE_DEVICES=0 python3 train-ucf24.py --data_root=/home/user/ucf24/ --save_root=/home/user/ucf24/ 
--visdom=True --input_type=brox --stepvalues=70000,90000 --max_iter=120000

Different parameter settings in train-ucf24.py will result in different performance.

  • Note:
    • The network occupies almost 9.2 GB of VRAM on a GPU; we used a 1080Ti for training, and normal training takes about 32-40 hours.
    • For instructions on Visdom usage/installation, see the Installation section. By default, it is off.
    • If you prefer not to use Visdom, you can always track training via the log file saved under the save_root directory.
    • During training, a checkpoint is saved every 10K iterations, and we also log its frame-level mean AP on a subset of 22K test images (a checkpointing sketch follows this list).
    • We recommend training for 120K iterations for all the input types.
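
A minimal sketch of what that periodic checkpointing could look like, with file names modelled on the released checkpoints (e.g. rgb-ssd300_ucf24_120000); the helper name is made up and this is not the repo's exact code.

    # Illustrative checkpointing helper, not the repo's exact code: save the
    # network weights every 10K iterations under save_root, using the same
    # naming pattern as the released .pth files.
    import os
    import torch

    def maybe_save_checkpoint(net, iteration, save_root, input_type='rgb'):
        if iteration > 0 and iteration % 10000 == 0:
            path = os.path.join(save_root,
                                '{}-ssd300_ucf24_{}.pth'.format(input_type, iteration))
            torch.save(net.state_dict(), path)  # weights only
            return path
        return None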

Building Tubes

To generate the tubes and evaluate them, you first need frame-level detections; then navigate to online-tubes to generate tubes using I01onlineTubes and I02genFusedTubes.

produce frame-level detection

Once you have a trained network, you can use test-ucf24.py to generate frame-level detections. To evaluate SSD using the test script, simply specify the parameters listed in test-ucf24.py as flags or change them manually, e.g.:

CUDA_VISIBLE_DEVICES=0 python3 test-ucf24.py --data_root=/home/user/ucf24/ --save_root=/home/user/ucf24/
--input_type=rgb --eval_iter=120000

To evaluate the optical-flow models:

CUDA_VISIBLE_DEVICES=0 python3 test-ucf24.py --data_root=/home/user/ucf24/ --save_root=/home/user/ucf24/
--input_type=brox --eval_iter=120000

Note:

  • By default, it will compute and store the frame-level detections, and also compute the frame mean AP, for the models saved at 90K and 120K iterations.
  • There is a log file created for each iteration's frame-level evaluation.
Build tubes

You will need the frame-level detections, and you will need to navigate to online-tubes.

Step 1: specify data_root, save_root and iteration_num_* in I01onlineTubes and I02genFusedTubes;
Step 2: run I01onlineTubes and I02genFusedTubes in MATLAB; this prints the video mean AP and saves the results in a .mat file.

Results are saved in save_root/results.mat. Additionally, action paths and action tubes are stored under the save_root/ucf24/* folders.

  • NOTE: I01onlineTubes and I02genFusedTubes not only produce video-level mAP; they also produce video-level classification accuracy on 24 classes of UCF24.
frame-meanAP

To compute frame mAP, you can use the frameAP.m script. You will need to specify data_root and save_root. Use this script, not the Python one, to produce results for publication; the two are almost identical, but their AP computation from precision and recall differs slightly.
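
That difference comes down to how AP is computed from the precision-recall curve. For reference only, here is the generic all-point interpolation; it is not necessarily what either script does.

    # Generic average precision from precision/recall arrays, shown purely as
    # a reference point; frameAP.m and the Python evaluation may differ from
    # this in their interpolation details.
    import numpy as np

    def average_precision(recall, precision):
        mrec = np.concatenate(([0.0], recall, [1.0]))
        mpre = np.concatenate(([0.0], precision, [0.0]))
        # make precision monotonically non-increasing from right to left
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = max(mpre[i - 1], mpre[i])
        # integrate the curve where recall changes
        idx = np.where(mrec[1:] != mrec[:-1])[0]
        return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))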

Performance

UCF24 Test

The table below is similar to Table 1 in our paper. It contains more information than the one in the paper, mostly about this implementation.

| Method (IoU threshold =) | 0.20 | 0.50 | 0.75 | 0.5:0.95 | frame-mAP@0.5 | accuracy (%) |
|---|---|---|---|---|---|---|
| Peng et al. [3] RGB+BroxFLOW | 73.67 | 32.07 | 00.85 | 07.26 | -- | -- |
| Saha et al. [2] RGB+BroxFLOW | 66.55 | 36.37 | 07.94 | 14.37 | -- | -- |
| Singh et al. [4] RGB+FastFLOW | 70.20 | 43.00 | 14.10 | 19.20 | -- | -- |
| Singh et al. [4] RGB+BroxFLOW | 73.50 | 46.30 | 15.00 | 20.40 | -- | 91.12 |
| This implementation [4] RGB | 72.08 | 40.59 | 14.06 | 18.48 | 64.96 | 89.78 |
| This implementation [4] FastFLOW | 46.32 | 15.86 | 00.20 | 03.66 | 22.91 | 73.08 |
| This implementation [4] BroxFLOW | 68.33 | 31.80 | 02.83 | 11.42 | 47.26 | 85.49 |
| This implementation [4] RGB+FastFLOW (boost-fusion) | 71.38 | 39.95 | 11.36 | 17.47 | 65.66 | 89.78 |
| This implementation [4] RGB+FastFLOW (union-set) | 73.68 | 42.08 | 12.45 | 18.40 | 61.82 | 90.55 |
| This implementation [4] RGB+FastFLOW (mean fusion) | 75.48 | 43.19 | 13.05 | 18.87 | 64.35 | 91.54 |
| This implementation [4] RGB+BroxFLOW (boost-fusion) | 73.34 | 42.47 | 12.23 | 18.67 | 68.31 | 90.88 |
| This implementation [4] RGB+BroxFLOW (union-set) | 75.01 | 44.98 | 13.89 | 19.76 | 64.97 | 90.77 |
| This implementation [4] RGB+BroxFLOW (mean fusion) | 76.43 | 45.18 | 14.39 | 20.08 | 67.81 | 92.20 |
| Kalogeiton et al. [5] RGB+BroxFLOW (stack of flow images, mean fusion) | 76.50 | 49.20 | 19.70 | 23.40 | 69.50 | -- |
Discussion:

Effect of training iterations: performance depends on the choice of learning rate and the number of iterations the model is trained for. If you train the SSD network at the initial learning rate for many iterations, it performs better at lower IoU thresholds, which is what happens here. In the original work, using the Caffe implementation of SSD, I trained the network with a learning rate of 0.0005 for the first 30K iterations, then dropped the learning rate by a factor of 5 (divided by 5) and trained further up to 45K iterations. In this implementation, all models are trained for 120K iterations; the initial learning rate is set to 0.0005 and is dropped by a factor of 5 after 70K and 90K iterations.
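
In PyTorch terms, this schedule (initial learning rate 0.0005, divided by 5 after 70K and 90K iterations, 120K iterations in total) corresponds to something like the sketch below; the stand-in network and the momentum/weight-decay values are illustrative assumptions, not read from train-ucf24.py.

    # Sketch of the learning-rate schedule described above; the stand-in
    # network and the momentum/weight-decay values are illustrative only.
    import torch

    net = torch.nn.Linear(10, 10)  # stand-in for the SSD network
    optimizer = torch.optim.SGD(net.parameters(), lr=0.0005,
                                momentum=0.9, weight_decay=5e-4)
    # gamma=0.2 divides the learning rate by 5 at each milestone (in iterations)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[70000, 90000],
                                                     gamma=0.2)
    for iteration in range(120000):
        # ... forward pass, loss.backward(), optimizer.step() would go here ...
        scheduler.step()  # step once per iteration, not per epoch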

Kalogeiton et al. [5] use mean fusion, so I thought we could try it in our pipeline, which was very easy to incorporate. It is evident from the table above that mean fusion performs better than the other fusion techniques. Note that their method also relies on multiple frames as input, in addition to post-processing of bounding-box coordinates at the tubelet level.
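
Schematically, mean fusion just averages the per-class scores of corresponding detections from the appearance and flow streams; how detections are matched across streams is handled in the MATLAB tube code, so the sketch below simply assumes already-aligned score arrays.

    # Schematic of mean fusion: average aligned (num_boxes, num_classes) score
    # arrays from the RGB and flow streams; box matching is not shown here.
    import numpy as np

    def mean_fuse(scores_rgb, scores_flow):
        assert scores_rgb.shape == scores_flow.shape
        return 0.5 * (scores_rgb + scores_flow)

    fused = mean_fuse(np.random.rand(10, 24), np.random.rand(10, 24))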

Real-time aspect:

This implementation is mainly focused on producing the best numbers (mAP) in the simplest manner; it can be modified to run faster. A few aspects would need changes:

  • NMS is performed once in Python and then again in MATLAB; it should be done only once, on the GPU, in Python (see the sketch after this list).
  • Most of the time during tube generation is spent on disk operations, which could be eliminated completely.
  • IoU computation during action-path building is done multiple times just to keep the code clean; this could be handled more efficiently.
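
For the first point, recent torchvision already ships a GPU-capable NMS; a minimal sketch (not part of this repo, which predates torchvision.ops) looks like this:

    # Sketch of doing NMS once on the GPU with torchvision.ops (>= 0.3);
    # dummy boxes and scores, not code from this repo.
    import torch
    from torchvision.ops import nms

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    boxes = torch.rand(100, 4, device=device)   # (x1, y1, x2, y2)
    boxes[:, 2:] += boxes[:, :2]                # ensure x2 > x1 and y2 > y1
    scores = torch.rand(100, device=device)
    keep = nms(boxes, scores, iou_threshold=0.45)
    boxes, scores = boxes[keep], scores[keep]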

Contact me if you want to implement the real-time version. A proper real-time version would require converting the MATLAB part into Python. I presented the timing of the individual components in the paper, which still holds true.

Online-Code

Thanks to Zhujiagang, a MATLAB version of the online demo-video creation code is available under the matlab-online-display directory.

Also, Feynman27 pushed a python version of the incremental_linking to his fork of this repo at: https://github.com/Feynman27/realtime-action-detection

Extras

To use a pre-trained model, download the pre-trained weights from the links given below and modify test-ucf24.py to point to the downloaded weights (a loading sketch follows the list below).

Download pre-trained networks
  • Currently, we provide the following PyTorch models:
    • SSD300 trained on ucf24 ; available from my google drive
      • appearance model trained on rgb-images (named rgb-ssd300_ucf24_120000)
      • accurate flow model trained on brox-images (named brox-ssd300_ucf24_120000)
      • real-time flow model trained on fastOF-images (named fastOF-ssd300_ucf24_120000)
  • These models can be used to reproduce the table above, which is almost identical to the one in our paper.
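
For reference, loading one of these checkpoints follows the usual PyTorch pattern; the path below is a placeholder, and num_classes is 25 for UCF24 (24 action classes plus background).

    # Loading a released checkpoint; the path below is a placeholder.
    # num_classes = 25 for UCF24 (24 action classes + background).
    import torch
    from ssd import build_ssd  # network builder used in this repo

    num_classes = 25
    net = build_ssd(300, num_classes)
    state_dict = torch.load('/home/user/ucf24/rgb-ssd300_ucf24_120000.pth',
                            map_location='cpu')
    net.load_state_dict(state_dict)
    net.eval()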

TODO

  • Incorporate JHMDB-21 dataset
  • Convert matlab part into python (happy to accept PR)

Citation

If this work has been helpful in your research, please consider citing [1] and [4]:

  @inproceedings{singh2016online,
    title={Online Real time Multiple Spatiotemporal Action Localisation and Prediction},
    author={Singh, Gurkirt and Saha, Suman and Sapienza, Michael and Torr, Philip and Cuzzolin, Fabio},
    booktitle={ICCV},
    year={2017}
  }

References

  • [1] W. Liu et al., SSD: Single Shot MultiBox Detector. ECCV 2016.
  • [2] S. Saha, G. Singh, M. Sapienza, P. H. S. Torr, and F. Cuzzolin, Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos. BMVC 2016.
  • [3] X. Peng and C. Schmid, Multi-region Two-stream R-CNN for Action Detection. ECCV 2016.
  • [4] G. Singh, S. Saha, M. Sapienza, P. H. S. Torr, and F. Cuzzolin, Online Real-time Multiple Spatiotemporal Action Localisation and Prediction. ICCV 2017.
  • [5] V. Kalogeiton, P. Weinzaepfel, V. Ferrari, and C. Schmid, Action Tubelet Detector for Spatio-Temporal Action Localization. ICCV 2017.
  • Original SSD Implementation (CAFFE)
  • A huge thanks to Max deGroot and Ellis Brown for the PyTorch implementation of SSD


realtime-action-detection's Issues

Inquiry about runtime

Hi,

I had a question about the reported runtime. Did you use any batch processing, or is the reported number for processing one frame at a time? Additionally, the runtime you report for DIS-Fast is 7 ms, while the FlowNet 2.0 paper reports 70 ms. I am aware you used CPU multiprocessing, but was the parallelisation applied across different images or within the same image in order to get a 10x speed-up?

Thanks.

Testing Pre-trained models

Hi, thanks for the great work!

I was testing the model using the pre-trained weights and came across this error:
Screenshot from 2019-11-11 14-47-11

How to solve this error? Thank you.

Pytorch 0.2 doesn't work with newest Visdom

When getting the training started, I get something along the lines of

  File "/scratch1/anaconda2/envs/road/lib/python3.6/site-packages/visdom/__init__.py", line 1171, in line
    assert Y.ndim == 1 or Y.ndim == 2, 'Y should have 1 or 2 dim'
AttributeError: 'torch.DoubleTensor' object has no attribute 'ndim'

Looking into it (https://github.com/facebookresearch/visdom/blob/master/py/visdom/__init__.py), their support for PyTorch < 0.4 is deprecated. What version worked here?

confusion about temporal labelling

It seems like we can get better video-level performance without the temporal labelling process, so why do you use temporal labelling?

How could I load the trained model

Thank you for your excellent work. I wonder how I could load the trained model you have uploaded, such as brox-ssd300_ucf24_120000.pth.

Best Wishes.

Tried to test the models, but got wrong scores, thus wrong boxes

I'm new to PyTorch. I tried to use your pretrained model (RGB or fast-OF) to test the performance, but the output of the net does not seem correct: the first column of the output conf_scores is large (nearly 0.99), while the scores in the other columns are all very small (far less than 0.01). I use the UCF24 dataset and my PyTorch code is:
# imports implied by the snippet (not included in the original post)
import torch
import torchvision
import torch.backends.cudnn as cudnn
import matplotlib.pyplot as plt
from torch.autograd import Variable
from torchvision import transforms, utils
from ssd import build_ssd

img_data = torchvision.datasets.ImageFolder('/home/vision1/hdd/dataset/test_data/',
                                            transform=transforms.Compose([
                                                transforms.Scale(300),
                                                transforms.CenterCrop(300),
                                                transforms.ToTensor()]))
torch.set_default_tensor_type('torch.cuda.FloatTensor')
batch_size = 20
conf_thresh = 0.01
nms_thresh = 0.45
topk = 20
print(len(img_data))
data_loader = torch.utils.data.DataLoader(img_data, batch_size=batch_size, shuffle=False)
print(len(data_loader))

def show_batch(imgs):
    grid = utils.make_grid(imgs, nrow=5)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))
    plt.title('Batch from dataloader')

trained_model_path = '/home/vision1/hdd/models/rgb-ssd300_ucf24_120000.pth'

num_classes = 25  # 24 + 1 background
net = build_ssd(300, num_classes)  # initialize SSD
net.load_state_dict(torch.load(trained_model_path))
net.eval()

net = net.cuda()
cudnn.benchmark = True
print('Finished loading model %d !' % 1)
torch.cuda.synchronize()

for i, (images, batch_y) in enumerate(data_loader):
    width = images.size()[2]
    height = images.size()[3]
    print(i, images.size(), batch_y.size())
    images = Variable(images.cuda())
    batch_y = batch_y.cuda(async=True)
    output = net(images)
    print(len(output))
    loc_data = output[0]
    conf_preds = output[1]
    prior_data = output[2]

the problem about training SSD

Hello,
When I am trying to train SSD, there is an error about a tensor type not being suitable for the operation.
For instance:
File "/home/xlp/.local/lib/python3.5/site-packages/torch/tensor.py", line 320, in rdiv
return self.reciprocal() * other
RuntimeError: reciprocal is not implemented for type torch.cuda.LongTensor
How should I solve this problem? Thanks.

image or video demo

I have not found a way to generate a video demo. Can you give me a way to implement a video demo? Looking forward to your reply.

Problem in running

Hello,
Is CUDA 9 with a GTX 1050, 16 GB RAM and 8 GB GPU memory sufficient to run the code?
I am getting a CUDA out-of-memory error even when running test-ucf24.py with the pre-trained model.
Also, PyTorch version 0.3 is not available for Windows.
Meanwhile, could you please share the detections (Google Drive) obtained from test-ucf24.py? It would be a great help to proceed further.

Error running I01onlineTubes

Output:

Video List 01 :: /scratch1/road/ucf24/splitfiles/testlist01.txt
AnnotFile :: /scratch1/road/ucf24/splitfiles/finalAnnots.ma.mat
Image  Dir :: /scratch1/road/ucf24/rgb-images/
Detection Dir:: /scratch1/road/ucf24/detections/CONV-rgb-01-120000/
Actionpath Dir:: /scratch1/road/ucf24/actionPaths/CONV-rgb-01-120000-score-3-0010/
Tube Dir:: /scratch1/road/ucf24/actionTubes/CONV-rgb-01-120000-score-3-0010/
Get both lis is /scratch1/road/ucf24/splitfiles/testlist01.txt
done computing action paths
Error using load
Unable to read file '/scratch1/road/ucf24/splitfiles/finalAnnots.ma.mat'. No such file or directory.

Error in I01onlineTubes>gettubes (line 99)
    annot = load(dopts.annotFile);

Error in I01onlineTubes (line 65)
        result_cell = gettubes(opts);

Looking in /scratch1/road/ucf24/splitfiles/, there is finalAnnots.mat. Is this a typo?

Side question: How can I visualize the tubes?

the problem of training SSD

I'm using PyTorch 0.2; is this what causes the error:
RuntimeError: unable to write to file </torch_3690_106607550> at /pytorch/torch/lib/TH/THAllocator.c:271

The details are:
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 44, in _worker_loop
data_queue.put((idx, samples))
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 348, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
fd, size = storage.share_fd()
RuntimeError: unable to write to file </torch_3690_106607550> at /pytorch/torch/lib/TH/THAllocator.c:271

About inference

I am trying to test on my custom images but I didn't find the inference code.
Thanks

Convert matlab part into python query

Hello,

Thanks for making this code available - I'm looking forward to using it. I was surprised to see that you used Matlab.

You mention in your Todo that you plan to Convert matlab part into python. I was wondering if anyone has gotten started on doing this, and if there is a git repo for the conversion that I could contribute to.

Thanks again,

Evaluation metric

I have checked the paper and the code in utils/evaluation.py; if I understood the code correctly, you are using frame mAP, not video mAP. So does this mean that the reported mAP in the paper is frame mAP?

One Question about 'import build_ssd'

Hello, I have just begun to learn PyTorch, and I met this problem when trying to execute the code:

Traceback (most recent call last):
File "test-ucf24.py", line 15, in
from ssd import build_ssd
File "/home/rfb/0RealTimeActionDetection/realtime-action-detection/ssd.py", line 205
mbox[str(size)], num_classes), num_classes)
SyntaxError: only named arguments may follow *expression

So, could anyone give me a hand? Thanks a lot!

Online code

I've run and read your code; excellent work. Have you released your online real-time testing code, i.e., given a video, frames are fed into the model in order and the corresponding detection results, including bounding boxes and predicted labels, are displayed? Vkalogeiton (#1) has released code entirely in Python, but without an online version; I plan to modify that code to realise online deployment.

Question about label/

Hello,
I noticed that the label files do not match the RGB images. For example, there are 141 RGB images in the v_Basketball_g01_c01 folder, but only 43 label files in label/, ranging from 09 to 51. Can you explain the structure of the training data?
Thanks.

problem with loading pretrained weights for custom dataset with different number of classes

Hi,
Thank you for providing the code. I am working on a dataset which has 11 classes (10 + 1 background). However, I wish to use your pretrained weights and fine-tune them on my dataset. Unfortunately, when I initialize the model, it does not load the weights correctly. Could you kindly share your view on how to load your trained model for a different number of classes?
Thanks.
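
One common workaround (a sketch, not an official part of this repo) is to copy only the pretrained parameters whose shapes match the new model, leaving the class-dependent confidence layers at their fresh initialisation:

    # Possible workaround sketch: load only shape-matching parameters, so the
    # class-dependent conf.* layers stay randomly initialised when the number
    # of classes differs from the checkpoint.
    import torch

    def load_matching_weights(net, checkpoint_path):
        pretrained = torch.load(checkpoint_path, map_location='cpu')
        own_state = net.state_dict()
        matched = {k: v for k, v in pretrained.items()
                   if k in own_state and v.shape == own_state[k].shape}
        own_state.update(matched)
        net.load_state_dict(own_state)
        return [k for k in pretrained if k not in matched]  # skipped layers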

the problem training ssd

Thank you very much for replying to my question:

Process Process-2:
Process Process-3:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 44, in _worker_loop
data_queue.put((idx, samples))
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 348, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
fd, size = storage.share_fd()
RuntimeError: unable to write to file </torch_2378_106607550> at /pytorch/torch/lib/TH/THAllocator.c:271
Traceback (most recent call last):
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 44, in _worker_loop
data_queue.put((idx, samples))
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 348, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
fd, size = storage.share_fd()
RuntimeError: unable to write to file </torch_2379_3620478786> at /pytorch/torch/lib/TH/THAllocator.c:271
Process Process-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 44, in _worker_loop
data_queue.put((idx, samples))
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 348, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
fd, size = storage.share_fd()
RuntimeError: unable to write to file </torch_2377_106607550> at /pytorch/torch/lib/TH/THAllocator.c:271

With pytorch=1.2.0 I got an error. I hope to get your help:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 512, 38, 38]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).


Training with other dataset.

Hello,
This project is trained on the UCF24 dataset with 320x240 images.
Can I combine this dataset with one more class crawled from YouTube, with 1280x720 images?
I think it is possible because all the images will be resized to 300x300 as input to the SSD model.
But I saw some unexpected changes in the annotation file, pyannot.pkl.
Also, can we train the SSD with only one class? I don't know whether the performance will decrease or not.
Please give me some advice.
Thank you,
Khanh

Temporal labeling

  1. Hi, do you think your temporal labeling would also work for multi-label per-frame results?
  2. I found that the tube-results visualisation code in the folder 'matlab-online-display' does not include temporal labeling.

Info related to mean, nwsum-plus, and cat fusion

Hi @gurkirt, thank you for sharing the source code. I am running and reading the online-tubes part, and I want to ask a few questions about it.

  1. Fusion methods: do 'nwsum-plus' and 'cat' refer to 'boost-fusion' and 'union-set', respectively, as in your ICCV paper? And does the 'mean' fusion have the same meaning as in Kalogeiton et al. [5] (averaging the scores of anchors from the two streams, but retaining the bounding boxes generated by RGB)?

  2. It seems that there are some minor problems relating to the hard-coded parameters:
    2.1 In the script test-ucf24.py, line 188, should "args.listid = '099'" be "args.listid = '01'" for the UCF24 dataset?
    2.2 In the scripts initDatasetOpts.m, line 45, and initDatasetOptsFused.m, line 74, should "annots.mat" be "finalAnnots.mat"?

How did you visualize your optical flow picture

I am trying to compare my generated optical-flow images with the ground-truth ones you provide. I want to know how I should convert my 2-channel (x, y) flow matrix into a 3-channel RGB visualisation.
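
A standard recipe for this (not necessarily the exact one used to generate the provided flow images) maps flow direction to hue and magnitude to brightness via HSV:

    # Standard flow-to-RGB recipe: map flow angle to hue and magnitude to
    # value in HSV, then convert to RGB with OpenCV.
    import cv2
    import numpy as np

    def flow_to_rgb(flow):  # flow: H x W x 2 array of (dx, dy)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
        hsv[..., 0] = ang * 180 / np.pi / 2                              # hue: direction
        hsv[..., 1] = 255                                                # full saturation
        hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value: magnitude
        return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)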

problems when testing with just one action of the UCF24 you provided

Hello, when I use only one action of UCF24 for detection, I changed data/ucf24.py (class = {'BasketBall'})
and I got the following problem:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 482, in load_state_dict
own_state[name].copy_(param)
RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/THCTensorCopy.cu:31

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test-ucf24.py", line 224, in
main()
File "test-ucf24.py", line 198, in main
net.load_state_dict(torch.load(trained_model_path))
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 487, in load_state_dict
.format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named conf.0.weight, whose dimensions in the model are torch.Size([44, 512, 3, 3]) and whose dimensions in the checkpoint are torch.Size([100, 512, 3, 3]).

Detection box

Not really an issue, but I was wondering if the detection box is supposed to be this big as opposed to the ground truth (red box)? I just applied your trained model and was able to achieve the results quantitatively, but this doesn't look like the one in your video demo.

Screen Shot 2019-05-13 at 11 39 51 AM

Real time implementation

Hi, thank you for your work.
I'm interested to try the real-time implementation aspect of your work. How could I obtain the code, if possible? Thank you.

Appearance SSD result

Thank you for sharing your work.
The following animated GIF shows sampled images with the bounding boxes of the detected actions.

https://komputervision.files.wordpress.com/2019/01/bbox.gif

The detection is done by rgb-ssd300_ucf24_120000.pth (the pre-trained appearance SSD weights for 120000 iterations which you provided) with the following command.

CUDA_VISIBLE_DEVICES=0 python3 test-ucf24.py --data_root=/home/user/ucf24/ --save_root=/home/user/ucf24/ --input_type=rgb --eval_iter=120000

As you can see, the detection rate is very low except for a few classes such as 'surfing' and 'jetski'. Is this normal? It seems to fail to produce large candidate bounding boxes when a person is so close to the camera that he/she appears quite big; it only works well for 'surfing' and 'jetski', where the people in the images are small enough.
Of course, I still have to go through the remaining steps, i.e. fusion with the optical-flow SSD and tube generation. However, I was wondering whether I am doing something wrong or whether 120000 iterations are not enough and more training is needed.

Following are the per-class average precisions of the appearance SSD on 159289 test images.

Basketball : 0.005688362254987029
BasketballDunk : 0.059442425139795585
Biking : 0.027080132217020384
CliffDiving : 0.09571676832006965
CricketBowling : 0.007188381472302061
Diving : 0.0027860880597105134
Fencing : 0.00010385603971070008
FloorGymnastics : 0.10125374519594244
GolfSwing : 0.0
HorseRiding : 0.012732889758008315
IceDancing : 1.0259629201686981e-05
LongJump : 0.018733521925503246
PoleVault : 0.011395145119329191
RopeClimbing : 0.022515197847725814
SalsaSpin : 0.0
SkateBoarding : 0.033668873961642254
Skiing : 0.35775074128159023
Skijet : 0.5063684722953027
SoccerJuggling : 0.0
Surfing : 0.6024563891531864
TennisSwing : 0.06753013234296809
TrampolineJumping : 2.3870315645686263e-06
VolleyballSpiking : 0.000479302362317051
WalkingWithDog : 0.0002781709791225087

MEANAP:::=>0.08054922

How to detect action on new (unseen) video frames

Can anyone help me with how to detect an action and classify it into one of the given 24 classes for new (unseen) video frames? I've gone through the code; it does this for the UCF-24 dataset, but there was no option to provide our own data, and the code seems to be tailored to load UCF-24.

Frame ratios for each class

#4500ratios = np.asarray([1.1, 0.8, 4.7, 1.4, 0.9, 2.6, 2.2, 3.0, 3.0, 5.0, 6.2, 2.7,

How do you change the above class ratios to the following?
ratios = np.asarray([1.03, 0.75, 4.22, 1.32, 0.8, 2.36, 1.99, 2.66, 2.68, 4.51, 5.56, 2.46, 3.17, 2.76, 3.89, 2.28, 4.01, 3.08, 6.06, 3.28, 1.51, 3.05, 0.6, 3.84])

Is there any performance improvement after this change?

New dataset

Hello @gurkirt, thanks for your code. I have a few questions about it.
1. What is stored in pyannot.pkl for the UCF24 dataset? I saw the dictionaries but don't really understand their meaning.
2. How can I use my own dataset instead of UCF24 to run the code you provided? My dataset includes a few folders with many pictures in each folder.

Question on ground truth for action recognition

Hi @gurkirt ,

Firstly, thank you for publishing your work.

Second, I have a question and I hope we can discuss it here.

  1. In almost all recent papers on action recognition, the original frame images are resized from (320*240*3) to other sizes; in your work it is (300*300*3).

That makes me wonder about the ground truth. The ground truth, which consists of four pixel coordinates, is defined for the original video frames (320*240*3); when resizing the images, did you also rescale the ground truth to the (300*300*3) frame, or did you use the original one? Because these algorithms usually evaluate accuracy based on ground-truth overlap, I think it would make a big difference in the result.
By the way, do you know which version of the ground truth other works used for the 24 classes of UCF-101 and for other datasets? I am asking because I found that UCF-101 has two versions of the ground truth: the original one and the one you remade for THUMOS 2015. This gets me confused.

  2. Could you explain more about the SSD training step. What is the purpose of the training step?
    Your work uses VGG-16 pre-trained on VOC (21 classes) to detect the UCF-24 classes (24 classes), not to detect the person directly?

Thank you.
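
On the resizing point, the standard practice (stated generally, not as a claim about this repo's internals) is either to keep boxes in normalised [0, 1] coordinates, which makes them resolution-independent, or to scale pixel coordinates by the same per-axis factors as the image:

    # Tiny illustration of rescaling a pixel-coordinate box when a 320x240
    # frame is resized to 300x300; many SSD pipelines avoid this by storing
    # boxes in normalised [0, 1] coordinates instead.
    def scale_box(box, src_wh=(320, 240), dst_wh=(300, 300)):
        sx, sy = dst_wh[0] / src_wh[0], dst_wh[1] / src_wh[1]
        x1, y1, x2, y2 = box
        return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

    print(scale_box((32, 24, 160, 120)))  # -> (30.0, 30.0, 150.0, 150.0)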

About reported results

Hi. I'm just confused about your reported results. Your paper says that using RGB images only, the results are as follows:
@0.2: 69.8, @0.5: 40.9, @0.75: 15.5, @Avg: 18.7
But from the table in this repo:
This implementation [4] RGB: @0.2: 72.08, @0.5: 40.59, @0.75: 14.06, @Avg: 18.48

Why is that so?
Thank you.
