
yowo's People

Contributors

okankop, wei-tim


yowo's Issues

Some questions about CFAM

In the paper, Section 3.1.3, formula (4): why is M transposed?
In my opinion, M_ij is the j-th feature's impact on the i-th feature, so after including this impact the i-th feature should be the sum over j of M_ij times the j-th feature, and M may not need to be transposed.
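
For what it's worth, the two matrix forms differ only in which index is summed over, so whether the transpose is needed depends entirely on how the paper defines M_ij (the notation below is mine, not the paper's):

    $$(MF)_i = \sum_j M_{ij} F_j, \qquad (M^{\top} F)_i = \sum_j M_{ji} F_j$$

If M_ij is already "the impact of the j-th feature on the i-th feature", the un-transposed product MF gives the updated i-th feature; the transpose is only required if M_ij is defined the other way around.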

video_mAP

1. When I want to calculate video_mAP, I cannot find the testlist_video.txt file.

2. I generated one myself; its entries look like walk/50_FIRST_DATES_walk_f_cm_np1_le_med_33. It starts to run, but fails with the following error:

iou is: 0.05
Traceback (most recent call last):
File "video_mAP.py", line 327, in
video_mAP_jhmdb()
File "video_mAP.py", line 320, in video_mAP_jhmdb
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/chase/YOWO/eval_results.py", line 244, in evaluate_videoAP
ap = video_ap_one_class(gt, pred_cls, iou_thresh, bTemporal, cls_len)
File "/home/chase/YOWO/eval_results.py", line 155, in video_ap_one_class
iou = np.array([iou3dt(np.array(g), boxes[:, :5]) for g in gt_this])
File "/home/chase/YOWO/eval_results.py", line 155, in
iou = np.array([iou3dt(np.array(g), boxes[:, :5]) for g in gt_this])
File "/home/chase/YOWO/utils.py", line 195, in iou3dt
return iou3d( b1[np.where(b1[:,0]==tmin)[0][0]:np.where(b1[:,0]==tmax)[0][0]+1,:] , b2[np.where(b2[:,0]==tmin)[0][0]:np.where(b2[:,0]==tmax)[0][0]+1,:] ) * temporal_inter / temporal_union
File "/home/chase/YOWO/utils.py", line 184, in iou3d
assert b1.shape[0] == b2.shape[0]
AssertionError
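
For context, here is a rough sketch of what iou3dt appears to compute from the call shown above (my reading, not the actual YOWO implementation): the spatial IoU is averaged over the frames the two tubes share, then scaled by the temporal IoU, which is why iou3d asserts that both per-frame box arrays cover the same number of frames.

    import numpy as np

    def tube_iou_sketch(b1, b2):
        """Rough spatio-temporal tube IoU; rows are [frame_idx, x1, y1, x2, y2]."""
        tmin = max(b1[0, 0], b2[0, 0])       # start of temporal overlap
        tmax = min(b1[-1, 0], b2[-1, 0])     # end of temporal overlap
        if tmax < tmin:
            return 0.0
        temporal_inter = tmax - tmin + 1
        temporal_union = max(b1[-1, 0], b2[-1, 0]) - min(b1[0, 0], b2[0, 0]) + 1

        # Keep only the frames inside the overlap; both slices must end up with
        # the same length, which is what the failing assert checks.
        s1 = b1[(b1[:, 0] >= tmin) & (b1[:, 0] <= tmax)]
        s2 = b2[(b2[:, 0] >= tmin) & (b2[:, 0] <= tmax)]
        assert s1.shape[0] == s2.shape[0]

        def frame_iou(a, b):
            ix1, iy1 = max(a[1], b[1]), max(a[2], b[2])
            ix2, iy2 = min(a[3], b[3]), min(a[4], b[4])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            area_a = (a[3] - a[1]) * (a[4] - a[2])
            area_b = (b[3] - b[1]) * (b[4] - b[2])
            return inter / (area_a + area_b - inter + 1e-9)

        spatial = np.mean([frame_iou(a, b) for a, b in zip(s1, s2)])
        return spatial * temporal_inter / temporal_union

So the assertion fires when the ground-truth tube and the detection tube end up listing a different number of frames inside their temporal overlap.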

Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed

I wanted to reduce the number of classes to 8, but after starting training I get this error.

void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [1,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "train.py", line 329, in <module>
    train(epoch)
  File "train.py", line 180, in train
    loss = region_loss(output, target)
  File "/home/nkise/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nkise/Documents/neuron/YOWO/region_loss.py", line 259, in forward
    loss_cls = self.class_scale * FL(cls, tcls)
  File "/home/nkise/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nkise/Documents/neuron/YOWO/FocalLoss.py", line 58, in forward
    self.alpha = self.alpha.cuda()
RuntimeError: CUDA error: device-side assert triggered
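
This device-side assert usually means a class index is out of range for the new number of classes: the scatter fill inside the focal loss indexes a tensor whose size is n_classes, so every label id must lie in [0, n_classes). A minimal sanity check over the label files, assuming the class id is the first field of each row and a ucf24-style labels/ layout (both are assumptions; adjust to your data):

    import glob

    n_classes = 8
    bad = []
    for path in glob.glob("ucf24/labels/**/*.txt", recursive=True):  # hypothetical path
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                cls = int(float(line.split()[0]))   # class id assumed to be field 0
                if not 0 <= cls < n_classes:
                    bad.append((path, cls))
    print(len(bad), "label rows fall outside [0, %d)" % n_classes)

After trimming the class list, the remaining ids also have to be remapped to 0..7; otherwise the scatter in FocalLoss indexes past the tensor and triggers exactly this assert.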

Batch Size vs Clip Duration

Hi, your work is amazing and thank you for sharing it.

If the prediction of an action considers a clip of "clip duration" frames, how can the network be trained with a batch size different from the clip duration?
Thank you for your time.
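
For what it's worth, batch size and clip duration are independent dimensions of the input tensor in this code base, so they do not have to match. A small shape sketch (the numbers are just an example; the last-frame slice follows model.py):

    import torch

    batch_size, channels, clip_duration, height, width = 4, 3, 16, 224, 224

    # One training batch: batch_size clips, each clip_duration frames long.
    clips = torch.randn(batch_size, channels, clip_duration, height, width)

    x_3d = clips                   # whole clips feed the 3D backbone
    x_2d = clips[:, :, -1, :, :]   # only the key (last) frame feeds the 2D backbone
    print(x_3d.shape, x_2d.shape)  # torch.Size([4, 3, 16, 224, 224]) torch.Size([4, 3, 224, 224])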

Release J-HMDB21 labels

Thank you for sharing your code!

But I found there are no labels for the J-HMDB-21 dataset at the link.

Could you release the J-HMDB-21 labels in the form 'labels/video_name/img_idx.txt' (which is needed by the training code)?

Then we could run the J-HMDB-21 training code directly!

RuntimeError:

When I train on ucf24, I get the following error:

Traceback (most recent call last):
File "train.py", line 322, in
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/chase/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/chase/YOWO/region_loss.py", line 229, in forward
nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
File "/home/chase/YOWO/region_loss.py", line 126, in build_targets
iou = bbox_iou(gt_box, pred_box, x1y1x2y2=False) # best_iou
File "/home/chase/YOWO/utils.py", line 72, in bbox_iou
cw = w1 + w2 - uw
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.FloatTensor for argument #3 'other'
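
The usual cause is that the ground-truth boxes end up as float64 (e.g. built via numpy) while the predictions are float32, and older PyTorch versions refuse to mix the two. A minimal illustration of the mismatch and the typical fix (this is a sketch, not the repository's exact code):

    import torch

    w1 = torch.tensor([0.20], dtype=torch.float64)   # ground-truth width, float64 via numpy
    w2 = torch.tensor([0.25], dtype=torch.float32)   # predicted width
    uw = torch.tensor([0.30], dtype=torch.float32)   # union width

    # On older PyTorch (before automatic type promotion) `w1 + w2 - uw` raises
    # exactly the reported RuntimeError. Casting the ground-truth side once,
    # e.g. where gt_box is built in build_targets, avoids it:
    w1 = w1.float()
    cw = w1 + w2 - uw
    print(cw)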

About backbone2d

Why don't you use YOLOv3 or YOLOv3-tiny as the 2D backbone? Can the 2D backbone network be replaced with them?

Inference pipeline

@okankop @wei-tim Thanks for open-sourcing the code base. I have a few queries:

  1. Is there an inference pipeline to test on a few sets of images, or should I use the "run_video_mAP_jhmdb.sh" script?
  2. Can I test on an offline video I recorded myself? If so, what is the command?
  3. In the "run_video_mAP_jhmdb.sh" script there are the parameters "--backbone_3d resnext101 \ --backbone_2d darknet \"; should we pass both the 3D and 2D backbone models?
  4. You ask us to download the model files; could you please specify which one in the reference folder to download, since there are different models with the same name?
  5. Can we run the code on CPU? If so, how should we specify that on the command line?

Thanks in advance.

Change clip size

Hi,
I get an error when changing clip_size in the cfg:
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 5 at /opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/THC/generic/THCTensorMath.cu:62

Is there any other configuration that I need to edit?
Thank You

Can you please share the pretrained model?

Hello, I appreciate your nice work and the shared code.
It would be nice if you could share the pretrained model as well.
It would also be wonderful if your code accepted extra inputs, such as arbitrary videos or a webcam, for action localization.
Thank you in advance.

about own data

First of all, thanks for your project; I am interested in it. Could you please show your label format, if that's convenient for you?

Cannot start training

Thank you very much for sharing your research!
I am trying to follow your README.md and want to do training.
But I get this error. I am guessing maybe I have a different version of some library?
I am really sorry to disturb you, but could you please give me some help?

2020-02-05 21:44:41 training at epoch 1, lr 0.000100
Traceback (most recent call last):
File "", line 1, in
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\Thesis\Master\YOWO-master\train.py", line 322, in
train(epoch)
File "F:\Thesis\Master\YOWO-master\train.py", line 170, in train
for batch_idx, (data, target) in enumerate(train_loader):
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\site-packages\torch\utils\data\dataloader.py", line 279, in iter
return _MultiProcessingDataLoaderIter(self)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\site-packages\torch\utils\data\dataloader.py", line 719, in init
w.start()
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\envs\yowo_1\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

How to run the model on webcam

I want to test the code on a webcam or on a chosen video to generate videos like the ones shown in the examples folder.
How can I do that?

Train on jhmdb21

Hello, thank you for your excellent work.

I have some problems when training YOWO on jhmdb21 dataset:

  1. How can I convert the jhmdb21 ground-truth (joint_positions) .mat files to .txt, just like ucf24/labels/?

  2. How can I get video_mAP? I have looked at the video_mAP.py source code and found that I don't have a "testlist_video.txt" file. How can I generate such a file, or could you share yours?
    testlist = os.path.join(base_path, 'testlist_video.txt')

  3. In train.py, should it be "locolization" or "localization"? Which one is correct?
    locolization_recall = 1.0 * total_detected / (total + eps)
    print("Classification accuracy: %.3f" % classification_accuracy)
    print("Locolization recall: %.3f" % locolization_recall)

JHMDB dataset missing labels

Thank you for your open source!
I want to use J-HMDB-21 to train the model. However, from your link here, I can only get the rgb-images but no labels. I wonder if I am missing something or if the link is incorrect.
Hope to get your answer, thanks a lot!

Error when evaluating on the UCF dataset

Dear Sir:

When I evaluate the mAP on the UCF dataset, I get the following error:

print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 236, in evaluate_videoAP
pred_videos_format = imagebox_to_videts(all_boxes, CLASSES)
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 213, in imagebox_to_videts
preVideo = os.path.dirname(keys[0])
IndexError: list index out of range

About linking strategy for action cube

Thanks for sharing the nice work!
I modified the script 'train.py' and use it as a demo for inference. Now I can get frame-level results, but I still have some problems:

  1. Where is the linking strategy introduced in Section 3.3 of your paper? I can't find it in the code.
  2. What does the action cube mean? What is it used for? I see it as the start and end time of the action instance; is that right?

train error

When I train on ucf24, the following error occurred. Can you give me some advice? Thank you.
Traceback (most recent call last):
File "train.py", line 322, in
train(epoch)
File "train.py", line 180, in train
loss = region_loss(output, target)
File "/home/zc/anaconda3/envs/torch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/zc/coode/YOWO/region_loss.py", line 230, in forward
nH, nW, self.noobject_scale, self.object_scale, self.thresh, self.seen)
File "/home/zc/coode/YOWO/region_loss.py", line 127, in build_targets
iou = bbox_iou(gt_box, pred_box, x1y1x2y2=False) # best_iou
File "/home/zc/coode/YOWO/utils.py", line 74, in bbox_iou
cw = w1 + w2 - uw
RuntimeError: Expected object of type torch.DoubleTensor but found type torch.FloatTensor for argument #3 'other'

Runtime Question

Thankfully, I have successfully run the training demo, but I want to know how I can see the remaining training time.

A bit confused about the NMS

In the code, when doing NMS, why is the confidence (detection confidence times class confidence) subtracted from 1 first? Why is this flipping necessary?
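
A likely reason (my reading, not an authoritative answer): the sort routines used sort in ascending order by default, so sorting on 1 - conf is just a cheap way of visiting boxes from highest to lowest confidence. A tiny sketch of the equivalence:

    import torch

    conf = torch.tensor([0.9, 0.1, 0.6, 0.3])

    # Ascending sort of (1 - conf) visits the boxes in the same order as a
    # descending sort of conf.
    order_flipped = torch.sort(1 - conf)[1]
    order_desc = torch.sort(conf, descending=True)[1]
    print(order_flipped.tolist(), order_desc.tolist())   # [0, 2, 3, 1] [0, 2, 3, 1]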

Environment requirements and Pytorch version

Thanks for open-sourcing this, but I'm confused about the environment requirements.
Could you please tell me what environment is needed to run YOWO, especially the PyTorch version?
I didn't find anything about it in the README.
Thanks a lot.

Question about the loss function.

I am wondering how you calculate the loss function.
I know it is the sum of six terms, but I am a bit confused about how it is computed.

What's the input shape of the model?

I tried feeding in a frame with shape torch.Size([1, 3, 224, 224]) and got this error:

   output = model(data)
  File "E:\apps\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\\YOWO\model.py", line 81, in forward
    x_2d = input[:, :, -1, :, :] # Last frame of the clip that is read
IndexError: too many indices for tensor of dimension 4

I also tried permuting to make the channel dimension last, but still got this.

Does the input need to concatenate the current frame with the frames before it at inference time?
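
For reference, the model expects a 5-D clip tensor rather than a single frame; the 2D branch takes the last frame of the clip internally. A minimal sketch of building such an input from a list of preprocessed frames (frame loading and normalization omitted; shapes follow dataset.py):

    import torch

    clip_duration, height, width = 16, 224, 224

    # Pretend these are the 16 most recent preprocessed RGB frames, each (3, 224, 224).
    frames = [torch.rand(3, height, width) for _ in range(clip_duration)]

    clip = torch.stack(frames, dim=1)   # (3, 16, 224, 224): stack along the time axis
    clip = clip.unsqueeze(0)            # (1, 3, 16, 224, 224): add the batch dimension
    # output = model(clip)              # forward slices clip[:, :, -1, :, :] itself
    print(clip.shape)

So at inference time you still feed the current frame together with the clip_duration - 1 frames before it; a single (1, 3, 224, 224) tensor makes the input[:, :, -1, :, :] slice fail with exactly this IndexError.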

error with original video

I got the following error:

(base) yu@64:~/cv2/YOWO$ python video_mAP.py --dataset ucf101-24 --data_cfg cfg/ucf24.data --cfg_file cfg/ucf24.cfg --n_classes 24 --backbone_3d resnext101 --backbone_2d darknet --resume_path /home/yuyonod/cv2/YOWO/models/yowo_ucf101-24_16f_best_fmap_08749.pth
DataParallel(
(module): YOWO(
(backbone_2d): Darknet(
(models): ModuleList()

===================================================================
loading checkpoint /home/yuyonod/cv2/YOWO/models/yowo_ucf101-24_16f_best_fmap_08749.pth

/home/yuyonod/cv2/YOWO/test.avi

iou is: 0.05
Traceback (most recent call last):
File "video_mAP.py", line 334, in
video_mAP_ucf()
File "video_mAP.py", line 235, in video_mAP_ucf
print(evaluate_videoAP(gt_videos, detected_boxes, CLASSES, iou_th, True))
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 236, in evaluate_videoAP
pred_videos_format = imagebox_to_videts(all_boxes, CLASSES)
File "/home/yuyonod/cv2/YOWO/eval_results.py", line 213, in imagebox_to_videts
preVideo = os.path.dirname(keys[0])
IndexError: list index out of range

Have you fixed this problem? Also, is it possible to run action localization on an original video outside the dataset?

def test in train.py

In the test function (def test) in train.py, do you do any optimization or training?

If not, can I use it for trying out a model I have saved?

train annotation

Hi, I want to know whether the training annotation label format is [class, x, y, w, h] or [class, xmin, ymin, xmax, ymax]?

Question regarding the paper

Hi! Thanks for open-sourcing the implementation. This is the first paper I have read on spatio-temporal action localization.

I am aware that this might not be the best place to ask about the paper, but I can't seem to find another way to get in touch (i.e., an email address) in the paper, so I decided to open an issue.

So I've:

  1. read the action tube paper,
  2. read issue #21, and
  3. seen that, in the paper, the classification is associated with each bounding box.

From that, I draw one conclusion:
that the whole linking procedure is about associating detections of one or more actions in one frame with one or more actions in subsequent frames.

So does this mean it is possible for multiple actions of different classes to be detected at once?

I am sorry if this is a silly question; I can't seem to find the statement above explicitly stated anywhere.

some problems

Thank you in advance for sharing your research!

  1. I want to download the ucf101-24 data to look at it, but I can't get it from the link. Could you also provide it on Baidu Yun Disk?

  2. Is YOWO's input a sequence of consecutive frames rather than a whole video clip? How is the key frame for the 2D network input selected? Is it the last frame of the clip?

  3. I use the labelme tool to label my own video action clips. First, the video is cut into consecutive frames, and these frames are used as input to the network. I annotate each frame to get the corresponding JSON file and then convert it to txt format as the label. Is that right?

I am sorry for asking so many questions. I am just getting started with this research; I hope to get your answer, thanks!

[discuss] Can you share the extra tricks you use?

I have trained YOWO on the ucf101-24 and jhmdb21 datasets; the best frame_mAP I can achieve is 84.3% and
70.9% for UCF101-24 and J-HMDB-21 respectively, which is far behind the accuracy you report. And I see the following note:

NOTE: With some extra tricks, YOWO achieves 87.5% and 76.7% frame_mAP for UCF101-24 and J-HMDB-21 datasets, respectively.

Can you share the extra tricks you use?

no training dataset annotations

I downloaded the dataset annotations from the link in the README, but I find that there are only testing annotations and no annotations for trainlist.txt. Can you check it? Thank you.

How is it supposed to do inference?

Hi,

I have been trying to do inference on a video and found myself lost. I have read that it is necessary to do inference on frames instead of videos; is that true?

Where is the script to run inference?

Thank you for your time,

Miguel.

Question about frame mAP evaluation

Thanks for your awesome project. I want to evaluate the frame mAP performance as reported in the paper. Could you please tell me which function is used for frame mAP evaluation?

about train

Thanks for sharing your code. When I run
python train.py --dataset ucf101-24 --data_cfg cfg/ucf24.data --cfg_file cfg/ucf24.cfg --n_classes 24 --backbone_3d resnext101 --backbone_2d darknet --backbone_2d_weights weights/yolo.weights --resume_path weights/yowo_ucf101-24_16f_best.pth
the terminal shows

RuntimeError: Error(s) in loading state_dict for YOWO:
        Missing key(s) in state_dict: "backbone_2d.models.0.conv1.weight", "backbone_2d.models.0.bn1.weight",

Can you share the CAM code?

Thank you for your open source!
I'm interested in the activation maps for the 2D and 3D backbones of the trained model, but the relevant code is not released. Can you share the CAM code?

Some errors when trying to run inference on cameras

@okankop @wei-tim First of all, thanks for open-sourcing your great work. I had a few issues when trying to run inference on cameras.

1. A little error needs to be fixed.
https://github.com/wei-tim/YOWO/blob/master/model.py#L73
nn.Conv2d(1024, 5*(opt.n_classes+4+1), kernel_size=1, bias=False)
should be
nn.Conv2d(1024, 5*(int(opt.n_classes)+4+1), kernel_size=1, bias=False)
2. In the data loader
https://github.com/wei-tim/YOWO/blob/master/dataset.py#L63

        # (self.duration, -1) + self.shape = (8, -1, 224, 224)
        clip = torch.cat(clip, 0).view((self.clip_duration, -1) + self.shape).permute(1, 0, 2, 3)

the input image clip is a 4-dimensional tensor with shape (3, D, H, W).
However, in the model's forward pass, the last frame of the clip is sliced from the input with input[:, :, -1, :, :],
which assumes the clip is a 5-dimensional tensor with shape (batch_size, 3, D, H, W):
https://github.com/wei-tim/YOWO/blob/master/model.py#L81

    def forward(self, input):
        x_3d = input # Input clip
        x_2d = input[:, :, -1, :, :] # Last frame of the clip that is read
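
For what it's worth, the batch dimension is normally added by the data loader, so for single-clip inference from a camera you have to add it yourself. A minimal sketch (the clip tensor is assumed to be built the same way dataset.py builds it, in (3, D, H, W) layout):

    import torch

    clip = torch.rand(3, 16, 224, 224)       # stand-in for a preprocessed camera clip

    with torch.no_grad():
        batch = clip.unsqueeze(0)            # (1, 3, D, H, W) -- what forward() expects
        # output = model(batch.cuda())       # then decode boxes and run NMS as in test()
    print(batch.shape)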

Download

The dataset annotations and pre-trained models can’t be downloaded. Can anyone share them?

Thanks!

ValueError: invalid literal for int() with base 10: 'v_Sur'

When using the command stated in the README file I got the following error:

2020-01-28 19:01:54 training at epoch 1, lr 0.000100
Traceback (most recent call last):
  File "train.py", line 322, in <module>
    train(epoch)
  File "train.py", line 170, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bt/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/bt/Desktop/YOWO/dataset.py", line 53, in __getitem__
    clip, label = load_data_detection(self.base_path, imgpath,  self.train, self.clip_duration, self.shape, self.dataset_use, jitter, hue, saturation, exposure)
  File "/home/bt/Desktop/YOWO/clip.py", line 161, in load_data_detection
    im_ind = int(im_split[num_parts-1][0:5])
ValueError: invalid literal for int() with base 10: 'v_Flo'

Do you have any idea why this occurs?

I have been digging into the code and I just can't get it to work. I understand the error itself, but not its origin.

Thank you for your time.
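
For what it's worth, the failing line parses the frame index from the last path component, so it expects every list entry to end in a zero-padded frame filename; an entry whose last component is the video name (something starting with 'v_...') produces exactly this ValueError. A small sketch of the expected parse (the example paths are hypothetical, not taken from the dataset):

    # clip.py does roughly: im_ind = int(im_split[num_parts - 1][0:5])
    good = "Basketball/v_Basketball_g01_c01/00009.jpg"   # hypothetical trainlist entry
    im_ind = int(good.split('/')[-1][0:5])               # -> 9, filename starts with digits

    bad = "Basketball/v_Basketball_g01_c01"              # entry stops at the video folder
    # int(bad.split('/')[-1][0:5]) -> ValueError: invalid literal for int() ... 'v_Bas'
    print(im_ind)

So it is worth double-checking that each line of trainlist.txt resolves to an individual frame image rather than to the video folder.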

Training and Inference

@okankop @wei-tim First of all, thanks for open-sourcing your work and doing great work. I had a few queries:

  1. Can we run the inference pipeline on CPU?
  2. Can we add architectures like EfficientNet, CenterNet, or RetinaNet alongside your 3D-CNN backbone?
  3. To train it on a custom dataset, what resolution is required?
  4. Have you tested this model on real-time video, and what is your feedback on this?

Thanks in advance.
