
bsn-boundary-sensitive-network.pytorch's People

Contributors

v-iashin, wzmsltw


bsn-boundary-sensitive-network.pytorch's Issues

New dataset

Hi there, thanks for releasing your code. I've gone through it with the intention of adding a new dataset and, as far as I can tell, the main thing that needs to be done is to generate the video_anno file, a large JSON whose per-video entries consist of duration_second, duration_frame, feature_frame, and annotations.

I understand that the annotations field is meant to be a list of {'label': , 'segment': [start, end]}, but can you verify what the other three are meant to be? It's not clear whether duration_second is relative to a normalized FPS or is just the raw timestamp in the video. It's also unclear what the difference is between duration_frame and feature_frame.

In what units are the start and end of segment, i.e. are they relative to the actual time in the video or to a normalized time?

Additionally, I will not be rescaling each video to 100 frames. It seems like you did that for ActivityNet, but the paper doesn't mention anything similar for Thumos. What was your strategy for Thumos?

Finally, what's the story with video_df in _get_base_data? It seems to load the full dataset in every time; that's 11G uncompressed. Is this right?
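To make the format questions above concrete, here is the entry structure I am inferring. The video name, label, and all field semantics are my assumptions, not anything confirmed by the authors:

```python
# A sketch of one video_anno entry as I currently understand it.
# ASSUMPTIONS: duration_second is raw video time, duration_frame is
# the raw frame count, and feature_frame is the frame count actually
# covered by the extracted features. None of this is confirmed.
video_anno = {
    "v_example": {                     # hypothetical video id
        "duration_second": 120.5,      # assumed: real seconds
        "duration_frame": 3615,        # assumed: raw frame count
        "feature_frame": 3600,         # assumed: frames covered by features
        "annotations": [
            {"label": "LongJump",      # hypothetical class label
             "segment": [10.2, 35.7]}, # assumed: seconds, not frames
        ],
    }
}

print(sorted(video_anno["v_example"]))
```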

pem_low_iou_thres default value

In opts.py, the default value of 'pem_low_iou_thres' is 2.2. I suppose this is a small mistake; do you mean 0.2?

    parser.add_argument(
        '--pem_low_iou_thres',
        type=float,
        default=2.2)

Thanks.
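For reference, assuming 0.2 was indeed the intended value (my guess, not a confirmed fix), the corrected definition would read:

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed fix: 0.2 is a plausible low-IoU threshold for sampling
# negative proposals, whereas 2.2 can never be reached by an IoU.
parser.add_argument(
    '--pem_low_iou_thres',
    type=float,
    default=0.2)

opt = parser.parse_args([])
print(opt.pem_low_iou_thres)  # 0.2
```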

Testing framework on THUMOS

@cinjon how is your progress trying this code on THUMOS?

If I understand this correctly, I have to extract the snippet-level features using TSN (https://github.com/yjxiong/anet2016-cuhk). But anet2016-cuhk is pretrained on ActivityNet, so you first have to finetune the network on THUMOS, then extract the snippet-level features from THUMOS, and finally do the TEM, PGM and PEM training? Is this correct?

Originally posted by @tobiascz in #12 (comment)

Feature extraction questions

Hi there, I am trying to train this on another dataset and am getting a bit stuck trying to figure out how exactly you extracted the features using TSN.

I am using the mmaction repository (which is what the authors of the TSN library you suggest themselves recommend), and the approach in that repository is to oversample by first computing the crops and flips and then running those through the model.

I noticed in #3 that you said you don't remember whether you used oversample or not. Has that changed by any chance? It would save me a lot of time if you could recall it.

Also, I noticed that the size of the features in the provided CSV were each of size 400. That seems small given that the TSN outputs features of size 1024 out of the box. Is there some other setting you used to get size 400 features?

Thanks for your help.
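For anyone else hitting the 1024-vs-400 question: one way to get 400-dim features from larger TSN outputs would be a linear projection such as PCA. This is only my assumption about the reduction step, not the authors' confirmed pipeline; the shapes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 1024))   # 500 snippets, 1024-dim TSN features

# PCA via SVD of the centered feature matrix, keeping 400 components.
centered = feats - feats.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:400].T            # project onto top-400 components

print(reduced.shape)  # (500, 400)
```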

Error occurs when training PEM: The size of tensor a (1600) must match the size of tensor b (8000) at non-singleton dimension 0

When I train the PEM module, an error occurs as follows.
But the first time I trained this module, around a week ago, it worked well.
Could someone help solve this?


		Traceback (most recent call last):
		  File "main.py", line 297, in <module>
		    main(opt)
		  File "main.py", line 269, in main
		    BSN_Train_PEM(opt)
		  File "main.py", line 170, in BSN_Train_PEM
		    test_PEM(test_loader,model,epoch,writer,opt)
		  File "main.py", line 104, in test_PEM
		    iou_loss = PEM_loss_function(PEM_output,label_iou,model,opt)
		  File "/lvjc/project/BSN-boundary-sensitive-network.pytorch/loss_function.py", line 71, in PEM_loss_function
		    iou_loss = F.smooth_l1_loss(anchors_iou,match_iou)
		  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 2113, in smooth_l1_loss
		    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
		  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/functional.py", line 49, in broadcast_tensors
		    return torch._C._VariableFunctions.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (1600) must match the size of tensor b (8000) at non-singleton dimension 0
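For reference, the error just says that the predictions and targets handed to smooth_l1_loss have different lengths (1600 vs 8000), which usually points at stale PGM proposal files rather than the loss itself. A pure-Python illustration of why the loss refuses these shapes (my own sketch, not the repo's code):

```python
def smooth_l1(pred, target):
    """Elementwise smooth-L1 mean; refuses mismatched lengths, as PyTorch does."""
    if len(pred) != len(target):
        raise RuntimeError(
            f"The size of tensor a ({len(pred)}) must match "
            f"the size of tensor b ({len(target)})")
    out = []
    for p, t in zip(pred, target):
        d = abs(p - t)
        # quadratic inside |d| < 1, linear outside
        out.append(0.5 * d * d if d < 1 else d - 0.5)
    return sum(out) / len(out)

try:
    smooth_l1([0.0] * 1600, [0.0] * 8000)
except RuntimeError as e:
    print(e)  # mirrors the reported error message
```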

feature of thumos14

I am stuck preparing data with sliding windows for THUMOS14. Can anyone send me the BSN code for THUMOS14? My email is [email protected]. I would really appreciate it!

Please post training curves for Thumos

Here is what mine look like using the provided code on the Thumos dataset, converted to PyTorch. I am trying to debug this because I don't think these are correct, given the poor test results.

Training curve. The line to the right is the same as the line to the left, except it uses 4 GPUs with a corresponding 4x batch size and 4x learning rate.
train_total_loss-VS-step

Test curve. Note how this never goes down; it only shows overfitting to Train.
test_total_loss-VS-step

Error: Sizes of tensors must match except in dimension 0. Got 1000 and 723 in dimension 1

Everything works well until I run this step.

python main.py --module PEM --mode inference

Here's the output result:

PEM inference start
validation subset video numbers: 4728
Traceback (most recent call last):
  File "main.py", line 298, in <module>
    main(opt)
  File "main.py", line 276, in main
    BSN_inference_PEM(opt)
  File "main.py", line 221, in BSN_inference_PEM
    for idx,(video_feature,video_xmin,video_xmax,video_xmin_score,video_xmax_score) in enumerate(test_loader):
  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1000 and 723 in dimension 1
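This traceback means default_collate tried to stack proposal tensors of different lengths (1000 vs 723) into one batch. Two common workarounds (my suggestions, not a confirmed fix from the authors) are running inference with batch_size=1, or padding each sample to a common length before stacking. A pure-Python sketch of the padding idea:

```python
def pad_batch(samples, pad_value=0.0):
    """Pad variable-length 1-D samples to the longest length so they stack."""
    max_len = max(len(s) for s in samples)
    return [s + [pad_value] * (max_len - len(s)) for s in samples]

# Two videos with different proposal counts, as in the traceback.
batch = pad_batch([[0.9] * 1000, [0.8] * 723])
print([len(s) for s in batch])  # [1000, 1000]
```

A real collate_fn would also return the original lengths so the padding can be masked out downstream.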

Is the weight initialization correct?

The weight_init code for TEM in models.py is:

    @staticmethod
    def weight_init(m):
        if isinstance(m, nn.Conv2d):
            init.xavier_normal(m.weight)
            init.constant(m.bias, 0)

However, there are no nn.Conv2d modules in TEM, so this is never triggered. What is this supposed to be?
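My guess (unconfirmed) is that the check was meant to target nn.Conv1d, which TEM does use; with the non-deprecated in-place initializers it would read:

```python
import torch
import torch.nn as nn
from torch.nn import init

def weight_init(m):
    # Assumed fix: TEM is built from nn.Conv1d layers, so match those.
    # xavier_normal_ / constant_ are the in-place, non-deprecated variants.
    if isinstance(m, nn.Conv1d):
        init.xavier_normal_(m.weight)
        init.constant_(m.bias, 0)

conv = nn.Conv1d(400, 512, kernel_size=3)
weight_init(conv)
print(bool(torch.all(conv.bias == 0)))  # True
```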

Is oversample used for feature extraction?

Thank you for your work!
When I used TSN to extract features, I found that oversample was used in the code, which results in the output of 'fc-action' being 10*200 (for a single image input).
So I have to ask the following questions:

  1. Is oversample used for feature extraction? If so, is the average taken at the end?
  2. Or is center-crop (w: 224, h: 224, c: 3) adopted when extracting features from an image (340×256×3) or a flow stack (340×256×10)?
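If oversampling was used, the "average at the end" in question 1 would amount to a mean over the 10 crop/flip outputs, e.g. (my sketch of that interpretation, not confirmed):

```python
import numpy as np

rng = np.random.default_rng(0)
# fc-action output for one image with 10-crop oversampling: 10 x 200
oversampled = rng.standard_normal((10, 200))

snippet_feature = oversampled.mean(axis=0)  # average over the 10 crops/flips
print(snippet_feature.shape)  # (200,)
```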

A question about TEM module loss

Hello, Mr. Lin, many thanks; the code has been very useful for me.
Sorry to be a bother.
When I try to run it, the TEM testing loss stays almost constant from epoch 1 to 20, so the module learns nothing.
What might cause this? Is there any advice?
Looking forward to your reply, thank you very much!

feature extraction in THUMOS14

Thanks for the excellent work for the community! I have two points of confusion I hope can be answered:

  1. When reading the paper, I found that the two-stream network released at NIPS was used, but when I read the code here, TSN is used. So which one do we use?
  2. I found that the process differs between THUMOS14 and ActivityNet in Feature extraction questions #14. Could you send me the code for THUMOS14, please? My email is [email protected]

Expected time to train on Thumos?

Training on Thumos seems to go extremely quickly with the data loader that you sent out. There appear to be only ~2500 examples in the resulting dataset, which takes only ~3-5 minutes to train for 20 epochs. Is this correct?

Real time video detection

Hi,
Thanks for the code! I just wanted to ask whether BSN (Boundary-Sensitive Network) as well as BMN (Boundary-Matching Network) can be applied to real-time action detection in videos?

Training procedure

As far as I can tell, the training procedure with this repo is to first train the TEM, then generate the PGMs, then use those to train the PEM. Do you then repeat or do you do it just once?

Is that possible to release the detection demo?

Great work! Thank you for sharing!

This work is for generating action proposals.
You mentioned in your paper that, for temporal action detection on ActivityNet-1.3, you adopt the top-1 video-level classification results generated by the method of [Zhao, Y., et al., CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2017], and use the confidence scores of your proposals for retrieving detection results. I did not get how you do it. Could you please answer the following questions:

  1. The paper of Zhao, Y., et al., CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2017 mentions different methods for different tasks. Which method did you adopt? Is it SSN (Structured Segment Networks)?

  2. BSN will generate at least 100 proposals. Will you choose all proposals for action classification?

  3. Assume you only choose the top-k (e.g. k = 2, 3, 4, 5) proposals output by BSN for action classification. For each selected proposal, you do video-level action classification. Does that mean that for a proposal that starts on frame m and ends on frame n, you will generate only one action label?

  4. I wonder if it is possible to release your detection demo. That will be awesome!
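My current reading of that part of the paper (questions 2 and 3 above) is that each proposal keeps its BSN confidence and is assigned the video's top-1 class, so the detection score is a simple product. A sketch of that interpretation, with invented example values; this is an assumption, not the authors' confirmed code:

```python
# Hypothetical inputs: BSN proposals with confidences, plus the
# top-1 video-level class from the CUHK & ETHZ & SIAT classifier.
proposals = [
    {"segment": [10.2, 35.7], "score": 0.92},
    {"segment": [40.0, 55.3], "score": 0.74},
]
top1_label, top1_prob = "LongJump", 0.88

detections = [
    {"segment": p["segment"],
     "label": top1_label,
     "score": p["score"] * top1_prob}   # fuse proposal and class confidence
    for p in proposals
]
print(round(detections[0]["score"], 4))  # 0.8096
```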

PCA for extracted feature?

Thanks for your impressive work. I note that the feature extracted by TSN is 3072-dim but your default input dimension is 400. Did you use PCA for dimensionality reduction?

TEM module

Great job for action proposal.

  1. From reading your paper, I am wondering whether Conv2d is better than Conv1d in the TEM module for action-score regression?

  2. Can overlapping the sliding windows over the TEM feature sequence give us better scores?

Looking forward to your reply!
