wzmsltw / bsn-boundary-sensitive-network.pytorch
Codes of our paper: "BSN: Boundary Sensitive Network for Temporal Action Proposal Generation"
Hi, thanks for the brilliant work!
I encountered a problem using the sliding-window method on the THUMOS14 dataset, and I noticed your reply in "Feature extraction questions" #14. I would appreciate it if you could send me the BSN code for THUMOS14. My email address is [email protected]. Thanks again!
Hi there, thanks for releasing your code. I've gone through it with the intention of adding a new dataset and, as far as I can tell, the main thing that needs to be done is to generate the video_anno file, which is a large JSON consisting of:
I understand that the annotations field is meant to be a list of {'label': , 'segment': [start, end]}, but can you verify what the other three are meant to be? It's not clear if duration_second is according to a normalized FPS or if it's just the timestamp in the video. It's also unclear what the difference is between duration_frame and feature_frame.
In what units is the start and end of segment, i.e. is it relative to the actual time in the video or a normalized time?
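For concreteness, here is a sketch of what one entry in that JSON appears to look like, based on my reading of the ActivityNet-style annotation file shipped with the repo. The field meanings in the comments are my interpretation, not an authoritative answer:

```python
import json

# Sketch of one video_anno entry (interpretation, not the official spec):
#   duration_second - raw video duration in seconds (real video time)
#   duration_frame  - total number of frames in the video
#   feature_frame   - frames actually covered by the extracted features
#                     (duration_frame rounded down to a multiple of the
#                     snippet interval, e.g. 16; 6336 = 396 * 16)
#   segment         - [start, end] in seconds, in real video time
video_anno = {
    "v_example": {
        "duration_second": 211.53,
        "duration_frame": 6337,
        "feature_frame": 6336,
        "annotations": [
            {"label": "Tango", "segment": [30.02, 180.01]},
        ],
    }
}

print(json.dumps(video_anno, indent=2))
```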
Additionally, I will not be modifying the video to be 100 frames each. It seems like you did that for ActivityNet but the paper doesn't mention anything similar for Thumos. What was your strategy for Thumos?
Finally, what's the story with video_df in _get_base_data? It seems like it loads the full data in every time. That's 11G uncompressed. Is this right?
In opts.py, the 'pem_low_iou_thres' default value is 2.2. I suppose this is a small mistake. Do you mean 0.2 for this?
####code#######
parser.add_argument(
    '--pem_low_iou_thres',
    type=float,
    default=2.2)
###############
Thanks.
@cinjon how is your progress trying this code on THUMOS?
If I understand this correctly, I have to extract the snippet-level features using TSN (https://github.com/yjxiong/anet2016-cuhk). But anet2016-cuhk is pretrained on ActivityNet, so you first have to fine-tune the network on THUMOS, then extract the snippet-level features from THUMOS, and then do the TEM, PGM, and finally the PEM training? Is this correct?
Originally posted by @tobiascz in #12 (comment)
Hi there, I am trying to train this on another dataset and am getting a bit stuck trying to figure out how exactly you extracted the features using TSN.
I am using the mmaction repository (which is what the authors of the TSN library you suggest ... suggest using) and the approach in that repository is to oversample by first computing the crops and flips and then run that through the model.
I noticed in #3 that you said you don't remember if you used oversample or not. Has that changed by any chance? It would save me a lot of time if you can remember that.
Also, I noticed that the size of the features in the provided CSV were each of size 400. That seems small given that the TSN outputs features of size 1024 out of the box. Is there some other setting you used to get size 400 features?
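One guess at how 400-d features could arise (my speculation, not confirmed by the authors): take the 200-d class-score ("fc-action") outputs of the spatial and temporal TSN streams (ActivityNet has 200 classes) and concatenate them, rather than using the 1024-d global-pool features. A minimal sketch under that assumption:

```python
import numpy as np

# Hypothetical pipeline: per-snippet 200-d action scores from each of the
# two TSN streams, concatenated into the 400-d features the CSVs contain.
rgb_scores = np.random.rand(100, 200)   # spatial-stream scores per snippet
flow_scores = np.random.rand(100, 200)  # temporal-stream scores per snippet

snippet_features = np.concatenate([rgb_scores, flow_scores], axis=1)
print(snippet_features.shape)  # (100, 400)
```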
Thanks for your help.
Hi,
I want to know: is the evaluation program the same for the THUMOS14 dataset?
When I train the PEM module, the error below occurs.
But the first time I trained this module, around a week ago, it worked well.
Could someone solve this?
Traceback (most recent call last):
File "main.py", line 297, in <module>
main(opt)
File "main.py", line 269, in main
BSN_Train_PEM(opt)
File "main.py", line 170, in BSN_Train_PEM
test_PEM(test_loader,model,epoch,writer,opt)
File "main.py", line 104, in test_PEM
iou_loss = PEM_loss_function(PEM_output,label_iou,model,opt)
File "/lvjc/project/BSN-boundary-sensitive-network.pytorch/loss_function.py", line 71, in PEM_loss_function
iou_loss = F.smooth_l1_loss(anchors_iou,match_iou)
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 2113, in smooth_l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/functional.py", line 49, in broadcast_tensors
return torch._C._VariableFunctions.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (1600) must match the size of tensor b (8000) at non-singleton dimension 0
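Smooth-L1 broadcasts its inputs, so a prediction of length 1600 against a target of length 8000 fails exactly as in the traceback. The exact factor of 5 usually points at a batching/bookkeeping change (for example a different pem_batch_size or a regenerated proposal file) rather than the loss itself; this is my guess at the cause, not a confirmed diagnosis. A dependency-free sketch of a shape guard before the loss:

```python
import numpy as np

# Sketch (not the repo's code): a smooth-L1 with an explicit shape check,
# so a prediction/target mismatch fails loudly with both shapes printed.
def smooth_l1(pred, target):
    assert pred.shape == target.shape, (pred.shape, target.shape)
    diff = np.abs(pred - target)
    # quadratic below 1.0, linear above, as in F.smooth_l1_loss
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()

pred = np.zeros(1600)
target = np.zeros(1600)
print(smooth_l1(pred, target))  # 0.0
```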
Hi Tianwei @wzmsltw
Thanks for your awesome work! I'm also trying to run BSN with some customized video data. However, I'm confused by a part of your code: line 86 of post_processing.py. I also noticed that you mention 16 in your paper in multiple places, and I have no idea what the 16*16 in your code means.
Would you mind helping me with this question?
Thanks!
I am stuck preparing data with sliding windows for THUMOS14. Can anyone send me the BSN code for THUMOS14? My email is [email protected]. I would really appreciate it!
Here is what my curves look like using the provided code for the Thumos dataset, but converted to PyTorch. I am trying to debug this because I don't think these are correct, given the poor test results.
Training curve. The line to the right is the same as the line to the left, except using 4 gpus and a corresponding 4x batch size and 4x learning rate.
Test curve. Note how this never goes down; it only shows overfitting to Train.
When training, at approximately what level of loss for each of the parts does the model start to become reasonable? And what kind of curves should we expect?
It works well until I run this step:
python main.py --module PEM --mode inference
Here's the output result:
PEM inference start
validation subset video numbers: 4728
Traceback (most recent call last):
File "main.py", line 298, in <module>
main(opt)
File "main.py", line 276, in main
BSN_inference_PEM(opt)
File "main.py", line 221, in BSN_inference_PEM
for idx,(video_feature,video_xmin,video_xmax,video_xmin_score,video_xmax_score) in enumerate(test_loader):
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
return self._process_next_batch(batch)
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
return [default_collate(samples) for samples in transposed]
File "/lvjc/envs/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1000 and 723 in dimension 1
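The stack error comes from default_collate trying to stack per-video proposal tensors of different lengths (1000 vs 723 here). A hedged workaround (not the authors' fix): either run PEM inference with batch_size=1, or pad each sample to a common length with a custom collate function like this sketch:

```python
import numpy as np

# Hypothetical collate: pad each (num_proposals, feat_dim) sample up to
# the longest one in the batch, then stack along a new batch dimension.
def pad_collate(batch, pad_value=0.0):
    max_len = max(item.shape[0] for item in batch)
    padded = []
    for item in batch:
        pad = np.full((max_len - item.shape[0], item.shape[1]), pad_value)
        padded.append(np.concatenate([item, pad], axis=0))
    return np.stack(padded, axis=0)

batch = [np.ones((1000, 32)), np.ones((723, 32))]
print(pad_collate(batch).shape)  # (2, 1000, 32)
```

If you pad, remember to carry the true lengths alongside the batch so padded rows can be masked out downstream.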
In this repo, it is .01. In the tensorflow repo, it is .1. Which (if either) did you actually use?
Hi TianWei,
How many optical flow frames do you stack in a snippet (as described in your paper) around the center frame? Is it related to the length of the snippet (delta)?
The weight_init code in models.py TEM is:

@staticmethod
def weight_init(m):
    if isinstance(m, nn.Conv2d):
        init.xavier_normal(m.weight)
        init.constant(m.bias, 0)

However, there are no nn.Conv2d modules in TEM, so this is never triggered. What is this supposed to be?
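My guess (unconfirmed by the authors) is that the check was meant to match the Conv1d layers TEM actually contains. A corrected sketch, also using the non-deprecated underscore spellings of the init functions:

```python
import torch.nn as nn
import torch.nn.init as init

# Hypothetical fix: match nn.Conv1d, which TEM actually uses, so the
# initializer is actually triggered by model.apply(weight_init).
def weight_init(m):
    if isinstance(m, nn.Conv1d):
        init.xavier_normal_(m.weight)
        init.constant_(m.bias, 0)

# The layer shape below is illustrative, not TEM's actual configuration.
layer = nn.Conv1d(400, 512, kernel_size=3, padding=1)
weight_init(layer)
print(layer.bias.abs().sum().item())  # 0.0
```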
Thank you for your work!
When I used TSN to extract features, I found oversample was used in the code, which results in the output of 'fc-action' being 10*200 (input being an image).
So I have to ask the following questions.
1. Is oversampling used for feature extraction? If so, is the average taken at the end?
2. Or is a center crop (w: 224, h: 224, c: 3) adopted when extracting features from an image (340x256x3) or flow stack (340x256x10)?
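Whether the authors used 10-crop oversampling or a single center crop is exactly what is being asked; this sketch just illustrates the center-crop option for the frame sizes mentioned above:

```python
import numpy as np

# Illustrative center crop: take a size x size window from the middle of
# a frame (works for an RGB frame or a stacked flow field alike).
def center_crop(frame, size=224):
    h, w = frame.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]

rgb = np.zeros((256, 340, 3))        # H x W x C frame
flow_stack = np.zeros((256, 340, 10))  # H x W x stacked flow channels
print(center_crop(rgb).shape, center_crop(flow_stack).shape)
```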
Hello, Mr. Lin, many thanks, the code is useful to me.
Sorry to be a bother.
When I try to run it, the TEM testing loss is almost unchanged from epoch 1 to 20; the module learned nothing.
What might cause this problem? Is there any advice?
Looking forward to your reply, thank you very much!
Thanks for the excellent work for the community! I have two points of confusion I hope can be answered:
Hello, in models.py line 28 ( )
Training on Thumos seems to go extremely quickly given the data loader that you sent out. There appear to be only ~2500 examples in the resulting dataset, which only takes ~3-5 minutes to train to 20 epochs. Is this correct?
Hi,
Thanks for the code! I just wanted to ask if BSN(Boundary sensitive network) as well as BMN(Boundary matching network) can be applied for real time action detection in videos?
Is there a way (or let's say a snippet) of getting predictions for a custom video?
Example signature:
start_points, end_points = BSN(path_to_dir_with_video_frames)
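The repo exposes no such one-call API, so this is only a hypothetical wrapper sketching the stages such a snippet would need to chain together; every helper name below is invented for illustration, and the two helpers are trivial placeholders standing in for the real TSN feature extraction, TEM, PGM, and PEM steps:

```python
import numpy as np

# Placeholder for snippet-level feature extraction (TSN in the real pipeline).
def extract_features(path_to_frames):
    return np.random.rand(100, 400)  # fake (num_snippets, feat_dim) features

# Placeholder for TEM -> PGM -> PEM; pretend two proposals were scored.
def tem_pgm_pem(features):
    return [(0.10, 0.35, 0.9), (0.55, 0.80, 0.7)]  # (start, end, score)

def bsn_propose(path_to_frames):
    features = extract_features(path_to_frames)
    proposals = tem_pgm_pem(features)
    start_points = [s for s, _, _ in proposals]
    end_points = [e for _, e, _ in proposals]
    return start_points, end_points

starts, ends = bsn_propose("frames/")
print(starts, ends)  # [0.1, 0.55] [0.35, 0.8]
```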
As far as I can tell, the training procedure with this repo is to first train the TEM, then generate the PGMs, then use those to train the PEM. Do you then repeat or do you do it just once?
Hi, do you have the results for Table 2 (https://arxiv.org/pdf/1806.02964.pdf) for ActivityNet? I am speaking of the @50, @100, ..., @1000 proposal recall numbers. I only see those reported for Thumos. Thanks.
Great work! Thank you for sharing!
This work is for generating action proposals.
You mentioned in your paper that for temporal action detection on ActivityNet-1.3, you adopt the top-1 video-level classification results generated by the method of [Zhao, Y., et al., CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2017] and use the confidence scores of your proposals for retrieving detection results. I did not get how you do it. Could you please explain the following questions:
In the paper of Zhao, Y., et al., CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2017, it mentioned different methods for different tasks. Which method is adopted by you? Is it SSN(Structure Segment Networks)?
BSN will generate at least 100 proposals. Will you choose all proposals for action classification?
Assume you only choose the top-k (such as k = 2, 3, 4, 5) proposals output from BSN for action classification. For each selected proposal generated from BSN, you do video-level action classification. Does it mean that for a proposal "started on frame m and ended on frame n", you will generate only one action label?
I wonder if it is possible to release your detection demo. That will be awesome!
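A hedged sketch of how the "retrieving" step asked about above is commonly done (my reading, not confirmed by the authors): the top-1 video-level label from the external classifier is assigned to every proposal, so each proposal gets exactly one label, and the detection score is the proposal confidence fused with the classification score. All numbers below are made up:

```python
# Hypothetical fusion of class-agnostic proposals with a video-level label.
proposals = [  # (start_sec, end_sec, bsn_confidence)
    (5.0, 20.0, 0.92),
    (30.0, 41.5, 0.61),
]
video_label, class_score = "Tango", 0.88  # top-1 video-level result

# Every proposal inherits the single video-level label; the detection
# score is the product of proposal confidence and classification score.
detections = [
    (s, e, video_label, conf * class_score) for s, e, conf in proposals
]
print(detections)
```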
@wzmsltw Can you please share the BSN in THUMOS code details at [email protected] ? Thanks!
With PyTorch 1.1 & CUDA 9.0 & Python 3
Thanks for your impressive work. I note that the feature extracted by TSN is 3072-dim but your default input dimension is 400. Have you used PCA for dimensionality reduction?
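Whether PCA was actually used is exactly the question; for anyone in the same situation, here is a minimal sketch of reducing 3072-d snippet features to 400-d via an SVD-based PCA, assuming no 400-d features are available:

```python
import numpy as np

# Illustrative PCA via SVD: project centered features onto the top
# out_dim principal directions.  In practice the projection should be
# fit on training data only and reused for validation/testing.
def pca_reduce(features, out_dim=400):
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

snippets = np.random.rand(500, 3072)  # fake (num_snippets, 3072) features
reduced = pca_reduce(snippets)
print(reduced.shape)  # (500, 400)
```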
Great job for action proposal.
1. From reading your paper, I am wondering whether conv2d is better than conv1d in the TEM module for action-score regression?
2. Can overlapping of sliding windows in the TEM feature sequence give us better scores?
Looking forward to your reply!
My video has different scenes.
First, I want to cut my video into different parts; then I want to do scene classification on the different parts. Can I use BSN to accomplish the first step?