lisaanne / localizingmoments
Github for my ICCV 2017 paper: "Localizing Moments in Video with Natural Language"
Hi! Thanks for providing the download script. I ran it with the proper flags and it downloads 47 videos to my directory and then stops. The output is this:
Downloading video: 2/10464
Could not download link: https://www.flickr.com/video_download.gne?id=4138851955
Downloading video: 14/10464
Could not download link: https://www.flickr.com/video_download.gne?id=3844533419
Downloading video: 40/10464
Could not download link: https://www.flickr.com/video_download.gne?id=6187277904
Downloading video: 49/10464
0 videos are missing
Please let me know how to download the rest of the videos. Thanks!
Hi Lisa,
Thanks for this dataset!
The paper states that each video is 25-30 seconds long and segmented into 5-6 five-second clips. But some videos (example) are longer than 1 minute. Do you have a pre-processing step that cuts 25-30 second chunks from the videos, or do you change the FPS so each video becomes 25-30 seconds long? Please let me know if I am missing something.
Thanks!
Hi,
In the .json file, each video has more than one annotated time point. For example, "video": "26292851@N04_4253489686_265c3c8051.m4v" has 7 time points: "times": [[4, 4], [4, 4], [0, 0], [4, 4], [0, 0], [0, 0], [4, 4]]. I want to know which one you chose as the ground truth when training your model.
And how should the test results be evaluated against these time points?
Hi Lisa,
Thank you for the great work!
For the RGB features, it seems there is a 4096-dimensional vector for each of the 6 segments, each of which corresponds to a number of frames. May I ask whether temporal information is encoded in the features? E.g., is the 4096 vector flattened from a 16*256 matrix, where the i-th row is a 256-dimensional feature vector for the i-th frame?
Thanks a lot!
Hello! I find I can't download the DiDeMo dataset from AWS. Why is that?
Hi! I could find where the video data is located, but did not find the annotations for the train, val, and test splits. Could you point me to them? Thanks!
Hi, Lisa. Could you tell me which model you use to extract the RGB and flow features from each frame? VGG16 or VGG19 pretrained on ImageNet?
Hi Lisa,
Thanks for the very useful dataset.
When I was trying to download the data from Google Drive, around half of the training videos were missing due to some unknown errors (it seems some could not be decompressed), so I tried the script you provided, but it seems all the links on AWS are inactive. Could you please shed some light on this? Much appreciated!
Best wishes
Hello
I have a question regarding calculating the average IoU.
Specifically, in the following line:
https://github.com/LisaAnne/LocalizingMoments/blob/master/utils/eval.py#L27
ious = [iou(pred, t) for t in d['times']]
average_iou.append(np.mean(np.sort(ious)[-3:]))
Why do you take only the best 3?
My guess is that in the val set each video has at least 3 ground-truth annotations?
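To make the computation concrete, here is the step in question as a self-contained sketch. The `iou` helper mirrors the one in utils/eval.py; the illustrative annotation list and the assumption that every video carries at least 3 annotations are mine, not confirmed by the repo:

```python
import numpy as np

def iou(pred, gt):
    # Endpoints are treated as inclusive segment indices, hence the +1.
    intersection = max(0, min(pred[1], gt[1]) + 1 - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) + 1 - min(pred[0], gt[0])
    return float(intersection) / union

def mean_top3_iou(pred, times):
    # Score the prediction against every annotator's time point,
    # then average only the three highest IoUs (as in eval.py#L27),
    # which discounts outlier annotations.
    ious = [iou(pred, t) for t in times]
    return np.mean(np.sort(ious)[-3:])

# Example annotation list with disagreeing annotators:
times = [[4, 4], [4, 4], [0, 0], [4, 4], [0, 0], [0, 0], [4, 4]]
score = mean_top3_iou([4, 4], times)  # the three best IoUs are all 1.0
```

Under this reading, a prediction only needs to agree with the three closest annotators to score well, so disagreement among the rest does not penalize it.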
Hi, @LisaAnne , I have a question, how can I get the information about the duration of each video?
in utils/eval.py
def iou(pred, gt):
    intersection = max(0, min(pred[1], gt[1]) + 1 - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) + 1 - min(pred[0], gt[0])
    return float(intersection) / union
If p = [3, 5] and g = [4, 6], then the IoU should be (5 - 4)/(6 - 3) = 0.3333, but iou(p, g) = 0.5. Did I get it wrong?
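For what it's worth, both numbers can be reproduced side by side. The 0.5 follows if the endpoints are read as inclusive segment indices rather than points on a timeline; that reading of the +1 is my interpretation, not an authoritative answer:

```python
def iou_inclusive(pred, gt):
    # Reading in utils/eval.py: endpoints are inclusive segment indices,
    # so the span [3, 5] covers 3 segments (3, 4 and 5), hence the +1.
    intersection = max(0, min(pred[1], gt[1]) + 1 - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) + 1 - min(pred[0], gt[0])
    return float(intersection) / union

def iou_continuous(pred, gt):
    # Reading in the question: endpoints are points on a continuous line.
    intersection = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return float(intersection) / union

p, g = [3, 5], [4, 6]
print(iou_inclusive(p, g))   # 0.5       -> segments {4, 5} over {3, 4, 5, 6}
print(iou_continuous(p, g))  # 0.333...  -> length 1 over length 3
```

With five-second segments indexed 0-5 per video, the inclusive-index version measures overlap in whole segments, which is consistent with the code returning 0.5 here.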
Hi, Lisa. I have tested both your released model and a model trained with your training code, but neither achieves the accuracy reported in your paper. Could you give me some help?
Can you provide MD5 checksums for the videos? I run the download script but it sometimes gets stuck, and I have to restart downloading from the last checkpoint. Besides, if you could provide the dataset via a Google Cloud URL or torrents (for IPv6), it would be a gift for us!
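Until official checksums are published, here is a minimal sketch of how per-file MD5 verification could let an interrupted download resume without re-fetching finished videos. The file layout and checksum source are assumptions, not part of the repo's download script:

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks so large videos need not fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_download(path, expected_md5):
    # Skip files that already exist and match the published checksum,
    # so a restarted run only fetches missing or corrupted videos.
    return not (os.path.exists(path) and md5_of_file(path) == expected_md5)
```

A download loop would then call `needs_download(path, checksum)` before each fetch, assuming a `video_name -> checksum` mapping were distributed alongside hash.txt.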
I cannot access https://people.eecs.berkeley.edu/~lisa_anne/didemo/. It asks for a username and password to log in. Are there other ways to download the models and the 13 videos missing from AWS? When I run download/get_models.sh
I got:
--2020-06-03 22:13:15-- https://people.eecs.berkeley.edu/~lisa_anne/didemo/models/deploy_clip_retrieval_rgb_iccv_release_feature_process_context_recurrent_embedding_lfTrue_dv0.3_dl0.0_nlv2_nlllstm_no_embed_edl1000-100_edv500-100_pmFalse_losstriplet_lwInter0.2.prototxt
Resolving people.eecs.berkeley.edu (people.eecs.berkeley.edu)... 128.32.189.73
Connecting to people.eecs.berkeley.edu (people.eecs.berkeley.edu)|128.32.189.73|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Hi @LisaAnne!
You might want to take a look at this conda environment to ease setup for non-Caffe users.
Cheers,
Victor
Thanks for your good work.
I am trying to run your model but get much lower results (0.10 @1, 0.4 @5, 0.25 IoU). I guess it may come from the GloVe version (I am using glove.6B); could you tell me which version you used?
By the way, the stacking mode I use is 'overall-video_mean + local-video_mean + segment' (by experiment it outperforms the other stacking ways, but I still want to confirm the way you used...).
Thank you
Hello, thanks for your great work and for releasing your experiment code!
I want to use your experimental results as my baseline, but I have run into some problems:
Hi,
While trying to reproduce your work, I found that "data/frame_rate_clean.p", used in make_average_video_dict_flow.py, is missing. Should I use a constant frame rate instead? Or could you provide the file?
Hi Lisa, you might be interested in adding DiDeMo here.
Cheers 👋
There are only 10464 items in hash.txt, but 10642 videos in the JSON data. Where can I find the missing 178 videos?