lisaanne / localizingmoments
Github for my ICCV 2017 paper: "Localizing Moments in Video with Natural Language"
Hi! Thanks for providing the download script. I ran it with the proper flags and it downloads 47 videos to my directory and then stops. The output is this:
Downloading video: 2/10464
Could not download link: https://www.flickr.com/video_download.gne?id=4138851955
Downloading video: 14/10464
Could not download link: https://www.flickr.com/video_download.gne?id=3844533419
Downloading video: 40/10464
Could not download link: https://www.flickr.com/video_download.gne?id=6187277904
Downloading video: 49/10464
0 videos are missing
Please let me know how to download the rest of the videos. Thanks!
Hi Lisa,
Thanks for this dataset!
The paper states that each video is 25-30 seconds long and segmented into 5-6 five-second clips. But some videos (example) are longer than 1 minute. Do you have a pre-processing step that cuts 25-30 second chunks from the videos, or do you change the FPS so each video becomes 25-30 seconds long? Please let me know if I am missing something.
Thanks!
Hi,
In the .json file, each video has more than one annotated time point. For example, "video": "26292851@N04_4253489686_265c3c8051.m4v" has 7 time points: "times": [[4, 4], [4, 4], [0, 0], [4, 4], [0, 0], [0, 0], [4, 4]]. I want to know which one you chose as the ground truth when training your model.
And how should the test results be evaluated against these time points?
Hi Lisa,
Thank you for the great work!
For the RGB features, it seems there is a 4096-dimensional vector for each of the 6 segments, each of which corresponds to a number of frames. May I ask whether temporal information is encoded in the features? E.g., is the 4096 vector flattened from a 16*256 matrix, where the i-th row is a 256-dimensional feature vector for the i-th frame?
Thanks a lot!
Hello! I find I can't download the DiDeMo dataset from AWS. Why is that?
Hi! I could find where the video data is located, but did not find the annotations for the train, val, and test splits. Could you point me to them? Thanks!
Hi, Lisa. Could you tell me which model you use to extract the RGB and flow features from each frame? VGG16 or VGG19 pretrained on ImageNet?
Hi Lisa,
Thanks for the very useful dataset.
When I was trying to download the data from Google Drive, around half of the training videos were missing due to some unknown errors (it seems some could not be decompressed), so I tried the script you provided, but it seems all the links on AWS are inactive. Could you please shed some light on this? Much appreciated!
Best wishes
Hello
I have a question regarding calculating the average IoU.
Specifically, in the following line:
https://github.com/LisaAnne/LocalizingMoments/blob/master/utils/eval.py#L27
ious = [iou(pred, t) for t in d['times']]
average_iou.append(np.mean(np.sort(ious)[-3:]))
Why do you take only the best 3?
My guess is that in the val set each video has at least 3 ground-truth annotations?
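To make the computation concrete, here is the step in question as a self-contained sketch. The `iou` helper mirrors the one in utils/eval.py; the illustrative annotation list and the assumption that every video carries at least 3 annotations are mine, not confirmed by the repo:

```python
import numpy as np

def iou(pred, gt):
    # Endpoints are treated as inclusive segment indices, hence the +1.
    intersection = max(0, min(pred[1], gt[1]) + 1 - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) + 1 - min(pred[0], gt[0])
    return float(intersection) / union

def mean_top3_iou(pred, times):
    # Score the prediction against every annotator's time point,
    # then average only the three highest IoUs (as in eval.py#L27),
    # which discounts outlier annotations.
    ious = [iou(pred, t) for t in times]
    return np.mean(np.sort(ious)[-3:])

# Example annotation list with disagreeing annotators:
times = [[4, 4], [4, 4], [0, 0], [4, 4], [0, 0], [0, 0], [4, 4]]
score = mean_top3_iou([4, 4], times)  # the three best IoUs are all 1.0
```

Under this reading, a prediction only needs to agree with the three closest annotators to score well, so disagreement among the rest does not penalize it.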
Hi, @LisaAnne , I have a question, how can I get the information about the duration of each video?
in utils/eval.py
def iou(pred, gt):
    intersection = max(0, min(pred[1], gt[1]) + 1 - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) + 1 - min(pred[0], gt[0])
    return float(intersection) / union
If p = [3, 5] and g = [4, 6], then the IoU should be (5 - 4)/(6 - 3) = 0.3333, but iou(p, g) = 0.5. Did I get it wrong?
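For what it's worth, both numbers can be reproduced side by side. The 0.5 follows if the endpoints are read as inclusive segment indices rather than points on a timeline; that reading of the +1 is my interpretation, not an authoritative answer:

```python
def iou_inclusive(pred, gt):
    # Reading in utils/eval.py: endpoints are inclusive segment indices,
    # so the span [3, 5] covers 3 segments (3, 4 and 5), hence the +1.
    intersection = max(0, min(pred[1], gt[1]) + 1 - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) + 1 - min(pred[0], gt[0])
    return float(intersection) / union

def iou_continuous(pred, gt):
    # Reading in the question: endpoints are points on a continuous line.
    intersection = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return float(intersection) / union

p, g = [3, 5], [4, 6]
print(iou_inclusive(p, g))   # 0.5       -> segments {4, 5} over {3, 4, 5, 6}
print(iou_continuous(p, g))  # 0.333...  -> length 1 over length 3
```

With five-second segments indexed 0-5 per video, the inclusive-index version measures overlap in whole segments, which is consistent with the code returning 0.5 here.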
Hi, Lisa. I have tested both your released model and a model trained with your training code, but neither achieves the accuracy reported in your paper. Could you give me some help?
Can you provide MD5 checksums for the videos? I run the download script but it sometimes gets stuck, and I have to restart downloading from the last checkpoint. Besides, if you could provide the dataset via a Google Cloud URL or torrents (for IPv6), it would be a gift for us!
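Until official checksums are published, here is a minimal sketch of how per-file MD5 verification could let an interrupted download resume without re-fetching finished videos. The file layout and checksum source are assumptions, not part of the repo's download script:

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks so large videos need not fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_download(path, expected_md5):
    # Skip files that already exist and match the published checksum,
    # so a restarted run only fetches missing or corrupted videos.
    return not (os.path.exists(path) and md5_of_file(path) == expected_md5)
```

A download loop would then call `needs_download(path, checksum)` before each fetch, assuming a `video_name -> checksum` mapping were distributed alongside hash.txt.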
I cannot access https://people.eecs.berkeley.edu/~lisa_anne/didemo/. It asks for a username and password to log in. Are there other ways to download the models and the 13 videos missing from AWS? When I run download/get_models.sh
I got:
--2020-06-03 22:13:15-- https://people.eecs.berkeley.edu/~lisa_anne/didemo/models/deploy_clip_retrieval_rgb_iccv_release_feature_process_context_recurrent_embedding_lfTrue_dv0.3_dl0.0_nlv2_nlllstm_no_embed_edl1000-100_edv500-100_pmFalse_losstriplet_lwInter0.2.prototxt
Resolving people.eecs.berkeley.edu (people.eecs.berkeley.edu)... 128.32.189.73
Connecting to people.eecs.berkeley.edu (people.eecs.berkeley.edu)|128.32.189.73|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Hi @LisaAnne!
You might want to take a look at this conda environment to ease setup for non-Caffe users.
Cheers,
Victor
Thanks for your good work.
I am trying to run your model but get much lower results (0.10 @1, 0.4 @5, 0.25 IoU). I guess it may come from the GloVe version (I am using glove.6B); could you tell me which version you used?
By the way, the stacking mode I use is 'overall-video_mean + local-video_mean + segment' (by experiment it outperforms the other stacking ways, but I still want to confirm the way you used...).
Thank you
Hello, thanks for your great work and for releasing your experiment code!
I want to use your experimental results as my baseline, but I have run into some problems:
Hi,
While trying to reproduce your work, I found that "data/frame_rate_clean.p", used in make_average_video_dict_flow.py, is missing. Should I use a constant frame rate instead? Or could you provide the file?
Hi Lisa, you might be interested in adding DiDeMo here.
Cheers 👋
There are only 10464 items in hash.txt, but 10642 videos in the JSON data. Where can I find the missing 178 videos?