mbuckler / youtube-bb
Public repo for helpful scripts when using the YouTube Bounding Boxes dataset
License: MIT License
The videos are downloaded but are named _temp and not converted to frames. Any idea what the issue is?
Hello,
I'm wondering how large the classification dataset is (before and after decoding). I doubt I have enough disk space for all of it, so some numbers would help me decide how to cut it into subsets.
I also appreciate the work. Many thanks!
How much space does the entire data set need?
Hi @mbuckler,
The download.py file runs fine for me. It downloads the csv, creates the directories for the videos, and reports on the command line "Downloaded video: 193733 / 193733". But no video shows up in the directory that the script created. Can you tell me what I am missing?
Thank You
When downloading the videos, is it possible to fetch them at a predefined high resolution (say, 1080p)?
What do I need to do in order to achieve this?
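One possible approach, sketched here under assumptions: youtube-dl accepts a format selector through its `-f` option, and a selector such as `bestvideo[height<=1080]+bestaudio/best[height<=1080]` asks for the best streams at or below 1080p. The helper name `build_download_cmd` and the output path are hypothetical and not part of this repository. Merging separate video and audio streams also requires ffmpeg, and videos that were never uploaded at 1080p will come down at a lower resolution.

```python
def build_download_cmd(yt_id, out_path, max_height=1080):
    """Build a youtube-dl command line capped at max_height (hypothetical helper)."""
    # Prefer separate best video + best audio at or below max_height,
    # and fall back to the best single-file format under the same cap.
    fmt = "bestvideo[height<={h}]+bestaudio/best[height<={h}]".format(h=max_height)
    return [
        "youtube-dl",
        "-f", fmt,
        "-o", out_path,
        "https://www.youtube.com/watch?v=" + yt_id,
    ]


# The resulting list can be handed to subprocess.check_call().
cmd = build_download_cmd("abc123", "out.mp4")
print(" ".join(cmd))
```

Where exactly such a selector would go in the repository's own download call is not shown on this page, so treat the integration point as something to check in the script itself.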
System: Ubuntu 14.04.5 LTS
Python: Python 2.7.6
Pip packages:
ffmpy==0.2.2
futures==3.0.5
imageio==2.1.1
moviepy==0.2.2.13
multiprocess==0.70.5
youtube-dl==2017.3.2
I get the following error when I run your script. Any help would be appreciated.
yt_bb_classification_train: Downloading annotations...
--2017-03-04 00:16:19-- https://research.google.com/youtube-bb/yt_bb_classification_train.csv.gz
Resolving research.google.com (research.google.com)... 172.217.5.110, 2607:f8b0:4005:808::200e
Connecting to research.google.com (research.google.com)|172.217.5.110|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/gzip]
Saving to: 'yt_bb_classification_train.csv.gz'
[ <=> ] 28,582,014 16.7MB/s in 1.6s
2017-03-04 00:16:21 (16.7 MB/s) - 'yt_bb_classification_train.csv.gz' saved [28582014]
yt_bb_classification_train: Unzipping annotations...
yt_bb_classification_train: Parsing annotations into clip data...
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/process.py", line 208, in _queue_management_worker
result_item = result_queue.get(block=True)
File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
res = self._recv()
TypeError: ('__init__() takes at least 3 arguments (1 given)', <class 'subprocess.CalledProcessError'>, ())
For those who, like me, find that the videos can't be downloaded (similar to #25):
It seems the download URL no longer works in its original form.
To resolve this issue, try changing this line of code (line 152 in a0749ef) to:
'https://www.youtube.com/watch?v=' + vid.yt_id
Then wait for a long, long time.
I've been trying to run the downloader and it hangs every time. Any ideas on what's going on?
`yt_bb_classification_train: Downloading annotations...
--2017-03-14 15:10:00-- https://research.google.com/youtube-bb/yt_bb_classification_train.csv.gz
Resolving research.google.com (research.google.com)... 216.58.194.174, 2607:f8b0:4005:804::200e
Connecting to research.google.com (research.google.com)|216.58.194.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/gzip]
Saving to: ‘yt_bb_classification_train.csv.gz’
[ <=> ] 28,582,014 12.9MB/s in 2.1s
2017-03-14 15:10:02 (12.9 MB/s) - ‘yt_bb_classification_train.csv.gz’ saved [28582014]
yt_bb_classification_train: Unzipping annotations...
yt_bb_classification_train: Parsing annotations into clip data...
Downloaded video: 253567 / 253569
`
I did as you said, but the data folder contained empty images and the scripts gave no errors.
Hi Mark,
Thank you for the scripts. I have noticed that the jpg images extracted by voc_convert have some visible blocks, as you can see in the attached images. The same thing happens even if I change the image extension to 'png'.
I think this creates some unwanted edges in the image. Do you have any idea why this happens and how to solve it?
Many thanks,
Yiming
Hi, I am really new to this and I am getting the following error. Any help?
MBP:youtube-bb-master_jul19 Mac$ python3 download.py [videos] [30]
Traceback (most recent call last):
File "download.py", line 43, in <module>
parse_and_sched(sys.argv[1],int(sys.argv[2]))
ValueError: invalid literal for int() with base 10: '[30]'
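The `ValueError` comes from `int('[30]')`: the square brackets were typed literally, so the second argument is not a number. Judging by the tracebacks on this page, the script expects a directory and an integer, for example `python3 download.py videos 30`. As a minimal sketch (not the repository's code), argument parsing with `argparse` would reject such input with a clearer message; the names `parse_args`, `dl_dir`, and `num_threads` are assumptions made for illustration.

```python
import argparse


def parse_args(argv):
    """Parse a download directory and a thread count (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Download videos into a directory using N threads."
    )
    parser.add_argument("dl_dir", help="directory to download videos into")
    # type=int makes argparse report a readable error for input like '[30]'.
    parser.add_argument("num_threads", type=int, help="number of worker threads")
    return parser.parse_args(argv)


args = parse_args(["videos", "30"])
print(args.dl_dir, args.num_threads)
```

With this sketch, passing `[30]` ends the program with a usage message instead of a traceback.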
The download and cut script currently uses keyframe searching when cutting into clips. Full re-encoding is necessary to ensure frame alignment: http://www.markbuckler.com/post/cutting-ffmpeg/
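To illustrate the difference, here is a hedged sketch of a frame-accurate cut that re-encodes instead of copying the stream. It is not the repository's implementation: the helper `build_cut_cmd` and its parameters (times in seconds) are hypothetical, and the `libx264` encoder is assumed to be available in the local ffmpeg build.

```python
def build_cut_cmd(src, dst, start_s, end_s):
    """Build an ffmpeg command that cuts [start_s, end_s) with re-encoding."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end_s must be greater than start_s")
    return [
        "ffmpeg", "-y",
        "-i", src,
        # -ss after -i seeks on decoded output, so the cut does not
        # snap to the nearest keyframe (slower, but frame-accurate).
        "-ss", "{:.3f}".format(start_s),
        "-t", "{:.3f}".format(duration),
        # Re-encode the video rather than using a stream copy.
        "-c:v", "libx264",
        dst,
    ]


cmd = build_cut_cmd("in.mp4", "clip.mp4", 12.0, 20.5)
print(" ".join(cmd))
```

Re-encoding every clip costs noticeably more CPU time than a stream copy, which is the trade-off for aligned frames.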
Hi,
Thanks for sharing. I am planning to work on the YouTube-BB dataset too. The problem is that there are tons of videos, and I only need the frames that have bounding box annotations. Others have asked for the same thing on Stack Overflow.
So could you please upload only the images with annotations? I am sure it would help the community by making this dataset more accessible to researchers with limited storage or internet bandwidth.
Connecting to research.google.com (research.google.com)|2404:6800:4008:800::200e|:443... failed: Unknown error.
Connecting to research.google.com (research.google.com)|172.217.24.14|:443... failed: Unknown error.
While downloading the youtube-bb videos, I get the following error about a missing file or directory. I am using Python 3.7 on Windows.
Complete Log:
C:\Users\spaul\Downloads\youtube-bb-master\youtube-bb-master>python download.py videos 1
Traceback (most recent call last):
File "download.py", line 43, in <module>
parse_and_sched(sys.argv[1],int(sys.argv[2]))
File "download.py", line 30, in parse_and_sched
check_call(['mkdir', '-p', dl_dir])
File "C:\python37\lib\subprocess.py", line 342, in check_call
retcode = call(*popenargs, **kwargs)
File "C:\python37\lib\subprocess.py", line 323, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\python37\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\python37\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
I also tried providing the complete path:
C:\Users\spaul\Downloads\youtube-bb-master\youtube-bb-master>python download.py C:\Users\spaul\Downloads\youtube-bb-master\youtube-bb-master\videos\ 1
But I still get the same error. Can you please tell me how to solve this?
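The traceback points at `check_call(['mkdir', '-p', dl_dir])`. On Windows, `mkdir` is normally a built-in of the command shell rather than a standalone executable, so `subprocess` most likely cannot find a program by that name. A portable alternative, sketched below, is to create the directory from Python itself. The helper name `ensure_dir` is hypothetical, and `exist_ok=True` requires Python 3.2 or newer.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path and any missing parents, like `mkdir -p` (hypothetical helper)."""
    # exist_ok=True makes the call a no-op if the directory already exists.
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a throwaway temp location; calling twice is safe.
demo = os.path.join(tempfile.mkdtemp(), "videos", "yt_bb_detection_train")
ensure_dir(demo)
ensure_dir(demo)
print(os.path.isdir(demo))
```

Other tracebacks on this page show the scripts calling further external tools such as wget, so this change alone may not be enough to run everything on Windows.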
Hi Mark,
I have been running voc_convert.py for a few days and there are 587703 images now. How many should I expect in total?
Best,
Yiming
Why do I see only a progress bar while downloading, with no videos in the folder?
Hello, thank you for sharing. I have some problems with the dataset.
I downloaded 74459 videos successfully, and they take up 1.3T of space...
The script then began to download yt_bb_detection_validation.csv.gz, but my hard drive ran out of space.
Does this mean the download of yt_bb_detection_train.csv is complete?
Can I use voc_convert.py to process the 70,000 videos?
How much space will all the videos of the training set and validation set take?
And how much space will the dataset take after running voc_convert.py?
Hi, your repo is good, and I have downloaded the YouTube data. Now I want to decode the videos into VOC training data, but I found that it runs very slowly, and it also has a bad case:
Is there any good solution to avoid this? Is using more threads the only way to speed it up? I have used 64 threads, but it is still slow...
I am now getting a new error message, even though I have installed wget on my system.
Safats-MBP:youtube_BB_jul19 SMT_Mac$ python3 download.py vid_dir 30
yt_bb_detection_validation: Downloading annotations...
Traceback (most recent call last):
File "download.py", line 41, in <module>
parse_and_sched(sys.argv[1],int(sys.argv[2]))
File "download.py", line 32, in parse_and_sched
annotations,clips,vids = youtube_bb.parse_annotations(d_set,dl_dir)
File "/Users/SMT_Mac/Desktop/youtube_BB_jul19/youtube_bb.py", line 182, in parse_annotations
check_call(['wget', web_host+d_set+'.csv.gz'])
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 579, in check_call
retcode = call(*popenargs, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 560, in call
with Popen(*popenargs, **kwargs) as p:
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 950, in __init__
restore_signals, start_new_session)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1544, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'wget'
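The error means the Python process could not find a `wget` executable on its `PATH`, which can happen even when wget is installed if it lives in a directory that this Python does not search. One way to sidestep the dependency is to fetch the annotations with the standard library, as in the Python 3 sketch below. The helpers `annotation_url` and `download_annotations` are hypothetical, and the `WEB_HOST` value is inferred from the wget logs elsewhere on this page rather than taken from the repository.

```python
import os
from urllib.request import urlretrieve

# Assumption: base URL inferred from the wget logs quoted on this page.
WEB_HOST = "https://research.google.com/youtube-bb/"


def annotation_url(d_set, web_host=WEB_HOST):
    """Return the URL of the gzipped annotation CSV for a dataset name."""
    return web_host + d_set + ".csv.gz"


def download_annotations(d_set, dest_dir, web_host=WEB_HOST):
    """Download the annotation archive without shelling out to wget."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, d_set + ".csv.gz")
    urlretrieve(annotation_url(d_set, web_host), dest)
    return dest


print(annotation_url("yt_bb_detection_validation"))
```

Calling `download_annotations("yt_bb_detection_validation", "vid_dir")` would then stand in for the `wget` call, assuming the URL is still served.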