mbuckler / youtube-bb

Public repo for helpful scripts when using the YouTube Bounding Boxes dataset

License: MIT License

Python 100.00%

youtube-bb's Issues

Download.py is not converting to frames

The videos are downloaded but are named _temp and not converted to frames. Any idea what the issue is?

download.py has very high memory usage

The main program itself is using nearly 4GB total, before downloading anything.

unable to establish SSL connection

classification dataset size

Hello,
I'm wondering how large the classification dataset is, both before and after decoding. I don't think I have enough disk space for all of it, so any information would help me split it into subsets.
I also appreciate the work. Many thanks!

disk space needed

How much space does the entire data set need?

Says videos downloaded, but didn't actually download

Hi @mbuckler,
The download.py script runs fine for me. It downloads the CSV, creates the directories for the videos, and prints Downloaded video: 193733 / 193733 on the command line. But no video shows up in the directory that the script created. Can you tell me what I am missing?
Thank You

Downloading videos of a fixed resolution

When downloading the videos, is it possible to fetch them at a predefined high resolution (let's say 1080p)?

What do I need to do in order to achieve this?
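One possible approach, sketched under assumptions: youtube-dl accepts a format selector through its -f option, and a height filter can cap the resolution. The function name build_download_cmd and the output path are hypothetical and are not taken from the repository's scripts. The selector below asks for at most the given height, so videos that were never uploaded at 1080p still download at their best available quality.

```python
def build_download_cmd(yt_id, out_path, max_height=1080):
    """Build a youtube-dl command that caps the video height.

    Hypothetical helper: the repository's own download call may be
    structured differently.
    """
    # Prefer separate video+audio streams within the height limit,
    # falling back to a single combined stream within the same limit.
    fmt = "bestvideo[height<={h}]+bestaudio/best[height<={h}]".format(h=max_height)
    return [
        "youtube-dl",
        "-f", fmt,
        "-o", out_path,
        "https://www.youtube.com/watch?v=" + yt_id,
    ]
```

The returned list can be passed to subprocess.check_call. Replacing `<=` with `=` in the selector would demand that exact height and fail for videos that do not offer it.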

Multithreading error

System: Ubuntu 14.04.5 LTS
Python: Python 2.7.6
Pip packages:
ffmpy==0.2.2
futures==3.0.5
imageio==2.1.1
moviepy==0.2.2.13
multiprocess==0.70.5
youtube-dl==2017.3.2

I get the following error when I run your script. Any help would be appreciated.

yt_bb_classification_train: Downloading annotations...
--2017-03-04 00:16:19--  https://research.google.com/youtube-bb/yt_bb_classification_train.csv.gz
Resolving research.google.com (research.google.com)... 172.217.5.110, 2607:f8b0:4005:808::200e
Connecting to research.google.com (research.google.com)|172.217.5.110|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/gzip]
Saving to: 'yt_bb_classification_train.csv.gz'

    [        <=>                                                                                                                         ] 28,582,014  16.7MB/s   in 1.6s   

2017-03-04 00:16:21 (16.7 MB/s) - 'yt_bb_classification_train.csv.gz' saved [28582014]

yt_bb_classification_train: Unzipping annotations...
yt_bb_classification_train: Parsing annotations into clip data...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/process.py", line 208, in _queue_management_worker
    result_item = result_queue.get(block=True)
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
TypeError: ('__init__() takes at least 3 arguments (1 given)', <class 'subprocess.CalledProcessError'>, ())
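The final TypeError appears to come from the process pool failing to rebuild a subprocess.CalledProcessError that a worker raised: under Python 2 that exception does not survive pickling, so the real failure (a command exiting with a non-zero status) is hidden. A minimal sketch of a workaround, assuming the worker shells out with check_call; the function name run_step is hypothetical and not part of the repository:

```python
import subprocess
import sys


def run_step(cmd):
    """Run one external command inside a worker process.

    Returns a plain (ok, message) tuple instead of letting
    CalledProcessError cross the process boundary, since plain
    tuples and strings always pickle cleanly.
    """
    try:
        subprocess.check_call(cmd)
        return (True, "")
    except subprocess.CalledProcessError as err:
        return (False, "exit code %d: %s" % (err.returncode, " ".join(cmd)))


if __name__ == "__main__":
    # Demonstration with a command that deliberately fails.
    print(run_step([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

With this pattern the parent process can log which command failed rather than crashing in the queue management thread.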

Solve video can't be downloaded issue

For those who, like me, find that videos can't be downloaded (similar to #25):
It seems the download URL no longer works in its original form.
To resolve this issue, try changing this line of code:

youtube-bb/youtube_bb.py

Line 152 in a0749ef

'youtu.be/'+vid.yt_id ], \

so that it uses the full watch URL instead:

'https://www.youtube.com/watch?v=' + vid.yt_id

then wait for a long, long time 🙂.

Downloader hangs

I've been trying to run the downloader and it hangs every time. Any ideas on what's going on?

`yt_bb_classification_train: Downloading annotations...
--2017-03-14 15:10:00-- https://research.google.com/youtube-bb/yt_bb_classification_train.csv.gz
Resolving research.google.com (research.google.com)... 216.58.194.174, 2607:f8b0:4005:804::200e
Connecting to research.google.com (research.google.com)|216.58.194.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/gzip]
Saving to: ‘yt_bb_classification_train.csv.gz’

[          <=>                                                                                                                       ] 28,582,014  12.9MB/s   in 2.1s

2017-03-14 15:10:02 (12.9 MB/s) - ‘yt_bb_classification_train.csv.gz’ saved [28582014]

yt_bb_classification_train: Unzipping annotations...
yt_bb_classification_train: Parsing annotations into clip data...
Downloaded video: 253567 / 253569
`

the question of downloading

I did as you said, but the data folder had empty images, and the scripts gave no errors.

blocky effects in jpg images extracted by voc_convert

Hi Mark,
Thank you for the scripts. I have noticed that the jpg images extracted by voc_convert have some visible blocks, as you can see in the attached images. The same thing happens even if I change the image extension to 'png'.
I think this creates unwanted edges in the image. Do you have an idea why this happens and how to solve it?
Many thanks,
Yiming

Getting this error while trying to download the videos...please help!

Hi I am really new to this and I am getting the following error...any help?

MBP:youtube-bb-master_jul19 Mac$ python3 download.py [videos] [30]

Traceback (most recent call last):
  File "download.py", line 43, in <module>
    parse_and_sched(sys.argv[1],int(sys.argv[2]))
ValueError: invalid literal for int() with base 10: '[30]'
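The ValueError shows that the square brackets were typed literally: the second argument reached int() as the string '[30]'. Brackets in usage text are normally placeholders, so the command should be run as python3 download.py videos 30. A small sketch of argument handling that reports this more clearly; parse_cli is a hypothetical helper, not the script's actual code:

```python
def parse_cli(argv):
    """Validate 'download.py VIDEO_DIR NUM_THREADS' style arguments."""
    if len(argv) != 3:
        raise SystemExit("usage: python3 download.py VIDEO_DIR NUM_THREADS")
    dl_dir = argv[1]
    try:
        num_threads = int(argv[2])
    except ValueError:
        # Covers inputs such as '[30]', where placeholder brackets
        # were copied into the command line.
        raise SystemExit(
            "NUM_THREADS must be a plain integer such as 30, got %r" % argv[2]
        )
    return dl_dir, num_threads
```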

Cutting code uses keyframe search

The download and cut script currently uses keyframe searching when cutting into clips. Full re-encoding is necessary to ensure frame alignment: http://www.markbuckler.com/post/cutting-ffmpeg/
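The difference can be sketched as two ffmpeg invocations. With stream copy (-c copy) ffmpeg does not decode the video, so a cut can only begin at a keyframe; re-encoding lets the clip start on the requested frame. The helper name build_cut_cmd and the libx264 codec choice are assumptions for illustration, not the repository's actual command.

```python
def build_cut_cmd(src, dst, start_s, duration_s, reencode=True):
    """Build an ffmpeg command that cuts a clip out of src.

    reencode=True  -> decode and re-encode, so the cut is frame accurate.
    reencode=False -> stream copy, fast but snapped to keyframes.
    """
    cmd = [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", str(start_s),   # seek position in seconds
        "-t", str(duration_s),  # clip length in seconds
    ]
    if reencode:
        cmd += ["-c:v", "libx264"]  # assumed encoder; any video encoder works
    else:
        cmd += ["-c", "copy"]
    cmd.append(dst)
    return cmd
```

Re-encoding is much slower than stream copy, which is the trade-off for aligned frames.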

upload extracted images

Hi,

Thanks for sharing. I am planning to work on the YouTube-BB dataset too. The problem is that there are tons of videos, and I only need the frames with bounding box annotations. Others have expressed similar needs on Stack Overflow.

So could you please upload only the images with annotations? I am sure it would help the community by making this dataset more accessible to researchers with limited storage or internet bandwidth.

connecting failed

Connecting to research.google.com (research.google.com)|2404:6800:4008:800::200e|:443... failed: Unknown error.
Connecting to research.google.com (research.google.com)|172.217.24.14|:443... failed: Unknown error.

I was downloading, but my computer shut down. Can I resume where it stopped?
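One way to resume, sketched under assumptions: before scheduling downloads, skip every video whose output file already exists. The layout assumed here (one file named after the YouTube ID with an .mp4 extension in a single directory) is hypothetical and may not match what download.py writes, and a file left half-written by the shutdown would still need to be deleted by hand.

```python
import os


def pending_ids(all_ids, dl_dir, ext=".mp4"):
    """Return the IDs from all_ids that have no file in dl_dir yet.

    Assumes files are named '<yt_id><ext>', which is an illustrative
    convention rather than the script's confirmed naming scheme.
    """
    done = {
        os.path.splitext(name)[0]
        for name in os.listdir(dl_dir)
        if name.endswith(ext)
    }
    return [yt_id for yt_id in all_ids if yt_id not in done]
```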

Error while download videos using download.py

While downloading the YouTube-BB videos, I get the following error saying a directory is not present. I am using Python 3.7 on Windows.

Complete Log:
C:\Users\spaul\Downloads\youtube-bb-master\youtube-bb-master>python download.py videos 1
Traceback (most recent call last):
  File "download.py", line 43, in <module>
    parse_and_sched(sys.argv[1],int(sys.argv[2]))
  File "download.py", line 30, in parse_and_sched
    check_call(['mkdir', '-p', dl_dir])
  File "C:\python37\lib\subprocess.py", line 342, in check_call
    retcode = call(*popenargs, **kwargs)
  File "C:\python37\lib\subprocess.py", line 323, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\python37\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\python37\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

I also tried to provide the complete path
C:\Users\spaul\Downloads\youtube-bb-master\youtube-bb-master>python download.py C:\Users\spaul\Downloads\youtube-bb-master\youtube-bb-master\videos\ 1

But I still get the same error. Can you please tell me how to solve this?
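The traceback points at check_call(['mkdir', '-p', dl_dir]). On Windows, mkdir is a shell built-in rather than a standalone program, so the file that cannot be found is the mkdir executable itself, not the videos directory. A portable replacement, as a minimal sketch (the wrapper name ensure_dir is hypothetical):

```python
import os


def ensure_dir(path):
    """Create path and any missing parents, like 'mkdir -p'.

    exist_ok=True (Python 3.2+) makes the call succeed when the
    directory is already there.
    """
    os.makedirs(path, exist_ok=True)
    return path
```

Other external tools that the script invokes may raise the same error on Windows if they are not installed and on PATH.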

how many images in youtubebbdevkit2017 after voc_convert.py?

Hi Mark,
I have been running voc_convert.py for a few days and there are 587703 images now. How many should I expect in total?
Best,
Yiming

Why is there only a progress bar when downloading, but there is no video in folder?

About the dataset size

Hello, thank you for sharing. I have some questions about the dataset.
I downloaded 74459 videos successfully, and they take up 1.3T of space...
The script then began to download yt_bb_detection_validation.csv.gz, but my hard drive ran out of space.

Does this mean the download for yt_bb_detection_train.csv is complete?
Can I use voc_convert.py to process the 70,000 videos?
How much space will all the videos of the training and validation sets take?
And how much space will the dataset take after running voc_convert.py?
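The figures in this report allow a rough per-video average, which can then be scaled to any video count. This is only a back-of-the-envelope sketch: it assumes "1.3T" means 1.3 decimal terabytes and that the remaining videos are similar in size to those already downloaded, neither of which is confirmed.

```python
def bytes_per_video(total_bytes, n_videos):
    """Average size of one downloaded video, in bytes."""
    return total_bytes / float(n_videos)


def estimate_bytes(n_videos, per_video):
    """Projected disk usage for n_videos at the given average size."""
    return n_videos * per_video


# Numbers reported in the issue: 74459 videos occupying about 1.3 TB.
avg = bytes_per_video(1.3e12, 74459)

if __name__ == "__main__":
    print("average MB per video: %.1f" % (avg / 1e6))
```

Calling estimate_bytes with the number of videos listed in an annotation file gives a projection for that subset.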

the running speed of voc_convert.py

Hi, your repo is good, and I have downloaded the YouTube data. Now I want to decode the videos into VOC training data, but the conversion runs very slowly, and it also has a failure case:

A loop has to handle all the data in train.csv, which takes a lot of time.
After handling all the data in train.csv, we get the "present_annots" variable and start decoding frames, but this breaks with [error 26]Too many open files: '/dev/null'.

Is there a good way to avoid this? Is trying more threads the only way to speed it up? I have used 64 threads, but it is still slow.
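A "Too many open files: '/dev/null'" error commonly means that a handle to /dev/null is opened for every external command and never closed, until the process reaches its file descriptor limit. Whether voc_convert.py does this has not been confirmed here. A sketch of the usual remedy on Python 3.3+, where subprocess.DEVNULL discards output without the caller holding a file object (run_quiet is a hypothetical helper name):

```python
import subprocess
import sys


def run_quiet(cmd):
    """Run cmd with its output discarded and return the exit code.

    subprocess.DEVNULL is managed by the subprocess module, so no
    file object is left open in the caller after the command ends.
    """
    return subprocess.call(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


if __name__ == "__main__":
    # Repeated calls do not accumulate open descriptors.
    for _ in range(3):
        run_quiet([sys.executable, "-c", "print('frame decoded')"])
```

On older Python versions the equivalent is to open os.devnull inside a with block so the handle is closed after each call.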

wget error ?

Now I am getting a new error message, even though I have installed wget on my system.

Safats-MBP:youtube_BB_jul19 SMT_Mac$ python3 download.py vid_dir 30

yt_bb_detection_validation: Downloading annotations...
Traceback (most recent call last):
  File "download.py", line 41, in <module>
    parse_and_sched(sys.argv[1],int(sys.argv[2]))
  File "download.py", line 32, in parse_and_sched
    annotations,clips,vids = youtube_bb.parse_annotations(d_set,dl_dir)
  File "/Users/SMT_Mac/Desktop/youtube_BB_jul19/youtube_bb.py", line 182, in parse_annotations
    check_call(['wget', web_host+d_set+'.csv.gz'])
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 579, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 560, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1544, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'wget'
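The FileNotFoundError means the Python process could not find an executable named wget on its PATH, which can happen even after installing it if the install location is not on the PATH that Python inherits. A quick check, sketched with shutil.which (Python 3.3+); the default tool list is an assumption about what the scripts call, and missing_tools is a hypothetical helper:

```python
import shutil


def missing_tools(tools=("wget", "ffmpeg", "youtube-dl")):
    """Return the names in tools that are not found on PATH.

    The default tuple is a guess at the external programs the
    scripts rely on; adjust it to match the actual calls.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("not found on PATH:", missing_tools())
```

If wget is reported missing here but works in the terminal, comparing the PATH seen by the shell with os.environ["PATH"] inside Python usually reveals the mismatch.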