martouta / speech_processor
Speech-to-text from videos and audios (including YouTube and TikTok links)
License: GNU General Public License v3.0
This may be tricky
marta@Martas-MacBook-Pro speech_processor % MAX_THREADS=8 INPUT_FILE='tests/fixtures/example_input.json' SPEECH_ENV='production' SUBS_LOCATION='file' python3 -u .
2021-10-25 04:00:50,661 - root - INFO - [1/6] Downloading multimedia from URL ... [123145541492736-id_zWQJqt_D-vo-10-25.02:00:50661682]
2021-10-25 04:00:50,662 - root - INFO - [1/6] Downloading multimedia from URL ... [123145546747904-id_CNHe4qXqsck-10-25.02:00:50662092]
2021-10-25 04:00:50,725 - root - ERROR - <class 'urllib.error.URLError'> : <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>
TRACEBACK:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1350, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1277, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1323, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1272, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1032, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 972, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1447, in connect
server_hostname=server_hostname)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 423, in wrap_socket
session=session
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 870, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./app/process_resource.py", line 14, in process_resource
return __process_resource(json_parsed)
File "./app/process_resource.py", line 29, in __process_resource
filepath = download_multimedia_from_url(recognition_id, json_parsed)
File "./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
__download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
File "./app/download_multimedia_from_url.py", line 34, in __download_youtube_video
YouTube(f"youtube.com/watch?v={youtube_reference_id}")
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/__main__.py", line 291, in streams
self.check_availability()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/__main__.py", line 206, in check_availability
status, messages = extract.playability_status(self.watch_html)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/__main__.py", line 98, in watch_html
self._watch_html = request.get(url=self.watch_url)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/request.py", line 53, in get
response = _execute_request(url, headers=extra_headers, timeout=timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/request.py", line 37, in _execute_request
return urlopen(request, timeout=timeout) # nosec
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1393, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1352, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>
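The CERTIFICATE_VERIFY_FAILED error above means that Python's ssl module could not find a CA certificate that vouches for the issuer of the server certificate. A minimal diagnostic sketch, using only the standard library, that reports where the running interpreter looks for CA certificates. The helper name `ca_bundle_report` is hypothetical and not part of speech_processor.

```python
import os
import ssl


def ca_bundle_report() -> dict:
    """Report where this Python looks for CA certificates and whether those paths exist."""
    paths = ssl.get_default_verify_paths()
    return {
        # cafile/capath are None when the default location does not exist.
        "cafile": paths.cafile,
        "cafile_exists": bool(paths.cafile) and os.path.isfile(paths.cafile),
        "capath": paths.capath,
        "capath_exists": bool(paths.capath) and os.path.isdir(paths.capath),
        # Locations compiled into the OpenSSL build, whether or not they exist.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


if __name__ == "__main__":
    for key, value in ca_bundle_report().items():
        print(f"{key}: {value}")
```

If neither `cafile_exists` nor `capath_exists` is true, the interpreter probably has no trusted root certificates available. With python.org installers on macOS this is commonly resolved by running the bundled "Install Certificates.command", though that may not be the cause here.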
The OOMKilled error, also indicated by exit code 137, means that a container or pod was terminated because it used more memory than allowed. OOM stands for "Out Of Memory".
It happens for YouTube videos of around 45 minutes from National Geographic Abu Dhabi.
speech_processor/app/resource_audio.py
Lines 19 to 27 in 248b209
marta@Martas-MacBook-Pro my-language-experience-web % kubectl describe pod speech-processor-...
Name: speech-processor-...
...
Start Time: Sat, 14 May 2022 18:21:36 +0200
...
Containers:
speech-processor:
Container ID: ...
Image: martouta/speech_processor:v1.0.7
Image ID: ...
...
Args:
python3
-u
.
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Sun, 15 May 2022 10:40:42 +0200
Finished: Sun, 15 May 2022 10:43:30 +0200
Ready: False
Restart Count: 7
Environment:
...
MAX_THREADS: 4
Mounts:
...
...
Events:
Type     Reason   Age                   From     Message
----     ------   ----                  ----     -------
Normal   Pulled   12m                   kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 1.078442083s
Normal   Pulled   7m18s                 kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 944.444131ms
Normal   Pulled   5m38s                 kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 918.041631ms
Normal   Pulled   3m58s                 kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 903.742868ms
Normal   Pulling  3m58s (x8 over 16h)   kubelet  Pulling image "martouta/speech_processor:v1.0.7"
Normal   Started  3m57s (x8 over 16h)   kubelet  Started container speech-processor
Normal   Created  3m57s (x8 over 16h)   kubelet  Created container speech-processor
Warning  BackOff  24s (x11 over 7m31s)  kubelet  Back-off restarting failed container
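Exit code 137 in the output above follows the usual convention that a code above 128 means "128 plus the number of the signal that ended the process": 137 is 128 + 9, and signal 9 is SIGKILL on Linux, which is what the kernel sends when it kills a process for exceeding its memory limit. A small sketch of that decoding. The function name `describe_exit_code` is made up for illustration.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable description."""
    if code > 128:
        number = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

The exit code alone only says the process was killed. It is the `Reason: OOMKilled` field in the pod description that attributes the kill to memory.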
speech_processor does not seem to work as expected with Galia: the processing appears to fail or to be incompatible when Galia is used.
Closed in favor of #14
It reads as if it works locally, when what I really mean is "use it only locally to check whether changes work".
Also explain/show how to use it in Production.
Closed by #19
2022-07-04 08:39:46,176 - root - INFO - [1/6] Downloading multimedia from URL ... [139726513235712-5625-07-04.08:39:46174075]
[INFO] Starting Chromium download.
2022-07-04 08:39:46,214 - pyppeteer.chromium_downloader - INFO - Starting Chromium download.
  0%|          | 0.00/109M [00:00<?, ?b/s]
 24%|██▍       | 26.4M/109M [00:00<00:00, 264Mb/s]
 48%|████▊     | 52.7M/109M [00:00<00:00, 249Mb/s]
 76%|███████▌  | 82.9M/109M [00:00<00:00, 272Mb/s]
100%|██████████| 109M/109M [00:00<00:00, 273Mb/s]
[INFO] Beginning extraction
2022-07-04 08:39:46,787 - pyppeteer.chromium_downloader - INFO - Beginning extraction
[INFO] Chromium extracted to: /root/.local/share/pyppeteer/local-chromium/588429
2022-07-04 08:39:48,910 - pyppeteer.chromium_downloader - INFO - Chromium extracted to: /root/.local/share/pyppeteer/local-chromium/588429
2022-07-04 08:40:18,928 - app.process_resource - ERROR - <class 'pyppeteer.errors.BrowserError'> : Browser closed unexpectedly:
TRACEBACK:
Traceback (most recent call last):
File "/usr/src/app/./app/process_resource.py", line 14, in process_resource
return __process_resource(input_item)
File "/usr/src/app/./app/process_resource.py", line 23, in __process_resource
filepath = input_item.save()
File "/usr/src/app/./app/input_items/input_item.py", line 24, in save
self.download(filepath)
File "/usr/src/app/./app/input_items/input_item_tiktok.py", line 15, in download
api.downloadVideoById(self.id, filepath)
File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktokapi.py", line 239, in downloadVideoById
video_info = self.getVideoById(video_id)
File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktokapi.py", line 236, in getVideoById
return self.send_get_request(url, params)
File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktokapi.py", line 74, in send_get_request
signature = self.tiktok_browser.fetch_auth_params(url, language=self.language)
File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktok_browser.py", line 54, in fetch_auth_params
return asyncio.new_event_loop().run_until_complete(self.async_fetch_auth_params(url, language))
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktok_browser.py", line 57, in async_fetch_auth_params
browser = await launch(self.options)
File "/usr/local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/usr/local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/usr/local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
2023-04-28 11:53:26,148 - root - INFO - [1/6] Downloading multimedia from URL ... [140487100974848-11640-04-28.11:53:26148304]
2023-04-28 11:53:26,844 - app.process_resource - ERROR - <class 'KeyError'> : 'streamingData'
TRACEBACK:
Traceback (most recent call last):
File "/usr/src/app/app/process_resource.py", line 13, in process_resource
return input_item.call_resource_processor()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/app/input_items/input_item.py", line 29, in call_resource_processor
return processor_class(self).call()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/app/services/resource_processors/ai_resource_processor.py", line 11, in call
filepath = self.input_item.save()
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/app/input_items/input_item.py", line 24, in save
self.download(filepath)
File "/usr/src/app/app/input_items/input_item_youtube.py", line 13, in download
.streams \
^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pytube/__main__.py", line 296, in streams
return StreamQuery(self.fmt_streams)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pytube/__main__.py", line 176, in fmt_streams
stream_manifest = extract.apply_descrambler(self.streaming_data)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pytube/__main__.py", line 161, in streaming_data
return self.vid_info['streamingData']
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'streamingData'
Example:
from pydub import AudioSegment
# read in .m4a file
audio = AudioSegment.from_file("audio.m4a", format="m4a")
# export to .wav file
audio.export("audio.wav", format="wav")
https://pytube.io/en/latest/user/captions.html
Make it optional by resource
Too generic
It's failing again. It is fixed in pytube/pytube#1327
Less bandwidth is needed: we do not need the video, only the audio.
https://github.com/pytube/pytube/blob/bff3e77253e39ee4e1865f8b9dc03632346ba9b5/pytube/query.py#L34
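The idea of fetching only the audio can be sketched without touching the network. The stream records below are hypothetical stand-ins (plain dictionaries with made-up values), not pytube objects. In pytube itself, the linked `StreamQuery.filter` accepts an `only_audio` argument that serves the same purpose.

```python
from typing import Dict, List


def audio_only_streams(streams: List[Dict]) -> List[Dict]:
    """Keep only the streams that carry audio and no video track."""
    return [s for s in streams if s["has_audio"] and not s["has_video"]]


# Hypothetical stream metadata, for illustration only.
streams = [
    {"itag": 1, "has_video": True, "has_audio": True},
    {"itag": 2, "has_video": False, "has_audio": True},
    {"itag": 3, "has_video": True, "has_audio": False},
    {"itag": 4, "has_video": False, "has_audio": True},
]

for stream in audio_only_streams(streams):
    print(stream["itag"])
```

Since speech recognition only consumes the audio track, filtering before the download should avoid transferring the video data at all.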
2023-04-28 11:42:35,459 - root - INFO - [6/6] Cleaning up temporary generated files ... [140487100974848-12477-04-28.11:42:05304209]
2023-04-28 11:42:35,462 - app.process_resource - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'group'
TRACEBACK:
Traceback (most recent call last):
File "/usr/src/app/app/process_resource.py", line 13, in process_resource
return input_item.call_resource_processor()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/app/input_items/input_item.py", line 29, in call_resource_processor
return processor_class(self).call()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/app/services/resource_processors/ai_resource_processor.py", line 21, in call
TemporaryFilesCleaner.call(self.recognition_id(), filepath)
File "/usr/src/app/app/services/temporary_files_cleaner.py", line 19, in call
downloaded_multimedia_path).group(1)
^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
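An `AttributeError` like the one above typically appears when a regular-expression search finds nothing and returns `None`, and `.group(1)` is then called on that `None`. A minimal sketch of a guarded lookup. The pattern, the example paths, and the function name are assumptions for illustration, since the real code in `temporary_files_cleaner.py` is not shown here.

```python
import re
from typing import Optional

# Hypothetical pattern: captures the file name without its final extension.
BASENAME_PATTERN = re.compile(r"([^/]+)\.[^./]+$")


def extract_basename(path: str) -> Optional[str]:
    """Return the file name without extension, or None when the path does not match."""
    match = BASENAME_PATTERN.search(path)
    if match is None:
        # No match: let the caller decide how to handle an unexpected path.
        return None
    return match.group(1)


print(extract_basename("/tmp/resources/abc123.mp4"))
print(extract_basename("/tmp/resources/abc123"))
```

Returning `None` (or raising a descriptive error) at this point keeps a cleanup step from failing with an opaque message after the main work has already finished.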
Read #75 (comment)
speech_processor_1 | 2021-12-16 15:15:32,620 - root - INFO - [1/6] Downloading multimedia from URL ... [140638112765696-1-12-16.15:15:32608107]
speech_processor_1 | 2021-12-16 15:15:34,391 - root - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'span'
speech_processor_1 | TRACEBACK:
speech_processor_1 | Traceback (most recent call last):
speech_processor_1 | File "/usr/src/app/./app/process_resource.py", line 15, in process_resource
speech_processor_1 | return __process_resource(json_parsed, recognition_id)
speech_processor_1 | File "/usr/src/app/./app/process_resource.py", line 25, in __process_resource
speech_processor_1 | filepath = download_multimedia_from_url(recognition_id, json_parsed)
speech_processor_1 | File "/usr/src/app/./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
speech_processor_1 | __download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
speech_processor_1 | File "/usr/src/app/./app/download_multimedia_from_url.py", line 35, in __download_youtube_video
speech_processor_1 | .streams
speech_processor_1 | File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 292, in streams
speech_processor_1 | return StreamQuery(self.fmt_streams)
speech_processor_1 | File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 177, in fmt_streams
speech_processor_1 | extract.apply_signature(stream_manifest, self.vid_info, self.js)
speech_processor_1 | File "/usr/local/lib/python3.10/site-packages/pytube/extract.py", line 409, in apply_signature
speech_processor_1 | cipher = Cipher(js=js)
speech_processor_1 | File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 44, in __init__
speech_processor_1 | self.throttling_array = get_throttling_function_array(js)
speech_processor_1 | File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 323, in get_throttling_function_array
speech_processor_1 | str_array = throttling_array_split(array_raw)
speech_processor_1 | File "/usr/local/lib/python3.10/site-packages/pytube/parser.py", line 158, in throttling_array_split
speech_processor_1 | match_start, match_end = match.span()
speech_processor_1 | AttributeError: 'NoneType' object has no attribute 'span'
2022-02-07 04:02:23,950 - root - INFO - [1/6] Downloading multimedia from URL ... [140451010660096-2086-02-07.04:02:23950386]
2022-02-07 04:02:25,086 - root - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'span'
TRACEBACK:
Traceback (most recent call last):
File "/usr/src/app/./app/process_resource.py", line 15, in process_resource
return __process_resource(json_parsed, recognition_id)
File "/usr/src/app/./app/process_resource.py", line 25, in __process_resource
filepath = download_multimedia_from_url(recognition_id, json_parsed)
File "/usr/src/app/./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
__download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
File "/usr/src/app/./app/download_multimedia_from_url.py", line 35, in __download_youtube_video
.streams \
File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 292, in streams
return StreamQuery(self.fmt_streams)
File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 177, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "/usr/local/lib/python3.10/site-packages/pytube/extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 43, in __init__
self.throttling_plan = get_throttling_plan(js)
File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 387, in get_throttling_plan
raw_code = get_throttling_function_code(js)
File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 301, in get_throttling_function_code
code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
AttributeError: 'NoneType' object has no attribute 'span'
2022-02-07 04:02:25,086 - root - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'span'
TRACEBACK:
Traceback (most recent call last):
File "/usr/src/app/./app/process_resource.py", line 15, in process_resource
return __process_resource(json_parsed, recognition_id)
File "/usr/src/app/./app/process_resource.py", line 25, in __process_resource
filepath = download_multimedia_from_url(recognition_id, json_parsed)
File "/usr/src/app/./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
__download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
File "/usr/src/app/./app/download_multimedia_from_url.py", line 35, in __download_youtube_video
.streams \
File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 292, in streams
return StreamQuery(self.fmt_streams)
File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 177, in fmt_streams
extract.apply_signature(stream_manifest, self.vid_info, self.js)
File "/usr/local/lib/python3.10/site-packages/pytube/extract.py", line 409, in apply_signature
cipher = Cipher(js=js)
File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 43, in __init__
self.throttling_plan = get_throttling_plan(js)
File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 387, in get_throttling_plan
raw_code = get_throttling_function_code(js)
File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 301, in get_throttling_function_code
code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
AttributeError: 'NoneType' object has no attribute 'span'
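The tracebacks above end in a call to `.span()` on a value that turned out to be `None`, which is most likely the result of a regular-expression search that found nothing in the downloaded player JavaScript. The sketch below reproduces that failure mode in isolation and shows a guard that raises a descriptive error instead. The pattern, the function name and the sample strings are hypothetical and are not taken from pytube's code.

```python
import re

# Hypothetical pattern and helper, for illustration only (not pytube's code).
FUNC_START = re.compile(r"function\s+(\w+)\s*\(")


def find_function_end_offset(js: str) -> int:
    """Return the offset just past the first 'function name(' found in js."""
    match = FUNC_START.search(js)
    if match is None:
        # re.search returns None when nothing matches; calling .span() on it
        # raises: AttributeError: 'NoneType' object has no attribute 'span'
        raise ValueError("expected pattern not found in player JavaScript")
    return match.span()[1]
```

A guard like this does not make the download succeed, but it would turn an opaque `AttributeError` into a message that points at the unmatched pattern.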
I've encountered an issue with incorrect timestamps in the subtitles generated for a YouTube video using the speech_processor tool.
Input file xjEFo3a1AnI_input.json, with the following content:
[
  {
    "integration": "youtube",
    "id": "xjEFo3a1AnI",
    "language_code": "en-US",
    "resource_id": 7231,
    "recognizer": "google",
    "captions": "try"
  }
]
Note: The YouTube video in question exists and has manual captions in English (US).
$ LOG_OUTPUT=standard MAX_THREADS=8 INPUT_FILE='xjEFo3a1AnI_input.json' SPEECH_ENV='development' SUBS_LOCATION='file' python3 .
The generated subtitles are written to resources/subtitles/development:
$ cd resources/subtitles/development
$ grep -n "1038" 123145579446272-7231-11-11.18:03:26723463-subs.srt | cut -d: -f1 | xargs -I {} awk 'NR>={}-5 && NR<={}+5' 123145579446272-7231-11-11.18:03:26723463-subs.srt
The timestamps in the generated subtitles file 123145579446272-7231-11-11.18:03:26723463-subs.srt are incorrect. For instance, in the following excerpt, cue 1038 starts less than three seconds into the video and lasts only 3 ms:
1037
00:00:02,731 --> 00:00:02,736
Usually, for depression, I
want to see at least greater

1038
00:00:02,736 --> 00:00:02,739
than probably 0.8 minimal.

1039
00:00:02,739 --> 00:00:02,744
The timestamps in the subtitles should accurately reflect the timing of the spoken words in the video.
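One pattern that would produce output like the excerpt, offered as a guess rather than a diagnosis, is a unit mismatch: an offset expressed in seconds being formatted as if it were milliseconds. The sketch below formats an SRT timestamp from milliseconds and shows both readings of the value 2731 taken from cue 1037. The helper name is hypothetical, and whether the tool's caption offsets are actually in seconds is an assumption.

```python
def srt_timestamp(milliseconds: int) -> str:
    """Format a non-negative millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(milliseconds, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# Assumption: the source offset is 2731 seconds, not 2731 milliseconds.
offset_seconds = 2731

print(srt_timestamp(offset_seconds))          # treated as ms: 00:00:02,731
print(srt_timestamp(offset_seconds * 1000))   # converted to ms: 00:45:31,000
```

Under that assumption, cue 1037 would sit about 45 minutes into the video, which is more plausible for the 1037th cue than 2.7 seconds.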
Environment: Development
Tool Version: v3.0.0
Python Version: Python 3.10.0
Operating System: macOS 14.1.1
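To find out how widespread the problem is in a generated file, a small script can flag cues whose duration is implausibly short. This is a minimal sketch: the function names are hypothetical and the 100 ms threshold is an arbitrary assumption, not a value used by speech_processor.

```python
import re

# Matches an SRT timestamp such as 00:00:02,731
TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp to milliseconds."""
    match = TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(group) for group in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def short_cues(srt_text: str, min_ms: int = 100):
    """Yield (start, end) for every cue that lasts less than min_ms."""
    for line in srt_text.splitlines():
        if "-->" in line:
            start, end = line.split("-->")
            if to_ms(end) - to_ms(start) < min_ms:
                yield start.strip(), end.strip()
```

Passing the contents of the generated `-subs.srt` file to `short_cues` would list every cue affected; all three cues in the excerpt above fall under the threshold.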