martouta / speech_processor

Speech-to-text from videos and audios (including youtube and tiktok links)

License: GNU General Public License v3.0

Languages: Python 96.38%, Makefile 1.56%, Mako 1.14%, Shell 0.91%
Topics: python, speech-recognition, speech-to-text, tiktok, youtube

speech_processor's People

Contributors

dependabot[bot], martouta, snyk-bot

speech_processor's Issues

Save timestamp in subtitles

  • Add models Duration & RecognitionLine. #140
  • Save timestamp in SRT subtitles (in files) - even empty lines 😂. #138
  • Do not save empty lines in subs (SRT files and MongoDB). #141
  • [Refactor] Extract Network requests into a Service class. #142
  • Save timestamps in MongoDB. #174
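
As a rough illustration of the checklist above, here is a minimal sketch of writing timestamped SRT entries while skipping empty lines. The `SubtitleLine` dataclass and both function names are hypothetical stand-ins; the fields of the project's actual Duration and RecognitionLine models are not shown in this issue.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SubtitleLine:
    """Hypothetical stand-in; not the project's RecognitionLine model."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(lines: Iterable[SubtitleLine]) -> str:
    """Render non-empty lines as numbered SRT blocks."""
    blocks = []
    # Drop lines whose text is empty or whitespace before numbering them.
    kept = (line for line in lines if line.text.strip())
    for index, line in enumerate(kept, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(line.start)} --> {srt_timestamp(line.end)}\n"
            f"{line.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Numbering happens after filtering, so the SRT indices stay consecutive even when empty recognitions are dropped.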

Bug: urllib.error.URLError: urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate

marta@Martas-MacBook-Pro speech_processor % MAX_THREADS=8 INPUT_FILE='tests/fixtures/example_input.json' SPEECH_ENV='production' SUBS_LOCATION='file' python3 -u .

2021-10-25 04:00:50,661 - root - INFO - [1/6] Downloading multimedia from URL ... [123145541492736-id_zWQJqt_D-vo-10-25.02:00:50661682]
2021-10-25 04:00:50,662 - root - INFO - [1/6] Downloading multimedia from URL ... [123145546747904-id_CNHe4qXqsck-10-25.02:00:50662092]
2021-10-25 04:00:50,725 - root - ERROR - <class 'urllib.error.URLError'> : <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>
TRACEBACK:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1350, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1277, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1323, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1272, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1032, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 972, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1447, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./app/process_resource.py", line 14, in process_resource
    return __process_resource(json_parsed)
  File "./app/process_resource.py", line 29, in __process_resource
    filepath = download_multimedia_from_url(recognition_id, json_parsed)
  File "./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
    __download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
  File "./app/download_multimedia_from_url.py", line 34, in __download_youtube_video
    YouTube(f"youtube.com/watch?v={youtube_reference_id}")
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/__main__.py", line 291, in streams
    self.check_availability()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/__main__.py", line 206, in check_availability
    status, messages = extract.playability_status(self.watch_html)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/__main__.py", line 98, in watch_html
    self._watch_html = request.get(url=self.watch_url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/request.py", line 53, in get
    response = _execute_request(url, headers=extra_headers, timeout=timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytube/request.py", line 37, in _execute_request
    return urlopen(request, timeout=timeout) # nosec
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1393, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1352, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)>
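
"unable to get local issuer certificate" usually means the interpreter could not find a usable CA bundle. The interpreter path in the traceback (/Library/Frameworks/Python.framework/Versions/3.7) looks like a python.org macOS install, which ships an "Install Certificates.command" script for exactly this situation; whether that script had been run on this machine is an assumption the log does not confirm. The sketch below only reports where Python looks for CA certificates, using the standard `ssl` module; `describe_ca_paths` is a hypothetical helper name.

```python
import ssl


def describe_ca_paths() -> dict:
    """Report the CA locations the default SSL context would use."""
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved path to the CA file, or None if that file does not exist.
        "cafile": paths.cafile,
        # Resolved path to the CA directory, or None if it does not exist.
        "capath": paths.capath,
        # Locations compiled into the OpenSSL build, whether or not they exist.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }
```

If both `cafile` and `capath` come back as `None`, the default context has no CA bundle to verify against, which would be consistent with the error above.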

[Bug] memory issue in "[2/6] Saving audio as WAV ..."

The OOMKilled error, also indicated by exit code 137, means that a container or pod was terminated because it used more memory than allowed. OOM stands for "Out Of Memory".

It happens with YouTube videos of around 45 minutes from National Geographic Abu Dhabi.

Code of the issue

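import os
import re
from pathlib import Path
from pydub import AudioSegment

@staticmethod
def save_as_wav(recognition_id, original_file_path):
    # Decodes the whole source file into memory.
    sound = AudioSegment.from_file(original_file_path)
    # File name without directory or extension; only .mp<digits> and .wav match.
    name = re.match("^.*\\/([^/]*)\\.(mp\\d+|wav)$",
                    original_file_path).group(1)
    sp_path = Path(__file__).resolve().parent.parent
    new_path = f"{sp_path}/audios/{os.environ['SPEECH_ENV']}/{name}.wav"
    sound.export(new_path, format='wav')
    # Loads the exported WAV as a second in-memory copy while `sound` is still alive.
    return ResourceAudio(recognition_id, AudioSegment.from_wav(new_path))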
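
For scale: assuming 44.1 kHz, 16-bit stereo audio (the actual format is not stated in the issue), 45 minutes decodes to 2700 × 44100 × 4 ≈ 476 MB of raw samples, and `save_as_wav` holds two such copies at once, per worker thread. One possible direction, sketched here with the standard `wave` module, is to read the exported WAV in fixed-length chunks so later steps never need the whole file in memory. This is not the project's current approach, `iter_wav_chunks` is a hypothetical name, and it does not remove the initial full decode done by `AudioSegment.from_file`.

```python
import wave
from typing import Iterator


def iter_wav_chunks(path: str, chunk_seconds: float = 30.0) -> Iterator[bytes]:
    """Yield raw PCM data from a WAV file in chunks of about chunk_seconds."""
    with wave.open(path, "rb") as wav:
        # Number of frames (one sample per channel) that make up one chunk.
        frames_per_chunk = max(1, int(wav.getframerate() * chunk_seconds))
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            yield frames
```

Each yielded chunk could then be passed to the recognizer on its own, keeping peak memory proportional to `chunk_seconds` rather than to the video length.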

K8s Pod description

marta@Martas-MacBook-Pro my-language-experience-web % kubectl describe pod speech-processor-...
Name:         speech-processor-...
...
Start Time:   Sat, 14 May 2022 18:21:36 +0200
...
Containers:
  speech-processor:
    Container ID:  ...
    Image:         martouta/speech_processor:v1.0.7
    Image ID:      ...
    ...
    Args:
      python3
      -u
      .
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sun, 15 May 2022 10:40:42 +0200
      Finished:     Sun, 15 May 2022 10:43:30 +0200
    Ready:          False
    Restart Count:  7
    Environment:
      ...
      MAX_THREADS:                     4
    Mounts:
      ...
...
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulled   12m                   kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 1.078442083s
  Normal   Pulled   7m18s                 kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 944.444131ms
  Normal   Pulled   5m38s                 kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 918.041631ms
  Normal   Pulled   3m58s                 kubelet  Successfully pulled image "martouta/speech_processor:v1.0.7" in 903.742868ms
  Normal   Pulling  3m58s (x8 over 16h)   kubelet  Pulling image "martouta/speech_processor:v1.0.7"
  Normal   Started  3m57s (x8 over 16h)   kubelet  Started container speech-processor
  Normal   Created  3m57s (x8 over 16h)   kubelet  Created container speech-processor
  Warning  BackOff  24s (x11 over 7m31s)  kubelet  Back-off restarting failed container

[Bug] Possible Issue with Galia Processing

I have encountered a potential issue: the speech_processor tool does not seem to work as expected with Galia. My impression is that processing either fails or is incompatible when Galia is used.

Clarify what 'GOOGLE_LOCAL' means.

The name suggests that it works locally, when what I really mean is "use it only locally to check whether changes work".
Also explain and show how to use it in production.

[Bug] Error trying to download tiktok - Browser closed unexpectedly

2022-07-04 08:39:46,176 - root - INFO - [1/6] Downloading multimedia from URL ... [139726513235712-5625-07-04.08:39:46174075]
[INFO] Starting Chromium download.
2022-07-04 08:39:46,214 - pyppeteer.chromium_downloader - INFO - Starting Chromium download.
   0%|          | 0.00/109M [00:00<?, ?b/s]  24%|██▍       | 26.4M/109M [00:00<00:00, 264Mb/s]  48%|████▊     | 52.7M/109M [00:00<00:00, 249Mb/s]  76%|███████▌  | 82.9M/109M [00:00<00:00, 272Mb/s] 100%|██████████| 109M/109M [00:00<00:00, 273Mb/s]
[INFO] Beginning extraction
2022-07-04 08:39:46,787 - pyppeteer.chromium_downloader - INFO - Beginning extraction
[INFO] Chromium extracted to: /root/.local/share/pyppeteer/local-chromium/588429
2022-07-04 08:39:48,910 - pyppeteer.chromium_downloader - INFO - Chromium extracted to: /root/.local/share/pyppeteer/local-chromium/588429
2022-07-04 08:40:18,928 - app.process_resource - ERROR - <class 'pyppeteer.errors.BrowserError'> : Browser closed unexpectedly:
    TRACEBACK:
    Traceback (most recent call last):
  File "/usr/src/app/./app/process_resource.py", line 14, in process_resource
    return __process_resource(input_item)
  File "/usr/src/app/./app/process_resource.py", line 23, in __process_resource
    filepath = input_item.save()
  File "/usr/src/app/./app/input_items/input_item.py", line 24, in save
    self.download(filepath)
  File "/usr/src/app/./app/input_items/input_item_tiktok.py", line 15, in download
    api.downloadVideoById(self.id, filepath)
  File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktokapi.py", line 239, in downloadVideoById
    video_info = self.getVideoById(video_id)
  File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktokapi.py", line 236, in getVideoById
    return self.send_get_request(url, params)
  File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktokapi.py", line 74, in send_get_request
    signature = self.tiktok_browser.fetch_auth_params(url, language=self.language)
  File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktok_browser.py", line 54, in fetch_auth_params
    return asyncio.new_event_loop().run_until_complete(self.async_fetch_auth_params(url, language))
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/TikTokAPI/tiktok_browser.py", line 57, in async_fetch_auth_params
    browser = await launch(self.options)
  File "/usr/local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
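
The log shows Chromium being extracted under /root, so the process is probably running as root inside a container. In that setup Chromium often refuses to start unless it is launched with `--no-sandbox`, and missing shared libraries in the image are another frequent cause; the issue does not confirm either. The sketch below tries to launch pyppeteer's Chromium on its own, outside TikTokAPI, with Chromium's output piped to the console so the real reason becomes visible. `build_launch_options` and `try_launch` are hypothetical names, and how to pass such options through TikTokAPI is not shown in the traceback.

```python
import os


def build_launch_options() -> dict:
    """Build pyppeteer launch options suited to a diagnostic run."""
    args = []
    # Chromium's sandbox generally cannot be used when running as root.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        args.append("--no-sandbox")
    # dumpio pipes the browser's stdout/stderr into this process's output.
    return {"headless": True, "args": args, "dumpio": True}


async def try_launch() -> str:
    """Launch Chromium, return its version string, and close it again."""
    from pyppeteer import launch  # imported lazily; requires pyppeteer

    browser = await launch(build_launch_options())
    try:
        return await browser.version()
    finally:
        await browser.close()
```

Running `asyncio.run(try_launch())` inside the container should either print a Chromium version or surface Chromium's own error message instead of the bare "Browser closed unexpectedly".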

Error fetching YT media: KeyError: 'streamingData'

2023-04-28 11:53:26,148 - root - INFO - [1/6] Downloading multimedia from URL ... [140487100974848-11640-04-28.11:53:26148304]
2023-04-28 11:53:26,844 - app.process_resource - ERROR - <class 'KeyError'> : 'streamingData'
    TRACEBACK:
    Traceback (most recent call last):
  File "/usr/src/app/app/process_resource.py", line 13, in process_resource
    return input_item.call_resource_processor()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/app/input_items/input_item.py", line 29, in call_resource_processor
    return processor_class(self).call()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/app/services/resource_processors/ai_resource_processor.py", line 11, in call
    filepath = self.input_item.save()
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/app/input_items/input_item.py", line 24, in save
    self.download(filepath)
  File "/usr/src/app/app/input_items/input_item_youtube.py", line 13, in download
    .streams \
     ^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pytube/__main__.py", line 296, in streams
    return StreamQuery(self.fmt_streams)
                       ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pytube/__main__.py", line 176, in fmt_streams
    stream_manifest = extract.apply_descrambler(self.streaming_data)
                                                ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pytube/__main__.py", line 161, in streaming_data
    return self.vid_info['streamingData']
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'streamingData'
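
The bare `KeyError` hides why YouTube returned no streams. A small helper like the one below could turn the response dictionary into a more useful log message before `.streams` is accessed. It is only a sketch: `explain_missing_streams` is a hypothetical name, and the `playabilityStatus`, `status` and `reason` keys are assumptions about YouTube's player response that this issue does not show.

```python
def explain_missing_streams(vid_info: dict) -> str:
    """Describe why a player response carries no 'streamingData' entry."""
    if "streamingData" in vid_info:
        return "streamingData present"
    # Assumed shape: {"playabilityStatus": {"status": ..., "reason": ...}}
    status = vid_info.get("playabilityStatus", {})
    return "no streamingData (status={!r}, reason={!r})".format(
        status.get("status"), status.get("reason")
    )
```

Logging this string for the failing video would show whether the cause is an age restriction, a region block or a change on YouTube's side that needs a pytube upgrade.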

Support audio .m4a

Example:

from pydub import AudioSegment

# read in .m4a file
audio = AudioSegment.from_file("audio.m4a", format="m4a")

# export to .wav file
audio.export("audio.wav", format="wav")

Error cleaning up leftovers: AttributeError: 'NoneType' object has no attribute 'group'

2023-04-28 11:42:35,459 - root - INFO - [6/6] Cleaning up temporary generated files ... [140487100974848-12477-04-28.11:42:05304209]
2023-04-28 11:42:35,462 - app.process_resource - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'group'
    TRACEBACK:
    Traceback (most recent call last):
  File "/usr/src/app/app/process_resource.py", line 13, in process_resource
    return input_item.call_resource_processor()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/app/input_items/input_item.py", line 29, in call_resource_processor
    return processor_class(self).call()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/app/services/resource_processors/ai_resource_processor.py", line 21, in call
    TemporaryFilesCleaner.call(self.recognition_id(), filepath)
  File "/usr/src/app/app/services/temporary_files_cleaner.py", line 19, in call
    downloaded_multimedia_path).group(1)
                                ^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
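
The regex in `save_as_wav` (quoted in the memory issue) only accepts `.mp<digits>` and `.wav` extensions, so `re.match` returns `None` for anything else, such as `.m4a`. The cleaner's own pattern is not shown in this traceback, so assuming it is similar is a guess. The sketch below contrasts that pattern with `pathlib`, which extracts the name without depending on a list of extensions; both function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Same pattern as in save_as_wav, written as a raw string.
NAME_PATTERN = r"^.*\/([^/]*)\.(mp\d+|wav)$"


def name_from_regex(path: str) -> Optional[str]:
    """Return the file name without extension, or None if the pattern fails."""
    match = re.match(NAME_PATTERN, path)
    return match.group(1) if match else None


def name_from_path(path: str) -> str:
    """Return the file name without its last extension, whatever it is."""
    return Path(path).stem
```

With these, `name_from_regex("/tmp/clip.m4a")` is `None`, which is the value that triggers the `AttributeError` when `.group(1)` is called on it directly, while `name_from_path("/tmp/clip.m4a")` gives `"clip"`.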

AttributeError: 'NoneType' object has no attribute 'span'; File "/usr/local/lib/python3.10/site-packages/pytube/parser.py"

speech_processor_1 | 2021-12-16 15:15:32,620 - root - INFO - [1/6] Downloading multimedia from URL ... [140638112765696-1-12-16.15:15:32608107]
speech_processor_1 | 2021-12-16 15:15:34,391 - root - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'span'
speech_processor_1 | TRACEBACK:
speech_processor_1 | Traceback (most recent call last):
speech_processor_1 |   File "/usr/src/app/./app/process_resource.py", line 15, in process_resource
speech_processor_1 |     return __process_resource(json_parsed, recognition_id)
speech_processor_1 |   File "/usr/src/app/./app/process_resource.py", line 25, in __process_resource
speech_processor_1 |     filepath = download_multimedia_from_url(recognition_id, json_parsed)
speech_processor_1 |   File "/usr/src/app/./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
speech_processor_1 |     __download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
speech_processor_1 |   File "/usr/src/app/./app/download_multimedia_from_url.py", line 35, in __download_youtube_video
speech_processor_1 |     .streams \
speech_processor_1 |   File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 292, in streams
speech_processor_1 |     return StreamQuery(self.fmt_streams)
speech_processor_1 |   File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 177, in fmt_streams
speech_processor_1 |     extract.apply_signature(stream_manifest, self.vid_info, self.js)
speech_processor_1 |   File "/usr/local/lib/python3.10/site-packages/pytube/extract.py", line 409, in apply_signature
speech_processor_1 |     cipher = Cipher(js=js)
speech_processor_1 |   File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 44, in __init__
speech_processor_1 |     self.throttling_array = get_throttling_function_array(js)
speech_processor_1 |   File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 323, in get_throttling_function_array
speech_processor_1 |     str_array = throttling_array_split(array_raw)
speech_processor_1 |   File "/usr/local/lib/python3.10/site-packages/pytube/parser.py", line 158, in throttling_array_split
speech_processor_1 |     match_start, match_end = match.span()
speech_processor_1 | AttributeError: 'NoneType' object has no attribute 'span'

2022-02-07 04:02:25,086 - root - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'span'

2022-02-07 04:02:23,950 - root - INFO - [1/6] Downloading multimedia from URL ... [140451010660096-2086-02-07.04:02:23950386]
2022-02-07 04:02:25,086 - root - ERROR - <class 'AttributeError'> : 'NoneType' object has no attribute 'span'
    TRACEBACK:
    Traceback (most recent call last):
  File "/usr/src/app/./app/process_resource.py", line 15, in process_resource
    return __process_resource(json_parsed, recognition_id)
  File "/usr/src/app/./app/process_resource.py", line 25, in __process_resource
    filepath = download_multimedia_from_url(recognition_id, json_parsed)
  File "/usr/src/app/./app/download_multimedia_from_url.py", line 20, in download_multimedia_from_url
    __download_youtube_video(json_parsed['youtube_reference_id'], fp_tuple)
  File "/usr/src/app/./app/download_multimedia_from_url.py", line 35, in __download_youtube_video
    .streams \
  File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 292, in streams
    return StreamQuery(self.fmt_streams)
  File "/usr/local/lib/python3.10/site-packages/pytube/__main__.py", line 177, in fmt_streams
    extract.apply_signature(stream_manifest, self.vid_info, self.js)
  File "/usr/local/lib/python3.10/site-packages/pytube/extract.py", line 409, in apply_signature
    cipher = Cipher(js=js)
  File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 43, in __init__
    self.throttling_plan = get_throttling_plan(js)
  File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 387, in get_throttling_plan
    raw_code = get_throttling_function_code(js)
  File "/usr/local/lib/python3.10/site-packages/pytube/cipher.py", line 301, in get_throttling_function_code
    code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
AttributeError: 'NoneType' object has no attribute 'span'
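
The last frame calls `match.span()` on `None`, which is what Python's `re` search functions return when the pattern finds nothing in the input. A minimal sketch of that failure mode and a guard that reports it more clearly; the pattern and the helper name `find_span` are made up for illustration and are not pytube's code:

```python
import re

# Hypothetical pattern, for illustration only (not taken from pytube).
PATTERN = re.compile(r"throttle\((\w+)\)")


def find_span(js: str) -> tuple[int, int]:
    """Return the (start, end) span of the first match, or raise a clear error."""
    match = PATTERN.search(js)
    if match is None:
        # Without this check, match.span() would raise
        # AttributeError: 'NoneType' object has no attribute 'span'
        raise ValueError("pattern not found in the given JavaScript source")
    return match.span()
```

With a guard like this, the failure surfaces as a descriptive `ValueError` at the point where the pattern stopped matching, instead of an `AttributeError` further down.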

[Bug] Incorrect Timestamps in Subtitles for YouTube Video

Description

I've encountered an issue with incorrect timestamps in the subtitles generated for a YouTube video using the speech_processor tool.

Steps to Reproduce

  1. Prepare an input JSON file named xjEFo3a1AnI_input.json with the following content:
   [
     {
       "integration": "youtube",
       "id": "xjEFo3a1AnI",
       "language_code": "en-US",
       "resource_id": 7231,
       "recognizer": "google",
       "captions": "try"
     }
   ]   

Note: The YouTube video in question exists and has manual captions in English (US).

  2. Run the speech_processor with the following command:
LOG_OUTPUT=standard MAX_THREADS=8 INPUT_FILE='xjEFo3a1AnI_input.json' SPEECH_ENV='development' SUBS_LOCATION='file' python3 .
  3. After the process completes, inspect the generated subtitles in resources/subtitles/development.
$ cd resources/subtitles/development
$ grep -n "1038" 123145579446272-7231-11-11.18:03:26723463-subs.srt | cut -d: -f1 | xargs -I {} awk 'NR>={}-5 && NR<={}+5'  123145579446272-7231-11-11.18:03:26723463-subs.srt

Observed Behavior

The timestamps in the generated subtitles file 123145579446272-7231-11-11.18:03:26723463-subs.srt are incorrect. For instance, in the following excerpt cue 1037 starts at only 00:00:02,731, and each cue lasts just 3 to 5 milliseconds:

1037
00:00:02,731 --> 00:00:02,736
Usually, for depression, I
want to see at least greater

1038
00:00:02,736 --> 00:00:02,739
than probably 0.8 minimal.

1039
00:00:02,739 --> 00:00:02,744

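One way to spot this kind of problem without reading the file by eye is to compute each cue's duration. A rough sketch, assuming the standard SRT timing line `HH:MM:SS,mmm --> HH:MM:SS,mmm`; the helper name `cue_durations_ms` is hypothetical and not part of speech_processor:

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:02,731 --> 00:00:02,736".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def cue_durations_ms(srt_text: str) -> list[int]:
    """Return the duration of every cue in the SRT text, in milliseconds."""
    durations = []
    for m in TIMING.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        durations.append(round((end - start) / timedelta(milliseconds=1)))
    return durations


# The excerpt from this report.
excerpt = """1037
00:00:02,731 --> 00:00:02,736
Usually, for depression, I
want to see at least greater

1038
00:00:02,736 --> 00:00:02,739
than probably 0.8 minimal.

1039
00:00:02,739 --> 00:00:02,744
"""

print(cue_durations_ms(excerpt))  # [5, 3, 5]
```

Durations of a few milliseconds for full sentences are a quick signal that the timing values are off.
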
Expected Behavior

The timestamps in the subtitles should accurately reflect the timing of the spoken words in the video.
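
For reference, a minimal sketch of converting an offset into an SRT timestamp, assuming the offset is given in seconds. The function name `srt_timestamp` is hypothetical, and how speech_processor actually computes these values is not shown in this report:

```python
def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)   # milliseconds per hour
    minutes, rem = divmod(rem, 60_000)         # milliseconds per minute
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


print(srt_timestamp(2731))   # 00:45:31,000
print(srt_timestamp(2.731))  # 00:00:02,731
```

The two calls show how much the result depends on the unit of the input. Whether a unit mix-up of this kind is behind the bug is only a guess, not a diagnosis.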

Additional Information

Environment: Development
Tool Version: v3.0.0
Python Version: Python 3.10.0
Operating System: macOS 14.1.1
