h2soong / m3u8_to_mp4

Python downloader for saving m3u8 videos to local MP4 files.

License: MIT License

Python 100.00%

m3u8 scrapy mp4 asynchronous-programming m3u8-downloader multithreading cdn-optimization concurrent

m3u8_to_mp4's Introduction

M3u8-To-MP4

Python downloader for saving m3u8 videos to local MP4 files.

QuickStart

Install m3u8_To_MP4 via pip

Preparation: configure ffmpeg (e.g., on Windows 10).
- Download the "release full" build. It has the largest set of libraries and the most functionality.
- Extract the contents of the ZIP file to a folder of your choice.
- Add FFmpeg to the Windows 10 PATH (User variables -> Path -> New, then add the FFmpeg folder).
- Verify: open a Command Prompt or PowerShell window, type ffmpeg, and press Enter.

Installation: m3u8_To_MP4

# via pypi.org
python -m pip install m3u8_To_MP4

# or clone the project first, then install it.
git clone https://github.com/songs18/m3u8_To_MP4.git
python -m pip install ./m3u8_To_MP4

Download an mp4 video

There are two ways to download an m3u8 video into an mp4 file: asynchronous and multi-threaded.

Multi-thread downloader (recommended)

import m3u8_To_MP4

if __name__ == '__main__':
    # 1. Download videos from uri.
    m3u8_To_MP4.multithread_download('http://videoserver.com/playlist.m3u8')

    # 2. Download videos from an existing m3u8 file.
    #    m3u8_file_path is the path to a local .m3u8 file.
    m3u8_To_MP4.multithread_file_download('http://videoserver.com/playlist.m3u8', m3u8_file_path)

    # Kept for compatibility, but no longer recommended.
    # m3u8_To_MP4.download('http://videoserver.com/playlist.m3u8')

Asynchronous downloader

import m3u8_To_MP4

if __name__ == '__main__':
    # 1. Download mp4 from uri.
    m3u8_To_MP4.async_download('http://videoserver.com/playlist.m3u8')

    # 2. Download mp4 from an existing m3u8 file.
    #    m3u8_file_path is the path to a local .m3u8 file.
    m3u8_To_MP4.async_file_download('http://videoserver.com/playlist.m3u8', m3u8_file_path)

Resuming

If you use the default tmp dir, an interrupted transfer resumes automatically from the point of interruption (based on crc32 hashing).
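
The log output quoted in the issues further down shows temp directories named m3u8_ followed by a number. A minimal sketch of how such a stable directory name could be derived from the playlist URI follows. The helper name and the choice to hash the full URI are assumptions for illustration, not the library's confirmed implementation.

```python
import os
import tempfile
import zlib


def resumable_tmp_dir(m3u8_uri: str) -> str:
    """Return a temp directory path that is stable for a given URI (hypothetical helper)."""
    # crc32 is deterministic, so the same URI always maps to the same folder,
    # which lets a second run find the segments saved by the first.
    checksum = zlib.crc32(m3u8_uri.encode("utf-8"))
    return os.path.join(tempfile.gettempdir(), f"m3u8_{checksum}")
```

Under this scheme, passing a custom tmp dir that changes between runs would defeat resuming, which fits the note above about using the default.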

Custom HTTP request header

In some cases, customized HTTP request headers are needed to satisfy a website's requirements. The available APIs accept a header dictionary, which overrides the program's default settings. A simple example:

import m3u8_To_MP4

if __name__ == '__main__':
    customized_http_header = {'Referer': 'https://videoserver.com/'}

    m3u8_To_MP4.multithread_download('http://videoserver.com/playlist.m3u8', customized_http_header=customized_http_header)
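
To illustrate what "overrides the settings in the program" can mean, here is a small sketch of merging a caller's header dictionary over a set of defaults. DEFAULT_HEADERS and merge_headers are hypothetical names, and the default values are placeholders rather than the library's real ones.

```python
# Illustrative defaults only; the library's actual default headers are not shown in this README.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
}


def merge_headers(custom=None):
    """Return the defaults with any caller-supplied headers layered on top."""
    merged = dict(DEFAULT_HEADERS)  # copy, so the defaults are never mutated
    merged.update(custom or {})     # caller's values win on key collisions
    return merged
```

Note that dictionary keys are case-sensitive while HTTP header names are not, so in this sketch "referer" and "Referer" would end up as two separate entries.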

Features

Treats ffmpeg as a system service for cross-platform support.
If ffmpeg is not found, archiving is supported as a fallback (new in v0.1.3).
Resumes after interruption (based on a crc32-derived temp directory path).
Uses the system tmp folder.
Concurrent requests based on a thread pool.
Concurrent requests based on efficient coroutines (new in v0.1.3).
Failed segments are retried together after a full pass over all segments, which avoids overly short retry intervals.
Downloads videos from existing m3u8 files.
Anti-crawler parameters via customized request headers.
Clean code based on inheritance.
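
The retry strategy in the list above (retry failures collectively after a full cycle) can be sketched with a thread pool as follows. This is a minimal illustration under stated assumptions: download_in_rounds and the caller-supplied fetch callable are hypothetical, and the real processor classes may be organised differently.

```python
from concurrent.futures import ThreadPoolExecutor


def download_in_rounds(uris, fetch, max_retry_times=3, max_workers=4):
    """Fetch every URI; retry the failures together, one round at a time.

    `fetch` is a caller-supplied callable that returns bytes or raises on failure.
    Returns (results, still_failed).
    """
    results = {}
    pending = list(uris)
    # One initial round plus at most max_retry_times retry rounds, so the loop always ends.
    for _ in range(max_retry_times + 1):
        if not pending:
            break
        failed = []
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {uri: pool.submit(fetch, uri) for uri in pending}
        # Leaving the with-block waits until every future in this round has finished.
        for uri, future in futures.items():
            try:
                results[uri] = future.result()
            except Exception:
                failed.append(uri)
        pending = failed
    return results, pending
```

Because a failed segment is only retried after the rest of its round has completed, the interval between two attempts on the same URI is at least the duration of a round. The bounded loop also guarantees termination when a segment never succeeds.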

TODO

Errors: application data after close notify (related to the Python interpreter).
Extract independent asynchronous http package.
Support IPv6.
Compare ffmpeg/avconv/mencoder/moviepy.
Support bilibili etc.

m3u8_to_mp4's Issues

Adding decode causes an encoding error; it runs normally after removing that line

https://github.com/songs18/m3u8_To_MP4/blob/66cfbaac2135d58c1fbf988efc84900fa4bfacba/m3u8_To_MP4/v2_abstract_task_processor.py#L156

rpi 4b
python 3.10.4
m3u8_To_MP4 0.1.11

Something happened when I download a video!

FileNotFoundError

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\m3u8_To_MP4\__init__.py:131 in multithread_download
crawler.fetch_mp4_by_m3u8_uri(True)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\m3u8_To_MP4\v2_abstract_crawler_processor.py:229 in fetch_mp4_by_m3u8_uri
self._merge_to_mp4_by_ffmpeg()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\m3u8_To_MP4\v2_abstract_crawler_processor.py:188 in _merge_to_mp4_by_ffmpeg
if os.path.getsize(self.mp4_file_path) < 1:

File :50 in getsize

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'e:\test.mp4'
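
The traceback shows os.path.getsize being called on an output file that was never created, presumably because the ffmpeg merge step did not produce it. A defensive check along the following lines would report that case without raising; output_is_valid is a hypothetical helper, not part of the library.

```python
import os


def output_is_valid(mp4_file_path):
    """True only if the merged file exists and is non-empty."""
    # isfile() is checked first, so getsize() is never called on a missing path.
    return os.path.isfile(mp4_file_path) and os.path.getsize(mp4_file_path) > 0
```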

Certain m3u8s Reliably Fail

Not sure what these have in common. Around 90% or higher of m3u8 from this domain work fine. I have identified many which do not.

Here is a typical output from multithread_download:

2022-06-16 11:17:00,726 | INFO | Resolved available hosts:
2022-06-16 11:17:00,726 | INFO | 104.20.245.10:443
2022-06-16 11:17:00,726 | INFO | 104.20.244.10:443
2022-06-16 11:17:00,726 | INFO | 172.67.3.62:443
segment set: |##################################################| 100.0% downloaded segments successfully!
2022-06-16 11:17:06,473 | INFO | decrypt and dump segments...
2022-06-16 11:17:06,483 | INFO | merging segments...
2022-06-16 11:17:06,571 | INFO | download successfully！ take 5.85s,  average download speed is 0.00KB/s

But the resulting file has size 0. Note that the Video DownloadHelper Firefox extension has no issue downloading these videos, and they play fine in the browser.
Here is one example. Nearly every other video on this website works fine for me with multithread_download.

https://streaming.britishpathe.com/hls-vod/MEDIA-5/archive/KQWA/2014-12-15T084152Z_1_LVAA29SOIM1N0DTY89EAVN9VAWQK_RTRWNEV_F_CHINA-DEMONSTRATION-OUTSIDE-REUTER-HOUSE-50-16MM-COLOUR.MP4.m3u8

If you want more examples let me know, I have dozens which don't work (and hundreds which do).

Memory overflow when downloading files larger than 1GB

Hello, I have used this component for downloading files, but I have encountered an issue. My device has limited memory, and when downloading files larger than 1GB, it causes a memory overflow. I would like to know if there is a configuration method to reduce the buffer size or optimize the process.
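
Whether the library holds whole segments in memory is not stated here, so the following is only a general technique: copying a response to disk in fixed-size chunks keeps peak memory near the chunk size regardless of file size. copy_in_chunks is a hypothetical helper.

```python
def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy a readable binary stream to a writable one, chunk by chunk.

    At most chunk_size bytes are held in memory at a time. Returns bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

Any object with a read(n) method works as the source, for example the response returned by urllib.request.urlopen, with a file opened in 'wb' mode as the destination.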

Error processing segments

When the link has the form https://example.m3u8, everything works fine, but as soon as query parameters appear in the link, as in https://example.m3u8?e=720&h=23c, it throws an error while processing the segments.

As I understand it, this happens because characters that are forbidden in Windows file names (such as the question mark) appear in the link.

No problem on Linux.

m3u8 content may be obfuscated

Recently I have come across some websites that replace the xxx.ts entries in the m3u8 file with another suffix, such as xxx.jpeg. ffmpeg then fails to recognize these segments because of the seemingly wrong extension, and the final video file cannot be assembled correctly.

Should this package deal with such a problem?
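
One way to sidestep misleading extensions is to look at the bytes instead of the name. MPEG-TS data is made of 188-byte packets that each begin with the sync byte 0x47, so a short heuristic can tell whether a downloaded segment looks like a transport stream. looks_like_mpeg_ts is a hypothetical helper, and the check assumes the segment is already decrypted.

```python
TS_PACKET_SIZE = 188   # fixed MPEG-TS packet length in bytes
TS_SYNC_BYTE = 0x47    # first byte of every MPEG-TS packet


def looks_like_mpeg_ts(data: bytes, packets_to_check: int = 3) -> bool:
    """Heuristic: the first few 188-byte packets must all start with 0x47."""
    if len(data) < TS_PACKET_SIZE:
        return False
    available = len(data) // TS_PACKET_SIZE
    for i in range(min(packets_to_check, available)):
        if data[i * TS_PACKET_SIZE] != TS_SYNC_BYTE:
            return False
    return True
```

If a site additionally prepends a fake image header to the real stream, a check at offset 0 like this one would fail and the header would have to be located and stripped first; that case is not handled here.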

error in pydroid3

Hi, is there any method or update for pydroid3 on Android? Thanks.

Unable to download video along with audio when there are multiple audio files in m3u8

Basically what the title says. Sharing the URL here would not help, since it carries an expiring token among other things, but this is what the m3u8 content looked like when I requested it from Python:

b'#EXTM3U\n#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="hin",NAME="Hindi",DEFAULT=YES,AUTOSELECT=YES,URI="1080/audio/0.m3u8?token=e5839df81ef00f6a673a6af8adcc9b72&client=d41d8cd98f00b204e9800998ecf8427e&expires=1703476254&type=edge&node=PARI0g042ZWQe3BDJHsMT74uTgMLS7sirocCNiVrqyC6AhgjYAOn3ywFW8xvbzFhU90JUhHaG92x2oQwffVqdy5BM3h16QKN22FGC26yt1HxvOSYz67xzX1L0sYn19-2"\n#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="eng",NAME="English",DEFAULT=NO,AUTOSELECT=YES,URI="1080/audio/1.m3u8?token=e5839df81ef00f6a673a6af8adcc9b72&client=d41d8cd98f00b204e9800998ecf8427e&expires=1703476254&type=edge&node=PARI0g042ZWQe3BDJHsMT74uTgMLS7sirocCNiVrqyC6AhgjYAOn3ywFW8xvbzFhU90JUhHaG92x2oQwffVqdy5BM3h16QKN22FGC26yt1HxvOSYz67xzX1L0sYn19-2"\n#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="jpn",NAME="Japanese",DEFAULT=NO,AUTOSELECT=YES,URI="1080/audio/2.m3u8?token=e5839df81ef00f6a673a6af8adcc9b72&client=d41d8cd98f00b204e9800998ecf8427e&expires=1703476254&type=edge&node=PARI0g042ZWQe3BDJHsMT74uTgMLS7sirocCNiVrqyC6AhgjYAOn3ywFW8xvbzFhU90JUhHaG92x2oQwffVqdy5BM3h16QKN22FGC26yt1HxvOSYz67xzX1L0sYn19-2"\n#EXT-X-STREAM-INF:BANDWIDTH=549632,RESOLUTION=480x360,AUDIO="audio"\n360.m3u8?token=e5839df81ef00f6a673a6af8adcc9b72&client=d41d8cd98f00b204e9800998ecf8427e&expires=1703476254&type=edge&node=PARI0g042ZWQe3BDJHsMT74uTgMLS7sirocCNiVrqyC6AhgjYAOn3ywFW8xvbzFhU90JUhHaG92x2oQwffVqdy5BM3h16QKN22FGC26yt1HxvOSYz67xzX1L0sYn19-2\n#EXT-X-STREAM-INF:BANDWIDTH=985305,RESOLUTION=1280x720,AUDIO="audio"\n720.m3u8?token=e5839df81ef00f6a673a6af8adcc9b72&client=d41d8cd98f00b204e9800998ecf8427e&expires=1703476254&type=edge&node=PARI0g042ZWQe3BDJHsMT74uTgMLS7sirocCNiVrqyC6AhgjYAOn3ywFW8xvbzFhU90JUhHaG92x2oQwffVqdy5BM3h16QKN22FGC26yt1HxvOSYz67xzX1L0sYn19-2\n#EXT-X-STREAM-INF:BANDWIDTH=1498827,RESOLUTION=1920x1080,AUDIO="audio"\n1080.m3u8?token=e5839df81ef00f6a673a6af8adcc9b72&client=d41d8cd98f00b204e9800998ecf8427e&expires=1703476254&type=edge&node=PARI0g042ZWQe3BDJHsMT74uTgMLS7sirocCNiVrqyC6AhgjYAOn3ywFW8x
vbzFhU90JUhHaG92x2oQwffVqdy5BM3h16QKN22FGC26yt1HxvOSYz67xzX1L0sYn19-2'
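
In a master playlist like the one above, the audio lives in separate playlists announced by #EXT-X-MEDIA lines, so a downloader that follows only the #EXT-X-STREAM-INF variant gets video without sound. The sketch below lists the audio renditions using only the standard library; audio_renditions is a hypothetical helper, and fetching and muxing the audio afterwards is left out.

```python
import re
from urllib.parse import urljoin

# Matches KEY=value pairs where the value is either quoted or runs up to the next comma.
_ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def audio_renditions(master_playlist: str, base_uri: str):
    """Return the audio renditions declared by #EXT-X-MEDIA lines."""
    renditions = []
    for line in master_playlist.splitlines():
        if not line.startswith("#EXT-X-MEDIA:"):
            continue
        attribute_text = line.split(":", 1)[1]
        attrs = {key: value.strip('"') for key, value in _ATTR.findall(attribute_text)}
        if attrs.get("TYPE") == "AUDIO" and "URI" in attrs:
            renditions.append({
                "name": attrs.get("NAME"),
                "language": attrs.get("LANGUAGE"),
                "default": attrs.get("DEFAULT") == "YES",
                # Relative URIs are resolved against the master playlist's own URI.
                "uri": urljoin(base_uri, attrs["URI"]),
            })
    return renditions
```

Each returned uri points to a media playlist of its own, which would have to be downloaded like the video playlist and then combined with the video track.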

download taptap Video error

'utf-8' codec can't decode byte 0xe0 in position 1: invalid continuation byte

2022-12-17 14:55:30,173 | INFO | Resolved available hosts:
2022-12-17 14:55:30,173 | INFO | 195.175.181.160:443
2022-12-17 14:55:30,174 | INFO | 195.175.181.128:443

summary
m3u8_uri: https://btk-cdn01.cinema8.com/hls/content/entry/data/0/38/0_htzs22hi_0_97xxb0o1_2.mp4/index-v1-a1.m3u8;
max_retry_times: 3;
tmp_dir: C:\Users\ATES-PC\AppData\Local\Temp\m3u8_2138975056;
mp4_file_path: ./m3u8_To_MP4.mp4;

Output exceeds the size limit. Open the full output data in a text editor

UnicodeDecodeError Traceback (most recent call last)
d:\PRG\BTK_Akademi\my\dwnload.ipynb Hücre 3 in <cell line: 3>()
1 import m3u8_To_MP4
----> 3 m3u8_To_MP4.multithread_download('https://btk-cdn01.cinema8.com/hls/content/entry/data/0/38/0_htzs22hi_0_97xxb0o1_2.mp4/index-v1-a1.m3u8')

File c:\Users\ATES-PC\anaconda3\lib\site-packages\m3u8_To_MP4\__init__.py:131, in multithread_download(m3u8_uri, customized_http_header, max_retry_times, max_num_workers, mp4_file_dir, mp4_file_name, tmpdir)
114 '''
115 Download mp4 video from given m3u uri.
116
(...)
122 :return:
123 '''
124 with m3u8_To_MP4.v2_multithreads_processor.MultiThreadsUriCrawler(m3u8_uri,
125 customized_http_header,
126 max_retry_times,
(...)
129 mp4_file_name,
130 tmpdir) as crawler:
--> 131 crawler.fetch_mp4_by_m3u8_uri(True)

File c:\Users\ATES-PC\anaconda3\lib\site-packages\m3u8_To_MP4\v2_abstract_crawler_processor.py:214, in AbstractCrawler.fetch_mp4_by_m3u8_uri(self, as_mp4)
211 self._resolve_DNS()
213 # resolve ts segment uris
--> 214 key_segments_pairs = self._create_tasks()
...
157 _encrypted_key = EncryptedKey(method=key.method,
158 value=encryped_value, iv=key.iv)
160 key_segments = m3u8_obj.segments.by_key(key)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 1: invalid continuation byte
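
The traceback ends next to the EncryptedKey construction, so one plausible cause (an assumption, since the failing line itself is elided) is that the AES-128 key fetched from the key URI is being decoded as text. Such a key is 16 arbitrary bytes and is generally not valid UTF-8. The sketch below keeps the key as bytes; key_as_bytes and sample_key are illustrative names.

```python
def key_as_bytes(raw_key: bytes) -> bytes:
    """Validate an AES-128 key and return it unchanged as bytes (no text decoding)."""
    if len(raw_key) != 16:
        raise ValueError(f"expected a 16-byte AES-128 key, got {len(raw_key)} bytes")
    return bytes(raw_key)


# A made-up key whose second byte (0xE0) starts a multi-byte UTF-8 sequence
# that the following byte does not continue, so decoding it as UTF-8 fails.
sample_key = bytes([0x00, 0xE0, 0x41]) + bytes(13)
```

Calling sample_key.decode('utf-8') raises UnicodeDecodeError, while key_as_bytes(sample_key) returns the key intact and ready to hand to a cipher.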

The documentation can be improved.

There are many basic features that this lib provides, but the docs do not mention those.

filename error

It seems multithread_download with mp4_file_dir and mp4_file_name will always generate a random filename.
I think it's because line 110 in v2_abstract_crawler_processor.py is wrong?

if os.path.exists(mp4_file_path):
    mp4_file_name = path_helper.random_name()
    mp4_file_path = os.path.join(self.mp4_file_dir, mp4_file_name)

should be changed to

if not os.path.exists(mp4_file_path):
    mp4_file_name = path_helper.random_name()
    mp4_file_path = os.path.join(self.mp4_file_dir, mp4_file_name)
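
For comparison, here is a sketch of one behaviour such a check might be aiming for: keep the requested name when nothing exists at that path, and fall back to a generated name only on a collision. Whether this matches the library's intent is an assumption, and a uuid-based suffix stands in for path_helper.random_name.

```python
import os
import uuid


def resolve_output_path(mp4_file_dir, mp4_file_name):
    """Use the requested name if it is free; otherwise add a random suffix."""
    candidate = os.path.join(mp4_file_dir, mp4_file_name)
    if not os.path.exists(candidate):
        return candidate  # nothing to overwrite, honour the caller's name
    stem, ext = os.path.splitext(mp4_file_name)
    # Collision: keep the stem and extension, insert a short random tag.
    return os.path.join(mp4_file_dir, f"{stem}_{uuid.uuid4().hex[:8]}{ext}")
```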

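Hang when some *.ts segment paths return 404

Only the 200 status is checked here, which leads to a series of other problems. Could an option be added to ignore custom status codes?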

Unable to download mp4 from m3u8 on Google Colab

I'm trying to download an mp4 from an m3u8 on Colab

import m3u8_To_MP4

m3u8_To_MP4.multithread_download('https://ww3.gogoanime2.org/playlist/MjYzMw==.m3u8')

Output

summary
m3u8_uri: https://ww3.gogoanime2.org/playlist/MjYzMw==.m3u8;
max_retry_times: 3;
tmp_dir: /tmp/m3u8_2184998115;
mp4_file_path: ./m3u8_To_MP4.mp4;

---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
<ipython-input-30-677d5e98cc80> in <module>
    1 import m3u8_To_MP4
    2 
----> 3 m3u8_To_MP4.multithread_download('https://ww3.gogoanime2.org/playlist/MjYzMw==.m3u8')

4 frames
/usr/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    750     # and socket type values to enum constants.
    751     addrlist = []
--> 752     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    753         af, socktype, proto, canonname, sa = res
    754         addrlist.append((_intenum_converter(af, AddressFamily),

gaierror: [Errno -8] Servname not supported for ai_socktype

m3u8_To_MP4.async_download('https://ww3.gogoanime2.org/playlist/MjYzMw==.m3u8')

Output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-8723246d256a> in <module>
    1 import m3u8_To_MP4
    2 
----> 3 m3u8_To_MP4.async_download('https://ww3.gogoanime2.org/playlist/MjYzMw==.m3u8')

3 frames
/usr/lib/python3.7/genericpath.py in exists(path)
    17     """Test whether a path exists.  Returns False for broken symbolic links"""
    18     try:
---> 19         os.stat(path)
    20     except OSError:
    21         return False

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Error: cannot find file

I'm using multithread_download() with url.
Downloading goes well in the console, with the progress percentage shown. When it's done, it shows the error 'cannot find file path './m3u8_To_Mp4.mp4'.
I tried creating a file './m3u8_To_Mp4.mp4' in advance, but it still doesn't work.
Did I miss something?

Implement support for downloading using proxies.

I would like to make a contribution to support downloading m3u8 files using a proxy for both multithreaded and async methods.

Installing Kivy in termux

Could someone help me install kivy in termux?

"1 segments are failed to download, retry..."

This is related to #13, but I think it's still different enough to warrant another ticket.

I'm looking at "v2_multithreaded_processor", around line 100: one of the "key_segments_pairs" failed, but the retry counter doesn't kick in to stop the while-loop from retrying endlessly.

Grtz,
Steven

Issue with video link with query parameter in it

If the video URIs inside the m3u8 list look like
https://www.google.com/video0.ts?some=else
The path_helper.py will resolve the file name (function resolve_file_name_by_uri) to video0.ts?some=else, and it wouldn't work (at least on Windows, didn't try it on Linux).

The fix would be adding a split by ? to remove any query parameters.
name = uri.split('/')[-1].split('?')[0]
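
A variant of the same fix using the standard URL parser is sketched below. segment_file_name is a hypothetical name and is not the library's resolve_file_name_by_uri.

```python
import posixpath
from urllib.parse import urlsplit


def segment_file_name(uri: str) -> str:
    """File name taken from the URI path only, ignoring query string and fragment."""
    # urlsplit separates the path from '?query' and '#fragment' before the name is extracted.
    return posixpath.basename(urlsplit(uri).path)
```

Compared with splitting on '/' first, parsing the URI also copes with query strings that themselves contain a slash, such as ?next=a/b.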

[Question]Is it possible download videos with authentication?

Hello everyone, Can I download videos with token?

Cheers.

Issues when playing downloaded media in VLC (audio cutting)

After downloading video media with this tool and playing it in VLC, the audio cuts out for half a second or so every few seconds. This doesn't happen with any other files downloaded normally. Here are the logs:

main warning: playback way too early (-128157): playing silence
main debug: inserting 6151 zeroes
main warning: playback too early (-66620): down-sampling
main debug: resampling stopped (drift: -12346 us)
main warning: playback too early (-86739): down-sampling
main warning: playback way too early (-121084): playing silence
main debug: inserting 5812 zeroes
main warning: playback too early (-81265): down-sampling
main warning: playback way too early (-151394): playing silence
main debug: inserting 7266 zeroes
main warning: playback too early (-67626): down-sampling
main warning: playback way too early (-135167): playing silence
main debug: inserting 6488 zeroes
main warning: playback too early (-66509): down-sampling
main debug: auto hiding mouse cursor
main warning: playback way too early (-121468): playing silence
main debug: inserting 5830 zeroes
main warning: playback too early (-68385): down-sampling
main warning: playback way too early (-139303): playing silence
main debug: inserting 6686 zeroes
main warning: playback too early (-68532): down-sampling

Installing Kivy in pydroid3

I am using pydroid3. To install kivy 2.3.0 I uninstalled kivy 2.2.0, but now neither kivy 2.3 nor kivy 2.2.0 will install. Could someone please help me with this?

Exit code 1 and ffmpeg not found

I have already installed ffmpeg and all requirements, but it exits with error code 1.

"C:\Users\faisa\Documents\Pycharm Projects\Youtube Views\venv\Scripts\python.exe" "C:/Users/faisa/Documents/Pycharm Projects/Youtube Views/main.py"
2022-03-05 20:38:00,870 | WARNING | NOT FOUND FFMPEG!
2022-03-05 20:38:00,870 | INFO | Compressing into tar.bz2 is only supported

summary
m3u8_uri: http://videoserver.com/playlist.m3u8;
max_retry_times: 3;
tmp_dir: C:\Users\faisa\AppData\Local\Temp\m3u8_3881882585;
mp4_file_path: ./m3u8_To_Mp4.mp4;

2022-03-05 20:38:01,433 | INFO | Resolved available hosts:
2022-03-05 20:38:01,433 | INFO | 52.128.23.153:80
Traceback (most recent call last):
File "C:\Users\faisa\Documents\Pycharm Projects\Youtube Views\main.py", line 7, in
m3u8_To_MP4.multithread_download('http://videoserver.com/playlist.m3u8')
File "C:\Users\faisa\Documents\Pycharm Projects\Youtube Views\venv\lib\site-packages\m3u8_To_MP4\__init__.py", line 75, in multithread_download
crawler.fetch_mp4_by_m3u8_uri(True)
File "C:\Users\faisa\Documents\Pycharm Projects\Youtube Views\venv\lib\site-packages\m3u8_To_MP4\v2_abstract_processor.py", line 250, in fetch_mp4_by_m3u8_uri
key_segments_pairs = self._filter_ads_ts(key_segments_pairs)
File "C:\Users\faisa\Documents\Pycharm Projects\Youtube Views\venv\lib\site-packages\m3u8_To_MP4\v2_abstract_processor.py", line 193, in _filter_ads_ts
self.longest_common_subsequence = path_helper.longest_common_subsequence([segment_uri for _, segment_uri in key_segments_pairs])
File "C:\Users\faisa\Documents\Pycharm Projects\Youtube Views\venv\lib\site-packages\m3u8_To_MP4\helpers\path_helper.py", line 40, in longest_common_subsequence
num_shortest_segment_absolute_url_length = min(len(url) for url in segment_absolute_urls)
ValueError: min() arg is an empty sequence
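
The final error means the list of segment URLs was empty when min() was called. The URI in this report is the README's placeholder address, so presumably no real playlist was parsed. A guard like the following would turn that into a clearer message; both helper names are hypothetical.

```python
def shortest_url_length(segment_absolute_urls):
    """Length of the shortest URL, or 0 when there are no URLs at all."""
    # default=0 avoids "min() arg is an empty sequence" on an empty playlist.
    return min((len(url) for url in segment_absolute_urls), default=0)


def require_segments(segment_absolute_urls):
    """Fail early, with a readable message, when a playlist yields no segments."""
    if not segment_absolute_urls:
        raise ValueError("playlist contains no media segments; check the m3u8 URI")
    return segment_absolute_urls
```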

Process finished with exit code 1

save as

Hi, thank you for the software.
Could you add a "save as" option, like this:
m3u8_To_MP4.multithread_download('http://videoserver.com/playlist.m3u8' , 'new download.mp4')
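
The multithread_download signature that appears in tracebacks elsewhere on this page includes mp4_file_dir and mp4_file_name, so a "save as" path can be split into those two arguments. The sketch below only does the splitting; split_save_as is a hypothetical helper, and the download call is left as a comment.

```python
import os


def split_save_as(save_as_path):
    """Turn one 'save as' path into the two keyword arguments seen in the signature."""
    directory, file_name = os.path.split(os.path.abspath(save_as_path))
    return {"mp4_file_dir": directory, "mp4_file_name": file_name}


# Possible usage, assuming the keyword names shown in the tracebacks:
# m3u8_To_MP4.multithread_download('http://videoserver.com/playlist.m3u8',
#                                  **split_save_as('new download.mp4'))
```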

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

when I use this code

import m3u8_To_MP4

if __name__ == '__main__':
    # 1. Download videos from uri.
    m3u8_To_MP4.multithread_download('http://videoserver.com/playlist.m3u8')

    # 2. Download videos from existing m3u8 files.
    m3u8_To_MP4.multithread_file_download('http://videoserver.com/playlist.m3u8',m3u8_file_path)

    # For compatibility, i reserve this api, but i do not recommend to you again.
    # m3u8_To_MP4.download('http://videoserver.com/playlist.m3u8')

Kivy in termux

An error occurred while decrypt and dump segments

SOURCE CODE

import m3u8_To_MP4
import os.path as path

url = 'https://example.com/video.m3u8'
headers = {
    "referer" : "https://example.com",
    "origin" : "https://example.com",
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}

m3u8_To_MP4.multithread_download(
    url, 
    customized_http_header=headers, 
    mp4_file_dir=path.dirname(__file__).replace('/', '\\'),
    mp4_file_name='output.mp4',
    max_retry_times=5
)

ERROR OUTPUT

Traceback (most recent call last):
  File "c:/Users/Username/Desktop/DownloadM3U8/m3u8ToMp4.py", line 14, in <module>
    m3u8_To_MP4.multithread_download(
  File "C:\Users\Username\AppData\Local\Programs\Python\Python38-32\lib\site-packages\m3u8_To_MP4\__init__.py", line 131, in multithread_download
    crawler.fetch_mp4_by_m3u8_uri(True)
  File "C:\Users\Username\AppData\Local\Programs\Python\Python38-32\lib\site-packages\m3u8_To_MP4\v2_abstract_crawler_processor.py", line 225, in fetch_mp4_by_m3u8_uri
    self._fetch_segments_to_local_tmpdir(key_segments_pairs)
  File "C:\Users\Username\AppData\Local\Programs\Python\Python38-32\lib\site-packages\m3u8_To_MP4\v2_multithreads_processor.py", line 140, in _fetch_segments_to_local_tmpdir
    encrypted_data = cryptor.decrypt(encrypted_data)
  File "C:\Users\Username\AppData\Local\Programs\Python\Python38-32\lib\site-packages\Crypto\Cipher\_mode_cbc.py", line 246, in decrypt
    raise ValueError("Data must be padded to %d byte boundary in CBC mode" % self.block_size)
ValueError: Data must be padded to 16 byte boundary in CBC mode
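
The error comes from the cipher's CBC decrypt call, which requires the ciphertext length to be a multiple of the 16-byte AES block size. A segment that violates this may be truncated, or may not be an encrypted segment at all (for example an error page returned by the server); which of these applies here is not known. A pre-check such as the following makes the cause easier to spot. The helper name is hypothetical.

```python
AES_BLOCK_SIZE = 16  # bytes; AES always works on 16-byte blocks


def is_valid_cbc_length(data: bytes) -> bool:
    """True if data is non-empty and its length is a whole number of AES blocks."""
    return len(data) > 0 and len(data) % AES_BLOCK_SIZE == 0
```

Segments that fail the check could be logged with their URI and length and queued for another download attempt instead of being passed to decrypt.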

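Encountered an error

In addition, the example in the docs also has a small mistake: using import m3u8_to_mp4 directly raises an error, so I switched to import m3u8_To_MP4.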

No Audio mp4

Hello.

Everything works well, but there are problems. There is no sound in the video if m3u8 contains a separate audio track.

#EXTM3U
#EXT-X-VERSION:4
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio_aac",NAME="English",DEFAULT=YES,AUTOSELECT=YES,CHANNELS="2",LANGUAGE="rus",URI="site.org"

How can this problem be corrected?