patrickenfuego / chapterize-audiobooks

Split a single, monolithic mp3 audiobook file into chapters using Machine Learning and ffmpeg.

License: Apache License 2.0

Python 100.00%

audiobook-convertor audiobook-tracks audiobooks chapters machine-learning mp3-converter mp3-files mp3-tags speech-to-text

chapterize-audiobooks's Issues

Improve Chapter Parsing

Some audiobooks don't use normal keywords that can help identify the start of a chapter. For example, some don't say "chapter" before the identifier, but instead just say "One".

My goal is to help identify these section separators using the surrounding context, allowing for more accurate chapter breaks.
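One way to use surrounding context is to treat a very short utterance made up only of number words as a chapter marker when it follows a long pause. The sketch below is an illustration under stated assumptions, not the project's implementation: the `Segment` type, the `NUMBER_WORDS` vocabulary, and the `min_gap` and `max_words` thresholds are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, deliberately small vocabulary used only for illustration.
NUMBER_WORDS = {
    "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "twenty", "thirty",
}


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed text of the segment


def find_bare_chapter_markers(segments, min_gap=2.0, max_words=2):
    """Return segments that look like a bare chapter number.

    A segment qualifies when it is at most `max_words` long, consists only
    of number words, and starts at least `min_gap` seconds after the
    previous segment ended.
    """
    markers = []
    for prev, cur in zip(segments, segments[1:]):
        words = cur.text.lower().split()
        if not words or len(words) > max_words:
            continue
        gap = cur.start - prev.end
        if gap >= min_gap and all(word in NUMBER_WORDS for word in words):
            markers.append(cur)
    return markers


segments = [
    Segment(0.0, 4.0, "and so the night ended"),
    Segment(7.5, 8.2, "Two"),
    Segment(9.0, 12.0, "the next morning two riders arrived"),
    Segment(12.2, 12.8, "one"),
]
markers = find_bare_chapter_markers(segments)
```

Here only the isolated "Two" after the 3.5 second pause is flagged. The trailing "one" is ignored because it follows the previous segment almost immediately.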

I found your script and really like the idea, but every time I try to run it I get stuck.
At first I got the following error:
File "C:\FFOutput\Chapterize-Audiobooks-0.6.0\chapterize_ab.py", line 316, in parse_args
args.audiobook.with_suffix('.cue').exists()
AttributeError: 'NoneType' object has no attribute 'with_suffix'
So I went to line 316 and deleted this part:
or args.audiobook.with_suffix('.cue').exists()
After that the script got further, but I got this message:
ERROR: The script only works with .mp3 files (for now)

I tried different command lines and got the same error:

chapterize_ab.py -h
chapterize_ab.py 'C:\FFOutput\a.mp3' --title 'aaa' --genre 'Fantasy'

I first tried from the Windows command line and then from IDLE (3.11), without success.
I also tried older versions of your script. I got the first error in version 0.5 as well, and the second error in every version.
I would appreciate any help.
P.S. I don't know Python very well, so I may well have skipped a step that is obvious to you.
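A possible cause of the second error is worth checking: `cmd.exe` does not treat single quotes as quoting characters, so they are passed to the script as part of the argument. Assuming the script compares the file suffix against `.mp3` (the source does not confirm how the check is written), a path typed as `'C:\FFOutput\a.mp3'` would then have the suffix `.mp3'` and fail the check. The sketch below only demonstrates that effect; `clean_cli_path` is a hypothetical helper, not part of the script.

```python
from pathlib import PureWindowsPath


def clean_cli_path(raw: str) -> PureWindowsPath:
    """Strip stray single or double quotes that the shell passed through."""
    return PureWindowsPath(raw.strip("'\""))


# What the script would receive from cmd.exe when single quotes are used.
quoted = "'C:\\FFOutput\\a.mp3'"

raw_suffix = PureWindowsPath(quoted).suffix   # the trailing quote is kept
clean_suffix = clean_cli_path(quoted).suffix  # quotes removed first
```

On the Windows command line, wrapping the path in double quotes instead (`"C:\FFOutput\a.mp3"`) avoids the problem without any change to the script.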

Add Additional Language Support

In a previous release, I modularized the project so it can support multiple languages dynamically. I need help from people who speak those languages to fill out the excluded phrases and chapter separators so that more people can use this tool.

upload demo gif

upload new gif

Inconsistent support for mp4a

I love that the script can detect chapters and split a big file, but it would be nice if it also supported mp4a (AAC) audio consistently. The script can analyze an mp4a file and generate the SRT file from it, but it cannot split the file. It would be nice if the script detected that the source was encoded as mp4a and either automatically converted it to a temporary mp3 file before splitting, or told the user up front that the encoding is not supported.

To confirm that the error was caused by the mp4a encoding and not by something else, I manually converted the file to mp3, and the script then worked as expected.

ffmpeg_log.txt

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55e69758ce40] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x55e6975e6680] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------

********************************************************
NEW LOG START
********************************************************

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x5641cafe2e80] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x5641cb03c180] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x556de9d23e80] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x556de9d7d180] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------
...
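The log shows a file with an `.mp3` extension whose audio stream is actually `aac (mp4a)`, so copying the stream into an mp3 container fails. A rough sketch of the requested pre-check follows. It assumes `ffprobe` and `ffmpeg` are on `PATH` and that ffmpeg was built with `libmp3lame`. The function names and the `-q:a 4` quality setting are illustrative choices, not taken from the project.

```python
import subprocess


def build_probe_cmd(path: str) -> list[str]:
    """ffprobe command that prints only the codec name of the first audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def probe_audio_codec(path: str) -> str:
    """Run ffprobe and return the codec name, e.g. 'mp3' or 'aac'."""
    result = subprocess.run(
        build_probe_cmd(path), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def needs_transcode(codec_name: str) -> bool:
    """True when the audio is not already MP3, whatever the file extension says."""
    return codec_name != "mp3"


def build_transcode_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg command that re-encodes the audio to MP3 and drops any video/cover stream."""
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "libmp3lame", "-q:a", "4", dst]
```

A caller could run `probe_audio_codec()` before transcription and then either run the transcode command into a temporary file or stop early with a clear message that the encoding is unsupported.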

Experimental Chapter Separators

Add additional chapter separators:

Preface
Introduction
Foreword
Afterword

Initially these will be disabled by default; they can be enabled via a CLI switch until they have been thoroughly tested.
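An opt-in switch of this kind could look roughly like the following. The flag name `--experimental-separators` and the single-entry default list are assumptions for illustration; the issue does not say what the switch will be called or what the existing defaults are.

```python
import argparse

# Assumed default; the project's real separator list is not shown here.
DEFAULT_SEPARATORS = ("chapter",)

# The four separators proposed in this issue.
EXPERIMENTAL_SEPARATORS = ("preface", "introduction", "foreword", "afterword")


def active_separators(use_experimental: bool) -> tuple[str, ...]:
    """Return the separators to search for, adding the experimental ones on request."""
    if use_experimental:
        return DEFAULT_SEPARATORS + EXPERIMENTAL_SEPARATORS
    return DEFAULT_SEPARATORS


parser = argparse.ArgumentParser()
# Hypothetical switch name; off unless the user asks for it.
parser.add_argument("--experimental-separators", action="store_true")

args = parser.parse_args(["--experimental-separators"])
separators = active_separators(args.experimental_separators)
```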

Failure downloading model

I got the error:

Traceback (most recent call last):
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 1078, in <module>
    main()
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 970, in main
    audiobook_file, in_metadata, lang, model_name, model_type, cue_file = parse_args()
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 316, in parse_args
    args.audiobook.with_suffix('.cue').exists()
AttributeError: 'NoneType' object has no attribute 'with_suffix'

Command: python3.10 chapterize_ab.py -dm -l pl

Executed inside a venv
OS: Debian 11
Installed dependencies using pip3.10 install -r requirements.txt

pip log:

Requirement already satisfied: rich>=12.6.0 in ./lib/python3.10/site-packages (from -r requirements.txt (line 1)) (13.5.3)
Requirement already satisfied: vosk>=0.3.44 in ./lib/python3.10/site-packages (from -r requirements.txt (line 2)) (0.3.45)
Requirement already satisfied: requests>=2.28.0 in ./lib/python3.10/site-packages (from -r requirements.txt (line 3)) (2.31.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./lib/python3.10/site-packages (from rich>=12.6.0->-r requirements.txt (line 1)) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./lib/python3.10/site-packages (from rich>=12.6.0->-r requirements.txt (line 1)) (2.16.1)
Requirement already satisfied: cffi>=1.0 in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (1.16.0)
Requirement already satisfied: tqdm in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (4.66.1)
Requirement already satisfied: srt in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (3.5.3)
Requirement already satisfied: websockets in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (11.0.3)
Requirement already satisfied: certifi>=2017.4.17 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (2023.7.22)
Requirement already satisfied: idna<4,>=2.5 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (3.2.0)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (2.0.5)
Requirement already satisfied: pycparser in ./lib/python3.10/site-packages (from cffi>=1.0->vosk>=0.3.44->-r requirements.txt (line 2)) (2.21)
Requirement already satisfied: mdurl~=0.1 in ./lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12.6.0->-r requirements.txt (line 1)) (0.1.2)
WARNING: You are using pip version 21.2.3; however, version 23.2.1 is available.
You should consider upgrading via the '/home/savant/Projects/Chapterize-Audiobooks/bin/python3.10 -m pip install --upgrade pip' command.

Any ideas why?
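The traceback suggests that `args.audiobook` is `None` when the script is run without an audiobook path, as in `-dm -l pl`, so calling `.with_suffix()` on it fails. A minimal sketch of a guard is shown below. The parser is a simplified stand-in: only the `-dm` and `-l` short flags come from the command above, while the long option names, the `store_true` behaviour, and the `has_cue_file` helper are assumptions.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Simplified stand-in for the script's parser (long names are hypothetical)."""
    parser = argparse.ArgumentParser()
    # Optional positional: stays None when only a model download is requested.
    parser.add_argument("audiobook", nargs="?", type=Path, default=None)
    parser.add_argument("-dm", "--download_model", action="store_true")
    parser.add_argument("-l", "--language", default="en-us")
    return parser


def has_cue_file(audiobook: Optional[Path]) -> bool:
    """Check for a sibling .cue file only when an audiobook path was given."""
    return audiobook is not None and audiobook.with_suffix(".cue").exists()


args = build_parser().parse_args(["-dm", "-l", "pl"])
cue_present = has_cue_file(args.audiobook)
```

With the `is not None` check in front, the download-only invocation no longer reaches `.with_suffix()` and the expression simply evaluates to `False`.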