
patrickenfuego / chapterize-audiobooks

Split a single, monolithic mp3 audiobook file into chapters using Machine Learning and ffmpeg.

License: Apache License 2.0

Python 100.00%
audiobook-convertor audiobook-tracks audiobooks chapters machine-learning mp3-converter mp3-files mp3-tags speech-to-text

chapterize-audiobooks's People

Contributors

patrickenfuego, shrinkwrapper, tri-ler

chapterize-audiobooks's Issues

Improve Chapter Parsing

Some audiobooks don't use the standard keywords that help identify the start of a chapter. For example, some don't say "Chapter" before the number and instead just say "One".

My goal is to help identify these section separators using the surrounding context, allowing for more accurate chapter breaks.
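As a rough illustration of using context, the sketch below treats a transcribed segment as a chapter start if it begins with "chapter", or if it consists only of number words and follows a pause. It assumes segments are available as (start, end, text) tuples, for example parsed from the generated SRT file; the 2-second pause threshold, the NUMBER_WORDS list and find_chapter_starts are all hypothetical, not the project's current logic.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, truncated list; a real implementation would need many more
# number words (and a separate list per language)
NUMBER_WORDS = {
    'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
    'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
    'seventeen', 'eighteen', 'nineteen', 'twenty', 'thirty', 'forty', 'fifty',
}

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def find_chapter_starts(segments: List[Segment], min_gap: float = 2.0) -> List[float]:
    """Return start times of segments that look like chapter headings."""
    starts: List[float] = []
    prev_end: Optional[float] = None
    for start, end, text in segments:
        words = re.findall(r"[a-z']+", text.lower())
        # Length of the silence between the previous segment and this one
        gap_before = float('inf') if prev_end is None else start - prev_end
        if words and words[0] == 'chapter':
            # Explicit keyword: always treated as a chapter start
            starts.append(start)
        elif words and all(w in NUMBER_WORDS for w in words) and gap_before >= min_gap:
            # A bare number ("One") only counts when it follows a pause
            starts.append(start)
        prev_end = end
    return starts
```

With this heuristic, "She had one dog" is ignored because it contains non-number words, and a lone "three" spoken mid-sentence is ignored because no pause precedes it.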

The script doesn't work...

I found your script and really like the idea, but every time I try to run it, it fails.
At first I got the following error:
File "C:\FFOutput\Chapterize-Audiobooks-0.6.0\chapterize_ab.py", line 316, in parse_args
args.audiobook.with_suffix('.cue').exists()
AttributeError: 'NoneType' object has no attribute 'with_suffix'

So I went to line 316 and deleted
    or args.audiobook.with_suffix('.cue').exists()
After that the script started running, and I got the message:
ERROR: The script only works with .mp3 files (for now)

I tried different command lines and got the same error each time:

  • chapterize_ab.py -h
  • chapterize_ab.py 'C:\FFOutput\a.mp3' --title 'aaa' --genre 'Fantasy'

I first tried from the Windows command line and then from IDLE (3.11), without success.
I also tried older versions of your script: I got the first error in version 0.5 as well, and the second error in every version.
I would appreciate any help.
P.S. I don't know Python very well, so it's quite possible I skipped a step that is obvious to you.
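Two details in this report suggest likely causes. Running the script from IDLE passes no command-line arguments, so args.audiobook is None when line 316 calls with_suffix. On the Windows command line, cmd.exe only treats double quotes as quoting characters, so 'C:\FFOutput\a.mp3' reaches Python with the single quotes still attached; if the script checks the extension with Path.suffix (an assumption), the suffix is .mp3' rather than .mp3, which would trigger the "only works with .mp3 files" message. The simplest user-side fix is to use double quotes. The sketch below shows how such an argument could be normalized; clean_path_arg and is_mp3 are hypothetical helpers, not functions from chapterize_ab.py.

```python
from pathlib import Path


def clean_path_arg(raw: str) -> Path:
    """Strip quote characters that cmd.exe leaves attached to an argument."""
    # cmd.exe does not treat single quotes as quoting, so 'C:\a.mp3' arrives
    # with the quotes included and Path(...).suffix would be ".mp3'"
    return Path(raw.strip().strip('\'"'))


def is_mp3(path: Path) -> bool:
    """Case-insensitive extension check (hypothetical stand-in for the real check)."""
    return path.suffix.lower() == '.mp3'
```

Stripping quotes in code is a convenience; it does not help with the separate IDLE case, where the path is missing entirely.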

Add Additional Language Support

In a previous release, I modularized the project so it can load different languages dynamically. I need help from speakers of those languages to fill in the excluded phrases and chapter separators so that more people can use this tool.
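To illustrate what a contributor would be filling in, here is one possible shape for a per-language registry. LanguageConfig, LANGUAGES and get_language are hypothetical names, not the project's actual module layout; the only entries are the English keyword "chapter" and its Polish equivalent "rozdział", with excluded phrases left empty for native speakers to supply.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LanguageConfig:
    """Per-language word lists used when scanning the transcript."""
    chapter_separators: Tuple[str, ...]
    excluded_phrases: Tuple[str, ...] = ()


# Hypothetical registry: one entry per supported language code
LANGUAGES: Dict[str, LanguageConfig] = {
    'en-us': LanguageConfig(chapter_separators=('chapter',)),
    'pl': LanguageConfig(chapter_separators=('rozdział',)),
}


def get_language(code: str) -> LanguageConfig:
    """Look up a language, failing with a list of supported codes."""
    try:
        return LANGUAGES[code.lower()]
    except KeyError:
        supported = ', '.join(sorted(LANGUAGES))
        raise ValueError(f'Unsupported language {code!r}; supported: {supported}') from None
```

Keeping the word lists as plain data means a new language can be added without touching the parsing code.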

Inconsistent support for mp4a

I love that the script can detect chapters and split a big file into them, but it would be nice if it also supported mp4a (AAC) audio consistently. The script can analyze an mp4a file and generate an SRT file from it, but it cannot split the file. Ideally, the script would detect that the source is mp4a and automatically convert it to a temporary mp3 file so it can be split, or at least warn the user before processing starts that the encoding is not supported.

To confirm that the error was caused by the mp4a encoding and not something else, I manually converted the mp4a file to mp3, and the script then worked as expected.

ffmpeg_log.txt

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55e69758ce40] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x55e6975e6680] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------

...
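The log shows the cause: although the file is named Star_Raider.mp3, ffmpeg identifies it as an MP4 (DASH) container holding AAC (mp4a) audio, and the mp3 muxer refuses to stream-copy it ("Exactly one MP3 audio stream is required"). Because the extension is not reliable, one option is to probe the actual codec with ffprobe before splitting. The sketch below assumes ffprobe and ffmpeg are on PATH; parse_codec, probe_audio_codec, needs_conversion and mp3_conversion_command are hypothetical names, and the libmp3lame quality setting is an arbitrary choice.

```python
import json
import subprocess
from typing import List


def parse_codec(ffprobe_json: str) -> str:
    """Return the codec name of the first audio stream in ffprobe's JSON output."""
    streams = json.loads(ffprobe_json).get('streams', [])
    return streams[0].get('codec_name', '') if streams else ''


def probe_audio_codec(path: str) -> str:
    """Ask ffprobe for the codec of the first audio stream (requires ffprobe on PATH)."""
    result = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'a:0',
         '-show_entries', 'stream=codec_name', '-of', 'json', path],
        capture_output=True, text=True, check=True)
    return parse_codec(result.stdout)


def needs_conversion(codec: str) -> bool:
    # The mp3 muxer can only stream-copy MP3 audio
    return codec != 'mp3'


def mp3_conversion_command(src: str, dst: str) -> List[str]:
    """Re-encode the audio to a temporary MP3; -vn drops any embedded cover art."""
    return ['ffmpeg', '-i', src, '-vn', '-c:a', 'libmp3lame', '-q:a', '2', dst]
```

A caller could run `subprocess.run(mp3_conversion_command(src, tmp), check=True)` when `needs_conversion(probe_audio_codec(src))` is true, or simply print a warning and stop before transcription starts.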

Experimental Chapter Separators

Add additional chapter separators:

  • Preface
  • Introduction
  • Foreword
  • Afterword

Initially these will be disabled by default; until they have been thoroughly tested, they can only be enabled via a CLI switch.
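One way to gate the new separators behind a switch is sketched below with argparse. The --experimental-separators flag name, DEFAULT_SEPARATORS and active_separators are hypothetical; only the four experimental words come from the list above.

```python
import argparse
from typing import Tuple

# Hypothetical default list; the project's real separators may differ
DEFAULT_SEPARATORS: Tuple[str, ...] = ('chapter',)
# The four candidates from this issue, off unless explicitly requested
EXPERIMENTAL_SEPARATORS: Tuple[str, ...] = ('preface', 'introduction', 'foreword', 'afterword')


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='chapterize_ab.py')
    parser.add_argument(
        '--experimental-separators', action='store_true',
        help='also split on Preface, Introduction, Foreword and Afterword')
    return parser


def active_separators(enable_experimental: bool) -> Tuple[str, ...]:
    """Return the separators to search for, given the CLI switch."""
    return DEFAULT_SEPARATORS + (EXPERIMENTAL_SEPARATORS if enable_experimental else ())
```

argparse converts the dashes in the flag name to underscores, so the value is read as `args.experimental_separators`.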

Failure downloading model

I got the error:

Traceback (most recent call last):
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 1078, in <module>
    main()
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 970, in main
    audiobook_file, in_metadata, lang, model_name, model_type, cue_file = parse_args()
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 316, in parse_args
    args.audiobook.with_suffix('.cue').exists()
AttributeError: 'NoneType' object has no attribute 'with_suffix'

Command: python3.10 chapterize_ab.py -dm -l pl

Executed inside a venv
OS: Debian 11
Installed dependencies with pip3.10 install -r requirements.txt

pip log:

Requirement already satisfied: rich>=12.6.0 in ./lib/python3.10/site-packages (from -r requirements.txt (line 1)) (13.5.3)
Requirement already satisfied: vosk>=0.3.44 in ./lib/python3.10/site-packages (from -r requirements.txt (line 2)) (0.3.45)
Requirement already satisfied: requests>=2.28.0 in ./lib/python3.10/site-packages (from -r requirements.txt (line 3)) (2.31.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./lib/python3.10/site-packages (from rich>=12.6.0->-r requirements.txt (line 1)) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./lib/python3.10/site-packages (from rich>=12.6.0->-r requirements.txt (line 1)) (2.16.1)
Requirement already satisfied: cffi>=1.0 in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (1.16.0)
Requirement already satisfied: tqdm in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (4.66.1)
Requirement already satisfied: srt in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (3.5.3)
Requirement already satisfied: websockets in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (11.0.3)
Requirement already satisfied: certifi>=2017.4.17 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (2023.7.22)
Requirement already satisfied: idna<4,>=2.5 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (3.2.0)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (2.0.5)
Requirement already satisfied: pycparser in ./lib/python3.10/site-packages (from cffi>=1.0->vosk>=0.3.44->-r requirements.txt (line 2)) (2.21)
Requirement already satisfied: mdurl~=0.1 in ./lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12.6.0->-r requirements.txt (line 1)) (0.1.2)
WARNING: You are using pip version 21.2.3; however, version 23.2.1 is available.
You should consider upgrading via the '/home/savant/Projects/Chapterize-Audiobooks/bin/python3.10 -m pip install --upgrade pip' command.
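The traceback points at the same line 316 as the earlier report: with `-dm -l pl` and no audiobook path, args.audiobook is None when with_suffix is called. The sketch below is a hypothetical, simplified parse_args (not the script's real one) showing a guard: the positional path is optional, it is only required when -dm is not given, and the sibling .cue lookup only runs when a path exists. The dest names download_model and language are assumptions.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> Tuple[argparse.Namespace, Optional[Path]]:
    parser = argparse.ArgumentParser(prog='chapterize_ab.py')
    # Optional positional: downloading a model does not need an audiobook
    parser.add_argument('audiobook', nargs='?', type=Path, default=None)
    parser.add_argument('-dm', dest='download_model', action='store_true')
    parser.add_argument('-l', dest='language', default=None)
    args = parser.parse_args(argv)

    if args.audiobook is None and not args.download_model:
        parser.error('an audiobook path is required unless -dm is given')

    # Only probe for a sibling .cue file when an audiobook path was given
    cue_file = None
    if args.audiobook is not None and args.audiobook.with_suffix('.cue').exists():
        cue_file = args.audiobook.with_suffix('.cue')
    return args, cue_file
```

parser.error prints the usage message and exits with status 2, which is friendlier than an AttributeError.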

Generate timecode/cue file

After parsing, generate a file which can be used to edit chapter markers in situations where the split points are inaccurate.
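Since the script already checks for a .cue file next to the audiobook, a CUE sheet seems a natural format for this. The sketch below assumes chapters are available as (title, start_seconds) pairs; build_cue and cue_timestamp are hypothetical names. CUE INDEX times are MM:SS:FF with 75 frames per second, and because there is no hours field the minutes simply keep growing; most players accept values above 99, but that is worth checking for very long books.

```python
from typing import List, Tuple

FRAMES_PER_SECOND = 75  # CUE sheets express positions in CD frames


def cue_timestamp(seconds: float) -> str:
    """Convert seconds to a CUE MM:SS:FF index."""
    frames = round(seconds * FRAMES_PER_SECOND)
    minutes, rem = divmod(frames, FRAMES_PER_SECOND * 60)
    secs, ff = divmod(rem, FRAMES_PER_SECOND)
    return f'{minutes:02d}:{secs:02d}:{ff:02d}'


def build_cue(audio_filename: str, chapters: List[Tuple[str, float]]) -> str:
    """Build a CUE sheet with one TRACK per chapter."""
    lines = [f'FILE "{audio_filename}" MP3']
    for number, (title, start) in enumerate(chapters, start=1):
        # CUE has no escape for double quotes, so swap them for single quotes
        safe_title = title.replace('"', "'")
        lines.append(f'  TRACK {number:02d} AUDIO')
        lines.append(f'    TITLE "{safe_title}"')
        lines.append(f'    INDEX 01 {cue_timestamp(start)}')
    return '\n'.join(lines) + '\n'
```

Editing a chapter boundary then only means changing one INDEX line by hand and re-running the split.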

Feature Request: Some indicator that generate_timecodes is working

It would be nice to have some indicator that the ffmpeg subprocess is still running (maybe a tail of the SRT file), so users can see that it is still working through the file and has not hung.

I know we could edit line 760 of chapterize_ab.py and remove the -loglevel quiet argument to see that it's working, but a prettier option would be nice.
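The command at line 760 isn't shown here, but if it is an ffmpeg call as the issue says, ffmpeg's global `-progress` option should still write periodic key=value records even with `-loglevel quiet`. The sketch below assumes a build that emits `out_time_us` (older builds may not) and a known total duration, for example from ffprobe; with_progress_flags and progress_percent are hypothetical helpers.

```python
from typing import Iterable, Iterator, List


def with_progress_flags(cmd: List[str]) -> List[str]:
    """Insert ffmpeg's global -progress option right after the executable."""
    # pipe:2 writes key=value progress records to stderr, leaving stdout free
    # in case it already carries decoded audio; -nostats drops the usual stats line
    return [cmd[0], '-progress', 'pipe:2', '-nostats'] + cmd[1:]


def progress_percent(lines: Iterable[str], total_seconds: float) -> Iterator[float]:
    """Turn ffmpeg -progress records into percentages of total_seconds."""
    for line in lines:
        key, _, value = line.strip().partition('=')
        if key == 'out_time_us' and value.isdigit():
            # out_time_us is the current output position in microseconds
            yield min(100.0, int(value) / 1_000_000 / total_seconds * 100)
        elif key == 'progress' and value == 'end':
            yield 100.0
```

In use, the command would be started with `subprocess.Popen(with_progress_flags(cmd), stderr=subprocess.PIPE, text=True)` and `proc.stderr` fed to progress_percent; if stdout is being read at the same time (for example by the speech-recognition loop), the stderr loop should run in its own thread so neither pipe blocks. The percentages could then drive a `rich` progress bar, since rich is already a dependency.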

When I run the script, I get a syntax error on line 40 involving vosk_link

I installed it on my Windows 10 machine, with Python 3, ffmpeg, etc. up to date. When I run any command involving chapterize_ab.py (including "-h"), I get the following error:

File "C:\dev\chapterize-audiobooks-main\chapterize_ab.py", line 40
    vosk_link = f"[link={vosk_url}]this link[/link]"
                                                   ^
SyntaxError: invalid syntax

Any ideas why?
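A SyntaxError pointing at an f-string usually means the file is being run by a Python older than 3.6, when f-strings were introduced; on Windows, `python` on the PATH can point to an older install than the one you updated. A check inside chapterize_ab.py itself cannot catch this, because Python compiles the whole file before running any of it. A separate script written without f-strings can, as sketched below; the file name check_python.py is hypothetical, and 3.6 is only the floor for f-strings, since the project may require a newer version.

```python
import sys

# f-strings need Python 3.6+; the project itself may need a newer version
MIN_VERSION = (3, 6)


def version_ok(version_info=sys.version_info):
    """Return True if the interpreter is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == '__main__':
    # %-formatting on purpose: this file must parse on old interpreters too
    if not version_ok():
        sys.stderr.write('Python %d.%d is too old for chapterize_ab.py\n'
                         % tuple(sys.version_info[:2]))
        sys.exit(1)
    print('Python %d.%d.%d OK' % tuple(sys.version_info[:3]))
```

Running `python check_python.py` with the same command used for chapterize_ab.py shows which interpreter is actually being picked up.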

Add GUI interface

Add a simple GUI for users who are less comfortable using the command line.
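A GUI could stay very thin by collecting the same options and launching the existing CLI. The sketch below uses tkinter from the standard library and only the --title and --genre options that appear in an earlier issue; build_command and main are hypothetical names, and it assumes chapterize_ab.py is in the current working directory.

```python
import subprocess
import sys
from typing import List


def build_command(audiobook: str, title: str = '', genre: str = '') -> List[str]:
    """Build the CLI call; a list (not a shell string) avoids quoting problems."""
    cmd = [sys.executable, 'chapterize_ab.py', audiobook]
    if title:
        cmd += ['--title', title]
    if genre:
        cmd += ['--genre', genre]
    return cmd


def main() -> None:
    """Open a small window with a file picker and two optional text fields."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title('Chapterize Audiobooks')
    path_var, title_var, genre_var = tk.StringVar(root), tk.StringVar(root), tk.StringVar(root)

    def browse() -> None:
        path_var.set(filedialog.askopenfilename(filetypes=[('MP3 files', '*.mp3')]))

    def run() -> None:
        # Popen returns immediately, so the window stays responsive
        subprocess.Popen(build_command(path_var.get(), title_var.get(), genre_var.get()))

    fields = [('Audiobook', path_var), ('Title', title_var), ('Genre', genre_var)]
    for row, (label, var) in enumerate(fields):
        tk.Label(root, text=label).grid(row=row, column=0, sticky='w')
        tk.Entry(root, textvariable=var, width=50).grid(row=row, column=1)
    tk.Button(root, text='Browse...', command=browse).grid(row=0, column=2)
    tk.Button(root, text='Chapterize', command=run).grid(row=3, column=1)
    root.mainloop()
```

Calling `main()` opens the window. Passing arguments as a list also sidesteps the cmd.exe quoting problems reported in another issue.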

Convert to m4b

Option to convert an mp3 file to m4b with embedded chapter metadata.
