p0n1 / epub_to_audiobook

EPUB to audiobook converter, optimized for Audiobookshelf

License: MIT License

Python 99.04% Dockerfile 0.96%

audiobooks audiobookshelf epub tts chatgpt openai

epub_to_audiobook's Introduction

EPUB to Audiobook Converter

Join our Discord server for any questions or discussions.

This project provides a command-line tool to convert EPUB ebooks into audiobooks. It supports the Microsoft Azure Text-to-Speech API (or, alternatively, Edge TTS) and the OpenAI Text-to-Speech API for generating the audio for each chapter in the ebook. The output audio files are optimized for use with Audiobookshelf.

This project is developed with the help of ChatGPT.

Audio Sample

If you're interested in hearing a sample of the audiobook generated by this tool, check the links below.

Azure TTS Sample
OpenAI TTS Sample
Edge TTS Sample: the voice is almost the same as Azure TTS

Requirements

Python 3.6+ or Docker
For Azure TTS, a Microsoft Azure account with access to the Microsoft Cognitive Services Speech Services is required.
For OpenAI TTS, an OpenAI API key is required.
For Edge TTS, no API key is required.

Audiobookshelf Integration

The audiobooks generated by this project are optimized for use with Audiobookshelf. Each chapter in the EPUB file is converted into a separate MP3 file, with the chapter title extracted and included as metadata.

Chapter Titles

Parsing and extracting chapter titles from EPUB files can be challenging, as the format and structure may vary significantly between different ebooks. The script employs a simple but effective method for extracting chapter titles, which works for most EPUB files. The method involves parsing the EPUB file and looking for the title tag in the HTML content of each chapter. If the title tag is not present, a fallback title is generated using the first few words of the chapter text.

Please note that this approach may not work perfectly for all EPUB files, especially those with complex or unusual formatting. However, in most cases, it provides a reliable way to extract chapter titles for use in Audiobookshelf.

When you import the generated MP3 files into Audiobookshelf, the chapter titles will be displayed, making it easy to navigate between chapters and enhancing your listening experience.
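
The title lookup with its fallback, as described above, can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual parser; the function name `extract_chapter_title` and the `fallback_words` parameter are assumptions made for the example.

```python
from html.parser import HTMLParser


class _TitleAndTextParser(HTMLParser):
    """Collects the <title> text and the remaining text of one chapter."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Route text either to the title or to the chapter body.
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def extract_chapter_title(html, fallback_words=6):
    """Return the <title> text, or the first few words of the chapter."""
    parser = _TitleAndTextParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    if title:
        return title
    # Fallback (hypothetical word count): use the start of the chapter text.
    words = " ".join(parser.text_parts).split()
    return " ".join(words[:fallback_words])
```

A chapter without a title tag, such as `<p>I was born in the year 1632</p>`, would then be labeled by its opening words.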

Installation

Clone this repository:

git clone https://github.com/p0n1/epub_to_audiobook.git
cd epub_to_audiobook

Create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

Install the required dependencies:
```
pip install -r requirements.txt
```

Set the following environment variables with your Azure Text-to-Speech API credentials, or your OpenAI API key if you're using OpenAI TTS:

export MS_TTS_KEY=<your_subscription_key> # for Azure
export MS_TTS_REGION=<your_region> # for Azure
export OPENAI_API_KEY=<your_openai_api_key> # for OpenAI
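
Before starting a long conversion, it can help to confirm that the variables for the chosen provider are actually set. The following is a small sketch of such a check; the variable names come from this README, while the helper `missing_env_vars` and the `REQUIRED_ENV` table are assumptions for illustration and not part of the tool.

```python
import os

# Environment variables each provider needs, as listed in this README.
REQUIRED_ENV = {
    "azure": ["MS_TTS_KEY", "MS_TTS_REGION"],
    "openai": ["OPENAI_API_KEY"],
    "edge": [],  # Edge TTS needs no API key
}


def missing_env_vars(tts, environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[tts] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        print(provider, "missing:", missing_env_vars(provider))
```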

Usage

To convert an EPUB ebook to an audiobook, run the following command, specifying the TTS provider of your choice with the --tts option:

python3 main.py <input_file> <output_folder> [options]

To check the latest option descriptions for this script, you can run the following command in the terminal:

python3 main.py -h

usage: main.py [-h] [--tts {azure,openai,edge}]
               [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--preview]
               [--no_prompt] [--language LANGUAGE]
               [--newline_mode {single,double}]
               [--chapter_start CHAPTER_START] [--chapter_end CHAPTER_END]
               [--output_text] [--remove_endnotes] [--voice_name VOICE_NAME]
               [--output_format OUTPUT_FORMAT] [--model_name MODEL_NAME]
               [--voice_rate VOICE_RATE] [--voice_volume VOICE_VOLUME]
               [--voice_pitch VOICE_PITCH] [--proxy PROXY]
               [--break_duration BREAK_DURATION]
               input_file output_folder

Convert text book to audiobook

positional arguments:
  input_file            Path to the EPUB file
  output_folder         Path to the output folder

options:
  -h, --help            show this help message and exit
  --tts {azure,openai,edge}
                        Choose TTS provider (default: azure). azure: Azure
                        Cognitive Services, openai: OpenAI TTS API. When using
                        azure, environment variables MS_TTS_KEY and
                        MS_TTS_REGION must be set. When using openai,
                        environment variable OPENAI_API_KEY must be set.
  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Log level (default: INFO), can be DEBUG, INFO,
                        WARNING, ERROR, CRITICAL
  --preview             Enable preview mode. In preview mode, the script will
                        not convert the text to speech. Instead, it will print
                        the chapter index, titles, and character counts.
  --no_prompt           Don't ask the user if they wish to continue after
                        estimating the cloud cost for TTS. Useful for
                        scripting.
  --language LANGUAGE   Language for the text-to-speech service (default: en-
                        US). For Azure TTS (--tts=azure), check
                        https://learn.microsoft.com/en-us/azure/ai-
                        services/speech-service/language-
                        support?tabs=tts#text-to-speech for supported
                        languages. For OpenAI TTS (--tts=openai), their API
                        detects the language automatically. But setting this
                        will also help on splitting the text into chunks with
                        different strategies in this tool, especially for
                        Chinese characters. For Chinese books, use zh-CN, zh-
                        TW, or zh-HK.
  --newline_mode {single,double}
                        Choose the mode of detecting new paragraphs: 'single'
                        or 'double'. 'single' means a single newline
                        character, while 'double' means two consecutive
                        newline characters. (default: double, works for most
                        ebooks but will detect less paragraphs for some
                        ebooks)
  --chapter_start CHAPTER_START
                        Chapter start index (default: 1, starting from 1)
  --chapter_end CHAPTER_END
                        Chapter end index (default: -1, meaning to the last
                        chapter)
  --output_text         Enable Output Text. This will export a plain text file
                        for each chapter specified and write the files to the
                        output folder specified.
  --remove_endnotes     This will remove endnote numbers from the end or
                        middle of sentences. This is useful for academic
                        books.
  --voice_name VOICE_NAME
                        Various TTS providers has different voice names, look
                        up for your provider settings.
  --output_format OUTPUT_FORMAT
                        Output format for the text-to-speech service.
                        Supported format depends on selected TTS provider
  --model_name MODEL_NAME
                        Various TTS providers has different neural model names

edge specific:
  --voice_rate VOICE_RATE
                        Speaking rate of the text. Valid relative values range
                        from -50%(--xxx='-50%') to +100%. For negative value
                        use format --arg=value,
  --voice_volume VOICE_VOLUME
                        Volume level of the speaking voice. Valid relative
                        values floor to -100%. For negative value use format
                        --arg=value,
  --voice_pitch VOICE_PITCH
                        Baseline pitch for the text.Valid relative values like
                        -80Hz,+50Hz, pitch changes should be within 0.5 to 1.5
                        times the original audio. For negative value use
                        format --arg=value,
  --proxy PROXY         Proxy server for the TTS provider. Format:
                        http://[username:password@]proxy.server:port

azure specific:
  --break_duration BREAK_DURATION
                        Break duration in milliseconds for the different
                        paragraphs or sections (default: 1250). Valid values
                        range from 0 to 5000 milliseconds.
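
The help text asks for the --arg=value form when a value is negative. The reason is general argparse behavior: a separate token such as -50% starts with a dash and is read as another option rather than as a value. The sketch below reproduces this with a stand-alone parser; it defines only a single option for demonstration and is not the tool's real argument parser.

```python
import argparse

# A stand-in parser with one of the Edge-specific options.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--voice_rate")

# The --arg=value form keeps the leading minus attached to the option.
ok = parser.parse_args(["--voice_rate=-50%"])
print(ok.voice_rate)


def is_rejected(argv):
    """Return True if argparse refuses the given argument list."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        return True
    return False


# As a separate token, "-50%" looks like an option, so parsing fails.
rejected = is_rejected(["--voice_rate", "-50%"])
print(rejected)
```

Positive values such as +20% are not affected, because they do not start with the option prefix character.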

Example:

python3 main.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder

Executing the above command will generate a directory named output_folder and save the MP3 files for each chapter inside it, using the default TTS provider and voice. Once generated, you can import these audio files into Audiobookshelf or play them with any audio player of your choice.

Preview Mode

Before converting your EPUB file to an audiobook, you can use the --preview option to get a summary of each chapter. Instead of converting the text to speech, the script prints the character count of each chapter and the total count.

Example:

python3 main.py examples/The_Life_and_Adventures_of_Robinson_Crusoe.epub output_folder --preview
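
The kind of summary that preview mode produces can be sketched as follows. The function `preview` and its input shape (a list of title and text pairs) are assumptions for illustration; the tool's own output format may differ.

```python
def preview(chapters):
    """Summarize chapters given as (title, text) pairs.

    Returns a list of (index, title, character_count) rows and the total count.
    """
    rows = []
    total = 0
    for index, (title, text) in enumerate(chapters, start=1):
        count = len(text)
        rows.append((index, title, count))
        total += count
    return rows, total


if __name__ == "__main__":
    sample = [("Preface", "A short note."), ("Chapter I", "I was born in York.")]
    rows, total = preview(sample)
    for index, title, count in rows:
        print(f"{index}: {title} ({count} characters)")
    print("Total characters:", total)
```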

Using with Docker

This tool is available as a Docker image, making it easy to run without needing to manage Python dependencies.

First, make sure you have Docker installed on your system.

You can pull the Docker image from the GitHub Container Registry:

docker pull ghcr.io/p0n1/epub_to_audiobook:latest

Then, you can run the tool with the following command:

docker run -i -t --rm -v ./:/app -e MS_TTS_KEY=$MS_TTS_KEY -e MS_TTS_REGION=$MS_TTS_REGION ghcr.io/p0n1/epub_to_audiobook your_book.epub audiobook_output --tts azure

For OpenAI, you can run:

docker run -i -t --rm -v ./:/app -e OPENAI_API_KEY=$OPENAI_API_KEY ghcr.io/p0n1/epub_to_audiobook your_book.epub audiobook_output --tts openai

Replace $MS_TTS_KEY and $MS_TTS_REGION with your Azure Text-to-Speech API credentials. Replace $OPENAI_API_KEY with your OpenAI API key. Replace your_book.epub with the name of the input EPUB file, and audiobook_output with the name of the directory where you want to save the output files.

The -v ./:/app option mounts the current directory (.) to the /app directory in the Docker container. This allows the tool to read the input file and write the output files to your local file system.

The -i and -t options are required to enable interactive mode and allocate a pseudo-TTY.

You can also check this example config file for Docker Compose usage.

User-Friendly Guide for Windows Users

For Windows users, especially if you're not very familiar with command-line tools, we've got you covered. We understand the challenges and have created a guide specifically tailored for you.

Check this step by step guide and leave a message if you encounter issues.

How to Get Your Azure Cognitive Service Key?

Azure subscription - Create one for free
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.

Source: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech#prerequisites

How to Get Your OpenAI API Key?

Check https://platform.openai.com/docs/quickstart/account-setup. Make sure you check the price details before use.

✨ About Edge TTS

Edge TTS and Azure TTS are almost the same. The difference is that Edge TTS doesn't require an API key, because it is based on the Edge read-aloud functionality, and some parameters, such as custom SSML, are restricted.

Check https://github.com/p0n1/epub_to_audiobook/blob/main/audiobook_generator/tts_providers/edge_tts_provider.py#L17 for supported voices.

If you want to try this project quickly, Edge TTS is highly recommended.

Customization of Voice and Language

You can customize the voice and language used for the Text-to-Speech conversion by passing the --voice_name and --language options when running the script.

Microsoft Azure offers a range of voices and languages for the Text-to-Speech service. For a list of available options, consult the Microsoft Azure Text-to-Speech documentation.

You can also listen to samples of the available voices in the Azure TTS Voice Gallery to help you choose the best voice for your audiobook.

For example, if you want to use a British English female voice for the conversion, you can use the following command:

python3 main.py <input_file> <output_folder> --voice_name en-GB-LibbyNeural --language en-GB

For OpenAI TTS, you can specify the model, voice, and format options using --model_name, --voice_name, and --output_format, respectively.

More examples

Here are some examples that demonstrate various option combinations:

Examples Using Azure TTS

Basic conversion using Azure with default settings
This command will convert an EPUB file to an audiobook using Azure's default TTS settings.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts azure
```
Azure conversion with custom language, voice and logging level
Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts azure --language zh-CN --voice_name "zh-CN-YunyeNeural" --log DEBUG
```
Azure conversion with chapter range and break duration
Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts azure --chapter_start 5 --chapter_end 10 --break_duration "1500"
```
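
How --chapter_start and --chapter_end map onto a chapter list can be sketched as below, following the help text (indices start at 1, and -1 means the last chapter). The helper `select_chapters` is hypothetical and only illustrates the indexing.

```python
def select_chapters(chapters, chapter_start=1, chapter_end=-1):
    """Return the chapters in the 1-based, inclusive range [start, end]."""
    total = len(chapters)
    end = total if chapter_end == -1 else chapter_end
    if not 1 <= chapter_start <= end <= total:
        raise ValueError(
            f"invalid chapter range {chapter_start}..{chapter_end} for {total} chapters"
        )
    # Convert the 1-based inclusive range to a Python slice.
    return chapters[chapter_start - 1:end]


if __name__ == "__main__":
    chapters = [f"chapter-{n}" for n in range(1, 13)]
    print(select_chapters(chapters, 5, 10))
```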

Examples Using OpenAI TTS

Basic conversion using OpenAI with default settings
This command will convert an EPUB file to an audiobook using OpenAI's default TTS settings.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts openai
```
OpenAI conversion with HD model and specific voice
Converts an EPUB file to an audiobook using the high-definition OpenAI model and a specific voice choice.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts openai --model_name "tts-1-hd" --voice_name "fable"
```
OpenAI conversion with preview and text output
Enables preview mode and text output, which will display the chapter index and titles instead of converting them and will also export the text.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts openai --preview --output_text
```

Examples Using Edge TTS

Basic conversion using Edge with default settings
This command will convert an EPUB file to an audiobook using Edge's default TTS settings.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts edge
```
Edge conversion with custom language, voice and logging level
Converts an EPUB file to an audiobook with a specified voice and a custom log level for debugging purposes.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts edge --language zh-CN --voice_name "zh-CN-YunxiNeural" --log DEBUG
```
Edge conversion with chapter range and break duration
Converts a specified range of chapters from an EPUB file to an audiobook with custom break duration between paragraphs.
```
python3 main.py "path/to/book.epub" "path/to/output/folder" --tts edge --chapter_start 5 --chapter_end 10 --break_duration "1500"
```

Troubleshooting

ModuleNotFoundError: No module named 'importlib_metadata'

This may be because you are using a Python version lower than 3.8. You can try installing the module manually with pip3 install importlib-metadata, or use a newer Python version.

FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

Make sure the ffmpeg binary is accessible from your PATH. If you are on a Mac and use Homebrew, you can run brew install ffmpeg. On Ubuntu, you can run sudo apt install ffmpeg.

Related Projects

Epub to Audiobook (M4B): EPUB to M4B audiobook, with StyleTTS2 via the HuggingFace Spaces API.
Storyteller: A self-hosted platform for automatically syncing ebooks and audiobooks.

License

This project is licensed under the MIT License. See the LICENSE file for details.

epub_to_audiobook's Issues

[Feature Request] Skip footnotes

It would be great to have a tag to skip both the numerical notes at the end of a sentence as well as the footnotes on the bottom of a page.
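
One possible way to drop trailing note numbers is a regular expression that removes digits glued to the end of a word or to closing punctuation. The pattern below is only a rough sketch of that idea, not the logic behind the tool's --remove_endnotes option. It deliberately leaves decimals and free-standing numbers alone, but it would also strip digits from identifiers such as names ending in a number.

```python
import re

_ENDNOTE = re.compile(
    r"(?<=[A-Za-z.,;:!?'\")\]])"  # preceded by a letter, punctuation, quote or bracket
    r"(?<!\d[.,])"                # but not by a decimal such as "3."
    r"\d+"                        # the note number itself
    r"(?=\s|$|[.,;:!?])"          # followed by space, end of text or punctuation
)


def remove_endnote_numbers(text):
    """Strip note markers like 'mistake.12' or 'treaty3' from running text."""
    return _ENDNOTE.sub("", text)


if __name__ == "__main__":
    print(remove_endnote_numbers("It was a mistake.12 He left."))
```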

[Feature Request] Implement rough cost calculation beforehand, with prompt to confirm.

I was in the middle of writing my solution when by accident came across this project which already has almost everything implemented
So I'm planning to use your solution!

Thank you for your work!!!

However, what is missing is cost estimation. When I want to convert a book to audio, I have no idea how big it is or how much it would cost.
It would be nice if every tts_provider implemented a cost estimation function and calculated roughly how much it would cost to convert the selected book.

With a manual command-line prompt to confirm before the final conversion, like:

The approximate cost of the book voiceover would be XYZ$ 
Would you agree to proceed? [Y/N]: _

For example, OpenAI set the price of 0.015$ for 1k chars for the simple tts model and doubled it to 0.03$ for the tts-hd model
It should be easy to calculate by the formula: (whole_book_chars / 1k) * selected_tts_model_price
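
The formula above can be written out as a short sketch. The per-1k-character prices are the ones quoted in this issue and may be out of date; the model keys `tts-1` and `tts-1-hd` and the function name `estimate_cost` are assumptions for the example.

```python
# Prices in USD per 1,000 characters, as quoted in this issue (may be outdated).
PRICE_PER_1K_CHARS = {
    "tts-1": 0.015,
    "tts-1-hd": 0.03,
}


def estimate_cost(total_chars, model_name):
    """Rough cost: (whole_book_chars / 1k) * selected_tts_model_price."""
    return total_chars / 1000 * PRICE_PER_1K_CHARS[model_name]


if __name__ == "__main__":
    chars = 100_000
    for model in PRICE_PER_1K_CHARS:
        print(f"{model}: roughly ${estimate_cost(chars, model):.2f}")
```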

Additional suggestions:
Considering project evolution and further progress, I would suggest:

Reorganise the project from a single file into proper separate classes and packages and move TTS providers and the main interface TTSProvider into a separate Python package to simplify adding more providers
Add the cost_estimation method to the TTSProvider interface
Add more book type support ( *.fb2, *.mobi...) which would require also the creation of separate services implementing a global interface for each book type
Add more providers:
- AWS has TTS - called Polly. supports: standard (mechanical) voice and new neural voice (sounds much better), but not all languages are supported (what makes --language to be an obligatory arg for execution). Price
- Google has TTS Price
- I'm sure there are many more providers out there, especially considering the AI boom in the industry, however, if anyone decides to contribute, he would have to implement TTSProvider interface with basic the standard functionality and place it into an individual Python package.

P.S. Happy to help with the project, feel free to PM

(question) All chapters end up being 2kb large

Hello,

This is more of a question than a bug report, as I am not sure if this is due to me doing something wrong...

After a chapter is converted to MP3, it ends up with a size of ~2 KB, although during conversion the size is reported correctly (while refreshing the directory). This happened in WSL on both Ubuntu and Arch distros as well as on a regular Arch Linux install, using a Python virtual env with edge as the TTS provider and the following command:
python main.py book_name.epub book_dir --tts edge --language hr-HR --voice_name hr-HR-SreckoNeural

I had the same issue with an English-language book without providing the language switch, which, if I am correct, uses English by default...

Using conversion with docker image works as expected.

Thanks

Audio Is Longer In Duration Than What Is In ABS player

Hello! Thank you for your hard work. I used your program to create an audiobook from an EPUB file that was from my Calibre library.
It worked, but on Audiobookshelf it says it has a certain duration which is then exceeded, and it is listed as finished. So when I try to come back to it I'm forced to go to the official end of the audio, and I can't go forward or back using the player or it will start at the official end of the audio.

footnotes

So I just started playing around with TTS over the last week or so and have been using Piper to take individual OCR'ed png files downloaded from archive.org and convert them to speech. This pretty much sucks, but since the book I'm working on is not available as an ebook so far as I can tell (Service, John, Lost Chance in China) this is the only way to do it. On the other hand, there are a lot of other non-fiction books (and perhaps some journal articles) that are available as .epub (and certainly as .pdf for the journal articles). For those, a TTS solution that goes from .epub to .opus (or .flac) directly would be preferable, so I can simplify .epub > [web-based conversion] > .txt > [piper] > [ffmpeg] > .opus. However, one problem that has required a lot of manual processing so far is integrating foot/chapter/endnotes back into the text so that content is not lost.

I don't think this is exactly the right venue for this discussion, but I didn't see an e.g., Discord channel for this project (happy to discuss it on off-topic @ audiobookshelf discord), but I'd like to see what others are thinking about for integrating that content back into non-fiction work (as I have all the fiction I want and then some already on audio).

[Feature] Force mode

Can you please add functionality that detects, by filename, whether some chapters have already been processed in the output folder?
In my use case I don't spend much time in one place, so I convert some books in 4-9 runs. It would be nice to save some money by automatically detecting the already converted chapters. If a -f parameter were used, everything in the output folder could be ignored :)
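
The requested behavior could look roughly like the sketch below: skip every chapter whose output file already exists, unless a force switch is given. The function name, the `force` parameter and the example filenames are all hypothetical; the tool's real naming scheme is not assumed here.

```python
from pathlib import Path


def chapters_to_convert(output_folder, chapter_files, force=False):
    """Return the output filenames that still need to be generated.

    With force=True, existing files are ignored and everything is redone.
    """
    if force:
        return list(chapter_files)
    out = Path(output_folder)
    return [name for name in chapter_files if not (out / name).is_file()]


if __name__ == "__main__":
    pending = chapters_to_convert("output_folder", ["0001_Intro.mp3", "0002_Next.mp3"])
    print("Still to convert:", pending)
```

A stricter variant might also check that an existing file is not empty, so that a run interrupted mid-chapter is retried.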

Program successfully completes but files do not show up?

Hey y'all,

I apologize if this is a stupid inquiry - I have absolutely no coding experience (although this project is motivating me to learn!). Using Docker, I was able to successfully follow the instructions and run the program for 2 different audiobooks. In both cases, the program seemed to complete successfully; however, the audio files do not appear in the specified output directory. I used separate directories for each book, making a new folder /Users/paulclancy/Desktop/Azure and then copying this pathname to specify where to upload.

After checking the folder post-completion of the program, no audio files are present. My code (with the middle portion abbreviated) is shown below. Please let me know if there is any obvious solution to this. Thank you!

paulclancy@Pauls-MacBook-Pro Azure % docker run -i -t --rm -v ./:/app -e MS_TTS_KEY=b5145331f062491e9e53b1d4e3da942d -e MS_TTS_REGION=eastus ghcr.io/p0n1/epub_to_audiobook lying.epub /Users/paulclancy/Desktop/Azure --tts azure
/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-02-26 05:28:06 [INFO] Chapters count: 12.
2024-02-26 05:28:06 [INFO] Converting chapters from 1 to 12.
2024-02-26 05:28:06 [INFO] ✨ Total characters in selected book: 126647 ✨
Estimate book voiceover would cost you roughly: $2.03

Do you want to continue? (y/n)
y
2024-02-26 05:28:24 [INFO] Converting chapter 1/12:
2024-02-26 05:33:39 [INFO] Processing chapter-12 <A_NOTE_ON_THE_TYPE_This_book_was_set_in_Minion_a_typeface_d>, chunk 1 of 1
...

2024-02-26 05:33:39 [INFO] Sending request to Azure TTS, data length: 576
2024-02-26 05:33:40 [INFO] Got response from Azure TTS, response length: 172512
paulclancy@Pauls-MacBook-Pro Azure %

KeyError: "There is no item named 'page_styles.css' in the archive"

Hi, this tool is very useful, thanks for working on this!

I've encountered a bug with an epub that I'm putting in. Is it a case of a malformed epub?

Thanks

Stack trace:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\main.py", line 102, in <module>
    main()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\main.py", line 98, in main
    AudiobookGenerator(config).run()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\core\audiobook_generator.py", line 37, in run
    book_parser = get_book_parser(self.config)
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\book_parsers\base_book_parser.py", line 42, in get_book_parser
    return EpubBookParser(config)
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\audiobook_generator\book_parsers\epub_book_parser.py", line 19, in __init__
    self.book = epub.read_epub(self.config.input_file)
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1768, in read_epub
    book = reader.load()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1410, in load
    self._load()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1722, in _load
    self._load_opf_file()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1679, in _load_opf_file
    self._load_manifest()
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1555, in _load_manifest
    ei.content = self.read_file(zip_path.join(self.opf_dir, ei.get_name()))
  File "C:\Users\USER\Documents\Projects\epub_to_audiobook\venv\lib\site-packages\ebooklib\epub.py", line 1417, in read_file
    return self.zf.read(name)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1475, in read
    with self.open(name, "r", pwd) as fp:
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1514, in open
    zinfo = self.getinfo(name)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1441, in getinfo
    raise KeyError(
KeyError: "There is no item named 'page_styles.css' in the archive"

Ability to set options via Environment Variables in Docker Compose?

Hi, this is a great project, thank you for creating it!

I am wondering if it is possible to set any of the configuration flags via Environment variables when using the docker compose file. I'm hoping to set things such as OPENAI_VOICE, OPENAI_MODEL, etc. via variables in the compose file or a .env file.

Thank you again!

Run in same compose file as Audiobookshelf

Is it possible to run this container without a command, and have it automatically convert EPUBs to audio after they are uploaded to the Audiobookshelf directory?

Error while converting text to speech (attempt 1): ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

I'm on a Mac (Sonoma, Python 3.9.6) and am able to run the enclosed example book without any issue. However, other EPUBs that I have collected over the years do not work. They all run, but the warning message keeps showing for several minutes as the attempt count increases.

I attached the full log.
error.txt

Amazon Polly

Thanks for developing this!

Have you considered integrating Amazon Polly? The neural voices are exceptionally good and the possibilities with SSML are unique!

voice

Great piece of software. Thanks! Could you please let me know how to make the voice sound more human? Is there any other option besides azure text-to-speech or any tweak that can make the audio ... audible?

EOFError: EOF when reading a line in v0.5.0 or v0.5.1

When I convert epub ebook to audiobook using v0.4.3 version, it works fine. In order to use edge tts, I tried to use v0.5.0 or v0.5.1 to convert the same epub e-book, and the error occurred as follows:

/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-04-08 02:55:06 [INFO] Chapters count: 320.
2024-04-08 02:55:06 [INFO] Converting chapters from 1 to 320.
Estimate book voiceover would cost you roughly: $0.00
Do you want to continue? (y/n)
2024-04-08 02:55:06 [INFO] ✨ Total characters in selected book: 1595407 ✨
Traceback (most recent call last):
  File "/app_src/main.py", line 139, in <module>
    main()
  File "/app_src/main.py", line 135, in main
    AudiobookGenerator(config).run()
  File "/app_src/audiobook_generator/core/audiobook_generator.py", line 77, in run
    confirm_conversion()
  File "/app_src/audiobook_generator/core/audiobook_generator.py", line 14, in confirm_conversion
    answer = input()
             ^^^^^^^
EOFError: EOF when reading a line
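
The traceback shows input() failing because no standard input is attached, which is what happens when a container runs without the -i and -t options mentioned in the README. A confirmation step that tolerates this could look like the sketch below. It is an illustration only, not the project's confirm_conversion; the name `ask_to_continue` and the `read_line` parameter are assumptions.

```python
def ask_to_continue(no_prompt=False, read_line=input):
    """Return True if the conversion should proceed.

    no_prompt mirrors the --no_prompt option: skip the question entirely.
    """
    if no_prompt:
        return True
    print("Do you want to continue? (y/n)")
    try:
        answer = read_line()
    except EOFError:
        # No stdin available (for example a non-interactive container):
        # treat it as "no" instead of crashing with a traceback.
        print("No input available; pass --no_prompt to skip this question.")
        return False
    return answer.strip().lower() in ("y", "yes")
```

Declining by default keeps an unattended run from incurring TTS costs without an explicit opt-in.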

[Feature Request] Title Options / Filename Options

This is possibly just my OCD but would it be possible to add in some options to set the title of each file to something other than what the script decides?

Example :

Chapter 1
Book Name - Chapter 1

Would it also be possible to customise the filenames in a similar fashion?

Feature suggestion: overwrite protection

I like how ffmpeg protects previous output insofar as it prompts to overwrite, but gives you the option of passing -n to not overwrite (very helpful in scripts!) or -y to overwrite.

No breaks added between paragraphs when they don't end with a punctuation

When using Edge TTS to read an epub formatted book, the following sorts of paragraph won't be read correctly:

Chapter One

The Chapter Title

This is the first sentence of the chapter.

because it will be read as: "Chapter one the chapter title this is the first sentence of the chapter", as though it's all one sentence with no breaks. This can be especially confusing if there's a heading in a paragraph in the middle of a chapter, something like:

... This is the final sentence of a paragraph.

The Next Section

Here is another sentence.

since it'll be read as "The next section here is another sentence", making it easy to miss that the first half of that sentence was supposed to be a header.

I looked in the source code and the trouble seems to come from epub_book_parser.py, where the second text cleaning step replaces all groups of white space (including newlines) with a single space. So this might affect Azure and OpenAI TTS as well, but I haven't tested it.

At least in the case of Edge TTS, though, it's not sufficient to simply keep a newline in there, because it appears that the edge_tts module automatically replaces newlines with spaces as well. So I think the solution for it needs to include inserting periods where needed.
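
The period-insertion idea can be sketched as follows: before paragraphs are joined with spaces, any paragraph that does not already end in terminal punctuation gets a period. The function name `join_paragraphs` and the set of characters treated as terminal are assumptions for this example and would need tuning per language.

```python
# Characters that already mark the end of a paragraph (illustrative set).
TERMINAL = ".!?:;…\"'”’)]。！？"


def join_paragraphs(paragraphs):
    """Join paragraphs with spaces, adding a period where one is missing."""
    fixed = []
    for paragraph in paragraphs:
        text = " ".join(paragraph.split())  # collapse internal whitespace
        if not text:
            continue
        if text[-1] not in TERMINAL:
            text += "."
        fixed.append(text)
    return " ".join(fixed)


if __name__ == "__main__":
    print(join_paragraphs([
        "Chapter One",
        "The Chapter Title",
        "This is the first sentence of the chapter.",
    ]))
```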

An even better solution for Edge TTS would be to insert longer pauses between such paragraphs, though since Microsoft prevents using SSML, it would require using something like this.

StyleTTS2

https://github.com/yl4579/StyleTTS2 StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Still experimental but looks promising.

[Feature Request] Support for Customizable Audio Formats

Great project, thank you.
Here’s a suggestion: Currently, the default output audio format is an mp3 with a bitrate of 48kbps, which has a relatively poor sound quality. It would be great if you could add support for other audio formats, allowing users to customize it using the parameter “X-Microsoft-OutputFormat”. Thank you once again.

[Bug] File overwrite every iteration

Hi, I suspect I've spotted a bug. It does not affect functionality, since the final file is eventually created successfully, but the file is overwritten on every iteration.

I believe this write operation should be moved outside of the for loop, so that a single write happens once all audio segments have been collected, instead of rewriting the file over and over with one more audio segment each time.
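
To illustrate the suggested change, here is a small sketch under the assumption that each TTS call returns raw audio bytes that can simply be concatenated. `write_chapter_audio` is a made-up name and this is not the project's actual code.

```python
import io


def write_chapter_audio(segments, output_path):
    """Collect all audio segments in memory, then write the file once.

    `segments` is assumed to be an iterable of bytes objects, one per
    TTS request. Returns the number of bytes written.
    """
    buffer = io.BytesIO()
    for segment in segments:
        buffer.write(segment)  # accumulate only, no file I/O in the loop
    data = buffer.getvalue()
    # Single write after the loop, instead of rewriting the file per segment.
    with open(output_path, "wb") as f:
        f.write(data)
    return len(data)
```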

Better chapter title handling

I'm a bit concerned about the possibility that items labeled as h1 h2 h3 could be non section title. However, it's not a big issue, and if there is indeed a problem, we can fix it later.

Originally posted by @p0n1 in #30 (comment)

I ran into a book with single numbers in the h1 tags, so the chapter titles would be just something like 01, 02... I prefer to keep more context in the title so I can tell more about each chapter's audio file.
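
One possible way to keep more context, sketched with a hypothetical helper `build_title`: if the heading contains no letters (only digits or punctuation), append the first few words of the chapter text. The rule and the default of six words are assumptions for illustration.

```python
import re


def build_title(heading, body_text, max_words=6):
    """Return the heading, extended with the first words of the chapter
    when the heading alone is just a number such as "01"."""
    heading = heading.strip()
    # A heading made only of digits and non-word characters carries no context.
    if heading and not re.fullmatch(r"[\d\W]+", heading):
        return heading
    snippet = " ".join(body_text.split()[:max_words])
    return f"{heading} {snippet}".strip()
```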

OpenAI TTS Support

Hi! Awesome project :)

Any plans to support TTS by OpenAI as well?

[Bug] When there's illustration in the epub, the script starts the chapter from after the illustration

Hi, thank you for this great tool! It's been so useful for me. I converted my first book yesterday and realized that some chapters are missing text. It turned out that when there's an illustration in the middle of a chapter in the epub, the tool only starts converting from after the illustration onwards.

Let me know if I can provide any more details.

[Errno 2] No such file or directory when running python3 epub_to_audiobook.py -h

When I run python3 epub_to_audiobook.py -h (either in or out of the venv), I get

/Library/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python: can't open file '/Users//epub_to_audiobook.py': [Errno 2] No such file or directory

I'm super confused about why this would be happening. I've deleted and rebuilt the directory a few times and it still doesn't work. I'm not sure what's going on. Any suggestions? I feel like I'm following the guidance correctly.

This happens with my conversion commands as well.

LocalAI Support

Split from the issue #9 (comment).

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model":"it-riccardo_fasol-x-low.onnx",
  "backend": "piper",
  "input": "Ciao, sono Ettore"
}'

The LocalAI TTS API https://localai.io/features/text-to-audio/ was defined even before OpenAI released its TTS API. I think it's not fully compatible with the OpenAI TTS API https://platform.openai.com/docs/guides/text-to-speech because they use different voices and models.

So changing the base url of OpenAI SDK to LocalAI instance will not work for TTS feature.

LocalAI supports bark, piper and vall-e-x.

If we can support LocalAI, we can support many good local TTS engines at once.
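
For reference, the curl call above could be expressed in Python roughly as follows. This is only a sketch using the standard library: `build_localai_tts_request` is a hypothetical helper, and the endpoint and payload fields are taken from the curl example rather than checked against a running LocalAI instance.

```python
import json
import urllib.request


def build_localai_tts_request(text, model, backend,
                              base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /tts endpoint
    shown in the curl example."""
    payload = {"model": model, "backend": backend, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(request)` and reading the response body, which I assume contains the generated audio.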

Web Interface

Would it be a big ask if we could get (at some point) a web interface, with Readarr/Calibre integration (as well as Audiobook platforms) so we could even configure automated conversions based on Readarr tags or libraries? Bonus points if we could then have it notify a Readarr instance (or even a different audiobook app) that there's a new audio book for it to scan.

Program stuck on y/n Prompt

For some reason it gets stuck after asking for confirmation when running on my desktop. I did the exact same thing on my laptop and it was able to run. I'm not sure what the difference is; no error message is given, only the warnings below.

$ python3 main.py --tts edge --voice_name "en-US-RogerNeural" "C:\Users[Username]\Downloads\Min-Maxing My TRPG Build in Another World_ Volume 8.epub" output_folder
C:\Users[Username]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ebooklib\epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
C:\Users[Username]\epub_to_audiobook\audiobook_generator\tts_providers\base_tts_provider.py:13: RuntimeWarning: coroutine 'EdgeTTSProvider.validate_config' was never awaited
self.validate_config()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
C:\Users[Username]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\bs4\builder\__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-03-06 19:44:28 [INFO] Chapters count: 23.
2024-03-06 19:44:28 [INFO] Converting chapters from 1 to 23.
2024-03-06 19:44:28 [INFO] \u2728 Total characters in selected book: 609648 \u2728
Estimate book voiceover would cost you roughly: $0.00

Do you want to continue? (y/n)

Error trying to convert epub to audio: "cannot execute binary file"

This is my first time really messing with something like this, so it's almost definitely my fault something's up. I followed the Windows step-by-step guide, and was able to figure everything out until I got to this point:

$ "C:\Users*NAME*\Downloads*BOOK*.epub" "C:\Users*NAME*\Downloads*EMPTY FOLDER*" --tts azure --voice_name en-US-JennyNeural --language en-US
bash: C:\Users*NAME*\Downloads*BOOK*.epub: cannot execute binary file: Exec format error
(venv)

(all text in bold is stuff I changed to share this)
I'm on Windows 10, using Python 3.12.1 and latest version of Git

Error when parsing book with nested chapters

I get an error when trying to parse this book: https://www.kosmas.cz/knihy/257693/ostre-stribro/

/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  warnings.warn(

Could it be connected to nested chapters?

Azure region info

Can you explain a little more how the Azure region code is obtained? This isn't clear at all.

I can not convert using OpenAI + Docker

Hello, and thank you for this great tool!! 🙌

I am trying to convert an EPUB to audiobook running the following command:

docker run --rm -v ./:/app -e OPENAI_API_KEY=my-openai-key ghcr.io/p0n1/epub_to_audiobook my_ebook.epub audiobook_output --tts openai

But I am getting this error:

/usr/local/lib/python3.11/site-packages/ebooklib/epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
warnings.warn('In the future version we will turn default option ignore_ncx to True.')
/usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(
2024-01-14 19:37:24 [INFO] Chapters count: 14.
2024-01-14 19:37:24 [INFO] Converting chapters from 1 to 14.
2024-01-14 19:37:24 [INFO] ✨ Total characters in selected book: 189325 ✨
Estimate book voiceover would cost you roughly: $2.85
Do you want to continue? (y/n)
Traceback (most recent call last):
File "/app_src/main.py", line 134, in <module>
main()
File "/app_src/main.py", line 130, in main
AudiobookGenerator(config).run()
File "/app_src/audiobook_generator/core/audiobook_generator.py", line 70, in run
confirm_conversion(rough_price)
File "/app_src/audiobook_generator/core/audiobook_generator.py", line 15, in confirm_conversion
answer = input()
^^^^^^^
EOFError: EOF when reading a line

I can't even select Y or N when prompted whether I want to continue. Do you know what I could be doing wrong?

Could we add a --no-prompt option to suppress the prompt (Do you want to continue? (y/n))

Hi there,
Great work here!
I'm working on a project where I would be running this in a pipeline, so no text input is possible. Could we add a --no-prompt option?

2024-01-13 13:34:10 [INFO] Chapters count: 33.
2024-01-13 13:34:10 [INFO] Converting chapters from 1 to 33.
2024-01-13 13:34:10 [INFO] ✨ Total characters in selected book: 554126 ✨
Estimate book voiceover would cost you roughly: $8.88
Do you want to continue? (y/n)

Another minor suggestion: if you pass the --preview option, it still asks you to confirm whether you would like to continue. If you're just previewing, it won't actually do the conversion, so can we skip the prompt, just process and exit?

Thanks! and have a great day.
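
A sketch of what a prompt-skipping confirmation might look like. The name `confirm_conversion` is borrowed from the tracebacks in the issues above, but this body, its parameters and the `--no-prompt` flag it refers to are assumptions based on the request, not the project's implementation.

```python
import sys


def confirm_conversion(no_prompt=False, preview=False, stdin=None, ask=input):
    """Return True if the conversion should proceed.

    no_prompt: hypothetical --no-prompt flag, skips the question.
    preview:   a preview run converts nothing, so no question is needed.
    """
    stdin = sys.stdin if stdin is None else stdin
    if no_prompt or preview:
        return True
    if stdin is None or not stdin.isatty():
        # No interactive terminal (pipeline, container without a TTY):
        # report it clearly instead of failing with an EOFError traceback.
        print("No interactive input available; pass --no-prompt to continue.")
        return False
    try:
        answer = ask("Do you want to continue? (y/n) ")
    except EOFError:
        return False
    return answer.strip().lower() == "y"
```

Checking `isatty()` before calling `input()` is a design choice of this sketch: it turns the "EOF when reading a line" crash into an explicit message telling the user which option to pass.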

Read in the audiobook in a different language

Thanks for putting together this program, it's been working really nicely for me so far and I've got a feature suggestion for you!

My epub was in English, but I wanted the audiobook to be in Danish. Since I'm somewhat familiar with Python, I added an extra translation API call to gpt4 (turbo would likely do as well) and used its output for the speech generation. It's more expensive, but worked really nicely in my testing with OpenAI. I was thinking it'd be nice to have it as a built-in option for people who aren't Python-savvy.

Local TTS support

Amazing work, but it would be even more amazing to have alternatives to the online, paid-only TTS options, such as PiperTTS, CoquiTTS, Bark, etc. Thanks for the hard work and keep it up!

[Feature Request] Small sample of voice

It would be awesome if you had a function to convert a very short epub file with an option to select the different TTS voices. That way we could know what a voice will sound like before converting a full epub.

Or maybe just include a short single-chapter epub in the GitHub repository that can be used as an example.