This tool converts PDF documents into spoken audio files using Google's Text-to-Speech (TTS) API. It supports batch processing, interactive file selection, and resuming interrupted runs.
- Batch Processing: Convert all PDFs in a directory to audio files automatically.
- Interactive Mode: Select specific PDFs to convert via a simple interactive prompt.
- Resume Capability: Pick up processing where you left off, skipping over already processed files.
- Segmentation: Split large PDFs into manageable segments based on chapters or sections.
- Customizable Output: Choose from multiple audio formats and specify metadata like title and author.
- Language Detection: Automatically detect the language of the text to optimize TTS.
- Logging: Detailed logs for monitoring and debugging the processing workflow.
Before you begin, ensure you have the following installed:
- Python 3.6 or higher
- pip (Python package installer)
First, clone the repository:
git clone https://github.com/BAXTOR95/pdf-to-speech.git
cd pdf-to-speech
Then, install the required Python libraries:
pip install -r requirements.txt
Convert all PDFs in the input_files directory to MP3 audio files:
python main.py --all
Selectively convert PDFs by choosing them interactively:
python main.py --interactive
Resume a previous processing session, skipping already processed files:
python main.py --resume
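One way resume tracking like this could work is sketched below. It assumes progress.txt holds one processed filename per line, which this README does not confirm. The helper names load_processed, mark_processed, and pending_files are hypothetical and are not taken from main.py.

```python
from pathlib import Path


def load_processed(progress_file):
    """Return the set of filenames already recorded in the progress file."""
    path = Path(progress_file)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_processed(progress_file, pdf_name):
    """Append one finished filename so a later run can skip it."""
    with open(progress_file, "a", encoding="utf-8") as handle:
        handle.write(pdf_name + "\n")


def pending_files(input_dir, progress_file):
    """List PDFs in input_dir that are not yet recorded as processed."""
    done = load_processed(progress_file)
    return sorted(p.name for p in Path(input_dir).glob("*.pdf") if p.name not in done)
```

Appending after each finished file, rather than writing the list once at the end, means an interrupted run still leaves an accurate record behind.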
Convert specific PDF files:
python main.py file1.pdf file2.pdf file3.pdf
Convert PDFs and segment them by chapters:
python main.py --all --segment-by-chapter
Convert PDFs to a specific audio format (MP3, WAV, or OGG):
python main.py --all -f wav
Specify title and author metadata for the audio files:
python main.py --all --title "Example Title" --author "Author Name"
Override automatic language detection:
python main.py --all --language en
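The options shown above could be declared with argparse roughly as follows. This is a sketch that mirrors only the flags used in this README. The long form --format for -f, the mp3 default, the help strings, and the build_parser name are assumptions, and main.py may define them differently.

```python
import argparse


def build_parser():
    """Build a parser covering the flags shown in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Convert PDF documents to spoken audio.")
    parser.add_argument("files", nargs="*", help="specific PDF files to convert")
    parser.add_argument("--all", action="store_true", help="convert every PDF in input_files")
    parser.add_argument("--interactive", action="store_true", help="choose PDFs from a prompt")
    parser.add_argument("--resume", action="store_true", help="skip files already processed")
    parser.add_argument("--segment-by-chapter", action="store_true", help="split output by chapter")
    # Assumed long option name and default; the README only shows "-f wav".
    parser.add_argument("-f", "--format", choices=["mp3", "wav", "ogg"], default="mp3")
    parser.add_argument("--title", help="title metadata for the audio files")
    parser.add_argument("--author", help="author metadata for the audio files")
    parser.add_argument("--language", help="language code that overrides detection")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```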
- input_files/: Place your PDFs here before processing.
- output_files/: Audio files are saved in this directory after processing.
- progress.txt: Tracks the progress of processed files (used with --resume).
- pdf_reader.py: Contains functions to extract text and metadata from PDFs.
- text_to_speech.py: Handles the conversion of text to speech using Google TTS.
- main.py: The main script to run conversions, handling command-line arguments and processing logic.
You can modify the script to suit your needs:
- Change the chapter_pattern in pdf_reader.py to match different chapter styles.
- Adjust the TTS properties in text_to_speech.py for different voices or speeds.
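As an illustration of the first customization, a chapter pattern could be a regular expression that matches heading lines. The pattern below and the split_by_chapter helper are assumptions made for this sketch; the actual chapter_pattern in pdf_reader.py may look different.

```python
import re

# Hypothetical pattern: a line such as "Chapter 1" or "CHAPTER 12: Title".
chapter_pattern = re.compile(r"^chapter\s+\d+.*$", re.IGNORECASE | re.MULTILINE)


def split_by_chapter(text, pattern=chapter_pattern):
    """Return (heading, body) pairs, one per heading matched in text.

    Text that appears before the first heading is not included.
    """
    matches = list(pattern.finditer(text))
    segments = []
    for index, match in enumerate(matches):
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        segments.append((match.group().strip(), text[match.end():end].strip()))
    return segments
```

To handle a book divided into parts instead, a different pattern such as `re.compile(r"^Part\s+[IVX]+.*$", re.MULTILINE)` could be passed as the second argument.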
- Missing PDFs: Ensure all PDFs are in the input_files directory.
- Permission Issues: Make sure you have read and write permissions for the directories.
- Dependency Errors: Run pip install -r requirements.txt to ensure all dependencies are installed.
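The first two checks can be automated before a run. The sketch below is not part of the repository: the function name preflight is hypothetical, and it relies only on the input_files and output_files directory names used in this README.

```python
import os
from pathlib import Path


def preflight(input_dir="input_files", output_dir="output_files"):
    """Return a list of readable problem descriptions; empty means none found."""
    problems = []
    in_path, out_path = Path(input_dir), Path(output_dir)
    if not in_path.is_dir():
        problems.append(f"missing input directory: {in_path}")
    elif not os.access(in_path, os.R_OK):
        problems.append(f"no read permission: {in_path}")
    elif not any(in_path.glob("*.pdf")):
        problems.append(f"no PDFs found in: {in_path}")
    # The output directory is only checked if it already exists.
    if out_path.is_dir() and not os.access(out_path, os.W_OK):
        problems.append(f"no write permission: {out_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```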
Feel free to fork the repository, make your changes, and create a pull request with your improvements.
- Brian Arriaga - Initial work - BAXTOR95