PDF to Speech Converter

This tool converts PDF documents into spoken audio files using Google's Text-to-Speech (TTS) API. It supports batch processing, interactive file selection, and resuming an interrupted run.

Features

  • Batch Processing: Convert all PDFs in a directory to audio files automatically.
  • Interactive Mode: Select specific PDFs to convert via a simple interactive prompt.
  • Resume Capability: Pick up processing where you left off, skipping over already processed files.
  • Segmentation: Split large PDFs into manageable segments based on chapters or sections.
  • Customizable Output: Choose from multiple audio formats and specify metadata like title and author.
  • Language Detection: Automatically detect the language of the text to select the TTS language.
  • Logging: Detailed logs for monitoring and debugging the processing workflow.

Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.6 or higher
  • pip (the Python package installer)

Installation

First, clone the repository:

git clone https://github.com/BAXTOR95/pdf-to-speech.git
cd pdf-to-speech

Then, install the required Python libraries:

pip install -r requirements.txt

Usage

Basic Conversion

Convert all PDFs in the input_files directory to MP3 audio files:

python main.py --all

Interactive Mode

Selectively convert PDFs by choosing them interactively:

python main.py --interactive
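
As a rough illustration of what an interactive prompt has to do with the user's answer, here is a minimal sketch. The helper name parse_selection and the "numbers separated by commas or spaces" input style are assumptions made for this example; the actual prompt in main.py may work differently.

```python
from typing import List


def parse_selection(raw: str, names: List[str]) -> List[str]:
    """Map a reply such as "1, 3" onto the listed file names.

    Hypothetical helper: numbers are 1-based, and tokens that are not
    valid indexes are ignored rather than raising an error.
    """
    chosen: List[str] = []
    for token in raw.replace(",", " ").split():
        if not token.isdecimal():
            continue  # skip anything that is not a plain number
        index = int(token) - 1
        if 0 <= index < len(names) and names[index] not in chosen:
            chosen.append(names[index])
    return chosen


if __name__ == "__main__":
    pdfs = ["file1.pdf", "file2.pdf", "file3.pdf"]
    for number, name in enumerate(pdfs, start=1):
        print(f"{number}. {name}")
    print(parse_selection("1, 3", pdfs))
```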

Resume Processing

Resume a previous processing session, skipping already processed files:

python main.py --resume

Convert Specific Files

Convert specific PDF files:

python main.py file1.pdf file2.pdf file3.pdf

Segment by Chapters

Convert PDFs and segment them by chapters:

python main.py --all --segment-by-chapter

Specify Output Format

Convert PDFs to a specific audio format (MP3, WAV, or OGG):

python main.py --all -f wav

Set Metadata

Specify title and author metadata for the audio files:

python main.py --all --title "Example Title" --author "Author Name"

Set Language

Override automatic language detection:

python main.py --all --language en

Directory Structure

  • input_files/: Place your PDFs here before processing.
  • output_files/: Audio files are saved in this directory after processing.
  • progress.txt: Tracks the progress of processed files (used with --resume).
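
The resume behaviour described above can be pictured with a small sketch. It assumes progress.txt holds one processed file name per line, which is a guess about the format; the function names are hypothetical and not taken from the project.

```python
from pathlib import Path
from typing import List, Set


def load_processed(progress_file: Path) -> Set[str]:
    """Return the file names already recorded (assumed: one name per line)."""
    if not progress_file.exists():
        return set()
    lines = progress_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_processed(progress_file: Path, pdf_name: str) -> None:
    """Append a finished file name so a later run can skip it."""
    with progress_file.open("a", encoding="utf-8") as handle:
        handle.write(pdf_name + "\n")


def pending_pdfs(input_dir: Path, progress_file: Path) -> List[Path]:
    """List PDFs in input_dir that are not yet recorded as processed."""
    done = load_processed(progress_file)
    return sorted(p for p in input_dir.glob("*.pdf") if p.name not in done)
```

Appending after each finished file, rather than writing the list once at the end, means an interrupted run loses at most the file that was in progress.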

Modules and Functions

  • pdf_reader.py: Contains functions to extract text and metadata from PDFs.
  • text_to_speech.py: Handles the conversion of text to speech using Google TTS.
  • main.py: The main script to run conversions, handling command-line arguments and processing logic.
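
For orientation, the command-line options shown in the Usage section could be declared with argparse roughly as follows. This is a sketch, not the parser from main.py: the long option name --format, the help texts, and the MP3 default are assumptions, and only the flags themselves come from the examples above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the Usage section."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Convert PDFs to spoken audio."
    )
    parser.add_argument("files", nargs="*", help="specific PDF files to convert")
    parser.add_argument("--all", action="store_true", help="convert every PDF in input_files")
    parser.add_argument("--interactive", action="store_true", help="choose PDFs from a prompt")
    parser.add_argument("--resume", action="store_true", help="skip already processed files")
    parser.add_argument("--segment-by-chapter", action="store_true", help="split output by chapter")
    # "--format" is an assumed long name; the README only shows "-f".
    parser.add_argument("-f", "--format", choices=["mp3", "wav", "ogg"], default="mp3")
    parser.add_argument("--title", help="title metadata for the audio files")
    parser.add_argument("--author", help="author metadata for the audio files")
    parser.add_argument("--language", help="language code that overrides detection")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```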

Customization

You can modify the script to suit your needs:

  • Change the chapter_pattern in pdf_reader.py to match different chapter styles.
  • Adjust the TTS properties in text_to_speech.py for different voices or speeds.
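
To give a feel for what changing a chapter pattern involves, here is a minimal sketch of regex-based segmentation. The pattern and the split_by_chapter function are illustrative assumptions; the real chapter_pattern in pdf_reader.py may look different.

```python
import re
from typing import List, Tuple

# Assumed pattern: a line that starts with "Chapter" followed by a number.
CHAPTER_PATTERN = re.compile(r"^[ \t]*Chapter\s+\d+\b.*$", re.IGNORECASE | re.MULTILINE)


def split_by_chapter(text: str) -> List[Tuple[str, str]]:
    """Split text into (heading, body) pairs at each chapter heading."""
    matches = list(CHAPTER_PATTERN.finditer(text))
    if not matches:
        return [("Full text", text.strip())]
    segments: List[Tuple[str, str]] = []
    front_matter = text[: matches[0].start()].strip()
    if front_matter:
        segments.append(("Front matter", front_matter))
    for position, match in enumerate(matches):
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[position + 1].start() if position + 1 < len(matches) else len(text)
        segments.append((match.group(0).strip(), text[match.end():end].strip()))
    return segments
```

To handle headings such as "Part 1" or "Section 2.3", only the pattern needs to change; the splitting logic stays the same.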

Troubleshooting

  • Missing PDFs: Ensure all PDFs are in the input_files directory.
  • Permission Issues: Make sure you have read and write permissions for the directories.
  • Dependency Errors: Run pip install -r requirements.txt to ensure all dependencies are installed.
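
The first two checks can also be scripted. The sketch below is a hypothetical pre-flight helper, not part of the project; it only assumes the input_files and output_files directory names described earlier.

```python
import os
from pathlib import Path
from typing import List


def preflight(input_dir: Path, output_dir: Path) -> List[str]:
    """Return human-readable problems found before a conversion run."""
    problems: List[str] = []
    if not input_dir.is_dir():
        problems.append(f"missing input directory: {input_dir}")
    elif not os.access(input_dir, os.R_OK):
        problems.append(f"no read permission for {input_dir}")
    elif not any(input_dir.glob("*.pdf")):
        problems.append(f"no PDFs found in {input_dir}")
    if output_dir.exists() and not os.access(output_dir, os.W_OK):
        problems.append(f"no write permission for {output_dir}")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path("input_files"), Path("output_files")):
        print("Problem:", problem)
```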

Contributing

Feel free to fork the repository, make your changes, and create a pull request with your improvements.

Authors

  • Brian Arriaga - Initial work - BAXTOR95
