Transcribe video and add subtitles using Whisper AI.
- Split videos longer than 20 minutes into segments.
- Extract audio from video files.
- Transcribe audio using Whisper AI.
- Generate and burn subtitles into videos.
- Handle various video formats, including `.mkv` conversion to `.mp4`.
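The splitting step above can be sketched as a small helper that computes chunk boundaries suitable for FFmpeg's `-ss`/`-t` options. This is an illustrative sketch, not the package's actual implementation; the function name and the 20-minute default are assumptions:

```python
# Illustrative sketch only: compute (start, length) pairs so that a long
# video can be cut into segments of at most `max_len_s` seconds each.
def segment_bounds(duration_s: float, max_len_s: float = 20 * 60) -> list[tuple[float, float]]:
    bounds = []
    start = 0.0
    while start < duration_s:
        length = min(max_len_s, duration_s - start)
        bounds.append((start, length))
        start += length
    return bounds
```

Each `(start, length)` pair would then map onto one `ffmpeg -ss START -t LENGTH ...` invocation.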
See `requirements.txt` for a list of dependencies:

```
torch
torch-audiomentations
torch-pitch-shift
torchaudio
torchmetrics
torchvision
openai-whisper
colorama
ffutils
ffmpeg
```
Note: Additionally, you'll need to install the following external dependencies:

- FFmpeg: Download and install FFmpeg separately. After downloading, extract the contents of the archive and add the FFmpeg binaries to your system PATH.
- CUDA Toolkit (optional, for users with dedicated GPUs): If you have a dedicated NVIDIA GPU and wish to enable GPU acceleration, install the CUDA Toolkit and follow the installation instructions provided by NVIDIA.
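Since FFmpeg is located through your system PATH, a quick way to sanity-check the setup before running the tool is to look the binary up with the standard library. This is a generic check, not part of this package:

```python
import shutil

def ffmpeg_on_path() -> bool:
    """Return True if an `ffmpeg` executable is reachable on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which("ffmpeg") is not None
```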
It's recommended to use a virtual environment to manage dependencies for this project. Follow these steps to create and activate a virtual environment:
- Install `virtualenv` if you haven't already:

  ```
  pip install virtualenv
  ```

- Navigate to the project directory:

  ```
  cd /path/to/your/directory
  ```

- Create a virtual environment:

  ```
  python -m venv my_env
  ```
- Activate the virtual environment:
  - On Windows:

    ```
    my_env\Scripts\activate
    ```

  - On macOS and Linux:

    ```
    source my_env/bin/activate
    ```

Once activated, you can install the project dependencies within the virtual environment without affecting your system-wide Python installation.
To install the package, clone the repository and install it using pip:

```
git clone https://github.com/jeromearellano/transcribe.git
cd transcribe
pip install . --extra-index-url https://download.pytorch.org/whl/cu118
```
To use the CLI tool, run the following command:

```
transcribe -i /path/to/video.mp4 --model small
```

- `-i, --input`: Path to the input video file (required).
- `--model`: Whisper model size to use (`tiny`, `base`, `small`, `medium`, `large`). Default is `small`.
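Under the hood, subtitle generation has to turn Whisper's per-segment timestamps (plain seconds) into SRT time codes. A minimal sketch of that conversion, with a hypothetical helper name not taken from this package:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT time code: HH:MM:SS,mmm."""
    # SRT uses a comma (not a period) before the milliseconds field.
    ms = round(seconds * 1000)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"
```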
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
- OpenAI Whisper for the transcription model.
- FFmpeg for video processing.
- Colorama for colored terminal output.
- Torch for the deep learning framework.
- ffutils for FFmpeg utilities.