lukerbs / forcealign

ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word- or phoneme-level text alignments of audio, with the start and end time of each word or phoneme within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.

License: MIT License

ForceAlign

ForceAlign is a Python library for forced alignment of English text to English audio. You can use this library to get word- or phoneme-level text alignments to English audio. In short, forced alignment is the process of identifying the specific time at which each word was spoken within an audio recording. ForceAlign supports forced alignment for the .mp3 and .wav audio file formats.

For phoneme-level text alignments, ForceAlign currently supports only the ARPABET phonetic transcription encoding.

ForceAlign uses PyTorch's WAV2VEC2 pretrained model for acoustic feature extraction and can be run on both CPU and CUDA GPU devices.
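How ForceAlign itself selects a device is not described here. As a minimal sketch, assuming you only want to know whether PyTorch can see a CUDA GPU on your machine, you could run a check like the one below. The helper name `pick_device` is hypothetical and is not part of ForceAlign.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of the ForceAlign API.
    """
    try:
        import torch  # Imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("PyTorch would run on:", pick_device())
```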

Features

  • Fast and accurate word- and phoneme-level forced alignment of text to audio.
  • Optimized for both CPU and GPU.
  • OS independent! Use ForceAlign on Mac, Windows, and Linux.

Installation and Dependencies

  1. Install ForceAlign with pip
    • pip3 install forcealign
  2. Install ffmpeg
    • Mac: brew install ffmpeg
    • Linux: sudo apt install ffmpeg
    • Windows: Install from ffmpeg.org
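To confirm that step 2 worked, you can check whether an `ffmpeg` executable is visible on your PATH. The snippet below is a small sketch using only the Python standard library; the function name `ffmpeg_available` is an assumption made for this example and is not provided by ForceAlign.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it using the step for your OS")
```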

Usage Examples

To use ForceAlign, create a ForceAlign instance with your audio file and its corresponding text transcript.

Example 1: Getting Word-Level Text Alignments

from forcealign import ForceAlign

# Text spoken in the audio file (placeholder; replace with your own transcript)
transcript = "The quick brown fox jumps over the lazy dog."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Show predicted word-level alignments
for word in words:
    print(word.word)  # The word spoken in audio at associated time
    print(word.time_start)  # Time (seconds) the word starts in speech.mp3
    print(word.time_end)  # Time (seconds) the word ends in speech.mp3
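Once you have word-level alignments, simple queries over the timestamps become possible. The sketch below is an illustration only: `Word` is a hypothetical stand-in that mirrors the three attributes used in Example 1 (`word`, `time_start`, `time_end`), the sample timings are invented, and `word_at` and `speaking_rate` are helper names introduced here, not part of ForceAlign.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Word:
    # Hypothetical stand-in mirroring the attributes shown in Example 1.
    word: str
    time_start: float  # seconds
    time_end: float    # seconds


def word_at(words: Sequence[Word], t: float) -> Optional[Word]:
    """Return the first word whose [time_start, time_end) span contains t."""
    for w in words:
        if w.time_start <= t < w.time_end:
            return w
    return None  # t falls in silence or outside the recording


def speaking_rate(words: Sequence[Word]) -> float:
    """Words per minute, measured from the first start to the last end."""
    if not words:
        return 0.0
    span = words[-1].time_end - words[0].time_start
    return 60.0 * len(words) / span if span > 0 else 0.0


# Invented sample timings, for illustration only.
sample = [Word("hello", 0.50, 0.90), Word("world", 1.00, 1.50)]

if __name__ == "__main__":
    print(word_at(sample, 0.6))
    print(speaking_rate(sample))
```

With real output you would pass the list returned by `align.inference()` in place of `sample`, assuming its items expose the same three attributes.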

Example 2: Getting Phoneme-Level Text Alignments

from forcealign import ForceAlign

# Text spoken in the audio file (placeholder; replace with your own transcript)
transcript = "The quick brown fox jumps over the lazy dog."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Accessing predicted phoneme-level alignments
for word in words:
    print(word.word)
    for phoneme in word.phonemes:
        print(phoneme.phoneme)  # ARPABET phoneme spoken in audio at associated time
        print(phoneme.time_start)  # Time (seconds) the phoneme starts in speech.mp3
        print(phoneme.time_end)  # Time (seconds) the phoneme ends in speech.mp3
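For downstream use it is often convenient to flatten the nested word and phoneme structure into a single timeline. The sketch below assumes stand-in `Word` and `Phoneme` classes that mirror the attributes used in Example 2, with invented timings. ARPABET vowels are commonly written with a trailing stress digit (0, 1, or 2), as in "AH0"; whether ForceAlign includes these digits in its output is an assumption here, so the code strips them only if they are present.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Phoneme:
    # Hypothetical stand-in mirroring the attributes shown in Example 2.
    phoneme: str
    time_start: float
    time_end: float


@dataclass
class Word:
    word: str
    phonemes: List[Phoneme] = field(default_factory=list)


def phoneme_timeline(words: Sequence[Word]) -> List[Tuple[str, float, float]]:
    """Flatten words into (symbol, start, end) tuples in spoken order."""
    timeline = []
    for w in words:
        for p in w.phonemes:
            # Drop a trailing ARPABET stress digit if one is present.
            symbol = p.phoneme.rstrip("012")
            timeline.append((symbol, p.time_start, p.time_end))
    return timeline


# Invented sample timings, for illustration only.
sample = [
    Word("hello", [
        Phoneme("HH", 0.50, 0.58),
        Phoneme("AH0", 0.58, 0.66),
        Phoneme("L", 0.66, 0.75),
        Phoneme("OW1", 0.75, 0.90),
    ]),
]

if __name__ == "__main__":
    for symbol, start, end in phoneme_timeline(sample):
        print(f"{symbol:>3} {start:.2f} {end:.2f}")
```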

Example 3: Reviewing Word-Level Alignments

You can use the review_alignment() method to check the quality of your alignment in real time. The review_alignment() method plays the audio file and prints each word at its predicted time. This is useful for a quick, informal check of the accuracy of the word-level alignment predictions.

from forcealign import ForceAlign

# Text spoken in the audio file (placeholder; replace with your own transcript)
transcript = "The quick brown fox jumps over the lazy dog."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Plays audio and prints each word in real time at its predicted alignment time
align.review_alignment()
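To make the idea of a timed review concrete, here is a rough sketch of printing words at their predicted start times. It is not ForceAlign's implementation of review_alignment(): it plays no audio, the `Word` class is a hypothetical stand-in, and the `replay` function with its `sleep`, `emit`, and `speed` parameters is invented for this illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Word:
    # Hypothetical stand-in mirroring the attributes shown in Example 1.
    word: str
    time_start: float
    time_end: float


def replay(
    words: Sequence[Word],
    sleep: Callable[[float], None] = time.sleep,
    emit: Callable[[str], None] = print,
    speed: float = 1.0,
) -> None:
    """Emit each word after waiting until its predicted start time.

    `speed` above 1.0 shortens the waits. No audio is played.
    """
    position = 0.0  # Position in the recording, in seconds
    for w in words:
        # Wait for the gap between the current position and the word start.
        sleep(max(0.0, (w.time_start - position) / speed))
        position = max(position, w.time_start)
        emit(w.word)


# Invented sample timings, for illustration only.
sample = [Word("hello", 0.50, 0.90), Word("world", 1.00, 1.50)]

if __name__ == "__main__":
    replay(sample)
```

Passing `sleep` and `emit` as parameters keeps the timing logic easy to check without actually waiting.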

Use Cases

Forced alignment can be useful for generating subtitles for video, and for generating automated lip-syncing of animated characters with phoneme-level forced alignments.
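As an illustration of the subtitle use case, the sketch below groups word-level alignments into cues in the SubRip (.srt) format. The `Word` class is a hypothetical stand-in for ForceAlign's word objects, the sample timings are invented, and the helper names `srt_timestamp` and `words_to_srt` as well as the `max_words` grouping rule are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Word:
    # Hypothetical stand-in mirroring the attributes shown in Example 1.
    word: str
    time_start: float
    time_end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Sequence[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of up to max_words each."""
    cues: List[str] = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0].time_start)
        end = srt_timestamp(chunk[-1].time_end)
        text = " ".join(w.word for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)  # Blank line between cues


# Invented sample timings, for illustration only.
sample = [Word("hello", 0.50, 0.90), Word("world", 1.00, 1.50)]

if __name__ == "__main__":
    print(words_to_srt(sample))
```

Grouping by a fixed word count is the simplest rule; splitting on pauses between words or on punctuation in the transcript would usually give more readable subtitles.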

FAQ

1. Does ForceAlign have speech-to-text capabilities? No. This is a feature that I plan to add when I have time.

2. Can ForceAlign be used with both CPU and GPU? Yes. Running with CPU is surprisingly fast, and it will be even faster with GPU.

Acknowledgements

This project is heavily based on a PyTorch demo by Moto Hira: Forced Alignment with Wav2Vec2
