Giter Site home page Giter Site logo

zirumandbigbro / audio-transcriptor-russian- Goto Github PK

View Code? Open in Web Editor NEW
9.0 3.0 1.0 3.8 MB

[Russian] This script will split audio file on silence, transcript it with google recognition and save it in LJSpeech-1.1 dataset manner.

License: MIT License

Python 100.00%
python ljspeech transcriptor speech-to-text russian-language google-cloud audio-transcription

audio-transcriptor-russian-'s Introduction

Audio-transcriptor-russian-

This script will split audio file on silence, transcript it with google recognition and save it in LJSpeech-1.1 dataset manner.

  • This script will create the folder structure in the same manner as LJSpeech-1.1 dataset.
  • This script will splitt the audio files on silenses and send the audio chunks to google recognition service
  • Google will return the recognized text.
  • Text normalization will change integers and text written with latin letters into russian text.
  • Punctuation will set commas in text.
  • This text will be writen to metadata.csv in the same manner as in LJSpeech-1.1 dataset.
  • Audio chunks will be normalized on loudness.
  • Audio chunks will also be saved in the same manner as in LJSpeech-1.1 dataset.
  • See script for further description.

requirements:

pydub

SpeechRecognition

torch --> for normalizer

tqdm --> for normalizer

pytorch_pretrained_bert==0.6.2 --> for punctuation (bert)

pymorphy2==0.8 --> for punctuation (bert)

to work with mp3-files you will need to install ffmpeg and put it to PATH. https://github.com/FFmpeg/FFmpeg Windows installation instruction here http://blog.gregzaal.com/how-to-install-ffmpeg-on-windows/

Text normalization from https://github.com/snakers4/russian_stt_text_normalization is implemented here.

Punctuation (comma placement) from https://github.com/vlomme/Bert-Russian-punctuation dont forget to download pretrained model (see "how to use").

how to use

  • This script must be in the same folder with audio files that should be transcripted
  • The names of the audio files must be as follows: 01.mp3, 02.mp3, ..., 99.mp3 (or) 01.wav, 02.wav, ..., 99.wav
  • download pretrained bert model https://drive.google.com/file/d/190dLqhRjqgNJLKBqz0OxQ3TzxSm5Qbfx/view and place it in folder bert
  • start the audio_transcribe.py in IDLE

how to optimize for your audio

  • if you want to transcribe .wav-files instead of mp3 change

source_format = 'mp3' --> source_format = 'wav'

  • if you want to remove chunks with strange symbol rate (text symbols per second audio), set

symbols_gate = True

and set

symbol_rate_min and symbol_rate_max

you can use it to separate one speaker speaking with special speaking speed, and/or poor transcriptions. You will also get additional information about rate and audio length in metadata.csv.

  • if you want to remove chunks without speech, set

additional_clean = True

use it if you have audio with bad quality and chunks without speach.

  • assign an Speaker_id

Speaker_id = 'R001_'

  • change silence duration (in ms) for cut

min_silence_len = 500

  • change silence value (in dBFS)

silence_thresh = -36

  • change silense duration (in ms) to keep in audio

keep_silence = 100

  • change audio framerate (in Hz) in result audio

frame_rate = 16000

  • change length (in ms) of output audio file

target_length = 1000

  • if you want to set commas in yout text:

punctuation = True

  • if you want to use other language then russian:

change language in line

rec = r.recognize_google(audio_listened, language="ru-RU").lower()

as discribed here https://cloud.google.com/speech-to-text/docs/languages

replace line

rec = norm.norm_text(rec)

and set

punctuation = False

I wish you success

audio-transcriptor-russian-'s People

Contributors

zirumandbigbro avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

sfominx

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.