ishine's Projects
Extract hard-coded (burned-in) video subtitles with OCR
Production First and Production Ready End-to-End Text-to-Speech Toolkit
WFST-based language model decoder
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
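Fine-tune the Whisper speech recognition model, with support for training on data without timestamps, data with timestamps, and non-speech data. Accelerated inference, with support for Web, Windows desktop, and Android deployment.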
[WIP] Scripts for fine-tuning a Whisper model
Zero-shot Punctuation Insertion using Whisper
Apple PodCast Transcription with OpenAI's Whisper
Port of OpenAI's Whisper model in C/C++
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
WhisperX: Automatic Speech Recognition with Accurate Word-level Timestamps.
Official TensorFlow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”
Massively multilingual pronunciation mining
Faster, more effective Chinese new-word discovery
Word Discovery in Visually Grounded, Self-Supervised Speech Models
Quantized word vectors that take 8x-16x less space than regular word vectors
Generate a Chinese lexicon from Internet data
HTML player for W3C Audiobooks
Implementation of our paper "A Hybrid Deep Feature Selection Framework for Emotion Recognition from Human Speeches" [Multimedia Tools and Applications, Springer]
Mocap Dataset of “Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation”
Web based transcription tool
wu-manber-algorithm-for-chinese