ishine's Projects
A PyTorch implementation of target speaker extraction.
An implementation of the paper "End-to-end Speech Translation via Cross-modal Progressive Training" (Interspeech 2021)
Multi-dimensional arrays with broadcasting and lazy computing
A multi-speaker video-to-speech network
A chatbot about 血色衣冠, built on Rasa; you can talk to the bot by voice
Extremely fast non-cryptographic hash algorithm
Y-vector: Multiscale Waveform Encoder for Speaker Embedding
YaRN: Efficient Context Window Extension of Large Language Models
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)
A series of large language models trained from scratch by developers @01-ai
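优客服 (Youkefu) is a multi-channel customer support platform (intelligent customer service system) and telephone sales platform (telesales system), with access channels including WebIM, WeChat, phone, email, and SMS. http://www.youkefu.cn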
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
An audiobook crawler for 恋听网, built with the Scrapy framework
A Python API for retrieving the transcript/subtitles of a given YouTube video. It also works for automatically generated subtitles and requires neither an API key nor a headless browser, unlike Selenium-based solutions.
A youtube-dl fork with additional features and fixes
Open tools and data for cloudless automatic speech recognition
Tacotron based speech synthesizer
A BERT-based Chinese Text Encoder Enhanced by N-gram Representations
Zero -- A neural machine translation system
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022
VQ-VAE for Acoustic Unit Discovery and Voice Conversion
Pushing the Limits of Zero-shot End-to-End Speech Translation
Zero-Shot Emotion Style Transfer
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering