This repository tracks papers and reports on discrete speech representation learning and speech tokenization for speech language modeling.
- [arXiv][demo][code] RepCodec: A Speech Representation Codec for Speech Tokenization
- [arXiv][demo][code] SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
- [arXiv][demo][code] HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
- [ASRU][arXiv] W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
- [TASLP][arXiv][demo] SoundStream: An End-to-End Neural Audio Codec
- [TASLP][arXiv][code] HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
- [arXiv] Variable-rate discrete representation learning