CanTTS

This is the description page of the CanTTS corpus, a single-speaker Cantonese speech dataset intended for Text-to-Speech (TTS).

To request access to CanTTS, please contact [email protected].

CanTTS consists of about 12,000 utterances, divided into three subsets. The corpus is described below.

Examples

Sample audio files and transcripts are provided in examples.

attribute	description
speaker	native; female; aged 20+; of neutral emotion
content domain	reading of books and news
number of utterances	12010
audio format	16-bit PCM WAV
sampling rate	24 kHz
total duration	20.17 hours
average utterance duration	6.05 s
annotation	character-level transcriptions with punctuation
average sentence length	20 chars/words
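
As a quick sanity check on downloaded audio, the format listed above (16-bit PCM WAV at 24 kHz) can be verified with Python's standard `wave` module. This is a minimal sketch, not part of the corpus tooling: the function names `wav_info` and `matches_cantts_format` are hypothetical, and the channel count is deliberately not checked because the table does not state it.

```python
import wave

EXPECTED_SAMPLE_RATE = 24000  # 24 kHz, from the table above
EXPECTED_SAMPLE_WIDTH = 2     # 16-bit PCM = 2 bytes per sample


def wav_info(path):
    """Return (sample_rate, sample_width_bytes, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    return rate, width, duration


def matches_cantts_format(path):
    """Check sampling rate and bit depth against the values listed above."""
    rate, width, _ = wav_info(path)
    return rate == EXPECTED_SAMPLE_RATE and width == EXPECTED_SAMPLE_WIDTH
```

Calling `matches_cantts_format` on each file before training catches resampled or re-encoded copies early; `wav_info` also returns the duration, which can be summed to compare against the total reported above.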

CanTTS roughly divides the sentences into three subsets according to sentence type.

FN contains sentences used in daily life and to some extent reflects the distribution of different sentence types. Most sentences in FN are statements, but there are also some questions.
FQ contains normal questions whose interrogative intent is conveyed by their linguistic content.
FU consists of declarative questions, i.e., questions whose linguistic content is the same as that of a statement. The speaker utters these declarative questions with rising intonation.

subset	content	utterances	duration (hours)
FN	daily used sentences, mostly statements	10010	18.37
FQ	normal questions without significant rising intonation	1000	0.92
FU	declarative questions expressed with rising intonation	1000	0.88
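
The subset figures can be cross-checked against the corpus totals with a few lines of arithmetic. The sketch below only uses the numbers copied from the two tables on this page; the variable names are illustrative.

```python
# (utterances, duration in hours) per subset, copied from the table above
SUBSETS = {
    "FN": (10010, 18.37),
    "FQ": (1000, 0.92),
    "FU": (1000, 0.88),
}

total_utterances = sum(n for n, _ in SUBSETS.values())
total_hours = sum(h for _, h in SUBSETS.values())

# Mean utterance duration in seconds over the whole corpus
mean_seconds = total_hours * 3600 / total_utterances

# Mean utterance duration in seconds for each subset
subset_mean_seconds = {
    name: hours * 3600 / n for name, (n, hours) in SUBSETS.items()
}

print(total_utterances)        # 12010
print(round(total_hours, 2))   # 20.17
print(round(mean_seconds, 2))  # 6.05
```

The totals agree with the 12010 utterances, 20.17 hours, and 6.05 s average reported in the attribute table. The per-subset means in `subset_mean_seconds` show that the question subsets (FQ, FU) contain noticeably shorter utterances than FN.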
