This is the description page of the CanTTS corpus, a single-speaker Cantonese speech dataset intended for Text-to-Speech (TTS).
To request access to CanTTS, please contact [email protected].
CanTTS consists of about 12,000 utterances, divided into three subsets. The corpus is described below.
Sample audio files and transcripts are provided in examples.
attribute | description |
---|---|
speaker | native; female; aged 20+; of neutral emotion |
content domain | reading of books and news |
number of utterances | 12010 |
audio format | 16-bit PCM WAV |
sampling rate | 24 kHz |
total duration | 20.17 hours |
average utterance duration | 6.05 s |
annotation | character-level transcriptions with punctuation |
average sentence length | 20 chars/words |
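
The audio format listed above (16-bit PCM WAV at 24 kHz) can be checked on a downloaded file with Python's standard `wave` module. The following is a minimal sketch, not part of the corpus tooling: the function names and the file path in the usage comment are hypothetical, since the directory layout of CanTTS is not described here.

```python
import wave

# Values taken from the attribute table above.
EXPECTED_SAMPLE_RATE = 24000  # 24 kHz
EXPECTED_SAMPLE_WIDTH = 2     # 16-bit PCM = 2 bytes per sample


def describe_wav(path):
    """Read the header of a WAV file and return its basic properties."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    return {"sample_rate": rate, "sample_width": width, "duration_s": duration}


def matches_spec(info):
    """True if the file has the sample rate and bit depth listed in the table."""
    return (
        info["sample_rate"] == EXPECTED_SAMPLE_RATE
        and info["sample_width"] == EXPECTED_SAMPLE_WIDTH
    )


# Usage (hypothetical path, adjust to wherever the audio is stored):
# info = describe_wav("examples/sample.wav")
# print(info, matches_spec(info))
```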
CanTTS roughly divides the sentences into 3 subsets according to sentence type.
- FN contains everyday sentences and to some extent reflects the distribution of different sentence types. Most sentences in FN are statements, but there are also some questions.
- FQ contains normal questions whose interrogative intent is conveyed by the linguistic content.
- FU consists of declarative questions, i.e., questions whose linguistic content is the same as that of a statement. The speaker utters these declarative questions with rising intonation.

subset | content | utterances | duration (hours) |
---|---|---|---|
FN | everyday sentences, mostly statements | 10010 | 18.37 |
FQ | normal questions without significant rising intonation | 1000 | 0.92 |
FU | declarative questions expressed with rising intonation | 1000 | 0.88 |
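
As a quick consistency check, the per-subset figures can be summed and compared with the corpus-level totals in the attribute table. The sketch below only re-does the arithmetic from the two tables; the dictionary layout and variable names are illustrative assumptions, not a format shipped with the corpus.

```python
# Figures copied from the subset table above.
subsets = {
    "FN": {"utterances": 10010, "hours": 18.37},
    "FQ": {"utterances": 1000, "hours": 0.92},
    "FU": {"utterances": 1000, "hours": 0.88},
}

total_utterances = sum(s["utterances"] for s in subsets.values())
total_hours = round(sum(s["hours"] for s in subsets.values()), 2)

# Average utterance duration in seconds over the whole corpus.
average_seconds = total_hours * 3600 / total_utterances

print(total_utterances)           # 12010, matching "number of utterances"
print(total_hours)                # 20.17, matching "total duration"
print(round(average_seconds, 2))  # 6.05, matching the average utterance duration

# Per-subset averages, computed the same way from the subset table.
for name, s in subsets.items():
    print(name, round(s["hours"] * 3600 / s["utterances"], 2), "s per utterance")
```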