This directory contains code to import and evaluate the Speaker Identification and Verification models pretrained on the VoxCeleb dataset as described in the paper:
> A. Nagrani, J. S. Chung, A. Zisserman, "VoxCeleb: a large-scale speaker identification dataset", INTERSPEECH, 2017.
To use the models, first install the MatConvNet framework. Instructions can be found here.

The easiest way to use the code in this repo is with the `vl_contrib` package manager. To install, follow these steps:
- Install and compile MatConvNet by following the instructions here.
- Run:

  ```matlab
  vl_contrib install VGGVox
  vl_contrib setup VGGVox
  ```
- You can then run the demo scripts provided to import and test the models. There are two short demo scripts, one for identification and one for verification, which demonstrate how to evaluate the models directly on `.wav` audio files:

  ```matlab
  demo_vggvox_identif
  demo_vggvox_verif
  ```
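For orientation, the overall evaluation flow the demos follow can be sketched as below. This is a minimal sketch only: the model filename, variable names, and the spectrogram helper are illustrative assumptions, not the repo's actual API — refer to the demo scripts for the real pipeline.

```matlab
% Sketch of evaluating an imported model on a .wav file (MatConvNet DagNN).
% Filenames and the spectrogram helper below are hypothetical placeholders.
net = dagnn.DagNN.loadobj(load('vggvox_ident_net.mat').net) ;  % load pretrained model
net.mode = 'test' ;                                            % disable dropout/batch-norm updates

[audio, fs] = audioread('example.wav') ;   % read a .wav utterance
spec = computeSpectrogram(audio, fs) ;     % hypothetical spectrogram front-end

net.eval({'data', single(spec)}) ;         % forward pass through the network
scores = net.vars(net.getVarIndex('prediction')).value ;
[~, speakerId] = max(scores) ;             % index of the predicted speaker
```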
These models have been pretrained on the VoxCeleb dataset. VoxCeleb contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube. The dataset is gender balanced, with 55% of the speakers male. The speakers span a wide range of different ethnicities, accents, professions and ages. The dataset can be downloaded directly from here.
If you use this code, then please cite:

```
@InProceedings{Nagrani17,
  author    = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
  title     = "VoxCeleb: a large-scale speaker identification dataset",
  booktitle = "INTERSPEECH",
  year      = "2017",
}
```