Scripts:
- `train.py`: trains the segmentation model. The resulting model is written to the `\models` folder.
- `eval.py`: evaluates the model. Results are written to `\output` and `metrics_result.txt`. (Set `trained_model` and `read_hyperparameters` if you want to use the trained model and new hyperparameters.)
- `infer.py`: processes data. Results are written to the `\output` folder. (Set `trained_model` and `read_hyperparameters` if you want to use the trained model and new hyperparameters.)
- `tune_hyperparameters.py`: optimizes the pipeline hyperparameters `segmentation.threshold` and `clustering.threshold`.
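As a rough illustration of what `tune_hyperparameters.py` does, here is a minimal grid-search sketch over the two thresholds. The `pipeline_score` callable and the function name `tune_thresholds` are hypothetical (the actual script may use a smarter optimizer, e.g. pyannote's built-in one):

```python
from itertools import product

def tune_thresholds(pipeline_score, seg_grid, clu_grid):
    """Hedged sketch of tuning segmentation.threshold and clustering.threshold.

    `pipeline_score(seg_t, clu_t)` is assumed to run the diarization pipeline
    with the given thresholds and return a score to maximize.
    """
    best_score, best_params = None, None
    for seg_t, clu_t in product(seg_grid, clu_grid):
        score = pipeline_score(seg_t, clu_t)
        if best_score is None or score > best_score:
            best_score = score
            best_params = {"segmentation.threshold": seg_t,
                           "clustering.threshold": clu_t}
    return best_score, best_params
```

The winning parameter dictionary would then be written to the `hyperparameters` folder for `infer.py` to pick up.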
Directories:
- `hyperparameters`: hyperparameters (produced by `tune_hyperparameters.py`, used by `infer.py`)
- `models`: the trained model (produced by `train.py`)
- `data_train`: put `.wav` and `.csv` files here for model training
- `data_test`: put `.wav` and `.csv` files here for model testing
- `reference_audio`: put `.wav` files with examples of the therapist's speech here
- `output`: results
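Since `data_train` and `data_test` hold matching `.wav`/`.csv` pairs, collecting them might look like the sketch below. The helper name `paired_files` is hypothetical; the actual pairing logic in `train.py` may differ:

```python
from pathlib import Path

def paired_files(folder="data_train"):
    """Collect (wav, csv) pairs that share a file stem in the given folder.

    Hedged sketch based on the documented data layout; files without a
    matching .csv annotation are skipped.
    """
    pairs = []
    for wav in sorted(Path(folder).glob("*.wav")):
        csv = wav.with_suffix(".csv")
        if csv.exists():
            pairs.append((wav, csv))
    return pairs
```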
Installation:
- Install Python 3.8+ (though it might work with Python 3.7).
- Install the libraries. There are two ways to install them:
  - using `requirements.txt`: download the code, open a terminal in the `\therapist` directory, and run:

    pip install -r requirements.txt

  - manually:

    pip install torch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0
    pip install pyannote.audio
    pip install huggingface_hub

The requirements are driven primarily by the pyannote library.
The official pyannote.audio pipelines are open-source but gated: you first have to accept the user conditions on their respective Hugging Face pages to access the pretrained weights and hyperparameters.
- Register on Hugging Face.
- Visit the speaker-diarization page and accept the terms.
- Visit the segmentation page and accept the terms.
- Go to Settings (User Access Tokens), generate a token, and copy it.
- Insert the token into the `token.txt` file.

You can use the same token on different PCs.
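For reference, loading the token from `token.txt` and handing it to pyannote could look like this. The helper name `load_hf_token` is hypothetical, and the exact way the scripts pass the token may differ:

```python
from pathlib import Path

def load_hf_token(path="token.txt"):
    # Read the Hugging Face access token, stripping stray whitespace/newlines
    # that editors often leave at the end of the file.
    return Path(path).read_text().strip()

# pyannote.audio would then use the token to download the gated weights,
# roughly like this (sketch; requires pyannote.audio installed and terms accepted):
# from pyannote.audio import Pipeline
# pipeline = Pipeline.from_pretrained(
#     "pyannote/speaker-diarization", use_auth_token=load_hf_token())
```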
To get results on new data, type in a terminal:

$ python infer.py --trained_model --read_hyperparameters --no_merge --input_path <DATA_PATH> --path_reference_audio <REFERENCE_PATH>

Meaning of the flags:
- `--trained_model`: use the trained model
- `--read_hyperparameters`: use the optimized hyperparameters
- `--no_merge`: turn off merging of consecutive segments where the same speaker talks continuously
- `--input_path`: path to the data folder (default `\data`)
- `--path_reference_audio`: path to the folder with the therapist's reference audio (default `\reference_audio`)

All flags are optional. For example, `$ python infer.py` uses the default pyannote models with default hyperparameters, and the result is merged by speaker.
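The merging that `--no_merge` disables can be sketched as follows. This is an assumption about how `infer.py` post-processes the diarization output, not its actual implementation:

```python
def merge_segments(segments, gap=0.0):
    """Merge consecutive segments spoken by the same speaker.

    `segments` is a list of (start, end, speaker) tuples sorted by start time.
    Adjacent segments from the same speaker separated by at most `gap`
    seconds are fused into one; everything else is kept as-is.
    """
    merged = []
    for start, end, speaker in segments:
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= gap:
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, speaker)  # extend the last segment
        else:
            merged.append((start, end, speaker))
    return merged
```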