D4AM: A General Denoising Framework for Downstream Acoustic Models

This is the official implementation of D4AM. We will make our repository public if our paper is accepted. The demo page is provided locally in our supplementary materials, and our source code is provided in this anonymous GitHub repository. All the model checkpoints can be found in the drive link below, so users can choose whether to train the fine-tuned models from scratch or use our checkpoints directly to reproduce the table results more precisely. The following sections describe how to run our experiments.

Contents

  • Installation
  • Steps and Usage

Installation

Note that our environment uses Python 3.8.12. To run the D4AM experiments, clone our repository and install the dependencies with pip:

pip install -r requirements.txt
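If you prefer an isolated environment, here is a minimal setup sketch assuming a python3.8 interpreter is available on your PATH (the environment name d4am-env is our own choice, not part of the repository):

    # Create and activate a virtual environment, then install the dependencies.
    python3.8 -m venv d4am-env
    source d4am-env/bin/activate
    pip install -r requirements.txt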

Steps and Usage

1. Specify paths of speech and noise datasets:

Before starting the training process, there are two places where data paths must be set. The first is the speech and noise paths in config/config.yaml, which are used to prepare noisy-clean paired data only. Note that all the noisy utterances for training are generated on the fly. The noise dataset is provided by DNS-Challenge, and the clean utterances come from the training subsets of LibriSpeech (Libri-360/Libri-500). Check the data section in config/config.yaml:

data:
    ...
    speech_path: [../speech_data/LibriSpeech/train-clean-360, ../speech_data/LibriSpeech/train-other-500]
    corruptor:
      noise_path: ../noise_data/DNS_noise_pros
    ...
    

speech_path indicates the subsets used only as clean speech for the regression objective, and noise_path points to the noise dataset used to corrupt the clean utterances. The second place is the wav paths in the manifests under ds/manifests/libri, which are used to prepare both (noisy) speech-text pairs for the classification objective and noisy-clean pairs for the regression objective. For example, ../speech_data/LibriSpeech/train-clean-100 should be changed to your own LibriSpeech root path. Note that dev-clean_wham.csv and test-clean_wham.csv are our own mixed validation sets for observing learning curves; they follow the same format as the original manifests. Users can generate their own validation sets by preparing manifests in the same format.
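If your LibriSpeech root differs from the default, the prefix can be rewritten in every manifest at once. A minimal sketch with GNU sed (NEW_ROOT below is a placeholder; point it at your local copy):

    # Rewrite the LibriSpeech root in all Libri manifests, in place.
    OLD_ROOT="../speech_data/LibriSpeech"
    NEW_ROOT="/data/LibriSpeech"  # placeholder: your own LibriSpeech root
    sed -i "s|${OLD_ROOT}|${NEW_ROOT}|g" ds/manifests/libri/*.csv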

2. Download the checkpoints of SE models and downstream recognizers:

We keep our initial model and the checkpoints of the other fine-tuned models in the drive link. Users can either train the fine-tuned models themselves or directly use our provided checkpoints for inference. All the downstream recognizers described in Section 4.2 can be found there as well. Both archives should be placed under the D4AM directory and extracted with tar zxvf Filename.tar.gz. After extraction, make sure the .pth files of the SE models are under the ckpt folder (e.g. ckpt/INIT.pth) and the downstream recognizers are under the ds folder (e.g. ds/models/conformer).
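A sketch of the extraction step (the archive names below are placeholders; use the actual file names from the drive link):

    # Run from the D4AM directory after downloading the archives.
    tar zxvf SE_checkpoints.tar.gz   # placeholder name: should unpack ckpt/*.pth
    tar zxvf recognizers.tar.gz      # placeholder name: should unpack ds/models/*
    ls ckpt/INIT.pth ds/models/conformer  # quick sanity check of the layout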

3. Train your own fine-tuned models locally:

Most checkpoints are provided in the link mentioned in the previous step, so this step can be skipped if the corresponding SE models are already in ckpt. To train your own model, execute python main.py --task train --method [assigned method] (e.g. python main.py --task train --method D4AM). Note that you need to specify an alpha value if you choose GRID (e.g. python main.py --task train --method GRID --alpha 0.1).
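To prepare several variants in one go, a small sketch that loops over the fine-tuned methods listed in step 5 (treating CLSO, SRPR, and GCLB as trainable methods is our assumption; NOIS and INIT need no training, since they denote the noisy input and the initial model):

    # Train each fine-tuned variant in turn (method list assumed from step 5).
    for METHOD in CLSO SRPR GCLB D4AM; do
        python main.py --task train --method "$METHOD"
    done
    python main.py --task train --method GRID --alpha 0.1  # GRID needs an explicit alpha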

4. Write enhanced results for evaluation:

Once you have prepared the checkpoints of all methods, run bash generate_enhanced_results.sh to generate the corresponding enhanced results in results/ and their manifests in ds/manifests/chime and ds/manifests/aurora. Note that you first need to specify your own roots of CHiME-4 and Aurora-4 in the script (chime_root and aurora_root).
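A minimal sketch of this step (the corpus roots below are placeholders; point them at your local copies before running the script):

    # Edit generate_enhanced_results.sh to set your corpus roots, e.g.:
    #   chime_root=/data/CHiME4    # placeholder path
    #   aurora_root=/data/Aurora4  # placeholder path
    bash generate_enhanced_results.sh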

5. Evaluation with various downstream recognizers:

Once all the enhanced results and downstream recognizers are in place, you can run the following command to test the performance of the enhancement methods:

python main.py --task test --method [NOIS/INIT/CLSO/SRPR/GCLB/D4AM] --model [CONF/TRAN/RNN/W2V2] --test_set [chime/aurora]

By specifying the method, the downstream recognizer, and the test corpus, users can reproduce the table results reported in our paper.
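To reproduce a full table, a sketch that sweeps every combination (the loop itself is our addition; the flag values come directly from the command above):

    # Evaluate every method with every recognizer on both test sets.
    for TEST in chime aurora; do
      for MODEL in CONF TRAN RNN W2V2; do
        for METHOD in NOIS INIT CLSO SRPR GCLB D4AM; do
          python main.py --task test --method "$METHOD" --model "$MODEL" --test_set "$TEST"
        done
      done
    done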
