D4AM: A General Denoising Framework for Downstream Acoustic Models

This is the official implementation of D4AM. We will make our repository public if our paper is accepted. The demo page is provided locally in our supplementary materials, and our source code is provided in this anonymous GitHub repository. All the model checkpoints can be found in the drive link below, so users can choose whether to train the fine-tuned models from scratch or use our checkpoints directly to reproduce the table results more precisely. The following sections describe how to run our experiments.

Contents

  • Installation
  • Steps and Usage

Installation

Note that our environment uses Python 3.8.12. To run the D4AM experiments, clone our repository and install the dependencies with pip:

pip install -r requirements.txt
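If you prefer an isolated environment, here is a minimal setup sketch assuming a python3.8 interpreter is available on your PATH (the environment name d4am-env is our own choice, not part of the repository):

    # Create and activate a virtual environment, then install the dependencies.
    python3.8 -m venv d4am-env
    source d4am-env/bin/activate
    pip install -r requirements.txt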

Steps and Usage

1. Specify paths of speech and noise datasets:

Before starting the training process, there are two places where data paths must be set. The first is the speech and noise paths in config/config.yaml, which are used to prepare noisy-clean paired data only. Note that all the noisy utterances for training are generated on the fly. The noise dataset is provided by DNS-Challenge, and the clean utterances come from the training subsets of LibriSpeech (Libri-360/Libri-500). Check the data section in config/config.yaml:

data:
    ...
    speech_path: [../speech_data/LibriSpeech/train-clean-360, ../speech_data/LibriSpeech/train-other-500]
    corruptor:
      noise_path: ../noise_data/DNS_noise_pros
    ...
    

speech_path indicates the subsets used only as clean speech for the regression objective, and noise_path points to the noise dataset used to corrupt the clean utterances. The second place is the wav paths in the manifests under ds/manifests/libri, which are used to prepare both (noisy) speech-text pairs for the classification objective and noisy-clean pairs for the regression objective. For example, ../speech_data/LibriSpeech/train-clean-100 should be changed to your own LibriSpeech root path. Note that dev-clean_wham.csv and test-clean_wham.csv are our own mixed validation sets for observing learning curves; they follow the same format as the original manifests. Users can generate their own validation sets by preparing manifests in the same format.
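If your LibriSpeech root differs from the default, the prefix can be rewritten in every manifest at once. A minimal sketch with GNU sed (NEW_ROOT below is a placeholder; point it at your local copy):

    # Rewrite the LibriSpeech root in all Libri manifests, in place.
    OLD_ROOT="../speech_data/LibriSpeech"
    NEW_ROOT="/data/LibriSpeech"  # placeholder: your own LibriSpeech root
    sed -i "s|${OLD_ROOT}|${NEW_ROOT}|g" ds/manifests/libri/*.csv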

2. Download the checkpoints of SE models and downstream recognizers:

We keep our initial model and the checkpoints of the other fine-tuned models in the drive link. Users can either train the fine-tuned models themselves or directly use our provided checkpoints for inference. All the downstream recognizers described in Section 4.2 can be found there as well. Both archives should be placed under the D4AM directory and extracted with tar zxvf Filename.tar.gz. After extraction, make sure the .pth files of the SE models are under the ckpt folder (e.g. ckpt/INIT.pth) and the downstream recognizers are under the ds folder (e.g. ds/models/conformer).
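A sketch of the extraction step (the archive names below are placeholders; use the actual file names from the drive link):

    # Run from the D4AM directory after downloading the archives.
    tar zxvf SE_checkpoints.tar.gz   # placeholder name: should unpack ckpt/*.pth
    tar zxvf recognizers.tar.gz      # placeholder name: should unpack ds/models/*
    ls ckpt/INIT.pth ds/models/conformer  # quick sanity check of the layout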

3. Train your own fine-tuned models locally:

Most checkpoints are provided in the link mentioned in the previous step, so this step can be skipped if the corresponding SE models are already in ckpt. To train your own model, execute python main.py --task train --method [assigned method] (e.g. python main.py --task train --method D4AM). Note that you need to specify an alpha value if you choose GRID (e.g. python main.py --task train --method GRID --alpha 0.1).
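To prepare several variants in one go, a small sketch that loops over the fine-tuned methods listed in step 5 (treating CLSO, SRPR, and GCLB as trainable methods is our assumption; NOIS and INIT need no training, since they denote the noisy input and the initial model):

    # Train each fine-tuned variant in turn (method list assumed from step 5).
    for METHOD in CLSO SRPR GCLB D4AM; do
        python main.py --task train --method "$METHOD"
    done
    python main.py --task train --method GRID --alpha 0.1  # GRID needs an explicit alpha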

4. Write enhanced results for evaluation:

Once you have prepared the checkpoints of all methods, run bash generate_enhanced_results.sh to generate the corresponding enhanced results in results/ and their manifests in ds/manifests/chime and ds/manifests/aurora. Note that you first need to specify your own roots of CHiME-4 and Aurora-4 in the script (chime_root and aurora_root).
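A minimal sketch of this step (the corpus roots below are placeholders; point them at your local copies before running the script):

    # Edit generate_enhanced_results.sh to set your corpus roots, e.g.:
    #   chime_root=/data/CHiME4    # placeholder path
    #   aurora_root=/data/Aurora4  # placeholder path
    bash generate_enhanced_results.sh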

5. Evaluation with various downstream recognizers:

Once all the enhanced results and downstream recognizers are in place, you can run the following command to test the performance of the enhancement methods:

python main.py --task test --method [NOIS/INIT/CLSO/SRPR/GCLB/D4AM] --model [CONF/TRAN/RNN/W2V2] --test_set [chime/aurora]

By specifying the method, the downstream recognizer, and the test corpus, users can reproduce the table results reported in our paper.
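To reproduce a full table, a sketch that sweeps every combination (the loop itself is our addition; the flag values come directly from the command above):

    # Evaluate every method with every recognizer on both test sets.
    for TEST in chime aurora; do
      for MODEL in CONF TRAN RNN W2V2; do
        for METHOD in NOIS INIT CLSO SRPR GCLB D4AM; do
          python main.py --task test --method "$METHOD" --model "$MODEL" --test_set "$TEST"
        done
      done
    done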
