
[Research] Monaural Speech Enhancement through Wave-U-Net (SEWUNet)

Home Page: http://hguimaraes.me/SEWUNet/

License: MIT License

speech-enhancement deep-learning pytorch noise-reduction


SEWUNet

Speech Enhancement through Deep Wave-U-Net

Check the full paper here.

Introduction

In this paper we present an end-to-end approach to remove background noise from speech signals, operating directly on the raw waveform. The input to the network is an audio signal, sampled at 16 kHz, corrupted by additive noise at a signal-to-noise ratio drawn uniformly between 5 dB and 15 dB. The system aims to produce a signal with clean speech content. There are currently multiple deep learning architectures for this task, with encouraging results, ranging from spectral-based front-ends to raw-waveform models. Our method is based on the Wave-U-Net architecture with some adaptations to our problem, and we propose initializing the network weights through an autoencoder before training on the main task. We show, through quantitative metrics, that our method is preferred over classical Wiener filtering.
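To make the corruption process concrete: mixing clean speech with noise at a target SNR amounts to scaling the noise so the power ratio matches. This is a minimal numpy sketch of that idea, not the repository's actual preprocessing code:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to the clean speech."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Target noise power for the requested SNR: P_s / P_n = 10^(snr_db / 10)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise

# Draw the SNR uniformly from [5, 15] dB, as described above
rng = np.random.default_rng(0)
target_snr = rng.uniform(5, 15)
```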

How to use

There are two ways to use this repository:

  1. Train your own model with your data
  2. Apply the technique to your data with a pre-trained model

How to train

tl;dr: Steps to train the best model in the same way as shown in the paper.

  1. Download the LibriSpeech dataset and the UrbanSound8K to your local machine.
  2. Extract the files under the folder: /data/raw_data/
  3. Execute the preprocess.py script in the utils folder
  4. Go to the nbs folder and start by executing the autoencoder notebook
  5. Move "/models/checkpoint.pt" to the nbs folder and rename it to ae_checkpoint.pt
  6. (optional) Check the autoencoder results in the logs folder. Delete these files before executing the next step, or both runs will be saved in the same directory.
  7. Execute the model_4-L1 notebook
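Step 5 above lets the main network start from the pretrained autoencoder weights. Loosely, that initialization amounts to copying every parameter whose name and shape match between the two checkpoints. The sketch below illustrates the idea with plain numpy arrays standing in for tensors; with PyTorch, `ae_state` would come from something like `torch.load("ae_checkpoint.pt")`, and the function name is illustrative, not the notebooks' actual code:

```python
import numpy as np

def transfer_matching_weights(ae_state, model_state):
    """Copy pretrained autoencoder parameters into the enhancement model's
    state dict wherever the parameter name exists in both and shapes agree.
    Returns the list of transferred parameter names."""
    transferred = []
    for name, value in ae_state.items():
        if name in model_state and model_state[name].shape == value.shape:
            model_state[name] = value
            transferred.append(name)
    return transferred
```

Parameters that exist only in the enhancement network (or whose shapes changed) keep their fresh initialization, so only the shared encoder/decoder layers benefit from the pretraining.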

Testing with trained model

tl;dr: How to test the speech enhancement with our trained model

  1. Place your files under the /data/evaluate folder.
  2. Configure the config.json under the src folder.
  3. Run the test.py script under the src folder.
  4. Your output files will be written to the same directory as the input data, with a "_processed" suffix.
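For reference, the naming convention in step 4 can be expressed with pathlib. This is a sketch of the convention only, not the test.py script itself:

```python
from pathlib import Path

def processed_path(input_path):
    """Return the output path: same directory and extension as the input,
    with "_processed" appended to the file stem."""
    p = Path(input_path)
    return p.with_name(p.stem + "_processed" + p.suffix)
```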

Results

Considering a set of signals corrupted by additive noise at an SNR of 10 dB, our best model achieves an output SNR of 15.8 dB. Result examples can be found in the assets/results folder.
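The output SNR above is measured against the clean reference: whatever remains in the enhanced signal besides the clean speech is treated as residual noise. A minimal sketch of that metric, assuming the standard SNR definition rather than the repository's exact evaluation code:

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR of an enhanced signal relative to the clean reference:
    10 * log10(signal energy / residual-noise energy)."""
    residual = estimate - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))
```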

Spectrogram of the 00FWQOXLMACK5HE sample

Cite

@article{GUIMARAES2020113582,
title = {Monaural speech enhancement through deep wave-U-net},
journal = {Expert Systems with Applications},
volume = {158},
pages = {113582},
year = {2020},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2020.113582},
url = {https://www.sciencedirect.com/science/article/pii/S0957417420304061},
author = {Heitor R. Guimarães and Hitoshi Nagano and Diego W. Silva},
keywords = {Speech enhancement, Noise reduction, Wave-U-net, Deep learning, Signal to Noise Ratio (SNR), Word Error Rate (WER)},
abstract = {In this paper, we present Speech Enhancement through Wave-U-Net (SEWUNet), an end-to-end approach to reduce noise from speech signals. This background context is detrimental to several downstream systems, including automatic speech recognition (ASR) and word spotting, which in turn can negatively impact end-user applications. We show that our proposal does improve signal-to-noise ratio (SNR) and word error rate (WER) compared with existing mechanisms in the literature. In the experiments, network input is a 16 kHz sample rate audio waveform corrupted by an additive noise. Our method is based on the Wave-U-Net architecture with some adaptations to our problem. Four simple enhancements are proposed and tested with ablation studies to prove their validity. In particular, we highlight the weight initialization through an autoencoder before training for the main denoising task, which leads to a more efficient use of training time and a higher performance. Through quantitative metrics, we show that our method is prefered over the classical Wiener filtering and shows a better performance than other state-of-the-art proposals.}
}

