Improved SEGAN

Tricks to improve SEGAN performance. Everything is re-implemented in Keras with the TensorFlow backend.

Supporting document with evaluation results and other details can be found here.

Deepak Baby, iSEGAN: Improved Speech Enhancement Generative Adversarial Networks, arXiv preprint, 2020.


Pre-requisites

  1. Install tensorflow and keras
  2. Install tqdm for displaying the training progress
  3. The experiments are conducted on the dataset from Valentini et al., which can be downloaded from here. The following script can be used to download the dataset. It requires sox for converting the audio to 16 kHz.
    $ ./download_dataset.sh

Running the model

  1. Prepare data for training and testing the various models. Edit the folder path if you keep the database in a different folder. This script needs to be executed only once; all the models read from the same location.
    python prepare_data.py
  2. Run the models. The training and evaluation of the various SEGAN models are implemented in run_isegan.py, which offers several cGAN configurations. Edit the opts variable to choose the configuration. The results are automatically saved to different folders. The folder name is generated by files_ops.py and automatically encodes the chosen configuration options.

The options are:

  • Different normalizations
    • Instance Normalization
    • Batch Normalization
    • Batch Renormalization
    • Group Normalization
    • Spectral Normalization
  • One-Sided Label Smoothing: Encouraging the discriminator to estimate soft probabilities (0.8, 0.9, etc.) on the real samples.
  • Trainable Auditory Filter-bank Layer: The first layer is initialized using a gammatone filterbank and is then used as a trainable layer.
  • Pre-emphasis Layer: Incorporating the pre-emphasis operation as a trainable layer.
  3. Evaluation on the test set is performed together with training. Set TEST_SEGAN = False to disable testing.
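
Two of the options listed above can be illustrated with a few lines of NumPy. This is a minimal sketch and not the code from run_isegan.py: the function names are hypothetical, the smoothing value 0.9 is one of the example values mentioned above, and the pre-emphasis coefficient 0.95 is an assumed, commonly used value rather than one taken from this repository.

```python
import numpy as np


def smoothed_real_labels(batch_size, smooth=0.9):
    """Discriminator targets for real samples with one-sided label smoothing.

    Real samples get a soft target (e.g. 0.9) instead of 1.0.
    """
    return np.full((batch_size, 1), smooth, dtype=np.float32)


def fake_labels(batch_size):
    """Targets for generated samples. These stay at 0 (hence "one-sided")."""
    return np.zeros((batch_size, 1), dtype=np.float32)


def pre_emphasis(x, coeff=0.95):
    """Fixed pre-emphasis filter: y[n] = x[n] - coeff * x[n-1], y[0] = x[0].

    coeff=0.95 is an assumption for illustration only.
    """
    x = np.asarray(x, dtype=np.float32)
    return np.concatenate([x[:1], x[1:] - coeff * x[:-1]])


if __name__ == "__main__":
    print(smoothed_real_labels(2).ravel())   # soft targets for real samples
    print(fake_labels(2).ravel())            # hard zeros for generated samples
    print(pre_emphasis([1.0, 1.0, 1.0]))     # constant input is attenuated
```

The fixed filter above is equivalent to a two-tap convolution with weights (-coeff, 1). A trainable pre-emphasis layer would presumably start from such weights and let training adjust them, but how this repository implements that layer is not shown here.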

Misc

  • This code loads all the data into memory to speed up training. If you don't have enough memory, the mini-batches can instead be read from disk through HDF5. In run_<xxx>.py, change the lines
    clean_train_data = np.array(fclean['feat_data'])
    noisy_train_data = np.array(fnoisy['feat_data'])
    to
    clean_train_data = fclean['feat_data']
    noisy_train_data = fnoisy['feat_data']
    This can slow training down by a factor of about 20 (on the test machine), since the mini-batches then have to be read from disk in every epoch.
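
The difference between the two variants is when the data leaves the disk: np.array(...) copies the whole dataset into memory once, whereas slicing an HDF5 dataset reads only the requested rows. The sketch below shows batch-wise reading under that assumption. The helper name iter_minibatches is hypothetical and not part of this repository, and the demo uses NumPy arrays as stand-ins, since the function only relies on `.shape` and slicing, which h5py datasets also provide.

```python
import numpy as np


def iter_minibatches(clean, noisy, batch_size):
    """Yield (clean, noisy) mini-batches as in-memory NumPy arrays.

    `clean` and `noisy` may be NumPy arrays or any array-like object that
    supports `.shape` and slicing (for example an h5py dataset). With an
    on-disk dataset, each slice triggers a read of just that batch.
    """
    num_samples = clean.shape[0]
    for start in range(0, num_samples, batch_size):
        stop = min(start + batch_size, num_samples)
        yield np.asarray(clean[start:stop]), np.asarray(noisy[start:stop])


if __name__ == "__main__":
    # Stand-in data: 5 examples with 2 features each.
    clean = np.arange(10, dtype=np.float32).reshape(5, 2)
    noisy = clean + 0.1
    for c, n in iter_minibatches(clean, noisy, batch_size=2):
        print(c.shape, n.shape)
```

Reading contiguous slices keeps each disk access sequential. Shuffled batches would need sorted index lists or a shuffled order of batch start positions, which this sketch leaves out.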

References

[1] S. Pascual, A. Bonafonte, and J. Serra, SEGAN: speech enhancement generative adversarial network, in INTERSPEECH. ISCA, Aug 2017, pp. 3642–3646.


Credits

The keras implementation of cGAN is based on the following repos
