shinshoji01 / am_with_gan_for_melspectrogram


This repository introduces applications of Activation Maximization to audio-domain data.



Activation Maximization with a Prior in Speech Data

This repository introduces some applications of class-based Activation Maximization (AM) in the audio domain. The work was published in the American Journal of Computer and Technology.


Introduction

Neural networks are predominant in various tasks, including object detection, speech recognition, and emotion detection. However, their decision process is generally not understandable to humans. To understand how the models tackle these problems, visualization techniques such as feature visualization have been developed. In this repository, I share applications of Activation Maximization (AM), which is one such feature visualization technique.

In AM, the input data is optimized so that it maximally activates a selected neuron. That neuron can be a filter in a hidden layer, a classification output, and so on. In our case, the classifier output is the target, so the optimized input shows what the model regards as a certain class. That is why I call it class-based Activation Maximization, and this is mentioned in this paper. For further information, please visit this excellent explanation of AM.
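As a minimal sketch of the idea, the snippet below runs gradient ascent on the input of a toy linear softmax classifier so that the probability of one class grows. The weights `W`, the step size, and the step count are hypothetical values chosen for illustration; the classifiers in this repository are neural networks, not this toy model.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(weights, x):
    """Toy linear classifier: one weight row per class."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    return softmax(logits)

def activation_maximization(weights, x, target, steps=200, lr=0.1):
    """Gradient ascent on log p(target | x) with respect to the input x."""
    x = list(x)
    for _ in range(steps):
        p = class_probs(weights, x)
        # d log p_target / d logit_k = 1[k == target] - p_k
        dlogits = [(1.0 if k == target else 0.0) - p_k for k, p_k in enumerate(p)]
        # Chain rule through the linear layer: grad_x = W^T dlogits
        grad = [sum(dlogits[k] * weights[k][i] for k in range(len(weights)))
                for i in range(len(x))]
        x = [x_i + lr * g_i for x_i, g_i in zip(x, grad)]
    return x

# Hypothetical 3-class classifier over 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x0 = [0.0, 0.0]
x_opt = activation_maximization(W, x0, target=1)
print("p(class 1) before:", class_probs(W, x0)[1])
print("p(class 1) after: ", class_probs(W, x_opt)[1])
```

Note that nothing constrains `x` here, so the optimized input can drift away from realistic data. That is the motivation for adding a prior.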

In this experiment, I optimize the input noise of a GAN, which serves as a prior. Two forms of audio data are used: raw audio and mel-spectrograms. We observe how the results differ across data forms and model structures. A Conditional GAN is also tested to examine the importance of conditioning on a certain emotion. Lastly, the biggest advantage of this idea is that it can be used as an enhancer of the model output. For example, in our case the model alone was not able to generate the audio we expected, but this concept allowed us to steer its output toward our purpose.
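The paragraph above describes optimizing the GAN's noise rather than the data itself. A rough sketch of that loop follows, assuming a toy generator `x = tanh(G z)` and a toy linear classifier `C`. Both matrices, the learning rate, and the step count are hypothetical stand-ins, not the models trained in the notebooks.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(logits):
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generator(G, z):
    """Toy stand-in for a GAN generator: x = tanh(G z)."""
    return [math.tanh(v) for v in matvec(G, z)]

def am_with_prior(G, C, z, target, steps=300, lr=0.1):
    """Gradient ascent on log p(target | generator(z)) with respect to z."""
    z = list(z)
    for _ in range(steps):
        x = generator(G, z)
        p = softmax(matvec(C, x))
        dlogits = [(1.0 if k == target else 0.0) - p_k for k, p_k in enumerate(p)]
        # Backpropagate: classifier -> generator output -> tanh -> latent.
        dx = [sum(dlogits[k] * C[k][i] for k in range(len(C))) for i in range(len(x))]
        dpre = [g * (1.0 - x_i * x_i) for g, x_i in zip(dx, x)]
        dz = [sum(dpre[i] * G[i][j] for i in range(len(G))) for j in range(len(z))]
        z = [z_j + lr * g_j for z_j, g_j in zip(z, dz)]
    return z

# Hypothetical shapes: 2-dim latent, 3-dim "audio" sample, 2 classes.
G = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
C = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]
z0 = [0.0, 0.0]
z_opt = am_with_prior(G, C, z0, target=1)
x_opt = generator(G, z_opt)
print("p(target) before:", softmax(matvec(C, generator(G, z0)))[1])
print("p(target) after: ", softmax(matvec(C, x_opt))[1])
```

Because only `z` is updated, every candidate sample is an output of the generator, which is what keeps the result inside the range the prior can produce.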


Notebooks

This idea requires 2 models: a classifier and a generator from a GAN (or Conditional GAN). Brief descriptions of the notebooks are as follows:

  • 01_audio_emotion_classifier.ipynb: Emotion classification in the audio domain
  • 02_GAN_training.ipynb: Training of the GAN
  • 03_GAN_audio_AM: Activation Maximization on raw audio with a GAN
  • 04_mel_emotion_classifier.ipynb: Emotion classification on mel-spectrograms
  • 05_GAN_mel_AM: Activation Maximization on mel-spectrograms with a GAN
  • 06_result_GAN_AM: Summary of Activation Maximization with a GAN
  • A_preprocessing_TESS_and_RAVDESS: Brief introduction to and preprocessing of the datasets
  • A-download_Download_TESS_RAVDESS: How to download the TESS and RAVDESS datasets
  • B_WaveGlow_parameters: Obtaining the parameters of WaveGlow
  • C_Emotion_Recognition-Inception: Emotion classification with an Inception model

Results


Since I'm not allowed to post any audio data in the README, I've posted the audio on my blog.


Please visit GAN/notebook/06_result_GAN_AM.ipynb or GAN/notebook/06-A_result_cGAN_AM.ipynb for additional results and discussions.

Sample results per emotion (figures not reproduced here):

  • neutral: GAN_models_neutral_sample_4
  • sad: GAN_models_sad_sample_0
  • angry: GAN_models_angry_sample_0
  • happy: GAN_models_happy_sample_3

Further Research

  • Run AM while keeping the text content of the speech fixed.
  • Employ a model that is capable of adding emotion to audio, and use it as a prior.

Docker

This repository also provides a Docker environment in which you can run the notebooks.

  1. Build the docker environment.
    • with GPU
      • docker build --no-cache -f Docker/Dockerfile.gpu .
    • without GPU
      • docker build --no-cache -f Docker/Dockerfile.cpu .
  2. Check the <IMAGE ID> of the created image.
    • docker images
  3. Run the docker environment
    • with GPU
      • docker run --rm --gpus all -it -p 8080:8080 -e LOCAL_UID=$(id -u $USER) -e LOCAL_GID=$(id -g $USER) -v ~/:/work <IMAGE ID> bash
    • without GPU
      • docker run --rm -it -p 8080:8080 -e LOCAL_UID=$(id -u $USER) -e LOCAL_GID=$(id -g $USER) -v ~/:/work <IMAGE ID> bash
  4. Start JupyterLab inside the container.
    • nohup jupyter lab --ip=0.0.0.0 --no-browser --allow-root --port 8080 --NotebookApp.token='' > nohup.out &
  5. Open JupyterLab in a browser on the host. The docker run commands above publish it on port 8080.
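If the page does not load, a quick check from the host can tell whether anything is listening on the published port. The helper below is a small sketch using only the standard library; the function name `port_is_open` and the default host `127.0.0.1` are assumptions for illustration, and port 8080 comes from the docker run commands above.

```python
import socket

def port_is_open(host="127.0.0.1", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    print("JupyterLab port reachable:", port_is_open())
```

A result of False usually means JupyterLab has not started yet or the container was run without the -p 8080:8080 option; nohup.out inside the container holds the server log.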

Installation of some apps

Git LFS (large file storage)

Since this repository contains the parameters of the models, I used Git LFS to store the large files. The commands below install Git LFS and fetch those files.

brew update
brew install git-lfs
  • then, navigate to this repository.
git lfs install
git lfs fetch --all
git lfs pull
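After pulling, it can be useful to confirm that the parameter files were actually downloaded. A file that Git LFS has not fetched is a small text pointer whose first line starts with `version https://git-lfs.github.com/spec/v1`. The sketch below scans for such pointers; the `*.pth` pattern is an assumption about how the parameter files are named, so adjust it to the files in this repository.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file is still a Git LFS pointer stub, not real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX

def find_unpulled(root, pattern="*.pth"):
    """List files under root matching pattern that are still LFS pointers."""
    # The "*.pth" default is a hypothetical naming convention.
    return sorted(p for p in Path(root).rglob(pattern)
                  if p.is_file() and is_lfs_pointer(p))

if __name__ == "__main__":
    for stub in find_unpulled("."):
        print("not yet pulled:", stub)
```

If the scan prints any paths, running git lfs pull again from the repository root should replace those stubs with the real files.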

Coming soon

Some parts are not yet documented, including:

  • explanations of some functions and models.

Contact

Feel free to contact me if you have any questions ([email protected]).
