Activation Maximization with a Prior in Speech Data

This repository introduces applications of class-based Activation Maximization (AM) in the audio domain; the work was published in the American Journal of Computer and Technology.

Introduction

Neural networks are the predominant approach for various tasks, including object detection, speech recognition, and emotion detection. However, their decision process is generally not understandable to humans. To understand how models tackle these problems, visualization techniques such as feature visualization have been developed. In this repository, I share applications of Activation Maximization (AM), which is one of these feature visualization techniques.

In AM, the input data is optimized so that it maximally activates a selected neuron. The selected neuron can be a filter in a layer, a classification output, and so on. In our case, the output of the classifier is optimized to observe what the model regards as a certain class. That's why I call it class-based Activation Maximization, and this is mentioned in this paper. For further information, please visit this excellent explanation of AM.
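The idea of class-based AM can be sketched in a few lines. The example below is only an illustration, not the code used in the notebooks: it assumes a toy linear softmax classifier (the hypothetical weights `W` and bias `b`) in place of a trained network, and it runs plain gradient ascent on the input `x` so that the log-probability of the target class increases.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(W, b, x):
    """Toy stand-in classifier: softmax(W x + b)."""
    logits = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


def activation_maximization(W, b, x0, target, steps=200, lr=0.1):
    """Gradient ascent on the input x to maximize log p(target | x)."""
    x = list(x0)
    for _ in range(steps):
        p = class_probs(W, b, x)
        # d log p[target] / d logits = onehot(target) - p
        d_logits = [(1.0 if k == target else 0.0) - pk for k, pk in enumerate(p)]
        # Chain rule through the linear layer: grad_x = W^T d_logits
        grad = [sum(W[k][j] * d_logits[k] for k in range(len(W))) for j in range(len(x))]
        # Ascent step: move the input toward higher target-class probability
        x = [xj + lr * gj for xj, gj in zip(x, grad)]
    return x
```

With a real network, the hand-written gradient would be replaced by automatic differentiation, but the loop stays the same: the model weights are fixed and only the input is updated.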

In this experiment, I optimize the input noise of a GAN, which is employed as a prior, as shown below. Two forms of audio data are used: raw audio and mel-spectrograms. We observe how the results differ with the data form and the structure of the models. In addition, a Conditional GAN is tested to examine the importance of conditioning on a specific emotion. Lastly, the biggest advantage of this idea is that it can be used to enhance a model's output. For example, in our case, the model alone was not able to generate the audio we expected, but this approach allowed us to steer its output toward our purpose.
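Optimizing the generator's noise instead of the raw input can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebooks' implementation: the "generator" is a hypothetical single layer `tanh(G z)`, the classifier is a hypothetical linear softmax layer `W`, and the gradient is written out by hand. Both models stay frozen; only the noise vector `z` is updated.

```python
import math


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matTvec(M, v):
    """Transposed matrix-vector product M^T v."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(G, z):
    """Toy stand-in generator: x = tanh(G z), so samples stay in [-1, 1]."""
    return [math.tanh(h) for h in matvec(G, z)]


def optimize_noise(G, W, z0, target, steps=300, lr=0.1):
    """Gradient ascent on z to maximize log p(target | generate(G, z))."""
    z = list(z0)
    for _ in range(steps):
        x = generate(G, z)
        p = softmax(matvec(W, x))
        # Gradient of log p[target] with respect to the classifier logits
        d_logits = [(1.0 if k == target else 0.0) - pk for k, pk in enumerate(p)]
        # Back through the classifier, then the tanh, then the generator weights
        d_x = matTvec(W, d_logits)
        d_h = [g * (1.0 - xi * xi) for g, xi in zip(d_x, x)]
        d_z = matTvec(G, d_h)
        z = [zi + lr * gi for zi, gi in zip(z, d_z)]
    return z
```

Because the classifier only ever sees `generate(G, z)`, the optimized sample cannot leave the range of the generator. That constraint is what "using the GAN as a prior" means in this sketch.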

Notebooks

This approach requires two models: a classifier and a GAN (or Conditional GAN) generator. Brief descriptions of the notebooks are as follows:

01_audio_emotion_classifier.ipynb: Emotion classification in the audio domain
02_GAN_training.ipynb: Training of the GAN
03_GAN_audio_AM: Activation Maximization on raw audio with the GAN
04_mel_emotion_classifier.ipynb: Emotion classification on mel-spectrograms
05_GAN_mel_AM: Activation Maximization on mel-spectrograms with the GAN
06_result_GAN_AM: Summary of Activation Maximization with the GAN
A_preprocessing_TESS_and_RAVDESS: Brief introduction to and preprocessing of the datasets
A-download_Download_TESS_RAVDESS: How to download the TESS and RAVDESS datasets
B_WaveGlow_parameters: Obtaining the parameters of WaveGlow
C_Emotion_Recognition-Inception: Emotion classification with an Inception model

Results

Since I'm not able to post audio data in the README, I've posted the audio on my blog.

Please visit GAN/notebook/06_result_GAN_AM.ipynb or GAN/notebook/06-A_result_cGAN_AM.ipynb for additional results and discussions.

neutral

sad

angry

happy

Further Research

Perform AM while keeping the text information fixed.
Employ a model that is capable of adding emotion to audio, and use it as a prior.

Docker

This repository also provides a Docker environment in which you can run the notebooks.

Build the Docker image.
- with GPU
  - docker build --no-cache -f Docker/Dockerfile.gpu .
- without GPU
  - docker build --no-cache -f Docker/Dockerfile.cpu .
Check the <IMAGE ID> of the created image.
- docker images
Run the Docker container.
- with GPU
  - docker run --rm --gpus all -it -p 8080:8080 -e LOCAL_UID=$(id -u $USER) -e LOCAL_GID=$(id -g $USER) -v ~/:/work <IMAGE ID> bash
- without GPU
  - docker run --rm -it -p 8080:8080 -e LOCAL_UID=$(id -u $USER) -e LOCAL_GID=$(id -g $USER) -v ~/:/work <IMAGE ID> bash
Start JupyterLab inside the container.
- nohup jupyter lab --ip=0.0.0.0 --no-browser --allow-root --port 8080 --NotebookApp.token='' > nohup.out &
Open JupyterLab.
- Open http://localhost:8080/lab? in a web browser.

Installation of some apps

Git LFS (large file storage)

Since this repository contains the parameters of the models, I used Git LFS to store the large files. The commands below install it with Homebrew.

brew update
brew install git-lfs

Then navigate to this repository and run:

git lfs install
git lfs fetch --all
git lfs pull

Coming soon

Some parts are not yet documented, including:

Explanations of some functions and models.

Contact

Feel free to contact me if you have any questions ([email protected]).
