
AEGAN-AD

Official pytorch implementation of AEGAN-AD.

Introduction

Automatic detection of machine anomalies remains challenging for machine learning. We believe the capability of generative adversarial networks (GANs) suits the needs of machine audio anomaly detection, yet this has rarely been investigated in previous work. In this paper, we propose AEGAN-AD, a fully unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input spectrograms. We point out that the denoising nature of reconstruction limits the generator's capacity, so the discriminator is redesigned to aid the generator during both the training stage and the detection stage. On the DCASE 2022 Challenge Task 2 dataset, AEGAN-AD achieves state-of-the-art results on five machine types. A novel anomaly localization method is also investigated.

Let $\mathbb{P}_r$ and $\mathbb{P}_g$ denote the real distribution and the generated distribution respectively. The discriminator is trained to minimize the following function:

$$\mathcal{L}_D=\mathop{\mathbb{E}}\limits_{\tilde{x}\sim\mathbb{P}_g}\left[D(\tilde{x})\right]-\mathop{\mathbb{E}}\limits_{x\sim\mathbb{P}_r}\left[D(x)\right]+\lambda\mathop{\mathbb{E}}\limits_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\left[\left(\Vert\nabla_{\hat{x}}D(\hat{x})\Vert_2-1\right)^2\right]$$

where $\lambda$ is the weight of the gradient penalty. $\mathbb{P}_{\hat{x}}$ is the distribution of samples obtained by the following linear interpolation between samples drawn from $\mathbb{P}_r$ and $\mathbb{P}_g$:

$$\hat{x}=\alpha{\cdot}x+(1-\alpha){\cdot}{\tilde{x}}$$

where $x{\sim}\mathbb{P}_r$ and $\tilde{x}{\sim}\mathbb{P}_g$, and $\alpha$ is a randomly sampled interpolation coefficient.
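The discriminator objective above is the standard WGAN-GP critic loss. A minimal PyTorch sketch is given below; the function and argument names (`discriminator_loss`, `gp_weight` for $\lambda$) are illustrative, not the repository's actual API:

```python
import torch

def discriminator_loss(D, x_real, x_fake, gp_weight=10.0):
    """WGAN-GP critic loss: E[D(x_fake)] - E[D(x_real)] + lambda * GP.

    Illustrative sketch; `D` is any critic mapping spectrograms to scalars.
    """
    # Wasserstein terms: minimizing pushes D(x_fake) down and D(x_real) up.
    loss = D(x_fake).mean() - D(x_real).mean()

    # Gradient penalty on random interpolates x_hat = a*x + (1-a)*x_tilde,
    # with a drawn uniformly per sample.
    alpha = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (alpha * x_real + (1 - alpha) * x_fake.detach()).requires_grad_(True)
    grads = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return loss + gp_weight * ((grad_norm - 1) ** 2).mean()
```

The penalty drives the critic's gradient norm toward 1 on the interpolated samples, enforcing the 1-Lipschitz constraint required by the Wasserstein formulation.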

An alternative reconstruction-based loss function is adopted for the generator. Let $f(\cdot)$ denote the embedding of the discriminator. The loss function of the generator is formulated as follows:

$$\mathcal{L}_G=\mathop{\mathbb{E}}\limits_{x\sim\mathbb{P}_r}\left[\Vert x-G(x)\Vert_2^2\right]+\mu_1\left\Vert\mathop{\mathbb{E}}\limits_{x\sim\mathbb{P}_r}\left[f(x)\right]-\mathop{\mathbb{E}}\limits_{\tilde{x}\sim\mathbb{P}_g}\left[f(\tilde{x})\right]\right\Vert_2^2$$

where the first term is the norm of the reconstruction error and the second term is the feature matching loss. A modified $\mathcal{L}_G$ is proposed for some machine types, which measures the feature matching loss via both mean and standard deviation:

$$\mathcal{L}_G=\mathop{\mathbb{E}}\limits_{x\sim\mathbb{P}_r}\left[\Vert x-G(x)\Vert_2^2\right]+\mu_1\left\Vert\mathop{\mathbb{E}}\limits_{x\sim\mathbb{P}_r}\left[f(x)\right]-\mathop{\mathbb{E}}\limits_{\tilde{x}\sim\mathbb{P}_g}\left[f(\tilde{x})\right]\right\Vert_2^2+\mu_2\left\Vert\mathop{\sigma}\limits_{x\sim\mathbb{P}_r}\left[f(x)\right]-\mathop{\sigma}\limits_{\tilde{x}\sim\mathbb{P}_g}\left[f(\tilde{x})\right]\right\Vert_2^2$$

where $\sigma[f(x)]$ is a vector whose elements are the standard deviations of the corresponding dimensions of the embedding.
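The generator loss combines a reconstruction term with feature matching on the discriminator embedding. A minimal PyTorch sketch, assuming embeddings of shape `(batch, dim)`; the names `generator_loss`, `mu1`, and `mu2` are illustrative:

```python
import torch

def generator_loss(x, x_rec, f_real, f_fake, mu1=1.0, mu2=1.0):
    """Reconstruction + feature-matching loss (mean and std terms).

    x_rec = G(x); f_real = f(x), f_fake = f(G(x)) are discriminator
    embeddings of shape (batch, dim). Illustrative sketch only.
    """
    # E[||x - G(x)||_2^2]: squared reconstruction error per sample.
    rec = (x - x_rec).pow(2).flatten(1).sum(1).mean()
    # Match the batch means of real and reconstructed embeddings.
    fm_mean = (f_real.mean(0) - f_fake.mean(0)).norm(2).pow(2)
    # Optionally also match the per-dimension standard deviations.
    fm_std = (f_real.std(0) - f_fake.std(0)).norm(2).pow(2)
    return rec + mu1 * fm_mean + mu2 * fm_std
```

Setting `mu2=0` recovers the first (mean-only) formulation; the std term additionally penalizes mismatched spread of the embeddings.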

Experiments were conducted on the DCASE 20 and DCASE 22 datasets. Code for each dataset can be found in the corresponding directory; please refer to the instructions below. The two sets of code are largely identical.

DCASE 20

Preparation

Download the dataset from DCASE official website.

The directory should be organized in the following manner:

dataset_dir/
    dev_data/
        fan/
            train/
            test/
            ...
    eval_data/
        fan/
            train/
            test/
            ...

Then clone this repository. Modify the dataset_dir term in config.yaml to your dataset path.

Finally, install required packages:

    pip install -r requirements

Training

Hyperparameters are stored in config.yaml.

To train a model, please enter:

    python train.py --mt {machine type} -c {card_id} --seed {seed}

To test a model, please enter:

    python test.py --mt {machine type} -c {card_id}

Pretrained Dicts

Pretrained dicts are provided via Tsinghua cloud.

Place the dict under ./model (or modify config.yaml to point to your custom path).

Substitute config.yaml by the corresponding config file in ./pretrain/.

To verify the performance, please enter:

    python test.py --mt {machine type} -c {card_id}

Result

|         | fan   | pump  | slider      | ToyCar | ToyConveyor | valve        |
|---------|-------|-------|-------------|--------|-------------|--------------|
| average | 77.01 | 81.26 | 86.50       | 86.62  | 73.27       | 77.60        |
| metric  | D-LOF | D-LOF | G-x-L2-mean | D-KNN  | G-z-L2-min  | G-z-cos-mean |
  • G-z means using the reconstruction error in the latent space of the generator. L2 stands for L2 norm.
  • D-LOF means using the LOF algorithm on the embedding of the discriminator.
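As a concrete illustration of the D-LOF metric, the sketch below fits a Local Outlier Factor model (scikit-learn) on discriminator embeddings of normal training clips and scores test clips; the function name `d_lof_scores` and the neighbor count are assumptions, not the repository's API:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def d_lof_scores(train_emb, test_emb, k=20):
    """D-LOF anomaly scoring: LOF fitted on normal-clip embeddings.

    train_emb, test_emb: arrays of shape (n_clips, dim) holding the
    discriminator embeddings f(x). Higher score = more anomalous.
    """
    # novelty=True enables scoring unseen samples via score_samples.
    lof = LocalOutlierFactor(n_neighbors=k, novelty=True)
    lof.fit(train_emb)
    # score_samples returns higher values for inliers, so negate it
    # to obtain an anomaly score.
    return -lof.score_samples(test_emb)
```

The G-z metrics, by contrast, need no fitted model: they score a clip directly by the distance (L1, L2, or cosine) between latent codes before and after reconstruction.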

DCASE 22

Preparation

Download the dataset from DCASE official website.

The directory should be organized in the following manner:

dataset_dir/
    dev_data/
        bearing/
            train/
            test/
            ...
    eval_data/
        bearing/
            train/
            test/
            ...

Then clone this repository. Modify the dataset_dir term in config.yaml to your dataset path.

Finally, install required packages:

    pip install -r requirements

Training

Hyperparameters are stored in config.yaml.

To train a model, please enter:

    python train.py --mt {machine type} -c {card_id} --seed {seed}

To test a model, please enter:

    python test.py --mt {machine type} -d -c {card_id}

Pretrained Dicts

Pretrained dicts are provided via Tsinghua cloud.

Place the dict under ./model (or modify config.yaml to point to your custom path).

Substitute config.yaml by the corresponding config file in ./pretrain/.

To verify the performance, please enter:

    python test.py --mt {machine type} -d -c {card_id}

Result

|        | bearing    | fan         | gearbox     | slider | ToyCar    |
|--------|------------|-------------|-------------|--------|-----------|
| hmean  | 76.03      | 65.83       | 75.27       | 74.06  | 78.46     |
| metric | G-z-L1-sum | G-z-cos-min | G-z-cos-min | D-LOF  | G-z-1-sum |
  • G-z means using the reconstruction error in the latent space of the generator. 1 stands for L1 norm.
  • D-LOF means using the LOF algorithm on the embedding of the discriminator.

Contributors

jianganbai
