od-test: A Less Biased Evaluation of Out-of-Distribution (Outlier) Detectors (PyTorch)

License: MIT

Publication Information

This repository accompanies the paper:

  • A. Shafaei, J. J. Little, and M. Schmidt, “A Less Biased Evaluation of Out-of-distribution Sample Detectors,” in BMVC, 2019.
  • ArXiv version: “Does Your Model Know the Digit 6 Is Not a Cat? A Less Biased Evaluation of Outlier Detectors.” [ArXiv]

Keywords: out-of-distribution sample detection, outlier detection, anomaly detection, deep neural networks.

Bibtex:

@inproceedings{Shafaei2019,
    author = {Shafaei, Alireza and Schmidt, Mark and Little, James},
    booktitle = {BMVC},
    title = {{A Less Biased Evaluation of Out-of-distribution Sample Detectors}},
    year = {2019}
}
  • Raw results with the list of experiments: [Google Sheets]
  • Experiment files and pretrained models: [Document]

Introduction

[Figure 1: Confident predictions of popular ImageNet-trained CNNs on images that do not belong to ImageNet.]

The problem of interest is out-of-distribution (OOD) sample detection. In our paper, we present OD-test, an evaluation framework for methods that address OOD sample detection. We show that the traditional evaluation strategy yields overly optimistic results, hence the need for a more reliable evaluation. In this repository, we implement OD-test for image recognition problems with deep neural networks. You can replicate all the results of our paper here.

The OOD detection problem arises in settings where the input of a deployed neural network is not guaranteed to follow a fixed distribution. OOD inputs can lead to unpredictable behaviour in neural network pipelines. For instance, a network might be trained to recognize MNIST digits, but at deployment it may encounter a natural image unlike anything it has seen. Counter-intuitively, trained neural networks often fail silently and make over-confident predictions on previously unseen input. We need methods that detect OOD samples to prevent unpredictable behaviour. Unfortunately, we cannot simply filter out these problematic instances by thresholding the output probability of the most likely class. In the image above, we show the output of several popular CNNs trained on ImageNet but tested on random benign images that do not belong to ImageNet.
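To make the failure mode concrete, here is a minimal, self-contained sketch in plain NumPy. The logit values are hypothetical; the point is only that a modest gap between the top logit and the rest already yields near-certain softmax confidence, regardless of what the input actually was:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for an unrelated input: one class happens to
# score a few units above the rest, which is common in practice.
logits = np.array([1.0, 0.5, 6.0, 0.2, -0.3])
probs = softmax(logits)
print(probs.max())  # ~0.98: a confident prediction on an arbitrary input
```

This is why thresholding the top-class probability alone is an unreliable filter, and why the paper evaluates dedicated OOD detectors instead.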

The code in this repository allows:

  1. Painless replication of all the results in the paper, either from scratch or from the pretrained models.
  2. Easy development and testing of new methods for future work.
  3. Quick addition of new datasets for evaluation.

I have spent a long time refining this code. The result is a modular codebase that is reasonably efficient. I recommend taking the time to understand the general architecture of the project before making substantial changes. Pull requests are welcome, in particular if you wish to add a new method to the evaluation or improve part of the code. You can ask questions in Issues.

What is next?

  • First step: setting up the project.
  • Replicating all the results of the paper.
    • Training the reference models with an explanation of the training mechanics here.
    • Get the pretrained models here.
    • Running the evaluations with a quick example here.
  • Learn about the code organization here.
  • How to add
    • A new network architecture here.
    • A new dataset for evaluation here.
    • A new method for evaluation here.

List of the Datasets

| Index | Name | Train | Valid | Test | Input Dim | #Classes | D1? |
|---|---|---|---|---|---|---|---|
| 1 | MNIST | (50,000) | (10,000) | 10,000 | 28x28 = 784 | 10 | ✔️ |
| 2 | FashionMNIST | (50,000) | (10,000) | 10,000 | 28x28 = 784 | 10 | ✔️ |
| 3 | NotMNIST | | | | 28x28 = 784 | 10 | |
| 4 | CIFAR 10 | (40,000) | (10,000) | 10,000 | 3x32x32 = 3,072 | 10 | ✔️ |
| 5 | CIFAR 100 | (40,000) | (10,000) | 10,000 | 3x32x32 = 3,072 | 100 | ✔️ |
| 6 | TinyImagenet | 100,000 | 10,000 | 10,000 | 3x64x64 = 12,288 | 200 | ✔️ |
| 7 | STL10 | 5,000 | (4,000) | (4,000) | 3x96x96 = 27,648 | 10 | ✔️ |
| 8 | U(0,1) | | | | flexible | | |
| 9 | N(mu=0.5, sigma=0.25) | | | | flexible | | |

Sizes in parentheses are obtained by splitting a single original set.
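The last two entries are synthetic noise datasets with a flexible input dimension. A minimal sketch of how such samples can be generated is below; the function name and interface are illustrative only, not the repository's actual dataset classes:

```python
import numpy as np

def make_noise_batch(kind, shape=(3, 32, 32), n=16, seed=0):
    """Generate a batch of synthetic outlier images with values in [0, 1].

    kind: 'uniform' for U(0,1), or 'normal' for N(mu=0.5, sigma=0.25)
    clipped to [0, 1]. Illustrative sketch, not the repo's API.
    """
    rng = np.random.default_rng(seed)
    if kind == "uniform":
        batch = rng.uniform(0.0, 1.0, size=(n,) + shape)
    elif kind == "normal":
        batch = rng.normal(loc=0.5, scale=0.25, size=(n,) + shape).clip(0.0, 1.0)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return batch.astype(np.float32)

u = make_noise_batch("uniform")            # matches row 8
g = make_noise_batch("normal")             # matches row 9
```

Because the samples are pure noise, they can be shaped to match any in-distribution dataset, which is what "flexible" indicates in the table.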

List of Implemented Methods

| Index | Name | Short Description | Code |
|---|---|---|---|
| 1 | PbThreshold [1] | A threshold on the maximum probability. | link |
| 2 | ScoreSVM | An SVM on the logits (pre-softmax). | link |
| 3 | LogisticSVM | An SVM on the logits of a network trained with a k-way logistic loss function. | link |
| 4 | MCDropout [2] | MC-Dropout evaluation over 7 samples, followed by a threshold on the entropy of the average prediction. | link |
| 5 | KNNSVM | An SVM on the sorted Euclidean distances to the k nearest neighbours. | link |
| 6 | ODIN [3] | A threshold on the scaled softmax outputs of the perturbed input. | link |
| 7 | AEThreshold | A threshold on the autoencoder reconstruction error of the given input. | link |
| 8 | DeepEnsemble [4] | Similar to MCDropout, except we average over the predictions of 5 networks trained independently with adversarial data augmentation. | link |
| 9 | PixelCNN++ [5] | A threshold on the log-likelihood of each input. | link |
| 10 | OpenMax [6] | Calibrated probability with an additional unknown class and an SVM on top. | link |
| 11 | K-MNNSVM, K-BNNSVM, K-VNNSVM | Similar to KNNSVM, but uses the latent representation of different (variational) autoencoders. | link |
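As an illustration of the scoring used by method 6, here is a plain-NumPy sketch of the temperature-scaling half of ODIN [3]. The input-perturbation step of the full method is omitted, and the function name is mine, not the repository's API:

```python
import numpy as np

def odin_score(logits, temperature=1000.0):
    """Max softmax probability after temperature scaling.

    Sketch of the scoring half of ODIN; the input-perturbation step
    of the full method is omitted for brevity.
    """
    z = logits / temperature
    z = z - z.max()                 # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

logits = np.array([1.0, 0.5, 6.0, 0.2, -0.3])
plain = odin_score(logits, temperature=1.0)      # ordinary max softmax, ~0.98
scaled = odin_score(logits, temperature=1000.0)  # nearly uniform, ~0.20
```

A high temperature flattens the softmax, and the resulting score separates in- and out-of-distribution inputs better than the raw probability; the threshold on this score, along with the temperature and perturbation magnitude, is tuned on validation data in the full method.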

Average Performance (Sept. 13, 2018)

[Figure: average performance of the implemented methods as of Sept. 13, 2018.]

Setup

This project has been tested on:

  • Ubuntu 16.
  • CUDA 9.1, cuDNN 7.
  • Python 2.7 + Virtual Env.
  • Titan X, Titan XP.
  • PyTorch 0.4

We recommend installing Anaconda with Python 2.7 and Virtual Env.

Depending on the experiment being run, you may need up to 12 GB of GPU memory. A single GPU suffices to run all the experiments.

Setting up the Environment

The setup script creates a workspace folder in which all subsequent project files are stored. To get started, run the setup script:

```bash
> cd <root_folder>
> python setup/setup.py
```

The setup script will create a virtual environment at workspace/env and will install the required packages from requirements.txt. After a successful setup, you can activate the environment using:

```bash
source workspace/env/bin/activate
```

You must run all the scripts within the generated environment.

Visualization

In this project we use visdom for visualization during training, and seaborn to generate the figures in the paper. I originally wrote the code assuming visdom would always be available, and later reworked it to function without visdom as well; however, some parts of the code may still require a reachable visdom server. To be on the safe side, I recommend keeping visdom running in the background until the entire project is verified to work without it (pull requests to resolve this issue are welcome!).

In a new terminal tab, run:

```bash
> cd <root_folder>
> bash launch_visdom.sh
```

If you execute the code on a remote server but want to run visdom on your local machine, you can set up remote (reverse) port forwarding for port 8097 over SSH:

```bash
# Run this from the local machine to connect to the remote machine that will execute the scripts.
ssh -R 8097:localhost:8097 user@remote-machine
```

With this, a single visdom instance runs locally while multiple remote machines report to it.

References

  1. D. Hendrycks and K. Gimpel, “A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks,” in ICLR, 2017.
  2. Y. Gal and Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning,” in ICML, 2016.
  3. S. Liang, Y. Li, and R. Srikant, “Enhancing the Reliability of Out-of-Distribution Image Detection in Neural Networks,” in ICLR, 2018.
  4. B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles,” in NIPS, 2017.
  5. T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, “PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications,” in ICLR, 2017.
  6. A. Bendale and T. E. Boult, “Towards Open Set Deep Networks,” in CVPR, 2016.


od-test's Issues

Error in eval3d.py

Hi @ashafaei ,

Really impressive work! I am trying to reproduce the results from your paper.
When I run eval3d.py I get the following error:

```
Traceback (most recent call last):
  File "C:/Users/Ramya/PycharmProjects/OSR/ODopenmax/OD/eval3d.py", line 109, in <module>
    BT.propose_H(d1_train)
  File "C:\Users\Ramya\PycharmProjects\OSR\ODopenmax\OD\methods\base_threshold.py", line 113, in propose_H
    trainer.run_epoch(0, phase='all')
  File "C:\Users\Ramya\PycharmProjects\OSR\ODopenmax\OD\utils\iterative_trainer.py", line 125, in run_epoch
    if criterion.size_average:
  File "C:\Users\Ramya\Anaconda3\envs\osr\lib\site-packages\torch\nn\modules\module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'NLLLoss' object has no attribute 'size_average'
```

Any help would be appreciated. Thank you!

[Paper] Color scale

The work done is impressive. My only major issue is the use of a continuous color scale for the various bar plots. Although pretty, it makes the plots hard to read when matching a bar to the legend. I believe a qualitative color palette would be more readable and more colorblind-friendly.

Also, I did not quite get how the threshold of some methods (typically ODIN) was obtained. Is it optimized by grid search, like the perturbation step size and the temperature?

Hope it will get published soon.
