PU-MIL-AD

PU-MIL-AD is a GitHub repository containing the PUMA [1] algorithm. It accompanies the paper Learning from Positive and Unlabeled Multi-Instance Bags in Anomaly Detection.

PDF: MISSING LINK. It will be available as soon as the paper is online.

Abstract

In the multi-instance learning (MIL) setting, instances are grouped together into bags. Labels are provided only for the bags and not on the level of individual instances. A positive bag label means that at least one instance inside the bag is positive, while a negative bag label restricts all the instances in the bag to be negative. MIL data naturally arises in many contexts, such as anomaly detection, where labels are rare and costly, and one often ends up annotating the label for sets of instances. Moreover, in many real-world anomaly detection problems, only positive labels are collected because they usually represent critical events. Such a setting, where only positive labels are provided along with unlabeled data, is called Positive and Unlabeled (PU) learning. Despite being useful for several use cases, there is no work dedicated to learning from positive and unlabeled data in a multi-instance setting for anomaly detection. Therefore, we propose the first method that learns from PU bags in anomaly detection. Our method uses an autoencoder as an underlying anomaly detector. We alter the autoencoder’s objective function and propose a new loss that allows it to learn from positive and unlabeled bags of instances. We theoretically analyze this method. Experimentally, we evaluate our method on 30 datasets and show that it performs better than multiple baselines adapted to work in our setting.

Contents and usage

The repository contains:

  • PUMA.py, which implements the PUMA algorithm;
  • Notebook.ipynb, a notebook showing how to use PUMA on an artificial 2D dataset;
  • create_ds.py, which generates the artificial 2D dataset used in the Notebook;
  • build_bags.py, the algorithm that we used to create bags for the benchmark datasets, as explained in the paper.

To use PUMA, clone the GitHub repository or simply download the files. The benchmark datasets are available at these links: [DAMI] and [ADBench].

Positive and Unlabeled Multi-instance Anomaly detector (PUMA)

PUMA takes as input a dataset with attributes X in bag shape (e.g., a NumPy array with 3 dimensions) and an array with the bag labels (1 for anomalous, 0 for unlabeled). It works in three steps. First, specify the network structure as well as the key hyperparameters (number of reliable negatives, learning rate, batch_size, epochs, ...). Second, train PUMA using the fit function. Finally, call the decision function, which returns the anomaly probabilities for both bags and instances. See the Notebook for the details.
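The input layout described above can be illustrated with a small toy example. This is only a sketch of the data format, not of PUMA itself. The helper name make_toy_pu_bags, the axis order (bags, instances per bag, features), the bag sizes, and the choice of which positive bags are labeled are all assumptions made for illustration. The actual fit and decision function calls are shown in the Notebook and are not reproduced here.

```python
import numpy as np


def make_toy_pu_bags(seed=0):
    """Build a toy PU multi-instance dataset (hypothetical helper, not part of the repository)."""
    rng = np.random.default_rng(seed)
    n_bags, bag_size, n_features = 6, 4, 2

    # Instance-level ground truth (unknown in practice): 1 = anomaly, 0 = normal.
    instance_labels = np.zeros((n_bags, bag_size), dtype=int)
    instance_labels[0, 1] = 1
    instance_labels[3, 0] = 1
    instance_labels[3, 2] = 1
    instance_labels[5, 3] = 1

    # Attributes in bag shape: a 3D array (assumed axis order: bags, instances, features).
    X = rng.normal(size=(n_bags, bag_size, n_features))
    X[instance_labels == 1] += 5.0  # shift the anomalous instances away from the normal ones

    # MIL assumption: a bag is positive if at least one of its instances is positive.
    true_bag_labels = instance_labels.max(axis=1)

    # PU setting: only some positive bags are labeled 1; every other bag is unlabeled (0).
    y_bags = np.zeros(n_bags, dtype=int)
    y_bags[[0, 3]] = 1  # bag 5 is truly anomalous but stays unlabeled

    return X, y_bags, true_bag_labels


X, y_bags, true_bag_labels = make_toy_pu_bags()
print(X.shape)          # (n_bags, bag_size, n_features)
print(y_bags)           # observed PU bag labels
print(true_bag_labels)  # hidden ground truth, shown only for comparison
```

Note that an unlabeled bag (label 0) is not necessarily normal: in this toy data, one truly anomalous bag carries label 0, which is exactly the situation the PU setting describes.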

Dependencies

The PUMA function requires the following Python packages:

Contact

Contact the author of the paper: [email protected].

References

[1] Perini, L., Vercruyssen, V., Davis, J.: Learning from Positive and Unlabeled Multi-Instance Bags in Anomaly Detection. In: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023.

Contributors

lorenzo-perini, sqrhussain
