numenta / nab Goto Github PK

The Numenta Anomaly Benchmark

License: GNU Affero General Public License v3.0

Python 7.07% HTML 0.09% Java 0.30% Jupyter Notebook 92.47% R 0.07%

nab's Introduction

The Numenta Anomaly Benchmark (NAB)

Welcome. This repository contains the data and scripts which comprise the Numenta Anomaly Benchmark (NAB) v1.1. NAB is a novel benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is composed of over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications.

Included are the tools to allow you to run NAB on your own anomaly detection algorithms; see the NAB entry points info. Competitive results tied to open source code will be posted on the Scoreboard. Let us know about your work by emailing us at [email protected] or submitting a pull request.

This readme is a brief overview and contains details for setting up NAB. Please refer to the following for more details about NAB scoring, data, and motivation:

Unsupervised real-time anomaly detection for streaming data - The main paper, covering NAB and Numenta's HTM-based anomaly detection algorithm
NAB Whitepaper
Evaluating Real-time Anomaly Detection Algorithms - Original publication of NAB

We encourage you to publish your results on running NAB, and share them with us at [email protected]. Please cite the following publication when referring to NAB:

Ahmad, S., Lavin, A., Purdy, S., & Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, Available online 2 June 2017, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2017.04.070

Scoreboard

The NAB scores are normalized such that the maximum possible is 100.0 (i.e. the perfect detector), and a baseline of 0.0 is determined by the "null" detector (which makes no detections).

Detector	Standard Profile	Reward Low FP	Reward Low FN
Perfect	100.0	100.0	100.0
ARTime	74.9	65.1	80.4
Numenta HTM*	70.5-69.7	62.6-61.7	75.2-74.2
CAD OSE†	69.9	67.0	73.2
earthgecko Skyline	58.2	46.2	63.9
KNN CAD†	58.0	43.4	64.8
Relative Entropy	54.6	47.6	58.8
Random Cut Forest ****	51.7	38.4	59.7
Twitter ADVec v1.0.0	47.1	33.6	53.5
Windowed Gaussian	39.6	20.9	47.4
Etsy Skyline	35.7	27.1	44.5
Bayesian Changepoint**	17.7	3.2	32.2
EXPoSE	16.4	3.2	26.9
Random***	11.0	1.2	19.5
Null	0.0	0.0	0.0

As of NAB v1.0

* From NuPIC version 1.0 (available on PyPI); the range in scores represents runs using different random seeds.

** The original algorithm was modified for anomaly detection. Implementation details are in the detector's code.

*** Scores reflect the mean across a range of random seeds. The spread of scores for each profile are 7.95 to 16.83 for Standard, -1.56 to 2.14 for Reward Low FP, and 11.34 to 23.68 for Reward Low FN.

**** We have included the results for RCF using an AWS proprietary implementation; even though the algorithm code is not open source, the algorithm description is public and the code we used to run NAB on RCF is open source.

† Algorithm was an entry to the 2016 NAB Competition.

Please see the wiki section on contributing algorithms for discussion on posting algorithms to the scoreboard.

Corpus

The NAB corpus of 58 timeseries data files is designed to provide data for research in streaming anomaly detection. It is comprised of both real-world and artifical timeseries data containing labeled anomalous periods of behavior.

The majority of the data is real-world from a variety of sources such as AWS server metrics, Twitter volume, advertisement clicking metrics, traffic data, and more. All data is included in the repository, with more details in the data readme. Please contact us at [email protected] if you have similar data (ideally with known anomalies) that you would like to see incorporated into NAB.

The NAB version will be updated whenever new data (and corresponding labels) is added to the corpus or other significant changes are made.

Additional Scores

For comparison, here are the NAB V1.0 scores for some additional flavors of HTM.

Numenta HTM using NuPIC v.0.5.6: This version of NuPIC was used to generate the data for the paper mentioned above (Unsupervised real-time anomaly detection for streaming data. Neurocomputing, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2017.04.070). If you are interested in replicating the results shown in the paper, use this version.
HTM Java is a Community-Driven Java port of HTM.
nab-comportex is a twist on HTM anomaly detection using Comportex, a community-driven HTM implementation in Clojure. Please see Felix Andrew's blog post on experiments with this algorithm.
NumentaTM HTM detector uses the implementation of temporal memory found here.
Numenta HTM detector with no likelihood uses the raw anomaly scores directly. To run without likelihood, set the variable self.useLikelihood in numenta_detector.py to False.

Detector	Standard Profile	Reward Low FP	Reward Low FN
Numenta HTMusing NuPIC v0.5.6*	70.1	63.1	74.3
nab-comportex†	64.6	58.8	69.6
NumentaTM HTM*	64.6	56.7	69.2
HTM Java	56.8	50.7	61.4
Numenta HTM*, no likelihood	53.62	34.15	61.89

* From NuPIC version 0.5.6 (available on PyPI).

† Algorithm was an entry to the 2016 NAB Competition.

Installing NAB

Supported Platforms

OSX 10.9 and higher
Amazon Linux (via AMI)

Other platforms may work. NAB has been tested on Windows 10 but is not officially supported.

Initial requirements

You need to manually install the following:

Download this repository

Use the Github links provided in the right sidebar.

Install NAB

Pip:

From inside the checkout directory:

pip install -r requirements.txt
  pip install . --user

If you want to manage dependency versions yourself, you can skip dependencies with:

pip install . --user --no-deps

If you are actively working on the code and are familiar with manual PYTHONPATH setup:

  pip install -e . --install-option="--prefix=/some/other/path/"

Anaconda:

conda env create

Usage

There are several different use cases for NAB:

If you want to look at all the results we reported in the paper, there is no need to run anything. All the data files are in the data subdirectory and all individual detections for reported algorithms are checked in to the results subdirectory. Please see the README files in those locations.
If you want to plot some of the results, please see the README in the scripts directory for scripts/plot.py
If you have your own algorithm and want to run the NAB benchmark, please see the NAB Entry Points section in the wiki. (The easiest option is often to simply run your algorithm on the data and output results in the CSV format we specify. Then run the NAB scoring algorithm to compute the final scores. This is how we scored the Twitter algorithm, which is written in R.)
If you are a NuPIC user and want to run the Numenta HTM detector follow the directions below to "Run HTM with NAB".
If you want to run everything including the bundled Skyline detector follow the directions below to "Run full NAB". Note that this will take hours as the Skyline code is quite slow.
If you want to run NAB on one or more data files (e.g. for debugging) follow the directions below to "Run a subset of NAB".

Run a detector on NAB

cd /path/to/nab
python run.py -d expose --detect --optimize --score --normalize

This will run the EXPoSE detector only and produce normalized scores. Note that by default it tries to use all the cores on your machine. The above command should take about 20-30 minutes on a current powerful laptop with 4-8 cores. For debugging you can run subsets of the data files by modifying and specifying specific label files (see section below). Please type:

python run.py --help

to see all the options.

Running non-Python 3 detectors

NAB is a Python 3 framework, and can only integrate Python 3 detectors. The following detectors must be run outside the NAB runtime and integrated for scoring in a later step. These detectors include:

numenta (Python 2)
numentaTM (Python 2)
htmjava (Python 2 / Java)
twitterADVec (R)
random_cut_forest (AWS Kinesis Analytics)

Instructions on how to run the each detector in their native environment can be found in the nab/detectors/${name} directory. The Python 2 HTM detectors are also provided within a docker image, available with docker pull numenta/nab:py2.7.

Run full NAB

cd /path/to/nab
python run.py

This will run all detectors available in this repository and produce results files. To run non-Python3 detectors see "Running non-Python3 detectors" above.

Note: this option may take many many hours to run.

Run subset of NAB data files

For debugging it is sometimes useful to be able to run your algorithm on a subset of the NAB data files or on your own set of data files. You can do that by creating a custom combined_windows.json file that only contains labels for the files you want to run. This new file should be in exactly the same format as combined_windows.json except it would only contain windows for the files you are interested in.

Example: an example file containing two files is in labels/combined_windows_tiny.json. The following command shows you how to run NAB on a subset of labels:

cd /path/to/nab
python run.py -d expose --detect --windowsFile labels/combined_windows_tiny.json

This will run the detect phase of NAB on the data files specified in the above JSON file. Note that scoring and normalization are not supported with this option. Note also that you may see warning messages regarding the lack of labels for other files. You can ignore these warnings.

nab's People

Contributors

Stargazers

Watchers

Forkers

rhyolight chetan51 simjega evelynmitchell scottpurdy subutai imclab sambitgaan boltzmannbrain tomsilver plexzhang cbaranski bertdg mauricioroman metaml mikev saganbolliger allisonmorgan ywcui1990 rtvt123 niyaziso marionleborgne dylan-fan kaynewest whatragnar ieasydevops umabuj umahub monte3card jonathanstrong fangzheng354 mrchristophrivera linearregression rcrowder nathanielc wakamori contextswitchwang chen0031 morning-wind tupol loubens480 weaver-viii caidongyun mrcslws mathematixy bnkcolon latuji paulhendricks neuroidss lscheinkman balasista leotam yanweifu valentinpuente littlefoxhome pepsalehi suchivora oxtopus zuhaagha nudtchengqing all-seeing-code isandeep jjsong ishvlad trahasch nishanthvanand smirmik aleyasen cogmission rahul100 balajeeu yuyincug redshit shizifan realsightapm ytakashina ccasado go-bears faducoder sahaana rajpurkar dsanno khonegger beeva-pelayogonzalez mahmoud-elshaer zuyangyang gjmulder shou-fujita yilmazyasin taecjee ivannz tobbetripitaka klogo chickenbestlover wangxiaowang allenbunny skibish dantodor wtw358 mkozel92

nab's Issues

Numenta detector parameter optimization

Test detector performance with parameter optimization -- resolution, averaging window for anomaly likelihood, etc.

Includes #93

Better command line UX

Users should be able to easily set parameters, particularly which individual detectors and profiles to run because they take a while. They can do this manually in the code, but it would be much better from command line (with supporting instructions and examples in the readme).

Newbie installation problems

Trying to run NAB, I ran into a problem:

nupic_py27mmm@mmm-U2442:~/nupic/NAB$ python run.py
{'dataDir': 'data',
'detect': False,
'detectors': ['numenta'],
'labelDir': 'labels',
'numCPUs': None,
'optimize': False,
'probationaryPercent': 0.15,
'profilesPath': 'config/user_profiles.yaml',
'resultsDir': 'results',
'score': False,
'thresholdPath': 'config/threshold_config.yaml'}
Proceed? (y/n): y
/home/mmm/nupic/env_py27/local/lib/python2.7/site-packages/pandas/core/frame.py:1771: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
"DataFrame index.", UserWarning)
Traceback (most recent call last):
File "run.py", line 152, in
main(args)
File "run.py", line 65, in main
runner.initialize()
File "/home/mmm/nupic/NAB/nab/runner.py", line 90, in initialize
self.corpusLabel.initialize()
File "/home/mmm/nupic/NAB/nab/labeler.py", line 124, in initialize
self.getLabels()
File "/home/mmm/nupic/NAB/nab/labeler.py", line 158, in getLabels
labels["label"].values[indices] = 1
IndexError: unsupported iterator index

I've tried pulling #15 which mentions some bugfix to labeling, but that gives me another error (so I can't tell w/o a vanilla running)

Possibly remove data file

Because of anomalies close to the probationary period, we may consider removing "realAWSCloudwatch/iio_us-east-1_i-a2eb1cd9_NetworkIn.csv" from the benchmark corpus in the future. For now (NAB v0.1) we're manually setting the windows so they don't overlap the probationary period.

Scoreboard

Need to have a scoreboard of detector results posted.

Cleanup needed

Cleaning up the code throughout NAB is needed. This includes, but not limited to:

Consistent use of variable names and labels
Remove unnecessary code
CSV files -- column headers, names
Comments

There should be no variation between results on OSX and EC2

As a reproducible case:

python run_anomaly.py --inputFile data/artificialWithAnomaly/art_daily_flatmiddle.csv --detector cla

The results file for that command differs between the two platforms. They should be identical.

Output warning for datafiles that don't have entries in the label file

Output warning if there are datafiles without any corresponding entries in the label file. Code should not crash in this case, nor should it silently ignore those files. After giving the warning the code should just ignore that datafile.

Merge overlapping windows

If combined labels overlap, merge them into one anomaly. Justification is (i) multiple anomalies don't occur in the same window, and (ii) although anomalies close together in time may appear distinct, they're likely(?) correlated.

More user profiles needed

Currently we only have the standard profile, where all metrics (TP, FP, FN) are set to 1. There should be at least two more -- cover Type I and Type II errors.

Label combiner should handle overlapping relaxed windows

Label combiner should handle overlapping relaxed windows. If two relaxed windows overlap, they should be merged together (per discussion with Jeff).

run.py doesn't cover all detectors

Running run.py only uses the Numenta detector, but should include the Skyline and random detectors too.

Detector shouldn't get entire dataset

Currently detectors get the entire dataset so that min/max can be computed. Instead it should get min/max passed into it. We can put the min/max values in the label files themselves, as another field. Or we could have the Datafile compute it when it reads in a data file.

Need tests for LabelCombiner class

NAB email address

Create [email protected] contact, as well as mailing list.

Identify raw labels that don't make it into combined labels

For debugging the labeler, it would be helpful to print out the raw labels that do not qualify (beat the threshold) for the combined labels.

Include F1 score metric

F1 score is regarded the standard metric for evaluating algorithms. It would be nice to have nab calculate this as well.

Normalize scores

Scale/normalize the final scores such that adding files to the corpus will not (necessarily) decrease the scores, i.e. appear worse. E.g. the current best DUT may have a NAB score of -7, and then with a new datafile added to the corpus, score a -8, and appear to be performing worse.

We may also want to modify the scoring profiles such that a perfect TP detection actually yields a 1; currently it is 0.98661.

Add customer data

This is a subtask of #132
Pull some examples of customer data and add to NAB (needs to be hand labeled as well).

Adjust probationary period

Because detectors are getting FPs shortly following the probationary period, investigate a larger probationary period for the effect on scores and on overlapping anomaly windows.

Issues #77 #36 are blockers.

Labels overlapping with probationary period

As of PR #97 checkWindows() in labeler.py deletes a window which overlaps with the probationary period; code for throwing a ValueError is commented out. We should decide between throwing out the overlapping window, throwing out the datafile, or some other means of solving the issue.

Details should be added to the NAB writeup.

Scoreboard breakdown

Of the detectors posted to the scoreboard, it may be a nice feature to include a breakdown of which do best/worst for the scoring metrics TP, FP, and FN.
Similarly, we may wish to offer several scoreboards, one for each scoring profile.

NAB versioning

Implement a versioning system for NAB. This will regulate NAB updates including additions to the benchmark dataset, the scoreboard, and any code changes.

Add results

The results directory in the repo is empty. Once we have finalized scoring, this should be populated with results for the three detectors and three profiles.

Data analysis tools

Plotting the anomaly detection results is helpful for debugging, and could also be included in a "NAB Data Analysis" section of the repo, along with the data visualizer. The visualizer does give results plots, but they're not very intuitive (confusing) and not too useful (don't show the ground truth windows).

Need tests for optimizer routines

Add feature to report statistics for each datafile

Add feature to run.py to report statistics for each datafile as well as aggregates: number of records, number of anomalies, min, max, etc.

scores.csv file needs to have a bit more information

After running NAB you get a _scores.csv file with some useful information. This file should contain most (if not all) the information needed to manually calculate the score. For example, the fn column contains the total number of records. Instead it should contain the number of anomaly windows that were unlabeled because that is what the score is based on.

Add end to end tests in python

These tests should replicate the last parts of run_tests.sh. They should output everything to temporary directories and check the results. Then they should explicitly pass or fail.

Setting parameters for specific profiles

We may wish to set Numenta detector parameters specific to scoring profiles.

Scale FPs

All FPs are scored against the total, yet only one TP per window is added to the total. While we are estimating x% of a data file is anomalous (i.e. the anomaly windows), and (1-x)% of the data file is normal data, then TPs are limited to x% of the data, while FPs exist in the other (1-x)%. Thus scaling the FPs by 1/x will level out the contributions made to the total score by TPs and FPs. E.g. if x=10, then the FP weight should be multiplied by 0.10.

Also add this to the writeup.

Description for adding a detector to test

Add to the writeup instructions (flowchart, diagram) as to how a new detector can be added to NAB -- the three methods.
The readme should point to this diagram.

Modify aggregate report to add number of missed windows

Modify aggregate report to add number of windows that were missed. Currently it outputs false negatives as the total number of records, however this value is not used in the scoring function. We only count the number of missed windows.

Final writeup and labeling instructions

The wiki should have the final versions of the NAB writeup and labeling instructions docs.

Consistent use of JSON

Need to be consistent using either simplejson or JSON in the nab files. Convention is a try/except clause.

Incorporate larger windows

Larger window sizes (>10%) are preferred, and windows centered about the true anomaly. This is to reward early detection, even if humans can't see the anomaly visually, and still reward late detection b/c it's better to identify an anomaly late rather than not at all.

numenta / nab Goto Github PK

nab's Introduction

The Numenta Anomaly Benchmark (NAB)

Scoreboard

Corpus

Additional Scores

Installing NAB

Supported Platforms

Initial requirements

Download this repository

Install NAB

Pip:

Anaconda:

Usage

Run a detector on NAB

Running non-Python 3 detectors

Run full NAB

Run subset of NAB data files

nab's People

Contributors

Stargazers

Watchers

Forkers

nab's Issues

Recommend Projects

Recommend Topics

Recommend Org