stream-ad / midas

Anomaly Detection on Dynamic (time-evolving) Graphs in Real-time and Streaming manner. Detecting intrusions (DoS and DDoS attacks), frauds, fake rating anomalies.

License: Apache License 2.0

C++ 83.26% Python 12.05% CMake 3.77% Dockerfile 0.92%
aaai2020 anomaly-detection denial-of-service fraud-detection intrusion-detection

midas's Issues

Any recommendation to normalize score?

Hi,

Thank you for implementing this wonderful AD method!

I've read through your paper, and the score is calculated as:
[image: score formula]

We usually represent time as Unix timestamps, so the scores we get are usually very large. Do you have any recommendations for narrowing the value range?

Thank you!
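
The thread records no reply. One possible approach, offered as a sketch and not as the maintainers' recommendation: rebase the Unix timestamps to small 1-based tick indices before scoring, and compress the resulting scores afterwards. The helper names `rebase_ticks` and `squash_scores` and the `tick_seconds` parameter are hypothetical, and the code assumes timestamps arrive in non-decreasing order.

```python
import math


def rebase_ticks(timestamps, tick_seconds=1):
    """Map raw Unix timestamps to small 1-based tick indices.

    Assumes the timestamps are non-decreasing, so the first one is the earliest.
    """
    start = timestamps[0]
    return [(ts - start) // tick_seconds + 1 for ts in timestamps]


def squash_scores(scores):
    """Compress heavy-tailed, non-negative scores with log1p, then scale into [0, 1]."""
    logged = [math.log1p(s) for s in scores]
    top = max(logged)
    return [x / top if top > 0 else 0.0 for x in logged]


ticks = rebase_ticks([1518566400, 1518566400, 1518566401, 1518566460])
scaled = squash_scores([0.0, 3.0, 1e6])
print(ticks)
print(scaled)
```

The log transform preserves the ranking of the scores, so rank-based metrics such as ROC-AUC are unaffected by it.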

Tagged Releases

Hey, it'd be great to add MIDAS to Homebrew so Mac users can do brew install midas. However, this requires tagged versions. What do you think of tagging releases on GitHub?

Ruby Library

Hey, thanks for this project and research! Just wanted to let you know there are now Ruby bindings for it. If you have any feedback, let me know or feel free to create an issue on the project.

How to decide whether an edge is anomalous?

Given the anomaly score, on what basis does the algorithm decide whether an edge is anomalous or not?
(I've read the paper but couldn't find it.)
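
No answer is recorded here. A common convention for score-based detectors, which is an assumption and not something the paper is quoted as saying in this thread, is to flag the edges whose score exceeds a chosen quantile of the observed scores. The names `quantile_threshold` and `flag_edges` and the 0.8 fraction are illustrative only.

```python
import math


def quantile_threshold(scores, fraction=0.99):
    """Return the score at the given quantile, using the nearest-rank method."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def flag_edges(scores, threshold):
    """Return the indices of edges whose score is strictly above the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


scores = [0.1, 0.2, 0.1, 9.5, 0.3, 0.2, 0.1, 0.4, 0.2, 12.0]
threshold = quantile_threshold(scores, fraction=0.8)
flagged = flag_edges(scores, threshold)
print(threshold, flagged)
```

The fraction is a user choice: a higher value flags fewer edges and trades recall for precision.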

Segmentation fault: 11

Hello, I am currently trying to use MIDAS-R on a dataset, but I get this error right after running it:

$ ./midas -i ../Wednesday-14-02-2018_4GRAPH.csv -o ../scores.txt
Finished Loading Data from ../Wednesday-14-02-2018_4GRAPH.csv
Segmentation fault: 11

Here is a sample of Wednesday-14-02-2018_4GRAPH.csv file:

source,destination,time
1451698946054,901943132206,352877
1451698946054,901943132206,628353
1451698946054,901943132206,973076
1451698946054,901943132206,980110
1451698946054,901943132206,981852
103079215137,1460288880642,1518566400
1322849927169,1047972020228,1518566400
1322849927169,1047972020228,1518566400
1322849927169,1047972020228,1518566400
687194767395,1640677507073,1518566400
1236950581249,1700807049228,1518566400
1322849927169,1047972020228,1518566400
1700807049228,712964571136,1518566400
1322849927169,1047972020228,1518566400
1632087572482,1477468749825,1518566400
1597727834115,94489280524,1518566400
1236950581249,979252543497,1518566400
1580547964930,979252543497,1518566400
1322849927169,1047972020228,1518566400
1116691496960,1047972020228,1518566401
1374389534736,163208757249,1518566401
1116691496960,1047972020228,1518566401
1520418422807,575525617668,1518566401

What is wrong?

Thanks
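
The thread does not record the cause. Two things in the sample stand out, though neither is confirmed as the culprit: the file starts with a header row, and node IDs such as 1451698946054 are larger than the maximum signed 32-bit integer (2147483647). A minimal checker for such problems might look like the sketch below; `check_edge_csv` is a hypothetical helper, and the assumption that the tool parses 32-bit ints is mine.

```python
import csv
import io

INT32_MAX = 2**31 - 1


def check_edge_csv(handle):
    """Return (line_number, problem) tuples for a source,destination,time CSV."""
    problems = []
    previous_time = None
    for line_number, row in enumerate(csv.reader(handle), start=1):
        try:
            # Raises ValueError for non-numeric fields and for wrong field counts.
            src, dst, time = (int(field) for field in row)
        except ValueError:
            problems.append((line_number, "non-integer or malformed row"))
            continue
        if max(src, dst, time) > INT32_MAX or min(src, dst, time) < 0:
            problems.append((line_number, "value outside 32-bit int range"))
        if previous_time is not None and time < previous_time:
            problems.append((line_number, "timestamp decreases"))
        previous_time = time
    return problems


sample = io.StringIO(
    "source,destination,time\n"
    "1451698946054,901943132206,352877\n"
    "103079215137,1460288880642,1518566400\n"
)
problems = check_edge_csv(sample)
for problem in problems:
    print(problem)
```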

Ground truth labels for TwitterWorldCup2014 dataset

I want to run MIDAS on the TwitterWorldCup2014 dataset, but the ground truth provided with it does not label records as 0 or 1. Instead, it contains entries like the following:

1 | Arena de Sao Paulo, Sao Paulo, Brazil | Brazil, Croatia | Marcelo | Own Goal | 6-12-2014 20:11:00 | High importance events.

Could you suggest how to generate 0/1 labels, i.e. anomalous or not?
If you have already prepared ground truth labels for this dataset, could you please share them?

The dataset lists events such as:

  1. Goal
  2. Penalty
  3. Injury

Which of these events should count as anomalies?

Thanks.

Why must source and dest be int?

Hello,

I was wondering why source and dest have to be ints rather than strings. Strings would make more sense (to me), because source and dest are usually IP addresses.
Thanks
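
The thread records no answer. A common workaround, sketched here under the assumption that the input only needs distinct integer IDs per node, is to assign each string label a dense integer in order of first appearance. `NodeEncoder` is a hypothetical helper, not part of MIDAS.

```python
class NodeEncoder:
    """Assign each distinct label (for example an IP address) a dense integer ID."""

    def __init__(self):
        self._ids = {}

    def encode(self, label):
        # The default is evaluated before insertion, so the first label gets 0.
        return self._ids.setdefault(label, len(self._ids))


encoder = NodeEncoder()
edges = [
    ("10.0.0.1", "10.0.0.2", 1),
    ("10.0.0.3", "10.0.0.2", 1),
    ("10.0.0.1", "10.0.0.2", 2),
]
encoded = [(encoder.encode(s), encoder.encode(d), t) for s, d, t in edges]
print(encoded)
```

Keeping the encoder's dictionary around makes it possible to map flagged integer IDs back to the original addresses.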

SyntaxError: print(f"ROC-AUC{indexRun} = {auc:.4f}")

When I run Demo.py, I get the following error, which I couldn't resolve despite many attempts. Why does this happen? (I don't think it is a syntax error, though I couldn't find this syntax anywhere either.)

Seed = 1606470101 // In case of reproduction
#Records = 4554344 // Dataset is loaded
Time = 826ms // Algorithm is finished
// Raw anomaly scores are exported to
// /home/rohit/MIDAS/MIDAS/temp/Score.txt
  File "/home/rohit/MIDAS/MIDAS/util/EvaluateScore.py", line 33
    print(f"ROC-AUC{indexRun} = {auc:.4f}")
    ^
SyntaxError: invalid syntax

The output is still written to Score.txt, though.
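
For context: f-strings were added in Python 3.6, so an older interpreter rejects this line as invalid syntax. Which interpreter ran the script is not stated in the thread. If upgrading is not an option, the same text can be produced with `str.format`, as in this sketch; the values of `index_run` and `auc` are placeholders.

```python
# Placeholder values standing in for the script's variables.
index_run = ""
auc = 0.987654

# Equivalent of f"ROC-AUC{indexRun} = {auc:.4f}" without f-string syntax.
line = "ROC-AUC{} = {:.4f}".format(index_run, auc)
print(line)
```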

Unclear Docker volume binds for Demo

When running the Demo code on Docker, it took me a while to notice that I needed to bind both $PWD/data and $PWD/temp (the latter if I want the raw scores) when running the container. I would suggest adding a section to the README about executing the Demo on Docker, including something like the following snippet:

docker run -it \
	--rm \
	--name midas \
	--volume $PWD/data:/MIDAS/data \
	--volume $PWD/temp:/MIDAS/temp \
	midas

Any thoughts?

Threshold Used For Experimental Results

Hi there, I was attempting to replicate your results on the DARPA dataset, but realized you didn't specify the threshold you used. I understand the threshold is user-defined, but I would like to know what value was used in the experimental setup. Could you please clarify how you calculate the MIDAS(-R) ROC and what threshold you used?

Thanks!
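
A note that may help here: ROC-AUC summarizes performance across all possible thresholds, so computing it does not require picking one. The repository's util/EvaluateScore.py prints a ROC-AUC value (see the traceback in the SyntaxError issue), but how it computes that value is not shown in this thread. The sketch below uses the pairwise definition of AUC with made-up labels and scores; `roc_auc` is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the input size, so only for small samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```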

Should either Dockerize or better specify dependencies

I'm running Ubuntu 18.04 and so created the following initial Dockerfile to get around the cmake version requirements that prevent my following the steps listed in the Demo section of the README:

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update \
    && apt-get install --yes \
      build-essential \
      cmake \
      python-is-python3 \
    && apt-get clean \
    && rm --recursive --force \
      /var/lib/apt/lists/* \
      /tmp/* \
      /var/tmp/*

RUN mkdir /src
WORKDIR /src

COPY CMakeLists.txt ./
RUN mkdir --parents build/release \
    && cp CMakeLists.txt build/release/

COPY example ./example
COPY src ./src
COPY temp ./temp
COPY util ./util

RUN cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release \
    && cmake --build build/release --target Demo

I then build it via

# Wouldn't need to use `sudo` on macOS
sudo docker build . --tag midas

and run the compiled Demo app via

sudo docker run \
  --tty \
  --interactive \
  --rm \
  --volume $PWD/data:/src/data \
  midas \
  build/release/Demo

which, when shelling out to the Python scripts, aborts with the following

Traceback (most recent call last):
  File "/src/util/EvaluateScore.py", line 20, in <module>
    from pandas import read_csv
ModuleNotFoundError: No module named 'pandas'

since pandas is not available.


To better avoid the need for local environment debugging, my personal preference would be for a known-working Dockerfile.

Implementation question: Should I fill in absent data?

Hi,
Thank you for implementing this amazing anomaly detection method!
In my implementation, I'm wondering whether I should fill in absent data.
For example, suppose the directed IP pair A to B appears at 10:00 but is absent at 11:00 and 12:00.
Should I insert A to B with a count of 0 at 11:00 and 12:00?

[image]

Thank you
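
No reply is recorded. As a sketch only: assuming the score has the chi-squared form (a - s/t)^2 * t^2 / (s * (t - 1)), where a is the edge's count in the current tick, s its total count so far, and t the current tick, the mean s/t already divides by all elapsed ticks, so ticks with no records would not need zero rows. `ExactMidas` below is a toy that uses exact dictionary counts in place of the count-min sketches; it is not the library's implementation.

```python
from collections import defaultdict


class ExactMidas:
    """Toy MIDAS-style scorer with exact counts (no count-min sketch)."""

    def __init__(self):
        self.current = defaultdict(int)  # counts within the current tick
        self.total = defaultdict(int)    # counts over all ticks so far
        self.tick = 1

    def score(self, src, dst, t):
        if t > self.tick:
            # A new tick starts: reset the per-tick counts.
            self.current.clear()
            self.tick = t
        key = (src, dst)
        self.current[key] += 1
        self.total[key] += 1
        a, s = self.current[key], self.total[key]
        if t == 1:
            return 0.0  # the formula divides by (t - 1)
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))


midas = ExactMidas()
first = midas.score("A", "B", 1)  # edge seen at tick 1
later = midas.score("A", "B", 4)  # absent at ticks 2 and 3, seen again at tick 4
print(first, later)
```

Here the edge is scored at tick 4 with a = 1, s = 2, and t = 4, without any records having been supplied for ticks 2 and 3.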

1.0 Changes

Hey, I tried to summarize the changes in 1.0 that I encountered while upgrading the Ruby gem. It may be worth adding some version of this to the README to make it easier for others to upgrade. Demo.cpp was really helpful. Assuming src, dst, and times are std::vector<int>:

Version 0.1.0

#include <anom.hpp>

vector<double>* result;
result = midasR(src, dst, times, num_rows, num_buckets, factor);

Version 1.0.0

#include <RelationalCore.hpp>

size_t n = src.size();
std::vector<float> result;
result.reserve(n);

MIDAS::RelationalCore midas(num_rows, num_buckets, factor);
for (size_t i = 0; i < n; i++) {
  result.push_back(midas(src[i], dst[i], times[i]));
}

Use NormalCore for the no relations version.

Other changes:

  • the midas function takes float input and returns a float score (previously took int input and returned a double score)
  • factor is now a float instead of a double
  • there's a new FilteringCore

Production implementation

Hi, first off, this is really cool. I'm a novice coder, and for research I would like to run this on NetFlow data in real time. The only thing is that I'm unsure how it can be integrated into a live environment rather than run on a local dataset. Maybe it's a dumb question, but how should or could this be implemented?
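
The thread records no answer. Since the 1.0 API shown in the "1.0 Changes" issue scores one edge per call, one way to picture a live setup is a loop that reads edges from any line-oriented feed and passes each one to a scorer. The sketch below is an illustration under assumptions: `stream_alerts` is hypothetical, the feed is a plain list, and `repeat_counter` is a stand-in scorer that only counts repeats, not MIDAS.

```python
def stream_alerts(lines, score_edge, threshold):
    """Yield (src, dst, t, score) for each incoming edge scoring above threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the feed
        src, dst, t = (int(field) for field in line.split(","))
        score = score_edge(src, dst, t)
        if score > threshold:
            yield src, dst, t, score


def repeat_counter():
    """Stand-in scorer: returns how often the (src, dst) pair has been seen."""
    seen = {}

    def score_edge(src, dst, t):
        seen[(src, dst)] = seen.get((src, dst), 0) + 1
        return float(seen[(src, dst)])

    return score_edge


feed = ["1,2,1\n", "1,2,1\n", "3,4,2\n", "1,2,2\n"]
alerts = list(stream_alerts(feed, repeat_counter(), threshold=2.0))
print(alerts)
```

In a deployment, `lines` could be any iterable that blocks until the next record arrives, such as `sys.stdin` fed by a flow exporter, and `score_edge` would wrap a real MIDAS core through whichever binding is in use.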

Go package

Hey @bhatiasiddharth, very nice project and research.

Just letting you know that there is now an implementation of MIDAS in Go.

If you have any feedback, let me know or feel free to create an issue on the project.

I've benchmarked it against this project with AUC.py on the DARPA dataset, and the results are similar. :)

How to detect anomalous edges

Hello, I have a question.
The output is an anomaly score for each edge, but how do I determine which edges are anomalous?

And how should the threshold on the anomaly score be defined?

Thanks!
