marknzed / artimenab.jl
ARTime detector for the Numenta Anomaly Benchmark
License: GNU Affero General Public License v3.0
Hello, is there a version of ARTime that can be used outside of NAB? I am looking for an online structural break detection algorithm and do not necessarily want to use NAB.
Thanks
Hi @markNZed
Firstly, congratulations on taking 1st place on the NAB scoreboard. That is quite an achievement 🎉 and a great contribution.
It is quite amazing that Julia code can run directly in Python, a fine testament to a community effort.
I have a few questions that perhaps you could answer for me.
The implementation in here or in https://github.com/markNZed/NAB/tree/ARTimeNAB more specifically, is aimed at running through the dataset in an iterative manner (as per NAB) to score each data point. Is it possible to process the data set in large batches? For example, could one process 90% of data in one shot for training/learning, not being concerned with anomaly scores (p) in this phase and then iterate the last 10% of the data set as per the ARTimeNAB method and determine anomaly scores (p).
If so, would that be quicker than the iteration method?
I ran a test in which I passed a list of values to jl.ARTime rather than a single value. It returned an object with all the expected data, just as it does for one value. I then iterated over the final part of the data but did not get the expected result (an anomaly that the iterative method does detect). So the approach I tried does not work, and I am wondering whether there is a way to do this that will.
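The warm-up/score split asked about here can be pictured with a generic sketch. It does not use or describe ARTime's API: `ToyDetector`, `process_sample`, and `warm_up_then_score` are hypothetical names, and the detector is a simple running z-score chosen only because it is stateful. The point it illustrates is that an online detector's state is built by feeding samples in order, so a "training" phase is the same loop with the scores thrown away.

```python
from typing import Iterable, List


class ToyDetector:
    """Hypothetical stand-in for a stateful online detector (not ARTime)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def process_sample(self, x: float) -> float:
        """Score x against the state seen so far, then update the state."""
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            score = abs(x - self.mean) / std if std > 0 else 0.0
        else:
            score = 0.0
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return score


def warm_up_then_score(values: Iterable[float], warm_fraction: float = 0.9) -> List[float]:
    """Feed every sample in order; keep scores only for the final part."""
    values = list(values)
    split = int(len(values) * warm_fraction)
    detector = ToyDetector()
    for x in values[:split]:
        detector.process_sample(x)  # state is updated, score is discarded
    return [detector.process_sample(x) for x in values[split:]]


data = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 8.0]
scores = warm_up_then_score(data, warm_fraction=0.9)
```

Whether a split like this is faster, or even possible, with ARTime depends on how its own state is updated, which this sketch does not model.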
Hi @markNZed
When I tried to merge ARTime into the NAB library, I found that it would not work and produced some error messages.
The output from the IDE is:
The error in the code is reported as:
Does the above error report have anything to do with the Python version, or am I missing something?
After I added some printing to the above code, the output of the IDE was:
Hi, I have a quick question about how to decide the threshold for the predicted anomaly. I see there is always a pre-defined threshold in thresholds.json for NAB. How do you decide the threshold in ARTime? Do you find it with the ROC curve or something? What do you think is a good threshold strategy (e.g., a dynamic threshold) for online settings?
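To make the idea of a dynamic threshold concrete, here is a minimal sketch of one common option: flag a score when it exceeds a high quantile of the most recent scores. This is an illustration only, not how ARTime or NAB chooses thresholds; the function name `rolling_quantile_flags` and the parameters `window`, `quantile`, and `min_history` are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterable, List


def rolling_quantile_flags(scores: Iterable[float], window: int = 100,
                           quantile: float = 0.99,
                           min_history: int = 10) -> List[bool]:
    """Flag scores that exceed a quantile of the last `window` scores."""
    history: Deque[float] = deque(maxlen=window)
    flags: List[bool] = []
    for s in scores:
        if len(history) >= min_history:
            ordered = sorted(history)
            idx = min(len(ordered) - 1, int(quantile * len(ordered)))
            threshold = ordered[idx]  # threshold adapts as history changes
            flags.append(s > threshold)
        else:
            flags.append(False)  # not enough history to judge yet
        history.append(s)
    return flags


scores = [0.1] * 20 + [0.9] + [0.1] * 5
flags = rolling_quantile_flags(scores, window=20, quantile=0.95, min_history=10)
```

The threshold is computed from past scores only, so the sketch can run on a stream. A fixed threshold, like the per-detector values in thresholds.json, corresponds to replacing `threshold` with a constant.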
Hi @markNZed
Congrats on taking the top spot in NAB! Your contribution pointed my interest towards this entirely new (to me) area of neuroscience, for which you have my sincerest thanks.
I took ARTime for a spin, and wanted to see how it fares in a streaming scenario (i.e. ~infinite series), but I noticed something which slightly worries me:
It seems that the internal DVFA structures are growing without any limits.
Please take a look at this snippet:
julia> using ARTime, Random, Distributions
julia> Random.seed!(123)
julia> p = ARTime.P(); ARTime.init(-2,2,210000,p)
julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2] + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
0 #Size before any processing
# Let's say we have a slightly noisy sine wave
julia> for x in range(0, 200π, length=10000)
           y = sin(x) + 0.1 * randn()
           ARTime.process_sample!(y, p)
       end
julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2] + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
2788 # Internal struct size after 10K points
# Now there's a longer period where the noise is more pronounced
julia> for x in range(0, 2000π, length=100000)
           y = sin(x) + 0.2 * randn()
           ARTime.process_sample!(y, p)
       end
julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2] + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
27336 # Internal struct size after 110K points
# Noise levels are back down, but the frequency of the sine wave has changed
julia> for x in range(0, 200π, length=100000)
           y = sin(x) + 0.1 * randn()
           ARTime.process_sample!(y, p)
       end
julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2] + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
237422 # Internal struct size after 210K points
julia> p.cs.art.n_categories
6983
julia> p.cs.art.n_clusters
739
(I'm aware that there are more internal state variables than W, M, and Me, but they grow at a similar pace, so I omitted them here.)
I know that this example is a bit nasty, but this is just to illustrate something that I also see on my real data i.e., that with enough time, the ARTime process will eventually run out of memory and crash (which is not the case for e.g., HTM). It seems that the DVFA never ceases to create new clusters and categories.
Is this an intentional behavior (or maybe some sort of optimization for NAB)?
Is there any way to limit the memory usage (or e.g., somehow compact the current state) without forgetting catastrophically (i.e. full state reset)?
I would like the algorithm to keep adapting to the stream (rather than rely on a fixed learned state), but it seems to have an unlimited appetite for memory.
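The sizes reported in the session above can be turned into a per-sample growth rate for each phase, which makes the trend easier to read. The sketch below uses only the four figures quoted in the session; the helper name `growth_per_sample` is made up for this example.

```python
# (samples processed so far, reported combined size of W, M and Me)
checkpoints = [(0, 0), (10_000, 2788), (110_000, 27336), (210_000, 237422)]


def growth_per_sample(points):
    """Average growth in stored elements per processed sample, per phase."""
    rates = []
    for (n0, s0), (n1, s1) in zip(points, points[1:]):
        rates.append((s1 - s0) / (n1 - n0))
    return rates


rates = growth_per_sample(checkpoints)
for (n0, _), (n1, _), r in zip(checkpoints, checkpoints[1:], rates):
    print(f"samples {n0:>7} to {n1:>7}: {r:.3f} elements per sample")
```

On these figures the first two phases grow by roughly a quarter of an element per sample, and the final phase by about two elements per sample, so the growth rate does not level off within this run.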
I'm really impressed by this method and am going to use it as a comparison baseline in my experiments.
Can I interpret it as an advanced version of the Hierarchical Temporal Memory method? Also, is this method an unsupervised and online learning method?
Sorry to bother you, but I got an error after running:
python run.py -d ARTime --detect --optimize --score --normalize --skipConfirmation
ERROR: LoadError: setfield!: const field .name of type TypeName cannot be changed
Stacktrace:
 [1] setproperty!(x::Core.TypeName, f::Symbol, v::Symbol)
   @ Base ./Base.jl:39
 [2] top-level scope
   @ ~/.julia/packages/RedefStructs/JMYNd/src/RedefStructs.jl:138
 [3] include
   @ ./Base.jl:419 [inlined]
 [4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
   @ Base ./loading.jl:1554
 [5] top-level scope
   @ stdin:1
Hi, I really like your method and want to use it as a comparison baseline. Currently, I'm running ARTime under NAB. I think I have set everything up correctly, and I tried to run the following command:
python run.py -d ARTime --detect --optimize --score --normalize --windowsFile labels/combined_windows_new.json
However, I encounter the following error. I have spent a lot of time on it and don't know how to fix it. I hope you can provide some guidance.
Running detection step
0: Beginning detection with ARTime for realAWSCloudwatch/ec2_cpu_utilization_77c1ca.csv
2: Beginning detection with ARTime for realAWSCloudwatch/ec2_network_in_5abac7.csv
1: Beginning detection with ARTime for realAWSCloudwatch/ec2_disk_write_bytes_1ef3de.csv
signal (11): Segmentation fault
in expression starting at none:0
Allocations: 2494562 (Pool: 2493386; Big: 1176); GC: 2
Segmentation fault (core dumped)