netmanaiops / donut
WWW 2018: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
Hi, thank you so much for providing this implementation for your paper.
Could you please explain in layman's terms what exactly the test_scores mean with respect to the original time series input?
Hi,
I am trying to run your API, but the error "TypeError: 'ellipsis' object is not iterable" occurs at
"# Read the raw data.
timestamp, values, labels = ..."
I searched and found that this error usually occurs on Python 3.5 and no longer on Python 3.5.3 or above. My Python is 3.5.5. Am I using the API in the wrong way, or are there any other hints?
Thanks in advance.
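In Python, `...` is the literal Ellipsis object, so a line such as `timestamp, values, labels = ...` fails with this TypeError on any Python 3 version; the line is only runnable once `...` is replaced with real loading code. The sketch below is one possible way to do that, not the project's own loader. It assumes a CSV with `timestamp`, `value` and `label` columns (the names used in a later snippet in this thread); the helper name `read_kpi` and the toy rows are hypothetical.

```python
import csv
import io

# Toy stand-in for a file such as sample_data/cpu4.csv (values are made up).
# Assumed columns: timestamp, value, label.
TOY_CSV = """timestamp,value,label
1469376000,0.8473,0
1469376300,-0.0363,0
1469376600,0.0740,1
"""

def read_kpi(fileobj):
    """Return (timestamp, values, labels) lists read from a KPI CSV file object."""
    timestamp, values, labels = [], [], []
    for row in csv.DictReader(fileobj):
        timestamp.append(int(row["timestamp"]))
        values.append(float(row["value"]))
        labels.append(int(row["label"]))
    return timestamp, values, labels

# With a real file: `with open(path, newline="") as f: read_kpi(f)`.
timestamp, values, labels = read_kpi(io.StringIO(TOY_CSV))
print(len(timestamp), values[0], labels[-1])
```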
What license is this under? Might this be patented?
Hello, I'm trying to run the unit test in test_prediction.py. However, the test does not pass.
The environment on my computer is Python 2.7 + TensorFlow 1.9.
Thank you very much; I am looking forward to your reply.
2018-07-20 08:44:39.242085: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
E.
======================================================================
ERROR: test_prediction (__main__.DonutPredictorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_prediction.py", line 37, in test_prediction
res = pred.get_score(values=np.arange(5, dtype=np.float32))
File "/usr/local/lib/python2.7/dist-packages/donut/prediction.py", line 145, in get_score
feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
FailedPreconditionError: Error while reading resource variable donut/p_x_given_z/forward_1/std/dense_5/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/donut/p_x_given_z/forward_1/std/dense_5/kernel/N10tensorflow3VarE does not exist.
[[Node: donut/p_x_given_z/forward_1/std/dense_5/Tensordot/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](donut/p_x_given_z/forward_1/std/dense_5/kernel)]]
Caused by op u'donut/p_x_given_z/forward_1/std/dense_5/Tensordot/ReadVariableOp', defined at:
File "tests/test_prediction.py", line 88, in <module>
tf.test.main()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/test.py", line 64, in main
return _googletest.main(argv)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/googletest.py", line 100, in main
benchmark.benchmarks_main(true_main=main_wrapper)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/benchmark.py", line 344, in benchmarks_main
true_main()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/googletest.py", line 99, in main_wrapper
return app.run(main=g_main, argv=args)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/googletest.py", line 70, in g_main
return unittest_main(argv=argv)
File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
self.runTests()
File "/usr/lib/python2.7/unittest/main.py", line 232, in runTests
self.result = testRunner.run(self.test)
File "/usr/lib/python2.7/unittest/runner.py", line 151, in run
test(result)
File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
return self.run(*args, **kwds)
File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
test(result)
File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
return self.run(*args, **kwds)
File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
test(result)
File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
return self.run(*args, **kwds)
File "/usr/lib/python2.7/unittest/case.py", line 329, in run
testMethod()
File "tests/test_prediction.py", line 37, in test_prediction
res = pred.get_score(values=np.arange(5, dtype=np.float32))
File "/usr/local/lib/python2.7/dist-packages/donut/prediction.py", line 144, in get_score
b_r = sess.run(self._get_score_without_y(),
File "/usr/local/lib/python2.7/dist-packages/donut/prediction.py", line 80, in _get_score_without_y
last_point_only=self._last_point_only
File "/usr/local/lib/python2.7/dist-packages/donut/model.py", line 198, in get_score
p_net = self.vae.model(z=q_net['z'], x=x, n_z=n_z) # notice: x=x
File "/usr/local/lib/python2.7/dist-packages/tfsnippet/utils/reuse.py", line 179, in wrapper
return method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/auto_encoders/vae.py", line 314, in model
x_params = self.h_for_p_x(z)
File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/base.py", line 89, in __call__
return self._forward(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/container/sequential.py", line 78, in _forward
outputs = c(outputs)
File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/base.py", line 89, in __call__
return self._forward(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tfsnippet/modules/container/branch.py", line 126, in _forward
ret[k] = v(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/donut/model.py", line 68, in <lambda>
)(x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 703, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/core.py", line 910, in call
[0]])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 2898, in tensordot
b = ops.convert_to_tensor(b, name="b")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1011, in convert_to_tensor
as_ref=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1107, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1031, in _dense_var_to_tensor
return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref) # pylint: disable=protected-access
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 982, in _dense_var_to_tensor
return self.value()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 659, in value
return self._read_variable_op()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 742, in _read_variable_op
self._dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_resource_variable_ops.py", line 507, in read_variable_op
"ReadVariableOp", resource=resource, dtype=dtype, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
FailedPreconditionError (see above for traceback): Error while reading resource variable donut/p_x_given_z/forward_1/std/dense_5/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/donut/p_x_given_z/forward_1/std/dense_5/kernel/N10tensorflow3VarE does not exist.
[[Node: donut/p_x_given_z/forward_1/std/dense_5/Tensordot/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](donut/p_x_given_z/forward_1/std/dense_5/kernel)]]
----------------------------------------------------------------------
Ran 2 tests in 0.263s
FAILED (errors=1)
Hi Haowen Xu,
I am running Donut on g.csv from sample_data with the default parameter values, but I got an F-score of 0.04. The result is very bad, and I can't understand what is wrong. Can you help me?
It seems to be a huge number; we train the model with n_z = 1.
Will the result be unstable with a smaller n_z in DonutPredictor?
why I am facing this
Why is the M-ELBO loss function implemented as just (1-label)*model['x'].log_prod rather than as a cross-entropy loss?
When calling the reconstruct function in vae.py, it only returns a single (averaged) value, the reconstruction of the input. Is it possible to access the reconstructions before the averaging is done? I want to have the reconstruction for each sampled latent variable separately.
Looking at the reconstruct function:
def reconstruct(self, x, n_z=None, n_x=None):
    """
    Sample reconstructed `x` from :math:`p(x|h(z))`, where `z` is (are)
    sampled from :math:`q(z|h(x))` using the specified observation `x`.

    Args:
        x: The observation `x` for :math:`q(z|h(x))`.
        n_z: Number of intermediate `z` samples to take for each input `x`.
        n_x: Number of reconstructed `x` samples to take for each `z`.

    Returns:
        StochasticTensor: The reconstructed samples `x`.
    """
    with tf.name_scope('VAE.reconstruct'):
        q_net = self.variational(x, n_z=n_z)
        model = self.model(z=q_net['z'], n_z=n_z, n_x=n_x)
        return model['x']
Could this be done from here?
Hi, is there any demo or a more detailed tutorial that shows how to run Donut? Thanks, I am a beginner.
I have 72 test_values, but len(test_score) is 63 (in this case there are no missing values). Why is that? Thanks!
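test_score returns the reconstruction probability of each window. That is, if I have 10 data points and a window length of 5, 6 test scores are returned. The question is: how do I assign these 6 test scores to the 10 data points?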
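The counts reported in this thread all fit the pattern len(scores) = len(values) - window + 1: 10 points with a window of 5 give 6 scores, 72 values with 63 scores imply a window of 10, and 5270 values with 5151 scores match x_dims=120. A minimal sketch of one way to line the scores up with the data follows. It assumes, without confirmation from the authors, that each score describes the last point of its window; the helper name `align_scores` is hypothetical.

```python
import numpy as np

def align_scores(values, scores, window):
    """Pad `scores` with NaN so that it has the same length as `values`.

    Assumption: scores[i] describes the last point of values[i : i + window],
    so the first `window - 1` points have no score.
    """
    values = np.asarray(values)
    scores = np.asarray(scores, dtype=float)
    if len(scores) != len(values) - window + 1:
        raise ValueError("expected len(values) - window + 1 scores")
    aligned = np.full(len(values), np.nan)
    aligned[window - 1:] = scores
    return aligned

values = np.arange(10.0)    # 10 toy data points
scores = -np.arange(6.0)    # 6 toy scores, one per window of length 5
aligned = align_scores(values, scores, window=5)
print(aligned)
```

Equivalently, under the same assumption, scores[i] can be paired with values[i + window - 1] without any padding.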
Hello,
I am using the VAE-based Donut algorithm for anomaly detection. Has the ROCKA clustering algorithm been integrated into Donut yet?
When I try to import donut.
I'm using virtualenv.
Python 3.11.3.
tf 2.14.0.
While this paper mainly focuses on single-dimensional time series, are there extensions to deal with multi-dimensional time series? For example, applying a VAE to the feature vector of each time slice.
Hi, I am interested in your project. Are there any demo datasets that I can apply this model to? Would you please add some datasets to this project?
Dear author,
I'm trying to run Donut on the datasets in the sample_data directory.
I found it quite tricky to get the training process to converge. Would you please share the best parameter setup for these datasets? Or could you please shed some light on the parameter tuning?
Best regards,
flyingkid
I am now reproducing your model, and my loss (averaged over batches) is different from yours. I checked the code, but can't figure out how your loss is calculated.
Hi Haowen Xu,
I am trying to run Donut on the sample dataset cpu4.csv.
I have done the following to load cpu4.csv and build the model:
import pandas as pd
import tensorflow as tf
from donut import Donut
from tensorflow import keras as K
from tfsnippet.modules import Sequential

# Read the raw data.
df = pd.read_csv("sample_data/cpu4.csv")
timestamp, values, labels = df.timestamp, df.value, df.label

# Build the model inside the `model_vs` variable scope.
with tf.variable_scope('model') as model_vs:
    model = Donut(
        h_for_p_x=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        h_for_q_z=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        x_dims=120,
        z_dims=5,
    )

# Training of the Donut model.
# Note: train_values, train_labels, train_missing, mean, std, test_values and
# test_missing are not defined anywhere in this snippet.
from donut import DonutTrainer, DonutPredictor

trainer = DonutTrainer(model=model, model_vs=model_vs)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)
I am not able to understand how to set input_x and input_y.
I am getting this error message:
FailedPreconditionError: Error while reading resource variable model/sequential_1/forward/_1/dense_3/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/sequential_1/forward/_1/dense_3/bias)
Hi,
I tried to use Donut for an anomaly detection project. For some reasons, I separate the processes of restoring the model and prediction, and a problem occurs while restoring the model. Every time I create a Donut model and a DonutTrainer to restore a model from a saved file, a 'Graph' instance is left in memory with 1,400+ unknown back references, even after I have cleared the Donut model and all other possible instances and run garbage collection. When I call restore multiple times, memory keeps increasing until the process is shut down.
I used objgraph.show_growth() to monitor memory, and got this after the call to restore the model had completely finished.
These instances stay in memory until the process terminates. objgraph cannot output a detailed reference graph, since the number of references may be too large. I checked the Donut code and didn't find any suspicious part. Are there any possible reasons for this problem? Thanks.
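Hello Haowen, I have also been using the Donut algorithm for anomaly detection recently. I have already trained several models, but for real-time data I need to dispatch each piece of data to the model that matches its category, so I need to run multiple models in parallel. I load one model in each session and then use that session to run anomaly detection on the data, but an error is raised at the final prediction step, when calling predictor.get_score:
My code is below. Could you help me see what went wrong?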
g1 = tf.Graph()
g2 = tf.Graph()
sess1 = tf.Session(graph=g1)
sess2 = tf.Session(graph=g2)

with sess1.as_default():
    with g1.as_default():
        with tf.variable_scope('model') as model_vs:
            model = Donut(
                h_for_p_x=Sequential([
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                ]),
                h_for_q_z=Sequential([
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                ]),
                x_dims=120,
                z_dims=5,
            )
        trainer = DonutTrainer(model=model, model_vs=model_vs)
        predictor = DonutPredictor(model)
        save_dir = parent_path_model + path_model
        saver = VariableSaver(get_variables_as_dict(model_vs), save_dir,
                              filename="cluster_" + "1" + "_data" + "_" + 'variables.dat',
                              latest_file='latest_' + "cluster_" + "1" + "_data")
        saver.restore()

with sess2.as_default():
    with g2.as_default():
        with tf.variable_scope('model') as model_vs:
            model = Donut(
                h_for_p_x=Sequential([
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                ]),
                h_for_q_z=Sequential([
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                    K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                                   activation=tf.nn.relu),
                ]),
                x_dims=120,
                z_dims=5,
            )
        trainer = DonutTrainer(model=model, model_vs=model_vs)
        predictor = DonutPredictor(model)
        save_dir = parent_path_model + path_model
        saver = VariableSaver(get_variables_as_dict(model_vs), save_dir,
                              filename="cluster_" + str(2) + "_data" + "_" + 'variables.dat',
                              latest_file='latest_' + "cluster_" + str(2) + "_data")
        saver.restore()

with sess1.as_default():
    with sess1.graph.as_default():
        df = pd.read_csv("E:\\智能告警\\测试集-expand5min\\expand5min_obj_0data.csv")
        tmp_timestamps = [time.mktime(time.strptime(t, "%Y-%m-%d %H:%M:%S"))
                          for t in list(df['time'])]
        df['timestamp'] = tmp_timestamps
        df.sort_values(by='timestamp', axis=0, ascending=True)
        values = [float(x) for x in list(df['data'])]
        timestamps = list(df['timestamp'])
        label = np.zeros_like(values, dtype=np.int32)
        # Fill in missing values.
        timestamps, missing, (interp_data, label) = linear_interpolation(
            timestamps, (values, label), mode=False)
        # Standardize to mean 0 and standard deviation 1.
        std_data, mean, std = standardize_obj(interp_data)
        scores = predictor.get_score(std_data, missing)
import os

import tensorflow as tf
from donut import DonutTrainer, DonutPredictor
from construct import model, model_vs
from prepare_data import train_values, train_labels, train_missing, mean, std, test_missing, test_values, values, missing, labels

# Suppress TensorFlow info and warning logs.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

trainer = DonutTrainer(model=model, model_vs=model_vs)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)
    print(test_score)
Hello,
Would it be complicated to add the possibility of using Donut in a higher-dimensional space?
In theory I do not see a strong reason why not, but I wondered whether you think it is feasible and, if so, whether you have an idea how.
Hello, I am trying to interpret the severity of anomalies using the sample data cpu4.csv, following this note:
"Take the negative of the score, if you want something to directly indicate the severity of anomaly."
test_score.size = 5151
test_values.size = 5270
I noticed that the size of test_values doesn't equal test_score. How can I correlate my test data to the score?
Thank you kindly!
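Hello,
I am using the VAE-based Donut algorithm for anomaly detection, but one model can only be trained for one curve. I have many curves, so I need to train many models. Can I write a loop that trains one model per curve?
How can I load a previously trained model and use it directly for prediction, without training again?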
In the paper "Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications",
Section 4 says: "We obtain 18 well-maintained business KPIs ... from a large Internet company."
What is that dataset?
I would like to know which dataset is used in this paper. Thank you very much.
Idk what is this
Dear haowen-xu,
"All the algorithms evaluated in this paper compute one anomaly score for each point. A threshold can be chosen to do the decision: if the score for a point is greater than the threshold, an alert should be triggered."
"We may also enumerate all thresholds, obtaining all F-scores, and use the best F-score as the metric."
I have now obtained the test scores, so my question is: how do I choose the threshold?
thanks!!!
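The quoted "enumerate all thresholds" idea can be sketched as follows. This is a plain point-wise F-score search over labelled data, which may differ from the evaluation protocol used in the paper, and it needs ground-truth labels, so it is an evaluation aid rather than an unsupervised threshold rule. It assumes the severity is the negated Donut score (as suggested by the docstring quoted elsewhere in this thread); the function name `best_f1_threshold` and the toy numbers are hypothetical.

```python
def best_f1_threshold(severity, labels):
    """Try every observed severity as a threshold; return (best_f1, threshold).

    A point is flagged as anomalous when severity >= threshold.
    Labels are 1 for anomaly and 0 for normal.
    """
    best_f1, best_thr = 0.0, None
    for thr in sorted(set(severity)):
        pairs = list(zip(severity, labels))
        tp = sum(1 for s, y in pairs if s >= thr and y == 1)
        fp = sum(1 for s, y in pairs if s >= thr and y == 0)
        fn = sum(1 for s, y in pairs if s < thr and y == 1)
        if tp == 0:
            continue  # precision and recall are both zero or undefined
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return best_f1, best_thr

# Toy scores (larger = more normal) and toy labels; severity is the negated score.
scores = [-0.5, -0.4, -2.3, -0.6, -3.1, -0.7]
labels = [0, 0, 1, 0, 1, 0]
severity = [-s for s in scores]
best_f1, best_thr = best_f1_threshold(severity, labels)
print(best_f1, best_thr)
```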
Hi, how can I set or change the sliding window length? thank you.
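The predict part of the demo does not work when run on its own; it has to follow the training step. When predicting separately, model_vs is missing. How should it be loaded?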
When importing from the donut package, it gets this error while importing the tfsnippet.utils.
Dear author,
I'm trying to run Donut with the sample data "cpu4.csv", and the training losses over 256 epochs are all negative, ranging from -68 to -75. I couldn't find the cause of this. Could you help me?
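One general point that may be relevant, offered as background rather than as a diagnosis of this run: if the loss is a negated log-likelihood bound over continuous values (an assumption about the training objective, not something shown in this thread), a negative loss is not an error by itself. A probability density can exceed 1, so its logarithm can be positive. The sketch below shows this for a Gaussian with a small standard deviation; all numbers are illustrative and are not taken from Donut.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log-density of a univariate Gaussian evaluated at x."""
    return (-0.5 * math.log(2 * math.pi)
            - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)

# With std = 0.1 the density at the mean is about 3.99, so its log is positive.
per_point = gaussian_log_pdf(0.0, 0.0, 0.1)

# Summing over an illustrative window of 120 points and negating the sum
# gives a negative "loss-like" number.
window = 120
neg_log_likelihood = -window * per_point
print(per_point, neg_log_likelihood)
```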
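As in the title: I have spent a long time reading the paper and explanations of it, and have also analysed the source code, but I still don't understand it. Please explain.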
Hi Haowen,
My data set consists of continuous points where each point represents a day, not a minute as in the paper and sample_data.
After training on 1000 points, I provided a list of 240 points with a window size of 120 for the final evaluation. After calling DonutPredictor.get_score on this set of points, I get a final list of 121 scores, all of which are negative numbers. How do I interpret these scores in terms of anomalies?
You mention in the code:
"The larger reconstruction probability, the less likely a point is anomaly. You may take the negative of the score, if you want something to directly indicate the severity of anomaly."
Assume there are only 2 scores: -2.3 and -0.5.
Please help me interpret these results.
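Reading the quoted docstring literally, a larger score means a more normal point, so negating the score gives a severity where larger means more anomalous. A minimal sketch with the two scores from the question follows; whether either point should actually raise an alert depends on a threshold that is not given here.

```python
scores = [-2.3, -0.5]

# Negate the scores: larger severity = more likely to be an anomaly.
severity = [-s for s in scores]

# Index of the point that looks most anomalous under this reading.
most_anomalous = max(range(len(scores)), key=lambda i: severity[i])
print(severity, most_anomalous)
```

Under this reading, the point with score -2.3 is the more suspicious of the two, and the fact that all scores are negative carries no special meaning.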