wxdang / mscred
A TensorFlow implementation of the paper *A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data*.
1. This is a TensorFlow implementation of the following paper: *A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data*.
2. How to use:
   - First, run `generation_signature_matrice.py` to generate the signature matrices.
   - Second, run `convlstm.py` to train and test the model.
   - Finally, run `evalution.py` to evaluate the results.
3. Demo code provided by the authors: https://github.com/7fantasysz/MSCRED.
From what I see, the data in this repository and in the author's repository has the following shape:

(number of time series * length of time series)

However, a multivariate time series should have the following shape:

(number of time series * length of time series * number of features)

Have I missed something, or is there a problem with the data?
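To make the two shapes in question concrete, here is a minimal sketch (the array sizes are made up for illustration and do not come from the repository):

```python
import numpy as np

# Shape as seen in the repository data: one row per (univariate) series.
raw_data = np.random.rand(30, 2000)  # 30 series, 2000 timestamps
print(raw_data.shape)   # (30, 2000) -- a 2-D array

# Shape expected if each series carried several features per timestamp.
multivariate = np.random.rand(30, 2000, 5)  # 30 series, 2000 timestamps, 5 features
print(multivariate.ndim)  # 3
```

Note that a 2-D array of shape (number of series, length) can itself be read as a single multivariate series, with each row as one variable, which may be what the repository intends.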
Can you add a license to this repo?
Some thoughts about choosing a license:
- Stack Exchange
- Choose a License
Regarding line 44 of `generation_signature_matrice.py`:

```python
for t in range(win, self.signature_matrices_number):
```

This does not make sense, since you are only looping over the first 2000 data points. Is the result still correct when so much information is discarded?
The gap time between segments is set to 10, i.e. the start timestamps of two adjacent signature matrices should be 10 apart, but when each signature matrix is generated, the gap time is effectively 1.
In `generation_signature_matrice.py`:

```python
for t in range(win, self.signature_matrices_number):
    raw_data_t = raw_data[:, t - win:t]
    signature_matrices[t] = np.dot(raw_data_t, raw_data_t.T) / win
return signature_matrices
```

I think the gap time should be taken into account when computing `raw_data_t`, e.g.:

```python
for t in range(signature_matrices_number):
    raw_data_t = raw_data[:, t * gap_time:(t * gap_time + window_size)]
    signature_matrices[t] = np.dot(raw_data_t, raw_data_t.T) / window_size
return signature_matrices
```
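The corrected loop above can be wrapped into a self-contained sketch. The function name and example sizes below are mine, not the repository's; the computation follows the suggested fix (segments stepped by `gap_time`, inner product averaged over the window):

```python
import numpy as np

def generate_signature_matrices(raw_data, window_size, gap_time):
    """Compute one signature matrix per segment, stepping by gap_time.

    raw_data: array of shape (n_series, series_length).
    Returns an array of shape (n_matrices, n_series, n_series).
    """
    n_series, series_length = raw_data.shape
    # Number of full windows that fit when starts are gap_time apart.
    n_matrices = (series_length - window_size) // gap_time + 1
    signature_matrices = np.zeros((n_matrices, n_series, n_series))
    for t in range(n_matrices):
        segment = raw_data[:, t * gap_time : t * gap_time + window_size]
        # Pairwise inner products between series, averaged over the window.
        signature_matrices[t] = segment @ segment.T / window_size
    return signature_matrices

# Example: 5 series of length 100, window 10, gap 10 -> 10 matrices of shape 5x5.
mats = generate_signature_matrices(np.random.rand(5, 100), 10, 10)
print(mats.shape)  # (10, 5, 5)
```

Each resulting matrix is symmetric by construction, since it is `S @ S.T` for a segment `S`.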
I suppose your code forgot to add `[0]` to the `np.where` call:

```python
num_anom = len(np.where(error > util.threhold))
```

This will lead to all `valid_anomaly_score` being 2. Maybe you should change it to:

```python
num_anom = len(np.where(error > util.threhold)[0])
```
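To illustrate the bug: `np.where(condition)` with a single argument returns a tuple with one index array per dimension, so `len()` of it is the number of dimensions (2 for a matrix, matching the "always 2" symptom above), not the number of matches. A toy example with made-up values:

```python
import numpy as np

# Hypothetical 2-D error matrix; for 2-D input, len(np.where(...)) is always 2.
error = np.array([[0.1, 0.9],
                  [0.4, 0.8]])
threshold = 0.3

# Wrong: counts the dimensions of the index tuple, not the anomalies.
print(len(np.where(error > threshold)))     # 2 (regardless of the data)

# Right: [0] is the array of matching row indices; its length is the count.
print(len(np.where(error > threshold)[0]))  # 3

# Equivalent and arguably clearer:
print(np.count_nonzero(error > threshold))  # 3
```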
Hi,
Firstly, very nice paper; I quite like the idea and would really like to apply it to other areas, so I'm hoping you are still monitoring this repo and are open to discussion.
I have a fairly straightforward question about the way the model is used and trained:
In Figure 2 of the paper, and in the code, more specifically:

```python
loss = tf.reduce_mean(tf.square(data_input[-1] - deconv_out))
```
It seems that you are using the tensor at the last time step of the input as the model's output target.
Maybe I have missed something obvious, but doesn't that imply that the input contains complete information about the output, i.e. the model can directly "see" the output in the input? In that case, simply "selecting the last tensor of the input" (for example, by setting the weights for those input images to 1 and the rest to 0) would give a perfect estimator.
So my point is: when reconstructing something, shouldn't the input contain a lossy, or at least incomplete, version of the output, instead of complete information about what it is supposed to reconstruct?
I'm running experiments with random walks on my own implementation of the network, and by using the last step of the input as the model's target, I was still able to get very small losses ("reconstructed perfectly"). So I suspect that this is exactly what the model is doing, i.e. selecting one step of the input as the output.
In that case, my guess as to why it still worked is that by "half-training" the model, the trainer was able to adjust the weights for the most common sample patterns, while the learning rate was not high enough to turn the model into a simple "input-selecting model" yet. However, if you had let the training run to convergence, this ability would have been lost, since the model would end up simply "selecting" the output from its inputs.
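The "perfect estimator" concern can be checked with a toy sketch (shapes and the `copy_last` "model" are mine, purely for illustration): if the target is the last step of the input, a model that merely copies that step achieves exactly zero reconstruction loss without learning anything about the series.

```python
import numpy as np

# Hypothetical input: a stack of 5 signature matrices of size 30x30,
# one per time step, mimicking data_input in the loss above.
step, n = 5, 30
data_input = np.random.rand(step, n, n)

def copy_last(x):
    """Degenerate 'model' that just selects the last time step of its input."""
    return x[-1]

deconv_out = copy_last(data_input)

# NumPy analogue of tf.reduce_mean(tf.square(data_input[-1] - deconv_out)).
loss = np.mean(np.square(data_input[-1] - deconv_out))
print(loss)  # 0.0 -- perfect reconstruction with no learning
```

This does not prove the trained network behaves this way, but it shows a trivial zero-loss solution exists whenever the target is contained verbatim in the input.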