This README consists of four main parts that briefly describe the workflow of training a Recurrent Neural Network with an attention layer to classify anomalous events in sequence-based embedded software logs. The four parts are: Data Loading and Preprocessing, Model Building, Model Training, and Results Analysis and Visualization. Check the Python notebook for details.
- Load all 15 `.csv` data files and save them as pandas DataFrames.
- Group by the `class` and `event` columns in the DataFrame to get the occurrence count of each event under each class.
| class | event | clean-01 | clean-02 | clean-03 | clean-04 | clean-05 | clean-06 | clean-07 | clean-08 | clean-09 | clean-10 | fifo-ls-01 | fifo-ls-02 | fifo-ls-sporadic | full-while | half-while |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COMM | MSG_ERROR | 6.0 | 6.0 | 8.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 6.0 | 5715 | 5589.0 | 6006.0 | 509.0 | 422.0 |
| | REC_MESSAGE | 17968.0 | 17969.0 | 17963.0 | 18135.0 | 18134.0 | 18147.0 | 18216.0 | 18213.0 | 18260.0 | 18347.0 | 65232 | 66973.0 | 65666.0 | 44802.0 | 45072.0 |
| | REC_PULSE | 24710.0 | 24226.0 | 24173.0 | 24871.0 | 24849.0 | 24358.0 | 24390.0 | 24397.0 | 24442.0 | 24644.0 | 28312 | 28349.0 | 25631.0 | 39342.0 | 39529.0 |
| | REPLY_MESSAGE | 17947.0 | 17950.0 | 17938.0 | 18098.0 | 18103.0 | 18131.0 | 18190.0 | 18180.0 | 18248.0 | 18329.0 | 59477 | 61336.0 | 59627.0 | 44202.0 | 44565.0 |
| | SIGNAL | NaN | 1.0 | 2.0 | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | 2.0 | 37 | 36.0 | 39.0 | NaN | 1.0 |
| | SND_MESSAGE | 18089.0 | 18077.0 | 18073.0 | 18234.0 | 18235.0 | 18247.0 | 18300.0 | 18286.0 | 18373.0 | 18447.0 | 65378 | 67122.0 | 65808.0 | 45149.0 | 45426.0 |
| | SND_PULSE | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 882 | 884.0 | 943.0 | 11226.0 | 11289.0 |
| | SND_PULSE_DIS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 877 | 880.0 | 958.0 | NaN | NaN |
| | SND_PULSE_EXE | 36701.0 | 48219.0 | 60157.0 | 72854.0 | 84809.0 | 96321.0 | 108339.0 | 120360.0 | 132390.0 | 144575.0 | 181312 | 193406.0 | 202078.0 | 172297.0 | 160365.0 |
| CONTROL | BUFFER | 2161.0 | 2158.0 | 2170.0 | 2224.0 | 2242.0 | 2243.0 | 2264.0 | 2278.0 | 2298.0 | 2329.0 | 4845 | 4938.0 | 4751.0 | 4326.0 | 4334.0 |
From the table above, it can be seen that the clean and anomalous files differ considerably in the occurrence counts of different events. For example, the event COMM-SND_MESSAGE normally occurred around 18000 times, while in the anomalous files it occurred around 45000~67000 times. This alone is not an effective way to detect anomalous activity, but it gives a general picture of where in the data the anomalies could reside.
- Load the encoder and decoder model

The architecture of this model is:

input event sequence ------>> encoder (GRU unit) ------>> attention layer ------>> decoder (GRU unit) ------>> output layer
The input is a small segment of the log file, in this case 5 consecutive events, and the target output is the next 5 consecutive events following the input. The general idea is to train this NN model on the inputs and predict the following outputs. Assuming the event sequence patterns of the clean and anomalous files are different, the prediction/test accuracy should differ when using the same model and trained weights.
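A minimal sketch of such an encoder-attention-decoder stack, in the style of the TensorFlow NMT-with-attention tutorial referenced at the end; the vocabulary size, hidden units, and class names here are illustrative assumptions, not the actual `model.py` values:

```python
import tensorflow as tf

VOCAB = 32   # number of distinct event codes (assumed value)
UNITS = 64   # GRU hidden size (assumed value)
Tx = 5       # input sequence length

class Encoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(VOCAB, UNITS)
        self.gru = tf.keras.layers.GRU(UNITS, return_sequences=True, return_state=True)

    def call(self, x):
        # Returns (encoder outputs for attention, final hidden state).
        return self.gru(self.embed(x))

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(UNITS)
        self.W2 = tf.keras.layers.Dense(UNITS)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: decoder state (batch, units); values: encoder outputs (batch, Tx, units)
        score = self.V(tf.nn.tanh(self.W1(values) + self.W2(query[:, None, :])))
        weights = tf.nn.softmax(score, axis=1)          # attention over the Tx positions
        context = tf.reduce_sum(weights * values, axis=1)
        return context, weights

class Decoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(VOCAB, UNITS)
        self.gru = tf.keras.layers.GRU(UNITS, return_state=True)
        self.attention = BahdanauAttention()
        self.out = tf.keras.layers.Dense(VOCAB)         # logits over event codes

    def call(self, x, state, enc_outputs):
        # One decoding step: attend, then feed [context; embedding] to the GRU.
        context, _ = self.attention(state, enc_outputs)
        x = tf.concat([context, self.embed(x)], axis=-1)
        out, state = self.gru(x[:, None, :], initial_state=state)
        return self.out(out), state

# One forward pass on a dummy batch of event sequences.
enc, dec = Encoder(), Decoder()
events = tf.random.uniform((4, Tx), maxval=VOCAB, dtype=tf.int32)
enc_out, state = enc(events)
logits, state = dec(events[:, 0], state, enc_out)
print(logits.shape)  # (batch, VOCAB): scores for the next event
```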
Check the `model.py` file for the details of the encoder, attention, and decoder models.
Check `anomaly_detection_NN_train.ipynb` for details.
The next step is to predict results using the above model and the trained weights of each layer (saved in the `sumitmodel_checkpoint` folder).
The test inputs are processed using the same event sequence length `Tx = 5` as the training data, but with `stride = 5` instead of 2.
All predicted results are saved into `.npy` files for further analysis.
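The windowing and saving step can be sketched as follows (the function and file names here are illustrative, not the notebook's actual ones):

```python
import numpy as np

def make_windows(events, Tx=5, stride=5):
    """Slice a 1-D event-code array into (input, target) pairs:
    each input is Tx consecutive events, the target is the next Tx."""
    X, Y = [], []
    for i in range(0, len(events) - 2 * Tx + 1, stride):
        X.append(events[i:i + Tx])
        Y.append(events[i + Tx:i + 2 * Tx])
    return np.array(X), np.array(Y)

events = np.arange(20)          # dummy event sequence
X, Y = make_windows(events)     # test-time stride = 5 (training used 2)
print(X.shape, Y.shape)         # (3, 5) (3, 5)

# Save results (here the targets stand in for model predictions)
# as .npy for later analysis.
np.save("predictions.npy", Y)
```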
- Set anomaly criteria
As mentioned above, assuming the event sequence patterns of the clean and anomalous files are different, the prediction/test accuracy should differ when using the same model and trained weights.
In the following code, a sequence of 1000 events is used as one input sample; the trained model predicts the outputs, which are compared with the target values to get the misclassification rate.
After predicting outputs for all 10 clean files, the mean and variance of the misclassification rate are calculated, and the criterion is set to (mean + 3 * standard_deviation).
Any 1000-event sequence with a misclassification rate higher than this criterion is deemed an anomalous segment.
In this case, any misclassification rate higher than 0.365 is classified as anomalous.
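A minimal sketch of the criterion computation, using made-up misclassification rates in place of the actual clean-file results:

```python
import numpy as np

# Misclassification rates measured on the 10 clean traces
# (illustrative numbers, not the actual results).
clean_rates = np.array([0.30, 0.31, 0.29, 0.32, 0.30,
                        0.31, 0.30, 0.29, 0.32, 0.31])

# Criterion: mean + 3 standard deviations of the clean-file rates.
threshold = clean_rates.mean() + 3 * clean_rates.std()
print(threshold)

def is_anomalous(rate, threshold):
    """Flag a 1000-event segment whose misclassification rate exceeds the criterion."""
    return rate > threshold
```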
- Visualize anomalous events
Normal sequences
Abnormal sequences A
Abnormal sequences B
- O. M. Ezeme, Q. H. Mahmoud and A. Azim, "DReAM: Deep Recursive Attentive Model for Anomaly Detection in Kernel Events," IEEE Access, vol. 7, pp. 18860-18870, 2019.
- https://www.tensorflow.org/tutorials/text/nmt_with_attention