Comments (3)
Can you give an example of what you are trying to achieve?
I am working on a log anomaly detection project based on Deep Learning.
The training set is created by parsing a corpus of log sequences. From that training set, the model builds a vocabulary - a set of known log templates.
After the training phase, the model moves on to the anomaly detection phase. Logs arrive in an online, streaming fashion. Based on the previous logs, the model predicts which log templates are most likely to occur next, which makes it possible to decide whether an incoming log (after going through Drain) should be treated as an anomaly.
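To make the detection step concrete, here is a minimal sketch of one common way to do that check. It assumes a top-k rule (a log is flagged when its template is not among the k most probable predictions); the function name, the top_k parameter and the probability dictionary are illustrative and not part of Drain3 or of my actual model.

```python
def is_anomalous(template_probs, observed_id, top_k=3):
    """Return True if the observed template is not among the top_k predictions.

    template_probs: dict mapping template ID -> predicted probability
                    (hypothetical output of the sequence model).
    observed_id:    template ID assigned to the incoming log by the parser.
    """
    # Rank template IDs from most to least probable.
    ranked = sorted(template_probs, key=template_probs.get, reverse=True)
    return observed_id not in ranked[:top_k]


# Example: the model expects template 1 or 2 next.
probs = {0: 0.05, 1: 0.60, 2: 0.30, 3: 0.05}
expected_log = is_anomalous(probs, observed_id=2, top_k=2)
unexpected_log = is_anomalous(probs, observed_id=3, top_k=2)
```

A rule like this only works if every incoming log maps to an ID the model already knows, which is why the vocabulary size matters.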
The point is that the vocabulary should have a constant size once the training phase is over. Calling add_log_message on incoming logs may increase the number of clusters, which could spoil anomaly detection.
A possible solution is to create a new cluster containing a wildcard template (template = '<*>'), so that every log that is not assigned to any other cluster is assigned to the wildcard one.
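Roughly what I have in mind, as a sketch outside of Drain itself: a vocabulary that is frozen after training and reserves one ID for the wildcard template. The class and method names here are hypothetical, and the template strings are made up for illustration.

```python
WILDCARD_TEMPLATE = "<*>"


class FrozenVocabulary:
    """Fixed mapping from template string to integer ID.

    ID 0 is reserved for the wildcard template; any template that was
    not seen during training is mapped to it, so the size never grows.
    """

    def __init__(self, training_templates):
        self._ids = {WILDCARD_TEMPLATE: 0}
        for template in training_templates:
            # setdefault keeps the first ID assigned to a repeated template.
            self._ids.setdefault(template, len(self._ids))

    def __len__(self):
        return len(self._ids)

    def lookup(self, template):
        # Unknown templates fall back to the wildcard ID.
        return self._ids.get(template, self._ids[WILDCARD_TEMPLATE])


vocab = FrozenVocabulary(["user <*> logged in", "disk <*> full", "user <*> logged in"])
known_id = vocab.lookup("disk <*> full")
unknown_id = vocab.lookup("kernel panic at <*>")
```

With this, the model always sees an ID in the range 0..len(vocab)-1, no matter what the parser returns for a new log.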
A few options:
- Just add logs to Drain as usual, but after reaching n clusters, ignore the returned cluster-ID and use some constant ID
- After reaching n clusters, stop ingesting to Drain and build some regex-rules based on Drain templates, and use those instead.
- Modify Drain (send a PR): add a max_clusters configuration option, and in add_log_message(), if there is no match for an existing cluster and the cluster count has reached the limit, return the fallback cluster.
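As an illustration of the second option, here is a minimal sketch that turns template strings into regex rules and classifies new logs without touching Drain. It assumes each <*> stands for exactly one whitespace-free token and that templates are available as a plain dict of cluster ID to template string; the helper names and the fallback ID of 0 are hypothetical, and nothing here uses the Drain3 API.

```python
import re


def template_to_regex(template, wildcard="<*>"):
    """Compile a template into a regex; each wildcard matches one token."""
    literal_parts = [re.escape(part) for part in template.split(wildcard)]
    return re.compile(r"\S+".join(literal_parts))


def build_rules(templates):
    """templates: dict of cluster ID -> template string."""
    return [(cid, template_to_regex(t)) for cid, t in templates.items()]


def classify(message, rules, fallback_id=0):
    """Return the ID of the first matching rule, or fallback_id if none match."""
    # Collapse runs of whitespace so the message lines up with the template.
    normalized = " ".join(message.split())
    for cluster_id, pattern in rules:
        if pattern.fullmatch(normalized):
            return cluster_id
    return fallback_id


rules = build_rules({
    1: "Connected to <*> port <*>",
    2: "Disk usage at <*>%",
})
connected_id = classify("Connected to 10.0.0.1   port 22", rules)
disk_id = classify("Disk usage at 93%", rules)
other_id = classify("Kernel panic", rules)
```

The first option needs even less: keep calling Drain as usual and replace any cluster ID that is not in the trained set with the same constant fallback ID.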