Comments (4)
@Impelon Many thanks for the detailed report and for your pull request.
I am not able to accept your PR #50 yet, as it seems to fail on the examples you provided yourself.
Instead, as a temporary solution, I attempted to improve the current code and now I believe it is able to handle the cases you provided. Please see PR #51 and the test cases I added and tell me what you think.
Basically, I changed the mask matching from non-greedy to greedy. I am aware that this is not a full solution, as counterexamples can be constructed in which non-greedy matching is the better choice. However, I believe that greedy matching is better suited for the common case.
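To make the trade-off concrete, here is a minimal sketch of how greedy and non-greedy wildcards split the same content differently. It is not Drain3's actual code: `template_to_regex` and `extract` are hypothetical helpers that simply turn every `<...>` mask into one capture group.

```python
import re

def template_to_regex(template: str, greedy: bool) -> "re.Pattern[str]":
    """Build a regex from a template, one capture group per <...> mask.

    Hypothetical helper for illustration only, not part of Drain3.
    """
    wildcard = "(.+)" if greedy else "(.+?)"
    # Split on anything that looks like a mask and escape the literal parts.
    literals = re.split(r"<[^<>]+>", template)
    return re.compile("^" + wildcard.join(re.escape(p) for p in literals) + "$")

def extract(template: str, content: str, greedy: bool) -> list:
    """Return the captured parameters, or an empty list if nothing matches."""
    match = template_to_regex(template, greedy).match(content)
    return list(match.groups()) if match else []

# Greedy matching keeps the dotted number together when it comes first...
print(extract("<*>.<*>", "0.15.Test", greedy=True))    # ['0.15', 'Test']
print(extract("<*>.<*>", "0.15.Test", greedy=False))   # ['0', '15.Test']

# ...but a counterexample exists where non-greedy is the better choice.
print(extract("<*>.<*>", "Test.0.15", greedy=True))    # ['Test.0', '15']
print(extract("<*>.<*>", "Test.0.15", greedy=False))   # ['Test', '0.15']
```

Neither mode is right for both inputs, which is why switching the quantifier can only be a partial fix.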
BTW, I tried to use the code from your PR, and the first 2 test cases from test_get_param_list_direct() failed.
I think that it is not possible to provide a full solution for this without having Drain3 use the actual masking instructions when matching.
For example:
template = "<float>.<*>.<float>"
content = "0.15.Test.0.2"
params = template_miner.get_parameter_list(template, content)
expected_params = ["0.15", "Test", "0.2"]
self.assertListEqual(params, expected_params)
Unless Drain3 knows what float is, it cannot extract parameters correctly.
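As a rough illustration of why the mask definition matters, the sketch below matches the same template twice: once with every mask treated as a generic wildcard, and once with a regex for float. The `extract_typed` helper and the `\d+\.\d+` float pattern are assumptions made for this example; they are not taken from Drain3 or from either PR.

```python
import re

def extract_typed(template: str, content: str, patterns: dict) -> list:
    """Extract parameters, using a per-mask regex where one is known.

    Hypothetical helper: `patterns` maps a mask name (without the angle
    brackets) to a regex. Unknown masks fall back to a non-greedy wildcard.
    """
    # With a capturing group, re.split alternates literal text and mask names.
    pieces = re.split(r"<([^<>]+)>", template)
    regex = "".join(
        re.escape(piece) if i % 2 == 0 else "(" + patterns.get(piece, ".+?") + ")"
        for i, piece in enumerate(pieces)
    )
    match = re.fullmatch(regex, content)
    return list(match.groups()) if match else []

template = "<float>.<*>.<float>"
content = "0.15.Test.0.2"

# Without knowing what float is, the dots are split in the wrong places.
print(extract_typed(template, content, {}))                     # ['0', '15', 'Test.0.2']

# With an (assumed) float pattern, the expected parameters come out.
print(extract_typed(template, content, {"float": r"\d+\.\d+"}))  # ['0.15', 'Test', '0.2']
```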
I agree with the two long-term solution approaches you mentioned yourself.
I think that the second one (extracting parameters while mining) is better, as it should be more efficient: regexes would not have to be matched twice. If you can contribute either of those, it would be extremely welcome.
For the first approach, the issue of non-unique mapping from a mask name, e.g. <NUM>, to a regex can be resolved using an or (|) operator in the mask matching regex, and the user may use unique mask names if he/she wants to avoid that.
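A minimal sketch of that idea, assuming the masking configuration is available as (pattern, mask name) pairs: all patterns that share a name are joined with `|` into a single alternative. The `MASKING` list and both helper functions are hypothetical and only illustrate the approach; they are not Drain3's API.

```python
import re
from collections import defaultdict

# Hypothetical masking configuration: two different regexes share the name NUM.
MASKING = [
    (r"0x[0-9a-fA-F]+", "NUM"),
    (r"\d+", "NUM"),
    (r"\d+\.\d+\.\d+\.\d+", "IP"),
]

def patterns_by_mask(masking):
    """Join all patterns that share a mask name with the or (|) operator."""
    grouped = defaultdict(list)
    for pattern, name in masking:
        grouped[name].append(pattern)
    # Each alternative gets a non-capturing group so its own '|' cannot leak out.
    return {name: "|".join(f"(?:{p})" for p in pats) for name, pats in grouped.items()}

def extract_params(template, content, masking):
    """Extract one parameter per mask; unknown masks such as <*> match anything."""
    by_mask = patterns_by_mask(masking)
    pieces = re.split(r"<([^<>]+)>", template)  # alternates literals and mask names
    regex = "".join(
        re.escape(piece) if i % 2 == 0 else "(" + by_mask.get(piece, ".+?") + ")"
        for i, piece in enumerate(pieces)
    )
    match = re.fullmatch(regex, content)
    return list(match.groups()) if match else []

print(extract_params("read <NUM> bytes at <NUM>", "read 512 bytes at 0x1f", MASKING))
# ['512', '0x1f']
```

Giving each regex its own mask name removes the need for the alternation, at the cost of more distinct templates.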
from drain3.
@davidohana Thanks as always for the quick response!
> Basically, I changed the mask matching from non-greedy to greedy. I am aware that this is not a full solution, as counterexamples can be constructed in which non-greedy matching is the better choice. However, I believe that greedy matching is better suited for the common case.
I believe you are right that the solution in #51 may be better suited for the most common cases.
Unfortunately, for my application I need to introduce quite a few complex masking patterns, some of which contain spaces.
I believe that will not work with the change from #51.
> I think that it is not possible to provide a full solution for this without having Drain3 use the actual masking instructions when matching.
> [...]
> Unless Drain3 knows what float is, it cannot extract parameters correctly.
I agree, indeed this is what my proposal in #50 tries to do.
I've included your tests from #51 and also added the MaskingInstruction objects required for the new method to work.
> BTW, I tried to use the code from your PR, and the first 2 test cases from test_get_param_list_direct() failed.
You are right; even with the correct MaskingInstruction objects added, the method from 879593d failed to extract the correct parameters, because the temporary masks it added interfered with other masking patterns.
Edit: I've run into multiple problems using the approach from #50 and decided to scrap it after all.
With your tests and feedback I've now been able to improve upon #50.
The method in 26399e4 passes all new (and old) test cases with valid templates.
Please see my comments in #50 for more information on what changed.
> For the first approach, the issue of non-unique mapping from a mask name, e.g. <NUM>, to a regex can be resolved using an or (|) operator in the mask matching regex, and the user may use unique mask names if he/she wants to avoid that.
I think this is also a good idea and I believe it to be a good alternative to the proposed changes in #50.
Either #50 or this would be a partial solution to the problem that works in most cases.
I agree with you that the best long-term solution is to extract parameters while mining.
@davidohana I apologize for the many revisions of #50.
The idea behind #50 was too fragile and led to confusing code.
I actually found another situation in which get_parameter_list performs poorly:
>>> parser.get_parameter_list("<memory:8>", "<memory:<number>>")
[]
I came to the conclusion that your proposed solution, using | to join multiple patterns with the same mask, is easier and more elegant.
I've implemented this solution in #52, added the tests from #51 and a few new tests.
The great news is that this solution handles all test-cases and problems I found so far!
#52 is merged and will be included in the 0.9.9 release of Drain3.