Comments (4)

davidohana commented on August 22, 2024

@Impelon Many thanks for the detailed report and for your pull request.
I am not able to accept your PR #50 yet, as it seems to fail on the examples you provided yourself.
Instead, as a temporary solution, I tried to improve the current code, and I believe it now handles the cases you provided. Please see PR #51 and the test cases I added, and tell me what you think.

Basically, I changed the mask matching from non-greedy to greedy. I am aware that this is not a full solution, since counterexamples can be constructed in which non-greedy matching gives the correct result. However, I believe greedy matching is better suited to the common case.
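
To illustrate the trade-off, here is a minimal sketch using plain `re`. It is not Drain3's code: the `extract` helper, the mask names, and the sample contents are hypothetical, and it assumes every mask is turned into a single wildcard capture group.

```python
import re

def extract(template, content, greedy):
    # Hypothetical helper (not Drain3's implementation):
    # every <mask> in the template becomes one wildcard capture group.
    wildcard = "(.+)" if greedy else "(.+?)"
    literals = re.split(r"<[^<>]+>", template)
    regex = "^" + wildcard.join(re.escape(text) for text in literals) + "$"
    match = re.match(regex, content)
    return list(match.groups()) if match else []

# The space inside the first parameter is only kept by greedy matching.
print(extract("<NAME> <SIZE>", "my file 10", greedy=False))  # ['my', 'file 10']
print(extract("<NAME> <SIZE>", "my file 10", greedy=True))   # ['my file', '10']

# Counterexample: the space belongs to the last parameter, so non-greedy wins.
print(extract("<ID> <MSG>", "42 disk full", greedy=False))   # ['42', 'disk full']
print(extract("<ID> <MSG>", "42 disk full", greedy=True))    # ['42 disk', 'full']
```

Neither mode can be right for both inputs, because the wildcard carries no information about what each mask is supposed to match.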

BTW I tried to use the code from your PR and the first 2 test cases from test_get_param_list_direct() failed.

I think that it is not possible to provide a full solution for this without having Drain3 use the actual masking instructions when matching.

For example:

template = "<float>.<*>.<float>"
content = "0.15.Test.0.2"
params = template_miner.get_parameter_list(template, content)
expected_params = ["0.15", "Test", "0.2"]
self.assertListEqual(params, expected_params)

Unless Drain3 knows what a float is, it cannot extract the parameters correctly.
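
A rough sketch of why the mask definition matters, again with plain `re` rather than Drain3 itself. The `MASK_PATTERNS` mapping, the float regex, and both helper functions are assumptions made for illustration only.

```python
import re

# Assumed mask definitions; a real configuration may use different regexes.
MASK_PATTERNS = {"float": r"\d+\.\d+", "*": r".+?"}

def build_regex(template, mask_patterns=None):
    # Splitting on a capturing group yields literals at even indexes
    # and mask names at odd indexes.
    pieces = re.split(r"<([^<>]+)>", template)
    parts = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            parts.append(re.escape(piece))
        else:
            # Without mask knowledge, fall back to a generic non-greedy wildcard.
            pattern = mask_patterns[piece] if mask_patterns else ".+?"
            parts.append("(" + pattern + ")")
    return "^" + "".join(parts) + "$"

def extract(template, content, mask_patterns=None):
    match = re.match(build_regex(template, mask_patterns), content)
    return list(match.groups()) if match else []

template = "<float>.<*>.<float>"
content = "0.15.Test.0.2"

print(extract(template, content))                 # ['0', '15', 'Test.0.2']
print(extract(template, content, MASK_PATTERNS))  # ['0.15', 'Test', '0.2']
```

With a generic wildcard the dots inside the floats are indistinguishable from the separating dots; only the mask-aware regex recovers the expected parameters.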

I agree with the two long-term solution approaches you mentioned yourself.
I think the second one (extracting parameters while mining) is better, as it should be more efficient: regexes would not have to be matched twice. If you can contribute either of those, it would be extremely welcome.
For the first approach, the issue of a non-unique mapping from a mask name (e.g. <NUM>) to a regex can be resolved by using an or (|) operator in the mask-matching regex, and users may choose unique mask names if they want to avoid that.
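
The or-operator idea could look roughly like the sketch below. It is only an illustration under assumptions: the `INSTRUCTIONS` list, its regexes, and the helper names are hypothetical and do not reflect Drain3's actual classes or any merged code.

```python
import re
from collections import defaultdict

# Hypothetical masking instructions as (regex, mask name) pairs.
# The mask name "NUM" is deliberately mapped to two different regexes.
INSTRUCTIONS = [
    (r"0x[0-9a-fA-F]+", "NUM"),
    (r"\d+", "NUM"),
    (r"\d+\.\d+\.\d+\.\d+", "IP"),
]

def patterns_by_mask(instructions):
    # Join all regexes that share a mask name into one alternation.
    grouped = defaultdict(list)
    for pattern, mask_name in instructions:
        grouped[mask_name].append("(?:" + pattern + ")")
    return {name: "|".join(alternatives) for name, alternatives in grouped.items()}

def extract(template, content, instructions):
    by_mask = patterns_by_mask(instructions)
    pieces = re.split(r"<([^<>]+)>", template)  # literals at even, mask names at odd indexes
    regex = "^"
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            regex += re.escape(piece)
        else:
            # Unknown mask names fall back to a non-greedy wildcard (an assumption).
            regex += "(" + by_mask.get(piece, ".+?") + ")"
    match = re.match(regex + "$", content)
    return list(match.groups()) if match else []

params = extract("addr <NUM> count <NUM> from <IP>",
                 "addr 0x1F count 42 from 10.0.0.1",
                 INSTRUCTIONS)
print(params)  # ['0x1F', '42', '10.0.0.1']
```

Each alternative is wrapped in a non-capturing group so that the alternation stays local to its mask and the number of capture groups still equals the number of masks in the template.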

from drain3.

Impelon commented on August 22, 2024

@davidohana Thanks as always for the quick response!

> Basically, I changed the mask matching from non-greedy to greedy. I am aware that this is not a full solution, since counterexamples can be constructed in which non-greedy matching gives the correct result. However, I believe greedy matching is better suited to the common case.

I believe you are right that the solution in #51 may be better suited to the most common cases.
Unfortunately, for my application I need to introduce quite a few complex masking patterns, some of which contain spaces.
I believe those will not work with the change from #51.

> I think that it is not possible to provide a full solution for this without having Drain3 use the actual masking instructions when matching.
> [...]
> Unless Drain3 knows what a float is, it cannot extract the parameters correctly.

I agree, indeed this is what my proposal in #50 tries to do.
I've included your tests from #51 and also added the MaskingInstruction-objects required for the new method to work.

> BTW I tried to use the code from your PR and the first 2 test cases from test_get_param_list_direct() failed.

You are right; even with the correct MaskingInstruction objects added, the method from 879593d failed to extract the correct parameters, because the temporary masks it added interfered with other masking patterns.

Edit: I've run into multiple problems using the approach from #50 and decided to scrap it after all.
With your tests and feedback I've now been able to improve upon #50.
The method in 26399e4 is able to pass all new (and old) testcases with valid templates.
Please see my comments in #50 for more information on what changed.

> For the first approach, the issue of a non-unique mapping from a mask name (e.g. <NUM>) to a regex can be resolved by using an or (|) operator in the mask-matching regex, and users may choose unique mask names if they want to avoid that.

I think this is also a good idea, and I believe it to be a good alternative to the changes proposed in #50.
Either #50 or this would be a partial solution to the problem that works in most cases.
I agree with you that the best long-term solution is to extract parameters while mining.


Impelon commented on August 22, 2024

@davidohana I apologize for the many revisions of #50.
The idea behind #50 was too fragile and led to confusing code.

I actually found another situation in which get_parameter_list performs poorly:

>>> parser.get_parameter_list("<memory:8>", "<memory:<number>>")
[]

I came to the conclusion that your proposal to use | to join multiple patterns that share the same mask name is the easier and more elegant solution.
I've implemented it in #52, along with the tests from #51 and a few new tests.
The great news is that this solution handles all test cases and problems I have found so far!


davidohana commented on August 22, 2024

#52 is merged and will be included in the 0.9.9 release of Drain3.
