
java-grok's Introduction

thekrakken

Web site

More coming next

java-grok's People

Contributors

anthonycorbacho, fbacchella, hayatbehlim, joschi, keitaf, leemoonsoo, libujacob, m-rogers, manuelprinz, msathaia, noord, ottobackwards, palmerabollo, paulwellnerbou, retoo, ruanwenjun, samuelbucheliz, sergueik, sherzberg, trixpan, wouterdb


java-grok's Issues

Issue with HTTPDATE pattern.

Hi,
I came across a problem using the HTTPDATE pattern while parsing a log, and I'm getting the following error. It seems java-grok does not accept that pattern at all. Can you add a fix for this? The error I'm referring to is:

Caused by: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 1488 (?<name0>(?:(?<name1>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))|(?<name2>(?:(?<name3>((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|(?<name4>(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])))))) - (?<name5>.*?) \\[(?<name6>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+))))\\] \"-\" (?<name15>(?:(?<name16>(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))) -

And the original pattern I used is:

"%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \\\\[%{HTTPDATE:[apache2][access][time]}\\\\] \\\"-\\\" %{NUMBER:[apache2][access][response_code]} -"

[question] Match a log against two grok expressions

Imagine you want to match a log against two grok expressions, to see if the log matches any of the grok expressions.

Is that possible? The following code doesn't work:

        Grok g = new Grok();
        g.addPattern("DATA", ".*?");
        g.addPattern("NONNEGINT", "\\b(?:[0-9]+)\\b"); // "\\b" is a word boundary; "\b" would be a backspace character
        g.compile("%{DATA}");
        g.compile("%{NONNEGINT}");
        Match m = g.match("hello");
        m.captures();
        System.out.println(m.isNull()); // true

But it works with a single call to addPattern:

        Grok g = new Grok();
        g.addPattern("DATA", ".*?");
        g.compile("%{DATA}");
        Match m = g.match("hello");
        m.captures();
        System.out.println(m.isNull()); // false

I think I'm missing something.
Thanks
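
The second `compile` call most likely replaces the expression compiled by the first, so only `%{NONNEGINT}` ends up being tested against "hello". Below is a minimal sketch of the "first expression that matches wins" idea, written against plain `java.util.regex` so that it runs without java-grok. The class `FirstMatch` and its methods are hypothetical names; with java-grok the same loop would hold one `Grok` instance per expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of java-grok: keeps one compiled pattern per
// expression and reports the first one that matches the input.
class FirstMatch {
    private final List<Pattern> patterns = new ArrayList<>();

    void add(String regex) {
        patterns.add(Pattern.compile(regex));
    }

    // Returns the regex of the first pattern found in the text, if any.
    Optional<String> firstMatching(String text) {
        for (Pattern p : patterns) {
            if (p.matcher(text).find()) {
                return Optional.of(p.pattern());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FirstMatch fm = new FirstMatch();
        fm.add("\\b(?:[0-9]+)\\b"); // NONNEGINT
        fm.add(".*?");              // DATA
        System.out.println(fm.firstMatching("hello").isPresent());
    }
}
```

The order in which expressions are added decides which one wins when several match, so the more specific expressions should be added first.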

Matched JSON is blank if any one of multiple regexes does not match

java-grok returns an empty JSON object when one of several sub-patterns does not match.
Example:
Grok grok = Grok.create("patterns/patterns");
grok.compile("%{NUMBER:hits} %{USER:word}");
String s = "234\n";
Match gm = grok.match(s);
gm.captures();
System.out.println(gm.toJson());

In this case only %{NUMBER:hits} matches and %{USER:word} does not. I expect java-grok to return the JSON {"hits":234}, but it returns {}.

Broken test

Commit e6cbed4 introduced ApacheDataTypeTest. The test seems to be broken, I get a build failure for every commit after (and including) this one (git bisect). I have not looked deeper into the issue yet.

Best,
Manuel

Match is a singleton, which creates very weird behavior.

Hi,

When I have multiple instances of grok, or even just use grok in a loop, grok behaves very oddly because Match is a singleton.

For example, if I use the same grok pattern twice and grok matches the first time but not the second time, the second call returns a match nonetheless.

In the code (Grok.java: 250-260):

    Match match = Match.getInstance();
    if (m.find()) {
        match.setSubject(text);
        match.setGrok(this);
        match.setMatch(m);
        match.setStart(m.start(0));
        match.setEnd(m.end(0));
    }
    return match;

So, the first time m.find() is true and the singleton is filled with values.
The second time m.find() is false, so, even though there is no match, it remains filled with values.

I can see no reason why Match should be a singleton and propose that it become a normal class.

I can submit a patch if required.
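
A minimal sketch of the proposed change, using plain `java.util.regex` rather than java-grok's own classes. The hypothetical `SimpleMatch` below is created fresh on every call, so a failed match can never inherit state from an earlier successful one. The names `SimpleMatch` and `match` are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Match: a plain value object, one per call.
class SimpleMatch {
    final String subject; // null when nothing matched
    final int start;
    final int end;

    private SimpleMatch(String subject, int start, int end) {
        this.subject = subject;
        this.start = start;
        this.end = end;
    }

    boolean isNull() {
        return subject == null;
    }

    // Builds a new result for every call instead of reusing a shared instance.
    static SimpleMatch match(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        if (m.find()) {
            return new SimpleMatch(text, m.start(0), m.end(0));
        }
        return new SimpleMatch(null, -1, -1);
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        System.out.println(match(digits, "abc 123").isNull());   // input contains digits
        System.out.println(match(digits, "no digits").isNull()); // input contains none
    }
}
```

Because every result is an independent object with final fields, this shape is also safe to use from several threads at once.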

GROK Multiline log parsing

I am trying to parse multiline logs using Grok, but the result omits everything after the newline. Example code below.

String log = "a|b|c|d"+"\n"+"e";
Pattern = (?m)(?<ErrMsg>.*)

Output is = ErrMsg = a|b|c|d

Any help would be appreciated!
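
In `java.util.regex`, `(?m)` (MULTILINE) only changes where `^` and `$` match; it is `(?s)` (DOTALL) that lets `.` match line terminators. Here is a short sketch with plain Java regex, on the assumption that java-grok hands the expression to the regex engine unchanged. The class name `DotallDemo` is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Returns the ErrMsg group of the first match, or null if nothing matches.
    static String errMsg(String regex, String log) {
        Matcher m = Pattern.compile(regex).matcher(log);
        return m.find() ? m.group("ErrMsg") : null;
    }

    public static void main(String[] args) {
        String log = "a|b|c|d" + "\n" + "e";
        // MULTILINE: '.' still stops at the newline.
        System.out.println(errMsg("(?m)(?<ErrMsg>.*)", log));
        // DOTALL: '.' also matches the newline, so both lines are captured.
        System.out.println(errMsg("(?s)(?<ErrMsg>.*)", log));
    }
}
```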

Grok compiler

We are using this library to parse messages in our Spark job. Since the Grok compiler is not serializable, we end up creating a new instance for each message we process. Would it be possible to have the Grok compiler class implement Serializable?

NamedRegexCollection with namedOnly

Is there a way to get NamedRegexCollection with namedOnly?

Would this code work if I modified it like this?

if (namedOnly && group.get("subname") == null) {
    replacement = String.format("(?:%s)", definitionOfPattern);
    namedRegexCollection.put("name" + index, group.get("name"));
}
namedRegex =
    StringUtils.replace(namedRegex, "%{" + group.get("name") + "}", replacement,1);

Feature request: Maintain order for captures?

I have a rare requirement to obtain the captures in strict order. This can be easily accomplished by replacing:

capture = new HashMap<>();

With

capture = new LinkedHashMap<>();

in Match.java

Kindly consider including this in the next release.
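
For context, `HashMap` makes no guarantee about iteration order, while `LinkedHashMap` iterates in insertion order. A small sketch of the property being asked for; the class name and field names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CaptureOrder {
    // Stores captures in the order they are added and returns the key order.
    static List<String> keysInInsertionOrder(String... names) {
        Map<String, Object> capture = new LinkedHashMap<>();
        for (String name : names) {
            capture.put(name, null);
        }
        return new ArrayList<>(capture.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInInsertionOrder("timestamp", "level", "message"));
    }
}
```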

Discovery feature doesn't work properly

I tried to use the discovery feature. I am not sure I used it correctly, but the method always returned the exact input string. I would be grateful for guidance on how to use it. Consider the simple test below, in which the out variable ends up as 1234.

    @Test
    public void dtest() {
        try {
            Grok g = Grok.create(ResourceManager.PATTERNS);
            Discovery disc = new Discovery(g);

            final String out = disc.discover("1234");
            System.out.println("discovered type=" + out);

        } catch (GrokException ex) {
            Logger.getLogger(DiscoveryTest.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

Grok is truncating data after new line

Hi, I have a log like this:
17/11/28 08:49:32 ERROR ApplicationMaster: User class threw exception: java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)....

Running test code with the pattern (?(ERROR))(?(?m:.*))

Output is only 17/11/28 08:49:32 ERROR ApplicationMaster: User class threw exception: java.lang.reflect.InvocationTargetException

Expected output is complete log from ERROR.

Please help me on this.

Changing the namedOnly boolean of Grok.compile() changes the matching pattern

I tried the following code:

@Test
public void TestBug() throws ProcessorException, GrokException {
    Grok grok = new Grok();
    grok.addPattern("BOM", "\\xEF\\xBB\\xBF");
    grok.addPattern("GREEDYDATA", ".*");
    grok.addPattern("LINE", "(%{BOM}?%{GREEDYDATA:message})?");
    grok.compile("%{LINE}", false);
    Match gm = grok.match("themessage");
    gm.captures();
    System.out.println(gm.toMap());
    System.out.println("    " + gm.getMatch().pattern());
}

And get what I expected:

{BOM=null, LINE=themessage, message=themessage}
    (?<name0>((?<name1>\xEF\xBB\xBF)?(?<name2>.*))?)

But if I switch the line

    grok.compile("%{LINE}", false);

to

    grok.compile("%{LINE}", true);

The matching failed, and I get:

{message=null}
   (\xEF\xBB\xBF?(?<name2>.*))?

The match result changed, although I only changed which groups I wanted returned. Look at the generated regex: it goes from (?<name1>\xEF\xBB\xBF)? to \xEF\xBB\xBF?. In the first case, the whole sequence \xEF\xBB\xBF is optional. In the second case, only the last escape, \xBF, is. Changing the value of namedOnly in compile should not change the returned values. The pattern in the second case should be
(?:((?:\xEF\xBB\xBF)?(?<name2>.*))?). The unwanted named groups should be replaced by non-capturing (?:...) groups, not dropped entirely.
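
The difference in quantifier scope can be checked with `java.util.regex` alone. This is a small sketch with the outer wrapper left out for brevity; `OptionalGroupDemo` is an illustrative name.

```java
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // The '?' binds only to the last escape: \xEF and \xBB stay mandatory.
    static final Pattern DROPPED_GROUP = Pattern.compile("\\xEF\\xBB\\xBF?.*");
    // A non-capturing group keeps the whole BOM optional without naming it.
    static final Pattern NON_CAPTURING = Pattern.compile("(?:\\xEF\\xBB\\xBF)?.*");

    static boolean matches(Pattern p, String text) {
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(DROPPED_GROUP, "themessage"));
        System.out.println(matches(NON_CAPTURING, "themessage"));
    }
}
```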

Match class bug

I'd like to propose a simple code fix:

    /**
     * Remove from the string the quote and double quote.
     *
     * @param value string to pure: "my/text"
     * @return unquoted string: my/text
     */
    private String cleanString(String value) {
        if (value == null) {
            return null;
        }
        if (value.isEmpty()) {
            return value;
        }
        char[] tmp = value.toCharArray();
        if ((tmp[0] == '"' && tmp[value.length() - 1] == '"')
            || (tmp[0] == '\'' && tmp[value.length() - 1] == '\'')) {
            if (value.length() != 1) {
                value = value.substring(1, value.length() - 1);
            } else {
                value = "";
            }
        }
        return value;
    }

Currently, the method throws a StringIndexOutOfBoundsException when the string consists of a single quote character, such as ' or ".
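
The failure can be reproduced with `String.substring` alone: for a one-character string, `substring(1, length() - 1)` becomes `substring(1, 0)`, which is out of range. The sketch below assumes the current method lacks the length check. `QuoteStripper`, `unguarded` and `stripQuotes` are hypothetical names that mirror the proposal rather than the library's code.

```java
class QuoteStripper {
    // Unguarded variant: throws StringIndexOutOfBoundsException for "\"" or "'".
    static String unguarded(String value) {
        return value.substring(1, value.length() - 1);
    }

    // Guarded variant: a lone quote character is reduced to the empty string.
    static String stripQuotes(String value) {
        if (value == null || value.isEmpty()) {
            return value;
        }
        char first = value.charAt(0);
        char last = value.charAt(value.length() - 1);
        boolean quoted = (first == '"' && last == '"') || (first == '\'' && last == '\'');
        if (!quoted) {
            return value;
        }
        return value.length() == 1 ? "" : value.substring(1, value.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"my/text\""));
        System.out.println(stripQuotes("'").isEmpty());
    }
}
```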

Grok is slow

I'm using Code Tools: jmh to bench grok against java's regex.

The result for the following simple code:

    private io.thekraken.grok.api.Grok grok;
    private Pattern syslog;

    @Setup
    public void prepare() throws GrokException {
        grok = new io.thekraken.grok.api.Grok();
        grok.addPattern("NONNEGINT", "\\b(?:[0-9]+)\\b");
        grok.addPattern("GREEDYDATA", ".*");
        grok.compile("^<%{NONNEGINT:syslog_pri}>%{GREEDYDATA:message}", false);
        syslog = Pattern.compile("<(?<syslogpri>\\b(?:[0-9]+)\\b)>(?<message>.*)");
    }

    @Benchmark
    public Match grokSpeed() {
        Match gm = grok.match("<1>totor");
        gm.captures();
        assert gm.toMap().get("syslog_pri") != null;
        assert gm.toMap().get("message") != null;
        return gm;
    }

    @Benchmark
    public Matcher javaRegexSpeed() {
        Matcher m = syslog.matcher("<1>totor");
        m.matches();
        assert m.group("syslogpri") != null;
        assert m.group("message") != null;
        return m;
    }

returns, on an Intel Xeon E312xx:

GrokSpeed.grokSpeed       avgt    5  1.952 ± 0.053  us/op
GrokSpeed.javaRegexSpeed  avgt    5  0.165 ± 0.005  us/op

That's nearly 12 times slower!

The full Maven project for running the tests is fbacchella/grokspeed. It's run with mvn clean package && java -jar target/grokspeed.jar

Drop Maven support?

The project builds well with Gradle. Travis CI is using Gradle to build it. My IDE gets confused about the two build systems (due to the POM file). Both build systems have to be kept in sync manually. So my question is: Should we drop the POM file / Maven support? Is it really worth maintaining both?

"Duplicate key" error when using repeated matches with the same name and type

When I try to compile the following with 0.1.9:

"%{NUMBER:myValue:int} %{NUMBER:myValue:int}";

I get:

java.lang.IllegalStateException: Duplicate key myValue (attempted merging values INT and INT)
	at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
	at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
	at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1675)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at io.krakens.grok.api.Converter.getGroupTypes(Converter.java:85)
	at io.krakens.grok.api.Grok.<init>(Grok.java:72)
	at io.krakens.grok.api.GrokCompiler.compile(GrokCompiler.java:197)
	at io.krakens.grok.api.GrokCompiler.compile(GrokCompiler.java:124)
	at io.krakens.grok.api.GrokCompiler.compile(GrokCompiler.java:120)

However, when I try:

"%{NUMBER:myValue} %{NUMBER:myValue}"

It works just fine.

So maybe a different version of toMap needs to be called, maybe the one that takes BinaryOperator<U> mergeFunction as the third argument? 🤔 Not sure what merge function to pass it, though.
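
For reference, the two-argument `Collectors.toMap` throws `IllegalStateException` on duplicate keys, while the three-argument overload takes a merge function that decides which value survives. This sketch assumes both occurrences carry the same type, so keeping the first is harmless. `GroupTypes`, its `Type` enum and the `name:TYPE` input strings are made up for illustration and do not reflect `Converter.getGroupTypes`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupTypes {
    enum Type { INT, STRING }

    // Each entry is "fieldName:TYPE"; duplicate field names keep the first type.
    static Map<String, Type> collect(List<String> entries) {
        return entries.stream()
            .map(e -> e.split(":"))
            .collect(Collectors.toMap(
                parts -> parts[0],
                parts -> Type.valueOf(parts[1]),
                (first, second) -> first));
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList("myValue:INT", "myValue:INT")));
    }
}
```

A stricter merge function could throw when the two types differ, which would keep conflicting declarations such as `myValue:int` and `myValue:string` an error.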

Custom date format doesn't work with hyphens

Custom date formats (e. g. %{PATTERN:my_date;datetime;dd/MM/yyyy}) don't seem to work if hyphens ('-') are being used in the date pattern.

If the date pattern contains a hyphen, Grok fails with the following exception:

java.util.regex.PatternSyntaxException: Illegal repetition near index 4
Foo %{DATA:result;date;yyyy-MM-dd} Bar
    ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.closure(Pattern.java:3157)
    at java.util.regex.Pattern.sequence(Pattern.java:2134)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1054)
    at com.google.code.regexp.Pattern.buildStandardPattern(Unknown Source)
    at com.google.code.regexp.Pattern.<init>(Unknown Source)
    at com.google.code.regexp.Pattern.compile(Unknown Source)
    at oi.thekraken.grok.api.Grok.compile(Grok.java:376)
    at io.thekraken.grok.api.GrokTest.test020_datetime_pattern_with_with_hyphens(GrokTest.java:572)

I've added some tests in joschi/java-grok@a84a1c0efce25885bfc8cb68d30954540bd7a2b8 to illustrate the issue (18 and 19 are working, 20 will fail with the exception above).
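
The exception text suggests that the `%{...}` token was not recognized as a grok reference and reached `java.util.regex` verbatim, where a `{` following a literal is parsed as a repetition quantifier. That part can be reproduced without java-grok; the class name `RawTokenDemo` is illustrative.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RawTokenDemo {
    // Returns the regex engine's error description, or null if the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The unexpanded grok token is not a valid Java regex.
        System.out.println(compileError("Foo %{DATA:result;date;yyyy-MM-dd} Bar"));
        // What the expression should look like once DATA has been expanded.
        System.out.println(compileError("Foo (?<name0>.*?) Bar"));
    }
}
```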

package name is not compliant with JAVA naming conventions

I noticed this today by accident:

Your Maven and Gradle configuration suggests your namespace is:

<dependency>
  <groupId>io.thekraken</groupId>
  <artifactId>grok</artifactId>
  <version>0.1.4</version>
</dependency>

Yet, your package names are:

oi.thekraken.grok.api

Note how your packages start with oi, while your groupId starts with io.

Traditionally your packages would start with 'io', just as Apache Spark uses org.apache.spark rather than gro.apache.spark.

Valid patterns failing with "Deep recursion pattern"

The valid grok patterns for parsing Postfix logs available at https://github.com/whyscream/postfix-grok-patterns/blob/f0ec34dcc6250463a30ba2077d8afa89ee1a17a1/postfix.grok fail to work with this grok library if the "only named captures" option is being used.

Logstash seems to have no problems with these patterns.

Exception (for the pattern %{POSTFIX_SMTPD}):

io.thekraken.grok.api.exception.GrokException: Deep recursion pattern compilation of %{POSTFIX_SMTPD}
	at io.thekraken.grok.api.Grok.compile(Grok.java:355)

Graylog Grok Pattern Extractor issue

Hi,

I have installed graylog1.1.5-1 through the vagrant image. I have set up a syslog input to grab log entries from linux iptables running on my linux firewall. All of that is working perfectly fine and the entries are being successfully captured.

I have tried to set up a Grok Pattern Extractor to capture some of the fields from the iptables log entries so I can do some indexing and searches on specific fields, and I ran into some issues.

An entry looks like this:

[19348602.294727] New_Connection -- ACCEPT IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:00:1d:7d:0c:03:db:08:00 SRC=192.168.1.11 DST=192.168.1.255 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=13989 PROTO=UDP SPT=137 DPT=137 LEN=58

and the Grok Pattern looks like this:

%{SYSLOG5424SD:time_stamp}.*%{WORD:action} IN=%{WORD:int_eth} OUT= MAC=%{IP}:%{MAC} SRC=%{IP:src_ip} DST=%{IP:dst_ip} LEN=%{INT:length}.*PROTO=%{WORD:proto} SPT=%{INT:src_port} DPT=%{INT:dst_port}.*

I have run these through the Grok debugger and it successfully captures the right fields, but when I set up an Extractor with these and try it on the web interface, it doesn't work.
The first time I tried it, it gave me a timeout message, so I increased the timeout to 10 seconds.

After that, I tried again and it gave me another error. I googled it, but can't find any information on it. I found this error in the /var/log/graylog/server/current log file:

2015-08-07_20:41:34.80474 ERROR [AnyExceptionClassMapper] Unhandled exception in REST resource
2015-08-07_20:41:34.80476 oi.thekraken.grok.api.exception.GrokException: Deep recursion pattern compilation of %{SYSLOG5424SD:time_stamp}.*%{WORD:action} IN=%{WORD:int_eth} OUT= MAC=%{IP}:%{MAC} SRC=%{IP:src_ip} DST=%{IP:dst_ip} LEN=%{INT:length}.*PROTO=%{WORD:proto} SPT=%{INT:src_port} DPT=%{INT:dst_port}.*
2015-08-07_20:41:34.80477    at oi.thekraken.grok.api.Grok.compile(Grok.java:344)
2015-08-07_20:41:34.80478    at org.graylog2.rest.resources.tools.GrokTesterResource.doTestGrok(GrokTesterResource.java:83)
2015-08-07_20:41:34.80478    at org.graylog2.rest.resources.tools.GrokTesterResource.testGrok(GrokTesterResource.java:72)
2015-08-07_20:41:34.80479    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2015-08-07_20:41:34.80479    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2015-08-07_20:41:34.80480    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2015-08-07_20:41:34.80480    at java.lang.reflect.Method.invoke(Method.java:497)
2015-08-07_20:41:34.80481    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
2015-08-07_20:41:34.80482    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:164)
2015-08-07_20:41:34.80483    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:181)
2015-08-07_20:41:34.80483    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:203)
2015-08-07_20:41:34.80484    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:101)
2015-08-07_20:41:34.80484    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
2015-08-07_20:41:34.80485    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
2015-08-07_20:41:34.80485    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
2015-08-07_20:41:34.80486    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:305)
2015-08-07_20:41:34.80486    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
2015-08-07_20:41:34.80487    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
2015-08-07_20:41:34.80488    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
2015-08-07_20:41:34.80489    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
2015-08-07_20:41:34.80489    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
2015-08-07_20:41:34.80490    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
2015-08-07_20:41:34.80490    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:288)
2015-08-07_20:41:34.80491    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1110)
2015-08-07_20:41:34.80493    at org.graylog2.jersey.container.netty.NettyContainer.messageReceived(NettyContainer.java:356)
2015-08-07_20:41:34.80493    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
2015-08-07_20:41:34.80494    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
2015-08-07_20:41:34.80495    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
2015-08-07_20:41:34.80495    at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
2015-08-07_20:41:34.80497    at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
2015-08-07_20:41:34.80497    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
2015-08-07_20:41:34.80498    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor$MemoryAwareRunnable.run(MemoryAwareThreadPoolExecutor.java:622)
2015-08-07_20:41:34.80498    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-08-07_20:41:34.80499    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-08-07_20:41:34.80499    at java.lang.Thread.run(Thread.java:745)

I have done some googling on the deep recursion error message, but couldn't find anything, hence why I am posting this here.

I have opened an issue on github.com/graylog2/graylog2-server, but they said that it might be a bug in java-grok, hence why I am opening an issue here.

Any idea on how to solve this would be much appreciated.
Thanks a lot in advance,
Bertrand.

Dropped support for named group captures with underscore

With #53 we lost the ability to have named group captures with underscore like (?<test_field>test).

java-grok had the support as long as we used the com.google.code.regexp.Pattern.
Now with java.util.regex.Pattern we use the java regex engine which does not support underscores:

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#groupname

A capturing group can also be assigned a "name", a named-capturing group, and then be back-
referenced later by the "name". Group names are composed of the following characters. The first
character must be a letter.
The uppercase letters 'A' through 'Z' ('\u0041' through '\u005a'),
The lowercase letters 'a' through 'z' ('\u0061' through '\u007a'),
The digits '0' through '9' ('\u0030' through '\u0039'),

This broke backward compatibility with already stored patterns.

The preferable fix would be to bring back com.google.code.regexp.Pattern.
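
`java.util.regex` rejects `(?<test_field>test)` at compile time. One possible workaround, sketched below, is to rewrite each user-facing group name to a generated alphanumeric name and keep a lookup table, similar in spirit to the `name0`, `name1` names visible in the generated regexes elsewhere on this page. `GroupNameMapper` and its members are hypothetical, not java-grok API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNameMapper {
    // Finds named groups whose names may contain underscores. Lookbehinds such
    // as (?<= and (?<! are not touched because a letter must follow the '<'.
    // Simplification: an escaped literal "\(" followed by "?<name>" is not handled.
    private static final Pattern NAMED_GROUP =
        Pattern.compile("\\(\\?<([A-Za-z][A-Za-z0-9_]*)>");

    // Generated name -> original name, in order of appearance.
    final Map<String, String> generatedToOriginal = new LinkedHashMap<>();

    // Rewrites (?<some_name>...) to (?<nameN>...) and records the mapping.
    String rewrite(String regex) {
        Matcher m = NAMED_GROUP.matcher(regex);
        StringBuffer out = new StringBuffer();
        int index = 0;
        while (m.find()) {
            String generated = "name" + index++;
            generatedToOriginal.put(generated, m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement("(?<" + generated + ">"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        GroupNameMapper mapper = new GroupNameMapper();
        String rewritten = mapper.rewrite("(?<test_field>test)");
        Matcher m = Pattern.compile(rewritten).matcher("test");
        if (m.matches()) {
            for (Map.Entry<String, String> e : mapper.generatedToOriginal.entrySet()) {
                System.out.println(e.getValue() + " = " + m.group(e.getKey()));
            }
        }
    }
}
```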

Doesn't like matches that start with a "

The following code:
String pattern = "(?<message>client id): (?<clientid>.*)";
String input = "client id: \"name\" \"Mac OS X Mail\" \"version\" \"10.2 (3259)\" \"os\" \"Mac OS X\" \"os-version\" \"10.12.3 (16D32)\" \"vendor\" \"Apple Inc.\"";

    // Validate the search is good
    Pattern p = Pattern.compile("(?<message>client id): (?<clientid>.*)");
    Matcher m = p.matcher(input);
    if (m.matches()) {
        System.out.println(m.group("clientid"));
    }

    io.thekraken.grok.api.Grok grok = new io.thekraken.grok.api.Grok();
    grok.compile(pattern, false);

    Match gm = grok.match(input);
    gm.captures();
    System.out.println(gm.toMap().get("clientid"));
    System.out.println(gm.getMatch().group("clientid"));

output:

"name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc."
name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc.
"name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc."

Notice how gm.toMap().get("clientid") eats the first " (and the last one), although the Java matcher returns the correct value.

Java 7 version?

I'm building on travis using openjdk7 and I'm getting Caused by: java.lang.UnsupportedClassVersionError: io/krakens/grok/api/GrokCompiler : Unsupported major.minor version 52.0.

I see there's a lot of "lambda-style" code in 0.1.9, is there any plan on backporting to java7?

Pile is deprecated, what to do now?

Hi,
Since the Pile class is marked as deprecated, how should I match against several different patterns?

Something ideal would probably have been something like:

        Grok grok = new Grok();

        grok.addPatternFromFile("src/main/resources/grok-patterns");
        grok.compile("%{CISCOFW305011}");
        grok.compile("%{CISCOFW313001_313004_313008}");
        grok.compile("%{CISCOFW313005}");
        grok.compile("%{CISCOFW402117}");
        grok.compile("%{CISCOFW402119}");

         Match gm = grok.match(message);
         gm.captures();
         System.out.println(gm.toJson());

This would match any of the given patterns.

Now I would have to do something like:


 List<Grok> groks = new ArrayList<Grok>();
 List<String> grokPatterns = new ArrayList<String>();
 grokPatterns.add("%{CISCOFW106023}");
 grokPatterns.add("%{CISCOFW313005}");
 grokPatterns.add("%{CISCOFW106001}");
 grokPatterns.add("%{CISCOFW106006_106007_106010}");
 grokPatterns.add("%{CISCOFW106014}");
 grokPatterns.add("%{CISCOFW106015}");
 grokPatterns.add("%{CISCOFW106021}");
for (String grokPattern : grokPatterns) {
            try {
                Grok grok = new Grok();
                grok.addPatternFromFile("resources/grok-patterns");
                grok.compile(grokPattern);
                groks.add(grok);

            } catch (GrokException e) {
                e.printStackTrace();
            }
}
 for (Grok grok : groks ) {
            Match gm = grok.match(message);
            if (gm.isNull())
                continue;
            gm.captures();
            System.out.println(gm.toJson());
        }

I find this somewhat sub-optimal.

cannot parse this log

1. The log is as follows; grok cannot parse it.

10.192.1.47 - - [23/May/2013:10:47:40] "GET /flower1_store/category1.screen?category_id1=FLOWERS HTTP/1.1" 200 10577 "http://mystore.abc.com/flower1_store/main.screen&JSESSIONID=SD1SL10FF3ADFF3" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10" 3823 404

2. I added a new pattern to the base patterns: "HTTPDATE1 %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}"
The errors are as follows:
java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 371
(((?:(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))(.?|\b))|((?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})...)(?![0-9]))))(?::(\b(?:[1-9][0-9])\b))?) - - (((?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b)/((?>\d\d){1,2}):((?!<[0-9])((?:2[0123]|[01][0-9])):((?:[0-5][0-9]))(?::((?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)))(?![0-9]))) 200 10577 "(([A-Za-z]+(+[A-Za-z+]+)?)://(?:(([a-zA-Z0-9_-]+))(?::[^@])?@)?(?:(((?:(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))(.?|\b))|((?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})...)(?![0-9]))))(?::(\b(?:[1-9][0-9])\b))?))?(?:(.?\S+))?) "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10" 3823 404
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.escape(Pattern.java:2177)
at java.util.regex.Pattern.range(Pattern.java:2338)
at java.util.regex.Pattern.clazz(Pattern.java:2268)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:847)
at com.google.code.regexp.Pattern.buildStandardPattern(Unknown Source)
at com.google.code.regexp.Pattern.<init>(Unknown Source)
at com.google.code.regexp.Pattern.compile(Unknown Source)
at com.nflabs.Grok.Grok.compile(Grok.java:203)
at com.nflabs.Grok.LzbGrokeTest.testGrok(LzbGrokeTest.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Travis is broken

It looks like the JDK download is broken in Travis, probably because the image is old and predates the new JDK licensing.
This needs to be changed to a newer Travis image with OpenJDK targets.

Include patterns as resource, and load by default

  1. Move /patterns to /src/main/resources/io/thekraken/grok/patterns (or similar).
  2. Add create(InputStream) and/or create(URL) methods to allow loading patterns from not just files.
  3. Add a create() method that loads all the packaged patterns.
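
A rough sketch of what loading definitions from an `InputStream` could look like. It assumes the pattern file holds one `NAME regex` pair per line (as in the `INT (?:[+-]?(?:[0-9]+))` example elsewhere on this page) and that lines starting with `#` are comments, which is an assumption. `PatternLoader` and `load` are hypothetical names, not the library's API.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class PatternLoader {
    // Reads "NAME regex" lines from the stream; blank lines and '#' lines are skipped.
    static Map<String, String> load(InputStream in) {
        Map<String, String> definitions = new LinkedHashMap<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                // Split on the first run of whitespace only; the regex may contain spaces.
                String[] parts = line.split("\\s+", 2);
                if (parts.length == 2) {
                    definitions.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return definitions;
    }

    public static void main(String[] args) {
        String file = "# sample\nINT (?:[+-]?(?:[0-9]+))\n";
        InputStream in = new ByteArrayInputStream(file.getBytes(StandardCharsets.UTF_8));
        System.out.println(load(in));
    }
}
```

Packaged defaults could then be read through `Class.getResourceAsStream` with whatever resource path the project settles on.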

Custom date format doesn't work with commas

Custom date formats (e. g. %{PATTERN:my_date;datetime;dd/MM/yyyy}) don't seem to work if comma characters are being used in the date pattern.

If the date pattern contains a comma, Grok fails with the following exception:

java.util.regex.PatternSyntaxException: Illegal repetition near index 4
Foo %{DATA:result;date;yyyy,MM,dd} Bar
    ^

	at java.base/java.util.regex.Pattern.error(Pattern.java:2010)
	at java.base/java.util.regex.Pattern.closure(Pattern.java:3307)
	at java.base/java.util.regex.Pattern.sequence(Pattern.java:2196)
	at java.base/java.util.regex.Pattern.expr(Pattern.java:2051)
	at java.base/java.util.regex.Pattern.compile(Pattern.java:1773)
	at java.base/java.util.regex.Pattern.<init>(Pattern.java:1422)
	at java.base/java.util.regex.Pattern.compile(Pattern.java:1082)
	at io.krakens.grok.api.Grok.<init>(Grok.java:69)
	at io.krakens.grok.api.GrokCompiler.compile(GrokCompiler.java:197)

Multiple duplicate named matches don't work like Logstash

In Logstash, if the same name is used for several patterns, Logstash returns an array of the matched values. Is this something this library should support?

logstash

If I have a logstash config of:

input {
    stdin{}
}
filter {
    grok {
        match => { "message" => "%{INT:id} %{INT:id}" }
    }
}
output {
    stdout { codec => json }
}
123 456
{"message":"123 456 678","@version":"1","@timestamp":"2015-11-17T17:05:10.292Z","host":"spencerherzberg-mbp","id":["123","456"]}

java-grok

patterns.txt

INT (?:[+-]?(?:[0-9]+))

grokissue.groovy

@Grab(group='io.thekraken', module='grok', version='0.1.1')
import oi.thekraken.grok.api.Grok
import oi.thekraken.grok.api.Match

Grok g = Grok.create("patterns.txt")
g.compile("%{INT:id} %{INT:id}");

String log = "123 456";
Match gm = g.match(log);
gm.captures();

System.out.println(gm.toJson());

output

{"id":123}
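
A sketch of the Logstash-like behaviour, assuming each occurrence is compiled to a unique internal group name (`name0`, `name1`, ...) that maps back to the user-facing field name. `MultiCapture` and `collect` are hypothetical. A field seen once stays a scalar, and a field seen more than once becomes a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiCapture {
    // groupToField maps internal group names to field names; several groups
    // may share one field name, in which case their values are collected.
    static Map<String, Object> collect(Pattern pattern,
                                       Map<String, String> groupToField,
                                       String text) {
        Map<String, Object> result = new LinkedHashMap<>();
        Matcher m = pattern.matcher(text);
        if (!m.find()) {
            return result;
        }
        for (Map.Entry<String, String> e : groupToField.entrySet()) {
            String value = m.group(e.getKey());
            if (value == null) {
                continue; // group did not participate in the match
            }
            Object existing = result.get(e.getValue());
            if (existing == null) {
                result.put(e.getValue(), value);
            } else if (existing instanceof List) {
                @SuppressWarnings("unchecked")
                List<String> list = (List<String>) existing;
                list.add(value);
            } else {
                List<String> list = new ArrayList<>();
                list.add((String) existing);
                list.add(value);
                result.put(e.getValue(), list);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-expanded stand-in for "%{INT:id} %{INT:id}".
        Pattern p = Pattern.compile(
            "(?<name0>(?:[+-]?(?:[0-9]+))) (?<name1>(?:[+-]?(?:[0-9]+)))");
        Map<String, String> groupToField = new LinkedHashMap<>();
        groupToField.put("name0", "id");
        groupToField.put("name1", "id");
        System.out.println(collect(p, groupToField, "123 456"));
    }
}
```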
