fluent-plugin-grok-parser's Introduction

Grok Parser for Fluentd

This is a Fluentd plugin to enable Logstash's Grok-like parsing logic.

Requirements

fluent-plugin-grok-parser | fluentd    | ruby
--------------------------|------------|-------
>= 2.0.0                  | >= v0.14.0 | >= 2.1
< 2.0.0                   | >= v0.12.0 | >= 1.9

What's Grok?

Grok is a macro to simplify and reuse regexes, originally developed by Jordan Sissel.

This is a partial implementation of Grok's grammar that should meet most needs.

How It Works

You can use it wherever you would use the format parameter to parse text. In the following example, it extracts the first IP address that matches in each log line.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_pattern %{IP:ip_address}
  </parse>
</source>
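
For illustration, a log line such as "connect from 192.168.0.1" (a hypothetical input) would be parsed into a record like:

{"ip_address":"192.168.0.1"}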

If you want to try multiple grok patterns and use the first matched one, you can use the following syntax:

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      pattern %{HTTPD_COMBINEDLOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      pattern %{IP:ip_address}
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

Multiline support

You can parse multi-line text.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
    multiline_start_regexp /^[^\s]/
  </parse>
</source>
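
For illustration, consider the two hypothetical raw lines below. The second line begins with whitespace, so it does not match multiline_start_regexp and is appended to the first; the grok pattern is then applied to the combined text:

192.168.0.1 connection failed
  retrying in 5 seconds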

You can use multiple grok patterns to parse your data.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    <grok>
      pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
    </grok>
  </parse>
</source>

Note that when no pattern matches, Fluentd accumulates data in the buffer indefinitely, waiting to parse a complete entry.

You can use this parser without multiline_start_regexp when you know your data structure perfectly.

Configurations

  • See also: Config: Parse Section - Fluentd

  • time_format (string) (optional): The format of the time field.

  • grok_pattern (string) (optional): The grok pattern. You cannot specify multiple grok patterns with this option.

  • custom_pattern_path (string) (optional): Path to a file or directory containing custom grok patterns.

  • grok_failure_key (string) (optional): The key name under which the grok failure reason is stored.

  • grok_name_key (string) (optional): The key name under which the matched <grok> section's name is stored.

  • multiline_start_regexp (string) (optional): The regexp that matches the beginning of a multiline block. This is only for "multiline_grok".

  • grok_pattern_series (enum) (optional): Specify grok pattern series set.

    • Default value: legacy.

<grok> section (optional) (multiple)

  • name (string) (optional): The name of this grok section
  • pattern (string) (required): The pattern of grok
  • keep_time_key (bool) (optional): If true, keep time field in the record.
  • time_key (string) (optional): Specify time field for event time. If the event doesn't have this field, current time is used.
    • Default value: time.
  • time_format (string) (optional): Process the value using the specified format. This is available only when time_type is string.
  • timezone (string) (optional): Use the specified timezone: the time value can be parsed/formatted in that timezone.
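
For example, a <grok> section combining several of these options might look like the following sketch (the pattern and field names are illustrative, not part of the plugin):

<parse>
  @type grok
  <grok>
    name access-log
    pattern \[%{HTTPDATE:logtime}\] %{IP:client} %{GREEDYDATA:message}
    time_key logtime
    time_format "%d/%b/%Y:%H:%M:%S %z"
    keep_time_key true
  </grok>
</parse>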

Examples

Using grok_failure_key

<source>
  @type dummy
  @label @dummy
  dummy [
    { "message1": "no grok pattern matched!", "prog": "foo" },
    { "message1": "/", "prog": "bar" }
  ]
  tag dummy.log
</source>

<label @dummy>
  <filter>
    @type parser
    key_name message1
    reserve_data true
    reserve_time true
    <parse>
      @type grok
      grok_failure_key grokfailure
      <grok>
        pattern %{PATH:path}
      </grok>
    </parse>
  </filter>
  <match dummy.log>
    @type stdout
  </match>
</label>

This generates the following events:

2016-11-28 13:07:08.009131727 +0900 dummy.log: {"message1":"no grok pattern matched!","prog":"foo","message":"no grok pattern matched!","grokfailure":"No grok pattern matched"}
2016-11-28 13:07:09.010400923 +0900 dummy.log: {"message1":"/","prog":"bar","path":"/"}

Using grok_name_key

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_name_key grok_name
    grok_failure_key grokfailure
    <grok>
      name apache_log
      pattern %{HTTPD_COMBINEDLOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      name ip_address
      pattern %{IP:ip_address}
    </grok>
    <grok>
      name rest_message
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

This will add keys like the following:

  • Add grok_name: "apache_log" if the record matches HTTPD_COMBINEDLOG
  • Add grok_name: "ip_address" if the record matches IP
  • Add grok_name: "rest_message" if the record matches GREEDYDATA

The grokfailure key is added to the record if the record does not match any grok pattern. See also the test code for more details.
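
For example, a line containing only an IP address would produce a record roughly like this (illustrative):

{"ip_address":"127.0.0.1","grok_name":"ip_address"}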

How to parse time value using specific timezone

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      name mylog-without-timezone
      pattern %{DATESTAMP:time} %{GREEDYDATA:message}
      timezone Asia/Tokyo
    </grok>
  </parse>
</source>

This will parse the time value in the "Asia/Tokyo" timezone.

See Config: Parse Section - Fluentd for more details about timezone.

How to write Grok patterns

Grok patterns look like %{PATTERN_NAME:name} where ":name" is optional. If "name" is provided, then it becomes a named capture. So, for example, if you have the grok pattern

%{IP} %{HOST:host}

it matches

127.0.0.1 foo.example

but only extracts "foo.example", as {"host": "foo.example"}.
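
You can also mix plain regexp named captures with grok macros in one pattern. For example (an illustrative pattern, not shipped with the plugin), %{IP:ip} port (?<port>\d+) extracts both "ip" and "port".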

Please see patterns/* for the patterns that are supported out of the box.

How to add your own Grok pattern

You can add your own Grok patterns by creating your own Grok file and telling the plugin to read it. This is what the custom_pattern_path parameter is for.

<source>
  @type tail
  path /path/to/log
  <parse>
    @type grok
    grok_pattern %{MY_SUPER_PATTERN}
    custom_pattern_path /path/to/my_pattern
  </parse>
</source>

custom_pattern_path can be either a directory or a file. If it's a directory, all files in it are read.
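
A custom pattern file is a plain-text file with one NAME-pattern pair per line, and later definitions may reference earlier ones. As a sketch, a hypothetical /path/to/my_pattern could contain:

MY_TIMESTAMP %{TIMESTAMP_ISO8601}
MY_SUPER_PATTERN %{MY_TIMESTAMP:time} %{LOGLEVEL:level} %{GREEDYDATA:message}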

FAQs

1. How can I convert types of the matched patterns like Logstash's Grok?

Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

grok_pattern %{GROK_PATTERN:NAME:TYPE}...

e.g.,

grok_pattern %{INT:foo:integer}

Unspecified fields are parsed as the default string type.

The list of supported types is shown below:

  • string
  • bool
  • integer ("int" would NOT work!)
  • float
  • time
  • array

For the time and array types, there is an optional fourth field after the type name. For the "time" type, you can specify a time format just as you would in time_format.

For the "array" type, the extra field specifies the delimiter (the default is ","). For example, if a field called "item_ids" contains the value "3,4,5", types item_ids:array parses it as ["3", "4", "5"]. Alternatively, if the value is "Adam|Alice|Bob", types item_ids:array:| parses it as ["Adam", "Alice", "Bob"].

Here is a sample config using the Grok parser with in_tail and inline type specifiers:

<source>
  @type tail
  path /path/to/log
  format grok
  grok_pattern %{INT:user_id:integer} paid %{NUMBER:paid_amount:float}
  tag payment
</source>
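
With this config, a hypothetical line such as "12345 paid 34.56" would produce a record with native types, roughly:

{"user_id":12345,"paid_amount":34.56}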

Notice

If you want to use this plugin with Fluentd v0.12.x or earlier, you can use this plugin version v1.x.

See also: Plugin Management | Fluentd

License

Apache 2.0 License

fluent-plugin-grok-parser's People

Contributors

ashie, cosmo0920, dependabot[bot], kiyoto, okkez, repeatedly

fluent-plugin-grok-parser's Issues

multiline_flush_interval does not seem to work with multiline_grok

While using the multiline_grok parser, the last entry in my log file only seems to get output after another log entry is appended. I would have expected setting multiline_flush_interval to resolve this issue, but it did not seem to work. Please find my config below.

<source>
  @type tail
  @id portal1_log_tail
  <parse>
    @type multiline_grok
    pattern %{WEBLOGIC_LOG_PATTERN}
    time_format %d-%b-%Y %H:%M:%S
    localtime true
    multiline_start_regexp /^(####<)/
    multiline_flush_interval 4s
    custom_pattern_path /fluentd/etc/my-grok-patterns
    time_key logtime
    localtime true
  </parse>
  path /u01/config/domains/wcp001_domain/servers/WC_CustomPortal1/logs/WC_CustomPortal1.log
  pos_file /var/log/td-agent/WC_CustomPortal1.log.pos
  tag portal.WC_CustomPortal1.log
</source>

[Question] Handle multiline with empty line from kubernetes/docker with --log-driver=json-file

I have a Java app that generates stack traces containing empty lines:

{"log":"2017-11-27 16:39:07.263 'auniqueid' ERROR 8 --- [io-8080-exec-31] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.web.client.ResourceAccessException: I/O error on GET request for \"http://myapidomain:80/uri/go/193\": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out] with root cause\n","stream":"stdout","time":"2017-11-27T15:39:07.263735593Z"}
{"log":"\n","stream":"stdout","time":"2017-11-27T15:39:07.263774072Z"}
{"log":"java.net.SocketTimeoutException: Read timed out\n","stream":"stdout","time":"2017-11-27T15:39:07.263778472Z"}
{"log":"\u0009at java.net.SocketInputStream.socketRead0(Native Method)\n","stream":"stdout","time":"2017-11-27T15:39:07.263781625Z"}
{"log":"\u0009at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)\n","stream":"stdout","time":"2017-11-27T15:39:07.263784714Z"}
{"log":"\u0009at java.net.SocketInputStream.read(SocketInputStream.java:171)\n","stream":"stdout","time":"2017-11-27T15:39:07.263787383Z"}
{"log":"\u0009at java.net.SocketInputStream.read(SocketInputStream.java:141)\n","stream":"stdout","time":"2017-11-27T15:39:07.263789999Z"}

I retrieve the container's logs in JSON format (dockerd --log-driver=json-file):

    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%N
      tag kubernetes.*
      format json
      read_from_head true
      keep_time_key true
    </source>
    <filter kubernetes.var.log.containers.myapp**>
        @type parser
        format multiline_grok
        key_name log
        reserve_data true
        reserve_time true
        grok_pattern /%{TIMESTAMP_ISO8601:log_time}%{SPACE}('%{DATA:correlationId}')?%{SPACE}%{LOGLEVEL:log_level}%{SPACE}%{NUMBER:pid}?%{SPACE}---%{SPACE}\[%{DATA:threadname}\]%{SPACE}%{DATA:classname}%{SPACE}:%{SPACE}%{GREEDYDATA:log_message}/
        multiline_start_regexp /^\\u0009/
        inject_key_prefix grok
    </filter>

This filter doesn't work.

How do I handle/skip these two lines (the 2nd and 3rd):
1/ an empty line
2/ a line starting with java.

{"log":"\n","stream":"stdout","time":"2017-11-27T15:39:07.263774072Z"}
{"log":"java.net.SocketTimeoutException: Read timed out\n","stream":"stdout","time":"2017-11-27T15:39:07.263778472Z"}

How does multiline_start_regexp handle Unicode? How do I skip \n or java.net.SocketTimeoutException?

multiple multiline pattern w/ and w/o multiline_start_regexp

You can use multiple grok patterns to parse your data.

<source>
  @type tail
  path /path/to/log
  format multiline_grok
  <grok>
    pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
  </grok>
  tag grokked_log
</source>

You can use this parser without multiline_start_regexp when you know your data structure perfectly.

Actually, you can't use this pattern with multiline_start_regexp.

parameter 'multiline_start_regexp' in <grok> ... </grok> is not used

How do I handle combined (multi and single line) java logs? Single lines might be generated by a custom log4j pattern.

help needed using multiline javastacktracepart pattern

Hi there,

Could someone shed some light on how to parse multiline java stack traces using the javastacktracepart pattern (or any other pattern) in order to place the whole stacktrace in one field (to be able to view them in Kibana)?

I am trying to use the following, but it doesn't seem to do the trick:

<filter kubernetes.var.log.containers.app**>
      @type parser
      key_name log
        <parse>
          @type multiline_grok          
          <grok>
            pattern %{JAVASTACKTRACEPART:javastacktrace}
          </grok>
        </parse>
</filter>

but instead the javastacktrace field contains only one line of the stack trace:

{
  "javastacktrace": "at java.io.foobar.javamethod(foobar.java:900)",
  "file": "foobar.java",
  "method": "javamethod",
  "line": "900",
  "class": "java.io.foobar"
}

Grok Parser Upgrades FluentD Version

I am running into an issue where I am using Docker to build a specific version of Fluentd. However, it appears that when I have your plugin installed, it kicks off an automatic update of Fluentd to the latest release.

Here is my Dockerfile, as per the Fluentd Docker page.

FROM fluent/fluentd:v0.12.28-onbuild
MAINTAINER YOUR_NAME Test
WORKDIR /home/fluent
ENV PATH /home/fluent/.gem/ruby/2.3.0/bin:$PATH

# customize following "gem install fluent-plugin-..." line as you wish

USER root
RUN apk --no-cache --update add sudo build-base ruby-dev && \

    sudo -u fluent gem install fluent-plugin-elasticsearch \
            fluent-plugin-record-reformer \
            fluent-plugin-record-modifier \
            fluent-plugin-parser \
            fluent-plugin-rewrite-tag-filter \
            fluent-plugin-forest \
#            fluent-plugin-grok-parser \
            fluent-plugin-rewrite \
            && \

    rm -rf /home/fluent/.gem/ruby/2.3.0/cache/*.gem && sudo -u fluent gem sources -c && \
    apk del sudo build-base ruby-dev && rm -rf /var/cache/apk/*

EXPOSE 24284

USER fluent
CMD exec fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT

Building and running with your plugin excluded produces:

2016-10-07 22:24:15 +0000 [info]: reading config file path="/fluentd/etc/fluent.conf"
2016-10-07 22:24:15 +0000 [info]: starting fluentd-0.12.28
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.7.0'
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-forest' version '0.3.3'
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-parser' version '0.6.1'
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-record-modifier' version '0.5.0'
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.8.2'
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-rewrite' version '0.0.13'
2016-10-07 22:24:15 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2016-10-07 22:24:15 +0000 [info]: gem 'fluentd' version '0.12.28'
2016-10-07 22:24:15 +0000 [info]: adding match in @mainstream pattern="docker.**" type="file"

With your plugin not excluded produces:

2016-10-08 00:06:06 +0000 [info]: reading config file path="/fluentd/etc/fluent.conf"
2016-10-08 00:06:06 +0000 [info]: starting fluentd-0.14.7
2016-10-08 00:06:06 +0000 [info]: spawn command to main: /usr/bin/ruby -Eascii-8bit:ascii-8bit /home/fluent/.gem/ruby/2.3.0/bin/fluentd -c /fluentd/etc/fluent.conf -p /fluentd/plugins --under-supervisor
2016-10-08 00:06:06 +0000 [info]: reading config file path="/fluentd/etc/fluent.conf"
2016-10-08 00:06:06 +0000 [info]: starting fluentd-0.14.7 without supervision
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.7.0'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-forest' version '0.3.3'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.0.0'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-parser' version '0.6.1'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-record-modifier' version '0.5.0'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.8.2'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-rewrite' version '0.0.13'
2016-10-08 00:06:06 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2016-10-08 00:06:06 +0000 [info]: gem 'fluentd' version '0.14.7'
2016-10-08 00:06:06 +0000 [info]: gem 'fluentd' version '0.12.28'
2016-10-08 00:06:06 +0000 [info]: adding match in @mainstream pattern="docker.**" type="file"

Is this by design, and is there a way to prevent it from upgrading the Fluentd release?

how to use grok pattern file

<source>
  @type tail
  path /path/to/log
  <parse>
    @type grok
    grok_pattern %{MY_SUPER_PATTERN}
    custom_pattern_path /path/to/my_pattern
  </parse>
</source>

I have a grok pattern file, but should I change the grok_pattern line? What should it be set to?

When I tested without this line, a Fluentd config file error happened.

Not compatible with td-agent 0.12.29

Hi,

when trying to use this plugin with td-agent 0.12.29 on CentOS I get the following error:

2016-10-20 16:14:12 +0200 [info]: adding match pattern="td.*.*" type="tdlog"  
/opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/specification.rb:2112:in `raise_if_conflicts': Unable to activate td-client-0.8.83, because msgpack-1.0.2 conflicts with msgpack (!= 0.5.0, != 0.5.1, != 0.5.2, != 0.5.3, < 0.8.0, >= 0.4.4) (Gem::ConflictError)    
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/specification.rb:1280:in `activate'                              
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems.rb:198:in `rescue in try_activate'                               
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems.rb:195:in `try_activate'                                         
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:126:in `rescue in require'            
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:39:in `require'                       
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-td-0.10.29/lib/fluent/plugin/out_tdlog.rb:1:in `<top (required)>'                                                                                                                                  
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'                       
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'                       
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/registry.rb:94:in `block in search'             
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/registry.rb:92:in `each'                        
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/registry.rb:92:in `search'                      
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/registry.rb:43:in `lookup'                      
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin.rb:146:in `new_impl'                     
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin.rb:104:in `new_output'                   
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/agent.rb:125:in `add_match'                     
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/agent.rb:70:in `block in configure'             
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/agent.rb:63:in `each'                           
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/agent.rb:63:in `configure'                      
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/root_agent.rb:86:in `configure'                 
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/engine.rb:119:in `configure'                    
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/engine.rb:93:in `run_configure'                 
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/supervisor.rb:693:in `run_configure'            
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/supervisor.rb:453:in `block in run_worker'      
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/supervisor.rb:626:in `call'                     
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/supervisor.rb:626:in `main_process'             
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/supervisor.rb:449:in `run_worker'               
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/command/fluentd.rb:288:in `<top (required)>'    
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'                       
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'                       
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/bin/fluentd:5:in `<top (required)>'                        
        from /opt/td-agent/embedded/bin/fluentd:23:in `load'                                                                           
        from /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'                                                               
        from /sbin/td-agent:7:in `load'                                                                                                
        from /sbin/td-agent:7:in `<main>' 
2016-10-20 16:14:12 +0200 [info]: Worker 0 finished unexpectedly with status 1 

Grok pattern doesn't work when there are double quotes at the beginning of the pattern

This could be a non-issue.

But somehow I am not able to get the following pattern working. Is my grok pattern wrong?

td-agent configuration

<source>
  @type beats
  metadata_as_tag
  format grok
  time_format %d/%b/%Y:%H:%M:%S %z
  grok_failure_key grokfailure
  <grok>
    pattern "%{DATA:side}" \[%{HTTPDATE:timestamp}\] %{IPORHOST:clientip} %{QS:agent} "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{QS:response_code} %{QS:cache} %{NUMBER:first_byte:float} %{NUMBER:upstream_resp_time:float}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
  port 5044
  bind 0.0.0.0
</source>

When I use the above pattern, I get the error below:

/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config/basic_parser.rb:92:in `parse_error!': expected end of line at td-agent.conf line 152,26 (Fluent::ConfigParseError)
151:   <grok>
152:     pattern "%{WORD:side}" \[%{HTTPDATE:timestamp}\] %{IPORHOST:clientip} %{QS:agent} "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{QS:response_code} %{DATA:cache} %{DATA:first_byte:float} %{DATA:upstream_resp_time:float}

     --------------------------^
153:   </grok>
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config/v1_parser.rb:132:in 'parse_element'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config/v1_parser.rb:95:in 'parse_element'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config/v1_parser.rb:95:in 'parse_element'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config/v1_parser.rb:43:in 'parse!'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config/v1_parser.rb:33:in 'parse'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/config.rb:39:in 'parse'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:741:in 'read_config'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:451:in 'run_supervisor'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/command/fluentd.rb:310:in '<top (required)>'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in 'require'
    from /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in 'require'
    from /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/bin/fluentd:8:in '<top (required)>'
    from /opt/td-agent/embedded/bin/fluentd:22:in 'load'
    from /opt/td-agent/embedded/bin/fluentd:22:in '<top (required)>'
    from /usr/sbin/td-agent:7:in 'load'
    from /usr/sbin/td-agent:7:in '<main>'

But when I replace the pattern with the following, it works as expected.

  <grok>
    pattern %{QS:side} \[%{HTTPDATE:timestamp}\] %{IPORHOST:clientip} %{QS:agent} "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{QS:response_code} %{QS:cache} %{NUMBER:first_byte:float} %{NUMBER:upstream_resp_time:float}
  </grok>

types not working in multiple grok patterns

Types are not working when I use the following config:

<source>
  @type tail
  tag test1
  path /log
  pos_file /log.pos
  path_key tailed_path
  <parse>
   @type grok
   <grok>
     types type1,type2
     pattern1
   </grok>
   <grok>
     types type1,type2
     pattern2
   </grok>
 </parse>
</source>

I am getting the following error while exposing the metrics:
failed to instrument a metric. error_class=TypeError error=#<TypeError: String can't be coerced into Integer>

multiline_grok doesn't capture unmatched multi-line events in grok_failure output

Hi okimoto,

We've been comparing fluentd with logstash for some time now, and both work well. We like logstash's grok parsing and error handling abilities, hence we're trying your plugin. We have a requirement to capture all logs even if parsing fails.

Apache Test:

<source>
  @type tail
  <parse>
    @type grok
    grok_failure_key grokfailure
    grok_pattern %{COMBINEDAPACHELOG}
    time_format "%d/%b/%Y:%H:%M:%S %z"
  </parse>
  ...
</source>

Input 1 (expected to work):
127.0.0.1 - - [19/Dec/2016:11:30:43 +1100] "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

Produces:
{"clientip":"127.0.0.1","ident":"-","auth":"-","timestamp":"19/Dec/2016:11:52:29 +1100","verb":"GET","request":"/","httpversion":"1.0","response":"403","bytes":"3985","referrer":""-"","agent":""ApacheBench/2.3"","server":"tranj-fluentd-s3b","stack":"my-app-dev-01","application":"my-app","log_type":"filter.test","time":"2017-01-10T20:56:49Z"}

Input 2 (expected to fail):
127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

Produces:
{"message":"127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"","grokfailure":"No grok pattern matched","server":"tranj-fluentd-s3b","stack":"my-app-dev-01","application":"my-app","log_type":"filter.test","time":"2017-01-10T21:05:42Z"}

Exactly what we wanted.

Tomcat Test:

patterns file (/tmp/patterns):

MY_CATALINA_LOG (?<timestamp>%{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM)) %{JAVACLASS:class} %{WORD:method}%{GREEDYDATA:tomcatmsg}
MY_TOMCAT_LOG (?<timestamp>20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})) %{LOGLEVEL:level}\s+%{USERNAME:class}\s+%{GREEDYDATA:tomcatmsg}
MY_TOMCAT %{MY_CATALINA_LOG}|%{MY_TOMCAT_LOG}

<source>
  @type tail
  <parse>
    @type multiline_grok
    grok_failure_key grokfailure
    custom_pattern_path /tmp/patterns
    grok_pattern %{MY_TOMCAT}
    multiline_start_regexp /^(\w+\s\d+,\s\d+)|(\d+-\d+-\d+\s)/
  </parse>
  ...
</source>

Input 1 (expected to work):

Dec 16, 2016 9:29:56 AM org.apache.jasper.servlet.TldScanner scanJars
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
09:29:56.861 [localhost-startStop-1] WARN com.something.common.ServiceContext - Found application_config, profile is: test

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/

Produces:
{"timestamp":"Dec 16, 2016 9:29:56 AM","class":"org.apache.jasper.servlet.TldScanner","method":"scanJars","tomcatmsg":"\nINFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.\n09:29:56.861 [localhost-startStop-1] WARN com.something.common.ServiceContext - Found application_config, profile is: test\n\n . ____ _ __ _ \n /\\ / ' __ _ () __ __ _ \ \ \ \\n( ( )\_ | '_ | '| | ' \/ ` | \ \ \ \\n \\/ )| |)| | | | | || (| | ) ) ) )\n ' || .__|| ||| |\, | / / / /\n =========||==============|/=///_/","server":"tranj-fluentd-s3b","stack":"my-app-pdev-01","application":"my-app","log_type":"filter.test","time":"2017-01-11T01:14:20Z"}

Input 2:
127.0.0.1 - - NOTIME "GET / HTTP/1.0" 403 3985 "-" "ApacheBench/2.3"

Produces no output, but in the td-agent.log:
2017-01-11 12:14:51 +1100 [warn]: plugin/in_tail.rb:390:block in parse_multilines: got incomplete line before first line from /var/log/filter_test.log: "127.0.0.1 - - NOTIME \"GET / HTTP/1.0\" 403 3985 \"-\" \"ApacheBench/2.3\"\n"

With Logstash's grok filter, it would capture the unmatched lines and store it in the output with grokparsefailure tag, but this plugin drops the invalid log lines.

Is it possible to make the plugin capture unmatched multiline log events?

insert fields based on grok matches

Would it be possible to assign a value to some extra field depending on which grok pattern matches? To slightly modify the example from another question (#44):

<source>
  @type tail
  path /path/to/log
  parsetype_key myeventtype
  <parse>
    @type grok
    <grok>
      pattern %{COMBINEDAPACHELOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
      parsetype_value apachelog
    </grok>
    <grok>
      pattern %{IP:ip_address}
      parsetype_value ipline
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
      parsetype_value message
    </grok>
  </parse>
</source>

The resulting event would have a new field named 'myeventtype' which would have the value of 'apachelog', 'ipline', 'message' depending on which of the grok patterns hit. If they all miss, maybe the default could be 'unknown' or something. The parsetype_key and parsetype_value configuration names are just suggestions on how it might be configured.

So, a good idea, or is there perhaps already a better way to achieve this sort of thing?

Requirements about docker/fluent/fluentd:v0.12.42

Which version should I use?
The Requirements chapter of the readme says: < 1.0.0 if fluentd >= v0.12.0,
but the Notice says: "If you want to use this plugin with Fluentd v0.12.x or earlier, you can use this plugin version v1.0.0."

Support matching against multiple grok patterns

A canonical use case would be something like matching against CISCO ASA logs, which have several patterns.

It should look something like:

<source>
  type syslog
  format grok
  <grok>
    pattern1
  </grok>
  <grok>
    pattern2 #matched only if pattern1 does not match
  </grok>
  ...
</source>

HTTPDATE error with td-agent 3.1.1

td-agent-3.1.1-0.el7.x86_64.rpm
CentOS
Example log:

00.000.000.000 - - [03/Jul/2018:10:45:37 -0400] "GET /private/ASVTEST/marketing-test/check/health HTTP/1.1" 200 114

With the following pattern in td-agent.conf, td-agent.service cannot start (Job for td-agent.service failed because the control process exited with error code. See "systemctl status td-agent.service" and "journalctl -xe" for details.):

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion}))" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

When HTTPDATE is changed to GREEDYDATA, it works:

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{GREEDYDATA:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion}))" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

Config snippet:

<source>
  @type tail
  path /xxx/xxx/xxx/*access*
  pos_file /var/log/td-agent/access.pos
  tag test-access
  <parse>
  @type grok
  grok_pattern %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion}))" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
  </parse>
</source>

Types not working in V2.5.0

In the previous version (i.e. 2.4.0) I used the below config:

<parse>
   @type grok
   types log_timestamp:time:%d/%b/%Y:%H:%M:%S
   grok_pattern %{IPORHOST:remote_addr} - %{HTTPDUSER:remote_user} \[%{HTTPDATE:log_timestamp}\] "(?:%{WORD:http_verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|%{DATA:rawrequest})" %{NUMBER:http_response_code} %{NUMBER:bytes:integer}( %{NUMBER:reqtime}| -)?( %{NUMBER:upstime}| -)?( %{HOSTPORT:upstream_addr} | - )?(?<pipe>[pP]|.)? %{QS:http_referer} %{QS:http_user_agent}
 </parse>

And got the output below:

{"remote_addr":"103.21.16.20","remote_user":"-","log_timestamp":1549156652000,"http_verb":"POST","request":"/game","http_version":"1.1","http_response_code":"200","bytes":"101","http_referer":"\"-\"","http_user_agent":"\"Python-urllib/2.7\"","log_file_path":"/home/access.log"}

But this does not seem to work in the current version. Can you provide me a solution?

Thanks

Note: I tried the following in the new version and I am getting nil:
\[%{HTTPDATE:log_timestamp:time:%d/%b/%Y:%H:%M:%S %z}\]

Negate Multiline Regex

Is it possible to negate the multiline regex?
It works on http://grokconstructor.appspot.com and in logstash/filebeat; I am just trying to make it work in fluentd to replace logstash and filebeat.

Using the following grok and multiline regex:
grok pattern:
[%{TIMESTAMP_ISO8601:sourceTimestamp}]\s|\s(?.?)\s|\s[(?.?)]\s|\s[%{IPORHOST:tempHost}]\s|\s%{LOGLEVEL:level}\s|\s(?.?)\s|\s%{IPORHOST:ipAddress}\s|\s(?.?)\s|\s?(?.?)?\s?|\s(?(.|\r|\n))

multiline regex: ^[0-9]+

Data:

[2018-01-10 13:22:01,128] | ServiceName | [84] | [ServerName] | ERROR | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | 66.249.73.137 | /pro/visitedproduct?pid=50724 | http://www.example.com/pr/quinn-popcorn-microwave-popcorn-vermont-maple-sea-salt-2-bags-3-6-oz-102-g-each/50724 | System.Web.HttpException (0x80004005): A public action method 'visitedproduct' was not found on controller 'iHerb.Web.Catalog.Controllers.ProController'.
   at System.Web.Mvc.Controller.HandleUnknownAction(String actionName)
   at System.Web.Mvc.Controller.<BeginExecuteCore>b__1d(IAsyncResult asyncResult, ExecuteCoreState innerState)
   at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
   at System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult)
   at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
   at System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult)
   at System.Web.Mvc.MvcHandler.<BeginProcessRequest>b__5(IAsyncResult asyncResult, ProcessRequestState innerState)
   at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
   at System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Support for `format_firstline` directive

Right now I have a working solution using the regular fluentd multiline parser.

 format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/
 format1 /^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?<thread>.*?)\] (?<severity>[A-z]*) (?<logger>.*?) - (?<message>.*)/

That is, I know my log blocks start with TIMESTAMP_ISO8601 and everything after it should be included until we see another timestamp to start a line.

Does grok-parser have support for these semantics?

As far as I can tell, multiline_start_regexp /^[^\s]/ tells the plugin that lines to be combined start with whitespace. Some of the stack traces I'm looking to pick up are not indented though.

Am I able to use negative lookahead in multiline_start_regexp in order to say "my multilines start with anything except a timestamp"?

i.e. multiline_start_regexp (?!\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})

Seems like the field's type is not respected by this plugin

Even though I am providing the grokked field with a type, as follows:
%{NUMBER:response:integer}
it appears in EFK as type string, though there are no errors in fluentd's logs.
It seems to be ignored entirely.
Any advice on how this can be overcome?

root@fluentd-fluentd-elasticsearch-jpcqh:/# fluentd --version
fluentd 1.3.3

grok pattern doesn't work

Please help me, thanks.
I have a log:

2019-07-30 10:31:28.882  INFO [tdf-cloud-gateway,2f55a7bf10aab48f,2f55a7bf10aab48f,true] 1 --- [r-http-epoll-41] c.c.t.gateway.filter.LocalLimiterFilter  : 192.168.70.40	admin	tdf-service-sys	2019-07-30T10:31:28.863Z	2019-07-30T10:31:28.882Z	true

My config

<source>
  @type tail
  path /opt/logs/tdf-cloud/*/*/*.log
  tag sysf.log
 <parse>
    @type grok
    <grok>
      pattern %{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:severity}\s+\[%{DATA:service},%{DATA:trace},%{DATA:span},%{DATA:exportable}\]\s+%{DATA:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{IP:ip}\s+%{USERNAME:usernmae}\s+%{NOTSPACE:appname}\s+%{TIMESTAMP_ISO8601:requesttime}\s+%{TIMESTAMP_ISO8601:reponsetime}\s+%{NOTSPACE:success}
    </grok>
  </parse>
</source>

I just want to collect the above format, but it also collects another format. My pattern is correct; it shouldn't collect lines like this:

2019-07-30 11:05:14.730 DEBUG [tdf-cloud-gateway,8496bd9186e8e190,8496bd9186e8e190,false] 1 --- [or-http-epoll-7] t.g.f.CustomReactiveAuthorizationManager : return true.jwtTokenStr:eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZG1pbiIsInVzZXJfbmFtZSI6ImFkbWluIiwic2NvcGUiOlsiYWxsIl0sImV4cCI6MjI4Mzc4NTEwOSwiYXV0aG9yaXRpZXMiOlsiUk9MRV9BRE1JTiIsIlJPTEVfVVNFUiJdLCJqdGkiOiI2M2VjYTg2YS1jMWFjLTRkYTYtYjg3Mi00ZGI1NDJiMzgyYTgiLCJjbGllbnRfaWQiOiJzd2FnZ2VyIn0.FFm7ofGV2wt3FAKJiEkQOxouSFwAPAlQREQiEUXcXsUeSVwxnW1TpgVTMBCCBx4kaiXvpuI0qZXjGbSV4QNDg_aP-f--FEvW4kKEegUGtf_qOqqBtif0F-3O5esxM94Tm0PxF-5MA18YKgxw5VWBVkKCjZp6M54BW4GoIDuvHtn5H5b3_iYn12CjvIYdRl4GO0pqOAjM4m8hxTFfcA9YE0xHDvOp4YUAJsHtTyRSg5XLh6eBe-JIDmOb9Yl6ykDI-DgZwATl35iEoV9xbabEUtMzPr6KbkfjoYqnnqj-ULWZX1bFEvxTYxJXo0hgCxkl5o7zrnNN3yBxGdU7A6X_iw

I am sure fluentd is collecting it; this is the fluentd log:

2019-07-30 11:06:54.730098385 +0000 sysf.log: {"message":"2019-07-30 11:06:54.729 DEBUG [tdf-cloud-gateway,d7fef7e2675b2277,d7fef7e2675b2277,false] 1 --- [or-http-epoll-7] t.g.f.CustomReactiveAuthorizationManager : return true.jwtTokenStr:eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZG1pbiIsInVzZXJfbmFtZSI6ImFkbWluIiwic2NvcGUiOlsiYWxsIl0sImV4cCI6MjI4Mzc4NTEwOSwiYXV0aG9yaXRpZXMiOlsiUk9MRV9BRE1JTiIsIlJPTEVfVVNFUiJdLCJqdGkiOiI2M2VjYTg2YS1jMWFjLTRkYTYtYjg3Mi00ZGI1NDJiMzgyYTgiLCJjbGllbnRfaWQiOiJzd2FnZ2VyIn0.FFm7ofGV2wt3FAKJiEkQOxouSFwAPAlQREQiEUXcXsUeSVwxnW1TpgVTMBCCBx4kaiXvpuI0qZXjGbSV4QNDg_aP-f--FEvW4kKEegUGtf_qOqqBtif0F-3O5esxM94Tm0PxF-5MA18YKgxw5VWBVkKCjZp6M54BW4GoIDuvHtn5H5b3_iYn12CjvIYdRl4GO0pqOAjM4m8hxTFfcA9YE0xHDvOp4YUAJsHtTyRSg5XLh6eBe-JIDmOb9Yl6ykDI-DgZwATl35iEoV9xbabEUtMzPr6KbkfjoYqnnqj-ULWZX1bFEvxTYxJXo0hgCxkl5o7zrnNN3yBxGdU7A6X_iw"}

[Question] Add information about grok parser failures to log.

Hi,

Is it possible to add a special field when the grok parser fails (something similar to what we have in Logstash)?
I was thinking about something like this:

<filter type.stdout.web-log>
    @type parser
    reserve_data yes 
    key_name message
    hash_value_field grok
    <parse>
        @type grok
        <grok>
            pattern %{COMBINEDAPACHELOG}
        </grok>
        <grok>
            pattern %{GREEDYDATA:message}
            <record>
                grokfailure 'true'
            </record>
        </grok>
    </parse>
</filter>

In the above scenario, I've added a <record> section that adds a specific field when the second parser is processed successfully. Unfortunately, the <record> section is not supported in <grok>.
Do you know any other method by which I can achieve this behaviour?
Is it possible to add a <record> section to the grok plugin?

Thanks!

Multiple grok patterns with multiline not parsing the log

Hi.

I am facing a problem while using fluentd-0.14.23. The logs are not being parsed even though I followed the documentation and your Readme file. Can you help me a little with solving this issue? It is running in a Docker container created by a Kubernetes DaemonSet.

I uploaded a zip file which contains an example of the log I am trying to parse with my configuration:
data.zip

Thanks ahead.

multiline_grok not working as expected with td-agent v3.1.1 for windows 2012 R2

This may or may not be user error, but I would appreciate some help.

The message is not being separated into multiple key-value pairs as defined by the grok pattern, but the lines are being concatenated correctly. The multiline aspect works; the grok matching does not appear to be functioning.

Config

<source>
  @type tail
  path C:/Logs/*
  pos_file logs.log.pos
  <parse>
      @type multiline_grok
      grok_pattern \[(?<sourceTimestamp>%{DATE} %{TIME} (AM|PM))\]\t(?<tempMessage>(.|\r|\n)*)
      multiline_start_regexp /\[[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}\s[0-9]{1,2}\:[0-9]{2}\:[0-9]{2}\s(?:A|P)M\]/
  </parse>
    tag logs
</source>
<match **>
    @type copy
    # Output to Console
    <store>
      @type stdout
    </store>
  </match>

Sample Log Entry

[2/16/2018 10:19:34 AM] Message Line 1
Message Line 2
Message Line 3

Received Output

2018-02-16 11:00:50.115236000 -0800 logs: {"message":"[2/16/2018 10:19:34 AM]\tMessage Line 1\r\nMessage Line 2\r\nMessage Line 3\r\n"}

As you can see, we only get the entire log concatenated into "message" instead of it being concatenated and split into "sourceTimestamp" and "tempMessage".

Maintainer switch-up?

@okkez, would you be interested in becoming the main maintainer? I'm really out of touch with coding, and it might be better for you to run this project. Probably the completion of #10 is a good timing.

Happy to transfer the repository to you as well. Thank you for your help!

Unknown format template 'grok'

When trying to use the v1.0.0 version of this plugin with fluentd 0.12.29, I get an error from fluentd saying that the grok format template is unknown. I am relatively new to fluentd and am not sure what this error is about.

config error file="/etc/fluent/fluent.conf" error="Unknown format template 'grok'"

Info on my setup...

I'm using fluentd in a docker container and I intend on using this grok parser to grab certain lines from /var/log/audit/audit.log...

Fluentd Version:

# fluentd --version
fluentd 0.12.29

Input Config:

      <source>
        @type tail
        path /var/log/audit/audit.log
        tag system.*
        format grok
        grok_pattern %{WORD:pam_module}\(%{DATA:pam_caller}\): session %{WORD:pam_session_state} for user %{USERNAME:username}(?: by %{GREEDYDATA:pam_by})?
      </source>

Error when I start fluentd:

2017-11-07 16:35:00 +0000 [error]: config error file="/etc/fluent/fluent.conf" error="Unknown format template 'grok'"
2017-11-07 16:35:00 +0000 [error]: fluentd main process died unexpectedly. restarting.

Steps to install the fluent-plugin-grok-parser plugin (from Dockerfile):

RUN \
  GEM='fluent-plugin-grok-parser' && \
  TAR_FILE='https://github.com/fluent/fluent-plugin-grok-parser/archive/v1.0.0.tar.gz' && \
  mkdir /${GEM} && \
  curl -sL "${TAR_FILE}" | \
    tar -xzvC /${GEM} --strip-components=1 && \
  cd /${GEM} && \
  gem build *.gemspec && \
  cd / && \
  fluent-gem install -N --conservative --minimal-deps \
    /${GEM}/*.gem && \
  cp -r /${GEM}/*.gem /etc/fluent/plugin && \
  rm -rf /${GEM}

US-ASCII error on default grok-pattern file

Hi,

since fluentd version 0.14.17/0.14.18 with the fluent-plugin-grok-parser plugin 2.3.1, which reads the default grok-patterns file, I get a /var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/grok.rb:39:in `block in add_patterns_from_file': invalid byte sequence in US-ASCII (ArgumentError).
The default grok pattern file contains month names where the German character "ä" (May/März) is not US-ASCII compatible. This was working with 0.14.16 and the same plugin version, but now it is crashing. I am using an Ubuntu container where I have just changed the version number for fluentd, and now it crashes with: block in add_patterns_from_file': invalid byte sequence in US-ASCII (ArgumentError).

2017-06-30 09:57:27 +0000 [info]: reading config file path="/etc/fluent/fluent.conf"
/var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/grok.rb:39:in `block in add_patterns_from_file': invalid byte sequence in US-ASCII (ArgumentError)
from /var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/grok.rb:38:in `each_line'
from /var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/grok.rb:38:in `add_patterns_from_file'
from /var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/parser_grok.rb:30:in `block in configure'
from /var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/parser_grok.rb:29:in `glob'
from /var/lib/gems/2.3.0/gems/fluent-plugin-grok-parser-2.1.3/lib/fluent/plugin/parser_grok.rb:29:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/plugin.rb:164:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/plugin_helper/parser.rb:90:in `block in configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/plugin_helper/parser.rb:85:in `each'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/plugin_helper/parser.rb:85:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/plugin/filter_parser.rb:42:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/plugin.rb:164:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/agent.rb:152:in `add_filter'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/agent.rb:70:in `block in configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/agent.rb:64:in `each'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/agent.rb:64:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/root_agent.rb:112:in `configure'
from /var/lib/gems/2.3.0/gems/fluentd-0.14.18/lib/fluent/engine.rb:131:in `configure'

Any idea what was changed?

Regards,

Olaf

fluentd cannot init the grok-parser plugin, returns error="uninitialized constant Fluent::Plugin::GrokParser::NoneParser"

Here are the output lines:

2017-02-06 08:45:38 +0000 [info]: adding filter pattern="apache_access" type="parser"
2017-02-06 08:45:38 +0000 [error]: #0 unexpected error error_class=NameError error="uninitialized constant Fluent::Plugin::GrokParser::NoneParser"
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-grok-parser-2.1.0/lib/fluent/plugin/parser_grok.rb:19:in `initialize'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin.rb:149:in `new'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin.rb:149:in `new_impl'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin.rb:123:in `new_parser'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin_helper/parser.rb:89:in `block in configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin_helper/parser.rb:85:in `each'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin_helper/parser.rb:85:in `configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin/filter_parser.rb:41:in `configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/plugin.rb:164:in `configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/agent.rb:149:in `add_filter'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/agent.rb:69:in `block in configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/agent.rb:64:in `each'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/agent.rb:64:in `configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/root_agent.rb:86:in `configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/engine.rb:117:in `configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/engine.rb:91:in `run_configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/supervisor.rb:743:in `run_configure'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/supervisor.rb:494:in `block in run_worker'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/supervisor.rb:671:in `call'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/supervisor.rb:671:in `main_process'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/supervisor.rb:490:in `run_worker'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/lib/fluent/command/fluentd.rb:300:in `<top (required)>'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.12/bin/fluentd:5:in `<top (required)>'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/bin/fluentd:23:in `load'
  2017-02-06 08:45:38 +0000 [error]: #0 /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
  2017-02-06 08:45:38 +0000 [error]: #0 /usr/sbin/td-agent:7:in `load'
  2017-02-06 08:45:38 +0000 [error]: #0 /usr/sbin/td-agent:7:in `<main>'
2017-02-06 08:45:38 +0000 [info]: Worker 0 finished unexpectedly with status 1

Here are my settings:

<filter apache_access>
  @type parser
  reserve_data true
  key_name message
  <parse>
  @type grok
   grok_pattern \[%{HTTPDATE:Timestamp}\] "(?:%{WORD:http_method} %{NOTSPACE:http_url}(?: HTTP/%{NUMBER:http_version})?|%{DATA:rawrequest})" %{NUMBER:http_status} (?:\d+|-)
   time_key Timestamp
   time_format %d/%b/%Y:%H:%M:%S %z
   keep_time_key true
  </parse>
</filter>

After we comment it out, fluentd works fine.
Thanks all for the help.

date string parsing: [error]: #0 error_class=Encoding::UndefinedConversionError error="\"\\xC3\" from ASCII-8BIT to UTF-8"

Hi,

this kind of log line, nginx log:
100.68.0.1 - - [17/Feb/2017:11:01:03 +0000] "GET /healthcheck HTTP/1.1" 200 2 "-" "Go-http-client/1.1" "-"

is always producing dumps. It is the parsing of the timestamp that is failing every time.

If I try:

grok_pattern %{COMBINEDAPACHELOG}
time_format "%d/%b/%Y:%H:%M:%S %z"

or

%{IP:remote_addr} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\]

or

%{IP:remote_addr} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HAPROXYDATE:timestamp}\]

I always get:

2017-02-17 11:14:57 +0000 [error]: #0 error_class=Encoding::UndefinedConversionError error="\"\xC3\" from ASCII-8BIT to UTF-8"

If I parse only up to where the date starts, it works. Any idea what I am doing wrong?

Thanks a lot and regards,

Olaf

/usr/local/bin/fluentd --u
2017-02-17 11:14:56 +0000 [info]: reading config file path="/etc/fluent/fluent.conf"
2017-02-17 11:14:56 +0000 [info]: starting fluentd-0.14.12 pid=2203
2017-02-17 11:14:56 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/usr/local/bin/fluentd", "--use-v1-config", "--under-supervisor"]
2017-02-17 11:14:57 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.9.2'
2017-02-17 11:14:57 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.1.2'
2017-02-17 11:14:57 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.26.2'
2017-02-17 11:14:57 +0000 [info]: gem 'fluentd' version '0.14.12'
2017-02-17 11:14:57 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-02-17 11:14:57 +0000 [info]: adding filter pattern="kubernetes.var.log.containers.swift-proxy**" type="parser"
2017-02-17 11:14:57 +0000 [info]: #0 Expanded the pattern %{COMBINEDAPACHELOG} into
(?<clientip>(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?|(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))|\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))) (?<ident>[a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)|[a-zA-Z0-9._-]+) (?<auth>[a-zA-Z][a-zA-Z0-9_.+-=:]+@\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)|[a-zA-Z0-9._-]+) \[(?<timestamp>(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])/\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b/(?>\d\d){1,2}:(?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9]) (?:[+-]?(?:[0-9]+)))\] "(?:(?<verb>\b\w+\b) (?<request>\S+)(?: HTTP/(?<httpversion>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))))?|(?<rawrequest>.*?))" (?<response>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))))) (?:(?<bytes>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))|-) (?<referrer>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))) (?<agent>(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))
2017-02-17 11:14:57 +0000 [error]: #0 error_class=Encoding::UndefinedConversionError error="\"\xC3\" from ASCII-8BIT to UTF-8"

2017-02-17 11:16:53 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="parse failed undefined method `parse' for #<String:0x00000003de6090>" tag="kubernetes.var.log.containers.swift-proxy-cluster-3-1476268755-86fop_swift_collector-c344944c94d0a310467cf0037733a87a6c0d25283a550322a35c1ff4a3f6d347.log" time=#<Fluent::EventTime:0x000000043fcfb0 @sec=1487301796, @nsec=290669046> record={"log"=>"  File \"/usr/lib/python2.7/ConfigParser.py\", line 305, in read\n", "stream"=>"stderr", "time"=>"2017-02-17T03:23:16.290669046Z", "docker"=>{"container_id"=>"c344944c94d0a310467cf0037733a87a6c0d25283a550322a35c1ff4a3f6d347"}, "kubernetes"=>{"namespace_name"=>"swift", "pod_id"=>"cc2e079f-f456-11e6-9e19-8699b28281dc", "pod_name"=>"swift-proxy-cluster-3-1476268755-86fop", "labels"=>{"component"=>"swift-proxy-cluster-3", "pod-template-hash"=>"1476268755"}, "host"=>"minion0.cc.staging.cloud.sap", "container_name"=>"collector"}}
Killed

parse failed undefined method `parse' for #<String:

Not sure which of the plugins is the culprit, so I'll raise the issue in both, and hopefully I can get pointed in the right direction. I'm trying to use the grok-parser plugin, via the parser filter plugin, to grok Elasticsearch slowlogs. Here is the test configuration:

<source>
  @type dummy
  dummy {"log":"[2016-08-21 18:02:29,649][WARN ][index.search.slowlog.fetch] [logstash-2016.08.21]took[3s], took_millis[3080], types[], stats[], search_type[QUERY_AND_FETCH], total_shards[1], source[{\"size\":500,\"sort\":[{\"@timestamp\":{\"order\":\"desc\",\"unmapped_type\":\"boolean\"}}],\"query\":{\"filtered\":{\"query\":{\"query_string\":{\"analyze_wildcard\":true,\"query\":\"*\"}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"gte\":1471801647542,\"lte\":1471802547542,\"format\":\"epoch_millis\"}}}],\"must_not\":[]}}}},\"highlight\":{\"pre_tags\":[\"@kibana-highlighted-field@\"],\"post_tags\":[\"@/kibana-highlighted-field@\"],\"fields\":{\"*\":{}},\"require_field_match\":false,\"fragment_size\":2147483647},\"aggs\":{\"2\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"30s\",\"time_zone\":\"America/Chicago\",\"min_doc_count\":0,\"extended_bounds\":{\"min\":1471801647541,\"max\":1471802547542}}}},\"fields\":[\"*\",\"_source\"],\"script_fields\":{},\"fielddata_fields\":[\"@timestamp\"]}], extra_source[], \n","stream":"stdout","time":"2016-08-21T18:02:29.650016637Z"}
  tag dummy-data
</source>

<filter dummy-data>
  @type parser
  format grok
  <grok>
    grok_pattern %{ESINDEXSEARCHSLOWLOGS}
    custom_pattern_path /opt/fluentd/grok-patterns
  </grok>
  key_name log
</filter>

<match dummy-data>
  type stdout
</match>

I'm getting this error in the output of fluentd:
[warn]: parse failed undefined method `parse' for #<String:0x0000000329d3c8>
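As an aside, grok_pattern and custom_pattern_path are top-level parser parameters, while a <grok> section takes pattern instead (see the Configurations section above). A sketch of the same filter in the v0.14+ <parse> syntax, assuming the custom pattern file defines ESINDEXSEARCHSLOWLOGS:

<filter dummy-data>
  @type parser
  key_name log
  <parse>
    @type grok
    # grok_pattern and custom_pattern_path live directly under the parser
    custom_pattern_path /opt/fluentd/grok-patterns
    grok_pattern %{ESINDEXSEARCHSLOWLOGS}
  </parse>
</filter>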

Time field lost after grok filter

Versions:

source 'https://rubygems.org'

gem 'fluentd', '<=1.2.5'
gem 'activesupport', '~>5.2.1'
gem 'fluent-plugin-kubernetes_metadata_filter', '~>2.0.0'
gem 'fluent-plugin-elasticsearch', '~>2.11.5'
gem 'fluent-plugin-systemd', '~>1.0.1'
gem 'fluent-plugin-detect-exceptions', '~>0.0.11'
gem 'fluent-plugin-prometheus', '~>1.0.1'
gem 'fluent-plugin-multi-format-parser', '~>1.0.0'
gem 'fluent-plugin-grok-parser','~>2.2.0'
gem 'oj', '~>3.6.5'

My config:

<source>
  @id fluentd-containers.log
  @type tail
  path /mnt/logs/*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag raw.kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%N%:z
    </pattern>
  </parse>
</source>

<filter raw.kubernetes.**>
  @type parser
  key_name log
  keep_time_key true
  <parse>
    @type grok
    <grok>
      pattern %{IPORHOST} - \[%{IPORHOST:the_real_ip}\] - (?:-|%{USERNAME:remote_user}) \[%{HTTPDATE:time_local}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:-|%{DATA:referer})" "(?:-|%{DATA:agent})" %{NUMBER:request_length} %{NUMBER:request_time} \[%{IPORHOST:proxy_upstream_name}\] %{IPORHOST:upstream_addr}:%{POSINT} %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{BASE16NUM:req_id}
    </grok>
    <grok>
      pattern (?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}
    </grok>
  </parse>  
</filter> 


<match **>
  @type stdout
  @id stdout_output
</match>

I'm trying to parse a string like:

{"log":"127.0.0.1 - [127.0.0.1] - - [09/Sep/2018:12:13:28 +0000] \"POST /images/rpc HTTP/1.1\" 200 2724 \"-\" \"curl/7.54.0\" 515 0.007 [kube-public-my-service-80] 127.0.0.1:1000 2724 0.008 200 d01a314ea75b826dd35aabc40771b786\n","stream":"stdout","time":"2018-09-09T12:13:28.802648471Z"}

And after the <filter> section, messages lose the time field. If I delete the <filter> config block, everything works fine.
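One guess, offered as a sketch rather than a confirmed fix: the parser filter replaces the record with the parsed fields by default, which drops the original time key produced by the JSON stage. Fluentd's parser filter has a reserve_data option that keeps the original keys alongside the parsed ones:

<filter raw.kubernetes.**>
  @type parser
  key_name log
  keep_time_key true
  reserve_data true   # keep the original keys (including time) in the record
  <parse>
    @type grok
    # <grok> sections as above
  </parse>
</filter>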

TIMESTAMP_ISO8601 not working

My source is:

<source>
  @type tcp
  port 24224
  bind 0.0.0.0
  tag all
  format grok
  grok_pattern %{TIMESTAMP_ISO8601:time} +%{DATA:UNWANTED}: id=%{DATA:id} time="%{DATA:UNWANTED}" pri=%{BASE10NUM:pri:integer} fw=%{IP:fw} vpn=%{DATA:vpn} user=%{DATA:user} realm="%{DATA:realm}" roles="%{DATA:roles}" (type=%{WORD:type} )?(((proto=(%{DATA:proto})? src=(%{IP:src})? dst=%{IP:dst}? (dstname=%{DATA:dstname})?)?)?((( type=%{WORD:type})? op=(%{WORD:op})? arg="%{DATA:arg}?" result=%{WORD:result}?)? sent=(%{DATA:sent}) rcvd=(%{DATA:rcvd}) )?(agent="(%{DATA:agent})" duration=(%{DATA:duration}) )?)?msg="%{DATA:evt_id}: +%{DATA:msg}"$
</source>

All the other fields are parsed, but not the TIMESTAMP_ISO8601 one, and it also doesn't work when we use time as the name of the field. If I use tame as the name of the field instead, it is parsed into a string, but the other fields which should be present are not.

2016-05-06T13:10:54-07:00 UseLess: id=firewall time="2016-05-06 13:10:54" pri=6 fw=0.0.0.0 vpn=ive user=System realm="Admin Users" roles=".Administrators" type=mgmt msg="ERROR3012: Successfully executed 'Renew Credential Request (Admin)'.
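A possible explanation: Fluentd promotes a field captured as time to the event time and removes it from the record unless keep_time_key is set. A sketch that keeps it, assuming the v0.14+ <parse> syntax and a simplified pattern:

<source>
  @type tcp
  port 24224
  bind 0.0.0.0
  tag all
  <parse>
    @type grok
    keep_time_key true   # leave the parsed time field in the record
    time_key time
    grok_pattern %{TIMESTAMP_ISO8601:time} +%{GREEDYDATA:rest}
  </parse>
</source>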

Print out missing pattern when you can't find it

This error is helpful:
2016-04-21 09:23:06 -0700 [error]: unexpected error error="Fluent::Grok::GrokPatternNotFoundError"

but would be much more so if it also included the pattern that it didn't resolve.

timezone support

It would be nice to be able to specify the timezone within the <grok> block, just as it works with the regexp parser:

    <grok>
      pattern  /(?<time>.+) (?<message>.*)/
      time_format %m/%d/%y %H:%M:%S:%N
      timezone Europe/Berlin
    </grok>

Using array type with space delimiter

Hi,

I have a log item that has an array using space as delimiter, and I'm unable to get this to work with the array type.

Is this supported in any way? I've tried %{GREEDYDATA:myfield:array: } and then it splits each character into a separate item in the array, so I end up with a char array.

Example data: values="[my/first/item my/second/item]"
I would like an array like so: ["my/first/item", "my/second/item"]
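For reference, the plugin's array type takes an optional delimiter after a second colon (the default is a comma); whether a literal space is accepted there is exactly what this issue is asking. A minimal sketch of the documented forms:

# "a,b,c" -> ["a", "b", "c"]
grok_pattern %{GREEDYDATA:my_array:array}

# "a|b|c" -> ["a", "b", "c"]
grok_pattern %{GREEDYDATA:my_array:array:|}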

Time conversion always fails

Hi,

I'm trying to parse the time of the following string, but it always fails:
May 23 11:50:40 localhost systemd[1]: Stopping System Logging Service...

here is my config

<source>
  @type tail
  path /var/log/syslog
  tag test
  <parse>
    @type grok
    grok_pattern %{SYSLOGTIMESTAMP:timestamp:time:%d/%b/%Y:%H:%M:%S} %{GREEDYDATA:message}
  </parse>
</source>

and the failure message:
2017-05-23 14:46:21.687243345 +0800 test: {"timestamp":null}

If I just write the following pattern:
%{SYSLOGTIMESTAMP:timestamp:time}

the output is
2017-05-23 15:08:57.449623167 +0800 test: {"timestamp":1495523337,"message":"localhost systemd[1]: Stopping System Logging Service..."}

Am I misunderstanding something?
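One likely cause: the format given after :time: has to match the text SYSLOGTIMESTAMP actually captures (May 23 11:50:40), and %d/%b/%Y:%H:%M:%S is an Apache-style format that can never match it. A sketch with a matching strptime format:

<parse>
  @type grok
  # %b = abbreviated month, %d = day of month; syslog timestamps carry no year
  grok_pattern %{SYSLOGTIMESTAMP:timestamp:time:%b %d %H:%M:%S} %{GREEDYDATA:message}
</parse>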

uninitialized constant & multiline regex match issue

Hello,

When trying to use the plugin without multiline I always get this error:

2015-12-09 16:36:03 +0200 [error]: unexpected error error="uninitialized constant Fluent::TextParser::MultilineGrokParser"
  2015-12-09 16:36:03 +0200 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-grok-parser-0.0.3/lib/fluent/plugin/grok.rb:23:in `initialize'
  2015-12-09 16:36:03 +0200 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-grok-parser-0.0.3/lib/fluent/plugin/parser_grok.rb:21:in `new'
  2015-12-09 16:36:03 +0200 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-grok-parser-0.0.3/lib/fluent/plugin/parser_grok.rb:21:in `configure'

And when running in multiline mode, no matter how and what regex pattern I set for detecting a new log message I always get this error:

2015-12-09 16:39:26 +0200 [error]: empty range in char class

(followed by the log line it had read). The source seems to be the following:

2015-12-09 16:39:26 +0200 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-grok-parser-0.0.3/lib/fluent/plugin/parser_multiline_grok.rb:23:in `match'
  2015-12-09 16:39:26 +0200 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-grok-parser-0.0.3/lib/fluent/plugin/parser_multiline_grok.rb:23:in `firstline?'
  2015-12-09 16:39:26 +0200 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.12/lib/fluent/plugin/in_tail.rb:266:in `block in parse_multilines'

Keep in mind that running it without setting multiline_start_regexp does work, though of course without the multiline aggregation functionality.

Also, please fix the typo in the example config in the README: it specifies the parameter as multiline_start_regex, without the p at the end, which silently makes the plugin run without multiline detection.

Thanks

P.S. - for reference:

  • td-agent v0.12.12
  • fluentd v0.12.17
  • fluent-plugin-grok-parser v0.0.3
  • ruby 1.9.3p484

running on Linux Mint 17.2 (Ubuntu 14.04.2 LTS Trusty)

More licensing issues

I had a glance at https://github.com/kiyoto/fluent-plugin-grok-parser/blob/master/lib/fluent/plugin/parser_grok.rb and much of it is very clearly a derivative of the Ruby grok implementation, yet the license on your project claims copyright on code that is copied directly from, or is pretty obviously derivative work of, the jls-grok rubygem (jordansissel/ruby-grok).

Licensing and authorship in open source is a very important thing, and plagiarism must be avoided if communities are to work together.

Please do not claim copyright on that which you do not own.

Undefined method parse

Hello,

I recently installed your plugin for fluentd. After configuring as below,

<source>
  type tail
  path /var/log/supervisor/statsApi_1.stdout.log
  format grok
  grok_pattern %{TIMESTAMP_ISO8601:timestamp}:%{USER:module}:%{USER:lib} - %{DATA:message}
  tag vt4.api.stats
</source>

I receive the following errors in td-agent.log:

error="undefined method `parse' for #<Fluent::TextParser::GrokParser:0x002b2e47067700>"

Am I missing something here? The patterns seem to expand correctly, according to the same log ("Expanded the pattern" ...).

Thanks

Can I use grok in <match> or <filter>? fluent-plugin-grok-parser v1.0.0

<match **>
  @type copy

  format grok
  grok_pattern %{DATA:log_time} - %{DATA:log_name} - %{DATA:log_level} - %{GREEDYDATA:message}

  <store>
    @type stdout
  </store>

  <store>
    @include common.elasticsearch.conf
  </store>
</match>

I use the grok parser like this, but I don't get any grok-parsed output.
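Grok is a parser plugin, so it cannot be attached to an output such as copy; it belongs in the <parse> section of a source, or of a parser filter placed before the <match>. A sketch, assuming the raw text lives in a message key:

<filter **>
  @type parser
  key_name message
  <parse>
    @type grok
    grok_pattern %{DATA:log_time} - %{DATA:log_name} - %{DATA:log_level} - %{GREEDYDATA:message}
  </parse>
</filter>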

multiline_grok

Hello
First, thank you for your open-source project.

I use fluent-plugin-grok-parser. My configuration is like this:

<source>
  @type tail
  path /var/log/error.log
  format multiline_grok
  grok_pattern %{IP:ip_address}\n%{GREEDYDATA:message}
  multiline_start_regexp /^\s/
  tag grokked_log
</source>

<match **>
  type stdout
</match>

and my log file (/var/log/error.log) is like this:
1.2.3.4
client address

I just get this warning in td-agent, and it does not parse the log file:

got incomplete line before first line from

There is no parsed log.
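A plausible explanation for the warning: multiline_start_regexp /^\s/ marks lines beginning with whitespace as record starts, but 1.2.3.4 begins with a digit, so no line ever qualifies as a first line. A sketch that anchors records on the IP line instead:

<parse>
  @type multiline_grok
  grok_pattern %{IP:ip_address}\n%{GREEDYDATA:message}
  multiline_start_regexp /^\d/   # a record begins at the IP line
</parse>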

Thanks a lot

More of a question than an issue

We just switched from filebeat/logstash. With logstash we were able to tag different messages based on the grok that parsed them successfully. Is it possible to tag messages differently based on the grok that successfully parsed?

something like...

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      pattern %{COMBINEDAPACHELOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
      tag tag_one
    </grok>
    <grok>
      pattern %{IP:ip_address}
      tag tag_two
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
      tag tag_three
    </grok>
  </parse>
</source>

This would be an awesome feature if we could do this. We are at the ground level of a project that we are about to open source. It is a machine learning framework for the EFK stack. Being able to tag individual log messages accordingly would be huge for us.

thanks in advance.
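There is no per-<grok> tag parameter, but the name and grok_name_key options listed in the Configurations section record which section matched, and that field can then drive re-tagging downstream (for example with fluent-plugin-rewrite-tag-filter). A sketch:

<parse>
  @type grok
  grok_name_key grok_name   # each record gets the name of the section that matched
  <grok>
    name apache
    pattern %{COMBINEDAPACHELOG}
    time_format "%d/%b/%Y:%H:%M:%S %z"
  </grok>
  <grok>
    name ip_only
    pattern %{IP:ip_address}
  </grok>
</parse>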

time_format is ignored in <grok> section

Here is a sample fluentd configuration:

<system>
  log_level warn
</system>

<source>
  @type exec
  run_interval 3s
  format json

  command echo '{"message":"127.0.0.1 - - [21/Nov/2024:17:42:53 +0000] "GET / HTTP/1.1" 200 3189 "-" "check_http/v2.0.x (monitoring-plugins 2.0.x)"}'

  <parse>
    @type grok
    time_key timestamp

    <grok>
       pattern %{HTTPD_COMBINEDLOG:timestamp:time:%F %T,%L %z}
       time_format %d/%b/%Y:%H:%M:%S %z
    </grok>
  </parse>

  tag first
</source>

<filter first>
  @type record_transformer
  enable_ruby true

  <record>
    hostname "#{Socket.gethostname}"
    time ${time}
  </record>
</filter>

<match **>
  @type stdout
</match>

OUTPUT

2018-11-23 19:18:29 +0000 [warn]: parameter 'time_format' in <grok>
  pattern "(?<timestamp>(?:(?<clientip>(?:(?:(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:)))(%.+)?)|(?:(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))))|(?:\\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\\.?|\\b)))) (?<ident>(?:(?:[a-zA-Z][a-zA-Z0-9_.+-=:]+)@(?:\\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\\.?|\\b)))|(?:(?:[a-zA-Z0-9._-]+))) (?<auth>(?:(?:[a-zA-Z][a-zA-Z0-9_.+-=:]+)@(?:\\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\\.?|\\b)))|(?:(?:[a-zA-Z0-9._-]+))) \\[(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\\b)/(?:(?>\\d\\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+))))\\] \"(?:(?<verb>\\b\\w+\\b) (?<request>\\S+)(?: HTTP/(?<httpversion>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\\.[0-9]+)?)|(?:\\.[0-9]+)))))))?|(?<rawrequest>.*?))\" (?<response>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\\.[0-9]+)?)|(?:\\.[0-9]+)))))) (?:(?<bytes>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\\.[0-9]+)?)|(?:\\.[0-9]+))))))|-)) (?<referrer>(?:(?>(?<!\\\\)(?>\"(?>\\\\.|[^\\\\\"]+)+\"|\"\"|(?>\'(?>\\\\.|[^\\\\\']+)+\')|\'\'|(?>`(?>\\\\.|[^\\\\`]+)+`)|``)))) (?<agent>(?:(?>(?<!\\\\)(?>\"(?>\\\\.|[^\\\\\"]+)+\"|\"\"|(?>\'(?>\\\\.|[^\\\\\']+)+\')|\'\'|(?>`(?>\\\\.|[^\\\\`]+)+`)|``)))))"
  time_format %d/%b/%Y:%H:%M:%S %z
</grok> is not used.
2018-11-23 19:18:32.607140903 +0000 first: {"timestamp":"21/Nov/2024:17:42:53 +0000","clientip":"127.0.0.1","ident":"-","auth":"-","verb":"GET","request":"/","httpversion":"1.1","response":"200","bytes":"3189","referrer":"\"-\"","agent":"\"check_http/v2.0.x (monitoring-plugins 2.0.x)\"","hostname":"localhost","time":"2018-11-23 19:18:32 +0000"}
2018-11-23 19:18:35.606768480 +0000 first: {"timestamp":"21/Nov/2024:17:42:53 +0000","clientip":"127.0.0.1","ident":"-","auth":"-","verb":"GET","request":"/","httpversion":"1.1","response":"200","bytes":"3189","referrer":"\"-\"","agent":"\"check_http/v2.0.x (monitoring-plugins 2.0.x)\"","hostname":"localhost","time":"2018-11-23 19:18:35 +0000"}

This one works, but then you obviously can't use multiple patterns:

<system>
  log_level warn
</system>

<source>
  @type exec
  run_interval 3s
  format json

  command echo '{"message":"127.0.0.1 - - [21/Nov/2024:17:42:53 +0000] "GET / HTTP/1.1" 200 3189 "-" "check_http/v2.0.x (monitoring-plugins 2.0.x)"}'

  <parse>
    @type grok
    time_format %d/%b/%Y:%H:%M:%S %z
    time_key timestamp
    grok_pattern %{HTTPD_COMBINEDLOG:timestamp:time:%F %T,%L %z}
  </parse>

  tag first
</source>

<filter first>
  @type record_transformer
  enable_ruby true

  <record>
    hostname "#{Socket.gethostname}"
    time ${time}
  </record>
</filter>

<match **>
  @type stdout
</match>

OUTPUT

2024-11-21 17:42:53.000000000 +0000 first: {"clientip":"127.0.0.1","ident":"-","auth":"-","verb":"GET","request":"/","httpversion":"1.1","response":"200","bytes":"3189","referrer":"\"-\"","agent":"\"check_http/v2.0.x (monitoring-plugins 2.0.x)\"","hostname":"localhost","time":"2024-11-21 17:42:53 +0000"}
2024-11-21 17:42:53.000000000 +0000 first: {"clientip":"127.0.0.1","ident":"-","auth":"-","verb":"GET","request":"/","httpversion":"1.1","response":"200","bytes":"3189","referrer":"\"-\"","agent":"\"check_http/v2.0.x (monitoring-plugins 2.0.x)\"","hostname":"localhost","time":"2024-11-21 17:42:53 +0000"}
