sirkon / ldetool


Code generator for fast log file parsers

License: MIT License

Go 98.88% ANTLR 1.05% Makefile 0.07%

log-parsing parsing bigdata datamining logs-analysis parsing-csv logs-parsing

ldetool's Introduction

ldetool means line data extraction tool

ldetool is a command line utility that generates Go code for fast log file parsing.

go install github.com/sirkon/ldetool@latest

How it works.

First, write an extraction script; we usually name it <something>.lde
Generate Go code with ldetool <something>.lde --package main. You can of course use your own package name instead of main
Use it via the generated extraction method Extract(line []byte).

It turned out we like using it even for tasks that are not performance-critical, where we deal with strings rather than byte slices and it is handy to work with strings without manual type casting. The --go-string option generates code that uses string.

CLI utility options

--go-string generates code that uses string everywhere instead of []byte. It is better not to use it for log processing, as it may lead to excessive memory allocations.
--yaml-dict or --json-dict sets translation rules for names. For instance, if we have YAML file with
```
http: HTTP
```
and feed this file to ldetool, then every name (of a field or of a rule itself) like GetHttpHandle or get_http_handle will be translated into GetHTTPHandle
--package <pkg name> sets the name of the package to use in generated code. If the directory of the *.lde file contains other Go files, the package name is set automatically from those files' package name.
--big-endian or --little-endian sets the target architecture to be either big or little endian. This enables the prefix check optimization.

Example

Take a look at these two lines

[2017-09-02T22:48:13] FETCH first[1] format[JSON] hidden[0] userAgent[Android App v1.0] rnd[21341975] country[MA]
[2017-09-02T22:48:14] FETCH first[0] format[JSON] userAgent[Android App v1.0] rnd[10000000] country[LC]

We likely need the time and the values of the first, format, hidden, userAgent and country parameters. We obviously don't need rnd.

Extraction script syntax

See more details on extraction rules

# filename: line.lde
Line =                                   # Name of the extraction object's type
  ^'[' Time(string) ']'                  # The line must start with [, then take everything up to the ']' character as the struct field Time (string)
  ^" FETCH "                             # The current rest must start with the " FETCH " string
  ^"first[" First(uint8) ']'[1]          # The rest must start with "first[", then take everything until ']' as uint8
                                         # under the name of First. First is known to be a single character,
                                         # thus the [1] index.
  ^" format[" Format(string) ~']'        # Take format id. Format is a short word: XML, JSON, BIN. ~ before the lookup object tells
                                         # the generator to use a for loop scan rather than IndexByte, which, although fast,
                                         # has call overhead as it cannot be inlined by the Go compiler.
  ?Hidden (^" hidden[" Value(uint8) ']') # Optionally look for " hidden[\d+]"
  ^" userAgent[" UserAgent(string) ']'   # User agent data
  _ "country[" Country(string)  ']'      # Look for the piece starting with country[
;

Code generation

The recommended way is to put something like //go:generate ldetool --package main line.lde in generate.go of a package and then generate the code with

go generate <project path>

The code will be written into the line_lde.go file in the same directory. It contains the following:

Data extractor type

// Line autogenerated parser
type Line struct {
    Rest   []byte
    Time   []byte
    First  uint8
    Format []byte
    Hidden struct {
        Valid bool
        Value uint8
    }
    UserAgent []byte
    Country   []byte
}

Extract method
```
// Extract autogenerated method of Line
func (p *Line) Extract(line []byte) (bool, error) {
   …
}
```
Take a look at the return values. The bool signals whether the data was successfully matched, and a non-nil error signals that something went wrong. String-to-number conversion failures are always treated as errors. You can put ! into the extraction script, and all mismatches after that sign will be treated as errors as well.

Helper to access the optional Hidden area; it returns the Go zero value if the area was not matched

// GetHiddenValue retrieves optional value for HiddenValue.Name
func (p *Line) GetHiddenValue() (res uint8) {
    if !p.Hidden.Valid {
        return
    }
    return p.Hidden.Value
}

Generated code usage

It is easy: put

l := &Line{}

before and then feed Extract method with lines:

scanner := bufio.NewScanner(reader)
for scanner.Scan() {
    ok, err := l.Extract(scanner.Bytes())
    if !ok {
        if err != nil {
            return err
        }
        continue
    }
    …
    l.Format
    l.Time
    l.GetHiddenValue()
    …
}

Custom types

Special thanks to Matt Hook (github.com/hookenz) who proposed this feature

It is possible to use custom types in generated structure. You should declare them first via

type pkg.Type from "pkgpath";

for external types and

type typeName;

for local types, before all rule definitions. You can then use them as field types. The parsing is done via the

p.unmarshal<FieldName>([]byte) (Type, error)

function.

Example:

type time.Time from "time";
type net.IP from "net";

Custom = Time(time.Time) ' ' ?Addr(^"addr: " IP(net.IP) ' ');

Now, two parsing functions will be needed to parse this (they are to be written manually):

func (p *Custom) unmarshalTime(s string) (time.Time, error) { … }

func (p *Custom) unmarshalAddrIP(s string) (net.IP, error) { … }
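
The two functions above are left to the user. A minimal sketch of how they might be filled in follows; it is not part of ldetool. The `Custom` struct here is only a stand-in so the example compiles on its own (in a real project it is generated), and the RFC 3339 time layout is an assumption about the log format.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// Custom is a stand-in for the generated type so that this sketch
// compiles on its own; in a real project ldetool generates it.
type Custom struct{}

// unmarshalTime assumes RFC 3339 timestamps; adjust the layout to your logs.
func (p *Custom) unmarshalTime(s string) (time.Time, error) {
	return time.Parse(time.RFC3339, s)
}

// unmarshalAddrIP parses a textual IPv4 or IPv6 address.
func (p *Custom) unmarshalAddrIP(s string) (net.IP, error) {
	ip := net.ParseIP(s)
	if ip == nil {
		return nil, fmt.Errorf("invalid IP address %q", s)
	}
	return ip, nil
}

func main() {
	p := &Custom{}
	t, err := p.unmarshalTime("2017-09-02T22:48:13Z")
	fmt.Println(t.Year(), err)
	ip, err := p.unmarshalAddrIP("192.168.0.1")
	fmt.Println(ip, err)
}
```

Returning an error rather than a zero value lets a failed conversion surface through the error result of Extract.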


ldetool's Issues

consume multiple chars edge case

thank you for adding the *'c' rule, i think i found an edge case though:
If i have a lde rule like this
LDERule1 =
*' ' Val1 (int) " |"
*' ' Val2 (int) " |"
;

the resulting code:

	// Pass all characters ' ' at the rest start
	for headPassCounter, headPassValue = range string(p.Rest) {
		if headPassValue != ' ' {
			break
		}
	}
	if headPassCounter > 0 {
		p.Rest = p.Rest[headPassCounter:]
	}

	// Take until " |" as Val1(int)
	pos = strings.Index(p.Rest, spaceBar)
	if pos >= 0 {
		tmp = p.Rest[:pos]
		p.Rest = p.Rest[pos+len(spaceBar):]
	} else {
		return false, nil
	}
	if tmpInt, err = strconv.ParseInt(tmp, 10, 64); err != nil {
		return false, fmt.Errorf("cannot parse `%s` into field Val1(int): %s", tmp, err)
	}
	p.Val1 = int(tmpInt)

	// Pass all characters ' ' at the rest start
	for headPassCounter, headPassValue = range string(p.Rest) {
		if headPassValue != ' ' {
			break
		}
	}
	if headPassCounter > 0 {
		p.Rest = p.Rest[headPassCounter:]
	}

The problem comes in when you test a line like: " 3 |"
The second time it looks for the spaces to pass, it will do:
for headPassCounter, headPassValue = range string(p.Rest) {
but since there are no more characters in p.Rest, the loop body never runs and headPassCounter is not reset to zero; it keeps the previous count.

So the following ends up being larger than 0, and results in a panic.

	if headPassCounter > 0 {
		p.Rest = p.Rest[headPassCounter:]
	}

Setting headPassCounter to 0 before doing the range fixes the panic. I will try to fork, make a change and issue a pull request. But figured it would be good to outline the issue here.
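
The stale-counter problem can be illustrated outside the generator. The sketch below is hand-written, not ldetool output, and `passHeading` is a hypothetical helper name. It keeps the counter local to each pass, so an empty rest cannot inherit a count from an earlier pass.

```go
package main

import "fmt"

// passHeading drops every leading occurrence of c from rest.
// The counter is local and starts at zero on every call, so an empty
// rest can never reuse a count left over from an earlier pass.
func passHeading(rest string, c byte) string {
	n := 0
	for n < len(rest) && rest[n] == c {
		n++
	}
	return rest[n:]
}

func main() {
	rest := passHeading(" 3 |", ' ') // leading space skipped: "3 |"
	rest = rest[len("3 |"):]         // pretend Val1 and " |" were consumed
	rest = passHeading(rest, ' ')    // empty input: nothing to skip, no panic
	fmt.Printf("%q\n", rest)
}
```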

Why isn't the optional area working?

Log lines are:

17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
19.996 Pump 10 change internal state AUTHORISE_ISTATE to IDLE_ISTATE

Rule:

State =    
    Time(string) " Pump "
    Pump(int8) ~' '
    ?PState( ^"State change " _ " to " State(string) ~"[" )
    ?IState( ^"change internal state " _ " to " State(string) );

Output:

Rule `State`: processing
Take until " Pump " as Time(string)
Take until ' ' as Pump(int8)
    Option PState 
Check and pass "State change "
Look for " to " in the rest and pass it
Take until "[" as State(string)
End of option PState
Option IState
Check and pass "change internal state "
Look for " to " in the rest and pass it
Take the rest as State(string)
End of option IState
Rule `State`: done
    <standard input>:177:29: expected ';', found p 
    1:  
    2: /*
    3:  This file was autogenerated via
    4:  ------------------------------------------
    5:  ldetool generate --package main rules2.lde
    6:  ------------------------------------------
    7:  do not touch it with bare hands!
    8: */
    9:
   10: package main
   11: import (
   12:  "bytes"
   13:  "fmt"
   14:  "strconv"
   15:  "unsafe"
   16: )
   17: var changeSpaceInternalSpaceStateSpace = []byte("change internal state ")
   18: var loopSpace = []byte("Loop ")
   19: var lsbrck = []byte("[")
   20: var spaceLsbrck = []byte(" [")
   21: var spacePumpSpace = []byte(" Pump ")
   22: var spaceToSpace = []byte(" to ")
   23: var starsSpaceTimeColonSpace = []byte("*** Time: ")
   24: var stateSpaceChangeSpace = []byte("State change ")
   25: // Timestamp ...
   26: type Timestamp struct {
   27: Rest []byte
   28: Time []byte
   29: }
   30: // Extract ...
   31: func (p *Timestamp) Extract(line []byte) (bool, error) {
   32: p.Rest = line
   33: var pos int
   34:
   35: // Looking for "*** Time: " and then pass it
   36: pos = bytes.Index(p.Rest, starsSpaceTimeColonSpace, )
   37: if pos>=0 {
   38: p.Rest = p.Rest[pos+len(starsSpaceTimeColonSpace, ):]
   39: } else {
   40: return false, nil}
   41:
   42: // Take the rest as Time(string)
   43: p.Time = p.Rest
   44: p.Rest = p.Rest[len(p.Rest, ):]
   45: return true, nil}
   46: // Loop ...
   47: type Loop struct {
   48: Rest []byte
   49: Loop uint8
   50: Driver []byte
   51: }
   52: // Extract ...
   53: func (p *Loop) Extract(line []byte) (bool, error) {
   54: p.Rest = line
   55: var err error
   56: var pos int
   57: var tmp []byte
   58: var tmpUint uint64
   59:
   60: // Looking for "Loop " and then pass it
   61: pos = bytes.Index(p.Rest, loopSpace, )
   62: if pos>=0 {
   63: p.Rest = p.Rest[pos+len(loopSpace, ):]
   64: } else {
   65: return false, nil}
   66:
   67: // Take until " [" as Loop(uint8)
   68: pos = bytes.Index(p.Rest, spaceLsbrck, )
   69: if pos>=0 {
   70: tmp = p.Rest[:pos]
   71: p.Rest = p.Rest[pos+len(spaceLsbrck, ):]
   72: } else {
   73: return false, nil}
   74: if tmpUint,err=strconv.ParseUint(*(*string)(unsafe.Pointer(&tmp, ), ), 10, 8, );err!=nil {
   75: return false, fmt.Errorf("cannot parse `%s` into field Loop(uint8): %s", *(*string)(unsafe.Pointer(&tmp, ), ), err, )
   76: }
   77: p.Loop = uint8(tmpUint, )
   78:
   79: // Take until ']' as Driver(string)
   80: pos = bytes.IndexByte(p.Rest, ']', )
   81: if pos>=0 {
   82: p.Driver = p.Rest[:pos]
   83: p.Rest = p.Rest[pos+1:]
   84: } else {
   85: return false, nil}
   86:
   87: return true, nil}
   88: // State ...
   89: type State struct {
   90: Rest []byte
   91: Time []byte
   92: Pump int8
   93: PState struct {
   94: Valid bool
   95: State []byte
   96: }
   97: IState struct {
   98: Valid bool
   99: State []byte
  100: }
  101: }
  102: // Extract ...
  103: func (p *State) Extract(line []byte) (bool, error) {
  104: p.Rest = line
  105: var err error
  106: var pos int
  107: var rest1 []byte
  108: var tmp []byte
  109: var tmpInt int64
  110:
  111: // Take until " Pump " as Time(string)
  112: pos = bytes.Index(p.Rest, spacePumpSpace, )
  113: if pos>=0 {
  114: p.Time = p.Rest[:pos]
  115: p.Rest = p.Rest[pos+len(spacePumpSpace, ):]
  116: } else {
  117: return false, nil}
  118:
  119: // Take until ' ' as Pump(int8)
  120: pos = -1
  121: for i, char := range p.Rest {
  122: if char==' ' {
  123: pos = i
  124: break}
  125: }
  126: if pos>=0 {
  127: tmp = p.Rest[:pos]
  128: p.Rest = p.Rest[pos+1:]
  129: } else {
  130: return false, nil}
  131: if tmpInt,err=strconv.ParseInt(*(*string)(unsafe.Pointer(&tmp, ), ), 10, 8, );err!=nil {
  132: return false, fmt.Errorf("cannot parse `%s` into field Pump(int8): %s", *(*string)(unsafe.Pointer(&tmp, ), ), err, )
  133: }
  134: p.Pump = int8(tmpInt, )
  135: rest1 = p.Rest
  136:
  137: // Checks if the rest starts with `"State change "` and pass it
  138: if bytes.HasPrefix(rest1, stateSpaceChangeSpace, ) {
  139: rest1 = rest1[len(stateSpaceChangeSpace, ):]
  140: } else {
  141: p.PState.Valid = false;goto statePStateLabel}
  142:
  143: // Looking for " to " and then pass it
  144: pos = bytes.Index(rest1, spaceToSpace, )
  145: if pos>=0 {
  146: rest1 = rest1[pos+len(spaceToSpace, ):]
  147: } else {
  148: p.PState.Valid = false;goto statePStateLabel}
  149:
  150: // Take until "[" as State(string)
  151: pos = bytes.Index(rest1, lsbrck, )
  152: if pos>=0 {
  153: p.PState.State = rest1[:pos]
  154: rest1 = rest1[pos+len(lsbrck, ):]
  155: } else {
  156: p.PState.Valid = false;goto statePStateLabel}
  157: p.PState.Valid = true
  158: p.Rest = rest1
  159: statePStateLabel:
  160: rest1 = p.Rest
  161:
  162: // Checks if the rest starts with `"change internal state "` and pass it
  163: if bytes.HasPrefix(rest1, changeSpaceInternalSpaceStateSpace, ) {
  164: rest1 = rest1[len(changeSpaceInternalSpaceStateSpace, ):]
  165: } else {
  166: p.IState.Valid = false;goto stateIStateLabel}
  167:
  168: // Looking for " to " and then pass it
  169: pos = bytes.Index(rest1, spaceToSpace, )
  170: if pos>=0 {
  171: rest1 = rest1[pos+len(spaceToSpace, ):]
  172: } else {
  173: p.IState.Valid = false;goto stateIStateLabel}
  174:
  175: // Take the rest as State(string)
  176: p.IState.IState.State = rest1
  177: rest1 = rest1[len(rest1, ):]p.IState.Valid = true
  178: p.Rest = rest1
  179: stateIStateLabel:
  180:
  181: return true, nil}
  182: // GetPStateState ...
  183: func (p *State) GetPStateState() (res []byte) {
  184: if p.PState.Valid {
  185: res = p.PState.State
  186: }
  187: return}
  188: // GetIStateState ...
  189: func (p *State) GetIStateState() (res []byte) {
  190: if p.IState.Valid {
  191: res = p.IState.State
  192: }
  193: return}
  194:
  195:
  196:

---------------------------------------
exit status 2
main.go:1: running "ldetool": exit status 1

Comment handling refinement

We currently have support for comments (#35).

But the current approach is kind of limiting: it is natural to have a prefix or a lookup before a field. I mean something like this:

# Rule rule
Rule =
    _"field[" Field(string) ']'
    ^" value[" Value(string) ']';

I guess it would be right to introduce the following rule of comment consumption:

# Rule rule
Rule =
    # Field field
    _"field[" Field(string) ']'
    # Value value
    ^" value[" Value(string) ']';

That is, if a line contains only one taker, then the comment above it, aligned with the start of that line, will be treated as this taker's comment.

Ignoring unicode prefix

At the start of the log file I have:

ï»¿*** Time: 2/1/2019 12:10:17

But only on the first line.
Subsequently, the first line does not match with this rule

Timestamp = 
    ^"*** Time: " Time(string);

How do I write an appropriate rule to ignore those unicode related characters (ï»¿) ?
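
The characters ï»¿ are how the three-byte UTF-8 byte order mark (0xEF 0xBB 0xBF) appears when viewed in a single-byte encoding. One possible workaround, independent of the rule syntax, is to strip the mark before the line reaches the extractor. This is only a sketch, and `stripBOM` is a hypothetical helper, not something ldetool provides.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM returns line without a leading UTF-8 BOM, if one is present.
func stripBOM(line []byte) []byte {
	return bytes.TrimPrefix(line, utf8BOM)
}

func main() {
	first := []byte("\xEF\xBB\xBF*** Time: 2/1/2019 12:10:17")
	fmt.Printf("%s\n", stripBOM(first))
}
```

Since the mark only occurs on the first line, calling the helper once before the scanning loop would be enough.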

Invalid code on exact position lookup

Simple rule

Rule = Head(hex16) '-'[4] Tail(hex16);

produces invalid code: an unused variable tmpRest appears

Latest version [v0.3.3] is not working for me. Version v0.1.0 is working fine

My lde file

Audit = 
    ^"type=" Type(string)   " "
    ^"msg="  Msg(string)    " "
    ^"arch=" Arch(string)    " "
    ^"syscall=" SysCall(int)    " "
    ^"success=" Success(string)    " "
    ^"exit=" Exit(string)    " "
    ^"a0=" A0(int)    " "
    ^"a1=" A1(int)    " "
    ^"a2=" A2(int)    " "
    ^"a3=" A3(int)    " "
    ^"items=" Items(int)    " "
    ^"ppid=" PPID(int)    " "
    ^"pid="  PID(int)    " "
    ^"auid=" AUID(int)    " "
    ^"uid="  UID(int)    " "
    ^"gid="  GID(int)    " "
    ^"suid="  SUID(int)    " "
    ^"fsuid=" FSUID(int)    " "
    ^"egid=" EGID(int)    " "
    ^"sgid=" SGID(int)    " "
    ^"tty=" TTY(string)    " "
    ^"ses=" SES(int)    " "
    ^"comm=" COMM(string)    " "
    ^"exe=" EXE(string)    " "
    ^"subj=" SUBJ(string)    " "
    ^"key=" KEY(string)    " "
;

pass heading characters action won't work with --go-string

Code generated with "pass heading characters" (*'#') won't compile if --go-string is enabled

Support for literal of decimal type is needed

Literal representation of decimal types is needed.

Decimal type representation in script:

Rule = Field(dec14_12);

means a decimal with 14 digits where 12 of them are designated for fraction part of the number

Decimal type representation in generated structure ([]byte):

type Rule struct {
    Rest []byte
    Field struct {
        Negative bool
        Integral   []byte
        Fraction  []byte
    }
}

it will be

type Rule struct {
   Rest string
   Field struct {
       Negative bool
       Integral string
       Fraction string
   }
}

in case of strings (--go-string option).

Parsing rules:
- Leading zeroes of the integral part are to be omitted
- Trailing zeroes of the fraction part are to be omitted too
- The parser will check the number of meaningful digits against the decimal type in the field declaration and will raise an error on a mismatch.
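
The parsing rules above can be sketched in plain Go. This is an illustration of the proposal, not generated code. `Decimal` and `parseDecimal` are hypothetical names, and the limit check assumes that decP_S allows at most P-S integral digits and at most S fraction digits.

```go
package main

import (
	"fmt"
	"strings"
)

// Decimal mirrors the proposed field layout for the --go-string case.
type Decimal struct {
	Negative bool
	Integral string
	Fraction string
}

// parseDecimal splits s into sign, integral and fraction parts, drops
// insignificant zeroes and checks the digit counts against decP_S.
func parseDecimal(s string, precision, scale int) (Decimal, error) {
	var d Decimal
	if strings.HasPrefix(s, "-") {
		d.Negative = true
		s = s[1:]
	}
	integral, fraction := s, ""
	if pos := strings.IndexByte(s, '.'); pos >= 0 {
		integral, fraction = s[:pos], s[pos+1:]
	}
	if integral == "" && fraction == "" {
		return Decimal{}, fmt.Errorf("no digits in %q", s)
	}
	for _, c := range integral + fraction {
		if c < '0' || c > '9' {
			return Decimal{}, fmt.Errorf("unexpected character %q in %q", c, s)
		}
	}
	d.Integral = strings.TrimLeft(integral, "0")  // leading zeroes omitted
	d.Fraction = strings.TrimRight(fraction, "0") // trailing zeroes omitted
	if len(d.Integral) > precision-scale || len(d.Fraction) > scale {
		return Decimal{}, fmt.Errorf("%q does not fit dec%d_%d", s, precision, scale)
	}
	return d, nil
}

func main() {
	d, err := parseDecimal("-007.1200", 14, 12)
	fmt.Printf("%+v %v\n", d, err)
}
```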

Optional generation for strings

Right now only []byte/bytes (type/package) generation is supported. It would be nice to be able to generate for string/strings as well.

Variable constant collision when generating multiple similar extractors

Using the ldetool to parse game logs. There are ~30 different types of entries that can be in the log file, and of that I plan to parse ~1/2 of them. Ran into an issue when using strings instead of bytes where I have this defined in multiple files:

var constSpaceMinusSpace = []byte(" - ")

Is there a way to generate the code with these constants being locally scoped to the Extract function, pulled out into its own file of constants, or should I try and convert everything into bytes instead of using strings?

Upper bounding lookup generates counter-logic check that demands the rest must not be smaller than the upper bound

For instance, "."[:15] lookup fragment will fail extraction (stress is off, not in optional area)

if len(p.Rest) < 15 {
    return false, nil
}

There's no need to generate guards for upper bounds; only lower ones make sense.
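
A sketch of an upper-bounded lookup without the length guard might look as follows. It is hand-written for illustration; `indexWithin` is a hypothetical helper, and it assumes the bound means the separator must be found within the first N bytes of the rest.

```go
package main

import (
	"bytes"
	"fmt"
)

// indexWithin looks for sep in the first limit bytes of rest.
// A rest shorter than limit is searched in full instead of being rejected.
func indexWithin(rest []byte, sep byte, limit int) int {
	if len(rest) < limit {
		limit = len(rest)
	}
	return bytes.IndexByte(rest[:limit], sep)
}

func main() {
	// Short rest: the separator is still found.
	fmt.Println(indexWithin([]byte("abc.def"), '.', 15))
	// Separator lies beyond the bound: reported as not found.
	fmt.Println(indexWithin([]byte("0123456789abcdefgh.x"), '.', 15))
}
```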

Can't parse floats?

Great project. But I so wish the documentation was better explained with more examples.

For example, I'm having great trouble with this

e.g.
17.965 Pump 10 hose FF price level 1 limit 0.0000 authorise pending (Type 00)

I tried this but it doesn't work when limit is a float32.

Auth =
    Offset(string)
    " Pump " Pump(int8)
    " hose " Hose(hex8)
    " price level " PriceLevel(int8)
    " limit " Limit(string);
    " authorise pending ";

It works as a string, but not float32

What am I doing wrong?
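
No answer is recorded here. One possible workaround, assuming the field is kept as a string in the rule, is to convert it after extraction with the standard library. `parseLimit` is a hypothetical helper; the trimming is there because such log lines sometimes pad the number with spaces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLimit converts an extracted text field such as "   0.0000" to float32.
func parseLimit(s string) (float32, error) {
	v, err := strconv.ParseFloat(strings.TrimSpace(s), 32)
	if err != nil {
		return 0, fmt.Errorf("cannot parse %q as float32: %w", s, err)
	}
	return float32(v), nil
}

func main() {
	limit, err := parseLimit("   0.0000")
	fmt.Println(limit, err)
}
```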

Deleting all releases

I am deleting all releases as my versioning didn't satisfy semver criteria and thus I am having troubles with go modules.
Will start from v0.0.1 now

Rest method needed to return unconsumed rest

Standard Rest method needed for the extractor type. This is a major change BTW.

.\ldetool.go:59:12: undefined: generateAction

I fetched the whole ldetool project with go get and then ran go build ldetool.go, which shows .\ldetool.go:59:12: undefined: generateAction
Please tell me how to solve it. My aim is to debug ldetool in GoLand, so I am building it from source.

New action to check a length of a rest is needed.

There should be an action that checks how many symbols are left in the rest, with support for comparison operators (>, <).

Syntax:

%15    # 15 symbols left in a rest
%<15   # less than 15 symbols left in a rest
%>15   # more than 15 symbols left in a rest

I would use # instead, but it is used for comments

C version planned?

this is a really impressive tool.

unfortunately due to being written in go/emitting go it's restricted to be only used by go code, whereas if there was a C version of it it could be used by any PL that can interface to C libraries/code (i.e. almost any).

are there any plans to make it available to them ?

Support comments for types and type fields

It would be nice to have opportunity to define comments on types and type fields to make them appear in generated code.

diskstats.lde:

// Disk usage statistics provided by Linux kernel prior to 4.19
DiskstatsBefore419 =
    *' '
    // Major device number
    Major(uint16) ~' '
    *' '
    // Minor device number
    Minor(uint16) ~' '
    // Device name
    Device(str) ' '
    // Read operations happened since system boot
    ReadIOs(uint32) ' '

results in diskstats.go:

    // Disk usage statistics provided by Linux kernel prior to 4.19
    type DiskstatsBefore419 struct {
        // Major device number
        Major uint16
        // Minor device number
        Minor uint16
        // Device name
        Device string
        // Read operations happened since system boot
        ReadIOs uint32
    }

Extraction lacks field names in error messages

When extraction into a field fails for some reason, the resulting error lacks the field name.

[Feature Request] JSON tags

Hi,

An option to write JSON tags on the generated structures would be very helpful.

// Line autogenerated parser
type Line struct {
    Rest   []byte `json:"rest"`
    Time   []byte `json:"time"`
    First  uint8  `json:"first"`
    Format []byte `json:"format"`
    Hidden struct {
        Valid bool
        Value uint8 `json:"value,omitempty"`
    } `json:"hidden,omitempty"`
    UserAgent []byte `json:"user_agent"`
    Country   []byte `json:"country"`
}

Call arbitrary function on match/conversion

A great feature would be to support custom types. e.g. time.

For example, instead of:

Rule = Time(string)

Have:

Rule = Time(time.Time)  // time.Time = go type

Then expect the user to write the function to decode it

func (p *Rule) UnmarshalTime(field []byte) (time.Time, error) {
    …
}

or what about if the input is a string, but we do some special decoding to convert it to an int

Rule = PumpState(int)

func (p *Rule) UnmarshalPumpState(field []byte) (int, error) {
    …
}

Perhaps the unmarshaller is somehow optional. Just some things to ponder.

Exact position lookup syntax

Make another syntax for exact position lookup:

^"separator"[N]

instead of

_"separator"[N]

as it will be closer to task semantics and its implementation. Bounded take should remain the same.

End check code lacks commentary

Code generated for check end ($) lacks a commentary.

Generated files are empty

My problem is that when I run ldetool the resulting generated go code is an empty file.

Very much in the early stages of playing around (so there may be some glaring mistake), I'm running go 1.16.3 and my lde file is as follows:

type time.Duration from "time";
type ip.IP from "net";

Combined =
  ?RemoteServer(IPAddress(ip.IP) ' - ')
  ?RemoteUser(Value(string))
  ^' [' Time(string) ']'
  ^' ' ?HTTPHost(Value(string))
  ^' "' ?HTTPLine(Value(string) '"')
  ^' ' ?TlsProtocol(Value(string) '/')
  ^' ' ?TlsCipher(Value(string) ' ')
  ^' ' ?HTTPCode(Value(uint16) ' ')
  ^' ' ?HTTPBodyBytesSent(Value(uint16) ' ')
  ^' ' ?HTTPResponseTime(DurationFromSeconds(time.Duration) ' ')
  ^' ' ?HTTPUpstreamResponseTime(DurationFromSeconds(time.Duration) ' ')
  ^' ' ?HTTPUpstreamCode(Value(uint16) ' ')
  ^' ' ?HTTPUpstreamHost(Host(string) ' ')
  ^' ' ?HTTPUpstreamCacheStatus(Value(string) ' ')
  ^' "' ?HTTPReferer(Value(string) '" ')
  ^' "' ?HTTPUserAgent(Value(string) '" ')
  ^'" ' ?HTTPForwardFor(Csv(string) '" ')
  ^'" ' ?HTTPAuthorization(Value(string) '"');

and the output from running ldetool is:

> ldetool --package main combined.lde

Rule `Combined`: processing
Named option RemoteServer
Take until ' - ' as IPAddress(ip.IP)
End of named option RemoteServer
Named option RemoteUser
Take the rest as Value(string)
End of named option RemoteUser
Check and pass character ' ['
Take until ']' as Time(string)
Check and pass character ' '
Named option HTTPHost
Take the rest as Value(string)
End of named option HTTPHost
Check and pass character ' "'
Named option HTTPLine
Take until '"' as Value(string)
End of named option HTTPLine
Check and pass character ' '
Named option TlsProtocol
Take until '/' as Value(string)
End of named option TlsProtocol
Check and pass character ' '
Named option TlsCipher
Take until ' ' as Value(string)
End of named option TlsCipher
Check and pass character ' '
Named option HTTPCode
Take until ' ' as Value(uint16)
End of named option HTTPCode
Check and pass character ' '
Named option HTTPBodyBytesSent
Take until ' ' as Value(uint16)
End of named option HTTPBodyBytesSent
Check and pass character ' '
Named option HTTPResponseTime
Take until ' ' as DurationFromSeconds(time.Duration)
End of named option HTTPResponseTime
Check and pass character ' '
Named option HTTPUpstreamResponseTime
Take until ' ' as DurationFromSeconds(time.Duration)
End of named option HTTPUpstreamResponseTime
Check and pass character ' '
Named option HTTPUpstreamCode
Take until ' ' as Value(uint16)
End of named option HTTPUpstreamCode
Check and pass character ' '
Named option HTTPUpstreamHost
Take until ' ' as Host(string)
End of named option HTTPUpstreamHost
Check and pass character ' '
Named option HTTPUpstreamCacheStatus
Take until ' ' as Value(string)
End of named option HTTPUpstreamCacheStatus
Check and pass character ' "'
Named option HTTPReferer
Take until '" ' as Value(string)
End of named option HTTPReferer
Check and pass character ' "'
Named option HTTPUserAgent
Take until '" ' as Value(string)
End of named option HTTPUserAgent
Check and pass character '" '
Named option HTTPForwardFor
Take until '" ' as Csv(string)
End of named option HTTPForwardFor
Check and pass character '" '
Named option HTTPAuthorization
Take until '"' as Value(string)
End of named option HTTPAuthorization
Rule `Combined`: done

Type string type which always converts into Go string is needed

Type string converts into []byte or string depending on the --go-string flag. A special string type is needed that will always be a Go string, whatever the flag.

The suggestion is to use str type name for the task

Invalid code generation with nested optional blocks

The following lde definition generates invalid golang code:

FOOBARBAZ =
        ^"<foo>" Stuff(string) ^"</foo>"
        ?Bar ( ^"<Bar" ?ID (^" foobar='{" Foobarbaz(string) "}'") ^"/>" )
        ^"<baz>" Baz(string) "</baz>"
;

It appears that the function that generates labels for goto operations is colliding. This is the golang error:

./parser_lde.go:62:8: label foobarbazBarLabel not defined
./parser_lde.go:64:10: undefined: barRest
./parser_lde.go:85:2: undefined: barRest
./parser_lde.go:98:1: label foobarbazBarIDLabel already defined at ./parser_lde.go:86:1

Here is the lde tool output:

Rule `FOOBARBAZ`: processing
Check and pass "<foo>"
Take the rest as Stuff(string)
Check and pass "</foo>"
Named option Bar
Check and pass "<Bar"
Named option ID
Check and pass " foobar='{"
Take until "}'" as Foobarbaz(string)
End of named option ID
Check and pass "/>"
End of named option Bar
Check and pass "<baz>"
Take until "</baz>" as Baz(string)
Rule `FOOBARBAZ`: done

Length check code lacks commentary

Length check code (%N, %>N, %<N) needs commentary before it.

Ability to catch value including bounds is needed

There can be a need to catch a value including a bound. At least I needed it once and solved it via an ugly hack that led to problems in further testing.

Thus, a syntax extension can be introduced:

Rule = Value(string] '-';

This rule will take everything up to - character including it and put into Value
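
In plain Go, the proposed behaviour amounts to slicing one byte past the bound. The sketch below is only an illustration of the intended semantics; `takeIncluding` is a hypothetical helper, not generated code.

```go
package main

import (
	"fmt"
	"strings"
)

// takeIncluding returns everything up to and including the first bound
// byte, plus the remaining rest. ok is false when bound is not present.
func takeIncluding(rest string, bound byte) (value, remaining string, ok bool) {
	pos := strings.IndexByte(rest, bound)
	if pos < 0 {
		return "", rest, false
	}
	return rest[:pos+1], rest[pos+1:], true
}

func main() {
	value, rest, ok := takeIncluding("abc-def", '-')
	fmt.Printf("%q %q %v\n", value, rest, ok)
}
```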

panic in create autogenerate file

ProcStats = 
    Pid(int32) ' '
    Comm(string) ' '
    State($uint8) ' '[1]
    Ppid(int32) ' '
    Pgrp(int32) ' '
    Session(int32) ' '
    TtyNr(int32) ' '
    Tpgid(int32) ' '
    Flags(uint32) ' '
    Minflt(uint32) ' '
    Cminflt(uint32) ' '
    Majflt(uint32) ' '
    Cmajflt(uint32) ' '
    Utime(uint32) ' '
    Stime(uint32) ' '
    Cutime(int32) ' '
    Cstime(int32) ' '
    Priority(int32) ' '
    Nice(int64) ' '
    NumThreads(int64) ' '
    Itrealvalue(int64) ' '
    Starttime(uint64) ' '
    Vsize(uint32) ' '
    Rss(int32) ' '
    Rsslim(uint32) ' '
    Startcode(uint32) ' '
    Endcode(uint32) ' '
    StartStack(uint32) ' '
    Kstkesp(uint32) ' '
    Kstkeip(uint32) ' '
    Signal(uint32) ' '
    Blocked(uint32) ' '
    Sigignore(uint32) ' '
    Sigcatch(uint32) ' '
    Wchan(uint32) ' '
    Nswap(uint32) ' '
    Cnswap(uint32) ' '
    ExitSignal(int32) ' '
    Processor(int32) ' '
    RtPriority(uint32) ' '
    Policy(uint32) ' '
    DelayacctBlkioTicks(uint64) ' '
    GuestTime(uint32) ' '
    CguestTime(uint32) ' '
    StartData(uint32) ' '
    EndData(uint32) ' '
    StartBrk(uint32) ' '
    ArgStart(uint32) ' '
    ArgEnd(uint32) ' '
    EnvStart(uint32) ' '
    EnvEnd(uint32) ' '
    ExitCode(int32)

;

panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

goroutine 1 [running]:
main.generate.func1(0xc0001b1f28, 0xc0001b1eb0, 0xc0001343c0, 0xc0001b1ea8)
	go/src/github.com/sirkon/ldetool/ldetool.go:156 +0x701
panic(0x6ea840, 0x7a9e30)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/sirkon/ldetool/internal/generator/gogen.(*Generator).AddField(0xc0000daa00, 0xc00017f3d8, 0x5, 0xc00017f3f0, 0x6, 0x7b9840, 0xc00020c210, 0x0, 0x0)
	go/src/github.com/sirkon/ldetool/internal/generator/gogen/generator.go:238 +0xa48
github.com/sirkon/ldetool/internal/srcbuilder.(*SrcBuilder).DispatchTake.func2(0x0, 0x0)
	go/src/github.com/sirkon/ldetool/internal/srcbuilder/dispatching.go:389 +0x88
github.com/sirkon/ldetool/internal/srcbuilder.(*SrcBuilder).BuildRule(0xc0000e6e10, 0xc000269500, 0x75363c, 0x1e)
	go/src/github.com/sirkon/ldetool/internal/srcbuilder/builder.go:72 +0xbb
main.generate(0xc0001343c0, 0x0, 0x0)
	go/src/github.com/sirkon/ldetool/ldetool.go:220 +0x9ca
main.main()

Not able to parse lines with an arbitrary number of spaces at the beginning of the line

Type definition:

DiskstatsLineAfter418 =
    Major(uint8) ~' '
    Minor(uint8) ~' '
    Device(string) ' '
    ReadIOs(uint32) ' '
    ReadMerges(uint32) ' '
    ReadSectors(uint64) ' '
    ReadTicks(uint32) ' '
    WriteIOs(uint32) ' '
    WriteMerges(uint32) ' '
    WriteSectors(uint64) ' '
    WriteTicks(uint32) ' '
    InFlight(uint64) ' '
    IOTicks(uint64) ' '
    TimeInQueue(uint64) ' '
    DiscardIOs(uint32) ' '
    DiscardMerges(uint32) ' '
    DiscardSectors(uint64) ' '
    DiscardTicks(uint32)
;

Example data:

  11       0 sr0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Error:

Cannot parse ``: strconv.ParseUint: parsing "": invalid syntax

Also, it would be nice if the parser could point out exactly which field of the struct failed to parse.

Detect package name conflicts

Package name conflicts must be detected and reported when generating into a directory with existing Go source files.
The package name should not be a mandatory CLI option when generating into a directory with existing Go source files, except when there is only one file and it will be overwritten by the tool.

Code generation improvement for lookup on fixed position

Currently rule

Rule = A(string) "a"[2];

produces code with

…
// Take until 3rd  if it starts "a" substring as A(string)
if len(p.Rest) >= len(constA)+2 && bytes.HasPrefix(p.Rest[2:], constA) {
	pos = 2
} else {
	pos = -1
}
if pos >= 0 {
	p.A = p.Rest[:pos]
	p.Rest = p.Rest[pos+len(constA):]
} else {
	return false, nil
}
…

This code is neither fast nor clear (it must be both).

It should look like

if len(p.Rest) < len(constA)+2 || !bytes.HasPrefix(p.Rest[2:], constA) {
	return false, nil
} 
p.A = p.Rest[:2]
p.Rest = p.Rest[2+len(constA):]

The plan is to introduce dedicated handling for bounded takes like A(type) 'X'[N], A(type) "bound"[N], etc., as opposed to the current state, where a generic take-until-bound generator handles A(type) 'X', A(type) 'X'[N:], A(type) 'X'[N:M], etc.

go install command is not supported

With Go 1.16 and later, the go install command fails with an error:

go install github.com/sirkon/ldetool@latest

go: github.com/sirkon/ldetool@latest (in github.com/sirkon/[email protected]):
        The go.mod file for the module providing named packages contains one or
        more replace directives. It must not contain directives that would cause
        it to be interpreted differently than if it were the main module.

Add hex8, hex16, hex32, hex64 and oct8, oct16, oct32 and oct64 types

It would be useful to introduce the following types:

type	hex	hex8	hex16	hex32	hex64	oct	oct8	oct16	oct32	oct64
Go type	uint	uint8	uint16	uint32	uint64	uint	uint8	uint16	uint32	uint64

Octal types are obviously less needed.
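For reference, the decoding these types would need is already available in the standard library through `strconv.ParseUint` with base 16 or 8 and the target bit size. The wrappers `parseHex` and `parseOct` below are hypothetical names, not part of ldetool.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseHex parses s as an unsigned hexadecimal number without a "0x" prefix.
// bits is the target width (8, 16, 32 or 64); 0 means the size of uint.
func parseHex(s string, bits int) (uint64, error) {
	return strconv.ParseUint(s, 16, bits)
}

// parseOct does the same for octal input.
func parseOct(s string, bits int) (uint64, error) {
	return strconv.ParseUint(s, 8, bits)
}

func main() {
	v, err := parseHex("ff", 8) // fits into hex8
	fmt.Println(uint8(v), err)

	_, err = parseHex("100", 8) // 256 does not fit into hex8
	fmt.Println(err)

	o, err := parseOct("644", 16) // oct16
	fmt.Println(uint16(o), err)
}
```

The bit size argument gives the range check for free, so a hex8 field holding `100` would be rejected instead of silently truncated.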

Best approach for log processing

I have a log file that contains lines like these.

Normally, the log lines are prefixed by "17.965 Pump"
It's basically a time offset.
However it could look like 12:10:17.000 which is an actual time (without date).
The other form is 15/05/2019 06:42:22.841 which is a full date time.

How the logs look depends on whether a special tool has been used to modify the logs.
I'd like my log parsing tool to cater for all 3. So I've done the following:

Some example log lines:

15/05/2019 06:42:22.841 Pump 14 State change DELIVERING_PSTATE to DELIVERY_FINISHED_PSTATE [34]
12:10:17.000 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
19.996 Pump 10 change internal state AUTHORISE_ISTATE to IDLE_ISTATE
17.965 Pump 10 hose FF price level 1 limit    0.0000 authorise pending (Type 00)
48.373 Pump 11 delivery complete, Hose 1, price 72.9500, level 1, value  200.0000, volume    2.7400, v-total 8650697.0000, m-total 21869117.6700, T12:10:48
17.965 Pump 10 hose FF price level 1 limit 0.0000 authorise pending (Type 00)

Rule =
?Timestamp(
Time(string) " Pump "
)
?Offset(
Offset(float64) " Pump "
)

Would the above approach be appropriate?

Or is it better to do this:

Rule = 
   ?Offset(
       Offset(float64) " Pump "
    )
    ?Timestamp(
        Time(string) " Pump "
    )   
    Pump(int8) ~" "
    ?PState(
        ^"State change " FromState(string) " to " ToState(string) ~' '
    )
    ?IState(
        ^"change internal state " FromState(string) " to " ToState(string)
    )

etc.

Or should I pre-examine the log file, determine what the timestamps look like, and pick the appropriate decoder at the start? I guess that would speed things up, not that it feels particularly slow either way.
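The pre-examination idea could look roughly like the sketch below, written in plain Go and independent of ldetool. It classifies the prefix before " Pump " of a sample line so that one decoder can be chosen for the whole file. The names `prefixKind` and `classifyPrefix` are made up, and the day/month/year order of the full date is an assumption based on the sample line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

type prefixKind int

const (
	kindUnknown   prefixKind = iota
	kindOffset               // 17.965
	kindTimeOfDay            // 12:10:17.000
	kindDateTime             // 15/05/2019 06:42:22.841
)

// classifyPrefix looks at everything before " Pump " and reports which of
// the three timestamp forms it is.
func classifyPrefix(line string) prefixKind {
	prefix, _, found := strings.Cut(line, " Pump ")
	if !found {
		return kindUnknown
	}
	if _, err := strconv.ParseFloat(prefix, 64); err == nil {
		return kindOffset
	}
	if _, err := time.Parse("15:04:05.000", prefix); err == nil {
		return kindTimeOfDay
	}
	// Assumes day/month/year order, as in 15/05/2019.
	if _, err := time.Parse("02/01/2006 15:04:05.000", prefix); err == nil {
		return kindDateTime
	}
	return kindUnknown
}

func main() {
	samples := []string{
		"15/05/2019 06:42:22.841 Pump 14 State change DELIVERING_PSTATE to DELIVERY_FINISHED_PSTATE [34]",
		"12:10:17.000 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]",
		"17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]",
	}
	for _, s := range samples {
		fmt.Println(classifyPrefix(s))
	}
}
```

Classifying the first matching line once and then using a single rule for the rest of the file avoids trying each optional branch on every line.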

Also trying the following:

# *** Time: 2/1/2019 12:10:17
Timestamp = 
    _"*** Time: " Time(string);

# e.g. Loop 1 [Tatsuno driver v3.6.0.0] Gen v4.6.3 - log continues
Loop =
    ^"Loop " Loop(uint8)
    " [" Driver(string) ']';

# *** Time: 2/1/2019 12:10:17
# 15/05/2019 06:42:22.841 Pump 14 State change DELIVERING_PSTATE to DELIVERY_FINISHED_PSTATE [34]
# 12:10:17.000 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
# 17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
# 19.996 Pump 10 change internal state AUTHORISE_ISTATE to IDLE_ISTATE
# 17.965 Pump 10 hose FF price level 1 limit    0.0000 authorise pending (Type 00)
# 48.373 Pump 11 delivery complete, Hose 1, price 72.9500, level 1, value  200.0000, volume    2.7400, v-total 8650697.0000, m-total 21869117.6700, T12:10:48
# 17.965 Pump 10 hose FF price level 1 limit    0.0000 authorise pending (Type 00)
Rule = 
    Time(string) " Pump "
    Pump(int8) ~" "
    ?PState(
        ^"State change "
        _ " to " State(string) ~' '
    )
    ?IState(
        ^"change internal state "
        _ " to " State(string)
    )
;

But this triggers an error when running the tool, which then prints the code it generated.
That was more of an experiment. Should I use a separate rule for each line?

Rule `Timestamp`: processing
    Look for "*** Time: " in the rest and pass it 
Take the rest as Time(string)
Rule `Timestamp`: done

Rule `Loop`: processing
Check and pass "Loop "
Take until " [" as Loop(uint8)
Take until ']' as Driver(string)
Rule `Loop`: done

Rule `Rule`: processing
    Take until " Pump " as Time(string) 
Take until " " as Pump(int8)
Option PState
Check and pass "State change "
Look for " to " in the rest and pass it
Take until ' ' as State(string)
End of option PState
    Option IState 
Check and pass "change internal state "
Look for " to " in the rest and pass it
Take the rest as State(string)
End of option IState
Rule `Rule`: done
    <standard input>:176:29: expected ';', found p 
    1:  
    2: /*
    3:  This file was autogenerated via
    4:  ------------------------------------------
    5:  ldetool generate --package main rules2.lde
    6:  ------------------------------------------
    7:  do not touch it with bare hands!
    8: */
    9:
   10: package main

Invalid code on exact lookup of strings

Exact lookup produces syntactically invalid code for string targets