benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support

Home Page: https://benhoyt.com/writings/goawk/

License: MIT License

Languages: Go 97.96%, Shell 1.06%, Python 0.98%
Topics: awk, go, interpreter, parser, csv

goawk's Introduction

GoAWK: an AWK interpreter with CSV support

AWK is a fascinating text-processing language, and somehow after reading the delightfully-terse The AWK Programming Language I was inspired to write an interpreter for it in Go. So here it is, feature-complete and tested against "the one true AWK" and GNU AWK test suites.

GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the library of the University of Antwerp. Read the CSV documentation.

You can also read one of the articles I've written about GoAWK:

Basic usage

To use the command-line version, simply use go install to install it, and then run it using goawk (assuming ~/go/bin is in your PATH):

$ go install github.com/benhoyt/goawk@latest

$ goawk 'BEGIN { print "foo", 42 }'
foo 42

$ echo 1 2 3 | goawk '{ print $1 + $3 }'
4

# Or use GoAWK's CSV and @"named-field" support:
$ echo -e 'name,amount\nBob,17.50\nJill,20\n"Boba Fett",100.00' | \
  goawk -i csv -H '{ total += @"amount" } END { print total }'
137.5

To use it in your Go programs, you can call interp.Exec() directly for simple needs:

input := strings.NewReader("foo bar\n\nbaz buz")
err := interp.Exec("$0 { print $1 }", " ", input, nil)
if err != nil {
    fmt.Println(err)
    return
}
// Output:
// foo
// baz

Or you can use the parser module and then interp.ExecProgram() to control execution, set variables, and so on:

src := "{ print NR, tolower($0) }"
input := "A\naB\nAbC"

prog, err := parser.ParseProgram([]byte(src), nil)
if err != nil {
    fmt.Println(err)
    return
}
config := &interp.Config{
    Stdin: strings.NewReader(input),
    Vars:  []string{"OFS", ":"},
}
_, err = interp.ExecProgram(prog, config)
if err != nil {
    fmt.Println(err)
    return
}
// Output:
// 1:a
// 2:ab
// 3:abc

If you need to repeat execution of the same program on different inputs, you can call interp.New once, and then call the returned object's Execute method as many times as you need.

Read the package documentation for more details.

Differences from AWK

The intention is for GoAWK to conform to awk's behavior and to the POSIX AWK spec, but this section describes some areas where it's different.

Additional features GoAWK has over AWK:

  • It has proper support for CSV and TSV files (read the documentation).
  • It's the only AWK implementation we know with a code coverage feature (read the documentation).
  • It supports negative field indexes to access fields from the right, for example, $-1 refers to the last field.
  • It's embeddable in your Go programs! You can even call custom Go functions from your AWK scripts.
  • Most AWK scripts are faster than awk and on a par with gawk, though usually slower than mawk. (See recent benchmarks.)
  • The parser supports 'single-quoted strings' in addition to "double-quoted strings", primarily to make Windows one-liners easier when using the cmd.exe shell (which uses " as the quote character).

Things AWK has over GoAWK:

  • Scripts that use regular expressions are slower than other implementations (unfortunately Go's regexp package is relatively slow).
  • AWK is written by Alfred Aho, Peter Weinberger, and Brian Kernighan.

Stability

This project has a good suite of tests, which include my own interpreter tests, the original AWK test suite, and the relevant tests from the Gawk test suite. I've used it a bunch personally, and it's used in the Benthos stream processor as well as by the software team at the library of the University of Antwerp. However, to err == human, so please use GoAWK at your own risk. I intend not to change the Go API in a breaking way in any v1.x.y version.

AWKGo

The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can read more about AWKGo or browse the code on the awkgo branch.

License

GoAWK is licensed under an open source MIT license.

The end

Have fun, and please contact me if you're using GoAWK or have any feedback!

goawk's People

Contributors

benhoyt, codefromthecrypt, crestwave, fioriandrea, juntuu, juster, mingrammer, raff, u5surf, xonixx, ypdn

goawk's Issues

The "printf" statement and the "sprintf" function do not tolerate extra arguments

Most popular AWK interpreters accept extra arguments for "sprintf" and "printf", but GoAWK does not:

buildbox@sinister:~$ goawk 'BEGIN { sprintf("%d", 1, 2) }'
format error: got 2 args, expected 1
(1)
buildbox@sinister:~$ original-awk 'BEGIN { sprintf("%d", 1, 2) }'
buildbox@sinister:~$ gawk 'BEGIN { sprintf("%d", 1, 2) }'
buildbox@sinister:~$ busybox awk 'BEGIN { sprintf("%d", 1, 2) }'
buildbox@sinister:~$ mawk 'BEGIN { sprintf("%d", 1, 2) }'

The use-case for this is user-defined functions that behave like "sprintf" and in turn may receive a variable number of arguments. Here's an excerpt from the script this comes from -- a linter for Markdown files:

# Track an issue for the file currently being linted if reporting of the issue
# has not been disabled.
#
# Arguments:
# - rule: Name of the rule being reported.
# - lineno: Line number in the original markdown file in where the problem
#   occurs.
# - format: A printf-style format string explaining the issue being reported.
#   There is an implicit "\n" at the end of this string.
# - a, b, c, d, e, f: Up to 6 optional parameters for the format string.
#
function report(rule, lineno, format, a, b, c, d, e, f,    prefix, text)
{
    if (!(rule in LINT_RULES)) {
        abort("report: unknown lint rule \"%s\"", rule)
    } else if (lineno < 1) {
        abort("report: invalid line number \"%s\" for %s", lineno, rule)
    } else if (!LINT_RULES[rule]) {
        return
    }

    text = index(format, "%") ? sprintf(format, a, b, c, d, e, f) : format
    prefix = (SHOW_FILENAMES ? lint_target ":" : "") lineno ":"

    if (SHOW_RULE_NAMES) {
        prefix = prefix " " rule ":"
    }

    arrayval_append(lint_issues, lineno, prefix " " text "\n")
}

And here's two calls to that function:

# A link or image is syntactically invalid.
#
# Arguments:
# - lineno: Number of the line with the problem.
# - n: Number of broken links or images.
#
function broken_link_or_image(lineno, n)
{
    report("broken_link_or_image", lineno,
        "%d syntactically invalid link%s or image%s", n, S(n), S(n) \
    )
}

# A list contains an empty item.
#
# Arguments:
# - lineno: Number of the line with the problem.
#
function empty_list_entry(lineno)
{
    report("empty_list_entry", lineno, "list entry is empty")
}

Side note: the GoAWK error message is less helpful than it could be because there isn't a line number in the error message.
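For illustration, here is a minimal Go sketch of the tolerant behavior the other interpreters show: count how many arguments the format string consumes and drop the rest before formatting. The helper names countVerbs and tolerantSprintf are hypothetical and not part of GoAWK, and the verb counting is simplified (it treats "%%" as a literal and ignores "*" widths).

```go
package main

import "fmt"

// countVerbs returns how many arguments the format string consumes.
// Simplification: "%%" is a literal percent sign, every other "%"
// starts exactly one verb, and "*" width/precision is not handled.
func countVerbs(format string) int {
	n := 0
	for i := 0; i < len(format); i++ {
		if format[i] != '%' {
			continue
		}
		if i+1 < len(format) && format[i+1] == '%' {
			i++ // skip the escaped percent sign
			continue
		}
		n++
	}
	return n
}

// tolerantSprintf formats like fmt.Sprintf but silently drops any
// arguments beyond those the format string uses.
func tolerantSprintf(format string, args ...interface{}) string {
	if n := countVerbs(format); n < len(args) {
		args = args[:n]
	}
	return fmt.Sprintf(format, args...)
}

func main() {
	fmt.Println(tolerantSprintf("%d", 1, 2))
	fmt.Println(tolerantSprintf("%d link%s", 2, "s", "", ""))
}
```

With this approach, a wrapper like report() above could always pass its full set of optional parameters.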

Use of map[string]... to store arrays makes array access non-deterministic

I've been running into some issues trying to get consistent output from goawk with a fairly non-trivial script I have. It seems that arrays are stored as map[string]value types internally, which makes array access with for (x in y) non-deterministic.

Consider the following example:

BEGIN {
	str = "1,2,3,4"
	split(str, fields, ",")
	for (i in fields) {printf "%d ", i}
}

For other AWK implementations I have access to (nawk, gawk, and busybox awk), this will return 1 2 3 4 (actually, that's a lie, nawk will return 2 3 4 1, i.e. will return the first element last, for whatever reason). For goawk, as a consequence of randomized maps in Go, this will be random.

I don't assume people are writing non-trivial programs in Awk all the time, but having for (x in y) be non-deterministic is a pitfall, and is not always possible to work around (e.g. when keys are not known).

Is it possible that this will be fixed in the future?
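One possible way to get a repeatable order is to collect the keys and sort them before iterating. The Go sketch below is an assumption about how that could look, not GoAWK's actual code; sortedKeys is a hypothetical helper that puts numeric keys first in ascending order, then the remaining keys alphabetically.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sortedKeys returns the keys of an AWK-style array in a stable order:
// integer keys ascending first, then all other keys alphabetically.
func sortedKeys(arr map[string]string) []string {
	keys := make([]string, 0, len(arr))
	for k := range arr {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		a, errA := strconv.Atoi(keys[i])
		b, errB := strconv.Atoi(keys[j])
		switch {
		case errA == nil && errB == nil:
			return a < b
		case errA == nil:
			return true // numeric keys sort before non-numeric ones
		case errB == nil:
			return false
		default:
			return keys[i] < keys[j]
		}
	})
	return keys
}

func main() {
	fields := map[string]string{"1": "1", "2": "2", "3": "3", "4": "4"}
	for _, k := range sortedKeys(fields) {
		fmt.Print(k, " ")
	}
	fmt.Println()
}
```

Sorting on every loop costs O(n log n), so this trades some speed for determinism.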

Consider making handling of CR LF newlines more consistent with Gawk

Per discussion on issue #33 (from here down), GoAWK handles CR LF (Windows) line endings differently from gawk (I haven't tried awk or mawk). GoAWK doesn't include the CR in the field (because it's part of the line ending), whereas Gawk does. I'm not sure if there are differences between Gawk's handling on Windows and Linux.

I kinda think the GoAWK approach is more sensible and platform-native, but consistency with other AWKs is good too ... worth thinking about further.

Arnold Robbins said this:

Gawk is consistent. RS has the default value of \n and that is what terminates records. As far as gawk is concerned, the \r is no different from any other character, which is why it appears as part of the last field in the record.

That said, on Windows, I believe the default is to work in text mode, in which case gawk never sees the \r\n line ending, it only sees \n. One can use BINMODE to force gawk to see those characters, in which case you would need to set RS = "\r?\n" in order to get correct processing.

Take the Windows advice with a grain of salt. I have not used a Windows system directly in over two years, and when I did I used Cygwin, so some experimentation may be in order.

If one is processing a Windows file on Linux, then one should use a utility like dos2unix on the file, or tr, before sending the data to GoAwk, which does not (yet! hint, hint) allow RS to be a regular expression. Using GoAwk on Windows, well, you'll have to figure out what the Go runtime is handing off to your code.
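The difference between the two behaviors can be sketched in Go. This assumes a reader built on bufio.ScanLines, which strips one trailing "\r" from each line; whether GoAWK reads records exactly this way is an assumption here. The function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// recordsLF splits on "\n" only, so a preceding "\r" stays in the
// record (the behavior described for gawk with the default RS).
func recordsLF(input string) []string {
	return strings.Split(strings.TrimSuffix(input, "\n"), "\n")
}

// recordsScanLines uses bufio.ScanLines, which also drops one
// trailing "\r" from each line.
func recordsScanLines(input string) []string {
	var records []string
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		records = append(records, scanner.Text())
	}
	return records
}

func main() {
	in := "a b\r\nc d\r\n"
	fmt.Printf("%q\n", recordsLF(in))
	fmt.Printf("%q\n", recordsScanLines(in))
}
```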

Can this replace sed?

If anyone has a decent answer it would be interesting.

We currently use sed for various hacky scripts, and we are converting them to Go and mage.

GoAWK could perhaps replace sed in this regard.

RS skips blank lines instead of splitting on blank lines

Report via email from Edward Perry:

I was looking at your goawk code and find it very educational since I was going to try something similar, but I am very new to Go.

I ran into a bug with RS="" and FS="\n", didn't do what I expected:

echo -e "\nFROM GOAWK\n"
echo -n -e "one\ntwo\n\nuno\ndos\n\n" | goawk 'BEGIN{ RS=""; FS="\n"; } { print $1 }'

echo -e "\nFROM GAWK\n"
echo -n -e "one\ntwo\n\nuno\ndos\n\n" | gawk 'BEGIN{ RS=""; FS="\n"; } { print $1 }'

I don't have a github account yet, or I would have made an issue there.

I am trying to find a fix, but it is slow going since I'm new to Go, will let you know if I find something. Maybe I will have a github account by then.
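As a reference point, the "paragraph mode" that RS="" selects can be sketched in Go: leading and trailing newlines are ignored, and each run of one or more blank lines separates two records. This is an illustration of the expected semantics, not GoAWK's implementation; paragraphRecords and firstField are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blankLines matches a record separator: two or more newlines in a row.
var blankLines = regexp.MustCompile(`\n\n+`)

// paragraphRecords splits input into records separated by blank lines,
// ignoring newlines at the very start and end of the input.
func paragraphRecords(input string) []string {
	trimmed := strings.Trim(input, "\n")
	if trimmed == "" {
		return nil
	}
	return blankLines.Split(trimmed, -1)
}

// firstField returns $1 of a record when FS is "\n".
func firstField(record string) string {
	return strings.SplitN(record, "\n", 2)[0]
}

func main() {
	for _, rec := range paragraphRecords("one\ntwo\n\nuno\ndos\n\n") {
		fmt.Println(firstField(rec))
	}
}
```

For the input in the report, this yields two records whose first fields are "one" and "uno".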

Support double hyphen (--) in argument parsing

Hello,

I've been experimenting with Goawk recently and I discovered that goawk does not accept double hyphens to indicate the end of options parsing.

Here's an example of what I'm talking about:

$ awk -- '{print $1}' -file

The above snippet uses double hyphens to indicate the end of options parsing etc, and to allow for safely handling weird file names or untrusted input.

For example, if we have a file that we'd like to munge that is annoyingly named "-file" and we do not use double hyphens, then the above snippet will fail because it parses "-file" as "-f ile", ie attempts to read a progfile from the non-existent file "ile".

Goawk, however, simply throws an error: "flag provided but not defined: --".

Just thought I'd share my observations here.

Regards,

Jordan
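A rough Go sketch of the conventional rule follows: option parsing stops at "--" (which is consumed), at a lone "-", or at the first argument that does not start with "-". The function splitOptions is hypothetical, and for brevity it assumes no option takes a separate value argument, which a real AWK command line (-f, -v, -F) would need to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOptions separates leading options from operands. Parsing stops
// at "--" (consumed), at a lone "-", or at the first argument that
// does not start with "-".
// Simplification: no option is assumed to take a separate value.
func splitOptions(args []string) (opts, operands []string) {
	for i, arg := range args {
		switch {
		case arg == "--":
			return opts, args[i+1:]
		case arg == "-" || !strings.HasPrefix(arg, "-"):
			return opts, args[i:]
		default:
			opts = append(opts, arg)
		}
	}
	return opts, nil
}

func main() {
	opts, rest := splitOptions([]string{"--", "{print $1}", "-file"})
	fmt.Println(opts, rest)
}
```

Everything after "--" is treated as an operand, so "-file" reaches the program as a file name rather than being read as "-f ile".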

Add support for "getline lvalue"

Currently we support most getline forms, but not getline lvalue. For example, see the commented-out tests in interp/interp_test.go:

		// TODO: currently we support "getline var" but not "getline lvalue"
		// TODO: {`BEGIN { getline a[1]; print a[1] }`, "foo", "foo\n", "", ""},
		// TODO: {`BEGIN { getline $1; print $1 }`, "foo", "foo\n", "", ""},

Array creation

AWK should have a variadic function for array creation. Currently if you wish to
create an array, you must use this:

dd[1] = "aa"
dd[2] = "bb"
dd[3] = "cc"

Or:

split("aa bb cc", dd)

the split syntax is problematic if your elements contain spaces. That can be
worked around by using a custom separator:

split("aa bb\1cc", dd, "\1")

but then it will fail again if your separator happens to be part of one of the
elements. Many other languages have syntax for array literals, for example C:

char *dd[] = {"aa", "bb", "cc"};

Python:

dd = ['aa', 'bb', 'cc']

JavaScript:

var dd = ['aa', 'bb', 'cc'];

Ruby:

dd = ['aa', 'bb', 'cc']

Go:

dd := []string {"aa", "bb", "cc"}

with AWK, it could look like one of these:

anew(dd, "aa", "bb", "cc")
dd[2] == "bb"
aset(dd, "aa", "bb", "cc")
dd[2] == "bb"

Not interpreting NR multi-line awk syntax

Hello there!
I was wondering whether NR works correctly in this AWK library/interpreter. I was testing some examples, and it seems that NR doesn't work when computing differences between lines, while other simple commands work fine.

Example:
awk ' NR == 1{old = $1; next} { print $1 - old; old = $1} ' (takes the value from the previous line and subtracts it from the current line)

Bash:
cat numbers | awk ' NR == 1{old = $1; next} { print $1 - old; old = $1} '
3 -6 1 1 4 -7

Goawk:
$ go run awktest.go
4 7 1 2 3 7

Bests,
Sami

Removing duplicate lines

Dear @benhoyt, I'm confused right from the start because

awk "!line[$0]++" wordlist.txt > deduplicated.txt

works as expected whereas

goawk "!line[$0]++" wordlist.txt > deduplicated.txt     
----------------------------------------------------            
!line[$0]++                                                     
         ^                                                      
----------------------------------------------------            
parse error at 1:10: expected lvalue before ++ or --            

throws the error shown above. What would you recommend to do here?
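For readers unfamiliar with the idiom: other AWKs read the pattern as !(line[$0]++), so it is true only the first time a line is seen. The Go sketch below shows that intended logic; dedupe is a hypothetical helper written for illustration, not GoAWK code.

```go
package main

import "fmt"

// dedupe keeps the first occurrence of each line. It mirrors the AWK
// pattern !line[$0]++: a line passes only while its counter is still
// zero, and the counter is incremented every time the line is seen.
func dedupe(lines []string) []string {
	seen := make(map[string]int)
	var out []string
	for _, l := range lines {
		if seen[l] == 0 {
			out = append(out, l)
		}
		seen[l]++
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"a", "b", "a", "c", "b"}))
}
```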

`system("ssh user@host")` fails

$ ./soft/goawk1.8.1 'BEGIN { system("ssh user@host") }'
Pseudo-terminal will not be allocated because stdin is not a terminal.
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-147-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

...

The command outputs the Linux greeting text and exits.

Gawk, Mawk, and BWK all open an interactive shell, the same as if you just ran

$ ssh user@host

Native Functions in Config

I am trying to figure out how to define a native function in config but so far I have been unsuccessful. Can you please help me ? This is what I have so far:

config := &parser.ParserConfig {
	Funcs: map[string]interface{} {
		("min":  func min(num1 float64, num2 float64) {
			if (num1 < num2) {
				return num1
			}
			return num2
		}
	)}
}
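The snippet as posted is not valid Go: a function literal cannot carry a name, it needs a declared return type, and map entries are not wrapped in parentheses. Below is a stdlib-only sketch of just the map, with those three points fixed. Passing the map to the Funcs field of the parser config (and, presumably, to the interpreter config as well) is taken from the question and is not verified here; nativeFuncs is a hypothetical helper name.

```go
package main

import "fmt"

// nativeFuncs builds the function map. Each value is an ordinary Go
// function literal: anonymous, with explicit parameter and return types.
func nativeFuncs() map[string]interface{} {
	return map[string]interface{}{
		"min": func(num1, num2 float64) float64 {
			if num1 < num2 {
				return num1
			}
			return num2
		},
	}
}

func main() {
	// Call the function through a type assertion to show it is usable.
	min := nativeFuncs()["min"].(func(float64, float64) float64)
	fmt.Println(min(3, 2))
}
```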

Allow length(array) in addition to length(string)

Apparently length() can't be called with an array argument currently. If I'm not mistaken, that's POSIX behaviour. Though, various other awks support it. The following program:

BEGIN {
	split("abc", a, "")
	for (i=1; i<=length(a); i++) {
		print a[i]
	}
	exit
}

, run with goawk, will print parse error: can't use array "a" as scalar. As I noticed, I can work around it using n = split(.... So perhaps in this case it is not much of a problem, but I opened the issue nonetheless, just in case this isn't intended.

Wrong program arguments handling

$ cat tmp.awk 

BEGIN {
  print "ARGC", ARGC
  dbgA("ARGV", ARGV)
}

function dbgA(name, arr,   i) { print "--- " name ": "; for (i in arr) print i " : " arr[i] }
$ ./soft/gawk51 -f tmp.awk a b c -v
ARGC 5
--- ARGV: 
0 : gawk51
1 : a
2 : b
3 : c
4 : -v

$ ./soft/mawk134 -f tmp.awk a b c -v
ARGC 5
--- ARGV: 
3 : c
2 : b
1 : a
4 : -v
0 : mawk134

$ ./soft/bwk -f tmp.awk a b c -v
ARGC 5
--- ARGV: 
2 : b
3 : c
4 : -v
0 : /home/xonix/proj/makesure/soft/bwk
1 : a

BUT

$ ./soft/goawk -f tmp.awk a b c -v
flag needs an argument: -v

Evidently it parses an argument that should belong to tmp.awk as awk's own -v option.
Other than that it works fine:

$ ./soft/goawk -f tmp.awk a b c
ARGC 4
--- ARGV: 
0 : goawk
1 : a
2 : b
3 : c

$ ./soft/gawk51 -f tmp.awk a b c
ARGC 4
--- ARGV: 
0 : gawk51
1 : a
2 : b
3 : c

goawk should allow argument to -F to be up against the -F

Compare all other awks:

$ gawk -F: '/arnold/ { print $5 }' /etc/passwd
Arnold Robbins,,,

to goawk:

$ goawk -F: '/arnold/ { print $5 }' /etc/passwd
flag provided but not defined: -F:

Goawk requires that the option and argument be separate:

$ goawk -F : '/arnold/ { print $5 }' /etc/passwd
Arnold Robbins,,,

This is also contrary to POSIX.

Thanks.
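Accepting both forms is a small amount of logic. Here is a hedged Go sketch, with optionValue as a hypothetical helper (not GoAWK's actual argument parser) that takes the value either attached to the option letter or from the next argument.

```go
package main

import "fmt"

// optionValue extracts the value of a single-letter option such as -F.
// It accepts "-F:" (attached) and "-F", ":" (separate) and reports how
// many arguments were consumed; 0 means args[0] is not that option or
// the value is missing.
func optionValue(letter byte, args []string) (value string, consumed int) {
	if len(args) == 0 || len(args[0]) < 2 || args[0][0] != '-' || args[0][1] != letter {
		return "", 0
	}
	if len(args[0]) > 2 {
		return args[0][2:], 1 // attached form: -F:
	}
	if len(args) > 1 {
		return args[1], 2 // separate form: -F :
	}
	return "", 0 // option given without a value
}

func main() {
	fmt.Println(optionValue('F', []string{"-F:", "/arnold/ { print $5 }"}))
	fmt.Println(optionValue('F', []string{"-F", ":", "/arnold/ { print $5 }"}))
}
```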

Show parse error message in a more standard format

Per Nelson Beebe's comment, parse errors might be better expressed as:

filename.awk:line:col: error message

To be more consistent with gawk et al, and so that IDEs can parse the filename and line/col and jump to the position.

Might be a little tricky to implement as the code is structured right now, because currently goawk just concatenates the source files together into a byte string, so ParseProgram doesn't know which file it's dealing with.
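One way around the concatenation problem would be to remember how many lines each source file contributed and map a global line number back to a file afterwards. The Go sketch below illustrates that idea; sourceFile, locate, and formatError are hypothetical names, and none of this reflects how GoAWK is actually structured.

```go
package main

import "fmt"

// sourceFile records how many lines one file contributed to the
// concatenated program text.
type sourceFile struct {
	name  string
	lines int
}

// locate maps a 1-based line number in the concatenated source back to
// the originating file and the line number within that file.
func locate(files []sourceFile, line int) (name string, local int, ok bool) {
	if line < 1 {
		return "", 0, false
	}
	for _, f := range files {
		if line <= f.lines {
			return f.name, line, true
		}
		line -= f.lines
	}
	return "", 0, false
}

// formatError renders the filename:line:col: message form.
func formatError(name string, line, col int, msg string) string {
	return fmt.Sprintf("%s:%d:%d: %s", name, line, col, msg)
}

func main() {
	files := []sourceFile{{"a.awk", 10}, {"b.awk", 5}}
	if name, line, ok := locate(files, 12); ok {
		fmt.Println(formatError(name, line, 9, "unexpected token"))
	}
}
```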

Add support for `getline < "-"` to read from stdin

gawk, mawk and the "one true awk" have support for this, and it seems reasonable considering that POSIX requires interpretation of - as stdin in basically all other areas.

Only issue is that read prompts in general are a bit awkward with #47. For example:

$ goawk 'BEGIN { ARGV[ARGC++] = "-"; print "Enter input:"; getline; print "You entered: " $0 }'
this is my input
Enter input:
You entered: this is my input

go get doesn't work for me

[ketan@localhost ~]$ go get github.com/benhoyt/goawk
[ketan@localhost ~]$ goawk 'BEGIN { print "foo", 42 }'
bash: goawk: command not found...
[ketan@localhost ~]$ which go
/usr/bin/go
[ketan@localhost ~]$ go version
go version go1.9.4 linux/amd64

Parser allows statements directly after one another

The parser allows statements directly after one another, without being separated by semicolons.

Consider this:

goawk 'BEGIN{a = 1 print a}'

goawk doesn't detect the error, while gawk, mawk, nawk and busybox awk do.

Reconsider handling of bytes vs characters in length() and similar functions

Per @arnoldrobbins's comment on rmyorston/busybox-w32#174, length and similar functions should work with characters, not bytes. The POSIX spec says:

The index, length, match, and substr functions should not be confused with similar functions in the ISO C standard; the awk versions deal with characters, while the ISO C standard deals with bytes.

Given that, I wonder why (onetrueawk) awk's length() function counts bytes instead of characters, at least by default. Here's output on my macOS Catalina machine:

$ echo 絵 | nawk '{print length}'    # onetrueawk/awk 20200228
3
$ echo 絵 | gawk '{print length}'    # gawk 5.0.1
1
$ echo 絵 | mawk '{print length}'    # mawk 1.3.4 2020012
3
$ echo 絵 | goawk '{print length}'    # goawk 1.6.0
3

# for reference:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

So I feel good that GoAWK has the majority on its side :-), but as you point out, the nawk/mawk/goawk behavior does seem wrong according to POSIX. What do you recommend, @arnoldrobbins?
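In Go terms, the two interpretations differ only in which length function is used. A minimal sketch, assuming UTF-8 input and using a three-byte character such as 絵 (byteLength and charLength are hypothetical names):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLength counts bytes, which is what len() does for a Go string.
func byteLength(s string) int { return len(s) }

// charLength counts Unicode code points in a UTF-8 string, which is
// closer to what POSIX means by "characters".
func charLength(s string) int { return utf8.RuneCountInString(s) }

func main() {
	s := "絵"
	fmt.Println(byteLength(s), charLength(s)) // 3 1
}
```

Counting code points is O(n) in the string length, whereas the byte length is O(1), which may be part of why byte counting is a common default.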

Incorrect `sh` determination for `system()`?

Or at least inconsistent with major implementations:

$ PATH= ./soft/gawk51 'BEGIN { system("echo hello") }'
hello
$ PATH= ./soft/bwk 'BEGIN { system("echo hello") }'
hello
$ PATH= ./soft/mawk134 'BEGIN { system("echo hello") }'
hello
$ PATH= ./soft/goawk1.8.1 'BEGIN { system("echo hello") }'
exec: "sh": executable file not found in $PATH
$ ./soft/goawk1.8.1 'BEGIN { system("echo hello") }'
hello

AFAIK the system() call should use hardcoded /bin/sh
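In Go, the difference comes down to the name passed to exec.Command: a bare "sh" is looked up in PATH, while a name containing a slash is used as given. The sketch below assumes a Unix-like system where /bin/sh exists; runShell is a hypothetical helper, not GoAWK's code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runShell runs the command line through /bin/sh. Because the program
// name contains a slash, exec.Command does not search PATH for it.
func runShell(command string) (string, error) {
	out, err := exec.Command("/bin/sh", "-c", command).Output()
	return string(out), err
}

func main() {
	os.Setenv("PATH", "") // mimic the "PATH= ..." invocations above
	out, err := runShell("echo hello")
	fmt.Printf("%q %v\n", out, err)
}
```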

Create `Sandbox` configuration structure

I was wondering whether it would make sense to create a Sandbox configuration structure that is embedded in the Config structure. This might help organize the code a bit and allow for more fine-grained control over the feature set.

Concept:

// Bitmasked flag of options for sandboxing the awk vm
type SandboxFlag uint8

const (
	// Enable the 'system' call that can execute shell scripts
	Exec SandboxFlag = 1 << iota
	// Allow file reads
	FileReads
	//  Allow file writes
	FileWrites

	EnableAllFlags = Exec | FileReads | FileWrites
)

// Sandbox configurations
type Sandbox struct {
	// List of flags enabled for security it defaults to none
	Flags SandboxFlag
	// Control length of runtime of a script
	Timeout time.Duration
}

Originally posted by @lateefj in #26 (comment)
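To show how such flags might be checked at run time, here is a self-contained sketch that repeats the proposed types and adds an Allows method. The method is an assumption added for illustration; the proposal itself only defines the types.

```go
package main

import (
	"fmt"
	"time"
)

// SandboxFlag is a bitmask of features the sandbox permits.
type SandboxFlag uint8

const (
	Exec SandboxFlag = 1 << iota // allow system() and shell commands
	FileReads                    // allow file reads
	FileWrites                   // allow file writes

	EnableAllFlags = Exec | FileReads | FileWrites
)

// Sandbox groups the sandbox settings; the zero value permits nothing.
type Sandbox struct {
	Flags   SandboxFlag
	Timeout time.Duration
}

// Allows reports whether every bit in flag is enabled (hypothetical helper).
func (s Sandbox) Allows(flag SandboxFlag) bool {
	return s.Flags&flag == flag
}

func main() {
	sb := Sandbox{Flags: FileReads | FileWrites, Timeout: 5 * time.Second}
	fmt.Println(sb.Allows(FileReads), sb.Allows(Exec))
}
```

Because the zero value enables nothing, a caller has to opt in to each capability explicitly.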

Parsing Begin Statements

Is there a specific reason why BEGIN and END statements are parsed with stmtsBrace instead of stmts? In other words, why are semicolons in BEGIN and END statements ignored? I am asking because one might want to do something like this using pawk: awk 'BEGIN {temp = 1 ; print "emp"}', which sets temp to 1 and prints emp.

Thank you in advance for your time and consideration

GoAWK seems to buffer more aggressively than (g)awk

For example, this script doesn't buffer in (g)awk: one is printed right away, then the sleep, then two. Whereas in GoAWK it buffers, so you see one\ntwo together after the sleep:

$ gawk 'BEGIN { print "one"; system("sleep 3"); print "two" }'

GoAWK waits because it uses buffered output by default. But the gawk docs indicate gawk buffers too. Is it because gawk doesn't buffer when writing directly to the TTY? But the same behavior seems to happen even when (g)awk output is redirected to a file, so I'm not sure.
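The effect of buffered output can be reproduced in a few lines of Go: data written to a bufio.Writer is not visible downstream until Flush is called. The sketch below only models the behavior; it makes no claim about where GoAWK does or should flush, and visibleAfter is a hypothetical helper.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// visibleAfter writes "one" through a buffered writer and returns what
// has reached the underlying sink at the point where a child process
// (such as system("sleep 3")) would start running.
func visibleAfter(flushFirst bool) string {
	var sink bytes.Buffer
	out := bufio.NewWriter(&sink)
	fmt.Fprintln(out, "one")
	if flushFirst {
		out.Flush() // push pending output before handing control away
	}
	return sink.String()
}

func main() {
	fmt.Printf("%q %q\n", visibleAfter(false), visibleAfter(true))
}
```

Flushing before running a subprocess would keep the interleaving of "one", the sleep, and "two" in the order the script wrote them.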

Go AWK seems to have problems with constructs other interpreters don't

I have a pure-AWK implementation of wcwidth(3) that I've written that can be found at https://github.com/ericpruitt/wcwidth.awk. The test suite for that script runs against BusyBox AWK, GNU Awk, MAWK and "One True AWK" without issue, but it doesn't work with your Go AWK. There appear to be at least two issues. One is parsing; Go AWK doesn't like the placement of one of my break statements:

buildbox@sinister:wcwidth.awk [1]$ goawk -f wcwidth.awk
-------------------------------------------------------
            break
            ^
-------------------------------------------------------
parse error at 165:13: break must be inside a loop body
(1)
buildbox@sinister:wcwidth.awk [1]$ cat -n wcwidth.awk | grep -C2 165
   163          } else if (WCWIDTH_POSIX_MODE) {
   164              _total = -1
   165              break
   166          } else {
   167              # Ignore non-printable ASCII characters.

The other issue has to do with the way command line arguments are handled. AWK lets you modify the ARGV array, so not all arguments need to be options or file paths. Go AWK tries to treat an argument as a file path when it shouldn't:

buildbox@sinister:wcwidth.awk [1]$ goawk -f wcwidth.awk -f test.awk -v TERSE=1 width-data:width-data
open width-data:width-data: no such file or directory
(1)

Original AWK (and the other interpreters) have no problem with this:

buildbox@sinister:wcwidth.awk [1]$ original-awk -f wcwidth.awk -f test.awk -v TERSE=1 width-data:width-data
buildbox@sinister:wcwidth.awk [1]$

Variable not detected as array type in presence of recursion

Nelson Beebe reported the following issue, where goawk doesn't detect array as an array variable - presumably due to the presence of recursion. This code works fine in gawk and mawk.

# t.awk
function less(a,b)
{
    return (a < b)
}

function partition(array,left,right,    i,j,swap,v)
{
    i = left - 1
    j = right
    v = array[right]
    for (;;)
    {
        while (less(array[++i],v))
            ;
        while (less(v,array[--j]))
        {
            if (j == left)
                break
        }
        if (i >= j)
            break
        swap = array[i]
        array[i] = array[j]
        array[j] = swap
    }
    swap = array[i]
    array[i] = array[right]
    array[right] = swap
    return (i)
}

function quicksort(array,left,right,    i)
{
    # The code in partition() and quicksort() is a direct translation
    # of the simple quicksort algorithm given in Robert Sedgewick's
    # ``Algorithms in C'', 3rd edition, Addison-Wesley, 1998,
    # pp. 305--307.  We need an O(N lg N) algorithm here instead of a
    # simpler O(N^2) algorithm because the font list has thousands of
    # entries.  There are many things that one can do to tweak
    # quicksort() to make its worst-case behavior of O(N^2) unlikely,
    # and to improve its performance on small sequences by switching
    # to other sorting algorithms.  However, we do not attempt any of
    # those refinements here.
    #
    # The user-defined less(a,b) function conceals the details of how
    # array items are compared.

    if (right <= left)
        return
    i = partition(array,left,right)
    quicksort(array, left, i - 1)
    quicksort(array, i + 1, right)
}

BEGIN {
    a[1] = "aye"
    a[2] = "c"
    a[3] = "bee"
    quicksort(a, 1, 3)
    print 1, a[1]
    print 2, a[2]
    print 3, a[3]
}

The output is:

$ go run . -f t.awk
-------------------------------------------------------------
    i = partition(array,left,right)
        ^
-------------------------------------------------------------
parse error at 50:9: can't pass scalar "array" as array param
exit status 1

Fix parsing of string concatenation with prefix ++

As reported by Nelson Beebe, GoAWK doesn't handle this correctly:

$ goawk 'BEGIN { x = "s" ++n; print x }'
----------------------------------------------------
BEGIN { x = "s" ++n; print x }
                ^
----------------------------------------------------
parse error at 1:17: expected lvalue before ++ or --

Compare to gawk/mawk:

$ gawk 'BEGIN { x = "s" ++n; print x }'
s1
~/h/goawk$ mawk 'BEGIN { x = "s" ++n; print x }'
s1

Note that postfix ++ does work in GoAWK:

$ goawk 'BEGIN { x = "s" n++; print x }'
s0

Feature request for embedded use

This is really cool. Your article about how you wrote it is even better. Congrats!

I have a scenario that would be cool to handle if possible. Your readme shows how you can run the parser from Go. The print statements are just going to stdio, I assume. It would be cool if, in embedded code like that, I could route results to another goroutine.

The idea is to read some large files of data. Awk is great at the first processing of that data, but then I want to be able to operate on that output at the same time (and in a streaming rather than batch way, as the input can be rather large).

I'm just spit-balling, but it would be cool if in the embedded case you could register channels with the interpreter and then have a special chan function that could be used to send data to the named channel.

In any case, good work. I've always been a fan of AWK and I'm doing a lot of Go now - so this is really fun to see.
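Something close to this may already be possible without a special chan function, assuming the interpreter's output can be pointed at an arbitrary io.Writer (the README config takes an io.Reader for Stdin; an equivalent output field is assumed here). The sketch below uses io.Pipe to turn written lines into channel messages; the producer function stands in for the interpreter, and streamLines and collect are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// streamLines runs produce in its own goroutine, giving it the write
// end of a pipe, and returns a channel that delivers each line written.
func streamLines(produce func(w io.Writer)) <-chan string {
	pr, pw := io.Pipe()
	go func() {
		produce(pw)
		pw.Close() // signals EOF to the reading side
	}()
	lines := make(chan string)
	go func() {
		defer close(lines)
		scanner := bufio.NewScanner(pr)
		for scanner.Scan() {
			lines <- scanner.Text()
		}
	}()
	return lines
}

// collect gathers all lines produced from a fixed input string.
func collect(input string) []string {
	var got []string
	producer := func(w io.Writer) { io.WriteString(w, input) }
	for line := range streamLines(producer) {
		got = append(got, line)
	}
	return got
}

func main() {
	// Stand-in for the interpreter: anything that writes to an io.Writer.
	for _, line := range collect("record 1\nrecord 2\nrecord 3\n") {
		fmt.Println(line)
	}
}
```

Because io.Pipe is unbuffered, the producer blocks until the consumer reads, which gives streaming behavior with natural backpressure.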

Race detected using go test -race

Several data races are found when running go test -race ./... using go 1.16. I'm seeing this on both amd64 and arm64. This is tested against commit f1bdc5e09b8a84d4af13ee67cf542412d534389e (HEAD -> master, tag: v1.7.0, origin/master).

% go test -race ./... -count=1 
==================
WARNING: DATA RACE
Write at 0x00c0003e6f80 by goroutine 91:
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:200 +0x30
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Previous write at 0x00c0003e6f80 by goroutine 44:
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:200 +0x30
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Goroutine 91 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:411 +0x1aac
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170

Goroutine 44 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getOutputStream()
      /tmp/goawk/interp/io.go:102 +0xa50
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:452 +0xc40
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170
==================
==================
WARNING: DATA RACE
Read at 0x00c0003e6f60 by goroutine 91:
  bytes.(*Buffer).Len()
      /usr/local/go/src/bytes/buffer.go:73 +0x30
  bytes.(*Buffer).grow()
      /usr/local/go/src/bytes/buffer.go:118 +0x28
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:202 +0x5c
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Previous write at 0x00c0003e6f60 by goroutine 44:
  bytes.(*Buffer).grow()
      /usr/local/go/src/bytes/buffer.go:144 +0x1f8
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:202 +0x5c
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Goroutine 91 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:411 +0x1aac
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170

Goroutine 44 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getOutputStream()
      /tmp/goawk/interp/io.go:102 +0xa50
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:452 +0xc40
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170
==================
==================
WARNING: DATA RACE
Read at 0x00c0003e6f78 by goroutine 91:
  bytes.(*Buffer).Len()
      /usr/local/go/src/bytes/buffer.go:73 +0x4c
  bytes.(*Buffer).grow()
      /usr/local/go/src/bytes/buffer.go:118 +0x28
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:202 +0x5c
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Previous write at 0x00c0003e6f78 by goroutine 44:
  bytes.(*Buffer).grow()
      /usr/local/go/src/bytes/buffer.go:147 +0x230
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:202 +0x5c
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Goroutine 91 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:411 +0x1aac
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170

Goroutine 44 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getOutputStream()
      /tmp/goawk/interp/io.go:102 +0xa50
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:452 +0xc40
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170
==================
==================
WARNING: DATA RACE
Write at 0x00c0001167a0 by goroutine 72:
  bytes.(*Reader).WriteTo()
      /usr/local/go/src/bytes/reader.go:139 +0x30
  io.copyBuffer()
      /usr/local/go/src/io/io.go:405 +0x3e4
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x78
  os/exec.(*Cmd).stdin.func1()
      /usr/local/go/src/os/exec/exec.go:266 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Previous write at 0x00c0001167a0 by goroutine 71:
  bytes.(*Reader).WriteTo()
      /usr/local/go/src/bytes/reader.go:139 +0x30
  io.copyBuffer()
      /usr/local/go/src/io/io.go:405 +0x3e4
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x78
  os/exec.(*Cmd).stdin.func1()
      /usr/local/go/src/os/exec/exec.go:266 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Goroutine 72 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:411 +0x1aac
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170

Goroutine 71 (finished) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:667 +0x1324
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:546 +0x1cc8
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170
==================
==================
WARNING: DATA RACE
Read at 0x00c000116750 by goroutine 74:
  bytes.(*Buffer).Len()
      /usr/local/go/src/bytes/buffer.go:73 +0x30
  bytes.(*Buffer).grow()
      /usr/local/go/src/bytes/buffer.go:118 +0x28
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:202 +0x5c
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Previous write at 0x00c000116750 by goroutine 48:
  bytes.(*Buffer).ReadFrom()
      /usr/local/go/src/bytes/buffer.go:209 +0x174
  io.copyBuffer()
      /usr/local/go/src/io/io.go:409 +0x388
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x5c
  os/exec.(*Cmd).writerDescriptor.func1()
      /usr/local/go/src/os/exec/exec.go:311 +0x30
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:441 +0x28

Goroutine 74 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:411 +0x1aac
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170

Goroutine 48 (finished) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:440 +0x768
  github.com/benhoyt/goawk/interp.(*interp).getInputScannerPipe()
      /tmp/goawk/interp/io.go:157 +0x458
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:821 +0x2bc
  github.com/benhoyt/goawk/interp.(*interp).eval()
      /tmp/goawk/interp/interp.go:667 +0x1324
  github.com/benhoyt/goawk/interp.(*interp).execute()
      /tmp/goawk/interp/interp.go:546 +0x1cc8
  github.com/benhoyt/goawk/interp.(*interp).executes()
      /tmp/goawk/interp/interp.go:398 +0x74
  github.com/benhoyt/goawk/interp.(*interp).execBeginEnd()
      /tmp/goawk/interp/interp.go:312 +0x7c
  github.com/benhoyt/goawk/interp.ExecProgram()
      /tmp/goawk/interp/interp.go:270 +0xc3c
  github.com/benhoyt/goawk_test.interpGoAWKStdin()
      /tmp/goawk/goawk_test.go:181 +0x278
  github.com/benhoyt/goawk_test.TestGAWK.func1()
      /tmp/goawk/goawk_test.go:261 +0x448
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1194 +0x170
==================
--- FAIL: TestGAWK (0.32s)
    --- FAIL: TestGAWK/iobug1 (0.01s)
        testing.go:1093: race detected during execution of test
    --- FAIL: TestGAWK/rstest4 (0.01s)
        testing.go:1093: race detected during execution of test
    testing.go:1093: race detected during execution of test
FAIL
FAIL	github.com/benhoyt/goawk	5.182s
?   	github.com/benhoyt/goawk/internal/ast	[no test files]
ok  	github.com/benhoyt/goawk/internal/strutil	0.247s
--- FAIL: TestNative (0.01s)
    --- FAIL: TestNative/_BEGIN_{__print__print_bool(),_bool(0),_bool(1),_bool(""),_bool("0"),_ (0.00s)
        interp_test.go:745: expected "\n0 0 1 0 1 1\n0 42 -5 3 -3\n0 42 -5 127 -128 -1 0\n0 42 -5 32767 -32768 -1 0\n0 42 -5 2147483647 -2147483648 -2147483648 -2147483648\n0 42 -5 2147483647000 -2147483647000\n0 42 1.84467e+19\n0 42 251 127 128 255 0\n0 42 65535 65535 0\n0 42 4294967295 4294967295 0\n0 42 1.84467e+19 4294967296 2147483647000\n..Foo bar.1234\n..Foo bar.1234\n", got "\n0 0 1 0 1 1\n0 42 -5 3 -3\n0 42 -5 127 -128 -1 0\n0 42 -5 32767 -32768 -1 0\n0 42 -5 2147483647 2147483647 2147483647 2147483647\n0 42 -5 2147483647000 -2147483647000\n0 42 0\n0 42 251 127 128 255 0\n0 42 65535 65535 0\n0 42 4294967295 4294967295 0\n0 42 0 4294967296 2147483647000\n..Foo bar.1234\n..Foo bar.1234\n"
FAIL
FAIL	github.com/benhoyt/goawk/interp	5.045s
ok  	github.com/benhoyt/goawk/lexer	0.693s
ok  	github.com/benhoyt/goawk/parser	0.445s
FAIL
(exit 1)                                          

In particular, the race-detector failures in TestGAWK/iobug1 and TestGAWK/rstest4 are consistently reproducible.

getline should silently return -1 when an input redirection fails

Consider this comparison (done on current goawk source):

$ for i in nawk mawk gawk mksawk 'busybox awk' goawk
> do echo ======= $i
> $i 'BEGIN { print getline foo < "/no/such/file" }'
> done
======= nawk
-1
======= mawk
-1
======= gawk
-1
======= mksawk
-1
======= busybox awk
-1
======= goawk
input redirection error: open /no/such/file: no such file or directory

GoAWK should not exit with a fatal error in this case. Instead, getline should return -1, as the other implementations do.

Issue parsing | as field separator?

I have a simple program:

package main
import (
	"fmt"
	"bytes"
	"github.com/benhoyt/goawk/interp"
)

func main() {
	input := bytes.NewReader([]byte("xyz title | Healthcare IT News"))
	err := interp.Exec(" $0 { print $1}", "|", input, nil)
	if err != nil {
		fmt.Println(err)
		return
	}
}

The output is:
x

Instead of:
xyz title

Any idea why the "|" is not being parsed properly? If I use a comma or a hyphen, it works.
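One plausible explanation, assuming the separator is compiled as a regular expression without escaping, is that the pattern "|" is an alternation of two empty branches. It matches the empty string at every position, so the line is split between every character. The sketch below reproduces that effect with Go's regexp package alone. firstField and literalFirstField are hypothetical helpers, and nothing here is taken from GoAWK's source.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstField splits line on the regular expression sepPattern and returns
// the first resulting field.
func firstField(line, sepPattern string) string {
	fields := regexp.MustCompile(sepPattern).Split(line, -1)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// literalFirstField escapes sep first, so characters such as "|" are
// treated literally rather than as regex operators.
func literalFirstField(line, sep string) string {
	return firstField(line, regexp.QuoteMeta(sep))
}

func main() {
	line := "xyz title | Healthcare IT News"
	fmt.Printf("%q\n", firstField(line, "|"))        // "x"
	fmt.Printf("%q\n", literalFirstField(line, "|")) // "xyz title "
}
```

Note that with a literal "|" separator the first field keeps its trailing space, so the result is "xyz title " rather than "xyz title".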

Doesn't work on Windows?

I downloaded the newest release and ran the README examples (Win7/x64, under cmder):

$ goawk 'BEGIN { print "foo", 42 }'
---------------------------------------------------
'BEGIN
      ^
---------------------------------------------------
parse error at 1:7: didn't find end quote in string
$ echo 1 2 3 | goawk '{ print $1 + $3 }'
---------------------------------------------------
'{
  ^
---------------------------------------------------
parse error at 1:3: didn't find end quote in string

If I use double quotes instead of single quotes, it works:

$ goawk "BEGIN { print "foo", 42 }"
 42

The result is only 42; the 'foo' is lost?

$ echo 1 2 3 | goawk "{ print $1 + $3 }"
4
$ goawk "BEGIN { print 'foo', 42 }"
foo 42

This works OK.

But if I run this:

$ goawk "{print 'foo'}"

It runs endlessly and never returns!

How can I capture the output of interp.ExecProgram in a variable?

input := `1T;2T;3T;4T;5T;6`
src := "{ print $2,$3,$6 }"
fieldSep := "T;"
prog, err := parser.ParseProgram([]byte(src), nil)
if err != nil {
	fmt.Println(err)
	return
}
config := &interp.Config{
	Stdin: bytes.NewReader([]byte(input)),
	Vars:  []string{"FS", fieldSep},
}
_, err = interp.ExecProgram(prog, config)
if err != nil {
	fmt.Println(err)
	return
}
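A unit test quoted elsewhere on this page sets Output: outBuf in interp.Config, with outBuf being a *bytes.Buffer, and then reads outBuf.String(). The same pattern should apply here. The sketch below shows the general technique using only the standard library: runProgram is a hypothetical stand-in for interp.ExecProgram that mimics `{ print $2,$3,$6 }` with a literal separator, so the example runs without GoAWK installed.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// runProgram is a hypothetical stand-in for interp.ExecProgram. It writes
// fields 2, 3 and 6 of each input line to output, the way the AWK program
// in the question would. Lines with fewer than six fields are skipped,
// which is a simplification of real AWK behaviour.
func runProgram(input, fieldSep string, output io.Writer) error {
	for _, line := range strings.Split(input, "\n") {
		fields := strings.Split(line, fieldSep)
		if len(fields) < 6 {
			continue
		}
		if _, err := fmt.Fprintln(output, fields[1], fields[2], fields[5]); err != nil {
			return err
		}
	}
	return nil
}

// capture points the program's output at an in-memory buffer and returns
// whatever was written as a string.
func capture(input, fieldSep string) (string, error) {
	var buf bytes.Buffer
	if err := runProgram(input, fieldSep, &buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := capture("1T;2T;3T;4T;5T;6", "T;")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", out) // "2 3 6\n"
}
```

With GoAWK itself, the equivalent would presumably be to add Output: &buf to the interp.Config above and read buf.String() after ExecProgram returns.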

Error "can't pass scalar" when var could be array

(From an email bug report via Arnold Robbins, maintainer of Gawk.)

In testing a large awk program of mine for portability, I came across
this bug in goawk. Given this code:

----------------------------
function testit()
{
        del_array(Foo)
}

function del_array(array)
{
        split("", array)
}
----------------------------

I get this error:

$ gowork/bin/goawk -f  /tmp/x.awk
----------------------------------------------------------
    del_array(Foo)
    ^
----------------------------------------------------------
parse error at 3:2: can't pass scalar "Foo" as array param

This isn't an error, since Foo could be an array by the time
testit() is called.

This is a small test case cut down from my much larger program.

goawk's printf family of functions under Windows

Please let single-quotes be used in the printf family of functions. Example:

C:\bin>dir c:\windows\system32 /s/a-d/-c | mawk "{sum+=$4} END {printf('Total size: %.0f bytes,  %3.2f (GB) for %1d files.\n', sum, (sum/1073741824.0), NR)}"
Total size: 4559643636 bytes,  4.25 (GB) for 24857 files.

Under goawk:

C:\bin>dir c:\windows\system32 /s/a-d/-c | goawk "{sum+=$4} END {printf('Total size: %.0f bytes,  %3.2f (GB) for %1d files.\n', sum, (sum/1073741824.0), NR)}"
------------------------------------
{sum+=$4} END {printf('Total size: %.0f bytes,  %3.2f (GB) for %1d files.\n', sum, (sum/1073741824.0), NR)}
                      ^
------------------------------------
parse error at 1:23: unexpected '\''

Under Windows, double quotes have to be used to combine arguments containing spaces into a single command-line argument; single quotes cannot be used for this. Therefore, single quotes must be allowed inside the program text for goawk's printf family of functions. Can you please make it so that the printf family accepts either single or double quotes?

Fix how we handle numstr type when setting $0 directly

For example, this script prints 1 under GoAWK, but under (g)awk it prints 0 (presumably the correct answer):

BEGIN { $0="0"; print !$0 }

There's a commented-out test for this in interp/interp_test.go:

{`BEGIN { $0="0"; print !$0 }`, "", "0\n", "", ""},

compilation error on Windows

C:\GitHub>git clone https://github.com/benhoyt/goawk.git && cd goawk && go version && go build

Cloning into 'goawk'...
remote: Counting objects: 1728, done.
remote: Compressing objects: 100% (238/238), done.
remote: Total 1728 (delta 212), reused 324 (delta 142), pack-reused 1328
Receiving objects: 100% (1728/1728), 1.72 MiB | 8.24 MiB/s, done.
Resolving deltas: 100% (762/762), done.

go version go1.10.3 windows/amd64

# _/C_/GitHub/goawk
.\goawk.go:147:13: undefined: interp.Config
.\goawk.go:160:17: undefined: interp.ExecProgram

Panic when parsing config-defined functions with config-defined variable args

When parsing a program that calls a config-defined (native) function with a config-defined variable as an argument, the parser panics with:

panic: runtime error: index out of range [recovered]
        panic: interface conversion: interface {} is runtime.errorString, not *parser.ParseError [recovered]                                
        panic: interface conversion: interface {} is runtime.errorString, not *parser.ParseError                                            

goroutine 321 [running]:
testing.tRunner.func1(0xc000179a00)
        /usr/local/Cellar/go/1.11.4/libexec/src/testing/testing.go:792 +0x387                                                               
panic(0x118aea0, 0xc000386a50)
        /usr/local/Cellar/go/1.11.4/libexec/src/runtime/panic.go:513 +0x1b9                                                                 
github.com/benhoyt/goawk/parser.ParseProgram.func1(0xc00011fe08)
        /Users/ash/src/go/goawk/parser/parser.go:59 +0x98
panic(0x11895e0, 0x130d8f0)
        /usr/local/Cellar/go/1.11.4/libexec/src/runtime/panic.go:513 +0x1b9                                                                 
github.com/benhoyt/goawk/parser.(*parser).resolveVars(0xc000333180, 0xc000388310)                                                           
        /Users/ash/src/go/goawk/parser/resolve.go:260 +0x1caa
github.com/benhoyt/goawk/parser.(*parser).program(0xc000333180, 0xc000333180)                                                               
        /Users/ash/src/go/goawk/parser/parser.go:176 +0x8ca
github.com/benhoyt/goawk/parser.ParseProgram(0xc0000145a0, 0x16, 0x20, 0xc0000326e0, 0x0, 0x0, 0x0)                                         
        /Users/ash/src/go/goawk/parser/parser.go:71 +0x147

The panic is reproducible with this unit test:

func TestConfigVarsWithNativeFuncs(t *testing.T) {
	funcs := map[string]interface{}{
		"foo": func(i int) int {
			return i
		},
	}
	prog, err := parser.ParseProgram(
		[]byte(`BEGIN { print foo(x) }`), &parser.ParserConfig{
			Funcs: funcs,
		},
	)
	if err != nil {
		t.Fatalf("error parsing: %v", err)
	}

	outBuf := &bytes.Buffer{}
	errBuf := &bytes.Buffer{}
	config := &interp.Config{
		Stdin:  strings.NewReader(""),
		Output: outBuf,
		Error:  errBuf,
		Funcs:  funcs,
		Vars:   []string{"x", "5"},
	}
	if _, err = interp.ExecProgram(prog, config); err != nil {
		t.Fatal(err)
	}

	exp := "5\n"
	normalized := normalizeNewlines(outBuf.String() + errBuf.String())
	if normalized != exp {
		t.Fatalf("expected %q, got %q", exp, normalized)
	}
}

At a glance, the cause seems to be that config-defined functions aren't tracked the same way as functions defined in the AWK source, which breaks variable type inference for their arguments.

hexadecimal value escapes

These are currently supported by GoAWK:

$ goawk 'BEGIN { print "hello\x21" }'
hello!

However, they are not defined by POSIX:

One sequence that is not supported is hexadecimal value escapes beginning with
\x. This would allow values expressed in more than 9 bits to be used within
awk as in the ISO C standard. However, because this syntax has a
non-deterministic length, it does not permit the subsequent character to be a
hexadecimal digit. This limitation can be dealt with in the C language by the
use of lexical string concatenation. In the awk language, concatenation
could also be a solution for strings, but not for extended regular expressions
(either lexical ERE tokens or strings used dynamically as regular
expressions). Because of this limitation, the feature has not been added to
POSIX.1-2017.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_18

Gawk deals with this by disabling them under gawk --posix or
POSIXLY_CORRECT=y. If the feature is to remain, it should perhaps be mentioned
here:

https://github.com/benhoyt/goawk#differences-from-awk
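The "non-deterministic length" problem in the POSIX rationale can be illustrated with a small decoder. The sketch below caps a \x escape at two hex digits, which is one possible way to make the length predictable. It is an assumption for illustration, not a description of what GoAWK or gawk actually do, and unescapeHex and isHex are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isHex reports whether c is an ASCII hexadecimal digit.
func isHex(c byte) bool {
	return ('0' <= c && c <= '9') || ('a' <= c && c <= 'f') || ('A' <= c && c <= 'F')
}

// unescapeHex replaces each \x escape followed by one or two hex digits
// with the corresponding byte. A \x with no hex digit after it is left
// unchanged. Other escapes are not handled here.
func unescapeHex(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) && s[i+1] == 'x' {
			j := i + 2
			for j < len(s) && j < i+4 && isHex(s[j]) {
				j++
			}
			if j > i+2 {
				// At most two hex digits, so the value always fits in a byte.
				v, _ := strconv.ParseUint(s[i+2:j], 16, 8)
				b.WriteByte(byte(v))
				i = j - 1
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Println(unescapeHex(`hello\x21`)) // hello!
	fmt.Println(unescapeHex(`\x21f`))     // !f (a greedy decoder would consume the f as well)
}
```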

Support for capturing groups in match function

I'm not sure whether adding extensions to AWK is something you want to do, but I find support for capturing groups in match useful, so I've added it here: https://github.com/pstibrany/goawk/commit/45fd318ecad5eaa3a0e98c60f40aaf89e29716b8

It is based on the gawk extension (described at https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html), although I didn't implement the "start" and "length" subscripts described there, as I didn't know how :-)
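For the "start" and "length" subscripts, one possible approach is to take the index pairs from regexp's FindStringSubmatchIndex and store them under keys joined with AWK's default SUBSEP ("\034"). The sketch below shows the idea. It is not the linked commit: matchGroups is a hypothetical helper, the array is modelled as a plain Go map, and positions are 1-based byte offsets, whereas gawk counts characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// subsep is AWK's default SUBSEP, used to join multi-dimensional subscripts.
const subsep = "\x1c"

// matchGroups fills a map the way gawk's three-argument match() fills its
// array: arr["0"] is the whole match, arr["n"] is the nth group, and
// arr[n SUBSEP "start"] / arr[n SUBSEP "length"] give its position and length.
func matchGroups(s, pattern string) (map[string]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	arr := make(map[string]string)
	loc := re.FindStringSubmatchIndex(s)
	if loc == nil {
		return arr, nil // no match: leave the array empty
	}
	for n := 0; 2*n < len(loc); n++ {
		start, end := loc[2*n], loc[2*n+1]
		if start < 0 {
			continue // this group did not take part in the match
		}
		key := strconv.Itoa(n)
		arr[key] = s[start:end]
		arr[key+subsep+"start"] = strconv.Itoa(start + 1) // 1-based, as in AWK
		arr[key+subsep+"length"] = strconv.Itoa(end - start)
	}
	return arr, nil
}

func main() {
	arr, err := matchGroups("foo=42", `([a-z]+)=([0-9]+)`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(arr["1"], arr["2"], arr["2"+subsep+"start"], arr["2"+subsep+"length"])
}
```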
