json.awk's Introduction

JSON.awk

A practical JSON parser written in awk.

https://github.com/step-/JSON.awk

Introduction

JSON.awk is a self-contained, single-file program with no external dependencies. It is similar to JSON.sh, a JSON parser written in Bash, whose source was retrieved on 2013-03-13 to form the basis for JSON.awk. Since then the two projects have followed separate development paths, each adding features that you will not find in the other.

Features

  • Single file without external dependencies
  • Can parse multiple input files within a single invocation (one JSON text per file)
  • Callback interface (awk) to hook into parser and output events
  • Library of practical callbacks (optional)
  • Capture invalid JSON input for further processing
  • Choice of MIT or Apache 2 license
  • JSON.sh compatible (as of 2013-03-13) default output format

Non-features

  • Transforming input values, e.g., string/number normalization

Compatibility with Awk Implementations

Of the many awk implementations around, JSON.awk works better with the POSIX ones and with GNU awk. JSON.awk is routinely tested on Linux with gawk, busybox awk and mawk, in this order. I recommend gawk. JSON.awk does not require GNU awk extensions, and the differences between running gawk with and without the --posix option are minimal, if any. Running with busybox awk requires a simple patch (see the FAQ). Running with mawk requires mawk version 1.3.4 20150503 or higher (see the FAQ).

Supported Platforms

All OS platforms for which a POSIX awk implementation is available. Special cases:

Conformance

There is no official conformance test for the JSON language. Thankfully, some unofficial test suites exist. JSON.awk is tested against the JSONTestSuite.

Test results and comparisons

Installing

Add files JSON.awk and optionally callbacks.awk to your project and follow the examples.

Usage Examples

For full instructions please read the docs. Mawk users please read the FAQ. Busybox awk users also please read the FAQ.

Passing file names as command arguments:

awk -f JSON.awk file1.json [file2.json...]

awk -f JSON.awk - < file.json

cat file.json | awk -f JSON.awk -

Passing file names on stdin:

echo -e "file1.json\nfile2.json" | awk -f JSON.awk
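
On shells whose built-in echo does not interpret -e (dash, for example, prints the option literally), printf is a more portable way to emit one file name per line. A minimal sketch; the helper name list_json_files is hypothetical and not part of JSON.awk:

```shell
# Print each argument on its own line, without relying on echo -e.
list_json_files() {
    printf '%s\n' "$@"
}

list_json_files file1.json file2.json
```

The output can be piped into awk -f JSON.awk exactly like the echo -e form above.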

Using callbacks to build a custom application (FAQ 5):

awk -f your-callbacks.awk -f JSON.awk file.json

Applications

  • Opera-bookmarks.awk Extract (Chromium) Opera bookmarks and QuickDial thumbnails. Convert bookmark data to SQLite database and CSV file.

Projects known to use JSON.awk

  • KindleLauncher a.k.a. KUAL, an application launcher for the Kindle e-ink models, uses JSON.awk to parse menu descriptions.

License

This software is available under the following licenses:

  • MIT
  • Apache 2

Credits

  • JSON.sh's source code, retrieved on 2013-03-13, more than inspired version 1.0 of JSON.awk; without JSON.sh this project would not exist.

  • gron for inspiration leading to library module js-dot-path.awk, and for some test files.

  • JSONTestSuite

json.awk's People

Contributors

kimbo, mohd-akram, step-

json.awk's Issues

Can't parse stdin with JSON data

There are cases when JSON.awk can't parse valid JSON data coming from a pipe. For instance see issue #6, comments 6 (definition) through 8 (analysis). Test case file

Error about regex

I ran into this problem:
awk: ./JSON.awk:258: warning: regexp escape sequence `\"' is not a known regexp operator

It is just a warning, and the result is OK.

Bug?

gawk: JSON.awk:231: (FILENAME=c.tmp FNR=1) fatal: delete: illegal use of variable `JPATHS' as array

My gawk cannot delete the array before it is first used.

split("",JPATHS)
is better than
delete JPATHS;
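
The suggestion can be checked in isolation. The sketch below is not taken from JSON.awk; it only shows that split("", A) empties an array, including one that has never been used as an array before. The function name clear_array_demo is hypothetical.

```shell
clear_array_demo() {
    awk 'BEGIN {
        split("", JPATHS)                 # safe even though JPATHS was never used
        JPATHS[1] = "a"; JPATHS[2] = "b"  # fill the array
        split("", JPATHS)                 # empty it again
        n = 0
        for (k in JPATHS) n++             # count the remaining elements
        print n
    }'
}

clear_array_demo
```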

Parse files with byte-order-mark

When parsing a JSON file with BOM such as:

https://github.com/dotnet/toolset/blob/40cc5860e2ef311b9aca733b1d2eccaa681bd422/TestAssets/InstallationScriptTests/InstallationScriptTests.json

JSON.awk gives the following error:

/datadrive/projects/toolset/TestAssets/InstallationScriptTests/InstallationScriptTests.json: expected <value> but got <> at input token 1
<<>> { "sdk" : { "version" : "1.0.0-beta.19463.3" } }

The current workaround is to strip these characters with a tool such as sed:

sed '1s/^\xEF\xBB\xBF//' "$json_file" | awk -f JSON.awk - | .....

It would be nice if the parser ignored the BOM so that consumers do not need to strip it.
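
The stripping can also be done in awk itself, which avoids relying on \x escapes in sed (a GNU extension). A minimal sketch, assuming a UTF-8 BOM (bytes EF BB BF, written in octal below); the function name strip_bom is hypothetical and not part of JSON.awk:

```shell
# Remove a leading UTF-8 BOM from the first line; pass everything else through.
# LC_ALL=C makes awk treat the input as plain bytes.
strip_bom() {
    LC_ALL=C awk 'NR == 1 { sub(/^\357\273\277/, "") } { print }' "$@"
}

# Demo: a BOM followed by a small JSON text.
printf '\357\273\277{"sdk":1}\n' | strip_bom
```

With a file, the pipeline would look like strip_bom "$json_file" | awk -f JSON.awk -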

mawk 1.3.3 support (Debian/Ubuntu)

It seems that the script is not mawk compatible. Running with test file:

{
    "asd":1
}

Result:

./test.json: expected <string> but got <EOF> at input token 2
{ <<EOF>>          " a s d "
invalid: ./test.json
expected <string> but got <EOF> at input token 2
{ <<EOF>>          " a s d "

System info:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

AWK: ii mawk 1.3.3-17ubuntu2

It works as expected with gawk.

HOW-TO mac OSX

I do not have access to Apple computers so I can't test JSON.awk under OSX. I would like for this thread to become a self-help resource for mac users.

If you have found an issue involving JSON.awk and a mac OSX system, and you have found a solution, please post a comment here describing the issue and the solution you have found.

In the interest of anyone reading please do not post issues without solutions. If you want support please open a new issue. This thread is about solutions.

No output on GNU AWK, wrong output depending on AWK version

Introduction

Version 1.01 includes a kludgy work-around for some AWK implementations that incorrectly tokenize the backslash-quote escape in JSON strings. Recently I stumbled upon a good test case which has enabled me to:

  • Get rid of the kludgy work-around
  • Uncover and fix another major AWK implementation-dependent issue

This fix is included in JSON.awk version 1.1 (recommended upgrade) along with other minor enhancements and fixes.

If you want to trace the following steps you should start from https://github.com/step-/JSON.awk/blob/962a8e4a44eb310866a587df355973e62d43790c/JSON.awk and comment out lines 236-239, 241-242.

Issue Description

Create file test.json with the following contents:

{"sh": "[ -r 'k.cfg' ] || echo \"# k.cfg - `date`\" >'k.cfg';s=$(awk \"`echo -n 'BEGIN{nf=1} /^@Bs*mode=/{sub(/=.*/,@Q=123@Q);nf=0} {print} END{if(nf) print @Qmode=123@Q}'|sed 's/@Q/\\x22/g;s/@B/\\x5C/g'`\" 'k.cfg') && [ 0 != ${#s} ] && echo -n \"$s\" >'k.cfg'"}

Parse test.json with JSON.awk:

echo -e "test.json\n" | awk -f JSON.awk

I am going to show output from two AWK implementations; (A) GNU AWK 4.0.0 and (B) Busybox 1.17.1 AWK.
(A)

expected <, or }> but got <#> at input token 5
{ "sh" : "[ -r 'k.cfg' ] || echo \" <<#>> k . c f g - ` d a t

(B)

expected <value> but got <"> at input token 4
{ "sh" : <<">> [ - r ' k . c f g ' 

Now change line 230 from

CHAR="[^[:cntrl:]\\\"]"

to:

CHAR="[^[:cntrl:]\"\\]"

(A)

 awk: JSON.awk:241: (FILENAME=uiui FNR=1) fatal: Invalid range end: /"[^[:cntrl:]"\]*((\[^u[:cntrl:]]|\u[0-9a-fA-F]{4})[^[:cntrl:]"\]*)*"|-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?|null|false|true|[[:space:]]+|./

(B)

expected <value> but got <"> at input token 4
{ "sh" : <<">> [ - r ' k . c f g ' 

So we get very different results and, in all cases, the tokenizer fails. Replacing the string constant in gsub() on line 241 with a regex constant fixes this issue on both platforms! Change line 241 to:

gsub(/\"[^[:cntrl:]\"\\]*((\\[^u[:cntrl:]]|\\u[0-9a-fA-F]{4})[^[:cntrl:]\"\\]*)*\"|-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?|null|false|true|[[:space:]]+|./, "\n&", a1)

You may comment out line 230 if you like, it makes no difference anymore.

(A)

["sh"]  "[ -r 'k.cfg' ] || echo \"# k.cfg - `date`\" >'k.cfg';s=$(awk \"`echo -n 'BEGIN{nf=1} /^@Bs*mode=/{sub(/=.*/,@Q=123@Q);nf=0} {print} END{if(nf) print @Qmode=123@Q}'|sed 's/@Q/\\x22/g;s/@B/\\x5C/g'`\" 'k.cfg') && [ 0 != ${#s} ] && echo -n \"$s\" >'k.cfg'"

(B)

["sh"]  "[ -r 'k.cfg' ] || echo \"# k.cfg - `date`\" >'k.cfg';s=$(awk \"`echo -n 'BEGIN{nf=1} /^@Bs*mode=/{sub(/=.*/,@Q=123@Q);nf=0} {print} END{if(nf) print @Qmode=123@Q}'|sed 's/@Q/\\x22/g;s/@B/\\x5C/g'`\" 'k.cfg') && [ 0 != ${#s} ] && echo -n \"$s\" >'k.cfg'"

Conclusion

Although a complex regex constant isn't very readable it gets the job done correctly, so this change is committed for good - with due comments - in version 1.1
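
A likely reason for the difference is that a string used as a regex is processed twice, first as a string literal and then as a regular expression, so every backslash has to be doubled, whereas a regex constant is processed only once. A small sketch, independent of JSON.awk, that replaces a literal backslash both ways (the function name escape_levels_demo is hypothetical):

```shell
escape_levels_demo() {
    awk 'BEGIN {
        s1 = s2 = "a\\b"        # both variables hold three characters: a, backslash, b
        gsub(/\\/, "/", s1)     # regex constant: \\ matches one backslash
        gsub("\\\\", "/", s2)   # string constant: four backslashes are needed for the same match
        print s1, s2
    }'
}

escape_levels_demo
```

Both calls give the same result here; the regex constant simply needs one level of escaping less, which leaves less room for implementation-dependent behavior.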

How to embed JSON.awk without editing function apply?

(Request started in PR #11)

The current method for embedding JSON.awk in a larger awk application requires modifying JSON.awk by editing stub function apply. Can a new embedding method be defined that doesn't require modifying the JSON.awk script, and that works across POSIX awk, mawk and gawk?

For instance, the following method (not tested, suggested in #11) leverages gawk's @include statement, which POSIX awk and mawk don't support.

$ declare -A "$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID \
  |awk -v STREAM=0 -v ARRAY="DEPLOYMENT_GITHUB" -v KEYS='"deploymentInfo","revision","gitHubLocation","(commitId|repository)"' '
    @include "json.awk";
{
    array = sprintf("%s=(",ARRAY);
    regex = "["KEYS"]";
    for (key in JPATHS) {
        if( JPATHS[key] ~ KEYS ) {
            n=patsplit(JPATHS[key], path, "\"([^\"]+)\"");
            array = sprintf("%s [%s]=%s", array, path[n-1], path[n]);
        }
    }
    array = sprintf("%s )", array);
    printf "%s", array;
}' -
)"
$ echo "${DEPLOYMENT_GITHUB['repository']}"
$ echo "${DEPLOYMENT_GITHUB['commitId']}"

[Busybox] "Invalid regexp" after applying the busybox patch from FAQ

Hello,

I'm running Busybox on Alpine 3.12.1. I'm still getting an "Invalid regexp" after applying the Busybox patch from the FAQ (sed -i "s#\\\000#\\\001#g" JSON.awk).

Before patch:

/ # wget -qO- http://localhost:8080/actuator/metrics/jvm.memory.committed | awk -f JSON.awk -
awk: bad regex '^|^��|^��|"[^"\\': Invalid regexp

After patch (sed -i "s#\\\000#\\\001#g" JSON.awk):

/ # wget -qO- http://localhost:8080/actuator/metrics/jvm.memory.committed | awk -f JSON.awk -
awk: bad regex '^|^��|^��|"[^"\\╔-]*((\\[^u╔-]|\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])[^"\\╔-]*)*"|-?(0|[1-9][0-9]*)([.][0-9]+)?([eE][+-]?[0-9]+)?|null|false|true|[
]+|.': Invalid regexp
/ #

System info:

/ # cat /etc/alpine-release
3.12.1
/ # awk --help
BusyBox v1.31.1 () multi-call binary.

Usage: awk [OPTIONS] [AWK_PROGRAM] [FILE]...

        -v VAR=VAL      Set variable
        -F SEP          Use SEP as field separator
        -f FILE         Read program from FILE
        -e AWK_PROGRAM
/ #

What are the valid ESCAPE characters?

The JSON grammar diagram on www.json.org shows a small set of valid escaped characters, while the ESCAPE regex in tokenizer() is more inclusive. Should the ESCAPE regex be reduced to comply with json.org?

What do other JSON parsers do?

  • jshon reports, e.g., \x22 as an invalid escape

Expert advice?

Use of [:cntrl:] character class in tokenize()

The POSIX [:cntrl:] character class does not cover exactly the characters that must be escaped according to https://tools.ietf.org/html/rfc7159#section-7 (i.e. U+0000 through U+001F). [:cntrl:] also covers U+007F and, in a UTF-8 locale, all C1 control characters. See the example below, where I get an error in my locale "en_US.UTF-8". Apart from using LC_ALL=C, the error can be avoided by changing [:cntrl:] to the range defined in the spec: \x00-\x1F.

$ echo world_bank109.json | awk -f JSON.awk > /dev/null
world_bank109.json: expected <value> but got <"> at input token 263
, "productlinetype" : "L" , "project_abstract" : { "cdata" : <<">> T h e o b j e c t i
$ echo world_bank109.json | LC_ALL=C awk -f JSON.awk > /dev/null
(no error message here)

world_bank109.json.txt, which is line 109 from the world bank sample file at http://jsonstudio.com/resources/
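
The byte range from the spec can be tried on its own. The sketch below is an illustration under stated assumptions, not the project's fix: it runs awk in the C locale and writes the range with octal escapes, starting at \001 rather than \000 (the Busybox patch quoted from the FAQ makes the same substitution). The function name has_json_ctrl is hypothetical.

```shell
# Report whether stdin contains a byte in the range 0x01-0x1F.
# Newlines are consumed by awk as record separators and are not checked.
has_json_ctrl() {
    LC_ALL=C awk '/[\001-\037]/ { found = 1 } END { print (found ? "yes" : "no") }'
}

printf 'a\tb\n' | has_json_ctrl        # a tab is a control character
printf 'a\302\205b\n' | has_json_ctrl  # U+0085 (a C1 control) encoded as UTF-8
```

The second input is only flagged by [:cntrl:] in a UTF-8 locale; the explicit range ignores it, as it does U+007F.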

illegal primary in regular expression ^(|[^0-9])$

Got the following error when I tried json.awk on OSX.

awk: illegal primary in regular expression ^(|[^0-9])$ at [^0-9])$
 source line number 158 source file json.awk
 context is
        } else if (TOKEN ~ >>>  /^(|[^0-9])$/ <<< ) {
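
The message points at the empty alternative inside the group. One possible equivalent that avoids it is an optional bracket expression: ^[^0-9]?$ matches the empty string or a single non-digit, the same set of strings as ^(|[^0-9])$. A sketch with a hypothetical helper, empty_or_nondigit; whether JSON.awk adopted this form is not stated in this thread:

```shell
# Test a token against the alternation-free pattern.
empty_or_nondigit() {
    awk -v TOKEN="$1" 'BEGIN { print (TOKEN ~ /^[^0-9]?$/ ? "match" : "no match") }'
}

empty_or_nondigit ""    # empty token
empty_or_nondigit ","   # single non-digit
empty_or_nondigit "7"   # single digit
```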

How to pass JSON using pipeline?

I want to be able to use the following command:

$ curl -s https://raw.github.com/archan937/jsonv.sh/master/examples/complex.json | awk -f JSON.awk

So far, I have only found that I can pass a file path, e.g.:

$ echo "examples/complex.json" | awk -f utils/json.awk

and that I can enter JSON with cat:

$ { echo -; echo; cat; } | awk -f utils/json.awk 
{"foo":"bar"}
^D
["foo"] "bar"

The latter option is close but I want it to be automated.

Can you provide a solution for this? I am trying to use JSON.awk with https://github.com/archan937/jsonv.sh (instead of JSON.sh)

function cb_fail1 never defined

I downloaded JSON.awk-1.4.2.tar.gz (the JSON.awk tarball) and ran it on Ubuntu 18.04.3 LTS (Bionic Beaver) with
awk -f JSON.awk object.json
and it printed:

awk: JSON.awk: line 409: function cb_fail1 never defined
awk: JSON.awk: line 409: function cb_parse_object_exit never defined
awk: JSON.awk: line 409: function cb_parse_object_enter never defined
awk: JSON.awk: line 409: function cb_parse_object_empty never defined
awk: JSON.awk: line 409: function cb_parse_array_exit never defined

The content of the "object.json" file is listed below:
{
"key": "Value"
}
Is this an issue?

Even the simplest json can not be parsed on FreeBSD or OS X

curl -O https://raw.githubusercontent.com/step-/JSON.awk/master/JSON.awk
cat <<EOF > x.json
{
    "name": "latest"
}
EOF
awk -f JSON.awk x.json 

produces

x.json: expected <string> but got <"> at input token 2
{ <<">> 
x.json: expected <value> but got <a> at input token 1
<<a>> m e ": " l a t e s t " 
invalid: x.json
expected <string> but got <"> at input token 2
{ <<">> 
expected <value> but got <a> at input token 1
<<a>> m e ": " l a t e s t "

This is on FreeBSD with awk version 20121220 (FreeBSD) as well as on Darwin with awk version 20070501.
