json-cpp-gen's Introduction

JSON-CPP-gen

This is a program that parses C++ structures from a header file and automatically generates C++ code capable of serializing said structures into the JSON format and parsing it back.

For example, if provided with the following structure as input,

struct User {
    int id;
    std::string name;
    std::vector<int> friendIds;
};

JSON-CPP-gen can generate a UserParser class that can be used as

User user;
auto error = UserParser::parse(user, jsonString);

to automatically parse a JSON string that matches the structure, e.g.

{
    "id": 137,
    "name": "John Smith",
    "friendIds": [ 63, 51, 206 ]
}

And analogously, it can generate a UserSerializer class for the inverse operation. Of course, the input may be more complex - see Configuration and the generated ConfigurationParser for a more complex example.

The generated parsers and serializers are highly efficient because no intermediate DOM or other auxiliary data structures are constructed - data is read directly into the output structure when parsing and written directly from the input structure when serializing.

How to use

To build the program, simply use the provided CMake file. JSON-CPP-gen has no dependencies besides the C++ standard library.

To run the program, you must provide a JSON configuration file that corresponds to the Configuration structure as its command line argument. For example, to generate the UserParser and UserSerializer classes described above, you could use a configuration file containing:

{
    "inputs": [ "User.h" ],
    "includes": [ ],
    "settings": { },
    "parsers": [ {
        "name": "UserParser",
        "types": [ "User" ],
        "headerOutput": "UserParser.h",
        "sourceOutput": "UserParser.cpp"
    } ],
    "serializers": [ {
        "name": "UserSerializer",
        "types": [ "User" ],
        "headerOutput": "UserSerializer.h",
        "sourceOutput": "UserSerializer.cpp"
    } ],
    "stringType": "std::string"
}

All (non-absolute) file names are relative to the configuration file. includes may contain additional include files that should be present in the output files and settings may contain code generator settings. The includes, settings, and stringType fields could be omitted in this case since they are empty or equal to the default value. Even parsers or serializers could be omitted if you only needed one of the two. Additionally, custom string and container data types can be defined in the configuration file - see below. Another example is the configuration file for the ConfigurationParser class.

Features

  • Supported C++ types:
    • All fundamental integer and floating-point types, bool, and size_t
    • Structures (struct)
    • Enumerations (enum and enum class) - serialized as strings
    • Static arrays, including multi-dimensional
    • Standard library types:
      • std::string for strings
      • std::vector, std::deque, std::list for dynamic-sized arrays
      • std::array for static-sized arrays
      • std::map for objects with arbitrary keys and homogeneous values
      • std::optional as well as std::auto_ptr, std::shared_ptr, and std::unique_ptr for optional values
    • Custom string, array, object, and optional types if defined in the configuration file, see below
  • Namespaces
  • Structure inheritance (basic support)
  • Full UTF-8 & UTF-16 support in JSON

Currently NOT supported but planned features:

  • Omitting specific member variables from parsing and serialization - planned via annotations
  • Structure members under different name in JSON - planned via annotations
  • Classes - currently ignored due to their data being typically private but explicit enablement via annotations is planned - access to private members must be ensured by the user
  • #define, typedef, using aliases - basic support planned

What will (probably) never be supported:

  • Heterogeneous JSON objects - not really representable by static structures
  • Unions - impossible to serialize without more advanced logic
  • Raw pointers - no point in serializing memory addresses and unclear memory management otherwise
  • Template structures
  • More complex expression / macro evaluation (e.g. array of length 20*sizeof(User))

Custom types

Even though the standard library string and container types are supported by default, you are not locked into using these exclusively in order for the automatic parser and serializer generation to work. In fact, you may configure JSON-CPP-gen even to produce code where none of the C++ standard library is used.

However, in order to do that, you must provide a set of replacement classes, and for each an API that dictates how it should be used. The API consists of code snippets for specific operations with placeholders. See comments in Configuration.h for a full list of the placeholders. The following are examples for each category of definable custom types and how they should be written in the configuration file. You can specify multiple types for each category.

These are mostly demonstrated on types from the standard library, which however should not be put into the configuration file as they are present by default.
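To make the placeholder mechanism more concrete, the following is a minimal sketch of how an API snippet might be filled in. The function fillPattern is hypothetical and not taken from JSON-CPP-gen's source code; it only assumes that each placeholder is a $ followed by a single letter, as in the examples below.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of JSON-CPP-gen): replaces each "$X"-style
// placeholder in the pattern with the text mapped to its letter.
// A '$' followed by a letter without a mapping is copied unchanged.
std::string fillPattern(const std::string &pattern, const std::map<char, std::string> &values) {
    std::string result;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '$' && i+1 < pattern.size()) {
            std::map<char, std::string>::const_iterator it = values.find(pattern[i+1]);
            if (it != values.end()) {
                result += it->second;
                ++i; // skip the placeholder letter
                continue;
            }
        }
        result += pattern[i];
    }
    return result;
}
```

Under these assumptions, filling "$S.push_back($X)" with S mapped to value and X mapped to c would produce value.push_back(c).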

String (dynamic)

"stringTypes": [ {
    "name": "std::string",
    "api": {
        "clear": "$S.clear()",
        "getLength": "$S.size()",
        "getCharAt": "$S[$I]",
        "appendChar": "$S.push_back($X)",
        "appendCStr": "$S += $X",
        "appendStringLiteral": "$S += $X",
        "equalsStringLiteral": "$S == $X",
        "iterateChars": "for (char $E : $S) { $F }"
    }
} ]

Constant string

Represents a string type that cannot be constructed incrementally. Does not have an analogue in the standard library. An intermediate (dynamic) string type must be specified for the parser (can be std::string).

"constStringTypes": [ {
    "name": "ConstString",
    "stringType": "std::string",
    "api": {
        "copyFromString": "$S = $X",
        "moveFromString": "$S = std::move($X)",
        "iterateChars": "for (char $E : $S) { $F }"
    }
} ]

Array (dynamic)

"arrayContainerTypes": [ {
    "name": "std::vector<$T>",
    "api": {
        "clear": "$S.clear()",
        "refAppended": "($S.emplace_back(), $S.back())",
        "iterateElements": "for ($T const &$E : $S) { $F }"
    }
} ]

Note: The refAppended operation must add an empty element at the end of the array and return a modifiable reference to the new element.
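As an illustration of this requirement, here is the std::vector snippet expanded by hand for a vector of strings. The wrapper function is only a sketch for demonstration and is not something the generator is known to emit in this form.

```cpp
#include <string>
#include <vector>

// Hand-expanded form of "($S.emplace_back(), $S.back())" for std::vector<std::string>.
// The comma operator first appends a default-constructed element and then
// yields a modifiable reference to that new last element.
std::string &refAppended(std::vector<std::string> &array) {
    return (array.emplace_back(), array.back());
}
```

A parser can then write the parsed value straight into the returned reference, e.g. refAppended(names) = "John Smith";, without creating a temporary element first.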

Fixed-length array

Represents a non-statically fixed-length array that cannot be constructed incrementally. Does not have an analogue in the standard library. An intermediate (dynamic) array type must be specified for the parser (can be std::vector or another implicit array type).

"fixedArrayContainerTypes": [ {
    "name": "FixedArray<$T>",
    "arrayContainerType": "std::vector",
    "api": {
        "copyFromArrayContainer": "$S = $X",
        "moveFromArrayContainer": "$S = std::move($X)",
        "iterateElements": "for ($T const &$E : $S) { $F }"
    }
} ]

Static length array

"staticArrayContainerTypes": [ {
    "name": "std::array<$T, $N>",
    "api": {
        "refByIndex": "$S[$I]"
    }
} ]

Note: $N is array length.

Object (with implicit key type)

"objectContainerTypes": [ {
    "name": "std::map<std::string, $T>",
    "keyType": "std::string",
    "api": {
        "clear": "$S.clear()",
        "refByKey": "$S[$K]",
        "iterateElements": "for (const std::pair<$U, $T> &$I : $S) { $U const &$K = $I.first; $T const &$V = $I.second; $F }"
    }
} ]

Notes: A key type must be specified as keyType (can be std::string). The refByKey operation must create the element if it doesn't already exist and return a modifiable reference to its value. The iterateElements operation must provide keys and values separately as $K and $V.

Object map (with explicit key type)

"objectMapContainerTypes": [ {
    "name": "std::map<$K, $T>",
    "api": {
        "clear": "$S.clear()",
        "refByKey": "$S[$K]",
        "iterateElements": "for (const std::pair<$U, $T> &$I : $S) { $U const &$K = $I.first; $T const &$V = $I.second; $F }"
    }
} ]

Optional value

"optionalContainerTypes": [ {
    "name": "std::optional<$T>",
    "api": {
        "clear": "$S.reset()",
        "refInitialized": "($S = $T()).value()",
        "hasValue": "$S.has_value()",
        "getValue": "$S.value()"
    }
} ]

Note: The refInitialized operation must initialize the container with a default-constructed value and return a modifiable reference to the value.
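The std::optional snippet above can be checked against this requirement by expanding it by hand, here for std::optional<int>. The wrapper function is a sketch for demonstration only and requires C++17.

```cpp
#include <optional>

// Hand-expanded form of "($S = $T()).value()" for std::optional<int>.
// The assignment stores a value-initialized int (zero) and returns the optional
// itself, so value() yields a modifiable reference to the stored value.
int &refInitialized(std::optional<int> &opt) {
    return (opt = int()).value();
}
```

After the call, the optional always holds a value, even if it was empty before, and any previous value is reset to the default.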

json-cpp-gen's Issues

Support pair / tuple

Add support for pair and tuple types like std::pair, represented as a JSON array. The main challenge for std::tuple would be the need to support variadic templates in header parser and custom type definition.

Provide string length to serializer write function

The serializer function void write(const char *str); could have the length of str passed as a second argument, since it is a literal in the majority of the cases. This could very slightly improve performance since string length is almost surely computed when appending to the output JSON string. String API would have to be updated to allow appending a string with known length.

Support raw string literals

If a "raw" string literal (R"(text)") is present in the parsed header files, its range may not be properly detected, at least in some cases, and it may break the header parser. Make sure this cannot happen.

Automatically create directories for output files

Currently, if the generated files are supposed to go in directories that do not exist yet, the program will fail, unable to create the file. Perhaps, instead the directory structure could be automatically created.

Support enum base type

Support enumerations with an explicit base type, e.g.

enum Foo : short {
    BAR
};

Simply skip the part between enum name and the opening brace.
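A minimal sketch of that skipping step might look as follows. The function findEnumBody and its interface are hypothetical and do not reflect how HeaderParser is actually structured; it assumes the text passed in starts right after the enumeration's name.

```cpp
#include <string>

// Hypothetical sketch: given the text that follows an enum's name, returns the
// index of the opening brace, skipping an optional ": base type" part.
// Returns std::string::npos if a ';' comes first (declaration without a body)
// or if no opening brace is found.
std::size_t findEnumBody(const std::string &afterName) {
    for (std::size_t i = 0; i < afterName.size(); ++i) {
        if (afterName[i] == '{')
            return i;
        if (afterName[i] == ';')
            break;
    }
    return std::string::npos;
}
```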

Custom general type

Another idea I had was to support miscellaneous types in the structures, in different ways:

Converted type

This way, a type in the structure would be converted from / to a type supported by the generator. For example a timestamp type, represented by a string in the JSON, and able to be converted from / to a std::string. Each type like this would have an API template for both conversions.

Custom parse / serialize implementation

Another possibility would be that the custom type would have to provide an implementation of the parse and serialize functions of itself. This option is just an idea and probably should not be implemented.

String conversion

A sort-of hybrid between the two previous options (and raw JSON string #32) would be a type that would parse / serialize itself but not directly. Instead the parser / serializer would first construct a string object and use that to interface with the custom type's API. This would be another way to implement custom number types (#35) if they provide a conversion from / to string.

Incorrect error?

I had a syntax error in my struct as so:

struct Foo {
   enum Bar _bar;
};

and the error code in json-cpp-gen trying to parse this was "INVALID_STRUCT_SYNTAX". Should that be "INVALID_ENUM_SYNTAX"?

[HeaderParser.cpp, parseEnum(), line 242]

Deduplicate code in object map container type

The source code of the following functions is exactly the same between the classes ObjectContainerType and ObjectMapContainerType:

  • generateParserFunctionBody
  • generateSerializerFunctionBody
  • generateClear
  • generateRefByKey
  • generateIterateElements

However, each class must have a different base class. I think this could be fixed with templates and if that fails, simply convert these to static in one class and expose them for the other to use.

enums not working as desired?

(Linux, gcc 9.4.0)

The project looks promising, I'd rather use a generator for serialization instead of code annotation and other weird hacks.

I'm using an enum as a struct field and it is not serializing / parsing as desired. The field is being ignored completely or always parsed as zero.

The desire is to get this:

{"_allkeys":[
{"_act":"NextImage","_keyval":11},
{"_act":"NextImage","_keyval":12},
{"_act":"PrevImage","_keyval":13}
]}

But I'm getting this:

{"_allkeys":[
{"_keyval":11},
{"_keyval":12},
{"_keyval":13}
]}

Note the _act field, which is an enum, is missing.

I attach what I hope is a complete set of files to demo the problem.

actionKeysTest.zip

Name aliases in configuration

Since #1 will not be implemented for a while, there needs to be another way to resolve cases where JSON field names (or even enumeration values) are not valid C++ names, e.g. protected (reserved keyword) or value-with-dashes, or when the user wants the JSON names to be different for whatever reason.

Anonymous structure support

For example:

struct Foo {
    struct {
        int x;
    } bar;
};

Or perhaps even

struct Foo {
    struct {
        int x;
    };
};

where x would be referred to as Foo::x.

Probably do together with #4.

Allow generation of header-only code

Make it possible to generate parsers and serializers in inlined form that doesn't need to have its own translation unit. The user should be able to pick between the following modes:

  • Header + source file (current output)
  • Single header with implementation directly inlined in class definition
  • Single header with implementation directly following the class definition in the same file
  • Definition header + inline implementation header which is included at the bottom of the first one

typedef support

Parse the typedef keyword and treat it as a valid type. Examples:

typedef std::string StringType;
typedef int FooString;

struct Foo {
    typedef StringType FooString;
    FooString bar;
};

Foo::bar is std::string and not int!

Generate JSON schema

With the currently available data, the program could not only generate parsers and serializers, but also the JSON schema of the root structures. Add schemas array on the same level as parsers and serializers.

Configurable line endings

Currently the files outputted by the program are always saved with Unix-style line endings (LF). Instead, a global lineEndings setting should be added to configuration with possible values NATIVE, LF, CRLF, and possibly a value representing "same as configuration JSON file", although I'm not sure about the last one.
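One possible shape of this conversion is sketched below. The enumeration LineEndings and the function applyLineEndings are assumptions made for illustration; the sketch also assumes that the generated text uses LF internally and that NATIVE means CRLF on Windows and LF everywhere else.

```cpp
#include <string>

// Assumed representation of the proposed lineEndings setting
enum class LineEndings { NATIVE, LF, CRLF };

// Hypothetical sketch: converts LF-only text to the requested line endings.
// The result should be written to a file opened in binary mode, otherwise
// the runtime may translate the line endings a second time.
std::string applyLineEndings(const std::string &text, LineEndings mode) {
#ifdef _WIN32
    const bool nativeIsCrlf = true;
#else
    const bool nativeIsCrlf = false;
#endif
    const bool crlf = mode == LineEndings::CRLF || (mode == LineEndings::NATIVE && nativeIsCrlf);
    if (!crlf)
        return text;
    std::string result;
    result.reserve(text.size());
    for (char c : text) {
        if (c == '\n')
            result += '\r'; // insert CR before each LF
        result += c;
    }
    return result;
}
```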

Common parser string buffer

It may be slightly beneficial for performance if the generated parser class had a common string buffer member variable instead of temporaries in individual functions, namely:

// TODO make key a class member to reduce the number of allocations
body += indent+generator->stringType()->name().variableDeclaration("key")+";\n";

// TODO make str a class member to reduce the number of allocations
body += indent+generator->stringType()->name().variableDeclaration("str")+";\n";

Common parser / serializer base class

Implement the Configuration::GeneratorDef::baseClass property. This should allow users to generate a common parser / serializer for the basic types, which take up a lot of space, or even implement their own version (without sscanf etc.), and subsequent parsers / serializers would inherit from it.

Format multi-statement API commands

Some of the API commands, especially iteration for the serializer, tend to result in a long string of commands on a single line (example below). This is not only ugly but also inconsistent with the format of the rest of the generated code, which is formatted very strictly. Simply adding newlines would not be a solution because the indentation would still be messed up. I think it would be best if newlines and indentation would be added as a post-process step to pattern fill.

api.iterateElements = "for (const std::pair<$U, $T> &$I : $S) { $U const &$K = $I.first; $T const &$V = $I.second; $F }";

for (const std::pair<std::string, float> &i : value) { std::string const &key = i.first; float const &elem = i.second; if (prev) write(','); prev = true; serializeStdString(key); write(':'); serializeFloat(elem); }

Always put const at the beginning

To avoid errors, I have put the const keyword for references of unknown types to the safe spot after the type name. However, this is somewhat inconsistent and it is a general convention to have it at the beginning whenever possible. Since pointer and reference types are not supported and static array types do not use references (this part is resolved), I believe it would actually be safe to put const back at the beginning without changing anything else. It also needs to be updated in the standard library APIs and the Readme file. Is it a good idea though? How about normalizing it the other way around (always putting const in this position)?

Add space in between multiple closing template brackets

Some compilers may have problems with expressions such as std::vector<std::vector<int>>(orig) due to potentially interpreting the two closing brackets as a shift operator. To maximize portability of generated code, it should be ensured that a space is added in such scenarios.
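A simple way to do this could be a pass over the generated type name, sketched below. The function separateClosingBrackets is hypothetical; it assumes that its input is a plain type name in which >> never stands for an actual shift operator (e.g. inside a constant expression used as a template argument).

```cpp
#include <string>

// Hypothetical sketch: inserts a space between directly adjacent '>' characters
// so that nested template argument lists never end in ">>".
std::string separateClosingBrackets(const std::string &typeName) {
    std::string result;
    for (char c : typeName) {
        if (c == '>' && !result.empty() && result.back() == '>')
            result += ' ';
        result += c;
    }
    return result;
}
```

With this, std::vector<std::vector<int>> would become std::vector<std::vector<int> >, which is accepted by compilers that predate C++11 as well.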

Improve error reporting

If the program fails, it is pretty hard to guess why without using a debugger. Try to provide some useful information regarding the cause of the failure.

Raw JSON string element

If certain parts of the JSON tree don't have a static structure or we don't care about their structure but want to preserve it, it might be useful to provide a special string type to store the subtree without parsing it, and writing it into output JSON as-is. This would also allow users to use another JSON parser for these portions or delay the parsing until it's requested. There could be an option to preserve / strip whitespace formatting for these portions.

Basic macro support

Support parsing simple macros such as

#define STRING_TYPE std::string
#define MAX_ARRAY_LENGTH 256

Do not bother with macros with arguments, nested macros, etc.

Custom number type

Add the possibility to define custom number types (integer or real), e.g. a dynamic-sized "big integer" type. The parser API could look like this:

  • clear - sets $S to zero
  • appendDigit - appends (decimal) digit $X to the whole part of $S - equivalent to 10*$S+$X, $E is VALUE_OUT_OF_RANGE error statement
  • appendFractionalDigit - appends $I-th (decimal) fractional digit $X to $S, if left blank, the type is assumed to be integer-only
  • setExponent - multiplies $S by 10 to the power of $X
  • makeNegative - changes the value to negative, guaranteed to be called at the end

Or, instead of the last two, there could be finalize with arguments for sign and exponent.
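To illustrate the intended semantics of the integer-only part of this API, here is a sketch that implements clear, appendDigit, and makeNegative for a plain int. The function names mirror the proposed operations, but their signatures are assumptions, and a false return value stands in for the $E error statement.

```cpp
#include <limits>

// clear - sets the value to zero
void clearNumber(int &s) {
    s = 0;
}

// appendDigit - equivalent to s = 10*s+x for a decimal digit x (0 to 9).
// Returns false instead of raising VALUE_OUT_OF_RANGE if the result would not fit,
// in which case s is left unchanged.
bool appendDigit(int &s, int x) {
    if (s > (std::numeric_limits<int>::max()-x)/10)
        return false;
    s = 10*s+x;
    return true;
}

// makeNegative - negates the value, expected to be called last
void makeNegative(int &s) {
    s = -s;
}
```

Parsing -137 would then amount to clearNumber, followed by appendDigit for 1, 3, and 7, and finally makeNegative. Note that in this simplified form the most negative int cannot be reached, since the value is accumulated as a positive number.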

using support

Support the using keyword for types, e.g.

struct Foo {
    using String = std::string;
    String bar;
};

Similar to #9. Probably don't bother with templated using.

Ignore elements with unrecognized template arguments

The intended behavior is to skip any structure elements with unrecognized types. I believe this works fine if the unrecognized type is the direct type of the element, but if it is used within a template argument, the header parser fails.

Improve function name collision resolution

Currently, if parse / serialize functions for different types end up with the same name, the conflict is resolved by simply adding an underscore at the end of the function name until it is unique. This needs to be improved. I also think that multiple underscores in a row have a special meaning so it may even be an error. On the other hand, I don't think these cases would happen too often, and more or less only in cases such as pair of types std::string and StdString.
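For reference, the current behavior as described above can be sketched as follows. The function makeUniqueName is a hypothetical stand-in and not the generator's actual code.

```cpp
#include <set>
#include <string>

// Hypothetical sketch of the described behavior: appends underscores to the
// function name until it no longer collides with a previously used name,
// then registers the result in usedNames.
std::string makeUniqueName(std::string name, std::set<std::string> &usedNames) {
    while (!usedNames.insert(name).second)
        name += '_';
    return name;
}
```

A third function with the same base name would end in two underscores. Identifiers containing a double underscore are reserved for the implementation in C++, so this case would have to be avoided by an improved scheme, for example by using a different kind of suffix.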

using namespace support

Detect and honor using namespace x; when encountered, so that users can write e.g. string instead of std::string.

Output position when JSON parser fails

The generated parser should report the position within the input string if parsing fails. This is simply cur - jsonString. To achieve this, Error must be a structure containing both the error code and position. The current enumeration can be renamed to ErrorType. Proposed error structure:

struct Error {
    ErrorType type;
    int position;

    inline Error(ErrorType type = OK, int position = -1) : type(type), position(position) { }
    operator ErrorType() const;
    operator bool() const;
};

For serializers, reporting the source of error would be tricky because it is an element within a structure. Providing a pointer to the faulty element would be possible but probably not too helpful, because it isn't enough to easily find it within the structure tree. Still, serializers' error enumeration should also be renamed to ErrorType for consistency (with a possibility of typedef ErrorType Error;).
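A self-contained version of the proposed parser error structure could look like the sketch below. The bodies of the two conversion operators are assumptions (in particular that conversion to bool yields true when an error occurred), the list of error types is shortened, and makeError is a hypothetical helper that shows how the position would be computed.

```cpp
// Shortened stand-in for the renamed error enumeration
enum ErrorType {
    OK,
    JSON_SYNTAX_ERROR,
    UNEXPECTED_END_OF_FILE,
    TYPE_MISMATCH
};

struct Error {
    ErrorType type;
    int position;

    inline Error(ErrorType type = OK, int position = -1) : type(type), position(position) { }
    // Keeps comparisons such as error == TYPE_MISMATCH working
    operator ErrorType() const { return type; }
    // Assumption: true means that an error occurred
    operator bool() const { return type != OK; }
};

// Hypothetical helper: cur points at the character where parsing failed,
// jsonString at the beginning of the input
inline Error makeError(ErrorType type, const char *cur, const char *jsonString) {
    return Error(type, static_cast<int>(cur-jsonString));
}
```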

Add error to string conversion to parsers & serializers

Add a public

static const char * errorString(Error error);

to parsers and serializers if enabled in Settings. Also get rid of listing all error types in multiple places while at it, instead putting them in a list or a macro iterator, e.g.

code += std::string(INDENT INDENT)+Error::JSON_SYNTAX_ERROR+",\n";
code += std::string(INDENT INDENT)+Error::UNEXPECTED_END_OF_FILE+",\n";
code += std::string(INDENT INDENT)+Error::TYPE_MISMATCH+",\n";
code += std::string(INDENT INDENT)+Error::ARRAY_SIZE_MISMATCH+",\n";
code += std::string(INDENT INDENT)+Error::UNKNOWN_KEY+",\n";
code += std::string(INDENT INDENT)+Error::UNKNOWN_ENUM_VALUE+",\n";
code += std::string(INDENT INDENT)+Error::VALUE_OUT_OF_RANGE+",\n";
code += std::string(INDENT INDENT)+Error::STRING_EXPECTED+",\n";
code += std::string(INDENT INDENT)+Error::UTF16_ENCODING_ERROR+",\n";

Namespace alias support

Apparently it is possible to alias namespaces as

namespace standard_library = std;
namespace schrono = std::chrono;

Header parser would fail if this was encountered.

Full settings support

Many Settings flags are currently ignored due to not being implemented yet. These include:

  • verboseErrors
  • checkMissingKeys
  • checkRepeatingKeys
  • nanPolicy
  • infPolicy

Support nullptr_t

I have realized that when skipEmptyFields is true, there is no way to output null into the JSON. A possible solution for this very niche use case would be to add a NullType class that would be used for std::nullptr_t and would always serialize as null and when parsed, it would just throw TYPE_MISMATCH if the value is anything else. I would like to add this mainly because it's a pretty elegant use of the available nullptr_t type.

Test suite

A comprehensive test project should be prepared to verify that everything works correctly and no new bugs are introduced with additional changes. It should be ready before the release of version 1.0.

Annotations

Add support for annotations in the input code. These would be specially-marked comments or parts of comments. They should allow:

  • Ignoring a section (structure, member, ...),
  • representing a structure member under a different key in the JSON,
  • parsing a class as if it were a struct.

Nested types of aliased structures cannot be found

For example:

struct Foo {
    struct Bar {
        int x;
    };
};

using FooAlias = Foo;

In this case, we should be able to refer to FooAlias::Bar but due to the way nested names and alias resolution is currently implemented, this is not possible.

Get rid of sscanf / sprintf

Not only are these functions archaic and overly complex (format string parsing etc.) but I have found that sscanf in particular is extremely slow. This needs to be done away with ASAP. However, the implementation for floats is very complex. There should also be some way for the user to select an implementation or provide their own functions for number (de)serialization.

First official release

Release the first official stable version along with a binary. Depending on the state the project is in, it could be version 1.0.0, or something like 0.9.0.

Integer enumeration fallback

Add an option to allow values of enum variables not corresponding to a named value. These values would be serialized / parsed as simple integers.

Type aliases in configuration

Add a section to the configuration file for "typedefs", (e.g. artery::MemInt = std::ptrdiff_t). This is especially useful before #9 is implemented, but even then it may be good for types included from libraries or conditional aliases.

Out of order declaration

Make sure that code is generated properly even if input files or structures are not in the order they are used.

Look into

bool parseStructNamesOnly = false; // prepass in case input files are in the wrong order

which suggests that simply running the parser in two passes using this flag may be enough to resolve this.

Implement settings with ifdefs

The way parsers and serializers are generated could be changed so that they are configurable with macro definitions rather than settings for the generator, with a notable exception of noThrow as that would result in a mess with function return types. This would also make it easier to check different versions of the generated functions as all would be visible simultaneously. The settings can be kept for default macro values, e.g.

#ifndef JSON_CPP_STRICT_SYNTAX_CHECK
#define JSON_CPP_STRICT_SYNTAX_CHECK 0
#endif
// In actual function:
#if JSON_CPP_STRICT_SYNTAX_CHECK
// ...
#endif

For checkMissingKeys and checkRepeatingKeys, this would work as

if (buffer == "firstKey") {
    JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
    parseXYZ(value.firstKey);
    continue;
}

with JSON_CPP_KEY_ENCOUNTERED being defined at the beginning of the file to either nothing, just flagging doneKeys, or also checking to throw a REPEATED_KEY based on the configuration macros.
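The following sketch shows how such a macro could be defined and used. The configuration macro JSON_CPP_CHECK_REPEATING_KEYS, the doneKeys array layout, and the function visitKeys (a stand-in for a generated object parser that receives key indices instead of parsing real JSON) are all assumptions. The error is returned rather than thrown, as it would be in the noThrow variant, and the variant that expands to nothing is left out for brevity.

```cpp
enum Error { OK, REPEATED_KEY };

// Default value of the assumed configuration macro
#ifndef JSON_CPP_CHECK_REPEATING_KEYS
#define JSON_CPP_CHECK_REPEATING_KEYS 1
#endif

#if JSON_CPP_CHECK_REPEATING_KEYS
// Reports an error if the key was already seen, otherwise flags it as seen
#define JSON_CPP_KEY_ENCOUNTERED(index, mask) \
    if (doneKeys[index]&(mask)) return REPEATED_KEY; \
    doneKeys[index] |= (mask);
#else
// Only flags the key as seen
#define JSON_CPP_KEY_ENCOUNTERED(index, mask) \
    doneKeys[index] |= (mask);
#endif

// Stand-in for a generated parser function: keys holds 0 for "firstKey"
// and 1 for "secondKey" in the order they appear in the JSON object
Error visitKeys(const int *keys, int count) {
    unsigned doneKeys[1] = { };
    for (int i = 0; i < count; ++i) {
        if (keys[i] == 0) {
            JSON_CPP_KEY_ENCOUNTERED(0, 0x00000001)
            continue;
        }
        if (keys[i] == 1) {
            JSON_CPP_KEY_ENCOUNTERED(0, 0x00000002)
            continue;
        }
    }
    return OK;
}
```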
