boostorg / json

A C++11 library for parsing and serializing JSON to and from a DOM container in memory.

Home Page: https://boost.org/libs/json

License: Boost Software License 1.0


json's Introduction


Boost.JSON

Overview

Boost.JSON is a portable C++ library which provides containers and algorithms that implement JavaScript Object Notation, or simply "JSON", a lightweight data-interchange format. This format is easy for humans to read and write, and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language (Standard ECMA-262), and is currently standardised in RFC 8259. JSON is a text format that is language-independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

This library focuses on a common and popular use-case: parsing and serializing to and from a container called value which holds JSON types. Any value which you build can be serialized and then deserialized, guaranteeing that the result will be equal to the original value. Whatever JSON output you produce with this library will be readable by most common JSON implementations in any language.
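
A minimal round-trip sketch (assuming the parse and serialize free functions described in the documentation; earlier snapshots of the library spell the latter to_string):

#include <boost/json.hpp>
#include <cassert>
#include <string>

int main()
{
    // Parse a JSON text into the DOM container...
    boost::json::value jv = boost::json::parse( R"({"pi": 3.141, "happy": true})" );

    // ...serialize it back to text...
    std::string s = boost::json::serialize( jv );

    // ...and parsing that text again yields an equal value.
    assert( boost::json::parse( s ) == jv );
}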

The value container is designed to be well suited as a vocabulary type appropriate for use in public interfaces and libraries, allowing them to be composed. The library restricts the representable data types to the ranges which are almost universally accepted by most JSON implementations, especially JavaScript. The parser and serializer are both highly performant, meeting or exceeding the benchmark performance of the best comparable libraries. Allocators are very well supported. Code which uses these types will be easy to understand, flexible, and performant.

Boost.JSON offers these features:

  • Fast compilation
  • Requires only C++11
  • Fast streaming parser and serializer
  • Constant-time key lookup for objects
  • Options to allow non-standard JSON
  • Easy and safe modern API with allocator support (see the sketch below)
  • Optional header-only, without linking to a library
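
A short sketch of the key-lookup and allocator bullets above, assuming the monotonic_resource and object interfaces described in the documentation:

#include <boost/json.hpp>
#include <iostream>

int main()
{
    // Every allocation made by obj goes through this arena-style resource.
    boost::json::monotonic_resource mr;
    boost::json::object obj( &mr );

    obj[ "name" ] = "Boost.JSON";
    obj[ "year" ] = 2020;

    // Key lookup does not degrade with object size.
    std::cout << obj.at( "name" ) << "\n";
}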

Visit https://boost.org/libs/json for complete documentation.

Requirements

  • Requires only C++11
  • Link to a built static or dynamic Boost library, or use header-only (see below)
  • Supports -fno-exceptions, detected automatically

The library relies heavily on these well known C++ types in its interfaces (henceforth termed standard types):

  • string_view
  • memory_resource, polymorphic_allocator
  • error_category, error_code, error_condition, system_error

Header-Only

To use as header-only; that is, to eliminate the requirement to link a program to a static or dynamic Boost.JSON library, simply place the following line in exactly one new or existing source file in your project.

#include <boost/json/src.hpp>

MSVC users must also define the macro BOOST_JSON_NO_LIB to disable auto-linking.

Embedded

Boost.JSON works great on embedded devices. The library uses local stack buffers to increase the performance of some operations. On Intel platforms these buffers are large (4KB), while on non-Intel platforms they are small (256 bytes). To adjust the size of the stack buffers for embedded applications define this macro when building the library or including the function definitions:

#define BOOST_JSON_STACK_BUFFER_SIZE 1024
#include <boost/json/src.hpp>

Endianness

Boost.JSON uses Boost.Endian in order to support both little endian and big endian platforms.

Supported Compilers

Boost.JSON has been tested with the following compilers:

  • clang: 3.5, 3.6, 3.7, 3.8, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
  • gcc: 4.8, 4.9, 5, 6, 7, 8, 9, 10, 11, 12
  • msvc: 14.0, 14.1, 14.2, 14.3

Supported JSON Text

The library expects input text to be encoded using UTF-8, which is a requirement put on all JSON exchanged between systems by the RFC. Similarly, the text generated by the library is valid UTF-8.

The RFC does not allow byte order marks (BOM) to appear in JSON text, so the library considers BOM syntax errors.

The library supports several popular JSON extensions. These have to be explicitly enabled.
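
For example, a sketch of enabling two of them through parse_options (assuming the option names used in the current documentation):

#include <boost/json.hpp>

boost::json::value parse_relaxed( boost::json::string_view text )
{
    boost::json::parse_options opt;
    opt.allow_comments = true;        // permit // and /* */ comments
    opt.allow_trailing_commas = true; // permit [1, 2, 3,]

    // The options are passed after the (optional) storage pointer.
    return boost::json::parse( text, boost::json::storage_ptr(), opt );
}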

Visual Studio Solution

cmake -G "Visual Studio 16 2019" -A Win32 -B bin -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/msvc.cmake
cmake -G "Visual Studio 16 2019" -A x64 -B bin64 -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/msvc.cmake

Quality Assurance

The development infrastructure for the library includes these per-commit analyses:

  • Coverage reports
  • Benchmark performance comparisons
  • Compilation and tests on Drone.io, Azure Pipelines, Appveyor
  • Fuzzing using clang-llvm and machine learning

License

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt)

json's People

Contributors

aerostun, akrzemi1, alandefreitas, ayounes-synaptics, balusch, dimarusyy, djarek, doganulus, eelis, eldiener, evanlenz, grisumbras, gummif, julien-blanc-tgcm, kamrann, koalayt, liang3zy22, m8mble, madmongo1, maximilianriemensberger, maxkellermann, mborland, mloskot, nigels-com, ofenloch, pauldreik, pdimov, sdarwin, sdkrystian, vinniefalco


json's Issues

operator= wrong signatures

There should be 2 overloads: operator=(string const&) and operator=(string&&). The same is probably needed for the constructors, and other places where string appears as a parameter type. And this may be needed for array and object overloads.
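
Roughly, the requested pair would look like the following declarations (illustrative only, not the library's actual signatures):

class value
{
public:
    value& operator=( string const& s );  // copy-assign from a string
    value& operator=( string&& s );       // move-assign, reusing s's buffer where possible
    // ...with matching pairs for the constructors, and for array and object
};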

Compile as standalone static library

This patch adds a new CMake option that enables compiling boost::json as a standalone static library without any dependencies, and also builds the examples:

mkdir b && cd b
cmake .. -DBOOST_JSON_STANDALONE=1
make

diff --git a/CMakeLists.txt b/CMakeLists.txt
index d0e383e..834e9ce 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -48,7 +48,13 @@ else()
     target_compile_definitions(boost_json PUBLIC BOOST_JSON_STATIC_LINK=1)
 endif()
 
-if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
+option(BOOST_JSON_STANDALONE "Build boost::json as a static standalone library" FALSE)
+
+if(BOOST_JSON_STANDALONE)
+    target_compile_features(boost_json PUBLIC cxx_std_17)
+    target_compile_definitions(boost_json PUBLIC BOOST_JSON_STANDALONE)
+    add_subdirectory(example)
+elseif(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
     if(${CMAKE_VERSION} VERSION_LESS 3.16)
         message(FATAL_ERROR "Boost.JSON development requires CMake 3.16 or newer.")
     endif()

take the last duplicate key

"...most popular implementations (including the ECMAScript specification which is implemented in modern browsers) follow the rule of taking only the last key-value pair..."

Parsing invalid JSON results in segmentation fault.

I have been trying to parse user-generated JSON, which obviously could be faulty. In this case I tried parsing a JSON string which is missing a separating comma between two values.

Example:

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>  // std::cout
#include <string>
#include <boost/json.hpp>

int main(int argc, char *argv[]) {

    std::string test = "{"
                       "\"username\": \"user\"" // <- Missing separating ,
                       "\"password\": \"password\""
                       "}";
    boost::json::error_code ec;
    boost::json::parser parser;
    parser.start();
    parser.write(test.c_str(), test.size(), ec);
    parser.finish();

    if (ec) {
        std::cout << ec.message() << std::endl;
        return EXIT_FAILURE;
    }

    auto const jv = parser.release();

    if (ec) {
        std::cout << ec.message() << std::endl;
        return EXIT_FAILURE;
    } else {
        std::cout << jv << std::endl;
        return EXIT_SUCCESS;
    }
}

I tried both parse and parser, both resulting in segmentation fault. I checked the docs about how I should validate the json, but all I could find was either using error_codes or exceptions.

Am I missing a function to check untrusted json for errors or is there a bug in the library?

Assertion in boost::json::detail::pow10(int)

This was already reported in https://github.com/vinniefalco/json/issues/13#issuecomment-560517119, but here is a minimized version and I think it is good to have it in a separate issue.

It is the string 0.00....... with a lot of zeros following. Here it is, base64 encoded:

MC4wMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw
MDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDA=

Running it through the fuzzer gives the following output:

paul@tonfisk:~/code/delaktig/boost.json/fuzzing$ ./fuzzer old_crashes/crash_01.json 
INFO: Seed: 2483239189
INFO: Loaded 1 modules   (1092 inline 8-bit counters): 1092 [0x7d21a0, 0x7d25e4), 
INFO: Loaded 1 PC tables (1092 PCs): 1092 [0x5a1598,0x5a59d8), 
./fuzzer: Running 1 inputs 1 time(s) each.
Running: old_crashes/crash_01.json
fuzzer: ../include/boost/json/detail/impl/number.ipp:96: double boost::json::detail::pow10(int): Assertion `exp >= 0 && exp < 618' failed.

Segfault when running bench

Build Host: Fedora 31
Target: Fedora 31 (Linux)
Build System: CMake
Toolchains: gcc-9 c++11, clang-9 c++14
Branch: develop

Synopsis
Misuse of RapidJSON public interface results in internal assertion in RapidJSON library

Steps to Reproduce

$ cmake -DCMAKE_BUILD_TYPE=Debug ...
build the project as normal
$ bench <source_root>/bench/data/canada.json

Detail

The problem can be isolated to the benchmark enabled by
vi.emplace_back(new rapidjson_crt_impl); in function main of bench.cpp

The parse member function of the benchmark contains the following lines:

            CrtAllocator alloc;
            GenericDocument<
                UTF8<>, CrtAllocator> d(&alloc);
            d.Clear();  <<<--- PROBLEM IS HERE

In RapidJSON, the implementation of GenericDocument<>::Clear() executes an assertion that the internal value representing the document is an Array:

    void Clear() {
        RAPIDJSON_ASSERT(IsArray()); 
        GenericValue* e = GetElementsPointer();
        for (GenericValue* v = e; v != e + data_.a.size; ++v)
            v->~GenericValue();
        data_.a.size = 0;
    }

In the above use pattern, the document is not an Array, as it has not been initialised at all. It is actually null.

The assertion fails, resulting in either a SEGFAULT or a debugger break.

Solution

Remove the line d.Clear();

Side Effects of Solution

None. The document is re-created each time around the loop in any case.

Review all error codes

Go through every line that assigns an error and make sure it is fine-grained. Group the errors logically and assign conditions.

Handle overflow in number parsing

In some areas of number parsing, e.g. ++dig_ or ++sig_, we need to check for overflow/wraparound and handle it. This will also need tests.
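
A rough sketch of the kind of guard this implies (illustrative only; sig_/dig_ are the parser's internal counters, here modeled as a plain integer):

#include <cstdint>

// Fold one more decimal digit into a 64-bit significand, refusing to
// wrap around instead of silently overflowing.
bool accumulate_digit( std::uint64_t& sig, unsigned digit )
{
    if( sig > ( UINT64_MAX - digit ) / 10 )
        return false; // would overflow: switch to the double path or set an error
    sig = sig * 10 + digit;
    return true;
}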

Accessing parsed numbers

Currently a JSON number may be parsed as either kind::int64, kind::uint64 or kind::double_ depending on the actual value, e.g. 1 is parsed as kind::int64, 18446744073709551615 is of kind::uint64, and 1.0 is of kind::double_.

This makes it convoluted to parse into a specific type if a sub-range may be parsed as another type. For example, to obtain a uint64_t I have to write:

#include <cstdint>
#include <stdexcept>
#include <boost/json.hpp>

namespace json = boost::json;

uint64_t get_as_uint64(const char* s) {
  const auto value = json::parse(s);
  if (const auto* n = value.if_int64(); n && *n >= 0) { // [ 0 .. 2^63 - 1 ]
    return *n;
  } else if (const auto* n = value.if_uint64()) { // [ 2^63 .. 2^64 - 1 ]
    return *n;
  } else {
    throw std::out_of_range{"not an uint64_t"};
  }
}

whereas I would expect to just write:

json::parse(s).as_uint64(); // should work for [0 .. 2^64 - 1] and throws on anything else

My intuition is that there should be a json::number type that can hold any parsed numbers and may be converted to the user-requested type upon access as long as there is no loss of precision:

const json::number n = json::parse("42").as_number();
assert(n.as_int64() == 42); // only perform expensive check if the number fits into the data type without loss of precision here
assert(n.as_uint64() == 42);
assert(n.as_double() == 42.0);
// value::as_x() would just be a shorthand for value.as_number().as_x();

assert(json::parse("42.0").as_number().as_int64() == 42); // ???, debatable

Thoughts?

CMAKE_CXX_STANDARD in cmake/toolchains/common.cmake is ignored

cmake -B _build.gcc -DBOOST_JSON_STANDALONE=ON -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/gcc.cmake
cmake -B _build.gcc -DCMAKE_CXX_STANDARD=14 -DBOOST_JSON_STANDALONE=ON -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/gcc.cmake

Something seems fishy in how set(CMAKE_CXX_STANDARD 11 CACHE STRING "") is handled: src/src.cpp is compiled with -std=c++17, but test/limits.cpp with -std=c++11 (or with -std=c++14 for the second command).

The fish has been fished:

https://github.com/CPPAlliance/json/blob/d92902c7cf259f4103a0b798050f50ae5b8aa289/CMakeLists.txt#L63

Then common.cmake perhaps needs something like this:

if(BOOST_JSON_STANDALONE)
    set(CMAKE_CXX_STANDARD 17 CACHE STRING "")
else()
    set(CMAKE_CXX_STANDARD 11 CACHE STRING "")
endif()

Add `callbacks` class

We can move the implementation of the callbacks in parser to a separate class and make the callbacks public, so that they can be re-used, allowing third-party SAX-style parsers to produce json::value objects easily.
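
A rough sketch of the shape this could take (all names here are assumptions, not the library's current API):

#include <boost/json.hpp>
#include <cstdint>

// A public handler type that turns SAX-style parse events into a
// json::value, so any conforming parser could reuse it.
class value_builder
{
    // ...implementation detail: a stack of partially built values...
public:
    bool on_object_begin( boost::json::error_code& ec );
    bool on_object_end( boost::json::error_code& ec );
    bool on_array_begin( boost::json::error_code& ec );
    bool on_array_end( boost::json::error_code& ec );
    bool on_key( boost::json::string_view s, boost::json::error_code& ec );
    bool on_string( boost::json::string_view s, boost::json::error_code& ec );
    bool on_int64( std::int64_t i, boost::json::error_code& ec );
    bool on_double( double d, boost::json::error_code& ec );
    bool on_bool( bool b, boost::json::error_code& ec );
    bool on_null( boost::json::error_code& ec );

    boost::json::value release(); // take ownership of the finished value
};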

parser class bypasses storage class on finish() and write()

When a parser object is provided storage during the call to start(), subsequent calls to finish() will bypass the storage object and call operator new directly. It's not clear if this is intended or an oversight. Digging into the code, it looks like the parser object holds a raw_stack object, which contains its own storage_ptr. Unfortunately (or intentionally), the parser class doesn't have a constructor that can take a storage_ptr, so raw_stack gets default-initialized to a default_impl-backed storage class in all cases, and the storage class is bypassed entirely for the lifetime of the parser.

  boost::json::storage_ptr storage =
      boost::json::make_storage<boost::json::pool>(1024 * 1024);
  boost::json::parser p;
  p.start(storage);
  const std::array<char, 2> input = {'{', '}'};
  p.finish(input.data(), input.size());  // This calls operator new through boost::json::detail::default_impl::allocate

From the storage docs, this seems like unintended behavior, and one would expect the provided storage class to be used for memory during the parse operation.

Placing this code:

 rs_ = detail::raw_stack(sp);

at the line linked below causes start() to reconstruct the raw_stack object with the new storage and produces the behavior I would expect, but it's not clear that's the intent, and it might be inefficient to reconstruct the stack object if the same storage is reused repeatedly.
https://github.com/vinniefalco/json/blob/c08721906d67c17eb8bda679550ca7710856d917/include/boost/json/impl/parser.ipp#L165

Tidy up error codes

Trim the error codes used by the parser and remove the unused error codes from the enum.

Parsing Failures in Corner Cases

I've benchmarked this JSON library with the Native JSON Benchmark Suite. This revealed a few interesting "errors" in corner cases. I'm not certain whether the expected results are mandated by JSON, but wanted to share the findings nonetheless.

Doubles

* `[-0.0]`
  * expect: `-0 (0x0168000000000000000)`
  * actual: `0 (0x0160)`

* `[2.22507e-308]`
  * expect: `2.2250699999999998e-308 (0x016FFFFE2E8159D0)`
  * actual: `2.2250700000295652e-308 (0x016FFFFE2E824391)`

* `[-2.22507e-308]`
  * expect: `-2.2250699999999998e-308 (0x016800FFFFE2E8159D0)`
  * actual: `-2.2250700000295652e-308 (0x016800FFFFE2E824391)`

* `[4.9406564584124654e-324]`
  * expect: `4.9406564584124654e-324 (0x0161)`
  * actual: `0 (0x0160)`

* `[2.2250738585072009e-308]`
  * expect: `2.2250738585072009e-308 (0x016FFFFFFFFFFFFF)`
  * actual: `0 (0x0160)`

* `[2.2250738585072014e-308]`
  * expect: `2.2250738585072014e-308 (0x01610000000000000)`
  * actual: `0 (0x0160)`

* `[1e-10000]`
  * expect: `0 (0x0160)`
  * actual: `0 (0x0160)`

* `[0.9868011474609375]`
  * expect: `0.9868011474609375 (0x0163FEF93E000000000)`
  * actual: `0.98680114746093761 (0x0163FEF93E000000001)`

* `[2.2250738585072011e-308]`
  * expect: `2.2250738585072009e-308 (0x016FFFFFFFFFFFFF)`
  * actual: `0 (0x0160)`

* `[1e-214748363]`
  * expect: `0 (0x0160)`
  * actual: `0 (0x0160)`

* `[1e-214748364]`
  * expect: `0 (0x0160)`
  * actual: `0 (0x0160)`

* `[0.017976931348623157e+310]`
  * expect: `1.7976931348623157e+308 (0x0167FEFFFFFFFFFFFFF)`
  * actual: `inf (0x0167FF0000000000000)`

* `[2.2250738585072012e-308]`
  * expect: `2.2250738585072014e-308 (0x01610000000000000)`
  * actual: `0 (0x0160)`

* `[2.22507385850720113605740979670913197593481954635164564e-308]`
  * expect: `2.2250738585072009e-308 (0x016FFFFFFFFFFFFF)`
  * actual: `0 (0x0160)`

* `[2.22507385850720113605740979670913197593481954635164565e-308]`
  * expect: `2.2250738585072014e-308 (0x01610000000000000)`
  * actual: `0 (0x0160)`

* `[0.999999999999999944488848768742172978818416595458984374]`
  * expect: `0.99999999999999989 (0x0163FEFFFFFFFFFFFFF)`
  * actual: `1 (0x0163FF0000000000000)`

* `[1.00000000000000011102230246251565404236316680908203125]`
  * expect: `1 (0x0163FF0000000000000)`
  * actual: `1.0000000000000002 (0x0163FF0000000000001)`

* `[1.00000000000000011102230246251565404236316680908203124]`
  * expect: `1 (0x0163FF0000000000000)`
  * actual: `1.0000000000000002 (0x0163FF0000000000001)`

* `[7205759403792793199999e-5]`
  * expect: `72057594037927928 (0x016436FFFFFFFFFFFFF)`
  * actual: `72057594037927936 (0x0164370000000000000)`

* `[10141204801825834086073718800384]`
  * expect: `1.0141204801825834e+31 (0x016465FFFFFFFFFFFFF)`
  * actual: `1.0141204801825835e+31 (0x0164660000000000000)`

* `[1014120480182583464902367222169599999e-5]`
  * expect: `1.0141204801825834e+31 (0x016465FFFFFFFFFFFFF)`
  * actual: `1.0141204801825835e+31 (0x0164660000000000000)`

* `[5708990770823839207320493820740630171355185152]`
  * expect: `5.7089907708238395e+45 (0x0164970000000000000)`
  * actual: `5.7089907708238389e+45 (0x016496FFFFFFFFFFFFF)`

* `[5708990770823839207320493820740630171355185152001e-3]`
  * expect: `5.7089907708238395e+45 (0x0164970000000000000)`
  * actual: `5.7089907708238389e+45 (0x016496FFFFFFFFFFFFF)`

* `[2.225073858507201136057409796709131975934819546351645648023426109724822222021076945516529523908135087914149158913039621106870086438694594645527657207407820621743379988141063267329253552286881372149012981122451451889849057222307285255133155755015914397476397983411801999323962548289017107081850690630666655994938275772572015763062690663332647565300009245888316433037779791869612049497390377829704905051080609940730262937128958950003583799967207254304360284078895771796150945516748243471030702609144621572289880258182545180325707018860872113128079512233426288368622321503775666622503982534335974568884423900265498198385487948292206894721689831099698365846814022854243330660339850886445804001034933970427567186443383770486037861622771738545623065874679014086723327636718751234567890123456789012345678901e-308]`
  * expect: `2.2250738585072014e-308 (0x01610000000000000)`
  * actual: `0 (0x0160)`

Strings

* `["Hello\u0000World"]`
  * expect: `"Hello\0World"` (length: 11)
  * actual: `"Hello"` (length: 5)

* `["\uD834\uDD1E"]`
  * expect: `"𝄞"` (length: 4)
  * actual: `"턞"` (length: 3)

For reference: the tests have been performed similarly to the following snippet:

#define BOOST_JSON_HEADER_ONLY 1
#include <boost/json.hpp>

// [...]

   namespace JSON = boost::json;
   bool ParseDouble(const char* json, double* d) const {
      try {
         const auto root = JSON::parse(JSON::string_view{json});
         const auto& element = root.get_array()[0];
         *d = [&]() -> double
         {
            switch (element.kind())
            {
               case JSON::kind::double_:
                  return element.get_double();
               case JSON::kind::uint64:
                  return element.get_uint64();
               case JSON::kind::int64:
                  return element.get_int64();
               default:
                  throw false;
            }
         }();
         return true;
      }
      catch (...) {
         return false;
      }
   }

   bool ParseString(const char* json, std::string& s) const {
      try {
         const auto root = JSON::parse(JSON::string_view{json});
         const auto& element = root.get_array()[0];
         s = element.get_string().c_str();
         return true;
      }
      catch (...) {
         return false;
      }
   }

In case I made a testing mistake, I'd be happy to repeat the exercise.

simplify parser

We don't need all the extra overloads of finish(); we can just make write() work like finish(), and the user can deal with the error if they want to continue processing the extra data.

Trim down string

We can combine separate overloads in string into fewer functions accepting a string_view parameter. To check whether a pointer lies within a range, use std::less (see the sketch below).
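
A small sketch of that range check (comparing unrelated raw pointers with < is undefined behavior, while std::less is guaranteed to give a total order):

#include <functional>

// Returns true if p points into the half-open range [first, last).
bool in_range( char const* p, char const* first, char const* last )
{
    std::less<char const*> lt;
    return !lt( p, first ) && lt( p, last );
}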

Hash function needs a salt

This needs a good hash function for strings, and we should probably guard against algorithmic complexity attacks by allowing for a salt.
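
One possible shape for this, sketched as a salted FNV-1a hash (illustrative only, not the library's implementation):

#include <cstddef>
#include <cstdint>
#include <string_view>

std::size_t hash_string( std::string_view s, std::uint64_t salt )
{
    // 64-bit FNV-1a seeded with a per-process salt, so that which keys
    // collide is not predictable by someone crafting hostile input.
    std::uint64_t h = 14695981039346656037ull ^ salt;
    for( unsigned char c : s )
    {
        h ^= c;
        h *= 1099511628211ull;
    }
    return static_cast<std::size_t>( h );
}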

Test standalone with libstdc++

The Travis configuration for standalone clang-9 does not work because the installed libstdc++ is not new enough to contain <memory_resource>.

string::substr uncertainty about storage

Right now string::substr returns a string using the default storage. We might need to simply remove this overload. If users want a copy they can use subview in a constructor call, possibly explicitly specifying a storage pointer.
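
For example, instead of a substr() member, callers could build the copy themselves (a sketch assuming string's subview accessor and its string_view constructor):

#include <boost/json.hpp>
#include <cstddef>
#include <utility>

// Copy count characters starting at pos into a new string that uses an
// explicitly chosen memory resource.
boost::json::string copy_substring(
    boost::json::string const& s,
    std::size_t pos, std::size_t count,
    boost::json::storage_ptr sp )
{
    return boost::json::string( s.subview( pos, count ), std::move( sp ) );
}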

Update bintray credentials (was: add fuzzing)

This is a ticket to keep track of adding fuzz testing. I promised to help out, so here I am!
I wanted to highlight the plan so everyone is happy with the direction.

The plan is to

  • add a fuzz target (and build support, if necessary)
  • add helper scripts to get an initial seed corpus
  • add a GitHub action which runs the fuzzing for a short time (30 seconds or so) to make sure easy-to-find bugs are detected already at the pull request stage

Building

I had problems building the library - I suspect others wanting to try it out may also run into problems. Is there documentation somewhere that I missed? I expected the usual git clone, recursive submodule update, and cmake to work out of the box, but there seems to be a dependency on Boost.Beast.

Fuzz target

There is already a fuzzer in #13. @msimonsson, are you OK with me incorporating your fuzzer code from that ticket under the Boost license? I assume this is fine, but I am not a lawyer (well, perhaps a C++ language lawyer wannabe :-).

Github Action

I have used this for the simdjson project recently and it worked fine. I am not sure if it is possible to browse through it without being a member, but here are some links:

For efficiency, it is good if the corpus can be stored somewhere between the runs; otherwise it has to bootstrap each time, which is inefficient. I use bintray for simdjson - @vinniefalco, where would you be OK with storing the corpus? Do you perhaps already have a bintray account? In the meanwhile, I will use my own.

Implementation

I develop this in my clone for now.
See the Readme over here: https://github.com/pauldreik/json/tree/paul/fuzz/fuzzing

Dynamic cast causes code to fail to compile with -fno-rtti on gcc

https://github.com/CPPAlliance/json/blob/ad1a3378b6caac61da44bec403336a02d3fd381f/include/boost/json/detail/default_resource.hpp#L47

The above line prevents compilation if -fno-rtti is enabled in gcc (version 9, but I suspect that doesn't matter)

In member function ‘virtual bool boost::json::detail::default_resource::do_is_equal(const boost::container::pmr::memory_resource&) const’:
...boost/json/detail/default_resource.hpp:48:41: error: ‘dynamic_cast’ not permitted with ‘-fno-rtti’

This appears to be a regression on this commit:
a47b0f3

Backing up to one commit previous allows the same code to build and link just fine.
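
One possible RTTI-free formulation (an assumption about the intended semantics: the default resource behaves as a singleton, so identity comparison should suffice):

// Inside boost::json::detail::default_resource: compare by address
// instead of using dynamic_cast.
bool do_is_equal( memory_resource const& mr ) const noexcept override
{
    return this == &mr;
}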

Optional bench and build time cloning of nlohmann and rapidjson

Hello,
Thanks for this library, I find it very interesting, especially for deeply embedded platforms (without an OS).
I feel that having nlohmann and rapidjson imposed as git submodules just for the purpose of benchmarking is too much. A full clone of nlohmann in particular is quite big...
What about moving that to build-time retrieval using CMake's ExternalProject_Add?

include(ExternalProject)
ExternalProject_Add(bench/lib/nlohmann
    GIT_REPOSITORY https://github.com/nlohmann/json.git
    GIT_TAG 99d7518d21cbbfe91d341a5431438bf7559c6974
)

Also compilation of bench should be optional.

Thanks!

Extra array level when serializing member value

I'm observing weird output when testing with the following condensed minimal program:

#define BOOST_JSON_HEADER_ONLY 1
#include <iostream>
#include <boost/json.hpp>

class Wrapper
{
public:
   Wrapper(boost::json::value v) : wrapped{v} {}

   boost::json::value wrapped;
};

int main() {
   const auto input = boost::json::string_view{"[]"};
   const auto value = boost::json::parse(input);
   const auto wrapper = Wrapper{value};
   const auto output = boost::json::to_string(wrapper.wrapped);
   std::cout << input << " -> " << boost::json::to_string(wrapper.wrapped) << std::endl;
}

I'd expect output [] -> [] but I'm seeing [] -> [[]]. If I skip using the Wrapper, I get the desired output. Am I violating some requirement of the value type?
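
A likely cause (an assumption based on value's constructor set): the braced initializer wrapped{v} selects value's std::initializer_list constructor, which produces an array containing v. Parentheses copy-construct instead:

// Hypothetical one-character fix in Wrapper's constructor:
Wrapper(boost::json::value v) : wrapped(v) {}    // copies v:            [] -> []
// versus the original
// Wrapper(boost::json::value v) : wrapped{v} {} // array containing v:  [] -> [[]]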

Update README.md

The README.md should include most or all of the contents of 01_overview.qbk

Heap-buffer-overflow when fuzzing with libFuzzer

Fuzzing the validate() function in example/validate.cpp results in heap-buffer-overflow:

==75832==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000005901 at pc 0x00000031fa46 bp 0x7fffffff8f50 sp 0x7fffffff8f48
READ of size 1 at 0x603000005901 thread T0
    #0 0x31fa45 in boost::json::basic_parser::write_some(char const*, unsigned long, std::__1::error_code&)
        boost/json/impl/basic_parser.ipp:456:16
    #1 0x30f73c in boost::json::basic_parser::write(char const*, unsigned long, std::__1::error_code&)
        boost/json/impl/basic_parser.ipp:894:9
    #2 0x309cf2 in boost::json::basic_parser::finish(char const*, unsigned long, std::__1::error_code&)
        boost/json/impl/basic_parser.ipp:924:5
    #3 0x30937e in (anonymous namespace)::validate(std::__1::basic_string_view<char, std::__1::char_traits<char> >)
        bjson.cc:73:11

Input:

\"~QQ36644632   {n
Base64: In5RUTM2NjQ0NjMyICAge24=

Please let me know if you need more details.

Crash on input

Parser chokes on this (after base64 decode):

WyL//34zOVx1ZDg0ZFx1ZGM4M2RcdWQ4M2RcdWRlM2M4dWRlMTlcdWQ4M2RcdWRlMzlkZWUzOVx1
ZDg0ZFx1ZGM4M2RcdWQ4M2RcdWRlMzlcXHVkY2M4M1x1ZDg5ZFx1ZGUzOVx1ZDgzZFx1ZGUzOWRb
IGZhbHNlLDMzMzMzMzMzMzMzMzMzMzMzNDMzMzMzMTY1MzczNzMwLDMzMzMzMzMzMzMzMzMzMzMz
MzM3ODAsMzMzMzMzMzMzMzM0MzMzMzMxNjUzNzM3MzAsMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMz
MzM3ODAsMzMzMzMzMzMzMzMzMzQzMzMzMzE2NTM3MzczMCwzMzMzMzMzMzMzMzMzMzMzMzMzNzgw
LDMzMzMzMzM4MzU1MzMwNzQ3NDYwLDMzMTY2NTAwMDAzMzMzMzMwNzQ3MzMzMzMzMzc3OSwzMzMz
MzMzMzMzMzMzMzMzNDMzMzMzMzMwNzQ3NDYwLDMzMzMzMzMzMzMzMzMzMzMzMzMzNzgwLDMzMzMz
MzMzMzMzMzMzMzA4ODM1NTMzMDc0Mzc4MCwzMzMzMzMzMzMzMzMzMzMwODgzNTUzMzA3NDc0NjAs
MzMzMzMzMzMxNjY1MDAwMDMzMzMzNDc0NjAsMzMzMzMzMzMzMzMzMzMzMzMzMzc4MCwzMzMzMzMz
MzMzMzM3MzMzMzE2NjUwMDAwMzMzMzMzMDc0NzMzMzMzMzM3NzksMzMzMzMzMzMzMzMzMzMzMzQz
MzMzMzMwNzQ3NDYwLDMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzc4MCwzMzMzMzMzMzMzNzgw
LDMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0NzQ2MCwzMzE2NjUwMDAwMzMzMzMzMDc0NzMzMzMzMzM3
NzksMzMzMzMzMzMzMzMzMzMzMzQzMzMzMzMwNzQ3NDYwLDMzMzMzMzMzMzMzMzMzMzMzMzMzNzgw
LDMzMzMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0Mzc4MCwzMzMzMzMzMzMzMzMzMzMzMzMwODgzNTUz
MzA3NDM3ODAsMzMzMzMzMzMzMzMzMzMzMDg4MzU1MzMwNzQ3NDYwLDMzMzMzMzMzMzMzMDczMzM3
NDc0NjAsMzMzMzMzMzMzMzMzMzMzMzMzNzgwLDMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0NzQ2MCwz
MzE2NjUwMDAwMzMzMzMzMDc0NzMzMzMzMzM3NzksMzMzMzMzMzMzMzMzMzMzMzQzMzMzMzMzMDc0
NzQ2MCwzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzMzM3ODAsMzMzMzMzMzMzMzMzMzMzMDg4
MzU1MzMwNzQzNzgwLDMzMzMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0NzQ2MCwzMzMzMzMzMzMzMzMz
MzMzMzM0MjQ3LDMzMzMzMzMzMzMzMzMzMzQzMzMzMzMzMzMzMzMzMzM3MzMzMzQzMzMzMzMzMDc0
NzQ2MCwzMzMzMzMzMzMzMzMzMzMzMzMzNzgwLDMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0NzQ2MCwz
MzE2NjUwMDAwMzMzMzMzMDc0NzMzMzMzMzM3NzksMzMzMzMzMzMzMzMzMzMzMzQzMzMzMzMwNzQ3
NDYwLDMzMzMzMzMzMzMzMzMzMzMzMzMzNzgwLDMzMzMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0Mzc4
MCwzMzMzMzMzMzMzMzMzMzMwODgzNTUzMzA3NDc0NjAsMzMzMzMzMzMzLDMzMzMzMzMzMzMzMzMz
MzMzMzM3ODAsMzMzMzMzMzMzMzc4MCwzMzMzMzMzMzMzMzMwODgzNTUzMzA3NDc0NjAsMzMxNjY1
MDAwMDMzMzMzMzA3NDczMzMzMzMzNzc5LDMzMzMzMzMzMzM3ODAsMzMzMzMzMzgzNTUzMzA3NDc0
NjAsMzMxNjY1MDAwMDMzMzMzMzA3NDczMzMzMzMzNzc5LDMzMzMzMzMzMzMzMzMzMzM0MzMzMzMz
MzA3NDc0NjAsMzMzMzMzMzMzMzMzMzMzMzMzMzM3ODAsMzMzMzMzMzMzMzMzMzMzMDg4MzU1MzMw
NzQzNzgwLDMzMzMzMzMzMzMzMzMzMzA4ODM1NTMzMDc0NzQ2MCwzMzMzMzMzMzE2NjUwMDAwMzMz
MzM0NzQ2MCwzMzMzMzMzMzMzMzMzMzMzMzMzNzgwLDMzMzMzMzMzMzMzMzM0MzMzMzMxNjUzNzM3
MzAsMzMzMzMzMzMzMzMzMzMzMzMzMzc4MCwzMzMzMzMzODM1NTMzMDc0NzQ2MCwzMzE2NjUwMDAw
MzMzMzMzMDc0NzMzMzMzMzM3NzksMzMzMzMzMzMzMzMzMzMzMzQzMzMzMzMzMDc0NzQ2MCwzMzMz
MzMzMzMzMzMzMzMzMzMzMzc4MCwzMzMzMzMzMzMzMzMzMzMwODgzNTUzMzA3NDM3ODAsMzMzMzMz
MzMzMzMzMzMzMDg4MzU1MzMwNzQ3NDYwLDMzMzMzMzMzMTY2NTAwMDAzMzMzMzQ3NDYwLDMzMzMz
MzMzMzMzMzMzMzMzMzM3ODAsMzMzMzMzMzMzMzMzNzMzMzM0MzMzMzMzMzA3NDc0NjAsMzMzMzMz
MzMzMzMzMzMzMzMzMzc4MCwzMzMzMzMzMzMzMzMwODgzNTUzMzA3NDc0NjAsMzMxNjY1MDAwMDMz
MzMzMzA3NDczMzMzMzMzNzc5LDMzMzMzMzMzMzMzMzMzMzM0MzMzMzNcdWQ4N2RcdWRlZGV1ZGM4
ZGUzOVx1ZDg0ZFx1ZGM4M2RcdWQ4OGRcdWRlMzlcdWQ4OWRcdWRlMjM5MzMzZWUzOVxk

string initializer_list tests

The tests for initializer_list in string have to be uncommented once we figure out how to make them work with varying small-buffer sizes.

Just one more customisation point and we're good.

Boost.JSON currently has two customisation points for the serialisation of UDTs to JSON values:

  • provide a member function T::to_json(boost::json::value&) const
  • specialise the template boost::json::to_value_traits<> in the namespace boost::json.

This issue lays out an argument why, in the author's view, there is a need for one more.

It is not possible to automatically provide a conversion for enums with the use of a macro.

It is common practice to use one of the recent, very useful header-only mini-libraries to provide missing utility to C++ enum types. One example of this is wise_enum (https://github.com/quicknir/wise_enum), although there are a few others.

By declaring an enum with the WISE_ENUM_XXX macros, users gain proper C++ enums which have full compile-time reflection. For example, wise_enum::to_string(E) yields a string_view and wise_enum::from_string<T>(str) yields an optional<E>.

This has great utility in logging/serialisation etc.

I go further in my programs and provide an ADL to_string overload for all object types, essentially providing:

template< class E, std::enable_if_t< wise_enum::is_wise_enum_v<E> >* = nullptr >
string_view
to_string(
    E const& e)
{
    return wise_enum::to_string(e);
}

as part of the macro which declares the enum. Other such mini-functions are emitted by this macro in order to provide operator<< and operator>>, which means, for example, that the enum is now parsable by boost::program_options as an option name rather than an integer.

The provision of an ADL version simplifies the writing of generic code, for example:

    template<class Impl>
    struct packet_enable_json
    {
        void to_json(boost::json::value& jv) const
        {
            auto& self = static_cast<Impl const&>(*this);
            auto& object = jv.emplace_object();
            object.reserve(2);
            object.emplace("type", to_string(self.id()));   // <<== HERE
            object.emplace("data", boost::json::to_value(Impl::as_nvps(self), object.storage()));
        }
    };

In the above real example, any Impl whose .id() method yields a type for which the ADL to_string function is available will be correctly serialised to JSON.

This works when the objects are always represented as JSON strings. But in the above example I have a problem if the result type of Impl::as_nvps(self) yields an enum.

Currently Boost.JSON will serialise the underlying integer representation (in the case of an old-style enum) or fail to compile (with an enum class).

The solution of course is to provide a specialisation of boost::json::to_value_traits, but (prior to fully compliant C++20 compilers) I have the problem that my utility macro will need to:

  1. end the current namespace,
  2. open the boost::json namespace,
  3. print the specialisation in terms of wise_enum::to_string and then,
  4. re-establish the original namespace in which the enum is declared.

The final step is not possible without further decorating the macro with namespace names, which is ugly.

I offer three solutions:

  1. Allow the overloading of boost::json::to_value as a customisation point. (this doesn't actually help)

  2. Add one additional step to the deduction of behaviour of boost::json::to_value:

The library provides a set of customization points to enable conversions between objects of type T and a JSON value by:

  • A specialization of to_value_traits for T containing the public static member function void to_value_traits::assign( value&, T const& ),

  • A specialization of value_cast_traits for T containing the public static member function T value_cast_traits::construct( value const& ),

  • A free function declared in the namespace of T compatible with the following signature: void to_json(T const&, boost::json::value& jv , boost::json::storage_ptr = {})

  • A public non-static member function void T::to_json( value& ) const,

  • A public constructor T::T( value const& ),

  3. Add an additional Enable template parameter to to_value_traits so that user programs can partially specialise the trait for all types which match a predicate.

For example:

template< class Enum >
struct to_value_traits < Enum, std::enable_if_t< wise_enum::is_wise_enum_v<Enum> > >
{
  static void assign( value& jv, Enum const& e )
  {
    jv = wise_enum::to_string(e);
  }
};

Opinions about ADL are as varied and as strong as opinions about exceptions, almost-always-auto, and the use of goto.

I am in favour of the provision of ADL overloads, as they are simple and unobtrusive.

However, any of the above solutions would be acceptable to me as they would provide the necessary functionality.

@vinniefalco
@pdimov
@sdkrystian
