tgockel / json-voorhees
A killer modern C++ library for interacting with JSON.
Home Page: http://tgockel.github.io/json-voorhees/
License: Apache License 2.0
If a JSON string is parsed and contains a string literal with a \u that is not followed by hex digits, a range_error is thrown by from_hex_digit, but the surrounding code doesn't expect that.
The parse_string() function, which calls the string decoding, only handles decode_error, not range_error. Probably this function should handle range_error in the same way, or string_decode() should already convert it into a decode_error when parsing the \u escape.
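A minimal sketch of the second option (converting the range_error at the point where the \u escape is decoded). This is standalone code, not the library's implementation: decode_error_sketch, from_hex_digit_sketch and decode_u_escape_sketch are hypothetical stand-ins for the real jsonv names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for jsonv::decode_error.
struct decode_error_sketch : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Converts one hex digit; throws std::range_error for anything else,
// mirroring the behaviour described for from_hex_digit.
inline unsigned from_hex_digit_sketch(char c)
{
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    throw std::range_error(std::string("not a hex digit: ") + c);
}

// Decodes the four characters that follow "\u". Any range_error raised
// while reading digits is rethrown as the decode error type, so callers
// only need to handle one exception type.
inline unsigned decode_u_escape_sketch(const std::string& digits)
{
    if (digits.size() != 4)
        throw decode_error_sketch("\\u escape needs exactly 4 hex digits");
    try
    {
        unsigned code = 0;
        for (char c : digits)
            code = code * 16 + from_hex_digit_sketch(c);
        return code;
    }
    catch (const std::range_error& ex)
    {
        throw decode_error_sketch(ex.what());
    }
}

// Usage: valid digits decode, invalid digits surface as decode_error_sketch.
inline void example_decode_u_escape()
{
    assert(decode_u_escape_sketch("00e4") == 0xe4u);
}
```

With this shape, a caller that already catches the decode error needs no extra handler for range_error.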
The exception can for example be triggered by this:
"\"\\uzzzz\""_json
I assume the problem is non-utf8 content, but I haven't debugged it in detail. Creating the value works, but not converting it back into a string:
jsonv::value v = "\"\xe4\""_json;
std::cout << v << std::endl;
I'm not completely sure if the json literal is supposed to parse without errors, but it does.
Converting the value back to a string then fails with an assertion:
#0 0x00007ffff71a0e37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff71a2528 in __GI_abort () at abort.c:89
#2 0x00007ffff7199ce6 in __assert_fail_base (fmt=0x7ffff72e9c08 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7ffff7b93984 "idx + length <= source_size", file=file@entry=0x7ffff7b93908 "/home/sth/src/extern/json-voorhees/src/jsonv/char_convert.cpp",
line=line@entry=217,
function=function@entry=0x7ffff7b93d80 <jsonv::detail::string_encode(std::ostream&, jsonv::detail::string_view)::__PRETTY_FUNCTION__> "std::ostream& jsonv::detail::string_encode(std::ostream&, jsonv::detail::string_view)") at assert.c:92
#3 0x00007ffff7199d92 in __GI___assert_fail (assertion=0x7ffff7b93984 "idx + length <= source_size",
file=0x7ffff7b93908 "/home/sth/src/extern/json-voorhees/src/jsonv/char_convert.cpp", line=217,
function=0x7ffff7b93d80 <jsonv::detail::string_encode(std::ostream&, jsonv::detail::string_view)::__PRETTY_FUNCTION__> "std::ostream& jsonv::detail::string_encode(std::ostream&, jsonv::detail::string_view)") at assert.c:101
#4 0x00007ffff7b2cae4 in jsonv::detail::string_encode (stream=..., source=...) at /home/sth/src/extern/json-voorhees/src/jsonv/char_convert.cpp:217
#5 0x00007ffff7b2c327 in jsonv::stream_escaped_string (stream=..., str=...) at /home/sth/src/extern/json-voorhees/src/jsonv/detail.cpp:102
#6 0x00007ffff7b7a4f2 in jsonv::ostream_encoder::write_string (this=0x7fffffffdbb0, value=...) at /home/sth/src/extern/json-voorhees/src/jsonv/encode.cpp:154
#7 0x00007ffff7b7a1de in jsonv::encoder::encode (this=0x7fffffffdbb0, source=...) at /home/sth/src/extern/json-voorhees/src/jsonv/encode.cpp:77
#8 0x00007ffff7b890bb in jsonv::operator<< (stream=..., val=...) at /home/sth/src/extern/json-voorhees/src/jsonv/value.cpp:523
#9 0x0000000000400ca5 in main () at /home/sth/src/extern/json-voorhees/test.cpp:9
With default settings, this test should work:
PARSE_TEST(invalid_utf8_input)
{
ensure_throws(jsonv::parse_error, jsonv::parse("\"\xe4\""));
}
Unfortunately, jsonv::parse erroneously succeeds.
Some people working with JSON are used to the JavaScript type conversion rules (aka: crazy time). We should appease these people with to_X functions which will attempt to convert the value to what they want.
Currently, the code does not compile with USE_BOOST_STRING_REF=1. boost::string_ref's cast operator to a std::basic_string<char, ...> is marked as explicit, while jsonv::detail::string_ref provides an implicit conversion operator. N3442 wishes the C++ Standard Library to provide an explicit conversion operator and the current implementation of std::experimental::string_view also uses an explicit conversion. I need to fix my implementation of string_ref and need to think of a way to make the object_impl use of std::map work when C++14 is enabled.
value::operator==(const value&) const uses an epsilon value for equality of two kind::decimals, but value::compare(const value&) const uses exact equality. This is not okay.
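One way to keep the two operations from drifting apart is to define equality in terms of the three-way comparison. The sketch below is an illustration only: the function names are hypothetical, and the tolerance (std::numeric_limits<double>::epsilon()) is an assumption, not necessarily the epsilon the library uses.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not part of jsonv: three-way comparison of two
// decimals that treats values closer than eps as equal.
inline int compare_decimal_sketch(double a, double b,
                                  double eps = std::numeric_limits<double>::epsilon())
{
    if (std::fabs(a - b) < eps)
        return 0;
    return a < b ? -1 : 1;
}

// Equality is derived from compare, so the two can never disagree.
inline bool equal_decimal_sketch(double a, double b)
{
    return compare_decimal_sketch(a, b) == 0;
}

inline void example_compare_decimal()
{
    assert(equal_decimal_sketch(0.1 + 0.2, 0.3)); // differs only by rounding error
    assert(compare_decimal_sketch(1.0, 2.0) == -1);
}
```

A caveat with any tolerance-based comparison: it is not transitive, so it does not form a strict ordering for use as a std::map key. Going the other way (exact comparison in both places) would avoid that.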
A single slash is interpreted as the potential start of a comment and is ignored even if a full comment couldn't be matched. This causes "/1"_json or R"({"a": ////////"b"})"_json to be parsed without errors, while they aren't valid JSON.
When a slash is encountered, match_pattern() is called to parse the comment, which will return match_result::unmatched. This result is then ignored in tokenizer::next() and the token will be treated as valid. The single slash will then be treated as a full, valid comment.
The unmatched result should result in a parse error instead.
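A rough sketch of the proposed behaviour, written as standalone code rather than against the real tokenizer. It assumes comments have the block form /* ... */; every name ending in _sketch is hypothetical and only loosely modelled on match_pattern() and tokenizer::next().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

enum class match_result_sketch { complete, unmatched };

// Tries to match a block comment at the start of input. On success,
// length is set to the number of characters the comment occupies.
inline match_result_sketch match_comment_sketch(const std::string& input, std::size_t& length)
{
    length = 0;
    if (input.size() < 2 || input[0] != '/' || input[1] != '*')
        return match_result_sketch::unmatched;
    const std::size_t end = input.find("*/", 2);
    if (end == std::string::npos)
        return match_result_sketch::unmatched;
    length = end + 2;
    return match_result_sketch::complete;
}

// Caller that does not ignore the result: an unmatched comment is an error
// instead of being accepted as a valid token.
inline std::size_t next_comment_length_sketch(const std::string& input)
{
    std::size_t length = 0;
    if (match_comment_sketch(input, length) != match_result_sketch::complete)
        throw std::runtime_error("'/' does not start a complete comment");
    return length;
}

inline void example_comment_match()
{
    assert(next_comment_length_sketch("/* hi */ 1") == 8);
}
```

With this check in place, input such as /1 is rejected rather than having the lone slash swallowed.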
Annoying, but I would prefer to take care of this change sooner rather than later.
position_in_buffer hits assert if a string is successfully read at the end of a buffer.
If you encounter a decode_error in a string, it shoots out of the parsing system and into your grill. It should end up in the list and be subject to parse_options::failure_mode and parse_options::max_failures.
I need to write a make install task. Not difficult, but definitely a TODO before v0.3.
All std::string -> JSON string conversion goes through the jsonv::detail::string_encode function, which performs numeric encoding for characters which do not fit into the ASCII encoding. This is not entirely necessary, since a JSON document can validly contain UTF-8 sequences of characters. The library should allow replacement of this encoding function if the user knows the decoding side can handle UTF-8 encoded JSON.
To the tune of: std::function<void (const jsonv::path& toHere, const jsonv::value& item)>.
I don't want to deal with mutating a jsonv::value while being visited, so that's its own can of worms.
If I am performing a destructive operation on some kind::object and wish to move the std::string keys from that object and make them my own, there is no way to do this. I can move the values associated with the key, but not the key itself.
This is a shortcoming in C++ in general, so it's probably not terribly urgent to do this. However, if there is a good proposal for having this for std::map and whatnot, JSON Voorhees should follow that pattern.
We should support JSONPath queries for a value.
The use of CXX_COMPILER and INSTALL_DIR is not the Unix convention.
The Unicode Windows API is 100% UTF-16 based. Despite the fact that UTF-16 is a complete trainwreck of an encoding system, people still need to use it if they're working with the Windows API. Some degree of support for char16_ts (std::wstring, std::wistream, std::wostream) would probably be appreciated by those folks that have to deal with the Windows API.
I would prefer it if value could continue to store things as std::string internally and convert on demand when requested, but I'm not sure about the implications for the keys of objects (there is no convenient wrapper there).
Reflective languages have this, why shouldn't C++? (aside from the obvious)
This library should work on Windows...eventually. I'm mostly going to put this off until MSVC implements C++11.
The hand-rolled parser is fast, but it is a major pain to maintain. Any LALR parser generator can parse JSON; I just need to find one that works well.
Using cmake -DBENCHMARK=1 and running ./json-benchmark is a good start, but there should be some more formal reporting on this...graphs and whatnot.
for (auto x : jsonv::array({ "a", "b", "c" }).as_array()) ... will access freed memory.
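The likely mechanism here (an assumption, since it hasn't been confirmed against the library source) is the usual range-for lifetime trap: before C++23, a temporary in the range expression is destroyed before the loop body runs unless the range reference binds to it directly. A standalone sketch with a hypothetical value_sketch type standing in for jsonv::value shows the safe pattern:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a jsonv::value holding an array.
struct value_sketch
{
    std::vector<std::string> items;

    // Returns a reference into *this, so the result is only valid while
    // the value_sketch it came from is still alive.
    const std::vector<std::string>& as_array() const { return items; }
};

inline value_sketch make_array_sketch()
{
    return value_sketch{{"a", "b", "c"}};
}

inline std::string join_items_sketch()
{
    // Dangling before C++23: the temporary returned by make_array_sketch()
    // is destroyed once the range is set up, before the first iteration.
    //   for (const auto& x : make_array_sketch().as_array()) { ... }

    // Safe: a named object outlives the whole loop.
    const value_sketch owner = make_array_sketch();
    std::string joined;
    for (const std::string& x : owner.as_array())
        joined += x;
    return joined;
}

inline void example_join_items()
{
    assert(join_items_sketch() == "abc");
}
```

Until the library can guard against this, naming the value before iterating is the workaround.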
It takes quite a while to parse objects and arrays containing large strings and most of this time is spent copying (the underlying strcpy takes ~12x longer than the discovery of what needs to be copied for ASCII strings). If a user does not end up needing the value (a common use case), then we have wasted time performing the copy.
Use Case
In web development, upon request, it is common to take a collection of paths and values and reconstruct an entire JSON tree.
In web development, upon response, it is common to take a value and flatten it into a collection of paths and values.
This is performed in PHP and in jQuery, which adds to their success. In the case of jQuery this was a breaking change, but they did it anyway because it was so important. Many [C++] JSON libraries omit this capability.
Your library does have traverse, which could be used to turn a JSON object into a map<jsonv::path, jsonv::value>, though a convenience function for this common use case would be appreciated. So the latter use case is mostly taken care of, except for the preceding "." common to your paths.
Now what about the former use case? What is the inverse of the traverse function?
value has the following
value& at_path (const path &p)
but it doesn't seem to have
void set_value_at_path (const path &p, ?value?)
"value & path (const path &p)" comes close but doesn't take an additional ?value? parameter.
For convenience it would be great if there was a batch void paths|set_values_at_paths(std::map<path, value> values). It would be beneficial to be able to apply this to an existing non-primitive value, or statically to create a new value.
In both use cases, strongly typed [de]serialization to and from map<path,value> would also be welcome for convenience.
At a minimum, though, a set_value_at_path function is needed; all the others could be derived from it. If this can be done now, please document how, given the importance of the web and its contribution to the popularity of JSON.
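To make the request concrete, here is a sketch of the requested operation on a toy tree type. It does not use jsonv at all: node_sketch, set_value_at_path_sketch and build_from_paths_sketch are hypothetical names, paths are plain vectors of object keys, and leaves are plain strings.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy tree: either a leaf string or an object of named children.
struct node_sketch
{
    std::string leaf;
    std::map<std::string, std::unique_ptr<node_sketch>> children;
};

// Walks the path from root, creating intermediate objects as needed,
// and stores value at the final element.
inline void set_value_at_path_sketch(node_sketch& root,
                                     const std::vector<std::string>& path,
                                     std::string value)
{
    node_sketch* current = &root;
    for (const std::string& key : path)
    {
        std::unique_ptr<node_sketch>& child = current->children[key];
        if (!child)
            child = std::make_unique<node_sketch>();
        current = child.get();
    }
    current->leaf = std::move(value);
}

// Batch form: the inverse of flattening a tree into (path, value) pairs.
inline node_sketch build_from_paths_sketch(
    const std::map<std::vector<std::string>, std::string>& values)
{
    node_sketch root;
    for (const auto& entry : values)
        set_value_at_path_sketch(root, entry.first, entry.second);
    return root;
}

inline void example_set_value_at_path()
{
    node_sketch root;
    set_value_at_path_sketch(root, {"a", "b"}, "1");
    assert(root.children.at("a")->children.at("b")->leaf == "1");
}
```

The batch function is just a loop over the single-path setter, which is why the single-path function is the minimum that is needed.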
We should still encode finite, subnormal values.
Allow access to the underlying code the parser is parsing so people can go into their own AST.
This is an issue when parsing larger JSON input.
Hello,
while compiling your top-notch library on OS X, two files use std::isfinite and it causes a compile error because "cmath" is not included.
jsonv/encode.cpp
jsonv/util.cpp
Please keep up the good work on this jewel !
This naming is obnoxious.
This is currently an experimental "library fundamentals" library, but when it becomes more concrete, it would make sense to use it in value.
Add and pass all the tests from the json_checker suite.
When compiling the library on a 32-bit target, it generates this static assert.
../../thirdparty/json-voorhees/src/jsonv/value.cpp:356:5: error: static_assert failed "!!"
static_assert(sizeof _data == sizeof _data.object, "!!");
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am not sure if this library supports 32-bit architecture. It was compiled successfully on 64-bit in the same platform (OS X).
Honor INSTALL_DIR!
This unit test will cause a segmentation fault, as jsonv::detail::string_decode will dereference NULL:
TEST(string_decode_invalid_escape_sequence)
{
string_decode_static("\\y");
}
This should throw a jsonv::decode_error instead.
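A sketch of the expected behaviour for single-character escapes, again as standalone code. escape_error_sketch stands in for jsonv::decode_error and decode_simple_escape_sketch is a hypothetical helper. The set of escapes is the one JSON defines (\u is left out here because it needs the following hex digits).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for jsonv::decode_error.
struct escape_error_sketch : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Maps the character after a backslash to the character it denotes.
// Unknown escapes throw instead of falling through to a null lookup.
inline char decode_simple_escape_sketch(char c)
{
    switch (c)
    {
    case '"':  return '"';
    case '\\': return '\\';
    case '/':  return '/';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    default:
        throw escape_error_sketch(std::string("invalid escape sequence: \\") + c);
    }
}

inline void example_simple_escape()
{
    assert(decode_simple_escape_sketch('n') == '\n');
}
```

The point is only that the default branch reports an error; how the real string_decode looks up escapes may differ.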
JSON Voorhees should have a real suite of benchmarks to pit it against other C++ JSON libraries with real-world JSON.
Consider the JSON object which contains a key "x" with the numeric value 2^64+1:
{ "x": 18446744073709551617 }
How might I access this in C++?
// Let's assume the call to parse doesn't throw...it will in the current code
jsonv::value v = jsonv::parse("18446744073709551617");
int64_t i = v.as_integer(); // can't represent the value
double d = v.as_decimal(); // loss of precision
std::string s = v.as_string(); // throws kind_error
How does one get access to the value in this perfectly valid JSON? The parser can verify the JSON string 18446744073709551617 represents a valid numeric value, but there is no way to store it or get it back to the user!
The converse of this also does not work. If I have an int128_t, there is no way to emit 18446744073709551617 without resorting to stringification (which is incorrect).
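For reference, detecting that a digit string does not fit in an int64_t is straightforward with the standard library. The sketch below uses std::from_chars (C++17); try_parse_int64_sketch is a hypothetical helper and not something jsonv provides.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Returns the parsed value, or std::nullopt if the text is not entirely
// a decimal integer or does not fit in a 64-bit signed integer.
inline std::optional<std::int64_t> try_parse_int64_sketch(std::string_view text)
{
    std::int64_t out = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const std::from_chars_result res = std::from_chars(first, last, out);
    if (res.ec != std::errc() || res.ptr != last)
        return std::nullopt; // out of range, or trailing/invalid characters
    return out;
}

inline void example_try_parse_int64()
{
    assert(try_parse_int64_sketch("42").value() == 42);
    assert(!try_parse_int64_sketch("18446744073709551617").has_value());
}
```

This only answers "is it representable?". It does nothing for the harder question of how to hand the non-representable value back to the user.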
One solution would be to choose a C++ arbitrary-precision arithmetic library and go with that. The biggest drawback to this approach is: Which one? Boost.Multiprecision, TTMath, GNU Multi-Precision Library, MAPM, InfInt, MathX... Using any of these libraries would force a dependency on any library user, even if they never intend to deal with non-representable integers.
I believe the correct solution is to allow direct access to the shared_buffer which ultimately backs the parsing. This will allow users to deal with the values in their own way.
jsonv::value v = jsonv::parse("18446744073709551617");
jsonv::shared_buffer buffer = v.get_encoded_buffer();
// buffer contains 18446744073709551617 (no null termination)
my_int x = my_int::parse(buffer.cbegin(), buffer.size());
For setting:
jsonv::shared_buffer buffer(to_string(x));
auto v = jsonv::value::from_buffer(jsonv::kind::integer, buffer);
There are some strange caveats to this, all of which are still up in the air...
If you call as_integer on a non-representable (but still integer) value, what should happen? If you throw, there is a major drawback that x.get_kind() == kind::integer ? x.as_integer() : 0 can throw.
For as_decimal, the answer is clear -- by asking for a double, you're assuming a degree of precision loss.
Should jsonv::value::from_buffer validate that the input buffer is a valid encoding for the supplied kind?
When calling get_encoded_buffer, what should happen if it does not exist?
What about get_encoded_buffer on the aggregate types array and object?
When serializing a value into a std::ostream imbued with a German locale (or any other locale that uses , for a decimal), an invalid JSON string will be emitted.
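The locale problem can be reproduced and worked around with the standard library alone. In the sketch below, comma_decimal_sketch is a hypothetical facet that imitates a comma-decimal locale (so no German locale needs to be installed), and write_json_number_sketch is a hypothetical helper that writes the number under the classic "C" locale and then restores whatever the stream had.

```cpp
#include <cassert>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Facet that imitates a locale whose decimal separator is ','.
struct comma_decimal_sketch : std::numpunct<char>
{
protected:
    char do_decimal_point() const override { return ','; }
};

// Writes number using the classic "C" locale regardless of the locale
// imbued on the stream, then restores the previous locale.
inline std::ostream& write_json_number_sketch(std::ostream& os, double number)
{
    const std::locale previous = os.imbue(std::locale::classic());
    os << number;
    os.imbue(previous);
    return os;
}

inline void example_locale_number()
{
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new comma_decimal_sketch));

    os << 1.5; // plain streaming follows the imbued locale
    assert(os.str() == "1,5");

    os.str("");
    write_json_number_sketch(os, 1.5); // always a '.' separator
    assert(os.str() == "1.5");
}
```

This version does not restore the locale if the write throws; a small RAII guard would handle that.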
Bleh.
Parsing outputs a UTF-8 encoded std::string from JSON numeric encodings (\uNNNN). This might not be what the user desires.
This is only on my radar because the workaround for this is so absurdly inconvenient for platforms that do not support output of UTF-8 encoded strings...the only workaround I can think of forces the user to convert at every string access à la convert_utf8_to_utf1(val.as_string()). That said, if every platform you'd use JsonVoorhees on supports UTF-8, this isn't worth dealing with. I'm going to wait until somebody actually cares about this to address it.
At a more general level: It might be completely pointless to support not-UTF-8 when the resultant string representation is the sequence of single bytes std::string. In Windows, where UCS-2 seems to be the norm for presentation, I should move to having strings be backed by std::wstring before addressing this sort of thing.
The match_string function can incorrectly escape early if a " is preceded by a \ which was preceded by a \. In other words, the string "\\\" and keep going" will not include the and keep going bit.
TEST(token_attempt_match_string_double_reverse_solidus_before_escaped_quote)
{
static const char tokens[] = R"("\\\" and keep going")";
token_kind kind;
std::size_t length;
match_result result = static_attempt_match(tokens, kind, length);
ensure(result == match_result::complete);
ensure_eq(token_kind::string, kind);
ensure_eq(sstrlen(tokens), length);
}
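One way to avoid looking backwards at preceding backslashes at all is to scan forward and skip the character after every backslash. The following is a standalone sketch of that idea, not the library's match_string; match_string_sketch is a hypothetical name and it only finds the token length, without validating the escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length of the string token that starts at text[0],
// including both quotes, or std::string::npos if text does not start
// with a quote or the string is never closed.
inline std::size_t match_string_sketch(const std::string& text)
{
    if (text.empty() || text[0] != '"')
        return std::string::npos;
    for (std::size_t idx = 1; idx < text.size(); ++idx)
    {
        if (text[idx] == '\\')
            ++idx;              // skip the escaped character, whatever it is
        else if (text[idx] == '"')
            return idx + 1;     // unescaped quote closes the string
    }
    return std::string::npos;
}

inline void example_match_string()
{
    const std::string tokens = R"("\\\" and keep going")";
    assert(match_string_sketch(tokens) == tokens.size());
}
```

Because each backslash consumes exactly one following character, an even run of backslashes leaves the quote unescaped and an odd run escapes it, with no counting needed.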
value(1.0) == value(1)
Hi,
just starting to give JSON Voorhees a try...
by trying an example based on the following link.
The following causes a "Segmentation Fault" (64-bit Debian GNU/Linux):
#include <iostream>
#include <string>
#include <jsonv/value.hpp>
#include <jsonv/serialization_builder.hpp>
#include <jsonv/parse.hpp>
struct foo
{
int a;
int b;
std::string c;
};
struct bar
{
foo x;
foo y;
std::string z;
std::string w;
};
std::ostream& operator<<(std::ostream& os, const foo &f)
{
os << " a: " << f.a << '\n'
<< " b: " << f.b << '\n'
<< " c: " << f.c << std::endl;
return os;
}
std::ostream& operator<<(std::ostream& os, const bar &b)
{
os << "x:\n" << b.x << '\n'
<< "y:\n" << b.y << '\n'
<< "z: " << b.z << '\n'
<< "w: " << b.w << std::endl;
return os;
}
int main()
{
jsonv::formats local_formats =
jsonv::formats_builder()
.type<foo>()
.member("a", &foo::a)
.member("b", &foo::b)
.default_value(10)
.member("c", &foo::c)
.type<bar>()
.member("x", &bar::x)
.member("y", &bar::y)
.member("z", &bar::z)
.since(jsonv::version(2, 0))
.member("w", &bar::w)
.until(jsonv::version(5, 0))
;
// "aaaaa" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
std::string json_string = R"({
"x": { "aaaaa": 50, "b": 20, "c": "Blah" },
"y": { "a": 10, "c": "No B?" },
"z": "Only serialized in 2.0+",
"w": "Only serialized before 5.0"
}
)";
jsonv::formats format = jsonv::formats::compose({ jsonv::formats::defaults(), local_formats });
jsonv::value val = jsonv::parse(json_string);
bar x = jsonv::extract<bar>(val, format);
std::cout << x << std::endl;
return 0;
}
I'd expect some kind of exception or error handling here!
The fix is
// "a" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
std::string json_string = R"({
"x": { "a": 50, "b": 20, "c": "Blah" },
"y": { "a": 10, "c": "No B?" },
"z": "Only serialized in 2.0+",
"w": "Only serialized before 5.0"
}
)";
Does this library have exceptions for parse errors?
Can I print a nice message to the user explaining the expected property ("a") and the actual incorrectly supplied property ("aaaaa")?
Thanks
Most JSON libraries re-invent boost::variant<> to store polymorphic values.
Why?
The function value parse(const char* input, std::size_t length, const parse_options&) should be removed in favor of a value parse(string_ref, const parse_options&) function. This is the entire purpose of string_ref.
If an exception is thrown during the copy, it is propagated out. This instance will be set to a null value - any value will be disposed. We should use a stronger exception guarantee here.