Fast and nice to use C++ JSON library.

License: Apache License 2.0

spotify-json

A C++17 JSON writer and parser library. It

parses and serializes directly to and from statically typed C++ objects,
requires very little boilerplate code,
is fast and makes use of vectorization,
supports UTF-8,
comes with a good suite of tests,
is deployed and in active use on over 250 million devices,
and has API documentation.

spotify-json depends on Google's double-conversion library, which must be linked into the code that uses spotify-json.

Example

#include <iostream>
#include <map>
#include <string>

#include <spotify/json.hpp>

using namespace spotify::json;

struct Track {
  std::string uri;
  std::string uid;
  std::map<std::string, std::string> metadata;
};

namespace spotify {
namespace json {

// Specialize spotify::json::default_codec_t to specify default behavior when
// encoding and decoding objects of certain types.
template <>
struct default_codec_t<Track> {
  static object_t<Track> codec() {
    auto codec = object<Track>();
    codec.required("uri", &Track::uri);
    codec.optional("uid", &Track::uid);
    codec.optional("metadata", &Track::metadata);
    return codec;
  }
};

}  // namespace json
}  // namespace spotify

int main() {
  const auto parsed_track = decode<Track>(R"({ "uri": "spotify:track:xyz", "metadata": { "a": "b" } })");
  std::cout << "Parsed track with uri " << parsed_track.uri << std::endl;

  Track track;
  track.uri = "spotify:track:abc";
  track.uid = "a-uid";
  const auto json = encode(track);
  std::cout << "Encoded the track into " << json << std::endl;

  return 0;
}

Usage

spotify-json offers a range of codec types that can serialize and parse specific JSON values. There are codecs for each of the basic data types that JSON offers: strings, numbers, arrays, booleans, objects and null.

Constructing and composing codecs

A codec for integers can be made using codec::number<int>(). The codec for strings can be instantiated with codec::string().

Codecs are composable. It is for example possible to construct a codec for parsing and serialization of JSON arrays of numbers, such as [1,4,2]: codec::array<std::vector<int>>(codec::number<int>()).

Constructing deeply nested codecs manually as above can become tedious. To ease this pain, default_codec is a helper function that makes it easy to construct codecs for built-in types. For example, default_codec<int>() is a codec that can parse and serialize numbers, and default_codec<std::vector<int>>() is one that works on arrays of numbers.

It is possible to work with JSON objects with arbitrary keys. For example, default_codec<std::map<std::string, bool>>() is a codec for JSON objects with strings as keys and booleans as values.

Parsing and serialization

Parsing is done using the decode function:

try {
  decode(codec::number<int>(), "123") == 123;
  decode<int>("123") == 123;  // Shortcut for decode(default_codec<int>(), "123")
  decode<std::vector<int>>("[1,2,3]") == std::vector{ 1, 2, 3 };
} catch (const decode_exception &e) {
  std::cout << "Failed to decode: " << e.what() << std::endl;
}

decode throws decode_exception when parsing fails. There is also a function try_decode that doesn't throw on parse errors:

int result = 0;
if (try_decode(result, "123")) {
  result == 123;
} else {
  // Decoding failed!
}

Similarly, serialization is done using encode:

encode(codec::number<int>(), 123) == "123";
encode(123) == "123";  // Shortcut for encode(default_codec<int>(), 123)
encode(std::vector<int>{ 1, 2, 3 }) == "[1,2,3]";

Working with rich objects

Working with basic types such as numbers, strings, booleans and arrays is all nice and dandy, but most practical applications need to deal with rich JSON schemas that involve objects.

Many JSON libraries work by parsing JSON strings into a tree structure that can be read by the application. In our experience, this approach often leads to large amounts of boilerplate code to extract the information in this tree object into statically typed counterparts that are practical to use in C++. This boilerplate is painful to write, bug-prone and slow due to unnecessary copying. SAX-style event based libraries such as yajl avoid the slowdown but require even more boilerplate.

spotify-json avoids these issues by parsing the JSON directly into statically typed data structures. To explain how, let's use the example of a basic two-dimensional coordinate, represented in JSON as {"x":1,"y":2}. In C++, such a coordinate may be represented as a struct:

struct Coordinate {
  Coordinate() = default;
  Coordinate(int x, int y) : x(x), y(y) {}

  int x = 0;
  int y = 0;
};

With spotify-json, it is possible to construct a codec that can convert Coordinate directly to and from JSON:

auto coordinate_codec = object<Coordinate>();
coordinate_codec.required("x", &Coordinate::x);
coordinate_codec.required("y", &Coordinate::y);

The use of required will cause parsing to fail if the fields are missing. There is also an optional method. For more information, see object_t's API documentation.

This codec can be used with encode and decode:

encode(coordinate_codec, Coordinate(10, 0)) == R"({"x":10,"y":0})";

const Coordinate coord = decode(coordinate_codec, R"({ "x": 12, "y": 13 })");
coord.x == 12;
coord.y == 13;

Objects can be nested. To demonstrate this, let's introduce another data type:

struct Player {
  std::string name;
  std::string instrument;
  Coordinate position;
};

A codec for Player might be created with

auto player_codec = object<Player>();
player_codec.required("name", &Player::name);
player_codec.required("instrument", &Player::instrument);
// Because there is no default_codec for Coordinate, we need to pass in the
// codec explicitly:
player_codec.required("position", &Player::position, coordinate_codec);

// Let's use it:
Player player;
player.name = "Daniel";
player.instrument = "guitar";
encode(player_codec, player) == R"({"name":"Daniel","instrument":"guitar","position":{"x":0,"y":0}})";

Since codecs are just normal objects, it is possible to create and use several different codecs for any given data type. This makes it possible to parameterize parsing and do other fancy things, but for most data types there will only really exist one codec. For these cases, it is possible to extend the default_codec helper to support your own data types.

namespace spotify {
namespace json {

template <>
struct default_codec_t<Coordinate> {
  static object_t<Coordinate> codec() {
    auto codec = object<Coordinate>();
    codec.required("x", &Coordinate::x);
    codec.required("y", &Coordinate::y);
    return codec;
  }
};

template <>
struct default_codec_t<Player> {
  static object_t<Player> codec() {
    auto codec = object<Player>();
    codec.required("name", &Player::name);
    codec.required("instrument", &Player::instrument);
    codec.required("position", &Player::position);
    return codec;
  }
};

}  // namespace json
}  // namespace spotify

Coordinate and Player can now be used like any other type that spotify-json supports out of the box:

encode(Coordinate(10, 0)) == R"({"x":10,"y":0})";
decode<std::vector<Coordinate>>(R"([{ "x": 1, "y": -1 }])") == std::vector<Coordinate>{ Coordinate(1, -1) };

Player player;
player.name = "Martin";
player.instrument = "drums";
encode(player) == R"({"name":"Martin","instrument":"drums","position":{"x":0,"y":0}})";

Advanced usage

The examples above cover the most commonly used parts of spotify-json. The library supports more things that sometimes come in handy:

Most STL containers, including array, vector, deque, list, set, unordered_set, pair, tuple, map and unordered_map
C++ enums and similar types
Arbitrary conversion logic, for example when a raw binary hash in C++ is represented as a hex coded string in JSON
Dealing with versioning
Ignoring values that are of the wrong type instead of failing the parse
Values wrapped in unique_ptrs and shared_ptrs
boost::optional
boost::chrono and std::chrono types
Dealing with virtual classes / type erasure
Floating point numbers with lossless serialize/parse roundtrip

Detailed API documentation

Linking against the library in a project

If your project is built with CMake, it is easy to use spotify-json. Here is an example of how it can be done:

Add spotify-json as a git submodule under vendor/
Add the following lines to the CMakeLists.txt of your project:

add_subdirectory(vendor/spotify-json)
target_link_libraries([YOUR TARGET] spotify-json)

Building and running tests

Requirements

CMake (http://www.cmake.org)
Boost (http://www.boost.org)

1. Make CMake find Boost

export BOOST_ROOT=/path/to/boost
export BOOST_LIBRARYDIR=/path/to/boost/lib/

2. Run CMake

mkdir build
cd build
cmake -G <generator-name> ..

Run "cmake --help" for a list of generators available on your system.

3. Build project with Visual Studio / Xcode / Ninja

4. Run CTest

cd build
ctest -j 8

Code of conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

spotify-json's Issues

(Known) Mixed type vectors

What would be the proper way to decode something like the following? If the first position can only ever be "buy" or "sell" I figure you can probably do some kind of intermediate processing, but this isn't documented as far as I can tell. Any help appreciated!

{"changes": [["buy", "1", "3"]]}
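
One possible shape for the "intermediate processing" step is sketched below. It assumes each row has first been decoded into three plain strings, and it only covers the conversion into a typed record afterwards. The Change struct, the Side enum and the field names first and second are hypothetical, since the issue does not say what the two numbers mean.

```cpp
#include <array>
#include <stdexcept>
#include <string>

// Hypothetical typed form of one row such as ["buy", "1", "3"].
enum class Side { buy, sell };

struct Change {
  Side side;
  double first;   // meaning not stated in the issue
  double second;  // meaning not stated in the issue
};

// Converts a row that was first decoded as three plain strings.
// Throws std::invalid_argument if the side is unknown or a field does not
// start with a number.
inline Change to_change(const std::array<std::string, 3> &row) {
  Side side;
  if (row[0] == "buy") {
    side = Side::buy;
  } else if (row[0] == "sell") {
    side = Side::sell;
  } else {
    throw std::invalid_argument("unknown side: " + row[0]);
  }
  return Change{side, std::stod(row[1]), std::stod(row[2])};
}
```

Keeping the conversion in a free function like this means it could later be wrapped in whatever codec mechanism turns out to fit, without changing the typed struct.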

Determine how many bits in pointers we can safely use for tags

The only way we can get reasonable performance and memory usage from a graph based API is to use tagged pointers and nanboxing to store the values in 64-bit words. It's important that we follow the linear address space rules for each platform, so that we don't end up with invalid pointers after an operating system update. The spotify-json library is deployed on literally hundreds of millions of devices, from desktop systems to mobile phones to servers, some harder to upgrade than others. We need to ensure that we are correct on our four supported platforms: ARM, ARM-64, x86, x86-64.
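
To make the idea concrete, here is a minimal sketch of low-bit pointer tagging. It assumes the pointed-to type is 8-byte aligned, which leaves the three lowest address bits free. It says nothing about how many high bits are safe on each platform, which is the open question in this issue, and it does not show nanboxing. Node, pack, tag_of and pointer_of are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node type; alignas(8) guarantees the three low address bits are zero.
struct alignas(8) Node {
  int payload;
};

constexpr std::uintptr_t kTagMask = 0x7;  // low bits freed up by 8-byte alignment

// Packs a pointer and a small tag (0..7) into one word.
inline std::uintptr_t pack(const Node *ptr, unsigned tag) {
  const auto bits = reinterpret_cast<std::uintptr_t>(ptr);
  assert((bits & kTagMask) == 0 && tag <= kTagMask);
  return bits | tag;
}

// Extracts the tag stored in the low bits.
inline unsigned tag_of(std::uintptr_t word) {
  return static_cast<unsigned>(word & kTagMask);
}

// Clears the tag bits to recover the original pointer.
inline const Node *pointer_of(std::uintptr_t word) {
  return reinterpret_cast<const Node *>(word & ~kTagMask);
}
```

The round trip through std::uintptr_t is implementation-defined, so a real implementation would need to be checked on each of the supported platforms.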

Simplify the encoded_value type

We have found that it is a bit too cumbersome to use the encoded_value<T> struct. It has several limitations in some of our common use cases:

It is often not very interesting to know how the data is stored in the value
When the caller wants a std::string it can just call json::encode instead of json::encode_value
Creating an encoded_value with the intention of writing to another JSON string requires an extra copy of the encoded JSON (encode_context → std::string → encode_context)
json::encode_value is unsafe when used incorrectly (json::encode<json::ref>(...))
It is difficult to write a function that takes an encoded value, since the value has the storage type in a template parameter (encoded_value<std::string>, encoded_value<ref> or encoded_value<whatever>?)

We would like the API to have the following characteristics:

Safe
Simple
As fast as possible but not faster

I feel that the current variant of encoded_value<T> fails all three of those. 😢

I propose that we simplify the encoded_value<T> to just encoded_value (no <T>). The new version will always own the JSON data, but we don't allow the caller to std::move the storage out from it. The most common use case for this was to steal the underlying std::string value, but the caller can do that with json::encode anyway. The new type would allow the json::encode_value function to move the encode_buffer into the returned value, avoiding the extra copy mentioned above.

This would make the API safe and simple and in many cases fast. We should also let advanced users of the API avoid the copy completely when there is an opportunity for that. I propose that we introduce a new type encoded_value_ref that is similar to encoded_value (the any_value_t codec should work with both types), but does not have its own storage of the JSON and only refers to another JSON buffer. The caller is then responsible for keeping the JSON buffer alive for as long as they need to use the encoded values.

By letting the encoded_value and encoded_value_ref types implicitly convert, the usage stays simple:

...
json::encoded_value save_some_state() const;
void restore_some_state(const json::encoded_value_ref &saved_state);
...
const auto saved_state = save_some_state();
restore_some_state(saved_state);

Compare this to today's situation:

...
json::encoded_value<> save_some_state() const;  // T is std::string
void restore_some_state(const json::encoded_value<> &saved_state);
...
const auto saved_state = save_some_state();  // Copying the JSON into a std::string
restore_some_state(saved_state);

or:

...
json::encoded_value<> save_some_state() const;  // T is std::string
void restore_some_state(const json::encoded_value<json::ref> &saved_state);
...
const auto saved_state = save_some_state();  // Copying the JSON into a std::string
restore_some_state(saved_state);  // Mismatching storage types

Summary

Replace json::encoded_value<T> with json::encoded_value and json::encoded_value_ref.
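
The owning/non-owning split proposed here resembles the relationship between std::string and std::string_view. The following is a rough sketch of how the implicit conversion could work, not the library's implementation. Both classes and the encoded_size helper are hypothetical, and JSON validation is left out.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical non-owning view of encoded JSON; the caller keeps the buffer alive.
class encoded_value_ref {
 public:
  encoded_value_ref(std::string_view json) : _json(json) {}
  std::string_view json() const { return _json; }

 private:
  std::string_view _json;
};

// Hypothetical owning counterpart; converts implicitly to the ref type.
class encoded_value {
 public:
  explicit encoded_value(std::string json) : _json(std::move(json)) {}
  operator encoded_value_ref() const { return encoded_value_ref(_json); }

 private:
  std::string _json;
};

// Accepts either type, because encoded_value converts to encoded_value_ref.
inline std::size_t encoded_size(const encoded_value_ref &value) {
  return value.json().size();
}
```

As with std::string_view, a ref obtained from a temporary encoded_value would dangle, so a design along these lines puts the lifetime burden on the caller.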

Proposal: Rename the raw_t codec and change its return type (and alter the functionality a bit ...)

I would like to explore some changes to the raw_t codec type before we settle on version 1.0. I feel that the name is almost an implementation detail rather than describing what it really is. Yes, it allows you to access the raw bytes of the value as encoded in the file, but what it is really useful for is to represent any value that can later be decoded by someone else.

I think it would be better named something like value_t, any_value_t, anything_t etc., returning an encoded_value<T>, where T is the storage type, e.g., ref, std::string, std::vector. The encoded_value will have an implicit casting operator to const T & (see the section about json::encoded for why this is useful).

Question: Would we allow moving a T out of the encoded_value<T>?

Using a special type (encoded_value) instead of a raw std::string or std::vector allows us to build in validation that the JSON we are writing is in fact correct. The current raw_t codec will happily trust that the caller is passing valid JSON, something which we already have found led to a bug (not deployed). Anything that is already an encoded_value is known to be valid JSON, so validation is free. When creating an encoded_value from a custom std::string or std::vector we'll validate immediately in the constructor.

I also propose that we change the return type of json::encode() to encoded_value<std::string> (instead of std::string). With the implicit cast to std::string, using it should feel almost the same as today, assigning directly into an std::string. It would be more expensive though, since we might not be able to move the string out of the value type (see question above).

The encoded_value type might have a type property to identify what kind of JSON value it stores. This is nice to have but optional. The type of the type property would be something like:

enum class type {
  null,
  boolean,
  number,
  string,
  object,
  array
};
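
As an illustration of how such a type property could be computed cheaply, the sketch below inspects only the first non-whitespace character. It assumes the JSON has already been validated, as the proposal intends. The function name type_of is hypothetical.

```cpp
#include <stdexcept>
#include <string_view>

enum class type { null, boolean, number, string, object, array };

// Guesses the kind of a JSON value from its first non-whitespace character.
// Assumes the input has already been validated as JSON.
inline type type_of(std::string_view json) {
  const auto first = json.find_first_not_of(" \t\r\n");
  if (first == std::string_view::npos) {
    throw std::invalid_argument("empty JSON value");
  }
  switch (json[first]) {
    case 'n': return type::null;
    case 't':
    case 'f': return type::boolean;
    case '"': return type::string;
    case '{': return type::object;
    case '[': return type::array;
    default: return type::number;  // '-' or a digit in valid JSON
  }
}
```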

With all these changes (not considering the type property), it would be easier to store and work with values that the decoder does not know the type of. A Spotify-internal example of this would be track metadata, which we now could store like this (here encoded_value has a default storage type):

std::unordered_map<std::string, json::encoded_value<>> metadata = json::decode(...);
...
const auto duration = json::decode<std::chrono::milliseconds>(metadata["duration"]);
const auto title = json::decode<std::string>(metadata["title"]);
...
metadata["explicit"] = json::encode(true);

Question: In the example above, encoded_value is default constructible. To what? "null"?

Example of JSON validation:

encoded_value<>("[1, 2, 3]") == "[1, 2, 3]";
encoded_value<>("[1, 2, 3");  // throws an exception

What do you think about this? Many details to work out. I do think it is a nicer and more powerful system than what we have in place today.

Can we declare version 1.0?

Is there anything more we need to do before we can call this version 1.0? We can definitely hold off on any additions until future versions. Are we happy with the interface and naming of the codecs and functions? Those are the things we cannot change later without breaking source compatibility.

Documentation for optional_t is out of date

https://github.com/spotify/spotify-json/blob/master/doc/api.md#optional_t

It refers to none_as_null, which was removed some time ago. The documentation needs to be updated to refer to the empty_as_null codec instead.

Documentation for tuple_t is wrong

https://github.com/spotify/spotify-json/blob/master/doc/api.md#tuple_t

The example that uses transform passes pair(int(), int()) as the first argument, which seems wrong. transform is supposed to take a codec as the first argument.

dealing with Pointers

Hi,

How do you deal with members of classes which are pointers? For example:

struct Bar {
  int a;
};

static spotify::json::codec::object_t<Bar> bar_codec() {
  auto codec = spotify::json::codec::object<Bar>();
  codec.required("a", &Bar::a);
  return codec;
}

struct Foo {
  int b;
  Bar *c;
};

static spotify::json::codec::object_t<Foo> foo_codec() {
  auto codec = spotify::json::codec::object<Foo>();
  codec.required("b", &Foo::b);

  // how to represent Bar *c;
  return codec;
}

How do I represent *c in the codec? I can't find any example which deals with a pointer.

Thanks

Paul

Decoding floats and ints wrapped in quotes

I'm working with an API that returns floats and ints wrapped in quotes. A typical response is included below. In the examples (and in canonical JSON afaik) ints and floats aren't quote wrapped. I get an "invalid floating point number" error, which I guess is expected given that. Is there a way to add an inner encoder that strips the quotes to get them ready for parsing by the double-conversion lib?

["buy", "1", "3"]
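
Independently of how it would be hooked into a codec, the quote-stripping step itself can be sketched with the standard library alone. The function below is hypothetical and uses std::stod rather than double-conversion, so it accepts some inputs that strict JSON number syntax would reject (for example leading whitespace or "inf").

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <string_view>

// Parses a JSON string token such as "\"1.5\"" (quotes included) as a double.
// Returns std::nullopt unless std::stod consumes the whole quoted content.
inline std::optional<double> parse_quoted_double(std::string_view token) {
  if (token.size() < 3 || token.front() != '"' || token.back() != '"') {
    return std::nullopt;
  }
  const std::string inner(token.substr(1, token.size() - 2));
  try {
    std::size_t used = 0;
    const double value = std::stod(inner, &used);
    if (used != inner.size()) return std::nullopt;  // trailing junk such as "1x"
    return value;
  } catch (const std::exception &) {
    return std::nullopt;  // not a number, or out of range
  }
}
```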

Remove `where` parameter from transform lambda

Version: 1.0.0

The transform_t codec takes a function with two arguments to transform the decoded value. I propose that we remove the second parameter, where. It is there so that there is something to pass into the decode_exception constructor. It makes the function more boilerplatey and also means that you cannot easily use transform functions not specifically written for our JSON library, e.g., using an int2str function that takes an int and returns a std::string.

The original use case for the where parameter can be handled by catching any decode_exception thrown by the transform function and just setting the where parameter as appropriate before re-throwing the exception.
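
The catch-and-rethrow approach might look roughly like the sketch below. It does not use the library's decode_exception, whose exact interface is not shown here. decode_error is a hypothetical stand-in that carries an offset, and transform_at is a hypothetical wrapper.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the library's exception type, carrying an input offset.
class decode_error : public std::runtime_error {
 public:
  explicit decode_error(const std::string &what, std::size_t offset = 0)
      : std::runtime_error(what), _offset(offset) {}
  std::size_t offset() const { return _offset; }

 private:
  std::size_t _offset;
};

// Applies a one-argument transform and attaches the offset to any error it throws.
template <typename Transform, typename Value>
auto transform_at(Transform &&transform, const Value &value, std::size_t offset)
    -> decltype(transform(value)) {
  try {
    return transform(value);
  } catch (const decode_error &error) {
    throw decode_error(error.what(), offset);  // same message, now with a position
  }
}
```

With a wrapper like this, the transform function only needs the decoded value, so an ordinary int-to-string helper could be passed in unchanged.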

Using simdjson as an SAX tokenizer

simdjson seems to be the gold standard in terms of JSON-parsing performance. It's always being updated with state-of-the-art algorithms for parsing, makes excellent use of intrinsics, and supports both arm and x86_64. It's also in use by many different organizations and has extensive testing via fuzzing etc. I don't know what the performance needs are for JSON parsing here at Spotify, but if there's any desire for more speed, simdjson would be a great choice. It could be used as an SAX tokenizer, or simply forked to have spotify-json's high-level API built on top of it.

Look into faster integer to string conversion

https://github.com/jeaiii/itoa
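
For context, one widely used way to speed up integer formatting is to emit two digits per division using a lookup table. The sketch below shows that general technique only. It is not taken from the linked project, which may work differently, and to_decimal is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Converts an unsigned integer to decimal, emitting two digits per division
// by looking them up in a table of "00".."99".
inline std::string to_decimal(std::uint64_t value) {
  static const char pairs[] =
      "00010203040506070809"
      "10111213141516171819"
      "20212223242526272829"
      "30313233343536373839"
      "40414243444546474849"
      "50515253545556575859"
      "60616263646566676869"
      "70717273747576777879"
      "80818283848586878889"
      "90919293949596979899";
  char buffer[20];  // UINT64_MAX has 20 decimal digits
  char *end = buffer + sizeof(buffer);
  char *pos = end;
  while (value >= 100) {
    const auto index = static_cast<unsigned>(value % 100) * 2;
    value /= 100;
    *--pos = pairs[index + 1];
    *--pos = pairs[index];
  }
  if (value >= 10) {
    const auto index = static_cast<unsigned>(value) * 2;
    *--pos = pairs[index + 1];
    *--pos = pairs[index];
  } else {
    *--pos = static_cast<char>('0' + value);
  }
  return std::string(pos, end);
}
```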

How to use optionals?

I see codec has required and optional methods.
I tried to add a boost::optional<...> field:

#include <string>
#include <iostream>
#include <boost/optional.hpp>
#include <spotify/json.hpp>

struct MyMessage
{
    std::string required;
    boost::optional<std::string> optional;
};

namespace spotify {
namespace json {
template <>
struct default_codec_t<MyMessage> {
  static codec::object_t<MyMessage> codec() {
    auto codec = codec::object<MyMessage>();
    codec.required("required", &MyMessage::required);
    codec.optional("optional", &MyMessage::optional);
    return codec;
  }
};
} // namespace json
} // namespace spotify

int main(int argc, char **argv)
{
    const auto msg = spotify::json::decode<MyMessage>(R"({ "required": "1", "optional": "foo" })");
    MyMessage msg2;
    msg2.optional = "bar";
    const auto json = spotify::json::encode(msg2);
    std::cout << "Re-encoded:" << std::endl << json << std::endl;
    return 0;
}

However it won't compile because there is no template specialization (of boost::optional<std::string> I suppose?):

In file included from test/tests/json.cpp:4:
In file included from test/include/spotify-json/include/spotify/json.hpp:19:
In file included from test/include/spotify-json/include/spotify/json/json.hpp:19:
In file included from test/include/spotify-json/include/spotify/json/codec.hpp:19:
In file included from test/include/spotify-json/include/spotify/json/codec/codec.hpp:24:
In file included from test/include/spotify-json/include/spotify/json/codec/chrono.hpp:21:
test/include/spotify-json/include/spotify/json/codec/number.hpp:477:3: error: static_assert
      failed "No default_codec_t specialization for type T"
  static_assert(
  ^
test/include/spotify-json/include/spotify/json/default_codec.hpp:32:10: note: in
      instantiation of template class 'spotify::json::default_codec_t<boost::optional<std::__1::basic_string<char>
      > >' requested here
decltype(default_codec_t<T>::codec()) default_codec() {
         ^
test/include/spotify-json/include/spotify/json/codec/object.hpp:268:43: note: while
      substituting explicitly-specified template arguments into function template 'default_codec'
    add_field(name, required, member_ptr, default_codec<value_type>());
                                          ^
test/include/spotify-json/include/spotify/json/codec/object.hpp:64:5: note: in instantiation
      of function template specialization
      'spotify::json::codec::object_t<MyMessage>::add_field<boost::optional<std::__1::basic_string<char> >,
      MyMessage>' requested here
    add_field(name, false, std::forward<args_type>(args)...);
    ^
test/tests/json.cpp:19:11: note: in instantiation of function template specialization
      'spotify::json::codec::object_t<MyMessage>::optional<boost::optional<std::__1::basic_string<char> >
      MyMessage::*>' requested here
    codec.optional("optional", &MyMessage::optional);
          ^

Remove try_decode_partial

Can we remove try_decode_partial? I don't really understand when we would want to use this. I do see several uses of it in the Spotify codebase, but I don't see why those places don't just use try_decode.

Is it a common use case to expect the correctly formatted JSON to be followed by garbage that should not be parsed? This seems dangerous, since we support parsing plain JSON values, not just JSON objects, e.g., parsing the correctly formatted JSON 123 followed by the garbage 456, resulting in the number 123456, not 123.
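
The hazard can be reproduced with a toy prefix parser. This is a hypothetical stand-in for a partial decoder, not the behavior of try_decode_partial itself. It reads leading digits, reports how much input it consumed, and ignores the rest. Overflow is not handled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result of a partial decode: the value and how much input was used.
struct partial_result {
  long value;
  std::size_t consumed;
};

// Reads leading decimal digits and ignores whatever follows them.
inline partial_result decode_int_prefix(std::string_view input) {
  partial_result result{0, 0};
  while (result.consumed < input.size() &&
         input[result.consumed] >= '0' && input[result.consumed] <= '9') {
    result.value = result.value * 10 + (input[result.consumed] - '0');
    ++result.consumed;
  }
  return result;
}
```

Because a bare number has no terminator, "123" immediately followed by "456" is indistinguishable from 123456, whereas an object or array ends unambiguously at its closing bracket.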