
This is a read-only mirror of the CRAN R package repository. RcppSimdJson: 'Rcpp' Bindings for the 'simdjson' Header-Only Library for 'JSON' Parsing. Homepage: https://github.com/eddelbuettel/rcppsimdjson/ Report bugs for this package: https://github.com/eddelbuettel/rcppsimdjson/issues


RcppSimdJson: Rcpp Bindings for the simdjson Header Library


Motivation

simdjson by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) is an engineering marvel. Through very clever use of SIMD instructions, it manages to parse JSON files faster than disk access. Wut? Yes, you read that right: parallel processing with so little overhead that the net throughput is limited only by disk speed.

Moreover, it is implemented in neat modern C++ and can be accessed as a header-only library. (Well, one library in two files, really.) Which makes R packaging easy and convenient and compelling. So here we are.

For further introduction, see the arXiv paper by Langdale and Lemire (out/to appear in VLDB Journal 28(6) as well) and/or the video of the recent talk by Daniel Lemire at QCon (voted best talk).

Example

jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")
library(RcppSimdJson)
validateJSON(jsonfile)                  # validate a JSON file
res <- fload(jsonfile)                  # parse a JSON file

Comparison

A simple parsing benchmark against four other R-accessible JSON parsers:

R> res
Unit: milliseconds
     expr      min       lq     mean   median       uq       max neval  cld
 simdjson  1.87118  2.03252  2.24351  2.17228  2.27756   6.57145   100 a
  jsonify  8.91694  9.20124  9.58652  9.46077  9.73692  13.41707   100  b
  RJSONIO 10.49187 11.09410 11.69109 11.42555 11.95780  17.93653   100  b
   ndjson 27.04830 28.62251 31.44330 29.51343 32.05847 146.88221   100   c
 jsonlite 34.93334 36.54784 38.67843 37.74890 40.19555  46.32444   100    d
R>

Or in chart form: (benchmark chart not reproduced here)
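As a rough sketch of how timings like those in the table above might be gathered: the snippet below assumes the microbenchmark and jsonlite packages are installed alongside RcppSimdJson, and it compares only two of the five parsers. The exact calls behind the published numbers are not shown in this README, so treat this as an illustration rather than the original benchmark script.

```r
# Sketch only: assumes microbenchmark, RcppSimdJson and jsonlite are installed.
library(microbenchmark)

# Example file shipped with the package (same file as in the Example section).
jsonfile <- system.file("jsonexamples", "twitter.json", package = "RcppSimdJson")

# Time each parser 100 times on the same file.
res <- microbenchmark(
  simdjson = RcppSimdJson::fload(jsonfile),
  jsonlite = jsonlite::fromJSON(jsonfile),
  times = 100L
)
print(res)
```

Absolute timings depend on hardware and package versions, so the numbers will likely differ from the table.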

Status

All three major OSs are supported, and JSON can be parsed from file and string under a variety of settings. A C++17 compiler is required for ease of setup (though upstream can fall back to older compilers; one can edit src/Makevars accordingly if need be).
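A minimal sketch of the two input routes mentioned above, using fparse() for a string and fload() for a file. The JSON content and the temporary file are made up for illustration; optional arguments are left at their defaults.

```r
library(RcppSimdJson)

# Parse JSON held in memory as a character string.
json_string <- '{"name": "example", "values": [1, 2, 3]}'
from_string <- fparse(json_string)

# Parse JSON from a file: write the same content to a temporary file first.
tmp <- tempfile(fileext = ".json")
writeLines(json_string, tmp)
from_file <- fload(tmp)

# Inspect both results; they should describe the same data.
str(from_string)
str(from_file)
```

Both functions take further arguments that control how the parsed result is simplified; see the package help pages for details.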

Contributing

Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.

Before submitting pull requests, it is frequently preferable to first discuss need and scope in such an issue ticket. See the file Contributing.md (in the Rcpp repo) for a brief discussion.

See Also

For standard JSON work on R, as well as for other nicely done C++ libraries, consider these:

Author

For the R package, Dirk Eddelbuettel and Brendan Knapp.

For everything pertaining to simdjson, Daniel Lemire (and many contributors).
