simdjson_ruby's People

Contributors

roxasshadow, saka1, sirupsen

simdjson_ruby's Issues

Consider upgrading to simdjson 0.3.1

Version 0.3 of simdjson is now available

Highlights

  • Multi-Document Parsing: Read a bundle of JSON documents (ndjson) 2-4x faster than doing it individually. API docs / Design Details
  • Simplified API: The API has been completely revamped for ease of use, including a new JSON navigation API and fluent support for error code and exception styles of error handling with a single API. Docs
  • Exact Float Parsing: Now simdjson parses floats flawlessly without any performance loss (simdjson/simdjson#558).
    Blog Post
  • Even Faster: The fastest parser got faster! With a shiny new UTF-8 validator and meticulously refactored SIMD core, simdjson 0.3 is 15% faster than before, running at 2.5 GB/s (where 0.2 ran at 2.2 GB/s).

Minor Highlights

  • Fallback implementation: simdjson now has a non-SIMD fallback implementation, and can run even on very old 64-bit machines.
  • Automatic allocation: as part of API simplification, the parser no longer has to be preallocated; it will adjust automatically when it encounters larger files.
  • Runtime selection API: We've exposed simdjson's runtime CPU detection and implementation selection as an API, so you can tell what implementation we detected and test with other implementations.
  • Error handling your way: Whether you use exceptions or check error codes, simdjson lets you handle errors in your style. APIs that can fail return simdjson_result<T>, letting you check the error code before using the result. But if you are more comfortable with exceptions, skip the error code and cast straight to T, and exceptions will be thrown automatically if an error happens. Use the same API either way!
  • Error chaining: We also worked to keep non-exception error-handling short and sweet. Instead of having to check the error code after every single operation, now you can chain JSON navigation calls like looking up an object field or array element, or casting to a string, so that you only have to check the error code once at the very end.

Error compiling...

I was looking to contribute to the gem, but it blows up after cloning when trying to get up and running... A bit of docs on dev setup or what might be missing to build from scratch would be helpful.

bundle exec rake
Copy singleheader files to ext/simdjson...
rake aborted!
Errno::ENOENT: No such file or directory @ rb_sysopen - /Users/danmayer/projects/simdjson_ruby/vendor/simdjson/singleheader/simdjson.h
/Users/danmayer/projects/simdjson_ruby/Rakefile:22:in `block in <top (required)>'
/Users/danmayer/.rvm/gems/ruby-2.6.2/gems/rake-13.0.1/exe/rake:27:in `<top (required)>'
/Users/danmayer/.rvm/gems/ruby-2.6.2/bin/ruby_executable_hooks:24:in `eval'
/Users/danmayer/.rvm/gems/ruby-2.6.2/bin/ruby_executable_hooks:24:in `<main>'
Tasks: TOP => default => compile => before_compile
(See full trace by running task with --trace)
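
The trace shows the `before_compile` task failing because `vendor/simdjson/singleheader/simdjson.h` does not exist in the checkout. One plausible cause (an assumption, not confirmed in this thread) is that `vendor/simdjson` is a git submodule that was never initialized, in which case `git submodule update --init --recursive` would populate it. A minimal sketch of a pre-flight check; `missing_singleheader_files` is a hypothetical helper, not part of the gem's Rakefile:

```ruby
require "pathname"

# Hypothetical helper (not part of simdjson_ruby): returns the names of the
# vendored single-header files that are absent under repo_root.
# The directory layout is taken from the path in the error message above.
def missing_singleheader_files(repo_root, names = %w[simdjson.h])
  dir = Pathname.new(repo_root).join("vendor", "simdjson", "singleheader")
  names.reject { |name| dir.join(name).file? }
end
```

Calling `missing_singleheader_files(Dir.pwd)` from the repository root before `bundle exec rake` would return `["simdjson.h"]` in the situation reported here, and an empty array once the vendored sources are in place.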

Consider updating to 0.4.0

Version 0.4.0 of simdjson is now available

Highlights

  • Test coverage has been greatly improved and we have resolved many static-analysis warnings on different systems.

New features:

  • We added a fast (8GB/s) minifier that works directly on JSON strings.
  • We added a fast (10GB/s) UTF-8 validator that works directly on strings (any strings, including non-JSON).
  • The array and object elements have a constant-time size() method.

Performance:

  • Performance improvements to the API (type(), get<>()).
  • The parse_many function (ndjson) has been entirely reworked. It now uses a single secondary thread instead of several new threads.
  • We have introduced a faster UTF-8 validation algorithm (lookup3) for all kernels (ARM, x64 SSE, x64 AVX).

System support:

  • C++11 support for older compilers and systems.
  • FreeBSD support (and tests).
  • We support the clang front-end compiler (clangcl) under Visual Studio.
  • It is now possible to target ARM platforms under Visual Studio.
  • The simdjson library will never abort or print to standard output/error.

Improve efficiency with 'SAX' API.

The current implementation has two steps:

  1. Parse JSON string and construct a tape
  2. Generate ruby instances from the tape

According to the roadmap (simdjson/simdjson#997), a SAJ (event-based, SAX-style) API will be available someday.
With SAJ we could bypass tape construction and build Ruby objects directly from parse events, which should improve efficiency.
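
To illustrate the idea, here is a toy sketch in plain Ruby of an event consumer that builds Ruby values directly as events arrive, with no intermediate tape. The class name `TreeBuilder` and the event names (`start_object`, `start_array`, `key`, `value`, `end_container`) are hypothetical; the eventual SAJ interface in simdjson may look quite different, and a real implementation would live in the C++ extension.

```ruby
# Toy event consumer (hypothetical names): builds Ruby values directly from
# a stream of parse events instead of walking a previously built tape.
class TreeBuilder
  attr_reader :result

  def initialize
    @stack = []  # containers that are currently open
    @keys = []   # pending key for each open container (nil for arrays)
    @result = nil
  end

  def start_object
    open_container({})
  end

  def start_array
    open_container([])
  end

  def key(name)
    @keys[-1] = name
  end

  def value(v)
    add(v)
  end

  # Closes the innermost container and attaches it to its parent.
  def end_container
    done = @stack.pop
    @keys.pop
    add(done)
  end

  private

  def open_container(container)
    @stack.push(container)
    @keys.push(nil)
  end

  def add(v)
    parent = @stack.last
    case parent
    when nil   then @result = v            # top-level value
    when Array then parent << v
    when Hash  then parent[@keys.last] = v
    end
  end
end

# Events a parser might emit for {"a":[1,2],"b":null}
events = [
  [:start_object], [:key, "a"], [:start_array], [:value, 1], [:value, 2],
  [:end_container], [:key, "b"], [:value, nil], [:end_container]
]
builder = TreeBuilder.new
events.each { |name, *args| builder.public_send(name, *args) }
```

After the loop, `builder.result` holds the finished Hash; each value is attached to its parent as soon as its event is seen, so nothing has to be stored and revisited in a second step.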

Encoding changed after parsing

I've run into a problem where a UTF-8 encoded string is parsed by Simdjson.parse and one of the resulting strings is encoded in ASCII-8BIT. I can reproduce this like so:

# run with ruby --encoding=UTF-8 if UTF-8 isn't your system default.
require 'simdjson'

x = '{"m":" – "}' # note the non-ASCII character in the value
p x.encoding # => #<Encoding:UTF-8>
y = Simdjson.parse(x)
p y['m'].encoding # => #<Encoding:ASCII-8BIT>

It seems like the encoding of the output strings should remain the same as the encoding of the input string, right? I'm not sure if this is an issue that belongs here or in the main simdjson repository but I appreciate you taking a look either way.
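
Until the parsed strings carry the right encoding tag, one possible workaround on the Ruby side is to retag binary strings as UTF-8 after parsing. This is a minimal sketch: `retag_utf8` is a hypothetical helper, not part of simdjson_ruby, and it assumes the underlying bytes really are valid UTF-8. It only changes the encoding label and does not transcode any bytes.

```ruby
# Hypothetical helper: walks a parsed structure and returns a copy in which
# every ASCII-8BIT (binary) string is relabeled as UTF-8.
# Strings are dup'ed first so frozen inputs are not modified in place.
def retag_utf8(value)
  case value
  when String
    if value.encoding == Encoding::BINARY
      value.dup.force_encoding(Encoding::UTF_8)
    else
      value
    end
  when Array
    value.map { |v| retag_utf8(v) }
  when Hash
    value.each_with_object({}) { |(k, v), out| out[retag_utf8(k)] = retag_utf8(v) }
  else
    value
  end
end
```

With this in place, `retag_utf8(Simdjson.parse(x))` would return strings tagged as UTF-8. The extra walk over the result has a cost, so fixing the tag inside the extension would be preferable.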

add support for `symbol_keys` equivalent to OJ / JSON.parse

I recently wrote up a post about how much faster simdjson_ruby can be than OJ and other options.

see blog post

While that is all true, the benchmark falls apart when you need symbol keys because of the expected upstream usage. I was looking to add support for creating symbols while building up the hash, rather than having to convert to them afterward.

My benchmark updated to have symbolized keys for all implementations...

require 'benchmark/ips'
require 'json'
require 'oj'
require 'simdjson'
require 'memory_profiler'
require 'rails'

json = File.read("./json_data.json")

puts "ensure these match"
simd = Simdjson.parse(json.dup).deep_symbolize_keys
puts Oj.load(json.dup, symbol_keys: true) == simd && simd == JSON.parse(json.dup, symbolize_names: true)

Benchmark.ips do |x|
  x.config(:time => 15, :warmup => 3)

  x.report("oj parse") { Oj.load(json.dup, symbol_keys: true) }
  x.report("simdjson parse") { Simdjson.parse(json.dup).deep_symbolize_keys }
  x.report("stdlib JSON parse") { JSON.parse(json.dup, symbolize_names: true) }

  x.compare!
end

The resulting output shows that all the perf improvements of the parser are lost to having to do a second pass for symbolizing, at least in the case of large JSON files.

ensure these match
true
Warming up --------------------------------------
            oj parse   101.000  i/100ms
      simdjson parse    44.000  i/100ms
   stdlib JSON parse    58.000  i/100ms
Calculating -------------------------------------
            oj parse      1.016k (± 4.9%) i/s -     15.251k in  15.051368s
      simdjson parse    420.256  (± 6.7%) i/s -      6.292k in  15.052436s
   stdlib JSON parse    503.879  (±11.1%) i/s -      7.482k in  15.037979s

Comparison:
            oj parse:     1016.2 i/s
   stdlib JSON parse:      503.9 i/s - 2.02x  (± 0.00) slower
      simdjson parse:      420.3 i/s - 2.42x  (± 0.00) slower
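
For context on where the extra time goes, here is a plain-Ruby sketch of what a symbolizing second pass has to do. `deep_symbolize` is a hypothetical stand-in written for illustration, not ActiveSupport's actual implementation: it visits every container in the parsed result and builds a new Hash for each object, on top of the work the parser already did.

```ruby
# Hypothetical stand-in for a symbolizing second pass: rebuilds every Hash
# with Symbol keys and recurses into nested Hashes and Arrays.
def deep_symbolize(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k.to_sym] = deep_symbolize(v) }
  when Array
    value.map { |v| deep_symbolize(v) }
  else
    value
  end
end
```

Creating the symbols while the hash is first built would avoid this second traversal and the duplicate allocations.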

I haven't written a C extension for Ruby in years, but I'm happy to help if I can get the full build / test cycle working. Or, if this all makes sense to you, I'm happy to review a PR if you think this is a good idea and know how to tackle it.
