
Comments (6)

dakrone commented on July 19, 2024

I would love this feature. :)

If it doesn't cause a performance hit, it should definitely be the default. If it does, adding a separate lazy parser is a viable alternative.

Added you to the repo, now you're on the hook Zach :)

from cheshire.

gfredericks commented on July 19, 2024

It's worth noting that this should be possible (though harder) for maps as well.

Example use case: by default couchdb returns data structured like so:

{"some":"resultset","meta":"data","rows":[
{...},
{...},
{...},
{...}
]}

And presumably requiring that maps be parsed eagerly means there's no straightforward way to consume these kinds of documents lazily.

I know there are a few (maybe only half-baked) implementations of a lazy map floating around.


ztellman commented on July 19, 2024

An implementation of this can be found at 2a3b2bc, in the branch lazy-top-level-arrays. I'd appreciate a code review.

Interestingly, this appears to be faster than the existing approach:

cheshire.core> (def (generate-string (range 128)))
; Evaluation aborted.
cheshire.core> (def s (generate-string (range 128)))
#'cheshire.core/s
cheshire.core> (use 'criterium.core)
nil
cheshire.core> (quick-bench (dorun (parse-string s)))
WARNING: Final GC required 30.225372725730608 % of runtime
Evaluation count : 50562 in 6 samples of 8427 calls.
             Execution time mean : 12.141253 µs
    Execution time std-deviation : 108.685624 ns
   Execution time lower quantile : 12.046142 µs ( 2.5%)
   Execution time upper quantile : 12.319445 µs (97.5%)
                   Overhead used : 2.629471 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe   1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

;; disable lazy parsing here

cheshire.core> (quick-bench (dorun (parse-string s)))
WARNING: Final GC required 31.019576852051898 % of runtime
Evaluation count : 39966 in 6 samples of 6661 calls.
             Execution time mean : 15.046137 µs
    Execution time std-deviation : 300.837064 ns
   Execution time lower quantile : 14.811363 µs ( 2.5%)
   Execution time upper quantile : 15.488795 µs (97.5%)
                   Overhead used : 2.629471 ns
nil

Apparently chunked-seqs are more efficient to construct than transient vectors. I could see an argument for using chunked seqs everywhere, except for a few downsides:

  • they're not indexed
  • they're not counted
  • they must be sized ahead of time, so small arrays will waste memory
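To make these trade-offs concrete, here is a minimal sketch using only the chunk primitives in clojure.core. It is not the code from the lazy-top-level-arrays branch: the helper name `small-chunked-seq` and the capacity of 32 are assumptions made up for illustration.

```clojure
;; Illustrative only: not the code from the branch under review.
(defn small-chunked-seq
  "Builds a one-chunk seq holding the numbers 0..n-1 (n must be <= 32).
  The buffer capacity (32, an arbitrary choice) is fixed up front,
  so a 3-element array still allocates room for 32 elements."
  [n]
  (let [buf (chunk-buffer 32)]
    (dotimes [i n]
      (chunk-append buf i))
    (chunk-cons (chunk buf) nil)))

(def cs (small-chunked-seq 3))
(def v (vec cs))

(chunked-seq? cs)                   ; true
(counted? cs)                       ; false: count has to walk the seq
(counted? v)                        ; true
(instance? clojure.lang.Indexed cs) ; false: no constant-time nth
(instance? clojure.lang.Indexed v)  ; true
```

A vector gives back counting and indexing, which is presumably why nested arrays are better off staying vectors.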

I think the current approach (lazy-seqs at the top level, vectors everywhere else) is a pretty decent compromise, but I think that's worth discussing in more detail.

Anyway, let me know what you think.


dakrone commented on July 19, 2024

This looks great. I added a long-running benchmark just for top-level array parsing.

Here are the benchmarks:
pre: http://p.sa2s.us/1369196973422f5b1db5f.txt
post: http://p.sa2s.us/13691969896500361ea03.txt

The code looked pretty good to me. I removed an unused let-bound variable by replacing the let with a do, but other than that it looked good. I also noticed your =/identical? change and replaced = with identical? in a few other places in the parsing code. Good catch.
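For anyone wondering why the = to identical? swap matters: identical? is a plain reference comparison, while = goes through Clojure's general equality check. The swap is only safe when the compared values are singletons. The sketch below assumes the values are Java enum constants and uses `java.util.concurrent.TimeUnit` as a stand-in; `current-token` is a hypothetical helper, and nothing here is taken from the actual parsing code.

```clojure
(import 'java.util.concurrent.TimeUnit)

;; Hypothetical stand-in for "the token the parser is currently on".
(defn current-token [] TimeUnit/SECONDS)

;; Enum constants are singletons, so both checks agree...
(= (current-token) TimeUnit/SECONDS)          ; true
(identical? (current-token) TimeUnit/SECONDS) ; true

;; ...but identical? is not a general replacement for =.
(= (String. "a") (String. "a"))          ; true
(identical? (String. "a") (String. "a")) ; false
```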

I agree that lazy-seqs at the top level and eager parsing everywhere else sounds like a good idea, and I imagine that's the most common use case.

Let me know what you think, and thanks again for the help and contribution!


ztellman commented on July 19, 2024

Oh, whoops, should have caught the let thing. Your changes all seem good, feel free to merge it in whenever you like.


dakrone commented on July 19, 2024

Sweet, released 5.2.0 with this, thanks again!

