Charred

Efficient character-based file parsing for CSV and JSON formats.

Usage

user> (require '[charred.api :as charred])
nil
user> (charred/read-json "{\"a\": 1, \"b\": 2}")
{"a" 1, "b" 2}
user> (charred/read-json "{\"a\": 1, \"b\": 2}" :key-fn keyword)
{:a 1, :b 2}
user> (println (charred/write-json-str *1))
{
  "a": 1,
  "b": 2
}

A Note About Efficiency

If you are reading or writing many small JSON objects, the best option is to create a parse fn specialized to exactly the options you need and pass in strings or char[] data. A similar pathway exists for high-performance writing of JSON objects. The returned functions are safe to use in multithreaded contexts.
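A minimal sketch of that pattern, assuming the specialized functions are `charred.api/parse-json-fn` and `charred.api/write-json-fn` (the names are not given in the text above, so check them against the API docs). The helper `json-str` is hypothetical and only exists for this illustration.

```clojure
(require '[charred.api :as charred])

;; Build the parser once; the options are fixed when the fn is created.
;; Assumption: parse-json-fn accepts the same options map as read-json.
(def parse-json (charred/parse-json-fn {:key-fn keyword}))

;; Reuse the same fn for every small document (strings or char[] data).
(parse-json "{\"a\": 1, \"b\": 2}")

;; Assumed writing counterpart: write-json-fn returns a fn of [output data].
(def write-json! (charred/write-json-fn {}))

;; Hypothetical helper: serialize one value to a string via a StringWriter.
(defn json-str
  [data]
  (let [w (java.io.StringWriter.)]
    (write-json! w data)
    (.toString w)))
```

Creating `parse-json` and `write-json!` once at namespace load and sharing them across threads matches the note above that the returned functions are safe in multithreaded contexts.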

The system is tuned overall for large files, as that is when performance really matters. Small files or input streams should be set up with :async? false and a smaller :bufsize argument such as 8192, as there is no gain from async loading when the file or stream is smaller than 1MB. For smaller streams, slurping into strings in an offline threadpool will lead to the highest performance. If you know you are going to parse many files of a particular size, you should grid-search :bufsize and :async?, as that is a tuning pathway that I haven't put a ton of time into.
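As a rough sketch of the small-input settings described above, the two functions below read one JSON document from a small file. The function names are hypothetical, and passing the options as trailing keyword arguments follows the `:key-fn` usage shown earlier.

```clojure
(require '[charred.api :as charred]
         '[clojure.java.io :as io])

;; Hypothetical helper: read a small file synchronously with a small buffer.
(defn read-small-json
  [path]
  (with-open [rdr (io/reader path)]
    (charred/read-json rdr :async? false :bufsize 8192)))

;; Hypothetical alternative: slurp the whole file into a string first.
;; This could be run on a worker thread, e.g. via `future`.
(defn read-small-json-str
  [path]
  (charred/read-json (slurp path)))
```

Which variant wins for a given file size is exactly what the grid search over :bufsize and :async? is meant to answer.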

All the parsing systems have mutable options. These can be somewhat faster, and it is interesting to look at the tradeoffs involved. Parsing a CSV using the raw supplier interface is a bit faster than using the Clojure sequence pathway into persistent vectors, and it probably doesn't change how you consume the rows much, so it may be worth trying.
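A sketch of consuming rows through the supplier interface, assuming it is exposed as `charred.api/read-csv-supplier` and that the returned object is a `java.util.function.Supplier` that is also `AutoCloseable` and returns nil once the input is exhausted. The `count-rows` helper is hypothetical.

```clojure
(require '[charred.api :as charred])

;; Hypothetical helper: count rows by pulling them one at a time.
;; Assumption: .get returns nil when there are no more rows.
(defn count-rows
  [input]
  (let [^java.util.function.Supplier rows (charred/read-csv-supplier input)]
    (try
      (loop [n 0]
        (if (some? (.get rows))
          (recur (inc n))
          n))
      (finally
        ;; Release the underlying reader and any background loading.
        (.close ^java.lang.AutoCloseable rows)))))

;; Usage: any java.io.Reader works as input; the header row counts as a row.
(count-rows (java.io.StringReader. "a,b\n1,2\n3,4"))
```

The loop never builds a sequence of persistent vectors, which is where the speedup described above would come from.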

Development

Before running a REPL you must compile the Java files into target/classes. This directory will then be on your classpath.

scripts/compile

Tests can be run with scripts/run-tests, which will compile the Java files and then run the tests.

Lies, Damn Lies, and Benchmarks!

See the fast-json project. These times are for parsing a 100k JSON document using keywords for map keys (:key-fn keyword).

Intel JDK-8

method          performance (µs)
data.json       4275
jsonista        754
charred         638
charred-hamf    486

Intel JDK-19

method          performance (µs)
data.json       5608
jsonista        856
charred         673
charred-hamf    531

Mac M1 JDK-19

method          performance (µs)
data.json       3164
jsonista        285
charred         249
charred-hamf    227

License

MIT license.
