
Comments (3)

vtermanis commented on May 26, 2024

Initial solution - remove buffering completely

Overview

Read exactly the requested number of bytes from the file-like object's read() method. This results in significantly more calls to that method (e.g. every type marker is a separate one-byte read) and leads to considerable performance degradation.
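A minimal sketch of the unbuffered approach, for illustration only. The helper `read_exact()` and the `CountingReader` wrapper are hypothetical, not py-ubjson's code. The helper requests exactly the bytes needed and loops, because `read()` on a raw stream such as a socket may return fewer bytes than asked for. Decoding three one-byte type markers (`T`, `F`, `Z`: true, false, null) plus a 4-byte payload this way costs four separate `read()` calls.

```python
import io


class CountingReader:
    """Hypothetical wrapper that counts how often read() is called."""

    def __init__(self, fp):
        self._fp = fp
        self.calls = 0

    def read(self, n=-1):
        self.calls += 1
        return self._fp.read(n)


def read_exact(fp, n):
    """Read exactly n bytes from fp, or raise EOFError if the stream ends first.

    read() on e.g. a raw socket stream may legitimately return fewer bytes
    than requested, so keep asking until n bytes have been collected.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = fp.read(remaining)
        if not chunk:
            raise EOFError('expected %d more byte(s)' % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)


# Three one-byte type markers followed by a 4-byte payload: with no buffering,
# every marker costs its own read() call on the underlying stream.
stream = CountingReader(io.BytesIO(b'TFZ' + b'\x00\x01\x02\x03'))
markers = [read_exact(stream, 1) for _ in range(3)]
payload = read_exact(stream, 4)
print(markers, payload, stream.calls)
```

Nothing is ever read past what the decoder needs, which is exactly why this is safe for sockets and exactly why it is slow: per-call overhead dominates when most reads are a single byte.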

Potential alternative

If the file-like object is seekable() (e.g. a file on a local filesystem), enable buffering; otherwise (e.g. a socket), use no buffering.
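A sketch of why seekable() is the deciding factor, assuming the goal is to leave the stream positioned directly after the decoded value: reading ahead is only harmless if the unused read-ahead bytes can be given back with seek(), which a socket cannot do. `SeekableBufferedReader` and its `release()` method are hypothetical names, not py-ubjson's API.

```python
import io


class SeekableBufferedReader:
    """Hypothetical sketch: read ahead from a seekable stream, rewind afterwards."""

    def __init__(self, fp, chunk_size=8192):
        if not fp.seekable():
            raise ValueError('stream is not seekable; read it unbuffered instead')
        self._fp = fp
        self._chunk_size = chunk_size
        self._buf = b''
        self._pos = 0  # offset of the next unconsumed byte in self._buf

    def read(self, n):
        # Refill from the underlying stream only when the buffer runs short.
        while len(self._buf) - self._pos < n:
            chunk = self._fp.read(max(self._chunk_size, n))
            if not chunk:
                break  # EOF: return whatever is left
            self._buf = self._buf[self._pos:] + chunk
            self._pos = 0
        data = self._buf[self._pos:self._pos + n]
        self._pos += len(data)
        return data

    def release(self):
        # Seek the underlying stream back over bytes that were read but not
        # consumed, so it is positioned right after the last consumed byte.
        unused = len(self._buf) - self._pos
        if unused:
            self._fp.seek(-unused, io.SEEK_CUR)
        self._buf = b''
        self._pos = 0


# Usage: consume 3 bytes through a 4-byte read-ahead, then hand the stream back.
src = io.BytesIO(b'abcdefgh')
reader = SeekableBufferedReader(src, chunk_size=4)
first = reader.read(3)   # one read(4) on src, one byte left over in the buffer
reader.release()         # src is rewound by that one byte
rest = src.read()        # the caller sees everything after the consumed bytes
```

The design trades a final seek() for far fewer read() calls; for a non-seekable source the constructor refuses, and the caller would fall back to exact, unbuffered reads.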

General testing

  • full suite, including an added test for this issue
  • coverage_test.sh with compiled extension
  • pympler reference leak test (see bottom of test.py)

Performance

Medium-size varied-content decoding

Steps

python3 -mubjson fromjson test/data/CouchDB4k.compact.json /tmp/CouchDB4k.compact.ubjson
python -mtimeit -n 50000 \
 -s "
from io import BytesIO
from ubjson import load, __version__
print(__version__)
raw = BytesIO()
with open('/tmp/CouchDB4k.compact.ubjson', 'rb') as f:
    raw.write(f.read())
"\
 'raw.seek(0); load(raw)'

Python3 output

Before

0.13.0
0.13.0
0.13.0
50000 loops, best of 3: 23.4 usec per loop

After

0.14.0
0.14.0
0.14.0
50000 loops, best of 3: 73.8 usec per loop

Python2 output

Before

0.13.0
0.13.0
0.13.0
50000 loops, best of 3: 25.5 usec per loop

After

0.14.0
0.14.0
0.14.0
50000 loops, best of 3: 92.5 usec per loop

Large file decoding (62MB with small fields)

python3 -mtimeit -r1 -n1 -s "
from ubjson import load, __version__
print(__version__)
" "
with open('DEFRA.uk_air.ubj', 'rb') as f:
    load(f, intern_object_keys=True)
"

Python3 output

Before

0.13.0
1 loops, best of 1: 2.08 sec per loop

After

0.14.0
1 loops, best of 1: 5.53 sec per loop
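The slowdown of the unbuffered initial solution, computed from the timings reported above (about 3.2x and 3.6x for the CouchDB4k file on Python 3 and 2, about 2.7x for the large file):

```python
# Timings from the runs above: (before, after) for 0.13.0 vs 0.14.0.
# Units cancel in the ratio (usec per loop for CouchDB4k, sec for the DEFRA file).
TIMINGS = {
    'CouchDB4k, Python 3': (23.4, 73.8),
    'CouchDB4k, Python 2': (25.5, 92.5),
    'DEFRA 62MB, Python 3': (2.08, 5.53),
}


def slowdown(before, after):
    """How many times slower the 'after' run is than the 'before' run."""
    return after / before


for name, (before, after) in TIMINGS.items():
    print('%-22s %.2fx slower' % (name, slowdown(before, after)))
```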

from py-ubjson.

vtermanis commented on May 26, 2024

#11 addresses the performance concerns by supporting three ways of reading input:

  1. From a fixed, single-dimension byte sequence (as before)
  2. Buffered, from a seek()-able file-like object (as before)
  3. Unbuffered, from a file-like object (new)

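The choice between the three strategies might look roughly like the following sketch. The function name and the returned labels are hypothetical and only illustrate the decision order; treating objects that lack a seekable() method as unbuffered is an assumption, not something stated in #11.

```python
import io


def choose_read_strategy(source):
    """Hypothetical dispatcher for the three input strategies listed above."""
    if isinstance(source, (bytes, bytearray, memoryview)):
        # 1. Fixed byte sequence: decode directly from memory.
        return 'fixed'
    seekable = getattr(source, 'seekable', None)
    if callable(seekable) and seekable():
        # 2. Seekable file-like object: buffer, then rewind unused bytes.
        return 'buffered'
    # 3. Anything else (e.g. a socket file): read exactly what is needed.
    return 'unbuffered'


class ReadOnlyStream:
    """Minimal file-like object that exposes only read()."""

    def read(self, n=-1):
        return b''


print(choose_read_strategy(b'\x5a'))              # fixed
print(choose_read_strategy(io.BytesIO(b'\x5a')))  # buffered
print(choose_read_strategy(ReadOnlyStream()))     # unbuffered
```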
vtermanis commented on May 26, 2024

Fixed in 0.14.0.
