Comments (4)

itkach commented on August 29, 2024

(Commented by itkach on Feb 11, 2009 at BitBucket)

http://tukaani.org/xz has been suggested as a possible implementation,
although it doesn't seem to have Python bindings.
http://www.joachim-bauch.de/projects/python/pylzma/ looks more promising.
In any case, the benefits of using LZMA need to be explored further.

from tools.

itkach commented on August 29, 2024

(Commented by itkach on Feb 17, 2009 at BitBucket)

Initial evaluation didn't indicate any substantial improvement from using
LZMA compression. Compiled with LZMA, the Simple English wiki 20081126 dump is
55 Mb instead of 56 Mb, and the first volume of English Wikipedia is 2337 Mb
instead of 2384 Mb. In both cases the size is reduced by only ~2%. This is
with pylzma 0.3. Decompression is also only marginally faster than bz2: ~5% on
medium-size articles (~15 Kb).
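
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of measuring per-article compressed sizes. It is not the code used for the numbers above: it assumes the `bz2` and `lzma` modules bundled with current Python 3 rather than pylzma 0.3, the `compare_sizes` helper is hypothetical, and the sample texts are made-up stand-ins for real articles.

```python
import bz2
import lzma


def compare_sizes(articles):
    """Total raw, bz2 and lzma sizes with each article compressed on its own."""
    totals = {"raw": 0, "bz2": 0, "lzma": 0}
    for text in articles:
        data = text.encode("utf-8")
        totals["raw"] += len(data)
        # Each article is a separate stream, so every one pays the
        # container overhead of its format.
        totals["bz2"] += len(bz2.compress(data))
        totals["lzma"] += len(lzma.compress(data))
    return totals


# Made-up sample data, roughly "medium size" articles of about 15 Kb each.
sample = ["An example article body. " * 600, "Another article. " * 900]
totals = compare_sizes(sample)
for name, size in totals.items():
    print(name, size)
```

Highly repetitive sample text like this compresses far better than real articles, so only a run over an actual dump says anything about the ~2% figure.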


itkach commented on August 29, 2024

(Commented by anonymous on Mar 12, 2009 at BitBucket)

Aha, I'm disappointed. Are you using the default compression, or -9,
i.e. maximum compression? And do you also compress using Python, or is Python
used just to decompress in the reader?

Also, I have found another Python implementation, called pyliblzma, which
also seems to support the new xz format:

https://launchpad.net/pyliblzma


itkach commented on August 29, 2024

(Commented by itkach on Mar 16, 2009 at BitBucket)

pylzma was used for both compression and decompression, with default
compression parameters. I tried some variations, but the defaults seemed to
yield the best results.

I'll see if pyliblzma can do better. I wouldn't hold my breath though: each
article is compressed individually, so neither bzip2 nor LZMA achieves the
same compression ratios as it does on gigantic files. In fact, a significant
number of articles are just too short to benefit from any compression:
the compressed text plus compression format headers is bigger than the
original uncompressed text. LZMA compression not being part of the Python
standard library is also a significant obstacle: adopting it would mean
compiling and packaging it for Windows and Maemo, and possibly other platforms
where it's not easy for users to get or build binaries.
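
To illustrate the point about short articles, here is a rough sketch of storing whichever form of an article is smallest. It is an assumption-laden example, not the project's actual storage format: the one-byte tags and the `pack`/`unpack` helpers are hypothetical, and it uses the `bz2` and `lzma` modules bundled with current Python 3 rather than pylzma.

```python
import bz2
import lzma

# Hypothetical one-byte tags marking how a payload is stored;
# these are made up for illustration, not taken from any real file format.
RAW, BZ2, LZMA = b"0", b"1", b"2"


def pack(data: bytes) -> bytes:
    """Store data in whichever form is smallest, prefixed with its tag."""
    candidates = [
        (RAW, data),
        (BZ2, bz2.compress(data)),
        (LZMA, lzma.compress(data)),
    ]
    # min() returns the first of equal-sized candidates, so ties go to RAW.
    tag, payload = min(candidates, key=lambda c: len(c[1]))
    return tag + payload


def unpack(blob: bytes) -> bytes:
    """Reverse pack() by dispatching on the leading tag byte."""
    tag, payload = blob[:1], blob[1:]
    if tag == RAW:
        return payload
    if tag == BZ2:
        return bz2.decompress(payload)
    if tag == LZMA:
        return lzma.decompress(payload)
    raise ValueError("unknown storage tag: %r" % tag)


short_article = b"Short stub."
long_article = b"A longer article repeats many common words. " * 400

print(pack(short_article)[:1])  # tag chosen for the short text
print(pack(long_article)[:1])   # tag chosen for the long text
```

With a scheme like this, a very short article is stored uncompressed because both compressed forms come out larger once their format headers are counted, at the cost of one extra byte per article.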

