rbzip2's Introduction

RBzip2

RBzip2 is a gem providing several implementations of the bzip2 algorithm for compression and decompression. Currently, it includes an FFI-based implementation and a pure Ruby implementation that is slower but works on any Ruby VM. Additionally, there is a JRuby-specific implementation based on Commons Compress.

The pure Ruby implementation is based on the code of the Apache Commons Compress project and adds a straightforward, Ruby-like API. It has no external dependencies such as other gems or libraries. It will therefore run on any Ruby implementation and on every operating system those implementations support.

The FFI implementation uses libbz2 and is fast on platforms where both libbz2 and FFI are available. It is derived from this Gist by Brian Lopez.

The Java-based implementation can use the Commons Compress Java library if it is available in the classpath.

Features

  • Compression of raw data into bzip2 compressed IOs (like File or StringIO)
  • Decompression of bzip2 compressed IOs (like File or StringIO)

Usage

require 'rbzip2'

Compression

data = some_data
file = File.new 'somefile.bz2', 'wb'  # open the target file for writing
bz2  = RBzip2.default_adapter::Compressor.new file  # wrap the file into the compressor
bz2.write data                      # write the raw data to the compressor
bz2.close                           # finish compression (important!)

Decompression

file = File.new 'somefile.bz2', 'rb'  # open a compressed file
bz2  = RBzip2.default_adapter::Decompressor.new file  # wrap the file into the decompressor
data = bz2.read                       # read data into a string
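
Because the compressor has to be closed to finish the stream, it can help to wrap both steps in small helpers. The following is only a sketch: `bzip2_write` and `bzip2_read` are hypothetical names that are not part of RBzip2, and the `adapter` argument is assumed to be whatever `RBzip2.default_adapter` returns.

```ruby
require 'stringio'

# Hypothetical helper (not part of rbzip2): compresses +data+ into +io+ and
# always closes the compressor, even if the write raises.
# +adapter+ is assumed to be the module returned by RBzip2.default_adapter.
def bzip2_write(io, data, adapter)
  bz2 = adapter::Compressor.new(io)
  begin
    bz2.write(data)
  ensure
    bz2.close # finish compression (important!)
  end
  io
end

# Hypothetical helper: returns everything the decompressor reads from +io+.
def bzip2_read(io, adapter)
  adapter::Decompressor.new(io).read
end

# Intended use with the gem installed (left commented out so the sketch
# loads without it):
#   require 'rbzip2'
#   adapter = RBzip2.default_adapter
#   File.open('somefile.bz2', 'wb') { |f| bzip2_write(f, 'some data', adapter) }
#   File.open('somefile.bz2', 'rb') { |f| puts bzip2_read(f, adapter) }
```

Passing the adapter in explicitly keeps the helpers independent of which implementation (FFI, Java or pure Ruby) is selected.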

Future plans

  • Simple decompression of strings
  • Simple creation of compressed files
  • Two-way compressed IO that will (de)compress as you read/write

Installation

To install RBzip2 as a Ruby gem use the following command:

$ gem install rbzip2

To use it as a dependency managed by Bundler add the following to your Gemfile:

gem 'rbzip2'

If you want to use the FFI implementation on any non-JRuby VM, be sure to also install the ffi gem.

Performance

The bzip2-ruby gem is a Ruby binding to libbz2 and offers the best performance, but it is only available for MRI < 2.0.0 and Rubinius.

The FFI implementation binds to libbz2 as well and has almost the same performance as bzip2-ruby.

The Java implementation uses a native Java library and is slower by a factor of about 2 when compressing and about 10 when decompressing.

The pure Ruby implementation of RBzip2 is inherently slower than bzip2-ruby. Currently, it is a plain port of Apache Commons' Java code to Ruby, and no effort has been made to optimize it. As a result, it is slower by a factor of about 130 when compressing and about 100 when decompressing (on Ruby 1.9.3). Ruby 1.8.7 is even slower.

License

This code is free software; you can redistribute it and/or modify it under the terms of the new BSD License. A copy of this license can be found in the included LICENSE file.

Credits

  • Sebastian Staudt -- koraktor(at)gmail.com
  • Brian Lopez -- seniorlopez(at)gmail.com

rbzip2's People

Contributors

dmorrill10, koraktor, rykov

rbzip2's Issues

New version release?

The most recently released version of this gem on Rubygems is 0.2.0. I'd like to use this gem as a dependency in BrickAndMortar but I can't get 0.2.0 to work. The current commit, f8ab1ea, seems to work just fine for me (decompression only).

Is someone working on making a new release soon, or could I help to accomplish this?

If you're concerned about the gem not being stable enough yet, how about a release candidate version like 0.3.0-rc1?

FFI backend doesn't work on Windows

BZip2's public API functions use the stdcall calling convention on Windows, but rbzip2 tries to load them using the default cdecl convention. This leads to the functions not being found by FFI:

C:/Ruby225/lib/ruby/gems/2.2.0/gems/ffi-1.9.18-x86-mingw32/lib/ffi/library.rb:275:in `attach_function': Function 'BZ2_bzRead' not found in [bz2] (FFI::NotFoundError)
        from C:/Ruby225/lib/ruby/gems/2.2.0/gems/rbzip2-0.3.0/lib/rbzip2/ffi/decompressor.rb:20:in `<class:Decompressor>'
        from C:/Ruby225/lib/ruby/gems/2.2.0/gems/rbzip2-0.3.0/lib/rbzip2/ffi/decompressor.rb:7:in `<top (required)>'
        from C:/my_script.rb:45:in `<main>'

Very bad performance of JRuby 1.7.4

I am trying to use this gem under JRuby 1.7.4 to decompress a bzip2-compressed file.

The compressed file is 422 KB and the decompressed file is 7.9 MB. Decompressing it with bzip2 on Linux takes less than 1 s.

When I use the gem, the RBzip2::Decompressor.new file call takes around 30 s and the read call takes more than 30 minutes (yes, minutes). During decompression, the JRuby process uses 100% of one core on my PC. I am running it on Fedora 16 with Java 1.7.0.

gets method not implemented

I am switching from MRI to JRuby and as such, I was replacing bzip2-ruby with rbzip2. Unfortunately this didn't work out because I am passing the resulting IO object to fastest-csv, which uses gets. bzip2-ruby implements this but rbzip2 doesn't. It's possibly less than trivial to implement and I may not have time right now.
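
One possible workaround, sketched under the assumption that the decompressor's `read` returns the whole decompressed data as a string (as the usage section describes): wrap that string in a `StringIO`, which does implement `gets`. The helper name `line_io` is hypothetical, and the approach holds the entire decompressed content in memory, so it only suits inputs that fit there.

```ruby
require 'stringio'

# Hypothetical helper (not part of rbzip2): drains +reader+ via #read and
# returns a StringIO over the result, which supports gets and each_line.
def line_io(reader)
  StringIO.new(reader.read.to_s)
end

# A plain StringIO stands in for the decompressor here; any object with a
# no-argument #read is handled the same way.
io    = line_io(StringIO.new("a,b\n1,2\n"))
first = io.gets # => "a,b\n"
```

With the gem, the argument would be the decompressor itself, for example `line_io(RBzip2.default_adapter::Decompressor.new(file))`.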

RBZip2 never finishes on large file

Even for a moderately large file, RBzip2::Decompressor.new never returns, because it seems to sit endlessly in get_and_move_to_front_decode.

Code snippet that reproduces the problem (on linux)

#!/usr/bin/env ruby

`dd if=/dev/urandom of=output.dat bs=1M count=10`
`tar cjvf bigTar.tar.bz2 output.dat`
require 'rbzip2'
dc = RBzip2::Decompressor.new(File.new('bigTar.tar.bz2'))

Calling #write more than once is ignored

When the write method is called more than once, the data from every call after the first is ignored:

def bz2_read_file(file_name)
  File.open(file_name) do |file|
    io   = RBzip2.default_adapter::Decompressor.new(file)
    data = io.read
    io.close
    data
  end
end

File.open("test3.bz2", "wb") do |file|
  writer  = RBzip2.default_adapter::Compressor.new(file)
  writer.write("line1\n")
  writer.write("line2\n")
  writer.write("line3\n")
  writer.close
end

bz2_read_file("test3.bz2")

Only the first line is returned from the output file:

"line1\n"
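
Until the behaviour is fixed, a possible workaround is to collect the chunks and hand them to the compressor in a single `write` call. This is only a sketch that assumes one `write` followed by `close` works as in the usage section; `write_once` is a hypothetical helper, not part of RBzip2.

```ruby
# Hypothetical helper (not part of rbzip2): joins all chunks, passes them to
# the writer in a single #write call, and closes the writer afterwards.
def write_once(writer, chunks)
  writer.write(chunks.join)
ensure
  writer.close
end

# Intended use with the gem installed (left commented out so the sketch
# loads without it):
#   File.open('test3.bz2', 'wb') do |file|
#     writer = RBzip2.default_adapter::Compressor.new(file)
#     write_once(writer, ["line1\n", "line2\n", "line3\n"])
#   end
```

The trade-off is that all data is buffered in memory before compression starts.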
