Comments (6)
This comment says there's documentation somewhere on how to write a simple inflate/deflate function: #25 (comment)
I think just linking to where that is from this section of the readme would help a lot: https://github.com/mirage/decompress#higher-interface
I can understand the difficulty of making tradeoffs here, but I think it's generally fine to just pick something "good enough" for a high-level interface and explain in the documentation/comments why people might want to use a lower-level version. I can understand leaving that for a higher-level library, though, although you might want to mention in the docs that people looking for a higher-level interface could try [insert library here].
Note: Based on what you said above, I ended up making my high-level interface look like:
```ocaml
let inflate_bigstring_to_bigbuffer
    ?(buffer_size_bytes = 10000)
    ?(expected_compression_ratio = 3)
    input
  =
  let buffer = Bigstring.create buffer_size_bytes
  and output = Bigbuffer.create (Bigstring.length input * expected_compression_ratio) in
  let refill s = Bigstring.length s
  and flush s n = Bigbuffer.add_bigstring output (Bigstring.sub_shared s ~pos:0 ~len:n) in
  Gz.Higher.uncompress ~refill ~flush input buffer |> Result.map ~f:(fun _ -> output)
;;

let inflate_bigstring ?buffer_size_bytes ?expected_compression_ratio input =
  inflate_bigstring_to_bigbuffer ?buffer_size_bytes ?expected_compression_ratio input
  |> Result.map ~f:Bigbuffer.big_contents
;;

let inflate_string ?buffer_size_bytes ?expected_compression_ratio input =
  Bigstring.of_string input
  |> inflate_bigstring_to_bigbuffer ?buffer_size_bytes ?expected_compression_ratio
  |> Result.map ~f:Bigbuffer.contents
;;
```
I think the biggest downside of this is that it uses Bigbuffer from Core_kernel, but that lets it be fairly efficient except for a copy at the end to get the contents.
from decompress.
Not sure if you want to go there, but it would also be useful to have an interface for inflating an async input, like:

```ocaml
val inflate_pipe : string Async.Pipe.t -> string Async.Pipe.t
```
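As an illustration of what such a wrapper might look like (a sketch only, and not truly streaming: it buffers the entire pipe before inflating; it reuses the `inflate_string` defined above and assumes Async's `Pipe.create_reader`/`Pipe.to_list` API, with `Pipe.Reader.t` in place of the abbreviated `Pipe.t` in the signature):

```ocaml
(* Sketch: wrap inflate_string (defined above) around an Async pipe.
   Not streaming -- it collects the whole pipe before inflating, so a
   truly incremental version would need decompress's low-level API. *)
let inflate_pipe (input : string Async.Pipe.Reader.t) : string Async.Pipe.Reader.t =
  Async.Pipe.create_reader ~close_on_exception:true (fun writer ->
    let open Async.Deferred.Let_syntax in
    (* Drain the input pipe into one string. *)
    let%bind chunks = Async.Pipe.to_list input in
    match inflate_string (String.concat chunks) with
    | Ok s -> Async.Pipe.write writer s
    | Error _ -> Async.Deferred.return ())
```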
Hi, thanks for trying decompress!
This question about an even simpler interface is well known (see #25). I agree with you that the documentation is not the best, and we definitely need to show a simple example like `to_string`/`of_string`, as you said.
However, I would not like to provide such a function in the package, because of the responsibilities it implies. The memory/ratio/speed trade-off exists for any kind of input, and for me it's important to let the end user control such details.
- For example, an `inflate_string : string -> string` needs to arbitrarily choose the length of the output buffer, but we don't have any clue about that length. We could say that the usual compression ratio is 1/3 and allocate a buffer such as `Bigstringaf.create (String.length input * 3)`, but that's not always true: sometimes the needed output buffer is smaller than the input, and sometimes it is much larger (a ratio of 1/5).
- Such a situation can be handled with a `Queue`, as you did, or a simple `Buffer`. We could use a rope too. In any case, several choices appear and none of them is perfect. A queue fits best for a fully unknowable input; a buffer can take advantage of a hypothetical length; a rope has the same advantage as the queue. However, with most of them we lose the advantage of the underlying `Bigarray` and must copy between `Bigarray` and `string` (unless the data structure is abstract enough to manipulate a `Bigarray` directly).
- Again, the question of `string` vs. `Bigarray` is not clear-cut. We know that in the context of inflation/deflation it's fair to use `Bigarray`, but not necessarily for upper dependencies. In some contexts, for small objects, it's better to manipulate `string` and reuse an underlying `Bigarray` (due to the cost of allocation). In other contexts, it's better to keep the `Bigarray`, to be able to do further processing on it (such as computing a hash) and to take advantage of its non-relocatable nature.
- Finally, maybe you want to print out the output buffer. In that context, it's definitely better to allocate a large output buffer (independently of the length of the input) and print it out via a syscall.
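To make the `Buffer` option from this list concrete, here is a sketch of an `inflate_string` that never guesses the output length, at the cost of the `Bigarray`-to-`string` copy described above. It assumes the same `Gz.Higher.uncompress` usage as the snippet earlier in this thread, with `Bigstringaf` instead of Core's `Bigstring`:

```ocaml
(* Sketch only: a Buffer-backed inflate that avoids guessing a
   compression ratio, mirroring the Gz.Higher.uncompress usage above. *)
let inflate_string_with_buffer input =
  let buffer = Bigstringaf.create 0x1000 in
  (* Buffer grows on demand, so no a-priori output size is needed. *)
  let output = Buffer.create (String.length input) in
  let refill s = Bigstringaf.length s in
  let flush s n =
    (* This substring is the Bigarray/string copy the list above
       warns about: one copy per flushed chunk. *)
    Buffer.add_string output (Bigstringaf.substring s ~off:0 ~len:n)
  in
  let i = Bigstringaf.of_string input ~off:0 ~len:(String.length input) in
  Gz.Higher.uncompress ~refill ~flush i buffer
  |> Result.map (fun _metadata -> Buffer.contents output)
```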
So several questions arise, and no single answer fits every use case. If we provide such a function, we take responsibility for its implementation, and, as I explained, that implementation requires several choices. We know that these choices can be really bad for some use cases, so to avoid such problems we prefer to respect, at least, the interface provided by `camlzip`.
The current `Higher` module gives full control over these details, and even if most of them could be replaced by default values, we don't want to push the user toward a naive use of decompress (and this is where we should provide better documentation!). A one-shot function implies several hidden choices; we don't want to hide the magic too much, and we don't want to take responsibility for them.
Another aspect is time: decompress is a full re-implementation of `zlib` (and `gzip`), so we are more focused on algorithms/speed/memory consumption than on a simpler interface. For that purpose, you can take a look at `ezgzip`, which experiments with decompress. I think it's a better place if you want a simpler API.
And about `async`: decompress is a bedrock project for some other MirageOS-compatible projects, so we have strict constraints on dependencies (no `async`, no `lwt`, no `unix`).
I will try to improve the documentation next week, and if you want to participate, I will be happy to incorporate your contributions 👍.
> `and flush s n = Bigbuffer.add_bigstring output (Bigstring.sub_shared s ~pos:0 ~len:n) in`
I'm not sure about `sub_shared`. Concretely, `s` is `buffer`. decompress never allocates temporary buffers, so you should copy the bytes from `s`/`buffer` into `output`. I don't know the semantics of `Bigbuffer.add_bigstring` (whether it copies or not), but if it reuses `s`, the output will be wrong.
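If `Bigbuffer.add_bigstring` did alias its input, a defensive `flush` could force the copy explicitly before appending (a sketch, using Core's `Bigstring.to_string`):

```ocaml
(* Defensive variant of flush: materialize the first n bytes of the
   reused buffer s as a fresh string, then append that copy. *)
let flush s n = Bigbuffer.add_string output (Bigstring.to_string s ~pos:0 ~len:n)
```

This costs one extra intermediate `string` allocation per chunk, which is exactly what `sub_shared` + a copying `add_bigstring` avoids.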
Since then, I have added documentation, added examples, and clarified some aspects of decompress; you can see it here: #116
`Bigbuffer.add_bigstring` copies the input, so I think what I was doing makes sense (I'm letting that function copy the substring without doing an additional copy myself).
Thanks for adding more documentation to this!
Thanks for the feedback 👍.