Comments (15)
Could you give a little more detail on this issue as well? Some code examples or file samples would go a long way for me :)
from flate2-rs.
The following file is what I'm using to parse a git packfile (Any old git packfile should do)
use std::old_io::ByRefReader;
use std::num::FromPrimitive;
use flate2::reader::ZlibDecoder;
#[derive(FromPrimitive, Copy)]
enum GitObjectType {
Commit = 1,
Tree = 2,
Blob = 3,
Tag = 4,
OfsDelta = 6,
RefDelta = 7,
}
pub fn build_index(raw: Vec<u8>) {
let mut data = &*raw;
if data.read_exact(4).unwrap() != b"PACK" {
panic!("Not a valid packfile");
}
let version = data.read_be_u32().unwrap();
let objects = data.read_be_u32().unwrap();
println!("Packfile version: {}", version);
println!("Objects in packfile: {}", objects);
while let Ok(mut c) = data.read_u8() {
let obj_type: GitObjectType = FromPrimitive::from_u8((c >> 4) & 7).unwrap();
let mut obj_size = (c & 0xf) as u64;
let mut shift = 4;
while c & 0x80 != 0 {
c = data.read_u8().unwrap();
obj_size += (c as u64 & 0x7f) << shift;
shift += 7;
}
print!("Object type {} size {}", obj_type as u8, obj_size);
println!(" data:\n{}",
String::from_utf8(
ZlibDecoder::new(ByRefReader::by_ref(&mut data)).read_exact(obj_size as usize).unwrap()
).unwrap()
);
}
}
Note that despite requesting read_exact
, because the default buffer size is 128 * 1024
it will eat that many bytes from the input stream whether you told it to or not.
from flate2-rs.
Unfortunately I don't think there's really much that can be done about this. To read only the precisely correct amount of data from the underlying stream would either require feedback from miniz.c (not implemented) or reading only one byte at a time (not exactly efficient).
I think your best option here would be to use the new take
adapter (or the old LimitReader
) with an explicit size if you know it ahead of time.
from flate2-rs.
Well that's the problem - with git pack files you only know the decompressed size ahead of time, not the input size
from flate2-rs.
In this case you would want to structure your code like:
let data = {
let mut s = String::new();
ZlibDecoder::new(data.by_ref().take(obj_size)).read_to_string(&mut s).unwrap();
s
};
The call to take
will ensure that the ZlibDecoder
can't read more than obj_size
bytes.
from flate2-rs.
Again, I know the output size not the input size.
That example will take the output size (Too many) and decompress, after which there's no way to determine the offset to continue parsing the packfile from. (This is less an issue with flate2 than with the git packfile format)
feedback from miniz.c (not implemented) or reading only one byte at a time (not exactly efficient).
I'd like to request that feedback as a feature
from flate2-rs.
Ah sorry, I think I have indeed misunderstood.
from flate2-rs.
Oh one thing I just remembered is that you may want to try to use the flate2::write
interfaces instead of the flate2::read
interfaces. That should enable you to call write
and the stream should not actually consume any bytes beyond the end (telling you precisely where the end is).
from flate2-rs.
The main problem is that there is no feedback on how many bytes the decoder really needed from the underlying stream. So there is no possibility to rewind the reader if needed.
Ideally one would have an interface like this, then one could manage the buffering at the consumer side:
impl Decoder {
// option 1
fn decode<'a>(&'a mut self, input: &[u8]) -> io::Result<(usize, &'a [u8])> {
...
Ok(consumed, &self.internal_buffer)
}
// option 2
fn decode(&mut self, input: &[u8], output: &mut &mut [u8]) -> io::Result<(usize)> {
...
Ok(consumed)
}
}
Especially the second corresponds to the API already exposed by miniz (next_in, next_out, etc…).
from flate2-rs.
I tried to experiement myself, the second variant (basic C pattern) is not expressible in safe Rust because you end up in some unbreakable borrow-cycle.
fn inflate<'a>(&mut self, input: &[u8], output: &'a mut [u8], flush: Flush) -> io::Result<(usize, &'a mut [u8])> ;
will do it.
from flate2-rs.
@alexcrichton are you interested in including such an interface into flate2? If so I can clean up a little bit and file a PR for this: https://github.com/nwin/png/blob/master/src/deflate.rs
from flate2-rs.
Yeah I'd be totally fine exposing the raw miniz interface which doesn't go through Read
or Write
at all, it's actually basically already done! I'm not super happy with the API (which is why it's not public yet), but I've taken a similar route in the bzip-rs crate.
from flate2-rs.
My concern with your interfaces is, that you have do to all the bookkeeping about how much has been written yourself (by monitoring the change in total_in/out
after every function call)
I would either return this information
fn decompress(&mut self, input: &[u8], output: &mut [u8], flush: Flush) -> Result<(usize, usize)>;
update the slices
fn decompress<'a, 'b>(&mut self, input: &'a mut &'a[u8], output: &'b mut &'b mut [u8], flush: Flush) -> Result<()>;
or use a hybrid approach
fn decompress<'a>(&mut self, input: &[u8], output: &'a mut [u8], flush: Flush) -> Result<(usize, &'a mut [u8])>;
Probably the second approach is the best, but I’m afraid that it will cause most troubles with the borrow checker. The tuple in the first one is a bit ambiguous.
from flate2-rs.
I'd probably be in favor of just exposing the raw underlying interface as I also found it difficult to write a nice safe API on top, but I will admit I haven't given it too inordinate amounts of though :)
from flate2-rs.
Ok, I've finally gotten around to adding raw bindings to the in-memory streams which should provide a much greater level of control over things like buffering, and feel free to let me know about any API pain points!
from flate2-rs.
Related Issues (20)
- Cannot compile when `default-features` are disabled (`false`) HOT 1
- can't find crate for `flate2`
- support 'deflate64'? HOT 6
- `GzDecoder` eager reading in the constructor blocks IO HOT 9
- Error on compiling flate2 on rust 1.57.0 HOT 2
- flate2::bufread::GzDecoder doesn't impl BufRead? HOT 3
- unsafe review: Potential (not actual) dangling pointers after inflate/deflate HOT 2
- total_in(&self) / total_out(&self) implementation for GzDecoder / GzEncoder / MultiGzDecoder HOT 2
- Issues with newly created file in read-write mode HOT 7
- Implement BufRead/Write for en/decoders alongside Read/Write
- rapidgzip
- Zlib succes while miniz_oxide fails HOT 5
- Testing validity of the data without the actual decompression
- Tree borrows violation occurs when using zlib backend HOT 5
- Some compressed files can only read a portion of the lines using GzDecoder. HOT 3
- question: Slowdown after upgrading from 1.0.26 to 1.0.28 HOT 8
- Using MultiGzDecoder for file with garbage after gzip data HOT 3
- Decoding a zip file returns the Error "corrupt deflate stream" HOT 2
- why GzDecoder can't read stream correct HOT 1
- Continue reading a stream after ZlibDecoder streams finishes HOT 8
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from flate2-rs.