Comments (10)
I'll take a look, thanks for the report.
from rust-base64.
I see. It's doable to add an error, certainly, but in your case you don't even need that to do it cheaply. This problem can only happen in an incomplete block of < 4 chars, which can only happen at the end, so I would just decode/encode/compare the last 1-3 chars which should be vanishingly small overhead assuming you're doing anything other than base64ing. :)
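A minimal sketch of that tail-only round trip, assuming the standard alphabet and `=` padding. It is self-contained on purpose and does not use the crate; `value_of` and `tail_round_trips` are hypothetical names, and the full quads before the tail are not validated here.

```rust
// Standard base64 alphabet (RFC 4648, table 1).
const STD: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Look up the 6-bit value of a symbol; None if it is not in the alphabet.
fn value_of(symbol: u8) -> Option<u8> {
    STD.iter().position(|&c| c == symbol).map(|i| i as u8)
}

/// Decode only the incomplete trailing block (2 or 3 symbols), re-encode it,
/// and compare. Returns None when the tail is malformed (invalid symbol, or
/// a single leftover symbol), Some(true) when the tail is canonical.
fn tail_round_trips(encoded: &str) -> Option<bool> {
    let symbols = encoded.trim_end_matches('=').as_bytes();
    let tail = &symbols[symbols.len() - symbols.len() % 4..];
    let values: Vec<u8> = tail.iter().map(|&c| value_of(c)).collect::<Option<_>>()?;
    match values.as_slice() {
        // No incomplete block: nothing can be non-canonical.
        &[] => Some(true),
        &[a, b] => {
            // 12 bits in, 1 byte out; the low 4 bits of `b` are dropped.
            let byte = (a << 2) | (b >> 4);
            let reencoded = [byte >> 2, (byte & 0b11) << 4];
            Some(reencoded == [a, b])
        }
        &[a, b, c] => {
            // 18 bits in, 2 bytes out; the low 2 bits of `c` are dropped.
            let b0 = (a << 2) | (b >> 4);
            let b1 = (b << 4) | (c >> 2);
            let reencoded = [b0 >> 2, ((b0 & 0b11) << 4) | (b1 >> 4), (b1 & 0b1111) << 2];
            Some(reencoded == [a, b, c])
        }
        // A single leftover symbol carries only 6 bits and cannot occur.
        _ => None,
    }
}
```

Only the last 2 or 3 symbols are touched, so the cost does not grow with the input length.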
I don't mind the breaking change. I don't think there is much of this data out there, and if there is, people should probably know. People can stay on old versions if they don't like new stuff; the old ones aren't broken after all...
I'm happy to implement it. I should be able to get to it within the next week or so since it's a small feature, but if you'd like to tackle it, that's fine too.
I see your point about configurability, but it might well be the case that basically nobody has this invalid base64 in the wild. I'd be in favor of using strict validation always, with the possible future option to add configurability if people then emerge from the woodwork complaining that they need to extract data from their legacy invalid-but-decodable base64. After all, we don't lose anything by not implementing that config flag yet.
Given the encoded input length of 43 base64 characters + 1 padding:
- 10x base64 quads should decode to 10x decoded byte triples, so the complete quads result in 30 bytes of decoded output
- 3 trailing base64 chars decode to 2 bytes
- 32 byte length is correct
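The length arithmetic in the list above can be written down as a small helper. This is only an illustrative sketch: `decoded_len_from_symbols` is a hypothetical name, not a function from the crate, and it takes the number of symbols after padding has been stripped.

```rust
/// Number of bytes that `symbol_count` base64 symbols (padding excluded)
/// decode to, or None if that count cannot occur in valid base64.
fn decoded_len_from_symbols(symbol_count: usize) -> Option<usize> {
    // Every complete quad of 4 symbols (24 bits) yields 3 bytes.
    let full_quads = symbol_count / 4;
    let tail_bytes = match symbol_count % 4 {
        0 => 0,
        2 => 1, // 12 bits: 1 byte plus 4 unused bits
        3 => 2, // 18 bits: 2 bytes plus 2 unused bits
        _ => return None, // 1 symbol is only 6 bits, less than one byte
    };
    Some(full_quads * 3 + tail_bytes)
}
```

For the 43 symbols in question this gives 10 * 3 + 2 = 32 bytes.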
The last 3 input chars: "iYW", aka 0x69 0x59 0x57. Each input char provides 6 bits (hence 4 chars per 3 bytes: 4 * 6 = 3 * 8). Looking only at the trailing 3 chars, that provides 18 bits -- 2 more bits than are actually needed by the resulting 2 bytes = 16 bits. Or, in pictorial form (top line is bits from chars, bottom line is resulting byte layout):
111111222222333333
<byte 1><byte 2>
Note that the last 2 bits contributed by char 3 are ignored.
The last char 'W' maps to a value of 0x16, or 0b00010110. One of the last 2 bits (the ignored bits) is nonzero. The encoded values happen to be decoded okay because we ignore those bits, but it's certainly not canonical. For chars that decode to 0b000101xx, 'U' would be the natural choice (trailing bits 00), but V (01), W (10), or X (11) would technically "work".
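To make the bit check concrete, here is a small sketch for the standard alphabet. `sextet` and `has_zero_pad_bits` are hypothetical helper names used for illustration, not part of the crate.

```rust
/// Map a standard-alphabet symbol to its 6-bit value.
fn sextet(symbol: u8) -> Option<u8> {
    match symbol {
        b'A'..=b'Z' => Some(symbol - b'A'),
        b'a'..=b'z' => Some(symbol - b'a' + 26),
        b'0'..=b'9' => Some(symbol - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Check whether the ignored low bits of the last symbol in an incomplete
/// block are zero. `tail_len` is the number of symbols in that block
/// (2 or 3); anything else, or an invalid symbol, gives None.
fn has_zero_pad_bits(last_symbol: u8, tail_len: usize) -> Option<bool> {
    let value = sextet(last_symbol)?;
    let ignored_bits_mask = match tail_len {
        2 => 0b00_1111, // 12 bits carry 1 byte: low 4 bits are ignored
        3 => 0b00_0011, // 18 bits carry 2 bytes: low 2 bits are ignored
        _ => return None,
    };
    Some((value & ignored_bits_mask) == 0)
}
```

With a 3-symbol tail, 'U' passes and 'V', 'W', and 'X' do not, matching the reasoning above.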
The RFC covers this issue, and states that:
For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used. These pad bits MUST be set to zero by conforming encoders, which is described in the descriptions on padding below.
And later on:
When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups.
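As a sketch of what a conforming encoder does with the final 1 or 2 bytes (standard alphabet, `=` padding), the zero bits fall out of the shifts. `encode_tail` is a hypothetical name for illustration only.

```rust
// Standard base64 alphabet (RFC 4648, table 1).
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode a final group of 1 or 2 bytes. The left shifts fill the unused
/// low bits of the last symbol with zeros, as the RFC requires.
fn encode_tail(tail: &[u8]) -> Option<String> {
    let sextets: Vec<u8> = match tail {
        // 8 bits: 6 + 2, then 4 zero bits added on the right.
        &[a] => vec![a >> 2, (a & 0b11) << 4],
        // 16 bits: 6 + 6 + 4, then 2 zero bits added on the right.
        &[a, b] => vec![a >> 2, ((a & 0b11) << 4) | (b >> 4), (b & 0b1111) << 2],
        // Empty or complete 3-byte groups are not a "tail".
        _ => return None,
    };
    let mut out: String = sextets.iter().map(|&s| ALPHABET[s as usize] as char).collect();
    // Pad the output to a full quad.
    while out.len() < 4 {
        out.push('=');
    }
    Some(out)
}
```

The two bytes that "iYW" decodes to are 0x89 0x85, and encoding them this way ends in 'U' rather than 'W'.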
So, this looks like base64 encoded by a broken encoder, and that online decoder is clearly incorrect since it's producing a multiple of 3 bytes from input that has padding!
I'm kind of curious; where did you get that base64?
"I'm kind of curious; where did you get that base64?"
From a fuzzer :P
I need to reliably detect invalid/non-canonical base64 input to a system. Would you be open to adding a mode that emits an error in these cases? Otherwise, my best bet seems to be to decode, re-encode and check for equality of reencoding and original encoding. But I suppose that is much less efficient (even when only checking the last byte(s)) than directly detecting and reporting this in the decoding stage.
I think it still makes sense to include this in the library. I'll need this in more than one place, so I'd have to fork/wrap the library (or create a really, really small crate). But the best place for emitting an error on invalid input is probably the library that decodes base64 and errors on invalid input - i.e. this crate.
Would you accept a pull-request that adds a boolean for rejecting noncanonical input to Config (but setting it to false for all provided configs to keep backwards-compatibility)?
I'll have to think about it a bit from an API perspective. While I think it's reasonable to barf on this type of invalid input, I'm not sure how configurability would work for that -- the enum variant for it would have to exist regardless, so people would have to handle it from a type system perspective whether or not there was a boolean controlling whether it was ever returned. So, at the moment, I think it makes more sense to just always return that error. More config = more to think about = more complexity...
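To illustrate the type-system point with a purely hypothetical sketch (the enum and variant names below are invented for this example and are not the crate's actual error type): once the variant exists, every exhaustive `match` has to cover it, whether or not a flag ever lets it be returned.

```rust
/// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq)]
enum SketchDecodeError {
    /// A byte that is not in the alphabet.
    InvalidSymbol { offset: usize, symbol: u8 },
    /// A last symbol whose ignored trailing bits are nonzero.
    NonCanonicalTail { offset: usize, symbol: u8 },
}

/// Callers that match exhaustively must handle both variants, regardless
/// of any runtime setting controlling whether the second one is produced.
fn describe(err: &SketchDecodeError) -> String {
    match err {
        SketchDecodeError::InvalidSymbol { offset, symbol } => {
            format!("invalid symbol 0x{:02x} at offset {}", symbol, offset)
        }
        SketchDecodeError::NonCanonicalTail { offset, symbol } => {
            format!("non-canonical trailing bits in symbol 0x{:02x} at offset {}", symbol, offset)
        }
    }
}
```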
But won't that be a silent breaking change? Updating the dependency could result in previously-accepted data being rejected. So there should probably be a major version bump in any case.
Do you want to implement this, or should I? I'll need longer to get into the code base, but it's my feature request so that's fine.
As for the boolean flag: Since the spec explicitly talks about both behaviors and that implementations can choose to adhere to either one, I'd lean towards making it configurable. I'm personally fine with canonicity being enforced in all cases, but at some point someone will need to implement a system that has to accept noncanonical encodings for legacy reasons...
Ok, I looked at the code and I'm afraid of all the optimizations; I'd probably break something. So it would be great if you could implement that.
Oh, and thank you for putting the time into these detailed responses 💚
Yeah it's mildly terrifying in the decode pathway ;)