Comments (10)

marshallpierce commented on May 30, 2024

I'll take a look, thanks for the report.

from rust-base64.

marshallpierce commented on May 30, 2024

I see. It's doable to add an error, certainly, but in your case you don't even need that to do it cheaply. This problem can only happen in an incomplete block of < 4 chars, which can only happen at the end, so I would just decode/encode/compare the last 1-3 chars which should be vanishingly small overhead assuming you're doing anything other than base64ing. :)
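A minimal sketch of that tail check, using only the standard library and assuming the standard alphabet. The helper names `sextet` and `tail_is_canonical` are made up for illustration and are not part of this crate. It decodes only the trailing partial block, re-encodes it with zeroed spare bits, and compares the result with the original symbols:

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Map a base64 symbol to its 6-bit value (standard alphabet only).
fn sextet(symbol: u8) -> Option<u8> {
    ALPHABET.iter().position(|&c| c == symbol).map(|i| i as u8)
}

/// Some(true) if the trailing partial block survives a decode/re-encode
/// round trip unchanged, Some(false) if it does not, and None if the tail
/// cannot be decoded at all. Padding is simply stripped, not validated.
fn tail_is_canonical(encoded: &str) -> Option<bool> {
    let unpadded = encoded.trim_end_matches('=').as_bytes();
    let tail_start = unpadded.len() - unpadded.len() % 4;
    let tail = &unpadded[tail_start..];
    match tail.len() {
        0 => return Some(true), // only complete quads, nothing to check
        1 => return None,       // a single symbol cannot encode a whole byte
        _ => {}
    }
    // Decode: pack the 6-bit values, then drop the bits that do not fit
    // into whole bytes.
    let mut bits: u32 = 0;
    for &symbol in tail {
        bits = (bits << 6) | u32::from(sextet(symbol)?);
    }
    let n_bytes = tail.len() - 1; // 2 symbols -> 1 byte, 3 symbols -> 2 bytes
    let spare = tail.len() * 6 - n_bytes * 8; // 4 or 2 ignored bits
    let decoded = bits >> spare;
    // Re-encode: shift back so the spare bits are zero, then map to symbols.
    let reencoded_bits = decoded << spare;
    let reencoded: Vec<u8> = (0..tail.len())
        .rev()
        .map(|i| ALPHABET[((reencoded_bits >> (6 * i)) & 0x3f) as usize])
        .collect();
    Some(reencoded == tail)
}
```

Called on just the last few characters of the input, this touches at most three symbols no matter how long the input is.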

marshallpierce commented on May 30, 2024

I don't mind the breaking change. I don't think there is much of this data out there, and if there is, people should probably know. People can stay on old versions if they don't like new stuff; the old ones aren't broken after all...

I'm happy to implement it. I should be able to get to it within the next week or so since it's a small feature, but if you'd like to tackle it, that's fine too.

I see your point about configurability, but it might well be the case that basically nobody has this invalid base64 in the wild. I'd be in favor of using strict validation always, with the possible future option to add configurability if people then emerge from the woodwork complaining that they need to extract data from their legacy invalid-but-decodable base64. After all, we don't lose anything by not implementing that config flag yet.

marshallpierce commented on May 30, 2024

Given the encoded input length of 43 base64 characters + 1 padding:

  • 10x base64 quads should decode to 10x decoded byte triples, so the complete quads result in 30 bytes of decoded output
  • 3 trailing base64 chars decode to 2 bytes
  • 32 byte length is correct
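The length arithmetic above can be sketched as a small function. The name `decoded_len` is hypothetical, and the symbol count is assumed to exclude padding:

```rust
/// Number of bytes produced by `symbols` base64 characters (padding excluded).
/// Returns None for a count that no valid encoding can have.
fn decoded_len(symbols: usize) -> Option<usize> {
    let full_quads = symbols / 4;
    let tail_bytes = match symbols % 4 {
        0 => 0,
        2 => 1, // 12 bits carry one byte
        3 => 2, // 18 bits carry two bytes
        _ => return None, // remainder 1: 6 bits cannot hold a byte
    };
    Some(full_quads * 3 + tail_bytes)
}
```

For the input in question, `decoded_len(43)` gives `Some(32)`: 30 bytes from the ten complete quads plus 2 from the trailing three characters.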

The last 3 input chars: "iYW", aka 0x69 0x59 0x57. Each input char provides 6 bits (hence 4 chars per 3 bytes: 4 * 6 = 3 * 8). Looking only at the trailing 3 chars, that provides 18 bits -- 2 more bits than are actually needed by the resulting 2 bytes = 16 bits. Or, in pictorial form (top line is bits from chars, bottom line is resulting byte layout):

111111222222333333
<byte 1><byte 2>

Note that the last 2 bits contributed by char 3 are ignored.

The last char 'W' maps to a value of 0x16, or 0b00010110. One of the last 2 bits (the ignored bits) is nonzero. The encoded values happen to be decoded okay because we ignore those bits, but it's certainly not canonical. For chars that decode to 0b000101xx, 'U' would be the natural choice (trailing bits 00), but V (01), W (10), or X (11) would technically "work".
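To make the bit-level argument concrete, here is a small sketch that maps a symbol to its 6-bit value and masks out the bits a decoder discards. Both `sextet` and `ignored_bits` are illustrative names, not functions from this crate, and only the standard alphabet is handled:

```rust
/// 6-bit value of a standard-alphabet base64 symbol.
fn sextet(symbol: u8) -> Option<u8> {
    match symbol {
        b'A'..=b'Z' => Some(symbol - b'A'),
        b'a'..=b'z' => Some(symbol - b'a' + 26),
        b'0'..=b'9' => Some(symbol - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Bits of the final symbol that the decoder discards, given the number of
/// symbols in the trailing partial block (2 or 3).
fn ignored_bits(last_symbol: u8, tail_len: usize) -> Option<u8> {
    let mask = match tail_len {
        2 => 0b1111, // 12 bits carry 1 byte, 4 bits left over
        3 => 0b0011, // 18 bits carry 2 bytes, 2 bits left over
        _ => return None,
    };
    Some(sextet(last_symbol)? & mask)
}
```

A nonzero result means the encoding is non-canonical: `ignored_bits(b'W', 3)` yields `Some(0b10)`, while `ignored_bits(b'U', 3)` yields `Some(0)`.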

The RFC (RFC 4648, section 3.5, "Canonical Encoding") covers this issue, and states that:

For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used. These pad bits MUST be set to zero by conforming encoders, which is described in the descriptions on padding below.

And later on:

When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups.

So, this looks like base64 encoded by a broken encoder, and that online decoder is clearly incorrect since it's producing a multiple of 3 bytes from input that has padding!
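For comparison, a conforming encoder's handling of the final group might look roughly like this. It is a sketch of the zero-fill rule quoted from the RFC, not this crate's implementation, and `encode_tail` is a hypothetical name:

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode a final input group of 1 or 2 bytes: zero bits are added on the
/// right to reach a multiple of 6 bits, then '=' fills the quad.
fn encode_tail(bytes: &[u8]) -> Option<String> {
    let (bits, symbols): (u32, usize) = match bytes {
        // 8 data bits + 4 zero bits = 2 symbols
        [a] => (u32::from(*a) << 4, 2),
        // 16 data bits + 2 zero bits = 3 symbols
        [a, b] => (((u32::from(*a) << 8) | u32::from(*b)) << 2, 3),
        _ => return None, // 0 bytes or a full 3-byte group: not a tail
    };
    let mut out = String::new();
    for i in (0..symbols).rev() {
        out.push(ALPHABET[((bits >> (6 * i)) & 0x3f) as usize] as char);
    }
    out.push_str(&"=".repeat(4 - symbols));
    Some(out)
}
```

The two bytes that "iYW" decodes to are 0x89 0x85, and `encode_tail(&[0x89, 0x85])` produces "iYU=", which is the canonical form of that tail.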

I'm kind of curious; where did you get that base64?

AljoschaMeyer commented on May 30, 2024

> I'm kind of curious; where did you get that base64?

From a fuzzer :P

I need to reliably detect invalid/non-canonical base64 input to a system. Would you be open to adding a mode that emits an error in these cases? Otherwise, my best bet seems to be to decode, re-encode and check for equality of reencoding and original encoding. But I suppose that is much less efficient (even when only checking the last byte(s)) than directly detecting and reporting this in the decoding stage.

AljoschaMeyer commented on May 30, 2024

I think it still makes sense to include this in the library. I'll need this in more than one place, so I'd have to fork/wrap the library (or create a really, really small crate). But the best place for emitting an error on invalid input is probably the library that decodes base64 and errors on invalid input - i.e. this crate.

Would you accept a pull request that adds a boolean for rejecting non-canonical input to Config (but setting it to false for all provided configs to keep backward compatibility)?

marshallpierce commented on May 30, 2024

I'll have to think about it a bit from an API perspective, but while I think it's reasonable to barf on this type of invalid input, I'm not sure how configurability would work for that -- the enum variant for it would have to exist regardless, so people would have to handle it from a type system perspective whether or not there was a boolean controlling whether it was ever returned. So, at the moment, I think it makes more sense to just always return that error. More config = more to think about = more complexity...
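To illustrate the type-system point, here is a sketch with a hypothetical error enum. The type and variant names are invented for this example and are not this crate's API. Once the variant exists, every exhaustive `match` has to handle it, whether or not a config flag ever lets it be returned:

```rust
/// Hypothetical decode error type, for illustration only.
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// A byte outside the alphabet at the given offset.
    InvalidSymbol(usize),
    /// The last symbol, at the given offset, has nonzero discarded bits.
    NonCanonicalTrailingBits(usize),
}

/// Callers that match exhaustively must cover the new variant even if a
/// (hypothetical) config flag means it is never produced at runtime.
fn describe(err: &DecodeError) -> String {
    match err {
        DecodeError::InvalidSymbol(offset) => {
            format!("invalid symbol at offset {}", offset)
        }
        DecodeError::NonCanonicalTrailingBits(offset) => {
            format!("non-canonical trailing bits at offset {}", offset)
        }
    }
}
```

Removing the second arm from `describe` is a compile error, so a boolean flag would not spare callers from dealing with the variant.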

AljoschaMeyer commented on May 30, 2024

But won't that be a silent breaking change? Updating the dependency could result in previously accepted data being rejected. So there should probably be a major version bump in any case.

Do you want to implement this, or should I? I'll need longer to get into the code base, but it's my feature request so that's fine.

As for the boolean flag: Since the spec explicitly talks about both behaviors and says that implementations can choose to adhere to either one, I'd lean towards making it configurable. I'm personally fine with canonicity being enforced in all cases, but at some point someone will need to implement a system that has to accept non-canonical encodings for legacy reasons...

AljoschaMeyer commented on May 30, 2024

Ok, I looked at the code and I'm afraid of all the optimizations; I'd probably break something. So it would be great if you could implement that.

Oh, and thank you for putting the time into these detailed responses 💚

marshallpierce commented on May 30, 2024

Yeah it's mildly terrifying in the decode pathway ;)
