Giter Site home page Giter Site logo

Comments (26)

mitsuhiko avatar mitsuhiko commented on May 29, 2024 2

Why can this not be an optional mode that is only executed if people need this? Seems silly to force everybody to do this on their side instead of by this crate.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024 2

base64 with embedded whitespace is very common.

So I've heard. I have a passing familiarity with base64, as it happens.

If this crate is not going to provide it, sooner or later someone is going to make a crate for it.

OK, sounds good. You can go make your unsupported assertions about efficiency on that project instead. No doubt such a crate will support many different flavors of whitespace filtering, and JSON because JWT uses base64 with JSON, and maybe let's throw in some message digest implementations too since base64 often represents a hash, because you wouldn't want to have users need multiple different libraries when it could all be jammed into one.

Instead, this crate will continue to avoid such layering violations and focus on providing fast, correct base64 that avoids overheads or API complexity that would render it unsuitable for broad applicability. Maybe someday there will be whitespace filtering again, but not until many other things are finished first.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024 1

It's here: https://github.com/marshallpierce/rust-base64/blob/master/src/decode.rs#L214

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024 1

It's not that the last byte is the only one that could be invalid, it's that it's the only invalid byte that is potentially interesting to report on as a side effect of detecting invalid length.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

Stripping whitespace (or null bytes or special chars or whatever else someone wants to remove) is quite slow compared to how fast we can decode valid base64. Since such things typically only occur in workloads where performance is not a concern (like parsing small text like private keys), we favor being fast over silently filtering out whitespace or other invalid base64.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

p.s. this will remove whitespace: filter(|b| !b" \n\t\r\x0b\x0c".contains(b)

from rust-base64.

mitsuhiko avatar mitsuhiko commented on May 29, 2024

I have noticed that an older version of base64 supported that so I am using that for now.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024
  • It's generally for callers to do whatever filtering they want, whitespace or otherwise, and this crate can't possibly handle every conceivable thing that the caller might want to filter
  • It complicates the API and implementation of this crate.
  • It's not well aligned with the goals of this project (performance, and eventually full no_std support).

If you think using an old unsupported version of this crate is better than adding 1 line of code to do the filtering that your particular use case for your particular text format needs, then I don't know what else to say to you than good luck.

from rust-base64.

mitsuhiko avatar mitsuhiko commented on May 29, 2024

base64 with embedded whitespace is very common. If this crate is not going to provide it, sooner or later someone is going to make a crate for it. With the current version you need to modify the value before hand and collect it into a buffer again which is unnecessarily wasteful.

I'm honestly not really sure why whitespace support had to be removed or what this has to do with no_std support or performance. All of this can be an optional flag. The problem I have right now is that it seems from the comments above that even patches for this would not be accepted.

from rust-base64.

dkasak avatar dkasak commented on May 29, 2024

Would it perhaps make sense to at least special case a trailing \n here?

As it stands now, passing a base64 payload with a trailing \n -- a very common situation when reading a payload from a text file -- will return InvalidLength which is rather uninformative. This is an especially confusing case because:

  • most common base64 parsers will report nothing wrong (including the base64 utility and Python's base64)
  • the trailing newline is invisible and added by common text editors
  • it's much less obvious than the payload being hard-wrapped into separate lines

This should either fail more informatively so it's easier to know what's wrong or ignore the trailing newline, IMO.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

While I sympathize with that confusing debugging, it is an invalid length if it has stray whitespace or anything else at the end.

Trailing newlines are only relevant in the case where base64 is contained in a text file. Swallowing newlines, or any other such thing, would not be suitable for a general purpose solution. For base64 in url parameters, or csv, or some weird newline delimited format etc, it would be entirely inappropriate for logic dedicated to base64 to have made the arbitrary decision that \n is special because some other way of holding base64 deemed it convenient.

"Python does it" is, to me, a signal of a poor API design more than the inverse ;) If this is a common use case for you, I suggest making a wrapper function that strips away bytes that are common yet problematic in your situation.

from rust-base64.

dkasak avatar dkasak commented on May 29, 2024

I actually agree with your reasoning, but isn't that an even stronger argument that the error is unnecessarily poor or perhaps even flat out wrong? Given that a newline is not a valid base64 character, shouldn't InvalidByte(i, b) be returned instead?

Also, I meant to post this under #107, but I see now that I got the issues mixed up.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

It certainly could be InvalidByte, but length is checked first, as it does not require scanning (much less decoding) all the input. Also, in this case it's arguably more of an invalid padding situation -- assuming the input is using padding, which we don't know until we scan it.

from rust-base64.

dkasak avatar dkasak commented on May 29, 2024

If at all possible, I suggest that we should handle this case as an InvalidByte due to usability concerns. It would just make this common case drastically simpler to debug.

Out of curiosity, how exactly does the length check (the one that is failing) work? Could you point me to the relevant code?

from rust-base64.

untitaker avatar untitaker commented on May 29, 2024

@marshallpierce would you accept a PR that gives you a better error message if the stirng contains whitespace, possibly in debug mode only? This just cost me a lot of time writing a simple CLI tool that reads base64 from stdin

I think that would be a fair compromise for perf, the plan would be to do some expensive stuff on the error path to figure out if the string contains whitespace, again possibly in debug mode only

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

I don't think special-casing whitespace is appropriate for a library like this; see above. (Fun thought experiment: what if someone makes a character set for base64 that includes whitespace?)

What I can do (and will do right now) is add documentation to InvalidLength explaining that a common cause of this is whitespace or other framing bytes.

from rust-base64.

untitaker avatar untitaker commented on May 29, 2024

Ok, like right in the error message? That does sound equivalently useful, yeah. I was just thinking of doing a check for whitespace right before the error is propagated to not have a "maybe" in the msg.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

But then why is whitespace special? What if there's a trailing #, and that's the separator for some other format?

from rust-base64.

untitaker avatar untitaker commented on May 29, 2024

So in my case it was a trailing newline on "pure" base64, such stuff is barely visible when reading stdout from another CLI tool.

I think the error message update or doc update is fine.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

#143 on the way.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

yeah, the overloading of newlines is a mess :( A shame we didn't start with some sort of structured way of dealing with input in shells 50 years ago.

from rust-base64.

dkasak avatar dkasak commented on May 29, 2024

But then why is whitespace special?

Whitespace is special because it's invisible (and often added automatically to the end of file).

What if there's a trailing #, and that's the separator for some other format?

I think that in all of these cases, including whitespace, an InvalidByte error should be returned (ideally with a line and column location of the offending byte). None of these bytes, including \n and #, are valid base64 characters so by the principle of least surprise, they should be called out as such by the error, IMO.

from rust-base64.

untitaker avatar untitaker commented on May 29, 2024

I'd agree... I don't know if this is worth the work but we could catch InvalidLength, scan the input string for invalid bytes and change the errortype based on that. Then whitespace is also no longer special.

I am not sure if performance of the error path is a concern, up to marshall

If this sounds not quite like overkill I can make a pr on the weekend

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

I don't think it's needed to scan the entire input, as that would only produce the same InvalidByte that would normally occur. Only the last byte is interesting, as invalid length occurs when total length % 8 = 1 or 5. I agree in principle that in the case that would currently be InvalidLength, we could look at the last byte and do something.

Is the proposal that if the last byte is not a member of the current character set, then it exposed as InvalidByte, and InvalidLength if it is not?

from rust-base64.

untitaker avatar untitaker commented on May 29, 2024

I don't really see how it could only be the last byte that is invalid but since it's the majority case anyway I think it works for the main usecase. I am not too familiar with the code, I'll just see what I can do tomorrow. But it would be something like that, yeah.

from rust-base64.

marshallpierce avatar marshallpierce commented on May 29, 2024

Released in v0.13.0.

from rust-base64.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.