Comments (26)
Why can this not be an optional mode that is only executed if people need this? Seems silly to force everybody to do this on their side instead of by this crate.
from rust-base64.
base64 with embedded whitespace is very common.
So I've heard. I have a passing familiarity with base64, as it happens.
If this crate is not going to provide it, sooner or later someone is going to make a crate for it.
OK, sounds good. You can go make your unsupported assertions about efficiency on that project instead. No doubt such a crate will support many different flavors of whitespace filtering, and JSON because JWT uses base64 with JSON, and maybe let's throw in some message digest implementations too since base64 often represents a hash, because you wouldn't want to have users need multiple different libraries when it could all be jammed into one.
Instead, this crate will continue to avoid such layering violations and focus on providing fast, correct base64 that avoids overheads or API complexity that would render it unsuitable for broad applicability. Maybe someday there will be whitespace filtering again, but not until many other things are finished first.
from rust-base64.
It's here: https://github.com/marshallpierce/rust-base64/blob/master/src/decode.rs#L214
from rust-base64.
It's not that the last byte is the only one that could be invalid, it's that it's the only invalid byte that is potentially interesting to report on as a side effect of detecting invalid length.
from rust-base64.
Stripping whitespace (or null bytes or special chars or whatever else someone wants to remove) is quite slow compared to how fast we can decode valid base64. Since such things typically only occur in workloads where performance is not a concern (like parsing small text like private keys), we favor being fast over silently filtering out whitespace or other invalid base64.
from rust-base64.
p.s. this will remove whitespace: filter(|b| !b" \n\t\r\x0b\x0c".contains(b)
from rust-base64.
I have noticed that an older version of base64 supported that so I am using that for now.
from rust-base64.
- It's generally for callers to do whatever filtering they want, whitespace or otherwise, and this crate can't possibly handle every conceivable thing that the caller might want to filter
- It complicates the API and implementation of this crate.
- It's not well aligned with the goals of this project (performance, and eventually full no_std support).
If you think using an old unsupported version of this crate is better than adding 1 line of code to do the filtering that your particular use case for your particular text format needs, then I don't know what else to say to you than good luck.
from rust-base64.
base64 with embedded whitespace is very common. If this crate is not going to provide it, sooner or later someone is going to make a crate for it. With the current version you need to modify the value before hand and collect it into a buffer again which is unnecessarily wasteful.
I'm honestly not really sure why whitespace support had to be removed or what this has to do with no_std support or performance. All of this can be an optional flag. The problem I have right now is that it seems from the comments above that even patches for this would not be accepted.
from rust-base64.
Would it perhaps make sense to at least special case a trailing \n
here?
As it stands now, passing a base64 payload with a trailing \n
-- a very common situation when reading a payload from a text file -- will return InvalidLength
which is rather uninformative. This is an especially confusing case because:
- most common base64 parsers will report nothing wrong (including the
base64
utility and Python'sbase64
) - the trailing newline is invisible and added by common text editors
- it's much less obvious than the payload being hard-wrapped into separate lines
This should either fail more informatively so it's easier to know what's wrong or ignore the trailing newline, IMO.
from rust-base64.
While I sympathize with that confusing debugging, it is an invalid length if it has stray whitespace or anything else at the end.
Trailing newlines are only relevant in the case where base64 is contained in a text file. Swallowing newlines, or any other such thing, would not be suitable for a general purpose solution. For base64 in url parameters, or csv, or some weird newline delimited format etc, it would be entirely inappropriate for logic dedicated to base64 to have made the arbitrary decision that \n is special because some other way of holding base64 deemed it convenient.
"Python does it" is, to me, a signal of a poor API design more than the inverse ;) If this is a common use case for you, I suggest making a wrapper function that strips away bytes that are common yet problematic in your situation.
from rust-base64.
I actually agree with your reasoning, but isn't that an even stronger argument that the error is unnecessarily poor or perhaps even flat out wrong? Given that a newline is not a valid base64 character, shouldn't InvalidByte(i, b)
be returned instead?
Also, I meant to post this under #107, but I see now that I got the issues mixed up.
from rust-base64.
It certainly could be InvalidByte
, but length is checked first, as it does not require scanning (much less decoding) all the input. Also, in this case it's arguably more of an invalid padding situation -- assuming the input is using padding, which we don't know until we scan it.
from rust-base64.
If at all possible, I suggest that we should handle this case as an InvalidByte
due to usability concerns. It would just make this common case drastically simpler to debug.
Out of curiosity, how exactly does the length check (the one that is failing) work? Could you point me to the relevant code?
from rust-base64.
@marshallpierce would you accept a PR that gives you a better error message if the stirng contains whitespace, possibly in debug mode only? This just cost me a lot of time writing a simple CLI tool that reads base64 from stdin
I think that would be a fair compromise for perf, the plan would be to do some expensive stuff on the error path to figure out if the string contains whitespace, again possibly in debug mode only
from rust-base64.
I don't think special-casing whitespace is appropriate for a library like this; see above. (Fun thought experiment: what if someone makes a character set for base64 that includes whitespace?)
What I can do (and will do right now) is add documentation to InvalidLength explaining that a common cause of this is whitespace or other framing bytes.
from rust-base64.
Ok, like right in the error message? That does sound equivalently useful, yeah. I was just thinking of doing a check for whitespace right before the error is propagated to not have a "maybe" in the msg.
from rust-base64.
But then why is whitespace special? What if there's a trailing #
, and that's the separator for some other format?
from rust-base64.
So in my case it was a trailing newline on "pure" base64, such stuff is barely visible when reading stdout from another CLI tool.
I think the error message update or doc update is fine.
from rust-base64.
#143 on the way.
from rust-base64.
yeah, the overloading of newlines is a mess :( A shame we didn't start with some sort of structured way of dealing with input in shells 50 years ago.
from rust-base64.
But then why is whitespace special?
Whitespace is special because it's invisible (and often added automatically to the end of file).
What if there's a trailing #, and that's the separator for some other format?
I think that in all of these cases, including whitespace, an InvalidByte
error should be returned (ideally with a line and column location of the offending byte). None of these bytes, including \n
and #
, are valid base64 characters so by the principle of least surprise, they should be called out as such by the error, IMO.
from rust-base64.
I'd agree... I don't know if this is worth the work but we could catch InvalidLength, scan the input string for invalid bytes and change the errortype based on that. Then whitespace is also no longer special.
I am not sure if performance of the error path is a concern, up to marshall
If this sounds not quite like overkill I can make a pr on the weekend
from rust-base64.
I don't think it's needed to scan the entire input, as that would only produce the same InvalidByte that would normally occur. Only the last byte is interesting, as invalid length occurs when total length % 8 = 1 or 5. I agree in principle that in the case that would currently be InvalidLength, we could look at the last byte and do something.
Is the proposal that if the last byte is not a member of the current character set, then it exposed as InvalidByte
, and InvalidLength
if it is not?
from rust-base64.
I don't really see how it could only be the last byte that is invalid but since it's the majority case anyway I think it works for the main usecase. I am not too familiar with the code, I'll just see what I can do tomorrow. But it would be something like that, yeah.
from rust-base64.
Released in v0.13.0.
from rust-base64.
Related Issues (20)
- Restore base64::{encode, decode} functions HOT 11
- Thank you
- How do I change the padding character? HOT 1
- Using this crate easier HOT 2
- DecoderReader accepts incorrect input HOT 2
- Design choices HOT 5
- How come I can't decode this string? HOT 1
- `DecoderReader` probably should accept `BufRead` instead of `Read` HOT 1
- make Alphabet::from_str_unchecked public HOT 3
- Replacement for base64::decode()? HOT 12
- How to generate the base64 format like openssl command HOT 3
- `DecoderReader` does not respect `with_decode_allow_trailing_bits` HOT 4
- how to encode image bytes to string? HOT 1
- GeneralPurpose engine should implement Clone and Debug HOT 6
- Make `Alphabet::from_str_unchecked` `pub const unsafe` HOT 5
- Calling `EncoderStringWriter::write` successively does not equal `EncoderStringWriter::write_all` HOT 2
- Make `encoded_len` const HOT 3
- Add encode_vec() HOT 4
- Question: best way to access inner field values HOT 5
- SIMD support? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rust-base64.