marshallpierce / rust-base64
base64, in rust
License: Apache License 2.0
While discussing the recent buffer overflow in this code, @sneves pointed me to the following line:
https://github.com/alicemaz/rust-base64/blob/master/src/lib.rs#L391
which has a potential overflow for large inputs when calculating the size of a buffer to allocate. Depending on further usage of that buffer, this could range from "unnecessarily failing" (as the /4 later ensures the result would have been smaller than the input, barring the overflow) to "potentially catastrophic".
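The fix is mechanical: do the size calculation with checked arithmetic so a huge input yields an error instead of a wrapped-around (too-small) allocation. A minimal sketch, assuming a hypothetical `encoded_size` helper rather than the crate's actual internals:

```rust
// Sketch: overflow-proof calculation of the padded base64 output size for
// `len` input bytes (ceil(len / 3) * 4). Illustrative, not the crate's code.
fn encoded_size(len: usize) -> Option<usize> {
    // ceil(len / 3) without any intermediate overflow.
    let groups = len / 3 + if len % 3 > 0 { 1 } else { 0 };
    // The multiply is the step that can overflow for huge `len`.
    groups.checked_mul(4)
}

fn main() {
    assert_eq!(encoded_size(3), Some(4));
    assert_eq!(encoded_size(4), Some(8));
    assert_eq!(encoded_size(usize::MAX), None); // would previously wrap
}
```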
Currently the Config is a struct that contains a CharacterSet and a boolean for padding. CharacterSet is an enum containing the 3 supported alphabets.
I've been toying around with the idea of changing the layout of the Config. The general idea is to make each supported alphabet a distinct type. Each type would then implement some traits to support the encoding/decoding algorithms. I believe with proper inlining this would allow the compiler to eliminate some branches and optimize the code better. I've sketched out a rough prototype of this and the initial results seem to align with expectations.
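A rough sketch of the idea (illustrative only; names like `Alphabet` and `encode_group` are not the crate's actual API): each alphabet becomes a zero-sized type, so the encode loop is monomorphized per alphabet and the table reference is a compile-time constant rather than a value branched on at runtime.

```rust
// Each supported alphabet is a distinct type carrying its encode table.
trait Alphabet {
    const ENCODE_TABLE: &'static [u8; 64];
}

struct Standard;
impl Alphabet for Standard {
    const ENCODE_TABLE: &'static [u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
}

struct UrlSafe;
impl Alphabet for UrlSafe {
    const ENCODE_TABLE: &'static [u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
}

// Encode one complete 3-byte group; a real implementation would loop over
// the input and handle the final partial group with padding.
fn encode_group<A: Alphabet>(input: &[u8; 3]) -> [u8; 4] {
    let n = (input[0] as u32) << 16 | (input[1] as u32) << 8 | input[2] as u32;
    [
        A::ENCODE_TABLE[(n >> 18 & 0x3f) as usize],
        A::ENCODE_TABLE[(n >> 12 & 0x3f) as usize],
        A::ENCODE_TABLE[(n >> 6 & 0x3f) as usize],
        A::ENCODE_TABLE[(n & 0x3f) as usize],
    ]
}

fn main() {
    assert_eq!(&encode_group::<Standard>(b"Man"), b"TWFu");
    // The two alphabets differ only in the symbols for values 62 and 63.
    assert_eq!(&encode_group::<Standard>(&[0xfb, 0xef, 0xbe]), b"++++");
    assert_eq!(&encode_group::<UrlSafe>(&[0xfb, 0xef, 0xbe]), b"----");
}
```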
Encoding benchmarks:
bench_small_input/encode_slice/3
time: [8.1793 ns 8.2160 ns 8.2604 ns]
thrpt: [346.36 MiB/s 348.23 MiB/s 349.79 MiB/s]
change:
time: [-30.505% -29.744% -28.870%] (p = 0.00 < 0.05)
thrpt: [+40.588% +42.337% +43.895%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
bench_small_input/encode_slice/50
time: [37.009 ns 37.237 ns 37.508 ns]
thrpt: [1.2415 GiB/s 1.2505 GiB/s 1.2582 GiB/s]
change:
time: [-13.650% -12.534% -11.558%] (p = 0.00 < 0.05)
thrpt: [+13.068% +14.330% +15.808%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
bench_small_input/encode_slice/100
time: [60.019 ns 60.255 ns 60.534 ns]
thrpt: [1.5385 GiB/s 1.5456 GiB/s 1.5517 GiB/s]
change:
time: [-7.1787% -6.7605% -6.3371%] (p = 0.00 < 0.05)
thrpt: [+6.7658% +7.2507% +7.7339%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
bench_small_input/encode_slice/500
time: [280.64 ns 281.54 ns 282.68 ns]
thrpt: [1.6473 GiB/s 1.6540 GiB/s 1.6593 GiB/s]
change:
time: [-5.6106% -5.0493% -4.4584%] (p = 0.00 < 0.05)
thrpt: [+4.6665% +5.3179% +5.9440%]
Performance has improved.
Decoding and larger sizes also improve, but by much more marginal amounts (~2%). I'm happy to polish this and submit it as a pull request if you're interested, but seeing as it's a pretty substantial change I figured I would open an issue for discussion first.
https://github.com/rust-fuzz/cargo-fuzz
Seems like a nice way to stress test encoding and decoding random data. We have some unit tests that do random data, but it'd be nice to just leave this running for, say, an hour...
I'm going to bet that this is an upstream problem -- pretty sure you ARE actually using `safemem`. But `cargo +beta doc` doesn't think so in the newly-cut beta. :(
~/…/bug-reports/rust-base64 master$ rustc --version
rustc 1.28.0-beta.2 (5981061e5 2018-06-22)
~/…/bug-reports/rust-base64 master$ cargo doc
Updating registry `https://github.com/rust-lang/crates.io-index`
Downloading safemem v0.3.0
Checking byteorder v1.2.3
Checking safemem v0.3.0
Documenting safemem v0.3.0
Documenting byteorder v1.2.3
Documenting base64 v0.9.2 (file:///C:/Users/egubler/workspace/personal/bug-reports/rust-base64)
error: unused extern crate
--> src\line_wrap.rs:1:1
|
1 | extern crate safemem;
| ^^^^^^^^^^^^^^^^^^^^^ help: remove it
|
note: lint level defined here
--> src\lib.rs:59:57
|
59| missing_docs, trivial_casts, trivial_numeric_casts, unused_extern_crates, unused_import_braces,
| ^^^^^^^^^^^^^^^^^^^^
error: Compilation failed, aborting rustdoc
error: Could not document `base64`.
Caused by:
process didn't exit successfully: `rustdoc --crate-name base64 src\lib.rs -o C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\doc -L dependency=C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\debug\deps --extern byteorder=C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\debug\deps\libbyteorder-5d6212fcf45e1323.rmeta --extern safemem=C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\debug\deps\libsafemem-8a9b675cbece16a6.rmeta` (exit code: 101)
Hi,
I am not sure if this is a known issue, but I tried upgrading from 0.9.3 to 0.10.1 in my project. base64 that I could decode before using:
base64::decode_config(&string, base64::MIME)
now breaks with a DecodeError::InvalidLength in 0.10.1, using:
base64::decode_config(&string, base64::STANDARD_NO_PAD)
The same happens with `base64::STANDARD`.
I also tried:
base64::decode_config(&string, base64::STANDARD_NO_PAD.decode_allow_trailing_bits(true))
This is the base64 I tried to decode, which should be a valid DER x509 certificate when decoded.
MIID9TCCAt2gAwIBAgIJAN3oDHIqKat0MA0GCSqGSIb3DQEBCwUAMC0xKzApBgNV BAMTIkR1bW15IG5hbWUgZm9yIGNlcnRpZmljYXRlIHJlcXVlc3QwHhcNMTEwNzAx MDQwNzI0WhcNMTEwNzMxMDQwNzI0WjAtMSswKQYDVQQDEyJEdW1teSBuYW1lIGZv ciBjZXJ0aWZpY2F0ZSByZXF1ZXN0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB CgKCAQEArq33oZG7/cq8x0ywS/K5LlNeI5xk4LBZ3FVGzOpAOLAGrnxFR+q4WyJ3 mqlH2tPAjgf8tUiiXGmyBYehON36fK8jELs2k5YwwLoZQttS0UkCX6kY9Mzn5x47 W96NKjrBNiomGSiU+UTsmUn7BolI2/ZiwdZwel0WSCICh7EgAeraCJbGi2MDlGKQ +2GPhoiLrTCRge/DYvVtBtI7dLCp6jhvyVIv0KYSXtZ+z31AlukMW6khBtEAhve+ +yqkQXmG72Bn3MvuJ1vozX+pki3E/BfSnt3/luJVJByUuz22SF+Dh9BsVj8TtoT5 1g+DrFLitPtIpo1SWwKOXrs4XpBMbwIDAQABo4IBFjCCARIwDwYDVR0TAQH/BAUw AwEB/zAdBgNVHQ4EFgQUvVwQP/y4jndc/3/rNbih8OXT7E0wDgYDVR0PAQH/BAQD AgEGMGoGCCsGAQUFBwELBF4wXDAoBggrBgEFBQcwBYYccnN5bmM6Ly9sb2NhbGhv c3Q6NDQwNC9ycGtpLzAwBggrBgEFBQcwCoYkcnN5bmM6Ly9sb2NhbGhvc3Q6NDQw NC9ycGtpL3Jvb3QubW5mMCEGCCsGAQUFBwEIAQH/BBIwEKAOMAwwCgIBAAIFAP// //8wJwYIKwYBBQUHAQcBAf8EGDAWMAkEAgABMAMDAQAwCQQCAAIwAwMBADAYBgNV HSABAf8EDjAMMAoGCCsGAQUFBw4CMA0GCSqGSIb3DQEBCwUAA4IBAQABBgG40cDo Q0wEdfHjNpRMe3ibM6DFQKemQG9YSxgG1Cg9o8mjEZ1erF0PoqVmkdbmofkEBJMe 2UTly528jQhXf3qr4P/drs1otDHO/5y874SULA+eMACZ8o2bBzP/r/BFrgw211nY iw0TIiwXMagwzVHtMTe0bb8IDkPJ3f3QzpUBZWIPOuOx2UlB8WKE1glZ7YyHALPl FUWUfUVaVvo7Ylovmb+2OGMefl9JaJeODKOxrMW9b3duQ0dKgzgRpDd+uOLHlKPR CXKs4wBrMLaIVqxCgdR1fJIrGKxJX2qhGiouq1aH9KNr2V+gU23tbUBSjSbmNYbw IYc5/k2M7lo7
Would you be willing to take a backported 0.10 PR adding deprecation notices for APIs that require new features enabled in 0.11?
My idea was to add the same features as 0.11 has, but have these not affect the public API directly; instead they would just be used to enable deprecation notices on APIs that will require a feature flag:
#[cfg_attr(not(any(feature = "alloc", feature = "std")), deprecated(note = "enable `std` or `alloc` feature"))]
It looks like the last release was at the beginning of the year, so it's been quite a while, and master contains the no_std support that I would like to use in my crate. I'm still blocked by a few other crates as well, but I'd like to merge no_std support in my crate sooner rather than later. So if you don't mind, I'd appreciate a v0.11 in the near future.
Either I'm misunderstanding how base64 should work (very possible), or the following non-roundtrip behavior is a bug:
let input = "FCX/tsDLpubCPKKfIrw4gc+SQkHcaD16s7GI6i/ziYW=";
println!("{:?}", input);
let dec = base64::decode_config(input, base64::STANDARD).unwrap();
println!("{:x?}", dec);
let enc = base64::encode_config(&dec, base64::STANDARD);
println!("{:?}", enc);
assert_eq!(enc, input);
Prints:
"FCX/tsDLpubCPKKfIrw4gc+SQkHcaD16s7GI6i/ziYW="
[14, 25, ff, b6, c0, cb, a6, e6, c2, 3c, a2, 9f, 22, bc, 38, 81, cf, 92, 42, 41, dc, 68, 3d, 7a, b3, b1, 88, ea, 2f, f3, 89, 85]
"FCX/tsDLpubCPKKfIrw4gc+SQkHcaD16s7GI6i/ziYU="
(last non-padding character differs)
The decoded data is 32 bytes long, whereas this online decoder claims that the output should be 33 bytes long, with an additional 0x80 byte at the end.
Version: base64 = "0.9.3"
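For what it's worth, the changed character can be explained without the crate at all: with one `=` of padding, the final three base64 symbols carry 18 bits but only 16 survive decoding, so the low two bits of the last symbol are silently dropped. A std-only illustration:

```rust
// Why the round trip changes the last symbol: 'W' has value 22 (0b010110);
// decoding keeps only its top four bits, so re-encoding produces value 20,
// which is 'U' -- exactly the difference observed above.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn canonical_last_char(c: u8) -> u8 {
    let value = ALPHABET.iter().position(|&b| b == c).unwrap() as u8;
    // Zero the two bits that never make it into the decoded output.
    ALPHABET[(value & !0b11) as usize]
}

fn main() {
    assert_eq!(canonical_last_char(b'W'), b'U');
    // A canonical final symbol is unchanged.
    assert_eq!(canonical_last_char(b'U'), b'U');
}
```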
Hello! I'm wondering if it might be possible to link to the RFC encoding specification for the different base64 Config options in the documentation?
Since the 0.10 release removed support for ignoring whitespace, I looked into stripping whitespace myself and found this snippet in the release notes:
.filter(|b| !b" \n\t\r\x0b\x0c".contains(b))
We could avoid copying the content by having a function that can decode from an iterator so we can stream the filtered bytes.
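Until such an iterator-based API exists, a copying version of the release-notes snippet might look like this (sketch only):

```rust
// Strip the six ASCII whitespace bytes, then hand the cleaned buffer to the
// decoder. This copies; the iterator-decoding idea above would avoid that.
fn strip_whitespace(input: &[u8]) -> Vec<u8> {
    input
        .iter()
        .copied()
        .filter(|b| !b" \n\t\r\x0b\x0c".contains(b))
        .collect()
}

fn main() {
    assert_eq!(strip_whitespace(b"TW\r\nFu\t"), b"TWFu");
    assert_eq!(strip_whitespace(b"  "), b"");
}
```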
This prevents creation of a new Config from an external package. Is this intended?
`b"AA="` gets parsed into `[0]`, but is re-encoded as `b"AA=="`, violating canonicity.
Add `#[derive(Debug, PartialEq)]` to the error so it can be compared in tests of other libraries/apps that use your library.
https://github.com/alicemaz/rust-base64/blob/master/src/lib.rs#L31
Thanks in advance!
Would you be amenable to adding CI to this project that runs tests automatically?
Additionally, would you be willing to be explicit about the minimum version of Rust that this crate supports? (e.g., By adding an explicit Rust version to the CI config.)
The work to remove the other uses of `unsafe` was awesome. However, I do think it is worthwhile to try to remove all of the uses of `unsafe`, of which there is currently only one. I think a lot of people are like me and think that `unsafe` should only be used as a last resort. I think there are alternatives that are workable and even better in terms of usability for certain circumstances than what we have now.
Here are three ideas for removing the last use of `unsafe`:
1. Just do it, on the assumption that the difference in `encode_config_buf()` performance isn't that critical, because very few applications need high-performance base64 encoding of large data.
2. Change the signature to use move semantics:
Old:
pub fn encode_config_buf<T: ?Sized + AsRef<[u8]>>(input: &T, config: Config, buf: &mut String) {
}
New:
pub fn encode_config_buf<T: ?Sized + AsRef<[u8]>>(input: &T, config: Config, buf: String) -> String {
    let buf_bytes = buf.into_bytes();
    ....
    String::from_utf8(buf_bytes).unwrap()
}
3. Take `&mut Vec<u8>` instead of `&mut String`. A caller that has a `String` would then be able to decide whether they want to use `as_mut_vec()` for performance reasons, or whether they want to use move semantics (like suggestion 2 above), or whether they want to start with a `Vec`. This also has the advantage that if you have a `Vec<u8>` already, e.g. because you're writing base64 data into some kind of binary or otherwise non-UTF-8 array, you don't require any extra allocations.

I was playing with https://github.com/kostya/benchmarks and upgraded the version of base64 for rust to 0.7.0 and got a big performance regression.
Experimenting further, the problem appeared between 0.5.2 and 0.6.0.
This commit hints at a performance loss and how to fix it, but it talks about a 15% loss, while the above encoding benchmark went on my machine from 20 seconds to 120 seconds.
Maybe something else is going on?
The given sample code:
let mut hasher = Sha256::default();
hasher.input("Hello world!".as_bytes());
let base64_hash = base64::encode(hasher.result().as_slice());
println!("{} is: {:?}",
&base64_hash,
&base64::decode(&base64_hash).unwrap()[..]
);
gives the correct output below:
wFNeS+K3n/2TKRMFQ2v4iTFOSj+uwF7P/Lt98xrZ5Ro= is: [192, 83, 94, 75, 226, 183, 159, 253, 147, 41, 19, 5, 67, 107, 248, 137, 49, 78, 74, 63, 174, 192, 94, 207, 252, 187, 125, 243, 26, 217, 229, 26]
But when I tried decoding it as a string, using the below:
println!("{} is: {:?} \n {:?}",
&base64_hash,
&base64::decode(&base64_hash).unwrap()[..],
str::from_utf8(&base64::decode(&base64_hash).unwrap()[..]).unwrap() // <- new
);
I got the below:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
How can I decode the `base64_hash` to give the original input `Hello world!`?

If you add `serialize` and `deserialize` functions to this crate, Serde will be able to use it to easily serialize fields to base64 and deserialize from base64.
extern crate base64;
#[derive(Serialize, Deserialize)]
struct MyBytes {
#[serde(with = "base64")]
c: Vec<u8>,
}
There is one possible implementation in serde-rs/json#360.
I would expect this to be behind a Cargo feature, something like:
base64 = { version = "0.6", features = ["serde"] }
(for now I wrapped it in `if orig_buf_len == 0` and published it as a patch so it doesn't mess up anyone's day)
so this is a thing that hadn't occurred to me at the time. since it just runs through the whole buffer adding line endings every n chars, data that's already there gets mutated, most often in an extremely undesirable manner. for instance mime-encoding a string that produces an output > 76, then encoding emptystring into the same buffer adds extra line seps
not sure the best way to handle this. the naive solution is to start counting from the starting point of the input, which could make sense if the user were using a large string for assorted data (though the fact that we unconditionally write to the end would make this cumbersome, and anyway there are so many more barriers in the way of this compared to buffer sharing to dodge mallocs in C that I don't see why one would do it in rust)
more likely I assume the user would be incrementally building a single item, in which case we would want to count from the previous line sep, so as to always produce n-character lines with the final line unterminated (eg user produces 80 chars, we break after 76 for a length of 82, they add 80 more and we break again at 76*2+2 for 164). but making these kinds of assumptions about what already is in the buffer rather than encoding them into types makes the haskeller in me queasy and start wanting to implement yet another wrapper over strings
(incidental to this, the reserve calculation is wrong in the case where a non-empty buffer would be large enough to add even more line-endings, though only wrong in a "oops it doubles in capacity" and doesn't affect the fast loop)
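For reference, the counting rule described above (break every `n` chars with a two-byte separator such as CRLF, final line unterminated) can be written down and checked against the example's numbers; this is a sketch, not the crate's code:

```rust
// Wrapped output length for `encoded_len` base64 chars, breaking every
// `line_len` chars with a `sep_len`-byte separator and leaving the final
// (possibly partial) line unterminated.
fn wrapped_len(encoded_len: usize, line_len: usize, sep_len: usize) -> usize {
    if encoded_len == 0 {
        return 0;
    }
    // Number of *completed* lines that get a separator after them.
    let separators = (encoded_len - 1) / line_len;
    encoded_len + separators * sep_len
}

fn main() {
    // The numbers from the example: 80 chars -> 82, then 160 chars -> 164.
    assert_eq!(wrapped_len(80, 76, 2), 82);
    assert_eq!(wrapped_len(160, 76, 2), 164);
    // Exactly one full line gets no trailing separator.
    assert_eq!(wrapped_len(76, 76, 2), 76);
}
```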
Rust is at its best when you are using strong typing to indicate meaning. While in essence a base64 encoded string is always a `[u8]`, functions that want a base64 encoded string would often rather be more specific than just taking `String`, `&str`, `[u8]`, `Vec<u8>`, etc. Hence, a `Base64<C>` type: representing an encoded string, it could have convenience methods for converting into / from all the various string representations and collections, and would let library authors be more explicit in wanting a base64.
It also seems to me to be much more newbie-friendly to show them an API like:
fn send(data: Base64<UrlSafe>)...
Instead of just asking for a string and saying it should be base64:
// Please give me a base64 encoded url-safe string!
fn send(data: String)...
There certainly is an argument that public-facing APIs shouldn't be asking for data in base64; that conversion should happen internally. But even internal to consumer libraries, having concrete types is still very useful for readability and maintainability - you encode something somewhere, and then have to keep annotating your uses of it in other functions and structs as being base64.
This is similar to how `Url` and `Uri` types work in other crates. They internally use Strings or Vecs but expose a type to avoid ambiguity when handling it, and then have a range of convenience `into`s and `from`s to coerce them into the standard types.
The `<C>` generic is so that you can require the `Default` or `UrlSafe` encoding scheme - otherwise you have to manually inspect an unsanitized input string from anywhere you cannot perfectly trust to check for the illegal characters. Another option would be to have `Base64` be an enum of `Default` and `UrlSafe`, but that becomes a runtime variant match with a cumbersome need to match on it.
A related issue is #17, where having an actual Base64 type would obviously implement the specialized display.
I wouldn't mind forking and trying to draft out a Base64 type if you are at all interested!
This crate provides no function for base64-encoding a `&[u8]` into a `String`. Instead, it provides the following two functions:
pub fn encode(input: &str) -> Result<String, Base64Error>
pub fn u8en(bytes: &[u8]) -> Result<Vec<u8>, Base64Error>
I.e., the crate does string-to-string encoding and blob-to-blob encoding, but not blob-to-string encoding—which is what base64-encoding is all about.
Furthermore, encoding cannot fail, so why do these functions return a `Result`? This puts an undue burden on the application, which must either check for an error that won't happen or else call `unwrap()` on the result.
I propose a single function like this:
pub fn encode(bytes: &[u8]) -> String
Rust already makes it easy to convert any string to a `&[u8]`, so this one function handles every case the existing two functions already handle. Also:
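For concreteness, here is a minimal safe implementation of that signature (illustrative only, standard alphabet with padding; not proposed as the crate's actual code):

```rust
// Minimal safe blob-to-string base64 encoder with the proposed signature.
const TABLE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

pub fn encode(bytes: &[u8]) -> String {
    let mut out = Vec::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to 3 input bytes into the top 24 bits of `n`.
        let mut n = 0u32;
        for (i, &b) in chunk.iter().enumerate() {
            n |= (b as u32) << (16 - 8 * i);
        }
        // Emit chunk.len() + 1 symbols, then '=' padding for the rest.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(TABLE[(n >> (18 - 6 * i) & 0x3f) as usize]);
            } else {
                out.push(b'=');
            }
        }
    }
    String::from_utf8(out).expect("output is always ASCII")
}

fn main() {
    assert_eq!(encode(b"Man"), "TWFu");
    assert_eq!(encode(b"Ma"), "TWE=");
    assert_eq!(encode(b"M"), "TQ==");
    assert_eq!(encode(b""), "");
}
```

Note that the return type is a plain `String`, not a `Result`: the output is ASCII by construction, so there is genuinely nothing to fail.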
Hey!
we're currently looking into packaging base64 for Debian and noticed the most recently released version is using an outdated version of safemem. While we can technically work around this by uploading an outdated version of safemem, we would rather just use the most recent version instead. The current master already uses the most recent version of safemem, so could you please consider releasing an update?
Thanks a lot!
In our projects, we use Hyper version 0.10, because from 0.11 onwards it started using tokio, which blew up the dependency tree. We also use Rust v1.24 (the newest version available on Debian), and the newest Hyper version doesn't support that Rust version anymore.
Hyper v0.10 has a dependency on base64 v0.9.
Note that in base64 v0.9.0, safemem 0.2 is used which is compatible with Rust v1.24. However, in base64 v0.9.3, the safemem dependency is bumped to the new major version 0.3 without bumping the base64 major version.
Since safemem 0.3 released an update 2 weeks ago that breaks with Rust v1.24, the result is that Hyper v0.10.19 is no longer compatible with Rust v1.24 (and thus with Debian).
I don't like to fingerpoint, but I would just want to encourage the maintainers of this library to be cautious with dependency management. Especially for a crate as trivial as base64, extreme care should be applied when bumping dependencies.
The result in this case is that it's no longer possible to use Hyper without tokio on Debian.
Requiring configuration just to decode breaks compatibility.
BTW, I'd suggest using `quickcheck` in at least some tests.
quickcheck! {
fn base64_encode_decode_random(bytes: Vec<u8>) -> bool {
bytes == base64::decode(base64::encode_config(&bytes, base64::MIME).as_bytes()).unwrap()
}
}
The equivalent of the above code worked just fine with `rustc_serialize`.
Behaviour of the crate is also broken when compared to standard utilities:
python -c 'print("0" * 80)' | base64 | base64 -d | base64 | tr -d '\n' | base64 -d
↑ works just fine
See https://github.com/alicemaz/rust-base64/pull/14/files#diff-b4aea3e418ccdb71239b96952d9cddb6R130
It seems unfortunate to force a user who has some (ASCII) bytes that they wish to decode to `String`-ify them first (at the cost of verifying that the bytes are UTF-8, or unsafely creating the string). In the decode logic we simply access the bytes of the string anyway, so it would be little work to expose decoding from bytes.
We already check for every possible invalid byte because the decode tables have 256 entries (with suitable invalid markers), so it wouldn't incur any additional checking beyond what we already have. Thoughts?
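For illustration, such a 256-entry table can be built from the 64-byte alphabet in a few lines (a sketch; `INVALID` is a stand-in for whatever marker the real tables use):

```rust
// Every possible input byte maps either to its 6-bit value or to an invalid
// marker, so decoding straight from &[u8] needs no extra validity checks
// beyond the lookups the decoder already performs.
const INVALID: u8 = 255;

fn build_decode_table(alphabet: &[u8; 64]) -> [u8; 256] {
    let mut table = [INVALID; 256];
    for (value, &byte) in alphabet.iter().enumerate() {
        table[byte as usize] = value as u8;
    }
    table
}

fn main() {
    let table = build_decode_table(
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
    );
    assert_eq!(table[b'A' as usize], 0);
    assert_eq!(table[b'/' as usize], 63);
    assert_eq!(table[b' ' as usize], INVALID); // non-alphabet bytes rejected
}
```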
This should be a short overview of per-release breaking changes and improvements, so that users know what to do when updating, and why.
Hi, this was just something that I found when trying to get to the documentation for base64.
The link on crates.io (https://crates.io/crates/base64) just points to the README in the project repo.
I then looked up the crate on docs.rs and found the generated documentation (https://docs.rs/base64/0.10.1/base64/).
This is just a little thing that you might want to update so that it's nicer for other people.
A string longer than 32 characters seems to be unsupported?
let a = b"abcdefghijklmnopqrstuvwxyzABCDEFG";
let b = encode(a);
When compiling I get the error:
error[E0277]: the trait bound `[u8; 33]: std::convert::AsRef<[u8]>` is not satisfied
the trait `std::convert::AsRef<[u8]>` is not implemented for `[u8; 33]`
help: the following implementations were found:
<[T] as std::convert::AsRef<[T]>>
<[T; 0] as std::convert::AsRef<[T]>>
<[T; 1] as std::convert::AsRef<[T]>>
<[T; 2] as std::convert::AsRef<[T]>>
and 30 others
= note: required by `base64::encode`
The listed implementations would suggest that, I guess. I'm new to Rust, but I'd assume there's a way to avoid needing a new implementation for each additional string length.
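For what it's worth, the usual workaround on compilers of that era was to pass a slice of the array rather than the array itself, since `&[u8]` always satisfies the bound. A sketch, with a hypothetical `takes_bytes` standing in for `base64::encode`'s bound (newer compilers implement `AsRef<[u8]>` for arrays of any length via const generics, so this is mostly historical):

```rust
// Stand-in for base64::encode's generic bound.
fn takes_bytes<T: ?Sized + AsRef<[u8]>>(input: &T) -> usize {
    input.as_ref().len()
}

fn main() {
    let a = b"abcdefghijklmnopqrstuvwxyzABCDEFG"; // a [u8; 33]
    // Passing `&a[..]` (a slice) instead of `a` (an array reference)
    // sidestepped the missing impl for arrays longer than 32.
    assert_eq!(takes_bytes(&a[..]), 33);
}
```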
Now that there is no more unsafe code in version 0.10, would you consider adding #![forbid(unsafe_code)] ?
@marshallpierce I changed encoded_size to use checked arithmetic and return an Option with 24ead98 (there are a couple of ways an attack using this could have been feasible; thank you to Andrew Ayer for reporting). presently its two callers will just panic on `None`. I'm inclined to think this is better than adding `Result` to encode returns, since no non-malicious case of hitting usize max seems likely
also some of the linesize calculations are slightly off in non-dangerous ways; I'm gonna fix that and add more test cases after I can merge in your present PRs
The web, of course, needs sloppy decoding. https://infra.spec.whatwg.org/#forgiving-base64-decode
I just tried to decode a private key and there does not seem to be support for embedded whitespace. The only thing I can think of is to modify the string before passing it to decode which seems quite wasteful.
In this thread about deprecating `rustc-serialize` it was mentioned that this crate would be a good candidate for base64 encoding/decoding, but its license is not entirely compatible with the license of `rustc-serialize`.
Is it possible to add the Apache2 license as well?
Thanks
rustc-serialize supports three types of encoding:
MIME: Configuration for RFC 2045 MIME base64 encoding
STANDARD: Configuration for RFC 4648 standard base64 encoding
URL_SAFE: Configuration for RFC 4648 base64url encoding
In particular, I have a use case for URL-safe encoding/decoding.
This might sound really silly but I wish there were a way to encode/decode buffers in-place.
My use-case is pretty simple: I have a file which contains base64-encoded, zlib-compressed JSON ;)
Hello!
I am currently experimenting with the different config options, and I am running into an issue where only the MIME type seems to be decodable with openssl.
For example, I am writing out the base64-encoded bytes with:
let mut buf = String::new();
base64::encode_config_buf(&bytes_to_encode, base64::MIME, &mut buf);
fs::write("/tmp/base64_mime.test", buf.clone()).unwrap();
And then I am trying to decode them with:
openssl base64 -d -in /tmp/base64_mime.txt -out /tmp/out.test
However, if I use any other Config enum, the size of the resulting out.test file is 0.
The issue may not be with base64, but I'm wondering whether this should be expected, or if I am missing a step. The other candidate for causing this issue is the way I am extracting the `bytes_to_encode`, which is below:
unsafe fn struct_as_u8(p: &MyStruct, size_of_struct: usize) -> &[u8] {
    ::std::slice::from_raw_parts(
        (p as *const MyStruct) as *const u8,
        size_of_struct,
    )
}
Note: `MyStruct` is a struct that contains other structs as well as byte arrays and primitive types.
Any help or intuition you could offer would be greatly appreciated!
Thanks!
It turns out that line wrapping is a major source of complexity for not a lot of user convenience, and I think we should get rid of it. (We're not alone in this stance on base64 -- the Java and Go stdlib base64 implementations don't support it either.)
It makes `ChunkedEncoder` and the work in #56 much more complex (and similar goals in #20), and I suspect it's not really necessary to support optimized (and complex) intermingling of encoding and line wrapping: are there really people out there who are encoding massive amounts of line-wrapped base64? It seems unlikely.
Anyway, here's what I'm thinking:
1. Remove line wrapping from `Config`. This would mean all line wrapping logic from various encoding routines could be stripped out, and the complication in `ChunkedEncoder` could also be removed.
2. `Base64Display` would no longer be able to do line wrapping, but it already had only partial support (in that really weird line configurations would error). That will be tough for users to replicate unless we expose `ChunkedEncoder` publicly, but I think the main use case of that was things like HTTP headers, which don't need line wrapping.
For `Read`/`Write` wrapper implementations, we can just provide an extra set of wrappers that only do line wrapping, which users can use as they see fit. Or maybe don't even bother with that?
For plain ol' "base64 these bytes with some line breaks", if we exposed the `line_wrap` module more or less as-is, that's pretty much all that would be needed for, say, MIME.
Basically, I want a `Base64(&[u8])` wrapper that implements `std::fmt::Display` and formats into the destination buffer without intermediate memory allocations:
write!(file, "Sec-WebSocket-Accept: {}", Base64(value))
What do you think about this?
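A std-only sketch of what such a wrapper could look like (illustrative; not the crate's actual `Base64Display`): it writes base64 straight to the formatter, one three-byte group at a time, with no intermediate `String`.

```rust
use std::fmt;

const TABLE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

struct Base64<'a>(&'a [u8]);

impl fmt::Display for Base64<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for chunk in self.0.chunks(3) {
            // Pack up to 3 input bytes into the top 24 bits of `n`.
            let mut n = 0u32;
            for (i, &b) in chunk.iter().enumerate() {
                n |= (b as u32) << (16 - 8 * i);
            }
            // Emit chunk.len() + 1 symbols, padding the final group.
            for i in 0..4 {
                let c = if i <= chunk.len() {
                    TABLE[(n >> (18 - 6 * i) & 0x3f) as usize] as char
                } else {
                    '='
                };
                write!(f, "{}", c)?;
            }
        }
        Ok(())
    }
}

fn main() {
    assert_eq!(format!("{}", Base64(b"Man")), "TWFu");
    // The header use case from the issue, without allocating a String
    // for the base64 itself.
    assert_eq!(format!("Accept: {}", Base64(b"Ma")), "Accept: TWE=");
}
```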
I'd like to reuse the encoding tables for specialised uses, like fixed-length encodings. It would be great if `tables::XX_ENCODE` were public.
Here is the URL:
discourse://auth_redirect?payload=FZkVZvyhZylN%2FD2M9OKCeGvGyqxvuhYsLiyM3KGFDRYjHpLqn954hD0Xw5X2%0AniRlGOhf7Qyg9xHehjeinmy6GJc4x5Mny%2FJqnNCxeb5FChut3910PRqYbXMq%0ApR2kzp6ZOdE9JuBp7dS2ZGg6OfRbWpB%2FrQm%2BtO33gdQJQJw8VV5s7BbkNS8K%0AwVUzqyhKtbyEbDhve8J%2FG9htMtE9UhEpxWJIRfxrLoDSLBaq3vz4nETWb%2Byj%0AOf562313RYQ1Tn5S7Vi%2BS9emSCr9psFyHSDqa95yr4HSOu4I1rPtxH3Ew6yC%0AS%2FK%2BmPQMA%2B9ohdU33DqIr6u5Cn%2FpOGIx%2BVZPh5cHtg%3D%3D%0A
Here is my code:
let payload = [long string of base64 above];
let payload_bytes = base64::decode_config(&payload, base64::URL_SAFE).unwrap();
And it errors with:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidByte(12, 37)', /checkout/src/libcore/result.rs:906:4
In some cases, we need to use a non-standard character set, but the `CharacterSet` enum only provides `Standard`, `UrlSafe`, and `Crypt` (crypt(3)). The function `encode_to_slice` in mod `encode` and the function `decode_chunk` in mod `decode` can be used for encoding and decoding with our own charset tables. However, they are not public outside the crate. How about making the `encode` and `decode` modules public?
Or just add a new trait like
pub trait CharacterSetTrait {
fn encode_table(&self) -> &'static [u8; 64];
fn decode_table(&self) -> &'static [u8; 256];
}
And make the `INVALID_VALUE` constant public so that we can implement the trait for our structures to use our own charset tables.
Hi!
I'm trying to encode and decode a [u8; 64] like so:
let input_bytes = &input.to_bytes();
let mut output = String::new();
base64::encode_config_buf(&input_bytes[..32], base64::URL_SAFE, &mut output);
base64::encode_config_buf(&input_bytes[32..], base64::URL_SAFE, &mut output);
let mut decoded = Vec::new();
match base64::decode_config_buf(&output, base64::URL_SAFE, &mut decoded) {
Ok(_) => (),
Err(e) => {
return format!("{}", e);
}
}
And this keeps failing with
Invalid byte 61, offset 43.
Am I doing something egregiously wrong here? Thanks!
They'll accept rust-base64: google/oss-fuzz#2145
As per #75 (comment).
For when a Vec is inconvenient.
Hi,
Just wondering: Would you be interested in tackling AVX2 optimization within this crate?
There's some info here:
https://lemire.me/blog/2018/01/17/ridiculously-fast-base64-encoding-and-decoding/
See https://github.com/rust-lang-nursery/rustc-serialize/blob/master/src/base64.rs#L183-L187 for what rustc-serialize does.
That means that right now any URL_SAFE base64 string generated by a library that does that will not be decoded properly.
I believe the encode should skip padding in URL_SAFE mode, and the decode should work the same even if the padding is missing (i.e. decoding `c0zGLzKEFWj0VxWuufTXiRMk5tlI5MbGDAYhzaxIYjo=` and `c0zGLzKEFWj0VxWuufTXiRMk5tlI5MbGDAYhzaxIYjo1` should yield the same result).
A base64 implementation in Rust shouldn't need to use `unsafe` for performance. The recent buffer overflow wouldn't have occurred if not for the use of `unsafe`.
Even if we were not able to get the same level of performance with 100% safe code (I'm almost certain we can), I think most if not all users would trade a little performance for increased safety.
error: unused extern crate
--> /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/base64-0.9.2/src/line_wrap.rs:1:1
|
1 | extern crate safemem;
| ^^^^^^^^^^^^^^^^^^^^^ help: remove it
|
note: lint level defined here
--> /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/base64-0.9.2/src/lib.rs:59:57
|
59| missing_docs, trivial_casts, trivial_numeric_casts, unused_extern_crates, unused_import_braces,
| ^^^^^^^^^^^^^^^^^^^^
error: Compilation failed, aborting rustdoc
error: Could not document `base64`.