
rust-base64's Introduction


Made with CLion. Thanks to JetBrains for supporting open source!

It's base64. What more could anyone want?

This library's goals are to be correct and fast. It's thoroughly tested and widely used. It exposes functionality at multiple levels of abstraction so you can choose the level of convenience vs. performance that you want, e.g. decode_engine_slice decodes into an existing &mut [u8] and is pretty fast (2.6 GiB/s for a 3 KiB input), whereas decode_engine allocates a new Vec<u8> and returns it, which might be more convenient in some cases, but is slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
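
For a taste of those two levels, here is a minimal sketch. Method names have shifted between releases; this assumes the Engine-method spelling (decode / decode_slice in the 0.21-era API, corresponding to the decode_engine / decode_engine_slice functions mentioned above):

use base64::{engine::general_purpose::STANDARD, Engine as _};

// Allocating flavor: convenient, returns a fresh Vec<u8>.
assert_eq!(STANDARD.decode("aGVsbG8=").unwrap(), b"hello");

// Slice flavor: decodes into a buffer you provide, no allocation.
let mut buf = [0u8; 8];
let n = STANDARD.decode_slice("aGVsbG8=", &mut buf).unwrap();
assert_eq!(&buf[..n], b"hello");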

See the docs for all the details.

FAQ

I need to decode base64 with whitespace/null bytes/other random things interspersed in it. What should I do?

Remove non-base64 characters from your input before decoding.

If you have a Vec of base64, retain can be used to strip out whatever you need removed.

If you have a Read (e.g. reading a file or network socket), there are various approaches.

  • Use iter_read together with Read's bytes() to filter out unwanted bytes.
  • Implement Read with a read() impl that delegates to your actual Read, and then drops any bytes you don't want.
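
For the simple Vec case, a minimal sketch (Engine-era API; the whitespace byte set matches the filter suggested in the 0.10 release notes quoted in the issues below):

use base64::{engine::general_purpose::STANDARD, Engine as _};

// Strip ASCII whitespace in place, then decode.
let mut input = b"aGVs bG8g\nd29y bGQ=\r\n".to_vec();
input.retain(|b| !b" \n\t\r\x0b\x0c".contains(b));
assert_eq!(STANDARD.decode(&input).unwrap(), b"hello world");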

I need to line-wrap base64, e.g. for MIME/PEM.

line-wrap does just that.

I want canonical base64 encoding/decoding.

First, don't do this. You should no more expect Base64 to be canonical than you should expect compression algorithms to produce canonical output across all usage in the wild (hint: they don't). However, people are drawn to their own destruction like moths to a flame, so here we are.

There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits of the last encoded token in two or three token suffixes, and the = token used to inflate the suffix to a full four tokens.

The trailing bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens, with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set to 1 instead of 0.
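
A sketch of what that looks like in practice, using the decode_allow_trailing_bits knob (0.21-era names; older versions expose the same toggle on Config):

use base64::{alphabet, engine::{GeneralPurpose, GeneralPurposeConfig}, Engine as _};

// 'A' encodes 0b000000 and 'B' encodes 0b000001: both "AA==" and "AB==" carry
// the byte 0x00 in their first 8 bits, but "AB==" sets one of the 4 leftover bits.
let strict = GeneralPurpose::new(&alphabet::STANDARD, GeneralPurposeConfig::new());
assert_eq!(strict.decode("AA==").unwrap(), vec![0u8]);
assert!(strict.decode("AB==").is_err());

let relaxed = GeneralPurpose::new(
    &alphabet::STANDARD,
    GeneralPurposeConfig::new().with_decode_allow_trailing_bits(true),
);
assert_eq!(relaxed.decode("AB==").unwrap(), vec![0u8]);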

The = pad bytes, on the other hand, are entirely a self-own by the Base64 standard. They do not affect decoding other than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been wasted on pointless = bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC, or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the url-safe alphabet).

All Engine implementations must at a minimum support treating non-canonical padding of both types as an error, and optionally may allow other behaviors.
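
To illustrate both policies, a sketch against the 0.21-era general_purpose engines (whose defaults require canonical padding; adjust for your version):

use base64::{engine::general_purpose, Engine as _};

// Canonical padding: "AA=" is under-padded and rejected; "AA==" is accepted.
assert!(general_purpose::STANDARD.decode("AA=").is_err());
assert_eq!(general_purpose::STANDARD.decode("AA==").unwrap(), vec![0u8]);

// If you control both producers and consumers, skip padding entirely.
assert_eq!(general_purpose::URL_SAFE_NO_PAD.encode(b"hi"), "aGk");
assert_eq!(general_purpose::URL_SAFE_NO_PAD.decode("aGk").unwrap(), b"hi");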

Rust version compatibility

The minimum supported Rust version is 1.48.0.

Contributing

Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody wants to chase bugs in encoding of any sort.

All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the free time to give each PR the attention it deserves. I will get to everyone eventually!

Developing

Benchmarks are in benches/.

cargo bench

no_std

This crate supports no_std. By default the crate targets std via the std feature. You can disable the default features to target core instead; in that case you lose out on all the functionality revolving around std::io, std::error::Error, and heap allocations. There is an additional alloc feature that you can enable to bring back support for heap allocations.
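
A sketch of the corresponding Cargo.toml entries ("x.y" is a placeholder for whichever release you use):

[dependencies]
# core only: no std::io / std::error::Error integration, no heap-allocating APIs
base64 = { version = "x.y", default-features = false }

# or: core plus heap allocations (the Vec<u8>/String-returning APIs)
# base64 = { version = "x.y", default-features = false, features = ["alloc"] }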

Profiling

On Linux, you can use perf for profiling. Then compile the benchmarks with cargo bench --no-run.

Run the benchmark binary with perf (shown here filtering to one particular benchmark, which will make the results easier to read). perf is only available to the root user on most systems as it fiddles with event counters in your CPU, so use sudo. We need to run the actual benchmark binary, hence the path into target. You can see the actual full path with cargo bench -v; it will print out the commands it runs. If you use the exact path that bench outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want to cargo clean so you have only one benchmarks- binary (they tend to accumulate).

sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse

Then analyze the results, again with perf:

sudo perf annotate -l

You'll see a bunch of interleaved Rust source and assembly like this. The section with lib.rs:327 is telling us that 4.02% of samples saw the movzbl, aka the decode table lookup, as the active instruction. However, this percentage is not as exact as it seems due to a phenomenon called skid. Basically, a consequence of how fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.

 lib.rs:322    0.70 :     10698:       mov    %rdi,%rax
    2.82 :        1069b:       shr    $0x38,%rax
         :                  if morsel == decode_tables::INVALID_VALUE {
         :                      bad_byte_index = input_index;
         :                      break;
         :                  };
         :                  accum = (morsel as u64) << 58;
 lib.rs:327    4.02 :     1069f:       movzbl (%r9,%rax,1),%r15d
         :              // fast loop of 8 bytes at a time
         :              while input_index < length_of_full_chunks {
         :                  let mut accum: u64;
         :
         :                  let input_chunk = BigEndian::read_u64(&input_bytes[input_index..(input_index + 8)]);
         :                  morsel = decode_table[(input_chunk >> 56) as usize];
 lib.rs:322    3.68 :     106a4:       cmp    $0xff,%r15
         :                  if morsel == decode_tables::INVALID_VALUE {
    0.00 :        106ab:       je     1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>

Fuzzing

This uses cargo-fuzz. See fuzz/fuzzers for the available fuzzing scripts. To run, use an invocation like one of these:

cargo +nightly fuzz run roundtrip
cargo +nightly fuzz run roundtrip_no_pad
cargo +nightly fuzz run roundtrip_random_config -- -max_len=10240
cargo +nightly fuzz run decode_random

License

This project is dual-licensed under MIT and Apache 2.0.


rust-base64's Issues

This crate broke Hyper v0.10's compatibility with Rust v1.24 (Debian latest)

In our projects, we use Hyper version 0.10 because from 0.11 onwards it started using tokio, which blew up the dependency tree. We also use Rust v1.24 (the newest version available on Debian), and the newest Hyper version doesn't support that Rust version anymore.

Hyper v0.10 has a dependency on base64 v0.9.

Note that in base64 v0.9.0, safemem 0.2 is used which is compatible with Rust v1.24. However, in base64 v0.9.3, the safemem dependency is bumped to the new major version 0.3 without bumping the base64 major version.

Since safemem 0.3 released an update 2 weeks ago that breaks with Rust v1.24, the result is that Hyper v0.10.19 is no longer compatible with Rust v1.24 (and thus with Debian).

I don't like to point fingers, but I would just want to encourage the maintainers of this library to be cautious with dependency management. Especially for a crate as trivial as base64, extreme care should be applied when bumping dependencies.

The result in this case is that it's no longer possible to use Hyper without tokio on Debian.

Remove built-in line wrapping

It turns out that line wrapping is a major source of complexity for not a lot of user convenience, and I think we should get rid of it. (We're not alone in this stance on Base64 -- the Java and Go stdlib base64 implementations don't support it either.)

It makes ChunkedEncoder and the work in #56 much more complex (and similar goals in #20), and I suspect it's not really necessary to support optimized (and complex) intermingling of encoding and line wrapping: are there really people out there who are encoding massive amounts of line-wrapped base64? It seems unlikely.

Anyway, here's what I'm thinking:

  • Remove line wrapping from Config. This would mean all line wrapping logic from various encoding routines could be stripped out, and the complication in ChunkedEncoder could also be removed.
  • Rejoice at the simplification of #56.

Base64Display would no longer be able to do line wrapping, but it already had only partial support (in that really weird line configurations would error). That will be tough for users to replicate unless we expose ChunkedEncoder publicly, but I think the main use case of that was things like HTTP headers, which don't need line wrapping.

For Read/Write wrapper implementations, we can just provide an extra set of wrappers that only do line wrapping, which users can use as they see fit. Or maybe don't even bother with that?

For plain ol' "base64 these bytes with some line breaks", if we exposed the line_wrap module more or less as-is, that's pretty much all that would be needed for, say, MIME.

`base64` fails at decoding

Requiring configuration just to decode breaks compatibility.

BTW, I'd suggest using quickcheck in at least some tests.

quickcheck! {
    fn base64_encode_decode_random(bytes: Vec<u8>) -> bool {
        bytes == base64::decode(base64::encode_config(&bytes, base64::MIME).as_bytes()).unwrap()
    }
}

Equivalent of the above code worked just fine with rustc_serialize.

Behaviour of the crate is also broken when compared to standard utilities:

python -c 'print("0" * 80)' | base64 | base64 -d | base64 | tr -d '\n' | base64 -d

↑ works just fine

Encoding should return string and never fail

This crate provides no function for base64-encoding a &[u8] into a String. Instead, it provides the following two functions:

pub fn encode(input: &str) -> Result<String, Base64Error>
pub fn u8en(bytes: &[u8]) -> Result<Vec<u8>, Base64Error>

I.e., the crate does string-to-string encoding and blob-to-blob encoding, but not blob-to-string encoding—which is what base64-encoding is all about.

Furthermore, encoding cannot fail, so why do these functions return a Result? This puts an undue burden on the application, which must either check for an error that won't happen or else call unwrap() on the result.

I propose a single function like this:

pub fn encode(bytes: &[u8]) -> String

Rust already makes it easy to convert any string to a &[u8], so this one function handles every case the existing two functions already handle. Also:

  1. It's more type-safe because it doesn't type-erase away the fact that the output is always UTF-8, and,
  2. It's more ergonomic because there's no error case for the application to deal with.

Big performance regression between 0.5.2 and 0.6.0

I was playing with https://github.com/kostya/benchmarks, upgraded the version of base64 for Rust to 0.7.0, and got a big performance regression. Experimenting further, the problem happened between 0.5.2 and 0.6.0. This commit hints at a performance loss and how to fix it, but it talks about a 15% performance loss, while the above benchmark for encoding went from 20 seconds to 120 seconds on my machine. Maybe something else is going on?

Only supports strings of length equal to or less than 32?

A string longer than 32 characters seems to be unsupported?

let a = b"abcdefghijklmnopqrstuvwxyzABCDEFG";
let b = encode(a);

When compiling I get the error:

error[E0277]: the trait bound `[u8; 33]: std::convert::AsRef<[u8]>` is not satisfied

the trait `std::convert::AsRef<[u8]>` is not implemented for `[u8; 33]`

help: the following implementations were found:
             <[T] as std::convert::AsRef<[T]>>
             <[T; 0] as std::convert::AsRef<[T]>>
             <[T; 1] as std::convert::AsRef<[T]>>
             <[T; 2] as std::convert::AsRef<[T]>>
           and 30 others
   = note: required by `base64::encode`

The listed implementations would suggest that, I guess. I'm new to Rust, but I'd assume there would be a way to not have to add a new implementation for each additional string length that should be supported?

`cargo doc` fails

error: unused extern crate
 --> /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/base64-0.9.2/src/line_wrap.rs:1:1
  |
1 | extern crate safemem;
  | ^^^^^^^^^^^^^^^^^^^^^ help: remove it
  |
note: lint level defined here
 --> /home/dpc/.cargo/registry/src/github.com-1ecc6299db9ec823/base64-0.9.2/src/lib.rs:59:57
  |
59|     missing_docs, trivial_casts, trivial_numeric_casts, unused_extern_crates, unused_import_braces,
  |                                                         ^^^^^^^^^^^^^^^^^^^^

error: Compilation failed, aborting rustdoc

error: Could not document `base64`.

line separation behaves badly for non-empty buffers

(for now I wrapped it in if orig_buf_len == 0 and published it as a patch so it doesn't mess up anyone's day)

so this is a thing that hadn't occurred to me at the time. since it just runs through the whole buffer adding line endings every n chars, data that's already there gets mutated, most often in an extremely undesirable manner. for instance mime-encoding a string that produces an output > 76, then encoding emptystring into the same buffer adds extra line seps

not sure the best way to handle this. the naive solution is to start counting from the starting point of the input, which could make sense if the user were using a large string for assorted data (though the fact that we unconditionally write to the end would make this cumbersome, and anyway there are so many more barriers in the way of this compared to buffer sharing to dodge mallocs in C that I don't see why one would do it in rust)

more likely I assume the user would be incrementally building a single item, in which case we would want to count from the previous line sep, so as to always produce n-character lines with the final line unterminated (e.g. user produces 80 chars, we break after 76 for a length of 82, they add 80 more and we break again at 76*2+2 for 164). but making these kinds of assumptions about what already is in the buffer rather than encoding them into types makes the haskeller in me queasy and start wanting to implement yet another wrapper over strings

(incidental to this, the reserve calculation is wrong in the case where a non-empty buffer would be large enough to add even more line-endings, though only wrong in an "oops, it doubles in capacity" way and doesn't affect the fast loop)

Standard Decoding with OpenSSL

Hello!

I am currently experimenting with the different config options, and I am running into an issue where only the MIME type seems to be decodable with openssl.

For example, I am writing out the base64-encoded bytes with:

let mut buf = String::new();
base64::encode_config_buf(&bytes_to_encode, base64::MIME, &mut buf);
fs::write("/tmp/base64_mime.test", buf.clone()).unwrap();

And then I am trying to decode them with:
openssl base64 -d -in /tmp/base64_mime.txt -out /tmp/out.test

However, if I use any other Config enum, the size of the resulting out.test file is 0.

The issue may not be with base64, but I'm wondering whether this should be expected, or if I am missing a step. The other candidate for causing this issue is in the way I am extracting the bytes_to_encode, which is below:

unsafe fn struct_as_u8(p: &MyStruct, size_of_struct: usize) -> &[u8] {
    ::std::slice::from_raw_parts(
        (p as *const MyStruct) as *const u8,
        size_of_struct,
    )
}

Note: MyStruct is a struct that contains other structs as well as byte arrays and primitive types.

Any help or intuition you could offer would be greatly appreciated!
Thanks!

Padding not needed on URL_SAFE

See https://github.com/rust-lang-nursery/rustc-serialize/blob/master/src/base64.rs#L183-L187 for what rustc-seralize does

That means that right now any URL_SAFE base64 string generated by a library that does that will not be decoded properly.

I believe the encode should skip padding in URL_SAFE mode, and the decode should work the same even if the padding is missing (i.e. decoding c0zGLzKEFWj0VxWuufTXiRMk5tlI5MbGDAYhzaxIYjo= and c0zGLzKEFWj0VxWuufTXiRMk5tlI5MbGDAYhzaxIYjo should yield the same result)

Consider releasing a v0.11

It looks like the last release has been at the beginning of the year, so it's been quite a while and master contains the no_std support that I would like to use in my crate. I'm still blocked by a few other crates as well, but I'd like to merge no_std support in my crate sooner rather than later. So if you don't mind, I'd appreciate a v0.11 within the near future.

Got DecodeError::InvalidLength in 0.10.1, but not in 0.9.3

Hi,

I am not sure if this is a known issue, but I tried upgrading from 0.9.3 to 0.10.1 in my project, and base64 that I could decode before using:

base64::decode_config(&string, base64::MIME)

now breaks with a DecodeError::InvalidLength in 0.10.1, using:

base64::decode_config(&string, base64::STANDARD_NO_PAD)

Same for base64::STANDARD.

I also tried:
base64::decode_config(&string, base64::STANDARD_NO_PAD.decode_allow_trailing_bits(true))

This is the base64 I tried to decode, which should be a valid DER x509 certificate when decoded.

MIID9TCCAt2gAwIBAgIJAN3oDHIqKat0MA0GCSqGSIb3DQEBCwUAMC0xKzApBgNV BAMTIkR1bW15IG5hbWUgZm9yIGNlcnRpZmljYXRlIHJlcXVlc3QwHhcNMTEwNzAx MDQwNzI0WhcNMTEwNzMxMDQwNzI0WjAtMSswKQYDVQQDEyJEdW1teSBuYW1lIGZv ciBjZXJ0aWZpY2F0ZSByZXF1ZXN0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB CgKCAQEArq33oZG7/cq8x0ywS/K5LlNeI5xk4LBZ3FVGzOpAOLAGrnxFR+q4WyJ3 mqlH2tPAjgf8tUiiXGmyBYehON36fK8jELs2k5YwwLoZQttS0UkCX6kY9Mzn5x47 W96NKjrBNiomGSiU+UTsmUn7BolI2/ZiwdZwel0WSCICh7EgAeraCJbGi2MDlGKQ +2GPhoiLrTCRge/DYvVtBtI7dLCp6jhvyVIv0KYSXtZ+z31AlukMW6khBtEAhve+ +yqkQXmG72Bn3MvuJ1vozX+pki3E/BfSnt3/luJVJByUuz22SF+Dh9BsVj8TtoT5 1g+DrFLitPtIpo1SWwKOXrs4XpBMbwIDAQABo4IBFjCCARIwDwYDVR0TAQH/BAUw AwEB/zAdBgNVHQ4EFgQUvVwQP/y4jndc/3/rNbih8OXT7E0wDgYDVR0PAQH/BAQD AgEGMGoGCCsGAQUFBwELBF4wXDAoBggrBgEFBQcwBYYccnN5bmM6Ly9sb2NhbGhv c3Q6NDQwNC9ycGtpLzAwBggrBgEFBQcwCoYkcnN5bmM6Ly9sb2NhbGhvc3Q6NDQw NC9ycGtpL3Jvb3QubW5mMCEGCCsGAQUFBwEIAQH/BBIwEKAOMAwwCgIBAAIFAP// //8wJwYIKwYBBQUHAQcBAf8EGDAWMAkEAgABMAMDAQAwCQQCAAIwAwMBADAYBgNV HSABAf8EDjAMMAoGCCsGAQUFBw4CMA0GCSqGSIb3DQEBCwUAA4IBAQABBgG40cDo Q0wEdfHjNpRMe3ibM6DFQKemQG9YSxgG1Cg9o8mjEZ1erF0PoqVmkdbmofkEBJMe 2UTly528jQhXf3qr4P/drs1otDHO/5y874SULA+eMACZ8o2bBzP/r/BFrgw211nY iw0TIiwXMagwzVHtMTe0bb8IDkPJ3f3QzpUBZWIPOuOx2UlB8WKE1glZ7YyHALPl FUWUfUVaVvo7Ylovmb+2OGMefl9JaJeODKOxrMW9b3duQ0dKgzgRpDd+uOLHlKPR CXKs4wBrMLaIVqxCgdR1fJIrGKxJX2qhGiouq1aH9KNr2V+gU23tbUBSjSbmNYbw IYc5/k2M7lo7

Deprecation notice for `std` feature in 0.10

Would you be willing to take a backported 0.10 PR adding deprecation notices for APIs that require new features enabled in 0.11?

My idea was to add the same features as 0.11 has, but have these not affect the public API directly, instead they will just be used to enable deprecation notices on APIs that will require a feature flag:

#[cfg_attr(not(any(feature = "alloc", feature = "std")), deprecated(note = "enable `std` or `alloc` feature"))]

Decode from iterator

Since the 0.10 release removed support for ignoring whitespace, I looked into stripping whitespace myself and found this snippet in the release notes:

filter(|b| !b" \n\t\r\x0b\x0c".contains(b))

We could avoid copying the content by having a function that can decode from an iterator so we can stream the filtered bytes.
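
As a sketch of the requested API (decode_iter is hypothetical, not an existing function; names are illustrative only):

use base64::DecodeError;

// hypothetical: stream-decode from any byte iterator, buffering chunks
// internally and feeding them to the existing slice-based decoder
pub fn decode_iter<I: Iterator<Item = u8>>(_bytes: I) -> Result<Vec<u8>, DecodeError> {
    // a caller could then stream the filtered bytes without an extra copy:
    //   decode_iter(input.iter().copied().filter(|b| !b" \n\t\r\x0b\x0c".contains(b)))
    unimplemented!()
}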

Having trouble decoding url-encoded base64

Here is the URL:
discourse://auth_redirect?payload=FZkVZvyhZylN%2FD2M9OKCeGvGyqxvuhYsLiyM3KGFDRYjHpLqn954hD0Xw5X2%0AniRlGOhf7Qyg9xHehjeinmy6GJc4x5Mny%2FJqnNCxeb5FChut3910PRqYbXMq%0ApR2kzp6ZOdE9JuBp7dS2ZGg6OfRbWpB%2FrQm%2BtO33gdQJQJw8VV5s7BbkNS8K%0AwVUzqyhKtbyEbDhve8J%2FG9htMtE9UhEpxWJIRfxrLoDSLBaq3vz4nETWb%2Byj%0AOf562313RYQ1Tn5S7Vi%2BS9emSCr9psFyHSDqa95yr4HSOu4I1rPtxH3Ew6yC%0AS%2FK%2BmPQMA%2B9ohdU33DqIr6u5Cn%2FpOGIx%2BVZPh5cHtg%3D%3D%0A

Here is my code:

let payload = [long string of base64 above];
let payload_bytes = base64::decode_config(&payload, base64::URL_SAFE).unwrap();

And it errors with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidByte(12, 37)', /checkout/src/libcore/result.rs:906:4

Design Discussion: Make each Config a distinct type

Currently the Config is a struct that contains a CharacterSet and a boolean for padding. CharacterSet is an enum containing the 3 supported alphabets.

I've been toying around with the idea of changing the layout of the Config. The general idea is to make each supported alphabet a distinct type. Each type would then implement some traits to support the encoding/decoding algorithms. I believe with proper inlining this would allow the compiler to eliminate some branches and optimize the code better. I've sketched out a rough prototype of this and the initial results seem to align with expectations.

Encoding benchmarks:

bench_small_input/encode_slice/3                                                                             
                        time:   [8.1793 ns 8.2160 ns 8.2604 ns]
                        thrpt:  [346.36 MiB/s 348.23 MiB/s 349.79 MiB/s]
                 change:
                        time:   [-30.505% -29.744% -28.870%] (p = 0.00 < 0.05)
                        thrpt:  [+40.588% +42.337% +43.895%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
bench_small_input/encode_slice/50                                                                            
                        time:   [37.009 ns 37.237 ns 37.508 ns]
                        thrpt:  [1.2415 GiB/s 1.2505 GiB/s 1.2582 GiB/s]
                 change:
                        time:   [-13.650% -12.534% -11.558%] (p = 0.00 < 0.05)
                        thrpt:  [+13.068% +14.330% +15.808%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
bench_small_input/encode_slice/100                                                                            
                        time:   [60.019 ns 60.255 ns 60.534 ns]
                        thrpt:  [1.5385 GiB/s 1.5456 GiB/s 1.5517 GiB/s]
                 change:
                        time:   [-7.1787% -6.7605% -6.3371%] (p = 0.00 < 0.05)
                        thrpt:  [+6.7658% +7.2507% +7.7339%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
bench_small_input/encode_slice/500                                                                            
                        time:   [280.64 ns 281.54 ns 282.68 ns]
                        thrpt:  [1.6473 GiB/s 1.6540 GiB/s 1.6593 GiB/s]
                 change:
                        time:   [-5.6106% -5.0493% -4.4584%] (p = 0.00 < 0.05)
                        thrpt:  [+4.6665% +5.3179% +5.9440%]
                        Performance has improved.

Decoding and larger sizes also improve, but by much more marginal amounts (~2%). I'm happy to polish it and submit it as a pull request if you're interested, but seeing as it's a pretty substantial change I figured I would open an issue for discussion first.

base64 with embedded whitespace is not supported

I just tried to decode a private key and there does not seem to be support for embedded whitespace. The only thing I can think of is to modify the string before passing it to decode which seems quite wasteful.

Adding Apache2 license

In this thread about deprecating rustc-serialize it was mentioned that this crate would be a good candidate for base64 encoding/decoding, but its license is not entirely compatible with the license of rustc-serialize.

Is it possible to add the Apache2 license as well?
Thanks

Should `decode_*` methods support decoding from a `&[u8]`?

See https://github.com/alicemaz/rust-base64/pull/14/files#diff-b4aea3e418ccdb71239b96952d9cddb6R130

It seems unfortunate to force a user who has some (ascii) bytes that they wish to decode to String-ify them first (at the cost of verifying that the bytes are UTF8, or unsafely creating the string). In the decode logic we simply access the bytes of the string anyway, so it would be little work to expose decoding from bytes.

We already check for every possible invalid byte because the decode tables have 256 entries (with suitable invalid markers), so it wouldn't incur any additional checking beyond what we already have. Thoughts?
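
For reference, a sketch of the loosened bound (this is in fact the shape later releases adopted) and what it buys callers:

// pub fn decode<T: ?Sized + AsRef<[u8]>>(input: &T) -> Result<Vec<u8>, DecodeError>
let from_str = base64::decode("aGVsbG8=").unwrap();
let from_bytes = base64::decode(b"aGVsbG8=").unwrap();
assert_eq!(from_str, from_bytes);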

Link to RFCs for Config Types

Hello! I'm wondering if it might be possible to link to the RFC encoding specification for the different base64 Config options in the documentation?

Support for URL-safe decoding

rustc-serialize supports three types of encoding:

MIME: Configuration for RFC 2045 MIME base64 encoding
STANDARD: Configuration for RFC 4648 standard base64 encoding
URL_SAFE: Configuration for RFC 4648 base64url encoding

In particular, I have a use case for URL-safe encoding/decoding.

Add #![forbid(unsafe_code)]

Now that there is no more unsafe code in version 0.10, would you consider adding #![forbid(unsafe_code)] ?

Stop using unsafe code

A base64 implementation in Rust shouldn't need to use unsafe for performance. The recent buffer overflow wouldn't have occurred if not for the use of unsafe.

Even if we were not able to get the same level of performance with 100% safe code (I'm almost certain we can), I think most if not all users would trade a little performance for increased safety.

Adding a Base64 type

Rust is at its best when you are using strong typing to indicate meaning. While in essence a base64 encoded string is always a [u8], functions that want a base64 encoded string would often rather be more specific than just taking String, &str, [u8], Vec<u8>, etc. Hence a Base64<C> type: representing an encoded string, it could have convenience methods for converting into/from all the various string representations and collections, and would let library authors be more explicit in wanting base64.

It also seems to me to be much more newbie-friendly to show them an API like:

fn send(data: Base64<UrlSafe>)...

Instead of just asking for a string and saying it should be base64:

// Please give me a base64 encoded url-safe string!
fn send(data: String)...

There certainly is an argument that publicly facing APIs shouldn't be asking for data in base64; they should be doing that conversion internally. But even internal to consumer libraries, having concrete types is still very useful for readability and maintainability: you encode something somewhere, and then have to keep annotating your uses of it in other functions and structs as being base64.

This is similar to how Url and Uri types work in other crates. They internally use Strings or Vecs but expose a type to avoid ambiguity when handling it, and then have a range of convenience into()s and from()s to coerce them into the standard types.

The <C> generic is so that you can require the Default or UrlSafe encoding scheme; otherwise you have to manually inspect an unsanitized input string from anywhere you cannot perfectly trust, to check for the illegal characters. Another option would be to have Base64 be an enum of Default and UrlSafe, but that becomes a runtime variant check with the cumbersome need to match on it.
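
A minimal sketch of the proposed newtype (all names here are hypothetical, not crate API):

use std::marker::PhantomData;

// marker types standing in for the character set
pub struct Standard;
pub struct UrlSafe;

// a base64 string tagged with its alphabet
pub struct Base64<C> {
    inner: String,
    _charset: PhantomData<C>,
}

impl<C> Base64<C> {
    // a real implementation would validate `s` against C's alphabet here
    pub fn from_encoded(s: String) -> Self {
        Base64 { inner: s, _charset: PhantomData }
    }

    pub fn as_str(&self) -> &str {
        &self.inner
    }
}

// APIs can then demand the right alphabet at compile time:
// fn send(data: Base64<UrlSafe>) { ... }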

A related issue is #17, where having an actual Base64 type would obviously implement the specialized display.

I wouldn't mind forking and trying to draft out a Base64 type if you are at all interested!

add CI

Would you be amenable to adding CI to this project that runs tests automatically?

Additionally, would you be willing to be explicit about the minimum version of Rust that this crate supports? (e.g., By adding an explicit Rust version to the CI config.)

Trouble encoding/decoding 64 character string

Hi!

I'm trying to encode and decode a [u8; 64] like so:

let input_bytes = &input.to_bytes();
let mut output = String::new();
base64::encode_config_buf(&input_bytes[..32], base64::URL_SAFE, &mut output);
base64::encode_config_buf(&input_bytes[32..], base64::URL_SAFE, &mut output);

let mut decoded = Vec::new();
match base64::decode_config_buf(&output, base64::URL_SAFE, &mut decoded) {
    Ok(_) => (),
    Err(e) => {
        return format!("{}", e);
    }
}

And this keeps failing with

Invalid byte 61, offset 43.

Am I doing something egregiously wrong here? Thanks!

Canonicity error

b"AA=" gets parsed into [0] but is reencoded as b"AA==", violating canonicity.

Potential overflow when calculating buffer size

While discussing the recent buffer overflow in this code, @sneves pointed me to the following line:

https://github.com/alicemaz/rust-base64/blob/master/src/lib.rs#L391

Which has a potential overflow for large inputs, when calculating the size of a buffer to allocate. Depending on further usage of that buffer, this could range from "unnecessarily failing" (as the /4 later ensures the result would have been smaller than the input, barring the overflow) to "potentially catastrophic".

Export the encoding tables

I'd like to reuse the encoding tables for specialised uses, like fixed-length encodings.
It would be great if tables::XX_ENCODE were public.

Broken Roundtrip

Either I'm misunderstanding how base64 should work (very possible), or the following non-roundtrip behavior is a bug:

let input = "FCX/tsDLpubCPKKfIrw4gc+SQkHcaD16s7GI6i/ziYW=";
println!("{:?}", input);
let dec = base64::decode_config(input, base64::STANDARD).unwrap();
println!("{:x?}", dec);
let enc = base64::encode_config(&dec, base64::STANDARD);
println!("{:?}", enc);
assert_eq!(enc, input);

Prints:

"FCX/tsDLpubCPKKfIrw4gc+SQkHcaD16s7GI6i/ziYW="
[14, 25, ff, b6, c0, cb, a6, e6, c2, 3c, a2, 9f, 22, bc, 38, 81, cf, 92, 42, 41, dc, 68, 3d, 7a, b3, b1, 88, ea, 2f, f3, 89, 85]
"FCX/tsDLpubCPKKfIrw4gc+SQkHcaD16s7GI6i/ziYU="

(last non-padding character differs)

The decoded data is 32 bytes long, whereas this online decoder claims that the output should be 33 bytes long, with an additional 0x80 byte at the end.

Version: base64 = "0.9.3"

Release update with recent safemem

Hey!

we're currently looking into packaging base64 for Debian and noticed the most recently released version is using an outdated version of safemem. While we can technically work around this by uploading an outdated version of safemem, we would rather just use the most recent version instead. The current master is using the most recent version of safemem already; could you please consider releasing an update?

Thanks a lot!

How can I decode to String

The given sample code:

    use sha2::{Digest, Sha256}; // assuming the sha2 crate's 0.8-era API
    let mut hasher = Sha256::default();
    hasher.input("Hello world!".as_bytes());
    let base64_hash = base64::encode(hasher.result().as_slice());
    println!("{} is: {:?}",
             &base64_hash,
             &base64::decode(&base64_hash).unwrap()[..]
             );

gives the correct output below:

wFNeS+K3n/2TKRMFQ2v4iTFOSj+uwF7P/Lt98xrZ5Ro= is: [192, 83, 94, 75, 226, 183, 159, 253, 147, 41, 19, 5, 67, 107, 248, 137, 49, 78, 74, 63, 174, 192, 94, 207, 252, 187, 125, 243, 26, 217, 229, 26]

But when I tried decoding it as a string, using the below:

    println!("{} is: {:?} \n {:?}",
             &base64_hash,
             &base64::decode(&base64_hash).unwrap()[..],
             str::from_utf8(&base64::decode(&base64_hash).unwrap()[..]).unwrap()  // <- new
             );

I got the below:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

How can I decode the base64_hash to get back the original input, Hello world!?

It's hard to use our own character set tables. How about making encode and decode modules public?

In some cases, we need to use a non-standard character set, but the CharacterSet enum only provides Standard, UrlSafe, and Crypt (crypt(3)). The function encode_to_slice in mod encode and the function decode_chunk in mod decode can be used for encoding and decoding with our own charset tables. However, they are not public outside the crate. How about making the encode and decode modules public?

Or just add a new trait like

pub trait CharacterSetTrait {
    fn encode_table(&self) -> &'static [u8; 64];
    fn decode_table(&self) -> &'static [u8; 256];
}

And make the INVALID_VALUE constant public so that we can implement the trait for our structures to use our own charset tables.

Serde support

If you add serialize and deserialize functions to this crate, Serde will be able to use it to easily serialize fields to base64 and deserialize from base64.

extern crate base64;

#[derive(Serialize, Deserialize)]
struct MyBytes {
    #[serde(with = "base64")]
    c: Vec<u8>,
}

There is one possible implementation in serde-rs/json#360.
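
A hedged sketch of what that pair of functions could look like, using the crate's encode/decode of that era and mapping decode failures through serde's custom error:

use serde::{Deserialize, Deserializer, Serializer};

pub fn serialize<S: Serializer>(bytes: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
    serializer.serialize_str(&base64::encode(bytes))
}

pub fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
    let s = String::deserialize(deserializer)?;
    base64::decode(&s).map_err(serde::de::Error::custom)
}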

I would expect this to be behind a Cargo feature, something like:

base64 = { version = "0.6", features = ["serde"] }

Please add a ChangeLog file

This should be a short overview of per-release breaking changes and improvements, so that users know what to do when updating, and why.

Ideas for removing the last `unsafe`

The work to remove the other uses of unsafe was awesome. However, I do think it is worthwhile to try to remove all of the uses of unsafe, of which there is currently only one. I think a lot of people are like me and think that unsafe should only be used as a last resort. I think there are alternatives that are workable and even better in terms of usability for certain circumstances than what we have now.

/cc @marshallpierce @alicemaz

Here are three ideas for removing the last use of unsafe:

  1. Just do it, on the assumption the difference in encode_config_buf() performance isn't that critical, because very few applications need high-performance base64 encoding of large data.

  2. Change the signature to use move semantics:

Old:

pub fn encode_config_buf<T: ?Sized + AsRef<[u8]>>(input: &T, config: Config, buf: &mut String) {
}

New:

pub fn encode_config_buf<T: ?Sized + AsRef<[u8]>>(input: &T, config: Config, buf: String) -> String {
    let mut buf_bytes = buf.into_bytes();
    // ... encode into buf_bytes ...
    String::from_utf8(buf_bytes).expect("base64 output is always valid UTF-8")
}
  3. Change the signature to use &mut Vec<u8> instead of &mut String. A caller that has a String would then be able to decide whether they want to use as_mut_vec() for performance reasons, whether they want to use move semantics (like suggestion 2 above), or whether they want to start with a Vec. This also has the advantage that if you already have a Vec<u8>, e.g. because you're writing base64 data into some kind of binary or otherwise non-UTF-8 array, you don't require any extra allocations.

`cargo doc` breaks on 1.28.0-beta.2

I'm going to bet that this is an upstream problem -- pretty sure you ARE actually using safemem. But cargo +beta doc doesn't think so in the newly-cut beta. :(

Console log output
~/…/bug-reports/rust-base64 master$ rustc --version
rustc 1.28.0-beta.2 (5981061e5 2018-06-22)

~/…/bug-reports/rust-base64 master$ cargo doc
    Updating registry `https://github.com/rust-lang/crates.io-index`
 Downloading safemem v0.3.0
    Checking byteorder v1.2.3
    Checking safemem v0.3.0
 Documenting safemem v0.3.0
 Documenting byteorder v1.2.3
 Documenting base64 v0.9.2 (file:///C:/Users/egubler/workspace/personal/bug-reports/rust-base64)
error: unused extern crate
 --> src\line_wrap.rs:1:1
  |
1 | extern crate safemem;
  | ^^^^^^^^^^^^^^^^^^^^^ help: remove it
  |
note: lint level defined here
 --> src\lib.rs:59:57
  |
59|     missing_docs, trivial_casts, trivial_numeric_casts, unused_extern_crates, unused_import_braces,
  |                                                         ^^^^^^^^^^^^^^^^^^^^

error: Compilation failed, aborting rustdoc

error: Could not document `base64`.

Caused by:
  process didn't exit successfully: `rustdoc --crate-name base64 src\lib.rs -o C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\doc -L dependency=C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\debug\deps --extern byteorder=C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\debug\deps\libbyteorder-5d6212fcf45e1323.rmeta --extern safemem=C:\Users\egubler\workspace\personal\bug-reports\rust-base64\target\debug\deps\libsafemem-8a9b675cbece16a6.rmeta` (exit code: 101)

Display wrapper for base64 encoded value

Basically, I want a Base64(&[u8]) wrapper that implements std::fmt::Display and formats into the destination buffer without intermediate memory allocations:

write!(file, "Sec-WebSocket-Accept: {}", Base64(value))

What do you think about this?

encoded_size fixes

@marshallpierce I changed encoded_size to use checked arithmetic and return an Option with 24ead98 (there are a couple of ways an attack using this could have been feasible; thank you to Andrew Ayer for reporting). Presently its two callers will just panic on None. I'm inclined to think this is better than adding Result to encode's returns, since no non-malicious case of hitting usize max seems likely.

Also, some of the line-size calculations are off slightly in non-dangerous ways; I'm gonna fix that and add more test cases after I can merge in your present PRs.
