Comments (18)
I too would like to encourage you to stop using unsafe code.
Even if performance suffers, most projects using base64 are probably not bottlenecked on base64 encoding. Therefore, a general-purpose base64 crate (which the crate name implies this crate is) should choose safety over maximum possible performance.
If a project finds that it is bottlenecked on base64 encoding and has to speed it up at any cost, then it can use a special-purpose implementation that uses unsafe. That project alone should bear the risks of the unsafe code, instead of making all users of this crate bear the risk.
from rust-base64.
Thanks for the link; I ended up using a similar approach with some hand unrolling to get pretty close:
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 91 (1098 MB/s) 94 (1063 MB/s) 3 3.30%
encode_100b_reuse_buf 73 (1369 MB/s) 71 (1408 MB/s) -2 -2.74%
encode_10mib 9,250,468 (1133 MB/s) 9,321,944 (1124 MB/s) 71,476 0.77%
encode_10mib_reuse_buf 5,931,754 (1767 MB/s) 6,253,132 (1676 MB/s) 321,378 5.42%
encode_30mib 27,701,795 (1135 MB/s) 28,699,091 (1096 MB/s) 997,296 3.60%
encode_30mib_reuse_buf 17,828,503 (1764 MB/s) 19,389,466 (1622 MB/s) 1,560,963 8.76%
encode_3b 32 (93 MB/s) 38 (78 MB/s) 6 18.75%
encode_3b_reuse_buf 13 (230 MB/s) 19 (157 MB/s) 6 46.15%
encode_3kib 1,780 (1725 MB/s) 1,693 (1814 MB/s) -87 -4.89%
encode_3kib_reuse_buf 1,766 (1739 MB/s) 1,663 (1847 MB/s) -103 -5.83%
encode_3kib_reuse_buf_mime 2,047 (1500 MB/s) 1,945 (1579 MB/s) -102 -4.98%
encode_3mib 2,522,124 (1247 MB/s) 2,604,037 (1208 MB/s) 81,913 3.25%
encode_3mib_reuse_buf 1,764,377 (1782 MB/s) 1,767,964 (1779 MB/s) 3,587 0.20%
encode_500b 321 (1557 MB/s) 313 (1597 MB/s) -8 -2.49%
encode_500b_reuse_buf 304 (1644 MB/s) 283 (1766 MB/s) -21 -6.91%
encode_500b_reuse_buf_mime 369 (1355 MB/s) 356 (1404 MB/s) -13 -3.52%
encode_50b 62 (806 MB/s) 64 (781 MB/s) 2 3.23%
encode_50b_reuse_buf 42 (1190 MB/s) 50 (1000 MB/s) 8 19.05%
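For illustration, the "hand unrolling" shape might look roughly like this (a sketch with a hypothetical helper name, not the crate's actual code): four table lookups per 3-byte input group, with both slices re-cut up front so the compiler can prove the indexing is in bounds.

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical sketch: encode complete 3-byte groups only, unrolled to
// four table lookups per group. Padding/partial-group handling omitted.
fn encode_full_groups(input: &[u8], output: &mut [u8]) {
    let groups = input.len() / 3;
    // Re-cut both slices to provably related lengths so the optimizer
    // can hoist the bounds checks out of the loop.
    let input = &input[..groups * 3];
    let output = &mut output[..groups * 4];
    for g in 0..groups {
        let (i, o) = (g * 3, g * 4);
        let n = u32::from(input[i]) << 16
            | u32::from(input[i + 1]) << 8
            | u32::from(input[i + 2]);
        output[o] = STANDARD[(n >> 18) as usize & 63];
        output[o + 1] = STANDARD[(n >> 12) as usize & 63];
        output[o + 2] = STANDARD[(n >> 6) as usize & 63];
        output[o + 3] = STANDARD[n as usize & 63];
    }
}
```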
from rust-base64.
I had a play with removing the unsafe parts to see how the performance changed. First, a naive try:
$ cargo benchcmp master-bench safe-bench
name master-bench ns/iter safe-bench ns/iter diff ns/iter diff %
decode_100b 118 (847 MB/s) 116 (862 MB/s) -2 -1.69%
decode_100b_reuse_buf 94 (1063 MB/s) 94 (1063 MB/s) 0 0.00%
decode_10mib 10,487,708 (999 MB/s) 11,553,858 (907 MB/s) 1,066,150 10.17%
decode_10mib_reuse_buf 8,386,518 (1250 MB/s) 8,689,905 (1206 MB/s) 303,387 3.62%
decode_30mib 31,998,505 (983 MB/s) 33,568,279 (937 MB/s) 1,569,774 4.91%
decode_30mib_reuse_buf 25,030,342 (1256 MB/s) 26,220,747 (1199 MB/s) 1,190,405 4.76%
decode_3b 46 (86 MB/s) 44 (90 MB/s) -2 -4.35%
decode_3b_reuse_buf 21 (190 MB/s) 22 (181 MB/s) 1 4.76%
decode_3kib 2,418 (1270 MB/s) 2,455 (1251 MB/s) 37 1.53%
decode_3kib_reuse_buf 2,441 (1258 MB/s) 2,349 (1307 MB/s) -92 -3.77%
decode_3mib 3,345,550 (940 MB/s) 3,228,621 (974 MB/s) -116,929 -3.50%
decode_3mib_reuse_buf 2,587,417 (1215 MB/s) 2,480,115 (1268 MB/s) -107,302 -4.15%
decode_500b 417 (1199 MB/s) 422 (1184 MB/s) 5 1.20%
decode_500b_reuse_buf 397 (1259 MB/s) 385 (1298 MB/s) -12 -3.02%
decode_50b 80 (650 MB/s) 82 (634 MB/s) 2 2.50%
decode_50b_reuse_buf 55 (945 MB/s) 55 (945 MB/s) 0 0.00%
encode_100b 104 (961 MB/s) 343 (291 MB/s) 239 229.81%
encode_100b_reuse_buf 82 (1219 MB/s) 342 (292 MB/s) 260 317.07%
encode_10mib 10,487,488 (999 MB/s) 41,560,424 (252 MB/s) 31,072,936 296.29%
encode_10mib_reuse_buf 6,663,982 (1573 MB/s) 34,109,734 (307 MB/s) 27,445,752 411.85%
encode_30mib 31,732,310 (991 MB/s) 114,060,373 (275 MB/s) 82,328,063 259.45%
encode_30mib_reuse_buf 19,926,666 (1578 MB/s) 104,935,188 (299 MB/s) 85,008,522 426.61%
encode_3b 36 (83 MB/s) 44 (68 MB/s) 8 22.22%
encode_3b_reuse_buf 15 (200 MB/s) 21 (142 MB/s) 6 40.00%
encode_3kib 1,942 (1581 MB/s) 9,901 (310 MB/s) 7,959 409.84%
encode_3kib_reuse_buf 1,914 (1605 MB/s) 10,142 (302 MB/s) 8,228 429.89%
encode_3mib 3,075,370 (1022 MB/s) 11,194,362 (281 MB/s) 8,118,992 264.00%
encode_3mib_reuse_buf 1,987,322 (1582 MB/s) 10,523,365 (298 MB/s) 8,536,043 429.52%
encode_500b 349 (1432 MB/s) 1,629 (306 MB/s) 1,280 366.76%
encode_500b_reuse_buf 336 (1488 MB/s) 1,629 (306 MB/s) 1,293 384.82%
encode_50b 67 (746 MB/s) 202 (247 MB/s) 135 201.49%
encode_50b_reuse_buf 45 (1111 MB/s) 178 (280 MB/s) 133 295.56%
That seems like a steep drop.
An improved try:
$ cargo benchcmp master-bench safe5-bench
name master-bench ns/iter safe5-bench ns/iter diff ns/iter diff %
decode_100b 118 (847 MB/s) 116 (862 MB/s) -2 -1.69%
decode_100b_reuse_buf 94 (1063 MB/s) 98 (1020 MB/s) 4 4.26%
decode_10mib 10,487,708 (999 MB/s) 10,311,384 (1016 MB/s) -176,324 -1.68%
decode_10mib_reuse_buf 8,386,518 (1250 MB/s) 8,041,441 (1303 MB/s) -345,077 -4.11%
decode_30mib 31,998,505 (983 MB/s) 30,954,011 (1016 MB/s) -1,044,494 -3.26%
decode_30mib_reuse_buf 25,030,342 (1256 MB/s) 24,426,738 (1287 MB/s) -603,604 -2.41%
decode_3b 46 (86 MB/s) 45 (88 MB/s) -1 -2.17%
decode_3b_reuse_buf 21 (190 MB/s) 21 (190 MB/s) 0 0.00%
decode_3kib 2,418 (1270 MB/s) 2,475 (1241 MB/s) 57 2.36%
decode_3kib_reuse_buf 2,441 (1258 MB/s) 2,450 (1253 MB/s) 9 0.37%
decode_3mib 3,345,550 (940 MB/s) 3,037,194 (1035 MB/s) -308,356 -9.22%
decode_3mib_reuse_buf 2,587,417 (1215 MB/s) 2,309,580 (1362 MB/s) -277,837 -10.74%
decode_500b 417 (1199 MB/s) 401 (1246 MB/s) -16 -3.84%
decode_500b_reuse_buf 397 (1259 MB/s) 382 (1308 MB/s) -15 -3.78%
decode_50b 80 (650 MB/s) 77 (675 MB/s) -3 -3.75%
decode_50b_reuse_buf 55 (945 MB/s) 54 (962 MB/s) -1 -1.82%
encode_100b 104 (961 MB/s) 186 (537 MB/s) 82 78.85%
encode_100b_reuse_buf 82 (1219 MB/s) 161 (621 MB/s) 79 96.34%
encode_10mib 10,487,488 (999 MB/s) 17,835,534 (587 MB/s) 7,348,046 70.06%
encode_10mib_reuse_buf 6,663,982 (1573 MB/s) 14,078,296 (744 MB/s) 7,414,314 111.26%
encode_30mib 31,732,310 (991 MB/s) 54,274,642 (579 MB/s) 22,542,332 71.04%
encode_30mib_reuse_buf 19,926,666 (1578 MB/s) 44,353,283 (709 MB/s) 24,426,617 122.58%
encode_3b 36 (83 MB/s) 53 (56 MB/s) 17 47.22%
encode_3b_reuse_buf 15 (200 MB/s) 30 (100 MB/s) 15 100.00%
encode_3kib 1,942 (1581 MB/s) 4,269 (719 MB/s) 2,327 119.82%
encode_3kib_reuse_buf 1,914 (1605 MB/s) 4,200 (731 MB/s) 2,286 119.44%
encode_3mib 3,075,370 (1022 MB/s) 5,506,443 (571 MB/s) 2,431,073 79.05%
encode_3mib_reuse_buf 1,987,322 (1582 MB/s) 4,511,933 (697 MB/s) 2,524,611 127.04%
encode_500b 349 (1432 MB/s) 817 (611 MB/s) 468 134.10%
encode_500b_reuse_buf 336 (1488 MB/s) 838 (596 MB/s) 502 149.40%
encode_50b 67 (746 MB/s) 124 (403 MB/s) 57 85.07%
encode_50b_reuse_buf 45 (1111 MB/s) 107 (467 MB/s) 62 137.78%
I guess the remaining question is whether this crate has hard performance targets in the 10Gbps range.
from rust-base64.
Here's another approach with a smaller performance hit:
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 90 (1111 MB/s) 114 (877 MB/s) 24 26.67%
encode_100b_reuse_buf 72 (1388 MB/s) 91 (1098 MB/s) 19 26.39%
encode_10mib 9,282,507 (1129 MB/s) 11,206,626 (935 MB/s) 1,924,119 20.73%
encode_10mib_reuse_buf 5,928,751 (1768 MB/s) 8,247,930 (1271 MB/s) 2,319,179 39.12%
encode_30mib 27,895,742 (1127 MB/s) 34,408,473 (914 MB/s) 6,512,731 23.35%
encode_30mib_reuse_buf 17,837,434 (1763 MB/s) 25,276,966 (1244 MB/s) 7,439,532 41.71%
encode_3b 32 (93 MB/s) 38 (78 MB/s) 6 18.75%
encode_3b_reuse_buf 13 (230 MB/s) 18 (166 MB/s) 5 38.46%
encode_3kib 1,781 (1724 MB/s) 2,309 (1330 MB/s) 528 29.65%
encode_3kib_reuse_buf 1,761 (1744 MB/s) 2,284 (1345 MB/s) 523 29.70%
encode_3kib_reuse_buf_mime 2,032 (1511 MB/s) 2,571 (1194 MB/s) 539 26.53%
encode_3mib 2,539,998 (1238 MB/s) 3,192,098 (985 MB/s) 652,100 25.67%
encode_3mib_reuse_buf 1,767,981 (1779 MB/s) 2,362,400 (1331 MB/s) 594,419 33.62%
encode_500b 321 (1557 MB/s) 403 (1240 MB/s) 82 25.55%
encode_500b_reuse_buf 303 (1650 MB/s) 384 (1302 MB/s) 81 26.73%
encode_500b_reuse_buf_mime 367 (1362 MB/s) 444 (1126 MB/s) 77 20.98%
encode_50b 61 (819 MB/s) 72 (694 MB/s) 11 18.03%
encode_50b_reuse_buf 42 (1190 MB/s) 53 (943 MB/s) 11 26.19%
This one is based on #27. That PR introduces other usages of unsafe that apply only to line wrapping, but its restructuring of the encode path made this approach a little easier to implement. It also included a reworked encoded_size() that is precise for all line wrap / padding combinations, which this approach needed. Now, other than the aforementioned line wrapping, the use of unsafe is limited to accessing the contents of the output String as a Vec<u8>.
from rust-base64.
It's hard to be too up in arms about performance when we just had a security issue. :(
I did see significant performance regressions without unsafe when I was working on that some months ago, but it's certainly worth another look to see if things have improved in the intervening Rust releases. Maybe sprinkling a few key asserts around will help the optimizer get it right.
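One shape the "key asserts" idea can take, sketched with a hypothetical helper (not the crate's code): a single up-front assert tells the optimizer the output is long enough, so the write inside the loop doesn't need its own bounds check on every pass.

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical sketch: map 6-bit values to alphabet bytes. The assert
// establishes `out.len() >= sextets.len()` once, which lets LLVM elide
// the per-iteration check on `out[i]`.
fn sextets_to_ascii(sextets: &[u8], out: &mut [u8]) {
    assert!(out.len() >= sextets.len());
    for (i, &s) in sextets.iter().enumerate() {
        out[i] = STANDARD[usize::from(s & 63)];
    }
}
```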
from rust-base64.
I wonder if further optimizations would be possible by limiting the customization of the character set such that A-Za-z0-9 are always fixed and only the non-digit, non-letter characters could be customized. Then most table lookups could probably be avoided.
from rust-base64.
Expanding on my previous comment: a few crypto-specific base64 implementations I've seen recently (e.g. BoringSSL's) switched to a constant-time implementation that doesn't use table lookups at all. I imagine eliminating table lookups entirely could improve performance, but it would be impractical unless the customization were strictly limited.
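The lookup-free mapping for the standard alphabet is just range arithmetic. A branchy sketch (hypothetical helper name); the constant-time implementations mentioned compute the same offsets with masked selects instead of branches:

```rust
// Map a 6-bit value to its standard-alphabet ASCII byte without a
// table. Branchy for clarity; constant-time variants replace the
// match with branchless selects.
fn sextet_to_ascii(v: u8) -> u8 {
    match v & 63 {
        v @ 0..=25 => b'A' + v,         // 0..=25  -> 'A'..='Z'
        v @ 26..=51 => b'a' + (v - 26), // 26..=51 -> 'a'..='z'
        v @ 52..=61 => b'0' + (v - 52), // 52..=61 -> '0'..='9'
        62 => b'+',
        _ => b'/',
    }
}
```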
from rust-base64.
Actually, the table lookups (according to perf and friends) seem to be really cheap -- the slowdown here seems to be due to bounds checking when writing to the output slice. :/
from rust-base64.
Side note -- writing into the output slice directly, rather than packaging bytes into a u64 and using byteorder to splat them out into the output slice, was significantly slower, because it involves more bounds checks.
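The "pack into a u64" trick looks roughly like this sketch: 6 input bytes become 8 output bytes accumulated in one u64 and written with a single 8-byte copy, so the compiler sees one bounds check instead of eight. (std's `to_be_bytes` stands in here for byteorder's big-endian write; the helper name is hypothetical.)

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Two 3-byte groups -> eight alphabet bytes packed big-endian into a
// u64, then splatted into the output with one copy.
fn encode_6_to_8(input: &[u8; 6], output: &mut [u8]) {
    let mut word: u64 = 0;
    for chunk in input.chunks_exact(3) {
        let n = u32::from(chunk[0]) << 16
            | u32::from(chunk[1]) << 8
            | u32::from(chunk[2]);
        for shift in [18u32, 12, 6, 0] {
            word = (word << 8) | u64::from(STANDARD[(n >> shift) as usize & 63]);
        }
    }
    // One bounds check for all eight output bytes.
    output[..8].copy_from_slice(&word.to_be_bytes());
}
```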
from rust-base64.
You might find this useful then: https://users.rust-lang.org/t/how-to-zip-two-slices-efficiently/2048. Note that the suggestions to use zip() won't work here because the source and destination are different lengths, but some of the tricks about being explicit about lengths being the same, or about one being larger than the other, might work.
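One way around the different-lengths problem is to zip chunk iterators rather than the raw slices: 3-byte input groups paired with 4-byte output groups have equal counts, and `chunks_exact` gives the optimizer fixed-size chunks, which also removes in-loop bounds checks. A hypothetical sketch:

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// zip() over chunks_exact(3) / chunks_exact_mut(4): both iterators
// yield the same number of items even though the slices differ in
// length, and each chunk's length is known to the compiler.
fn encode_zipped(src: &[u8], dst: &mut [u8]) {
    for (s, d) in src.chunks_exact(3).zip(dst.chunks_exact_mut(4)) {
        let n = u32::from(s[0]) << 16 | u32::from(s[1]) << 8 | u32::from(s[2]);
        d[0] = STANDARD[(n >> 18) as usize & 63];
        d[1] = STANDARD[(n >> 12) as usize & 63];
        d[2] = STANDARD[(n >> 6) as usize & 63];
        d[3] = STANDARD[n as usize & 63];
    }
}
```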
from rust-base64.
Also, maybe some of the tricks used to implement str and String UTF-8 support might be useful here. Base64 is a simpler transformation than UTF-8 in terms of the relationship between input length and output length.
from rust-base64.
Line wrap is now also unsafe-free in the minimize-unsafe branch. The only unsafe left is the one that allows treating a String as a Vec<u8>. That one I think is defensible: if things were to go awry, it would not cause memory safety issues, only invalid UTF-8 byte sequences.
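In isolation, that remaining unsafe pattern looks roughly like this (hypothetical helper name, not the branch's exact code):

```rust
// Append ASCII bytes to a String via its underlying Vec<u8>.
fn append_ascii(out: &mut String, bytes: &[u8]) {
    debug_assert!(bytes.is_ascii());
    // Safety argument under discussion: every byte of a base64
    // alphabet is ASCII, so the String stays valid UTF-8. A buggy
    // encoder would corrupt the UTF-8 invariant, not memory -- which
    // is exactly the point being debated below.
    unsafe { out.as_mut_vec() }.extend_from_slice(bytes);
}
```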
from rust-base64.
@marshallpierce Internally, the str and String types do a bunch of unsafe stuff assuming that the text is valid UTF-8. For example, say the Vec<u8> ends with a byte of the form 0b11110???. The internals of str will then assume there are at least 3 more bytes after it, and will interpret those (out-of-bounds) bytes as if they were valid UTF-8. So I don't think it is clear that invalid UTF-8 is OK.
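The hazard can be seen with safe code: a buffer ending in a 0b11110??? lead byte promises three continuation bytes that aren't there. The checked conversion rejects it, whereas an unchecked conversion would hand str internals a sequence they assume extends past the end.

```rust
// Checked conversion: returns false for byte sequences that str's
// unsafe internals must never be allowed to see.
fn is_valid_text(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```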
Regardless, I think it is worthwhile to try to do everything without unsafe unless there's a really strong argument for not doing so, which I don't see.
from rust-base64.
An aside from someone who has been intending to use this library:
There are plenty of cases (e.g. storing binary data in JSON) where a base64 parser can be a performance bottleneck. So I think near-maximum speed is an important goal; not more important than correctness, of course, but in principle if a core functionality like this cannot be made fast enough without unsafe, then the burden of demonstrating correctness should be moved from the compiler to tests. (It looks as though the "fast enough" strategies have already been found, though.)
It's easy to import instincts from other languages ("oh, it's probably not the bottleneck"), but I think this is less true in Rust: if performance truly wasn't the bottleneck anywhere in their code, people probably wouldn't be using Rust in the first place.
from rust-base64.
Here are the options I see.
One approach would be to switch the lower level encoding logic to just String.push_str() for each output byte. In other words, use String as the collection type rather than a [u8] or Vec<u8>. I did not bother to benchmark this (though I am not optimistic about it) because it would be contrary to my goal of avoiding heap-ful types. Encoding to u8 slices enables heap-free niceties like the Display implementation in display-wrapper, which will also easily lead to a heap-free implementation of #20. I also want to expose encoding to a Vec<u8> (or maybe even a &mut [u8]) in the public API for people who don't care about the String-ness of the output, so for implementation sanity I don't want to push the String type further down into the implementation innards.
One way to continue to encode into a [u8] would be to encode into a slice (which could be a stack-resident array from the caller) and then let the caller use str::from_utf8 after each batch of bytes is encoded and append the resulting str to a String. However, this has a pretty dreadful performance cost. Here's the performance loss just from doing UTF-8 validation on each batch of bytes (str::from_utf8 and throwing away the unwrapped result):
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 94 (1063 MB/s) 118 (847 MB/s) 24 25.53%
encode_100b_reuse_buf 71 (1408 MB/s) 97 (1030 MB/s) 26 36.62%
encode_10mib 9,376,605 (1118 MB/s) 11,770,334 (890 MB/s) 2,393,729 25.53%
encode_10mib_reuse_buf 6,300,363 (1664 MB/s) 8,673,367 (1208 MB/s) 2,373,004 37.66%
encode_30mib 28,628,693 (1098 MB/s) 35,876,681 (876 MB/s) 7,247,988 25.32%
encode_30mib_reuse_buf 19,935,389 (1577 MB/s) 26,503,729 (1186 MB/s) 6,568,340 32.95%
encode_3b 40 (75 MB/s) 41 (73 MB/s) 1 2.50%
encode_3b_reuse_buf 18 (166 MB/s) 23 (130 MB/s) 5 27.78%
encode_3kib 1,691 (1816 MB/s) 2,394 (1283 MB/s) 703 41.57%
encode_3kib_reuse_buf 1,668 (1841 MB/s) 2,377 (1292 MB/s) 709 42.51%
encode_3kib_reuse_buf_mime 2,097 (1464 MB/s) 2,805 (1095 MB/s) 708 33.76%
encode_3mib 2,697,562 (1166 MB/s) 3,343,944 (940 MB/s) 646,382 23.96%
encode_3mib_reuse_buf 1,770,041 (1777 MB/s) 2,501,624 (1257 MB/s) 731,583 41.33%
encode_500b 309 (1618 MB/s) 445 (1123 MB/s) 136 44.01%
encode_500b_reuse_buf 286 (1748 MB/s) 426 (1173 MB/s) 140 48.95%
encode_500b_reuse_buf_mime 374 (1336 MB/s) 508 (984 MB/s) 134 35.83%
encode_50b 65 (769 MB/s) 77 (649 MB/s) 12 18.46%
encode_50b_reuse_buf 47 (1063 MB/s) 56 (892 MB/s) 9 19.15%
That's not counting the cost of copying the bytes into the String, the additional bounds checking that would incur, or the overhead of switching to batch-by-batch encoding logic (which, judging by the Display wrapper that basically does this, would be another 10-20%). So, that's not great.
Another approach would be to run str::from_utf8 post hoc on the slice that's been written to, otherwise leaving the logic as is. This has a less ruinous performance cost on my machine:
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 94 (1063 MB/s) 102 (980 MB/s) 8 8.51%
encode_100b_reuse_buf 71 (1408 MB/s) 80 (1250 MB/s) 9 12.68%
encode_10mib 9,376,605 (1118 MB/s) 10,327,063 (1015 MB/s) 950,458 10.14%
encode_10mib_reuse_buf 6,300,363 (1664 MB/s) 7,206,090 (1455 MB/s) 905,727 14.38%
encode_30mib 28,628,693 (1098 MB/s) 32,278,805 (974 MB/s) 3,650,112 12.75%
encode_30mib_reuse_buf 19,935,389 (1577 MB/s) 22,757,454 (1382 MB/s) 2,822,065 14.16%
encode_3b 40 (75 MB/s) 42 (71 MB/s) 2 5.00%
encode_3b_reuse_buf 18 (166 MB/s) 23 (130 MB/s) 5 27.78%
encode_3kib 1,691 (1816 MB/s) 1,818 (1689 MB/s) 127 7.51%
encode_3kib_reuse_buf 1,668 (1841 MB/s) 1,821 (1686 MB/s) 153 9.17%
encode_3kib_reuse_buf_mime 2,097 (1464 MB/s) 2,232 (1376 MB/s) 135 6.44%
encode_3mib 2,697,562 (1166 MB/s) 2,782,393 (1130 MB/s) 84,831 3.14%
encode_3mib_reuse_buf 1,770,041 (1777 MB/s) 1,933,529 (1626 MB/s) 163,488 9.24%
encode_500b 309 (1618 MB/s) 329 (1519 MB/s) 20 6.47%
encode_500b_reuse_buf 286 (1748 MB/s) 306 (1633 MB/s) 20 6.99%
encode_500b_reuse_buf_mime 374 (1336 MB/s) 403 (1240 MB/s) 29 7.75%
encode_50b 65 (769 MB/s) 70 (714 MB/s) 5 7.69%
encode_50b_reuse_buf 47 (1063 MB/s) 50 (1000 MB/s) 3 6.38%
I could maybe groan and tolerate that if it weren't pretty much a best-case scenario: I have a fast Xeon with tons of cache and memory bandwidth and a clever prefetcher. On a lesser system (which I don't have easy access to), it may be significantly worse. If anyone has interesting systems they want to try this on, let me know.
Furthermore, I'm not really sure how valuable this runtime check is. We still have the unsafe .as_mut_vec(); we're just checking what we wrote into it after the fact. Towards @Ichoran's point about handing off responsibility to tests: could we not achieve a satisfactory level of reliability if instead of adding a check that runs after every encode, we instead have tests that check random inputs with random encoding configs to make sure that they're valid UTF-8? I think we can pretty effectively draw a boundary around the code that does potentially incorrect things with UTF-8 and test the snot out of it. The logic is not so complex as to defeat human inspection, and the correctness criteria for the output are straightforward and easy to exhaustively verify.
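A sketch of what the post-hoc check amounts to, with a toy inner encoder standing in for the real encode path (both helper names are hypothetical):

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Toy inner encoder (full 3-byte groups only); it writes only ASCII
// alphabet bytes, which is the invariant being checked.
fn encode_into_vec(input: &[u8], out: &mut Vec<u8>) {
    for s in input.chunks_exact(3) {
        let n = u32::from(s[0]) << 16 | u32::from(s[1]) << 8 | u32::from(s[2]);
        for shift in [18u32, 12, 6, 0] {
            out.push(STANDARD[(n >> shift) as usize & 63]);
        }
    }
}

// Encode through as_mut_vec() as before, then validate only the bytes
// just written, once, after the fact.
fn encode_checked(input: &[u8], out: &mut String) {
    let start = out.len();
    encode_into_vec(input, unsafe { out.as_mut_vec() });
    std::str::from_utf8(&out.as_bytes()[start..])
        .expect("encoder wrote invalid UTF-8");
}
```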
from rust-base64.
could we not achieve a satisfactory level of reliability if instead of adding a check that runs after every encode, we instead have tests that check random inputs with random encoding configs to make sure that they're valid UTF-8?
Sounds like setting up "cargo fuzz" for this crate, and having it run for a long time, would be useful regardless of how much of the implementation's unsafe-ness can be eliminated. See #21.
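As a std-only stand-in for that idea (a real setup would use cargo-fuzz), a randomized test can push many pseudo-random inputs through a toy encoder and assert every output is valid UTF-8; the encoder and generator here are illustrative, not the crate's code:

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Run `cases` pseudo-random inputs (xorshift64 generator) through a
// toy full-groups-only encoder; assert each output is valid UTF-8.
// Returns the number of cases checked.
fn check_random_inputs(cases: usize) -> usize {
    let mut state = 0x243F_6A88_85A3_08D3u64; // arbitrary nonzero seed
    let mut checked = 0;
    for len in 0..cases {
        let data: Vec<u8> = (0..len)
            .map(|_| {
                state ^= state << 13;
                state ^= state >> 7;
                state ^= state << 17;
                (state >> 24) as u8
            })
            .collect();
        let mut out = Vec::new();
        for s in data.chunks_exact(3) {
            let n = u32::from(s[0]) << 16 | u32::from(s[1]) << 8 | u32::from(s[2]);
            for shift in [18u32, 12, 6, 0] {
                out.push(STANDARD[(n >> shift) as usize & 63]);
            }
        }
        assert!(std::str::from_utf8(&out).is_ok());
        checked += 1;
    }
    checked
}
```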
from rust-base64.
See #26. :)
from rust-base64.
Fixed with #31.
from rust-base64.