Comments (18)
I too would like to encourage you to stop using unsafe code.
Even if performance suffers, most projects using base64 are probably not bottlenecked on base64 encoding. Therefore, a general-purpose base64 crate (which the crate name implies this crate is) should choose safety over maximum possible performance.
If a project finds that it is bottlenecked on base64 encoding and has to speed it up at any cost, then it can use a special-purpose implementation that uses unsafe. That project alone should bear the risks of the unsafe code, instead of making all users of this crate bear the risk.
from rust-base64.
Thanks for the link; I ended up using a similar approach with some hand unrolling to get pretty close:
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 91 (1098 MB/s) 94 (1063 MB/s) 3 3.30%
encode_100b_reuse_buf 73 (1369 MB/s) 71 (1408 MB/s) -2 -2.74%
encode_10mib 9,250,468 (1133 MB/s) 9,321,944 (1124 MB/s) 71,476 0.77%
encode_10mib_reuse_buf 5,931,754 (1767 MB/s) 6,253,132 (1676 MB/s) 321,378 5.42%
encode_30mib 27,701,795 (1135 MB/s) 28,699,091 (1096 MB/s) 997,296 3.60%
encode_30mib_reuse_buf 17,828,503 (1764 MB/s) 19,389,466 (1622 MB/s) 1,560,963 8.76%
encode_3b 32 (93 MB/s) 38 (78 MB/s) 6 18.75%
encode_3b_reuse_buf 13 (230 MB/s) 19 (157 MB/s) 6 46.15%
encode_3kib 1,780 (1725 MB/s) 1,693 (1814 MB/s) -87 -4.89%
encode_3kib_reuse_buf 1,766 (1739 MB/s) 1,663 (1847 MB/s) -103 -5.83%
encode_3kib_reuse_buf_mime 2,047 (1500 MB/s) 1,945 (1579 MB/s) -102 -4.98%
encode_3mib 2,522,124 (1247 MB/s) 2,604,037 (1208 MB/s) 81,913 3.25%
encode_3mib_reuse_buf 1,764,377 (1782 MB/s) 1,767,964 (1779 MB/s) 3,587 0.20%
encode_500b 321 (1557 MB/s) 313 (1597 MB/s) -8 -2.49%
encode_500b_reuse_buf 304 (1644 MB/s) 283 (1766 MB/s) -21 -6.91%
encode_500b_reuse_buf_mime 369 (1355 MB/s) 356 (1404 MB/s) -13 -3.52%
encode_50b 62 (806 MB/s) 64 (781 MB/s) 2 3.23%
encode_50b_reuse_buf 42 (1190 MB/s) 50 (1000 MB/s) 8 19.05%
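For illustration, the "hand unrolling" shape might look roughly like this (a sketch with a hypothetical helper name, not the crate's actual code): four table lookups per 3-byte input group, with both slices re-cut up front so the compiler can prove the indexing is in bounds.

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical sketch: encode complete 3-byte groups only, unrolled to
// four table lookups per group. Padding/partial-group handling omitted.
fn encode_full_groups(input: &[u8], output: &mut [u8]) {
    let groups = input.len() / 3;
    // Re-cut both slices to provably related lengths so the optimizer
    // can hoist the bounds checks out of the loop.
    let input = &input[..groups * 3];
    let output = &mut output[..groups * 4];
    for g in 0..groups {
        let (i, o) = (g * 3, g * 4);
        let n = u32::from(input[i]) << 16
            | u32::from(input[i + 1]) << 8
            | u32::from(input[i + 2]);
        output[o] = STANDARD[(n >> 18) as usize & 63];
        output[o + 1] = STANDARD[(n >> 12) as usize & 63];
        output[o + 2] = STANDARD[(n >> 6) as usize & 63];
        output[o + 3] = STANDARD[n as usize & 63];
    }
}
```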
from rust-base64.
I had a play with removing the unsafe parts to see how the performance changed. First, a naive try:
$ cargo benchcmp master-bench safe-bench
name master-bench ns/iter safe-bench ns/iter diff ns/iter diff %
decode_100b 118 (847 MB/s) 116 (862 MB/s) -2 -1.69%
decode_100b_reuse_buf 94 (1063 MB/s) 94 (1063 MB/s) 0 0.00%
decode_10mib 10,487,708 (999 MB/s) 11,553,858 (907 MB/s) 1,066,150 10.17%
decode_10mib_reuse_buf 8,386,518 (1250 MB/s) 8,689,905 (1206 MB/s) 303,387 3.62%
decode_30mib 31,998,505 (983 MB/s) 33,568,279 (937 MB/s) 1,569,774 4.91%
decode_30mib_reuse_buf 25,030,342 (1256 MB/s) 26,220,747 (1199 MB/s) 1,190,405 4.76%
decode_3b 46 (86 MB/s) 44 (90 MB/s) -2 -4.35%
decode_3b_reuse_buf 21 (190 MB/s) 22 (181 MB/s) 1 4.76%
decode_3kib 2,418 (1270 MB/s) 2,455 (1251 MB/s) 37 1.53%
decode_3kib_reuse_buf 2,441 (1258 MB/s) 2,349 (1307 MB/s) -92 -3.77%
decode_3mib 3,345,550 (940 MB/s) 3,228,621 (974 MB/s) -116,929 -3.50%
decode_3mib_reuse_buf 2,587,417 (1215 MB/s) 2,480,115 (1268 MB/s) -107,302 -4.15%
decode_500b 417 (1199 MB/s) 422 (1184 MB/s) 5 1.20%
decode_500b_reuse_buf 397 (1259 MB/s) 385 (1298 MB/s) -12 -3.02%
decode_50b 80 (650 MB/s) 82 (634 MB/s) 2 2.50%
decode_50b_reuse_buf 55 (945 MB/s) 55 (945 MB/s) 0 0.00%
encode_100b 104 (961 MB/s) 343 (291 MB/s) 239 229.81%
encode_100b_reuse_buf 82 (1219 MB/s) 342 (292 MB/s) 260 317.07%
encode_10mib 10,487,488 (999 MB/s) 41,560,424 (252 MB/s) 31,072,936 296.29%
encode_10mib_reuse_buf 6,663,982 (1573 MB/s) 34,109,734 (307 MB/s) 27,445,752 411.85%
encode_30mib 31,732,310 (991 MB/s) 114,060,373 (275 MB/s) 82,328,063 259.45%
encode_30mib_reuse_buf 19,926,666 (1578 MB/s) 104,935,188 (299 MB/s) 85,008,522 426.61%
encode_3b 36 (83 MB/s) 44 (68 MB/s) 8 22.22%
encode_3b_reuse_buf 15 (200 MB/s) 21 (142 MB/s) 6 40.00%
encode_3kib 1,942 (1581 MB/s) 9,901 (310 MB/s) 7,959 409.84%
encode_3kib_reuse_buf 1,914 (1605 MB/s) 10,142 (302 MB/s) 8,228 429.89%
encode_3mib 3,075,370 (1022 MB/s) 11,194,362 (281 MB/s) 8,118,992 264.00%
encode_3mib_reuse_buf 1,987,322 (1582 MB/s) 10,523,365 (298 MB/s) 8,536,043 429.52%
encode_500b 349 (1432 MB/s) 1,629 (306 MB/s) 1,280 366.76%
encode_500b_reuse_buf 336 (1488 MB/s) 1,629 (306 MB/s) 1,293 384.82%
encode_50b 67 (746 MB/s) 202 (247 MB/s) 135 201.49%
encode_50b_reuse_buf 45 (1111 MB/s) 178 (280 MB/s) 133 295.56%
That seems like a steep drop.
An improved try:
$ cargo benchcmp master-bench safe5-bench
name master-bench ns/iter safe5-bench ns/iter diff ns/iter diff %
decode_100b 118 (847 MB/s) 116 (862 MB/s) -2 -1.69%
decode_100b_reuse_buf 94 (1063 MB/s) 98 (1020 MB/s) 4 4.26%
decode_10mib 10,487,708 (999 MB/s) 10,311,384 (1016 MB/s) -176,324 -1.68%
decode_10mib_reuse_buf 8,386,518 (1250 MB/s) 8,041,441 (1303 MB/s) -345,077 -4.11%
decode_30mib 31,998,505 (983 MB/s) 30,954,011 (1016 MB/s) -1,044,494 -3.26%
decode_30mib_reuse_buf 25,030,342 (1256 MB/s) 24,426,738 (1287 MB/s) -603,604 -2.41%
decode_3b 46 (86 MB/s) 45 (88 MB/s) -1 -2.17%
decode_3b_reuse_buf 21 (190 MB/s) 21 (190 MB/s) 0 0.00%
decode_3kib 2,418 (1270 MB/s) 2,475 (1241 MB/s) 57 2.36%
decode_3kib_reuse_buf 2,441 (1258 MB/s) 2,450 (1253 MB/s) 9 0.37%
decode_3mib 3,345,550 (940 MB/s) 3,037,194 (1035 MB/s) -308,356 -9.22%
decode_3mib_reuse_buf 2,587,417 (1215 MB/s) 2,309,580 (1362 MB/s) -277,837 -10.74%
decode_500b 417 (1199 MB/s) 401 (1246 MB/s) -16 -3.84%
decode_500b_reuse_buf 397 (1259 MB/s) 382 (1308 MB/s) -15 -3.78%
decode_50b 80 (650 MB/s) 77 (675 MB/s) -3 -3.75%
decode_50b_reuse_buf 55 (945 MB/s) 54 (962 MB/s) -1 -1.82%
encode_100b 104 (961 MB/s) 186 (537 MB/s) 82 78.85%
encode_100b_reuse_buf 82 (1219 MB/s) 161 (621 MB/s) 79 96.34%
encode_10mib 10,487,488 (999 MB/s) 17,835,534 (587 MB/s) 7,348,046 70.06%
encode_10mib_reuse_buf 6,663,982 (1573 MB/s) 14,078,296 (744 MB/s) 7,414,314 111.26%
encode_30mib 31,732,310 (991 MB/s) 54,274,642 (579 MB/s) 22,542,332 71.04%
encode_30mib_reuse_buf 19,926,666 (1578 MB/s) 44,353,283 (709 MB/s) 24,426,617 122.58%
encode_3b 36 (83 MB/s) 53 (56 MB/s) 17 47.22%
encode_3b_reuse_buf 15 (200 MB/s) 30 (100 MB/s) 15 100.00%
encode_3kib 1,942 (1581 MB/s) 4,269 (719 MB/s) 2,327 119.82%
encode_3kib_reuse_buf 1,914 (1605 MB/s) 4,200 (731 MB/s) 2,286 119.44%
encode_3mib 3,075,370 (1022 MB/s) 5,506,443 (571 MB/s) 2,431,073 79.05%
encode_3mib_reuse_buf 1,987,322 (1582 MB/s) 4,511,933 (697 MB/s) 2,524,611 127.04%
encode_500b 349 (1432 MB/s) 817 (611 MB/s) 468 134.10%
encode_500b_reuse_buf 336 (1488 MB/s) 838 (596 MB/s) 502 149.40%
encode_50b 67 (746 MB/s) 124 (403 MB/s) 57 85.07%
encode_50b_reuse_buf 45 (1111 MB/s) 107 (467 MB/s) 62 137.78%
I guess the remaining question is whether this crate has hard performance targets in the 10Gbps range.
from rust-base64.
Here's another approach with a smaller performance hit:
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 90 (1111 MB/s) 114 (877 MB/s) 24 26.67%
encode_100b_reuse_buf 72 (1388 MB/s) 91 (1098 MB/s) 19 26.39%
encode_10mib 9,282,507 (1129 MB/s) 11,206,626 (935 MB/s) 1,924,119 20.73%
encode_10mib_reuse_buf 5,928,751 (1768 MB/s) 8,247,930 (1271 MB/s) 2,319,179 39.12%
encode_30mib 27,895,742 (1127 MB/s) 34,408,473 (914 MB/s) 6,512,731 23.35%
encode_30mib_reuse_buf 17,837,434 (1763 MB/s) 25,276,966 (1244 MB/s) 7,439,532 41.71%
encode_3b 32 (93 MB/s) 38 (78 MB/s) 6 18.75%
encode_3b_reuse_buf 13 (230 MB/s) 18 (166 MB/s) 5 38.46%
encode_3kib 1,781 (1724 MB/s) 2,309 (1330 MB/s) 528 29.65%
encode_3kib_reuse_buf 1,761 (1744 MB/s) 2,284 (1345 MB/s) 523 29.70%
encode_3kib_reuse_buf_mime 2,032 (1511 MB/s) 2,571 (1194 MB/s) 539 26.53%
encode_3mib 2,539,998 (1238 MB/s) 3,192,098 (985 MB/s) 652,100 25.67%
encode_3mib_reuse_buf 1,767,981 (1779 MB/s) 2,362,400 (1331 MB/s) 594,419 33.62%
encode_500b 321 (1557 MB/s) 403 (1240 MB/s) 82 25.55%
encode_500b_reuse_buf 303 (1650 MB/s) 384 (1302 MB/s) 81 26.73%
encode_500b_reuse_buf_mime 367 (1362 MB/s) 444 (1126 MB/s) 77 20.98%
encode_50b 61 (819 MB/s) 72 (694 MB/s) 11 18.03%
encode_50b_reuse_buf 42 (1190 MB/s) 53 (943 MB/s) 11 26.19%
This one is based on #27. That PR introduces other usages of unsafe that apply only to line wrapping, but its restructuring of the encode path made this approach a little easier to implement. It also included a reworked encoded_size() that is precise for all line wrap / padding combinations, which this approach needed. Now, other than the aforementioned line wrapping, the use of unsafe is limited to accessing the contents of the output String as a Vec<u8>.
from rust-base64.
It's hard to be too up in arms about performance when we just had a security issue. :(
I did see significant performance regressions without unsafe when I was working on that some months ago, but it's certainly worth another look to see if things have improved in the intervening Rust releases. Maybe sprinkling a few key asserts around will help the optimizer get it right.
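One shape the "key asserts" idea can take, sketched with a hypothetical helper (not the crate's code): a single up-front assert tells the optimizer the output is long enough, so the write inside the loop doesn't need its own bounds check on every pass.

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical sketch: map 6-bit values to alphabet bytes. The assert
// establishes `out.len() >= sextets.len()` once, which lets LLVM elide
// the per-iteration check on `out[i]`.
fn sextets_to_ascii(sextets: &[u8], out: &mut [u8]) {
    assert!(out.len() >= sextets.len());
    for (i, &s) in sextets.iter().enumerate() {
        out[i] = STANDARD[usize::from(s & 63)];
    }
}
```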
from rust-base64.
I wonder if further optimizations would be possible by limiting the customization of the character set such that A-Za-z0-9 are always fixed and only the non-digit, non-letter characters could be customized. Then most table lookups could probably be avoided.
from rust-base64.
Expanding on my previous comment: a few crypto-specific base64 implementations I've seen recently (e.g. BoringSSL's) switched to a constant-time implementation that doesn't use table lookups at all. I imagine eliminating table lookups entirely could improve performance, but it would be impractical unless the customization were strictly limited.
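The lookup-free mapping for the standard alphabet is just range arithmetic. A branchy sketch (hypothetical helper name); the constant-time implementations mentioned compute the same offsets with masked selects instead of branches:

```rust
// Map a 6-bit value to its standard-alphabet ASCII byte without a
// table. Branchy for clarity; constant-time variants replace the
// match with branchless selects.
fn sextet_to_ascii(v: u8) -> u8 {
    match v & 63 {
        v @ 0..=25 => b'A' + v,         // 0..=25  -> 'A'..='Z'
        v @ 26..=51 => b'a' + (v - 26), // 26..=51 -> 'a'..='z'
        v @ 52..=61 => b'0' + (v - 52), // 52..=61 -> '0'..='9'
        62 => b'+',
        _ => b'/',
    }
}
```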
from rust-base64.
Actually, the table lookups (according to perf and friends) seem to be really cheap -- the slowdown here seems to be due to bounds checking when writing to the output slice. :/
from rust-base64.
Side note -- writing into the output slice directly, rather than packaging bytes into a u64 and using byteorder to splat them out into the output slice, was significantly slower, because it involves more bounds checks.
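The "pack into a u64" trick looks roughly like this sketch: 6 input bytes become 8 output bytes accumulated in one u64 and written with a single 8-byte copy, so the compiler sees one bounds check instead of eight. (std's `to_be_bytes` stands in here for byteorder's big-endian write; the helper name is hypothetical.)

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Two 3-byte groups -> eight alphabet bytes packed big-endian into a
// u64, then splatted into the output with one copy.
fn encode_6_to_8(input: &[u8; 6], output: &mut [u8]) {
    let mut word: u64 = 0;
    for chunk in input.chunks_exact(3) {
        let n = u32::from(chunk[0]) << 16
            | u32::from(chunk[1]) << 8
            | u32::from(chunk[2]);
        for shift in [18u32, 12, 6, 0] {
            word = (word << 8) | u64::from(STANDARD[(n >> shift) as usize & 63]);
        }
    }
    // One bounds check for all eight output bytes.
    output[..8].copy_from_slice(&word.to_be_bytes());
}
```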
from rust-base64.
You might find this useful then: https://users.rust-lang.org/t/how-to-zip-two-slices-efficiently/2048. Note that the suggestions to use zip() won't work here because the source and destination are different lengths, but some of the tricks about being explicit about lengths being the same, or about one being larger than the other, might work.
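One way around the different-lengths problem is to zip chunk iterators rather than the raw slices: 3-byte input groups paired with 4-byte output groups have equal counts, and `chunks_exact` gives the optimizer fixed-size chunks, which also removes in-loop bounds checks. A hypothetical sketch:

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// zip() over chunks_exact(3) / chunks_exact_mut(4): both iterators
// yield the same number of items even though the slices differ in
// length, and each chunk's length is known to the compiler.
fn encode_zipped(src: &[u8], dst: &mut [u8]) {
    for (s, d) in src.chunks_exact(3).zip(dst.chunks_exact_mut(4)) {
        let n = u32::from(s[0]) << 16 | u32::from(s[1]) << 8 | u32::from(s[2]);
        d[0] = STANDARD[(n >> 18) as usize & 63];
        d[1] = STANDARD[(n >> 12) as usize & 63];
        d[2] = STANDARD[(n >> 6) as usize & 63];
        d[3] = STANDARD[n as usize & 63];
    }
}
```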
from rust-base64.
Also, maybe some of the tricks used to implement str and String UTF-8 support might be useful here. Base64 is a simpler transformation than UTF-8 in terms of the relationship between input length and output length.
from rust-base64.
Line wrap is now also unsafe-free in the minimize-unsafe branch. The only unsafe left is the one that allows treating a String as a Vec<u8>. That one I think is defensible: if things were to go awry, it would not cause memory safety issues, only invalid UTF-8 byte sequences.
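In isolation, that remaining unsafe pattern looks roughly like this (hypothetical helper name, not the branch's exact code):

```rust
// Append ASCII bytes to a String via its underlying Vec<u8>.
fn append_ascii(out: &mut String, bytes: &[u8]) {
    debug_assert!(bytes.is_ascii());
    // Safety argument under discussion: every byte of a base64
    // alphabet is ASCII, so the String stays valid UTF-8. A buggy
    // encoder would corrupt the UTF-8 invariant, not memory -- which
    // is exactly the point being debated below.
    unsafe { out.as_mut_vec() }.extend_from_slice(bytes);
}
```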
from rust-base64.
@marshallpierce Internally, the str and String types do a bunch of unsafe stuff assuming that the text is valid UTF-8. For example, say the Vec<u8> ends with a byte of the form 0b11110???. The internals of str will then assume there are at least 3 more bytes after it, and will interpret those (out-of-bounds) bytes as if they were valid UTF-8. So I don't think it is clear that invalid UTF-8 is OK.
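The hazard can be seen with safe code: a buffer ending in a 0b11110??? lead byte promises three continuation bytes that aren't there. The checked conversion rejects it, whereas an unchecked conversion would hand str internals a sequence they assume extends past the end.

```rust
// Checked conversion: returns false for byte sequences that str's
// unsafe internals must never be allowed to see.
fn is_valid_text(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```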
Regardless, I think it is worthwhile to try to do everything without unsafe unless there's a really strong argument for not doing so, which I don't see.
from rust-base64.
An aside from someone who has been intending to use this library:
There are plenty of cases (e.g. storing binary data in JSON) where a base64 parser can be a performance bottleneck. So I think near-maximum speed is an important goal; not more important than correctness, of course, but in principle if a core functionality like this cannot be made fast enough without unsafe, then the burden of demonstrating correctness should be moved from the compiler to tests. (It looks as though the "fast enough" strategies have already been found, though.)
It's easy to import instincts from other languages ("oh, it's probably not the bottleneck"), but I think this is less true in Rust: if performance truly wasn't the bottleneck anywhere in their code, people probably wouldn't be using Rust in the first place.
from rust-base64.
Here are the options I see.
One approach would be to switch the lower level encoding logic to just String.push_str() for each output byte. In other words, use String as the collection type rather than a [u8] or Vec<u8>. I did not bother to benchmark this (though I am not optimistic about it) because it would be contrary to my goal of avoiding heap-ful types. Encoding to u8 slices enables heap-free niceties like the Display implementation in display-wrapper, which will also easily lead to a heap-free implementation of #20. I also want to expose encoding to a Vec<u8> (or maybe even a &mut [u8]) in the public API for people who don't care about the String-ness of the output, so for implementation sanity I don't want to push the String type further down into the implementation innards.
One way to continue to encode into a [u8] would be to encode into a slice (which could be a stack-resident array from the caller) and then let the caller use str::from_utf8 after each batch of bytes is encoded and append the resulting str to a String. However, this has a pretty dreadful performance cost. Here's the performance loss just from doing UTF-8 validation on each batch of bytes (str::from_utf8 and throwing away the unwrapped result):
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 94 (1063 MB/s) 118 (847 MB/s) 24 25.53%
encode_100b_reuse_buf 71 (1408 MB/s) 97 (1030 MB/s) 26 36.62%
encode_10mib 9,376,605 (1118 MB/s) 11,770,334 (890 MB/s) 2,393,729 25.53%
encode_10mib_reuse_buf 6,300,363 (1664 MB/s) 8,673,367 (1208 MB/s) 2,373,004 37.66%
encode_30mib 28,628,693 (1098 MB/s) 35,876,681 (876 MB/s) 7,247,988 25.32%
encode_30mib_reuse_buf 19,935,389 (1577 MB/s) 26,503,729 (1186 MB/s) 6,568,340 32.95%
encode_3b 40 (75 MB/s) 41 (73 MB/s) 1 2.50%
encode_3b_reuse_buf 18 (166 MB/s) 23 (130 MB/s) 5 27.78%
encode_3kib 1,691 (1816 MB/s) 2,394 (1283 MB/s) 703 41.57%
encode_3kib_reuse_buf 1,668 (1841 MB/s) 2,377 (1292 MB/s) 709 42.51%
encode_3kib_reuse_buf_mime 2,097 (1464 MB/s) 2,805 (1095 MB/s) 708 33.76%
encode_3mib 2,697,562 (1166 MB/s) 3,343,944 (940 MB/s) 646,382 23.96%
encode_3mib_reuse_buf 1,770,041 (1777 MB/s) 2,501,624 (1257 MB/s) 731,583 41.33%
encode_500b 309 (1618 MB/s) 445 (1123 MB/s) 136 44.01%
encode_500b_reuse_buf 286 (1748 MB/s) 426 (1173 MB/s) 140 48.95%
encode_500b_reuse_buf_mime 374 (1336 MB/s) 508 (984 MB/s) 134 35.83%
encode_50b 65 (769 MB/s) 77 (649 MB/s) 12 18.46%
encode_50b_reuse_buf 47 (1063 MB/s) 56 (892 MB/s) 9 19.15%
That's not counting the cost of copying the bytes into the String, the additional bounds checking that would incur, or the overhead of switching to batch-by-batch encoding logic (which, judging by the Display wrapper that basically does this, would be another 10-20%). So, that's not great.
Another approach would be to run str::from_utf8 post hoc on the slice that's been written to, otherwise leaving the logic as is. This has a less ruinous performance cost on my machine:
name control.bcmp ns/iter variable.bcmp ns/iter diff ns/iter diff %
encode_100b 94 (1063 MB/s) 102 (980 MB/s) 8 8.51%
encode_100b_reuse_buf 71 (1408 MB/s) 80 (1250 MB/s) 9 12.68%
encode_10mib 9,376,605 (1118 MB/s) 10,327,063 (1015 MB/s) 950,458 10.14%
encode_10mib_reuse_buf 6,300,363 (1664 MB/s) 7,206,090 (1455 MB/s) 905,727 14.38%
encode_30mib 28,628,693 (1098 MB/s) 32,278,805 (974 MB/s) 3,650,112 12.75%
encode_30mib_reuse_buf 19,935,389 (1577 MB/s) 22,757,454 (1382 MB/s) 2,822,065 14.16%
encode_3b 40 (75 MB/s) 42 (71 MB/s) 2 5.00%
encode_3b_reuse_buf 18 (166 MB/s) 23 (130 MB/s) 5 27.78%
encode_3kib 1,691 (1816 MB/s) 1,818 (1689 MB/s) 127 7.51%
encode_3kib_reuse_buf 1,668 (1841 MB/s) 1,821 (1686 MB/s) 153 9.17%
encode_3kib_reuse_buf_mime 2,097 (1464 MB/s) 2,232 (1376 MB/s) 135 6.44%
encode_3mib 2,697,562 (1166 MB/s) 2,782,393 (1130 MB/s) 84,831 3.14%
encode_3mib_reuse_buf 1,770,041 (1777 MB/s) 1,933,529 (1626 MB/s) 163,488 9.24%
encode_500b 309 (1618 MB/s) 329 (1519 MB/s) 20 6.47%
encode_500b_reuse_buf 286 (1748 MB/s) 306 (1633 MB/s) 20 6.99%
encode_500b_reuse_buf_mime 374 (1336 MB/s) 403 (1240 MB/s) 29 7.75%
encode_50b 65 (769 MB/s) 70 (714 MB/s) 5 7.69%
encode_50b_reuse_buf 47 (1063 MB/s) 50 (1000 MB/s) 3 6.38%
I could maybe groan and tolerate that if it weren't pretty much a best-case scenario: I have a fast Xeon with tons of cache and memory bandwidth and a clever prefetcher. On a lesser system (which I don't have easy access to), it may be significantly worse. If anyone has interesting systems they want to try this on, let me know.
Furthermore, I'm not really sure how valuable this runtime check is. We still have the unsafe .as_mut_vec(); we're just checking what we wrote into it after the fact. Towards @Ichoran's point about handing off responsibility to tests: could we not achieve a satisfactory level of reliability if instead of adding a check that runs after every encode, we instead have tests that check random inputs with random encoding configs to make sure that they're valid UTF-8? I think we can pretty effectively draw a boundary around the code that does potentially incorrect things with UTF-8 and test the snot out of it. The logic is not so complex as to defeat human inspection, and the correctness criteria for the output are straightforward and easy to exhaustively verify.
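A sketch of what the post-hoc check amounts to, with a toy inner encoder standing in for the real encode path (both helper names are hypothetical):

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Toy inner encoder (full 3-byte groups only); it writes only ASCII
// alphabet bytes, which is the invariant being checked.
fn encode_into_vec(input: &[u8], out: &mut Vec<u8>) {
    for s in input.chunks_exact(3) {
        let n = u32::from(s[0]) << 16 | u32::from(s[1]) << 8 | u32::from(s[2]);
        for shift in [18u32, 12, 6, 0] {
            out.push(STANDARD[(n >> shift) as usize & 63]);
        }
    }
}

// Encode through as_mut_vec() as before, then validate only the bytes
// just written, once, after the fact.
fn encode_checked(input: &[u8], out: &mut String) {
    let start = out.len();
    encode_into_vec(input, unsafe { out.as_mut_vec() });
    std::str::from_utf8(&out.as_bytes()[start..])
        .expect("encoder wrote invalid UTF-8");
}
```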
from rust-base64.
could we not achieve a satisfactory level of reliability if instead of adding a check that runs after every encode, we instead have tests that check random inputs with random encoding configs to make sure that they're valid UTF-8?
Sounds like setting up "cargo fuzz" for this crate, and having it run for a long time, would be useful regardless of how much of the implementation's unsafe-ness can be eliminated. See #21.
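As a std-only stand-in for that idea (a real setup would use cargo-fuzz), a randomized test can push many pseudo-random inputs through a toy encoder and assert every output is valid UTF-8; the encoder and generator here are illustrative, not the crate's code:

```rust
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Run `cases` pseudo-random inputs (xorshift64 generator) through a
// toy full-groups-only encoder; assert each output is valid UTF-8.
// Returns the number of cases checked.
fn check_random_inputs(cases: usize) -> usize {
    let mut state = 0x243F_6A88_85A3_08D3u64; // arbitrary nonzero seed
    let mut checked = 0;
    for len in 0..cases {
        let data: Vec<u8> = (0..len)
            .map(|_| {
                state ^= state << 13;
                state ^= state >> 7;
                state ^= state << 17;
                (state >> 24) as u8
            })
            .collect();
        let mut out = Vec::new();
        for s in data.chunks_exact(3) {
            let n = u32::from(s[0]) << 16 | u32::from(s[1]) << 8 | u32::from(s[2]);
            for shift in [18u32, 12, 6, 0] {
                out.push(STANDARD[(n >> shift) as usize & 63]);
            }
        }
        assert!(std::str::from_utf8(&out).is_ok());
        checked += 1;
    }
    checked
}
```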
from rust-base64.
See #26. :)
from rust-base64.
Fixed with #31.
from rust-base64.