loupvaillant / monocypher Goto Github PK
View Code? Open in Web Editor NEWAn easy to use, easy to deploy crypto library
Home Page: https://monocypher.org
License: Other
An easy to use, easy to deploy crypto library
Home Page: https://monocypher.org
License: Other
Aloha!
The README.md states "There is a html version in doc/html/, that you can regenerate by executing the doc/man2html.sh script (this requires mandoc)."
the first part "There is a html version in doc/html/" is not really correct. The html version of the manual is not there. The html pages aren't that big so could we please add them instead of either (1) having users generate the pages, or (2) leave the repo (on Github) to go to the Monocypher to find the manual? Having the html files included would make the repo self contained manual-wise.
If not adding the pages, I would suggest changing the README text to state: "The html version of the manual can be regenerated by running the..." etc.
Aloha!
The monocypher.c is 1600+ is one gigantic file with all but the sha512 functionality. How about splitting the file inta few separate files? One for poly1305, chacha etc.
It would make reading the code a bit easier. And possibly to maintain.
Currently, the test suite runs with make test
. automake specifies make check
that does: "Run the test suite, if any."
Based on the principle of least surprise, it may be worth renaming or aliasing make test
to make check
.
I used to think RFC 7539 was over-engineered. I suspected their padding game were little more than some alignment game, which probably arisen from sub-par interfaces for the primitives. Interfaces that I was quite proud to generalise, if not simplify a bit.
Hem.
Turned out I was misguided and ignorant. First, I believed serialization and de-serialization took a negligible amount of time, compared to the heavy duty work. Nope. Nope. Nope. Nope. Nope. Nope. Nope. The most impresive there was Poly1305, which became 5 times faster than it was. (De-)serialisation was a full blown bottleneck.
Second, I didn't know that the best way to optimise many such primitives, despite their structure, was to perform the good old "unroll then vectorise" trick. For Chacha20, processing 8 blocks at a time with AVX-2 instructions is four times faster than using AVX instructions for a single round. Despite Chacha20 being designed for single-round vectorisation. I wonder how fast full blown bit slicing would be… Poly1305 enjoys similar effects (Libsodium is 2.3 times faster than Monocypher).
Thus, the details of the AEAD construction have a significant impact on the performance of small and medium messages (I'm thinking 1-8 kilobytes). I've measured something between 10-30% slow-down. Not that we want Monocypher to reach ultimate performance. This would cost way too much code not to mention the portability nightmare, of which #85 gives a taste. But we do want to scale as needed. If some user start to notice Monocypher's suboptimal performance is hurting their power bill (they just overthrew Google or something), it would be nice to get that performance without breaking compatibility.
In this light, I have to admit RFC 7539 is actually pretty well thought out. The only real qualm I have with it is it's half assed nonce, which isn't long enough to support randomness and cuts the counter to a worryingly short length. It's padding scheme however is pretty much ideal:
Wasting a whole Chacha block (64 bytes) to compute the authentication key instead of just 32-bytes is actually faster than keeping to 32 bytes. First, it tends to hit the fast path for de-serialization in typical implementations. Second, it facilitates streaming block by block, where the size of the blocks are powers of 2: by starting at a block boundary, we can ensure we never hit the slow serialisation path, and we take full advantage of the "unroll-then-vectorise" scheme that is fond of multiple of 8 blocks, without odd remainders at the end.
Assuming 4k blocks (a size that is dear to me personally because I'd like to implement a file system like encrypted archive), this alone speeds up the fast version of Monocypher by more than 15%. Smaller blocks (2k, 1k) are even more sensitive to this effect, and such sizes might be relevant to something like a chat application (where you would pad messages to multiples of 1kiB to reduce information leak).
In terms of wasted computation, padding Poly1305 wastes at most 1 block. Considering this enables a 2.3 speedup on message boundaries when doing the same as Chacha20, and the fact that the encrypted buffers are generally distinct from the additional data, aligning the block at the beginning of the message makes sense.
Of course, the padding means we now have to authenticate the length of both the additional data and the encrypted message. RFC 7539 have us covered there, using exactly 1 additional block to authenticate both lengths. 64 bits each should be plenty. Oh, and it removes the minor worry I had about authenticating the length of the additional data. Now it's all clean, even if that length is neither fixed nor self contained. That should simplify the manual a bit.
So I'm thinking of breaking compatibility. Again. This should fit the current API, we'll just have incompatible encryption and mac. The upside is, if some one wants to optimise this up to the heavens later, they can.
While we're at it, how about adding raw Chacha20 and IETF initialisation? The former can help with performance in some cases that don't require a random nonce, and the latter has test vectors. I still don't endorse 96-bit nonces, but at this point we might as well add it for completion's sake. I'm not sure yet. Maybe we could limit those alternate initialisation schemes to the incremental interface, and have the monolithic interface present a clear default?
People are making packages for Monocypher, e.g. one in AUR and the one for Void Linux in progress, which is interesting. However, Monocypher decidedly does currently not really work as a package.
monocypher.c
into /usr/include
, which is questionable at best. However, Monocypher cannot be built as a library. Maybe doing so is overkill or unnecessarily complex in terms of encumberance of the build path. It is already currently broken with BSD make and requires GNU make. Dynamic libraries should probably be discouraged as Monocypher intends to be fairly static in the first place..Nm
inserts the program name and formats it as expected, as opposed to manual repetition and formatting on old-style man/roff pages), which allows searching by macro keys like apropos Fa=ad
to find all man pages referencing a function argument ad
. Sensible mandoc guidelines can be found on OpenBSD mdoc(7). Otherwise, ronn exists.https://github.com/google/wycheproof/blob/master/testvectors/eddsa_test.json#L94
thanks to thaidn_ for sharing this
OpenBSD 6.1 is the current version of OpenBSD. It ships gcc 4.2.1. While OpenBSD is going to use clang for 6.2, this only affects i386 and amd64 targets for now.
In tests/self.c
, there are constructions that attempt to build a const value by involving another const value. gcc on OpenBSD 6.1 rejects those, saying "error: initializer element is not constant". Affected lines in tests/self.c
are 295, 296, 334, 376, 377, 416, 417, 453.
This currently breaks the build. I would normally file a pull request, but in this case, I'm not sure how you would want it to be handled, I could imagine simply hardcoding the values, hardcoding the values and extending the comments (usually pushing them past the 80 column boundary), or using macros.
(After fixing this, the build on OpenBSD 6.1 goes through without issues. There's a warning that the ALIGN
macro is redefined in tests/ed25519-donna/ed25519-donna-portable.h
, conflicting with the one in /usr/include/sys/param.h
, but the code in the donna implementation is out of your control.)
I would like to suggest a slight change of wording. The manual for Poly1305, user error.
https://monocypher.org/manual/crypto_poly1305.html#User_error
(Wouldn't it be great if that link went to the repo instead? ;-)
First thing:
I don't think "User error" is a great heading. "User considerations", "Important details" etc. The text does not describe errors users do, but what the user must consider to use the function in a secure manner.
Secondly:
The third paragraph states: "Random numbers cannot be used either..." In fact this is not true. In order to follow the security of Poly1305. The DJB paper (https://cr.yp.to/mac/poly1305-20050329.pdf), chapter 3 states: "Any protocol that uses Poly1305-AES must ensure unpredictability of the secret key (k,r). This section assumes that secret keys are chosen from the uniform distribution: i.e., probability 2−234 for each of the 2234 possible pairs (k, r)." Stating that random numbers can not be used open open for misuse.
The text in the manual continues to describe one way to generate a key (using the initial keystream generated by the stream cipher). That keystream is a stream of random numbers. The important second requirement is that the key must be kept secret. That secrecy is achieved with the scheme presented. And this corresponds to what DJB states in chapter 3: "Any protocol that uses Poly1305-AES must ensure that the secret key is, in fact, kept secret."
I would therefore suggest to:
This has to do with the way the signature is computed internally: part of the signature is written to before we last read the message. If they overlap, we would mess up the last message read, leading to a wrong hash, and a wrong signature.
Regardless of whether the user was asking for trouble or not, this restriction is avoidable, and therefore should be lifted. The test suite should also be augmented accordingly.
The rest of the library should be checked for these kind of restrictions while we're at it.
This surfaced as failure when testing on OpenBSD. If the test has been passing until now, it was by coincidence.
https://github.com/LoupVaillant/Monocypher/blob/master/tests/test.c#L621-L627
input
is INPUT_SIZE
bytes long, i.e. 160 bytes. SHA_512_BLOCK_SIZE
is 128 bytes. The routine thus reads 128 bytes from input + 64
. This overreads input
at offset 32.
I suspect you meant MESSAGE_SIZE
instead of SHA_512_BLOCK_SIZE
, but then it'd make no sense to do the multiplication by two in #define INPUT_SIZE (MESSAGE_SIZE + (2 * 64))
. Due to this uncertainty, I'm reporting this as an issue, rather than creating a pull request.
Hello !
I proposed a package for Void linux, see my Pull Request #7407.
Regards.
I get the following warnings and errors when compiling using Microsoft Visual C/C++ 2017.
monocypher.c(64): warning C4244: 'function': conversion from 'u64' to 'u32', possible loss of data
monocypher.c(240): error C4146: unary minus operator applied to unsigned type, result still unsigned
monocypher.c(329): warning C4244: '=': conversion from 'const u64' to 'uint32_t', possible loss of data
monocypher.c(314): warning C4244: 'initializing': conversion from 'const u64' to 'const u32', possible loss of data
monocypher.c(381): error C4146: unary minus operator applied to unsigned type, result still unsigned
monocypher.c(432): warning C4244: 'function': conversion from 'const u64' to 'u32', possible loss of data
monocypher.c(433): warning C4244: 'function': conversion from 'const u64' to 'u32', possible loss of data
monocypher.c(434): warning C4244: 'function': conversion from 'const u64' to 'u32', possible loss of data
monocypher.c(435): warning C4244: 'function': conversion from 'const u64' to 'u32', possible loss of data
monocypher.c(587): error C4146: unary minus operator applied to unsigned type, result still unsigned
monocypher.c(948): warning C4267: 'function': conversion from 'size_t' to 'u32', possible loss of data
monocypher.c(954): warning C4267: 'initializing': conversion from 'size_t' to 'u32', possible loss of data
monocypher.c(955): warning C4267: 'initializing': conversion from 'size_t' to 'u32', possible loss of data
monocypher.c(958): warning C4267: 'initializing': conversion from 'size_t' to 'u32', possible loss of data
monocypher.c(1052): warning C4244: '=': conversion from 'i64' to 'i32', possible loss of data
monocypher.c(1062): warning C4244: '=': conversion from 'i64' to 'i32', possible loss of data
monocypher.c(1118): warning C4244: '=': conversion from 'i64' to 'i32', possible loss of data
monocypher.c(1152): warning C4244: '=': conversion from 'i64' to 'i32', possible loss of data
monocypher.c(1714): error C4146: unary minus operator applied to unsigned type, result still unsigned
monocypher.c(1758): error C4146: unary minus operator applied to unsigned type, result still unsigned
Due to the errors compilation fails with MSVC. Perhaps you might want to check this for a future release.
Aloha!
Did a "make test" and clang/llvm emitted a number of warnings on test.c:
tests/test.c:237:12: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
d = (d < 0 ? -d : d);
~ ^ ~
tests/test.c:254:12: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
d = (d < 0 ? -d : d);
~ ^ ~
tests/test.c:274:25: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 output_chunk[input_size];
^~~~~~~~~~
tests/test.c:275:25: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 output_whole[input_size];
^~~~~~~~~~
tests/test.c:277:25: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 input [input_size]; p_random(input, input_size);
^~~~~~~~~~
tests/test.c:308:24: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 output_part[stream_size ];
^~~~~~~~~~~
tests/test.c:309:24: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 output_all [stream_size ];
^~~~~~~~~~~
tests/test.c:310:24: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 output_more[stream_size * 2];
^~~~~~~~~~~~~~~
tests/test.c:357:18: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 input[input_size]; p_random(input, input_size);
^~~~~~~~~~
tests/test.c:398:18: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 input[input_size]; p_random(input, input_size);
^~~~~~~~~~
tests/test.c:436:18: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 input[input_size]; p_random(input, input_size);
^~~~~~~~~~
tests/test.c:466:16: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
u8 message[message_size]; p_random(message, 32);
^~~~~~~~~~~~
This is with gcc aliased to clang Apple LLVM version 9.0.0 (clang-900.0.37).
Have you seen these before?
There appears to be a mistake where the manual refers to crypto_lock_key()
which doesn't exist in the header/source, and vice versa crypto_key_exchange()
exists but is not reflected in the manual.
The various makefiles seem to only work with GNU make. Which is a pity, considering how simple they now are. They should be compatible with other make utilities.
Tested with GCC 5.4.0, -O3 optimisations. Upon noticing suspect timings for some usages of crypto_memcmp()
, I tried to look at the assembly output. I got this hard to review mess:
.LCOLDB18:
.text
.LHOTB18:
.p2align 4,,15
.globl crypto_memcmp
.type crypto_memcmp, @function
crypto_memcmp:
.LFB11:
.cfi_startproc
testq %rdx, %rdx
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
je .L174
movq %rdi, %r8
negq %r8
andl $15, %r8d
cmpq %rdx, %r8
cmova %rdx, %r8
cmpq $17, %rdx
ja .L197
movq %rdx, %r8
.L164:
movzbl (%rdi), %ecx
xorb (%rsi), %cl
cmpq $1, %r8
movzbl %cl, %eax
je .L177
movzbl 1(%rdi), %eax
xorb 1(%rsi), %al
orl %eax, %ecx
cmpq $2, %r8
movzbl %cl, %eax
je .L178
movzbl 2(%rdi), %eax
xorb 2(%rsi), %al
orl %eax, %ecx
cmpq $3, %r8
movzbl %cl, %eax
je .L179
movzbl 3(%rdi), %eax
xorb 3(%rsi), %al
orl %eax, %ecx
cmpq $4, %r8
movzbl %cl, %eax
je .L180
movzbl 4(%rdi), %eax
xorb 4(%rsi), %al
orl %eax, %ecx
cmpq $5, %r8
movzbl %cl, %eax
je .L181
movzbl 5(%rdi), %eax
xorb 5(%rsi), %al
orl %eax, %ecx
cmpq $6, %r8
movzbl %cl, %eax
je .L182
movzbl 6(%rdi), %eax
xorb 6(%rsi), %al
orl %eax, %ecx
cmpq $7, %r8
movzbl %cl, %eax
je .L183
movzbl 7(%rdi), %eax
xorb 7(%rsi), %al
orl %eax, %ecx
cmpq $8, %r8
movzbl %cl, %eax
je .L184
movzbl 8(%rdi), %eax
xorb 8(%rsi), %al
orl %eax, %ecx
cmpq $9, %r8
movzbl %cl, %eax
je .L185
movzbl 9(%rdi), %eax
xorb 9(%rsi), %al
orl %eax, %ecx
cmpq $10, %r8
movzbl %cl, %eax
je .L186
movzbl 10(%rdi), %eax
xorb 10(%rsi), %al
orl %eax, %ecx
cmpq $11, %r8
movzbl %cl, %eax
je .L187
movzbl 11(%rdi), %eax
xorb 11(%rsi), %al
orl %eax, %ecx
cmpq $12, %r8
movzbl %cl, %eax
je .L188
movzbl 12(%rdi), %eax
xorb 12(%rsi), %al
orl %eax, %ecx
cmpq $13, %r8
movzbl %cl, %eax
je .L189
movzbl 13(%rdi), %eax
xorb 13(%rsi), %al
orl %eax, %ecx
cmpq $14, %r8
movzbl %cl, %eax
je .L190
movzbl 14(%rdi), %eax
xorb 14(%rsi), %al
orl %eax, %ecx
cmpq $15, %r8
movzbl %cl, %eax
je .L191
movzbl 15(%rdi), %eax
xorb 15(%rsi), %al
orl %eax, %ecx
cmpq $17, %r8
movzbl %cl, %eax
jne .L192
movzbl 16(%rsi), %eax
xorb 16(%rdi), %al
orl %eax, %ecx
movzbl %cl, %eax
movl $17, %ecx
.L166:
cmpq %r8, %rdx
je .L167
.L165:
movq %rdx, %r11
leaq -1(%rdx), %r10
subq %r8, %r11
leaq -16(%r11), %r9
subq %r8, %r10
shrq $4, %r9
addq $1, %r9
movq %r9, %rbx
salq $4, %rbx
cmpq $14, %r10
jbe .L168
pxor %xmm1, %xmm1
leaq (%rdi,%r8), %r12
xorl %r10d, %r10d
pxor %xmm5, %xmm5
addq %rsi, %r8
pxor %xmm2, %xmm2
xorl %ebp, %ebp
.L169:
movdqu (%r8,%r10), %xmm0
addq $1, %rbp
pxor (%r12,%r10), %xmm0
addq $16, %r10
cmpq %rbp, %r9
movdqa %xmm0, %xmm3
punpckhbw %xmm5, %xmm0
punpcklbw %xmm5, %xmm3
movdqa %xmm3, %xmm4
punpckhwd %xmm2, %xmm3
punpcklwd %xmm2, %xmm4
por %xmm4, %xmm3
movdqa %xmm0, %xmm4
punpckhwd %xmm2, %xmm0
punpcklwd %xmm2, %xmm4
por %xmm4, %xmm3
por %xmm3, %xmm0
por %xmm0, %xmm1
ja .L169
movdqa %xmm1, %xmm0
addq %rbx, %rcx
psrldq $8, %xmm0
por %xmm1, %xmm0
movdqa %xmm0, %xmm1
psrldq $4, %xmm1
por %xmm1, %xmm0
movd %xmm0, %r8d
orl %r8d, %eax
cmpq %r11, %rbx
je .L167
.L168:
movzbl (%rsi,%rcx), %r8d
xorb (%rdi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 1(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 1(%rdi,%rcx), %r8d
xorb 1(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 2(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 2(%rdi,%rcx), %r8d
xorb 2(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 3(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 3(%rdi,%rcx), %r8d
xorb 3(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 4(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 4(%rdi,%rcx), %r8d
xorb 4(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 5(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 5(%rdi,%rcx), %r8d
xorb 5(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 6(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 6(%rdi,%rcx), %r8d
xorb 6(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 7(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 7(%rdi,%rcx), %r8d
xorb 7(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 8(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 8(%rdi,%rcx), %r8d
xorb 8(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 9(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 9(%rdi,%rcx), %r8d
xorb 9(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 10(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 10(%rdi,%rcx), %r8d
xorb 10(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 11(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 11(%rdi,%rcx), %r8d
xorb 11(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 12(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 12(%rdi,%rcx), %r8d
xorb 12(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 13(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 13(%rdi,%rcx), %r8d
xorb 13(%rsi,%rcx), %r8b
movzbl %r8b, %r8d
orl %r8d, %eax
leaq 14(%rcx), %r8
cmpq %r8, %rdx
jbe .L167
movzbl 14(%rsi,%rcx), %edx
xorb 14(%rdi,%rcx), %dl
movzbl %dl, %edx
orl %edx, %eax
.L167:
subl $1, %eax
shrl $8, %eax
andl $1, %eax
subl $1, %eax
.L163:
popq %rbx
.cfi_remember_state
.cfi_def_cfa_offset 24
popq %rbp
.cfi_def_cfa_offset 16
popq %r12
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L197:
.cfi_restore_state
testq %r8, %r8
jne .L164
xorl %ecx, %ecx
xorl %eax, %eax
jmp .L165
.p2align 4,,10
.p2align 3
.L174:
xorl %eax, %eax
jmp .L163
.p2align 4,,10
.p2align 3
.L183:
movl $7, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L184:
movl $8, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L185:
movl $9, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L186:
movl $10, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L187:
movl $11, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L188:
movl $12, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L189:
movl $13, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L190:
movl $14, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L179:
movl $3, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L180:
movl $4, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L181:
movl $5, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L182:
movl $6, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L177:
movl $1, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L178:
movl $2, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L191:
movl $15, %ecx
jmp .L166
.p2align 4,,10
.p2align 3
.L192:
movl $16, %ecx
jmp .L166
.cfi_endproc
crypto_zerocmp()
is just as gigantic:
.LHOTB19:
.p2align 4,,15
.globl crypto_zerocmp
.type crypto_zerocmp, @function
crypto_zerocmp:
.LFB12:
.cfi_startproc
testq %rsi, %rsi
je .L210
movq %rdi, %rdx
movq %rsi, %rcx
negq %rdx
andl $15, %edx
cmpq %rsi, %rdx
cmova %rsi, %rdx
cmpq $19, %rsi
ja .L217
.L200:
xorl %edx, %edx
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L202:
movzbl (%rdi,%rdx), %r8d
addq $1, %rdx
orl %r8d, %eax
cmpq %rcx, %rdx
jne .L202
cmpq %rcx, %rsi
movq %rcx, %rdx
je .L203
.L201:
movq %rsi, %r10
leaq -1(%rsi), %r9
subq %rdx, %r10
leaq -16(%r10), %r8
subq %rdx, %r9
shrq $4, %r8
addq $1, %r8
movq %r8, %r11
salq $4, %r11
cmpq $14, %r9
jbe .L204
pxor %xmm1, %xmm1
addq %rdi, %rdx
xorl %r9d, %r9d
pxor %xmm5, %xmm5
pxor %xmm2, %xmm2
.L205:
movdqa (%rdx), %xmm0
addq $1, %r9
addq $16, %rdx
cmpq %r9, %r8
movdqa %xmm0, %xmm3
punpckhbw %xmm5, %xmm0
punpcklbw %xmm5, %xmm3
movdqa %xmm3, %xmm4
punpckhwd %xmm2, %xmm3
punpcklwd %xmm2, %xmm4
por %xmm4, %xmm3
movdqa %xmm0, %xmm4
punpckhwd %xmm2, %xmm0
punpcklwd %xmm2, %xmm4
por %xmm4, %xmm3
por %xmm3, %xmm0
por %xmm0, %xmm1
ja .L205
movdqa %xmm1, %xmm0
addq %r11, %rcx
psrldq $8, %xmm0
por %xmm1, %xmm0
movdqa %xmm0, %xmm1
psrldq $4, %xmm1
por %xmm1, %xmm0
movd %xmm0, %edx
orl %edx, %eax
cmpq %r11, %r10
je .L203
.L204:
movzbl (%rdi,%rcx), %edx
orl %edx, %eax
leaq 1(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 1(%rdi,%rcx), %edx
orl %edx, %eax
leaq 2(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 2(%rdi,%rcx), %edx
orl %edx, %eax
leaq 3(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 3(%rdi,%rcx), %edx
orl %edx, %eax
leaq 4(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 4(%rdi,%rcx), %edx
orl %edx, %eax
leaq 5(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 5(%rdi,%rcx), %edx
orl %edx, %eax
leaq 6(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 6(%rdi,%rcx), %edx
orl %edx, %eax
leaq 7(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 7(%rdi,%rcx), %edx
orl %edx, %eax
leaq 8(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 8(%rdi,%rcx), %edx
orl %edx, %eax
leaq 9(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 9(%rdi,%rcx), %edx
orl %edx, %eax
leaq 10(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 10(%rdi,%rcx), %edx
orl %edx, %eax
leaq 11(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 11(%rdi,%rcx), %edx
orl %edx, %eax
leaq 12(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 12(%rdi,%rcx), %edx
orl %edx, %eax
leaq 13(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 13(%rdi,%rcx), %edx
orl %edx, %eax
leaq 14(%rcx), %rdx
cmpq %rdx, %rsi
jbe .L203
movzbl 14(%rdi,%rcx), %edx
orl %edx, %eax
.L203:
subl $1, %eax
shrl $8, %eax
andl $1, %eax
subl $1, %eax
ret
.p2align 4,,10
.p2align 3
.L217:
testq %rdx, %rdx
jne .L218
xorl %ecx, %ecx
xorl %eax, %eax
jmp .L201
.p2align 4,,10
.p2align 3
.L210:
xorl %eax, %eax
ret
.L218:
movq %rdx, %rcx
jmp .L200
.cfi_endproc
.LFE12:
.size crypto_zerocmp, .-crypto_zerocmp
.section .text.unlikely
Are those really constant time?
I would recommend https://directory.fsf.org/wiki/License:ISC or BSD 3-clause. I think that using a standard free software license will encourage greater adoption of this excellent library!
The current license seems to be some kind of home brew public domain disclaimer. I believe there are some problems with such a license and it doesn't correctly/legally express what you're aiming for with releasing this code http://www.linuxjournal.com/article/6225.
(This is really for the monocypher home page, but github issues are nice.)
I would like to suggest to add date information to items on the ChangeLog page. There is information that there are a 2.0.0 version, but no info when it was released. Since there is neither a News-page, a potential user won't be able to figure out how old a release is. Is the project really active or is it in fact stale and unsupported? It is hard to know without date information for releases. How about at least add a News-section or release info section at the top of the home page? Makes it easier to promoto Monocypher too.
The Github repo of course shows a lot of info about activity. And since the first three search items on Google for "Monocypher" or "monocypher" is to Github, this is where people will end up. But still I think the home page would be improved with date info. (The actual home page does not appear on the first page of search results anymore.)
The README reads:
Do not use Monocypher without running the self contained tests at least once.
Which is solid advice. However, I'm concerned about the scenario where you're the developer of an application unable to control where your code is actually going to run (e.g. you have some code, someone runs it on MIPS64 and Monocypher starts breaking; however, the user only sees that the application "doesn't work" because the crypto is broken).
Thoughts on including tests/self.c
into Monocypher itself or at least a mention in the manual to cat src/monocypher.c tests/self.c > amalgamation.c
and renaming main()
? It has to be optional and not called automatically in case of things like embedded platforms, where developers can check the platform itself but running all the tests would be (1) rather time consuming, or (2) impossible with its stack size.
It looks to me that the shared_secret array shall be zeroed before returning from the function.
I am not familiar with HChaCha20 so what is done maybe OK in some context, but if the key is to be used as a session secret key you shall hash (Blake2B or SHA512) the x25519 secret with the 2 public keys
You should remove "static" from "sv": https://github.com/LoupVaillant/Monocypher/blob/master/src/sha512.c#L4
It should explain that new dist/
business, and the fact that end users should fetch the latest tarball instead. (Note: there is no such tarball just yet, just older releases.)
After writing PR #44, I noticed that style.css
is installed to $PREFIX/share/man
. That is very likely the wrong place for it.
I see two options:
style.css
, but that would prevent it from being included in packages, where users may want to$PREFIX/share/doc/monocypher
and make a note in README.md
; which will likely get packaged there as well.The latest bug fix took a huge toll on EdDSA performance. We should try and re-introduce the optimisation it reverted, this time without the bug.
Don't want to lock people in, do we?
Adding an issue for this so it doesn't get forgotten
Here: http://monocypher.org
No HTTPS for now, I'll set that up shortly.
website is down atm so i can't check it, but IIRC the wget/tar listing on the front page refers to 1.0.1 still, and there's no 1.1.0 on github
I have pushed the new AEAD interface. Hopefully it will be really stable this time. (I believe it will be because it is now compatible with Libsodium's crypto_aead_xchacha20poly1305_ietf_encrypt()
, to the point where I was able to make test vectors.)
Since this change breaks everything, I took the opportunity to tweak the incremental interface. The subsequent compile time errors may help users figure out why their data isn't decrypting itself any more. Or maybe it will just piss them off. I'm not sure.
I'm thinking of renaming crypto_aead_lock()
into crypto_lock_aead()
(and crypto_aead_unlock
into crypto_unlock_aead
). This would have 2 effects:
crypto_lock
or crypto_unlock
. This may or may not matter. I just like being consistent, and this was a tiny inconsistency.@culex, @mikejsavage, @secworks, @konovod, what do you think?
thanks for your great work。
To make a small suggestion, add the following code to the head and tail of the h file:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
Monocypher has two significant bottlenecks at the moment: Chacha20, and Argon2i. This makes libsodium 3.6 times as fast on authenticated encryption, and 57% faster at password hashing. The former is such a huge gap that many will take it as a deal breaker. The latter is not as brutal, but its incentive structure means performance directly impacts security.
So I've started an experimental speed branch to try and go beyond purely portable C code. So far, I've integrated the vectorised implementation of Chacha20 from Romain Dolebeau. The question is, how should we handle this code?
Right now, I (very crudely) rely on the __AVX2__
pre-processor definition, which on GCC is activated whenever the architecture supports AVX-2 instructions (modern Intel and AMD processors do). Obviously, this is not enough: part of the code is actually AVX compatible (only use 128-bit registers), and could be activated separately on AVX-2, AVX, MMX, and perhaps others I have yet to be aware of.
Still, those pre-processor options may not be exactly what we want. This is non-portable code we're talking about. Is it good practice to rely on those prep-processor definitions? If so, should we enable optimisations by default? If so, should we provide a way to disable it even for users who use -march=native
?
My inclination would be to activate optimisations by default, so that users who compile non-portably (either enabling AVX or such, or by using -march=native
) could benefit from the optimisations without thinking about it. Rationale being, if you compile non-portably, you're not portable anyway, so you might as well use those options to the fullest. I just fear unanticipated problems.
Oh, and I'd rather not have the user "initialise" the library like Libsodium does.
@culex , @mikejsavage, @secworks, @anywone, what do you think?
Reported by /u/knotdjb on reddit.
Timing tests for constant time comparison fail on some machines. To reproduce, just run:
$ make self
$ ./self
It might or might not fail.
We could make the threshold more tolerant, or take several measurements (the same way we do for the speed benchmark), or be a little cleverer, and compare against a variable time comparison.
CC itself recommends not to use CC0 for software.. Please reconsider. This is similar like Public Domain not being available in many juristictions (Europe). MIT/BSD or GPL/EUPL are better choices depending on your needs.
Throughout the Monocypher Manual, recommendations saying to use the operating system's RNG can be found. That is in and of itself probably good advice, but if your goal is being "easy to deploy", just stating the following in the manual may not be enough.
The easiest (and recommended) way to generate this nonce is to use your OS's random number generator (/dev/urandom on UNIX systems). Don't worry about accidental collisions, the nonce is big enough to make them virtually impossible.
Using /dev/urandom
is actually surprisingly difficult:
/dev/urandom
may not necessarily be a CSPRNG on the OS the code is being run on.The point is that doing random numbers is difficult, just like the cryptographic primitives provided in Monocypher are. Providing some way to wrap the RNGs of the most relevant operating systems (Windows, macOS, Linux, Solaris, FreeBSD, OpenBSD, NetBSD) or at least some further reading in the documentation may be a good idea.
Step to reproduce: just run the latest tests:
$ make test
I'm not done yet, but here are my notes so far. I hope this is readable!
intro.html
crypto_aead_lock.html
crypto_aead_unlock.html
crypto_lock.html
crypto_unlock.html
crypto_argon2i.html
"It can be used to encrypt private keys or password databases." - "it" is the resulting key?
"malloc() should yield a suitable chunk of memory." - malloc will (not should) return 8 byte aligned addresses on any normal platform. Might be worth explicitly mentioning that.
"The version provided by Monocypher has no threading support, so the degree of parallelism is limited to 1. This is considered good enough for most purposes." - I don't know why you mention this
"This most likely has no practical application but is exposed for the sake of completeness." - TBH I would not expose it in that case
"Use crypto_verify16(3monocypher), crypto_verify32(3monocypher) or crypto_verify64(3monocypher) to compare password hashes to prevent timing attacks, which are feasible against the weakest passwords." - maybe drop the bit after the comma
"The hardness of the computation can be chosen thus:" and the two bullet points are oddly phrased. How about:
To select the nb_blocks and nb_iterations parameters, you should first
decide how long the computation should take. For user authentication,
somewhere between half a second (convenient) and several seconds
(paranoid) is a good starting point, and you should use as much memory
as you can spare.
Since parameter selection depends on your hardware, you will need to
use some trial and error to find the ideal settings. Three iterations
and 100k blocks (one hundred megabytes of memory) is a good starting
point. Adjust nb_blocks first. If using all available memory is not
slow enough, increase nb_iterations.
crypto_blake2b.html
crypto_blake2b_final.html
crypto_blake2b_general.html
crypto_blake2b_general_init.html
crypto_blake2b_init.html
crypto_blake2b_update.html
"It can authenticate messages with a naive approach." - Naive approach = just calling the functions? Is it still a naive approach if it works and is the intended usage?
"The output hash. Must have at least hash_size free bytes." - You don't normally say "Must have at least x free bytes".
"Use crypto_verify16(3monocypher), crypto_verify32(3monocypher), or crypto_verify64(3monocypher) to compare MACs created this way." - is it ok to use memcmp if you didn't use a key?
"Message to hash. May overlap with the hash argument where present." - What do you mean by "where present"? The hash and message are always present
"the length of the message in bytes." - should start with a capital T for consistency (I went back and looked at the argon2i page, you stop using capitals half down there too. Presumably this comes up elsewhere too)
"The direct interface has 2 functions" - I think it's considered good style to write two etc for small numbers. Doesn't really matter though
"crypto_blake2b(), provided for convenience (calling it is the same as calling crypto_blake2b_general() with a null key and a 64-byte hash)." - how about:
The direct interface has two functions, crypto_blake2b() and
crypto_blake2b_general(). crypto_blake2b() is provided for convenience,
and is identical to calling crypto_blake2b_general() with no key and a
64 bit hash.
crypto_chacha20_H.html
crypto_chacha20_encrypt.html
crypto_chacha20_init.html
crypto_chacha20_set_ctr.html
crypto_chacha20_stream.html
crypto_chacha20_x_init.html
crypto_check.html
crypto_sign.html
crypto_sign_public_key.html
crypto_check_final.html
crypto_check_init.html
crypto_check_update.html
crypto_sign_final.html
crypto_sign_init_first_pass.html
crypto_sign_init_second_pass.html
crypto_sign_update.html
crypto_key_exchange.html
crypto_x25519.html
crypto_x25519_public_key.html
crypto_lock_auth.html
crypto_lock_encrypt.html
crypto_lock_final.html
crypto_lock_init.html
crypto_lock_update.html
crypto_unlock_final.html
crypto_unlock_update.html
crypto_memcmp.html
crypto_zerocmp.html
crypto_poly1305.html
crypto_poly1305_final.html
crypto_poly1305_init.html
crypto_poly1305_update.html
crypto_verify16.html
crypto_verify32.html
crypto_verify64.html
"Standard comparison functions tend to exit as soo as they find a difference" - "as soo" -> "as soon"
how about:
You will often need to compare secrets, or values derived from secrets.
Standard comparison functions (like memcmp) tend to exit when they find
the first difference, leaking information through timing differences.
"provide comparison functions whose timing is independent from the content of their input." - maybe: "provide comparison functions that run in constant time"
crypto_wipe.html
"Secrets and values derived from them should stay in memory for the shortest possible amount of time..." - how about:
crypto_wipe() securely erases sensitive data in memory.
Sensitive data (such as cryptographic keys or secret plaintexts) should
be erased from memory as early as possible, to minimise the window in
which it can be leaked.
Size argument should maybe be called secret_size.
Signature verification accepts all-zero signatures as valid, even with a genuine public key.
This has become glaringly apparent when shuffling the header for ease of use by end users: some functions are either "missing" or have the wrong name.
crypto_lock_init()
and crypto_lock_auth()
, and crypto_lock_ctx
do not have any unlock
counterpart (they are used for both locking and unlocking). Also, crypto_lock_auth()
is a general authentication function, that can authenticate additional data and the rest. A users writing (or reviewing) code using those functions might think the following:
crypto_lock_ctx ctx; // wrong type! use crypto_unlock_ctx;
crypto_lock_init(ctx, key, nonce); // wrong init!
crypto_lock_auth(ctx, aead, aead_size); // not AEAD specific, still not unlocking!
crypto_unlock_update(ctx, msg, msg_size); // OK
if (crypto_unlock_final(ctx, mac)) { abort(); } // OK
Should we add the following aliases?
typedef crypto_unlock_ctx crypto_lock_ctx
#define crypto_unlock_init crypto_lock_init
#define crypto_lock_aead crypto_lock_auth
#define crypto_unlock_aead crypto_lock_auth
We have crypto_x25519()
, crypto_key_exchange()
, and crypto_x25519_public_key()
. End users are meant to use the last two, which then display inconsistent naming. Should we add the following alias?
#define crypto_key_exchange_public_key crypto_x25519_public_key
crypto_sign_update()
is used for the first pass and second pass. Should we add the following aliases?
#define crypto_sign_update_first_pass crypto_sign_update
#define crypto_sign_update_second_pass crypto_sign_update
I'm a bit torn. On the one hand, I want to keep the minimum required to use Monocypher. On the other hand, I'd like to avoid needless confusion. I believe we should add some aliases, but not all:
crypto_lock*()
functions, but not the context.crypto_x225519_public_key()
to avoid the blatant inconsistency, or rename crypto_key_exchange()
into crypto_x25519_HChacha20()
(and keep the old name for backward compatibility). I'm not sure which is best.@culex, @mikejsavage, what do you think?
Signatures are not like hashes. For any public key/message pair, there are many possible signatures.
Some protocol designers tend to expect security properties not provided by the primitives, resulting in insecure designs. For instance, if you assume a message/public key pair can only yield a single signature, the same way it would yield a single cryptographic hash, you might be tempted to treat the signature as a unique identifier, and fall prey to replay attacks (crypto currencies seem to be especially vulnerable).
The documentation should be updated to make clear what EdDSA does not provide.
When compiling the tests there are so many warnings it is hard to see what's happening. Based on Warning Options why not either:
-Wimplicit-fallthrough=0
-Wimplicit-fallthrough=3
plus /* FALLTHRU */
comments.Aloha!
I would suggest pointing to, or adding a comment like "remember to generate the test vectors and the file vectors.h, see XYZ" in dist/README.md under the "make test" info. The need for generating the vectors are described in the top level README.md, but is worth repeating in the dist/README.md to not confuse those that gets surprised when following the "make test" instruction in the dist/README.md.
Hey,
sorry for contacting you like this but there was no contact info on monocypher.org.
Way down on the page (ctrl-f yh will point it out) you misspelled the file name the user is recommended to copy.
:-)
(First reported by @mikejsavage)
It is not clear by looking at the header alone whether one should use crypto_lock_update()
or crypto_lock_encrypt()
(we should use the former). Users may use one instead of the other, and end up not authenticating their messages. This may happen even if they have read the manual.
We should add a little comment about crypto_lock_encrypt()
being a low level function, not intended for end users.
Actually, we should add that warning about every low-level primitives (Chacha20, Poly1305 etc…).
Packages started to appear, but we've learned since #19 that packagers make mistakes. This suggest the procedure to packages Monocypher lacks either automation or documentation —possibly both.
We should make the packager's life easy. I suggest the following:
PACKAGING.md
file or similar to motivate these choices.pkg_config
).This line was meant to zero out the relevant byte of the input buffer before actually putting in the input. This basically filters out the garbage before putting in the information. In practice, this works perfectly well, and compilers generate correct machine code.
Unfortunately, this also means that when the input buffer is uninitialised, we are reading uninitialised memory. And that's undefined behaviour under those crazy C and C++ standards. Here's a minimal example:
uint32_t x;
x &= 0xffffff00; // reading x is undefined
The bug was first found with Frama-C, and can be reproduced with the TIS interpreter (run the formal-analysis.sh
script, then run the TIS interpreter on the sources in tests/formal-analysis/
folder, it errors out within the first 20 minutes). It could have been caught if I ran it before release. I didn't, because I didn't want to wait the 15+ hours required to run the entire test suite under the TIS interpreter, and doing so never caught bugs that Valgrind or the clang sanitisers couldn't catch. Until now, that is. Lesson learned, I guess.
A fix will be coming by the end of the week. In the mean time, users who are in a hurry may replace line 532 by the following:
if (byte == 0) { ctx->input[word] = 0; }
(That one has been verified under the TIS interpreter.)
$ make clean
$ make test CC="clang -std=c99" CFLAGS="-fprofile-instr-generate"
$ tests/coverage.sh
error: Failed to load coverage: No coverage data found
A big fat listing describing code coverage.
The make test
step generates a default.profraw
file, as expected. The tests/coverage.sh
just invokes 2 commands:
llvm-profdata-3.8 merge default.profraw -o all.profdata
llvm-cov show -instr-profile=all.profdata "./test.out"
The first line generates a all.profdata
, as expected. The second line is expected to read that file, but for some reason doesn't.
It used to work, so maybe my llvm setup broke upon update or something. But it could be that there is indeed a problem with the makefile or the coverage script.
Could someone reproduce this on their machine?
It may be worthwhile to also create an incremental crypto_lock
/crypto_unlock
API for cases where you have multiple buffers to decrypt (e.g. header and body in separate buffers) or a large file to read out in chunks.
Same for crypto_sign
/crypto_check
(think: signing fairly large update files).
YMMV, but it's an idea.
The website says that "Monocypher takes less than 1300 lines of code". This is no longer true. By no measures I could get less than 1300.
Methods attempted:
wc -l monocypher.c
(1649)grep -v '^$' monocypher.c|wc -l
(1492)You may want to either adjust the figure or note how you got your number.
(In more positive news, sloccount thinks monocypher.c costs $35,845 to develop.)
RFC 7539 "ChaCha20 and Poly1305 for IETF Protocols" contains serveral test vectors for both ChaCha20, Poly1305 as separate constructs, as well for the combined construct. At least the test vectors for ChaCha and Poly1305 could be useful to add to Monocypher. This would show that the implementations in Monocypher seems to match other specs and implementations.
See for example chapter 2.5.2 for Poly1305 test vectors with a three block message with results for each block and final result.
Hi,
I like the manual because it's pretty much self-sufficient for a crypto newbie, but it doesn't really explain anything about these two parameters of the argon hash, except that they're optional.
What is their purpose, and when would you want to use them?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.