Giter Site home page Giter Site logo

0xade1a1de / cryptopt Goto Github PK

View Code? Open in Web Editor NEW
53.0 53.0 11.0 134.82 MB

CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives

Home Page: https://0xade1a1de.github.io/CryptOpt/

License: Apache License 2.0

Dockerfile 0.01% Makefile 0.06% TypeScript 86.39% Shell 0.01% C 0.27% sed 0.02% TeX 0.50% JavaScript 0.01% Assembly 12.67% Gnuplot 0.05%

cryptopt's People

Contributors

dderjoel avatar dependabot[bot] avatar github-actions[bot] avatar javali7 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

cryptopt's Issues

Memory and storage requirements?

Do you have a good sense of how much RAM and disk CryptOpt should use? I haphazardly attempted a week-long CryptOpt run and saw it reach memory exhaustion on day 2. Here's an output from the rare instance that got killed by node instead of Linux:

fiat_curve25519_solinas_square| run|  6|bs  156|#inst: 103|cycl_     12|G  26 cycl _  0|B  26 cycl _  0|L  64|l/g 2.5098| P|P[  -1/   0/   1/  -1]|D[FL/ 48/ 13/ 5
7]|47.2M(43%) 115/s                                                                                                                          
<--- Last few GCs --->                                                                                                                       
                                                                                                                                             
[212:0x67a1c50] 513566914 ms: Scavenge 3885.3 (4129.6) -> 3878.9 (4129.6) MB, 3.2 / 0.0 ms  (average mu = 0.629, current mu = 0.642) task;   
[212:0x67a1c50] 513566933 ms: Scavenge 3886.7 (4129.6) -> 3879.1 (4129.6) MB, 3.4 / 0.0 ms  (average mu = 0.629, current mu = 0.642) allocation failure; 
[212:0x67a1c50] 513566952 ms: Scavenge 3886.0 (4129.6) -> 3879.2 (4145.6) MB, 3.7 / 0.0 ms  (average mu = 0.629, current mu = 0.642) task;   

Does 4GB/process look like expected memory usage to you? I could provision that much, but I can't think of a reason why it would be needed.

JS stacktrace:

``` FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory 1: 0xb6b850 node::Abort() [node] 2: 0xa806a6 [node] 3: 0xd52140 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node] 4: 0xd524e7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node] 5: 0xf2fbe5 [node] 6: 0xf420cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node] 7: 0xfb1124 v8::internal::ScavengeJob::Task::RunInternal() [node] 8: 0xe2187b non-virtual thunk to v8::internal::CancelableTask::Run() [node] 9: 0xbd6684 [node] 10: 0xbd9aee node::PerIsolatePlatformData::FlushForegroundTasksInternal() [node] 11: 0x1652906 [node] 12: 0x1664e44 [node] 13: 0x165326e uv_run [node] 14: 0xaafa2d node::SpinEventLoop(node::Environment*) [node] 15: 0xbb11f4 node::NodeMainInstance::Run() [node] 16: 0xb26c44 node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*) [node] 17: 0xb2a83f node::Start(int, char**) [node] 18: 0x7fde10f7ed90 [/lib/x86_64-linux-gnu/libc.so.6] 19: 0x7fde10f7ee40 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6] 20: 0xaad7ee _start [node] /root/CryptOpt/CryptOpt: line 31: 212 Aborted ```

I believe here is the invocation I used:

for i in $(seq 0 "$(("$(nproc)"-1))"); do
  tmux new-window -n "c$i"
  tmux send-keys "taskset -c $i ~/CryptOpt/CryptOpt --no-proof --resultDir /mnt/results --curve curve25519_solinas --method square --framePointer save --evals $((175*60*60*24*9))" C-m
  sleep "0.$RANDOM"
done

Additionally, the results directory seems to have acquired 49GB of csv files (and some asm and json). Are these something I'd may want to look at, or perhaps I should not be collecting them at all?

Target request: x86_64 without ADX

It would be nice to use CryptOpt to generate plain x86_64 code that does not depend on the ADX extension, to serve as a fallback from CryptOpt-optimized fast assembly in distributed binaries. This is a requirement for deployment in BoringSSL, and I hear it may be relevant to adoption of mit-plv/fiat-crypto#1582 as well.

I am thinking of use of CryptOpt in this context as primarily an assurance benefit, though if it's decently fast still, even better.

I would be happy to do the work for adapting CryptOpt here if you think that this would be a good first project to hack on in the CryptOpt codebase.

Native usage on Debian testing? (nodejs segfault)

Hi,

I am trying to run CryptOpt on natively on Debian testing because reasons. I'd like to know whether you think this has a chance of working. The current state is that I think I have managed to get both AssemblyLine and CryptOpt to build and link. asmline runs, but running CryptOpt exits with a segmentation fault:

Last lines of ./CryptOpt output with DEBUG=1 --verbose:

executing cmd gcc -march=native -mtune=native -O3 -fPIC -shared -o /tmp/CryptOpt.cache/uczayywaxs/libcheckfunctions-s1678729657975-bfiat-p4149251.so /home/andreser/CryptOpt/dist/data/fiat-bridge/.cache/85ea43d3500c431ba0ae9e2f56985f9990aa34fa42fe048a10d26e52b40f2df2.c with opts {"shell":"/usr/bin/bash"}
./CryptOpt: line 31: 4149251 Segmentation fault      PATH="$(realpath ./bins/node/bin):${PATH}" /usr/bin/env node "./dist/CryptOpt.js" "${@}"

gdb says:

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00007ffff7e25fd6 in ms_initialize () from /home/andreser/CryptOpt/node_modules/measuresuite/build/Release/measuresuite.node
(gdb) bt
#0  0x00007ffff7e25fd6 in ms_initialize ()
   from /home/andreser/CryptOpt/node_modules/measuresuite/build/Release/measuresuite.node
#1  0x00007ffff7e25b1b in init ()
   from /home/andreser/CryptOpt/node_modules/measuresuite/build/Release/measuresuite.node
more backtrace #2 0x0000000000b10d7d in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo const&) () #3 0x0000000000db0230 in v8::internal::MaybeHandle v8::internal::(anonymous namespace)::HandleApiCallHelper(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle, v8::internal::Handle, v8::internal::Handle, v8::internal::BuiltinArguments) () #4 0x0000000000db176f in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) () #5 0x00000000016ef579 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit () #6 0x00000000016734d0 in Builtins_InterpreterEntryTrampoline () #7 0x00002e0bd55015b9 in ?? () #8 0x000015cb6c393811 in ?? () #9 0x0000000800000000 in ?? () #10 0x00002e0bd5501689 in ?? () #11 0x00003ebecb2fef29 in ?? () #12 0x0000000500000000 in ?? () #13 0x0000000100000000 in ?? () #14 0x0000000100000000 in ?? () #15 0x0000000100000000 in ?? () #16 0x0000000100000000 in ?? () #17 0x0000000500000000 in ?? () #18 0x00003ebecb2fef29 in ?? () #19 0x000015cb6c393811 in ?? () #20 0x0000064757026351 in ?? () #21 0x0000009400000000 in ?? () #22 0x000027d6971beb29 in ?? () #23 0x0000000000000008 in ?? () #24 0x00003a9153c02699 in ?? () #25 0x0000064757026351 in ?? () #26 0x00007fffffff9220 in ?? () #27 0x0000000001670e62 in Builtins_JSConstructStubGeneric () Backtrace stopped: frame did not save the PC
(gdb) disas $pc Dump of assembler code for function ms_initialize: 0x00007ffff7e25f30 <+0>: push %rbp 0x00007ffff7e25f31 <+1>: xor %eax,%eax 0x00007ffff7e25f33 <+3>: mov %rsp,%rbp 0x00007ffff7e25f36 <+6>: push %r15 0x00007ffff7e25f38 <+8>: mov %ecx,%r15d 0x00007ffff7e25f3b <+11>: push %r14 0x00007ffff7e25f3d <+13>: mov %edx,%r14d 0x00007ffff7e25f40 <+16>: push %r13 0x00007ffff7e25f42 <+18>: mov %esi,%r13d 0x00007ffff7e25f45 <+21>: push %r12 0x00007ffff7e25f47 <+23>: mov %rdi,%r12 0x00007ffff7e25f4a <+26>: push %rbx 0x00007ffff7e25f4b <+27>: sub $0x8,%rsp 0x00007ffff7e25f4f <+31>: call 0x7ffff7e24400 0x00007ffff7e25f54 <+36>: mov %rax,(%r12) 0x00007ffff7e25f58 <+40>: test %rax,%rax 0x00007ffff7e25f5b <+43>: je 0x7ffff7e25f6f 0x00007ffff7e25f5d <+45>: mov %r13d,%esi 0x00007ffff7e25f60 <+48>: mov %rax,%rdi 0x00007ffff7e25f63 <+51>: mov %rax,%rbx 0x00007ffff7e25f66 <+54>: call 0x7ffff7e24580 0x00007ffff7e25f6b <+59>: test %eax,%eax 0x00007ffff7e25f6d <+61>: je 0x7ffff7e25f88 0x00007ffff7e25f6f <+63>: mov $0x1,%eax 0x00007ffff7e25f74 <+68>: add $0x8,%rsp 0x00007ffff7e25f78 <+72>: pop %rbx 0x00007ffff7e25f79 <+73>: pop %r12 0x00007ffff7e25f7b <+75>: pop %r13 0x00007ffff7e25f7d <+77>: pop %r14 0x00007ffff7e25f7f <+79>: pop %r15 0x00007ffff7e25f81 <+81>: pop %rbp 0x00007ffff7e25f82 <+82>: ret 0x00007ffff7e25f83 <+83>: nopl 0x0(%rax,%rax,1) 0x00007ffff7e25f88 <+88>: mov %r14d,%esi 0x00007ffff7e25f8b <+91>: mov %rbx,%rdi 0x00007ffff7e25f8e <+94>: call 0x7ffff7e24540 0x00007ffff7e25f93 <+99>: test %eax,%eax 0x00007ffff7e25f95 <+101>: jne 0x7ffff7e25f6f 0x00007ffff7e25f97 <+103>: mov %r15d,%esi 0x00007ffff7e25f9a <+106>: mov %rbx,%rdi 0x00007ffff7e25f9d <+109>: call 0x7ffff7e242b0 0x00007ffff7e25fa2 <+114>: test %eax,%eax 0x00007ffff7e25fa4 <+116>: jne 0x7ffff7e25f6f 0x00007ffff7e25fa6 <+118>: mov %rbx,%rdi 0x00007ffff7e25fa9 <+121>: call 0x7ffff7e24460 0x00007ffff7e25fae <+126>: test %eax,%eax 0x00007ffff7e25fb0 <+128>: jne 0x7ffff7e25f6f 0x00007ffff7e25fb2 <+130>: mov %rbx,%rdi 0x00007ffff7e25fb5 <+133>: call 0x7ffff7e241f0 0x00007ffff7e25fba <+138>: test %eax,%eax 0x00007ffff7e25fbc <+140>: jne 0x7ffff7e25f6f 0x00007ffff7e25fbe <+142>: mov %rbx,%rdi 0x00007ffff7e25fc1 <+145>: call 0x7ffff7e246d0 0x00007ffff7e25fc6 <+150>: test %eax,%eax 0x00007ffff7e25fc8 <+152>: jne 0x7ffff7e25f6f 0x00007ffff7e25fca <+154>: mov %rbx,%rdi 0x00007ffff7e25fcd <+157>: call 0x7ffff7e24560 0x00007ffff7e25fd2 <+162>: test %eax,%eax 0x00007ffff7e25fd4 <+164>: jne 0x7ffff7e25f6f => 0x00007ffff7e25fd6 <+166>: movl $0x0,0x88(%rbx) 0x00007ffff7e25fe0 <+176>: jmp 0x7ffff7e25f74
(gdb) info registers rax 0x0 0 rbx 0x80eefa3fe0 553765191648 rcx 0x7ffff7b20733 140737349027635 rdx 0xffffffffffffff00 -256 rsi 0x1000 4096 rdi 0x0 0 rbp 0x7fffffff8e60 0x7fffffff8e60 rsp 0x7fffffff8e30 0x7fffffff8e30 r8 0xfffffff9 4294967289 r9 0x0 0 r10 0x1 1 r11 0x246 582 r12 0x7fffffff8e88 140737488326280 r13 0x5 5 r14 0x1 1 r15 0x1 1 rip 0x7ffff7e25fd6 0x7ffff7e25fd6 eflags 0x10246 [ PF ZF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0
node-segfault-handler log, read after the same gdb execution: -----[ Native Stacktraces ]----- [pc=0x00007ffff7fb569e, sp=0x00007fffffff8860] in segfault_handler(int)+0x4e [pc=0x00007ffff7a5af90, sp=0x00007fffffff8880] in __sigaction+0x40 [pc=0x00007ffff7e25fd6, sp=0x00007fffffff8e30] in ms_initialize+0xa6 [pc=0x00007ffff7e25b1b, sp=0x00007fffffff8e70] in init+0xab [pc=0x0000000000b10d7d, sp=0x00007fffffff8ed0] in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo const&)+0x7d [pc=0x0000000000db0230, sp=0x00007fffffff8f30] in v8::internal::MaybeHandle v8::internal::(anonymous namespace)::HandleApiCallHelper(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle, v8::internal::Handle, v8::internal::Handle, v8::internal::BuiltinArguments)+0x380 [pc=0x0000000000db176f, sp=0x00007fffffff9070] in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*)+0xaf [pc=0x00000000016ef579, sp=0x00007fffffff90e0] in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit+0x39 [pc=0x00000000016734d0, sp=0x00007fffffff90f0] in Builtins_InterpreterEntryTrampoline+0xd0 [pc=0x0000000001670e62, sp=0x00007fffffff9100] in Builtins_JSConstructStubGeneric+0x122 [pc=0x000000000178d3ff, sp=0x00007fffffff9110] in Builtins_ConstructHandler+0x2bf [pc=0x00000000016734d0, sp=0x00007fffffff9120] in Builtins_InterpreterEntryTrampoline+0xd0 [pc=0x00000000016734d0, sp=0x00007fffffff9130] in Builtins_InterpreterEntryTrampoline+0xd0 [pc=0x00000000016734d0, sp=0x00007fffffff9140] in Builtins_InterpreterEntryTrampoline+0xd0 [pc=0x0000000001670e62, sp=0x00007fffffff9150] in Builtins_JSConstructStubGeneric+0x122 [pc=0x000000000178d3ff, sp=0x00007fffffff9160] in Builtins_ConstructHandler+0x2bf [pc=0x00000000016734d0, sp=0x00007fffffff9170] in Builtins_InterpreterEntryTrampoline+0xd0 [pc=0x00000000016f96d7, sp=0x00007fffffff9180] in Builtins_AsyncModuleEvaluate+0x97 [pc=0x0000000001671adc, sp=0x00007fffffff9190] in Builtins_JSEntryTrampoline+0x5c [pc=0x0000000001671803, sp=0x00007fffffff91a0] in Builtins_JSEntry+0x83 [pc=0x0000000000e9293e, sp=0x00007fffffff91b0] in v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&)+0x13e [pc=0x0000000000e93a10, sp=0x00007fffffff9a50] in v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [clone .constprop.84]+0x50 [pc=0x0000000000e94012, sp=0x00007fffffff9ab0] in v8::internal::Execution::TryCall(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle, int, v8::internal::Handle*, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle*, bool)+0x62 [pc=0x00000000011c3611, sp=0x00007fffffff9b40] in v8::internal::SourceTextModule::InnerExecuteAsyncModule(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle)+0xd1 [pc=0x00000000011c38df, sp=0x00007fffffff9ba0] in v8::internal::SourceTextModule::ExecuteAsyncModule(v8::internal::Isolate*, v8::internal::Handle)+0x1af [pc=0x00000000011c5785, sp=0x00007fffffff9c10] in v8::internal::SourceTextModule::AsyncModuleExecutionFulfilled(v8::internal::Isolate*, v8::internal::Handle)+0x135 [pc=0x0000000000dbdc2b, sp=0x00007fffffff9ce0] in v8::internal::Builtin_CallAsyncModuleFulfilled(int, unsigned long*, v8::internal::Isolate*)+0x3b [pc=0x00000000016ef579, sp=0x00007fffffff9d20] in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit+0x39 [pc=0x000000000173fa71, sp=0x00007fffffff9d30] in Builtins_PromiseFulfillReactionJob+0x31 [pc=0x000000000169893b, sp=0x00007fffffff9d40] in Builtins_RunMicrotasks+0x27b [pc=0x0000000001671a03, sp=0x00007fffffff9d50] in Builtins_JSRunMicrotasksEntry+0x83 [pc=0x0000000000e92e0a, sp=0x00007fffffff9d60] in v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&)+0x60a [pc=0x0000000000e93a10, sp=0x00007fffffff9fb0] in v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [clone .constprop.84]+0x50 [pc=0x0000000000e9411a, sp=0x00007fffffffa010] in v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*, v8::internal::MaybeHandle*)+0x5a [pc=0x0000000000ec161a, sp=0x00007fffffffa080] in v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) [clone .part.50]+0x8a [pc=0x0000000000ec19f2, sp=0x00007fffffffa130] in v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*)+0x42 [pc=0x0000000000aaec29, sp=0x00007fffffffa160] in node::InternalCallbackScope::Close()+0x109 [pc=0x0000000000aaee01, sp=0x00007fffffffa1b0] in node::InternalCallbackScope::~InternalCallbackScope()+0x11 [pc=0x0000000000b70e5b, sp=0x00007fffffffa1d0] in node::fs::FileHandle::CloseReq::Resolve()+0x9b [pc=0x0000000000b73420, sp=0x00007fffffffa250] in node::fs::FileHandle::ClosePromise()::{lambda(uv_fs_s*)#1}::_FUN(uv_fs_s*)+0x230 [pc=0x0000000000b681f4, sp=0x00007fffffffa2d0] in node::MakeLibuvRequestCallback::Wrapper(uv_fs_s*)+0x54 [pc=0x000000000164e11d, sp=0x00007fffffffa300] in uv__work_done+0x9d [pc=0x0000000001652906, sp=0x00007fffffffa350] in uv__async_io.part.1+0x126 [pc=0x0000000001664e44, sp=0x00007fffffffa7a0] in uv__io_poll+0x494 [pc=0x000000000165326e, sp=0x00007fffffffd8b0] in uv_run+0x14e [pc=0x0000000000aafa2d, sp=0x00007fffffffd910] in node::SpinEventLoop(node::Environment*)+0x14d [pc=0x0000000000bb11f4, sp=0x00007fffffffd9c0] in node::NodeMainInstance::Run()+0xf4 [pc=0x0000000000b26c44, sp=0x00007fffffffda50] in node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*)+0xb4 [pc=0x0000000000b2a83f, sp=0x00007fffffffdb00] in node::Start(int, char**)+0x2df [pc=0x00007ffff7a4618a, sp=0x00007fffffffdb70] in __libc_init_first+0x8a [pc=0x00007ffff7a46245, sp=0x00007fffffffdc10] in __libc_start_main+0x85 [pc=0x0000000000aad7ee, sp=0x00007fffffffdc60] in _start+0x2e

---[ V8 JavaScript Stacktraces ]---
at Measuresuite (file:///home/andreser/CryptOpt/node_modules/measuresuite/ts/dist/index.js:62:16)
at createMS (file:///home/andreser/CryptOpt/dist/optimizer.helper.class-13535478.js:4108:23)
at init (file:///home/andreser/CryptOpt/dist/optimizer.helper.class-13535478.js:4129:12)
at Optimizer (file:///home/andreser/CryptOpt/dist/CryptOpt.js:4535:46)
at (null) (file:///home/andreser/CryptOpt/dist/CryptOpt.js:4856:20)

$ node --version
v18.13.0

Feature request: output C with intrinsics

It'd be neat if we could have CryptOpt do optimization on assembly as it does, but then print the scheduled and register-allocated code as C to see how much worse C compilers do when given the easiest possible task. The code generated in this manner might also be preferrable to raw fiat-crypto output in cases a C implementation needs to be deployed. I don't think we should make any effort to use standard intrinsics, rather let's do something like fiat-crypto where we just define C functions that compute the same values as supported assembly instructions.

Transient build error (missing makefile dependencies?)

On a clean clone I got the following:

andreser@andreser ~/CryptOpt % make -j all
mkdir -p ./bins
Installing dependencies
curl -L https://nodejs.org/dist/v20.5.1/node-v20.5.1-linux-x64.tar.xz | tar --extract --xz --directory ./bins
cd src/bridge/fiat-bridge/data && sha256sum unsaturated_solinas word_by_word_montgomery dettman_multiplication solinas_reduction > sha256sums
/bin/sh: line 1: npm: command not found
make: *** [Makefile:37: node_modules] Error 127
make: *** Waiting for unfinished jobs....
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23.9M  100 23.9M    0     0  12.1M      0  0:00:01  0:00:01 --:--:-- 12.1M
mv -f ./bins/node-v20.5.1-linux-x64 "/usr/local/google/home/andreser/CryptOpt/bins/node"

I was following https://github.com/0xADE1A1DE/CryptOpt/blob/main/INSTALL.md#bare-metal

Feature request: option to maintain or preserve rbp

rbp-based stack unwinding is a requirement for stack-sampling profiling in a security-conscious production deployments where debug information cannot be evaluated during sampling, e.g. Google-wide profiling.

I imagine it would be pretty easy to just get CryptOpt to entirely ignore the rbp register by ripping it out from the list of registers. It'd be nice to add a command-line flag that achieves this behavior. Perhaps we could also have an option to update rbp as in -fno-omit-frame-pointer so time spent in CryptOpt-generated routines does not get misattributed during profiling.

Again, I'd be happy to do the work given the go-ahead and high-level guidance.

optimizing register allocation

CryptOpt code contains more spills than comparable handwritten code, and sometimes more than compiler-generated code. It is not clear whether this is important for performance, but I do have an implementation where CryptOpt is not winning over naive handwritten assembly and that exhibits this pattern.

What if register allocation first identified uses that can be unspilled using x86 memory operands, "allocated" temporaties for which all uses are of this form, and then allocated the remaining variables into registers based on the remaining uses. In particular, I believe this generic heuristic would do a decent job on the adx-multiplication pattern of mulx a0 b0; mulx a0 b1; mulx b0 b2 ... ; mulx a1 b0; ...: all b can stay in memory and only one a needs to be live at a time. This suggestion is based on the hypothesis that memory-operand access is cheaper than a separate load even if that load is amortized over multiple instructions; I am not sure whether this is always true.

Another guess for a tweak would be to consider would be spilling and restoring callee-saved registers using push/pop at the very beginning and end of the function.

Have you tried something like these options already?

Hardware is incompatible for (x84-64) Intel i5-8257U

Hello,

I followed the Docker installation step.
It identified that my hardware is not incompatible with the current requirement (which I believe it is).

Result with: ./CryptOpt
Screenshot 2567-02-25 at 22 09 12

Then, I worked around by directly executing .dist/CryptOpt.js by node. It identified illegal hardware instruction.
Screenshot 2567-02-25 at 22 13 12

My Hardware:

  • Mackbook Pro 2019
  • CPU: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz (x84-64) - Coffee Lake
  • Memory: 8GB

Note:

  • Everything ran in Docker image shell.
  • I did modify cycle goal, evaluations round, batch size (in the way requiring as less resources as possible). Nothing work.

Error: tried to spill OF, but didnt work. TSNH.

p256sqr2.zip

./CryptOpt --bridge manual --cFile ./p256sqr2.c --jsonFile ./p256sqr2.json --seed 2
Start on brg/symbolname>>manual/p256_sqr<< >>without proofing correct<< on cpu >>Intel(R) Xeon(R) CPU @ 2.80GHz<< writing results to>>/home/andreser/CryptOpt/results<< with seed >>4156876406132912<< for >>  200<< evaluations against CC>>gcc -march=native -mtune=native -O3<< with cycle goal>>10000<< for each measurement on host>>andreser<< with pid>>3079059<< using counter>>RDTSCP<< framePointer=>>omit<< memoryConstraints>>none<< starting @>>2023-09-11T19:23:01.677Z<<
{
  curOperation: {
    datatype: 'u64',
    name: [ 'x59', '_' ],
    operation: 'addcarryx',
    arguments: [ 'x58', 'x38', 'x40' ],
    decisions: {
      di_choose_arg: [Array],
      di_spill_location: [Array],
      di_flag: [Array],
      di_handle_flags_kk: [Array],
      di_choose_imm: [Array]
    },
    decisionsHot: []
  },
  e: Error: tried to spill OF, but didnt work. TSNH.
      at Jt (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:24964)
      at pe (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:37026)
      at $e (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:33719)
      at Ae (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:51131)
      at file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:61683
      at Timeout._onTimeout (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:62312)
      at listOnTimeout (node:internal/timers:573:17)
      at process.processTimers (node:internal/timers:514:7),
  allocs: {
    '0x100000000': { datatype: 'u64', store: 'r10' },
    arg1: { datatype: 'u64[4]', store: 'rsi' },
    'calSv-r12': { datatype: 'u64', store: '[ rsp - 0x70 ]' },
    'calSv-r13': { datatype: 'u64', store: '[ rsp - 0x68 ]' },
    'calSv-r14': { datatype: 'u64', store: '[ rsp - 0x60 ]' },
    'calSv-r15': { datatype: 'u64', store: '[ rsp - 0x58 ]' },
    'calSv-rbp': { datatype: 'u64', store: '[ rsp - 0x78 ]' },
    'calSv-rbx': { datatype: 'u64', store: '[ rsp - 0x80 ]' },
    out1: { datatype: 'u64[4]', store: '[ rsp - 0x50 ]' },
    x106: { datatype: 'u64', store: 'r15' },
    x107: { datatype: 'u64', store: '[ rsp - 0x10 ]' },
    x22: { datatype: 'u64', store: '[ rsp - 0x18 ]' },
    x38: { datatype: 'u1', store: 'OF' },
    x39: { datatype: 'u64', store: '[ rsp - 0x40 ]' },
    x40: { datatype: 'u64', store: '[ rsp - 0x48 ]' },
    x41: { datatype: 'u64', store: '[ rsp - 0x20 ]' },
    x42: { datatype: 'u64', store: '[ rsp - 0x38 ]' },
    x51: { datatype: 'u64', store: 'rcx' },
    x53: { datatype: 'u64', store: 'r9' },
    x55: { datatype: 'u64', store: 'r12' },
    x57: { datatype: 'u64', store: 'r11' },
    x58: { datatype: 'u1', store: 'CF' },
    x66: { datatype: 'u64', store: 'rbp' },
    x68: { datatype: 'u1', store: 'dil' },
    x69: { datatype: 'u64', store: 'rdx' },
    x70: { datatype: 'u1', store: 'r8b' },
    x71: { datatype: 'u64', store: '[ rsp - 0x30 ]' },
    x72: { datatype: 'u64', store: '[ rsp - 0x28 ]' },
    x77: { datatype: 'u64', store: 'rbx' },
    x78: { datatype: 'u64', store: 'r13' },
    x96: { datatype: 'u64', store: 'r14' },
    x97: { datatype: 'u64', store: 'rax' }
  },
  pres: [
    '',
    ';should save OF(x38) but as it has not dependents, we just ignore it.'
  ],
  failfile: '/home/andreser/CryptOpt/results/lastFail.asm'
}
{
  curOperation: {
    datatype: 'u64',
    name: [ 'x59', '_' ],
    operation: 'addcarryx',
    arguments: [ 'x58', 'x38', 'x40' ],
    decisions: {
      di_choose_arg: [Array],
      di_spill_location: [Array],
      di_flag: [Array],
      di_handle_flags_kk: [Array],
      di_choose_imm: [Array]
    },
    decisionsHot: []
  },
  e: Error: tried to spill OF, but didnt work. TSNH.
      at Jt (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:24964)
      at pe (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:37026)
      at $e (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:33719)
      at Ae (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:51131)
      at file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:61683
      at Timeout._onTimeout (file:///home/andreser/CryptOpt/dist/CryptOpt.js:1:62312)
      at listOnTimeout (node:internal/timers:573:17)
      at process.processTimers (node:internal/timers:514:7),
  allocs: {
    '0x100000000': { datatype: 'u64', store: 'r10' },
    arg1: { datatype: 'u64[4]', store: 'rsi' },
    'calSv-r12': { datatype: 'u64', store: '[ rsp - 0x70 ]' },
    'calSv-r13': { datatype: 'u64', store: '[ rsp - 0x68 ]' },
    'calSv-r14': { datatype: 'u64', store: '[ rsp - 0x60 ]' },
    'calSv-r15': { datatype: 'u64', store: '[ rsp - 0x58 ]' },
    'calSv-rbp': { datatype: 'u64', store: '[ rsp - 0x78 ]' },
    'calSv-rbx': { datatype: 'u64', store: '[ rsp - 0x80 ]' },
    out1: { datatype: 'u64[4]', store: '[ rsp - 0x50 ]' },
    x106: { datatype: 'u64', store: 'r15' },
    x107: { datatype: 'u64', store: '[ rsp - 0x10 ]' },
    x22: { datatype: 'u64', store: '[ rsp - 0x18 ]' },
    x38: { datatype: 'u1', store: 'OF' },
    x39: { datatype: 'u64', store: '[ rsp - 0x40 ]' },
    x40: { datatype: 'u64', store: '[ rsp - 0x48 ]' },
    x41: { datatype: 'u64', store: '[ rsp - 0x20 ]' },
    x42: { datatype: 'u64', store: '[ rsp - 0x38 ]' },
    x51: { datatype: 'u64', store: 'rcx' },
    x53: { datatype: 'u64', store: 'r9' },
    x55: { datatype: 'u64', store: 'r12' },
    x57: { datatype: 'u64', store: 'r11' },
    x58: { datatype: 'u1', store: 'CF' },
    x66: { datatype: 'u64', store: 'rbp' },
    x68: { datatype: 'u1', store: 'dil' },
    x69: { datatype: 'u64', store: 'rdx' },
    x70: { datatype: 'u1', store: 'r8b' },
    x71: { datatype: 'u64', store: '[ rsp - 0x30 ]' },
    x72: { datatype: 'u64', store: '[ rsp - 0x28 ]' },
    x77: { datatype: 'u64', store: 'rbx' },
    x78: { datatype: 'u64', store: 'r13' },
    x96: { datatype: 'u64', store: 'r14' },
    x97: { datatype: 'u64', store: 'rax' }
  },
  pres: [
    '',
    ';should save OF(x38) but as it has not dependents, we just ignore it.'
  ],
  failfile: '/home/andreser/CryptOpt/results/lastFail.asm'
}

Done with code: 1 (statefile: /home/andreser/CryptOpt/results/manual/p256_sqr/seed0000000000000002.json)

Wrote RES/manual/p256_sqr/seed0000000000000002.json exiting.

It is possible that the input is silly, I haven't proven anything about it yet.

Feature requrest: enforce memory contraints

re bitcoin-core/secp256k1#1329 (comment)

  • r and a may point to the same object, but neither can be equal to b. (...)
    */
    static void secp256k1_fe_mul(secp256k1_fe *r, const secp256k1_fe *a, const secp256k1_fe *b);

The idea is that we could have three enforcement levels:

  • do not care at all (current default). This assumes, that all memory is distinct. I.e read a[0] after writing r[0] is fine.
  • level 2: assume that r and a can be equal, but must be aligned. then, writing to r[0] and subsequently reading from a[1] is valid. Reading a[0] after writing r[0] would be invalid.
  • level 3: do not read any memory after writing any. (except for stack). I.e. there cannot be any reads after the first write, expect if the read is from the stack.

What is going on when further optimization appears to make the program slower?

image

A cartoon image of CryptOpt I've had is that it rejects mutations that make the program slower. I know that this is idealistic, even the optimization trace in the paper goes down for a bit. But how is it that continuing to try more mutations seems to have a real chance of making the cycle counts in the CryptOpt output go up? I understand that there's some chance that a mutation will misleadingly appear attractive due to measurement noise, but looking at the above log I can tell at a glance that a previous program would likely perform better. What is going on here -- is the theory that something changes about the machine to make both versions run slower, or does the wrong one just get picked sometimes?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.