Giter Site home page Giter Site logo

slothy-optimizer / slothy Goto Github PK

View Code? Open in Web Editor NEW
132.0 6.0 10.0 8.97 MB

Assembly super-optimization via constraint solving

Home Page: https://slothy-optimizer.github.io/slothy/

License: Other

Python 8.18% Shell 0.58% Assembly 91.23% Dockerfile 0.02%
assembly-language constraint-programming cryptography-tools superoptimization post-quantum-cryptography

slothy's Introduction

SLOTHY: Assembly optimization via constraint solving

About SLOTHY

SLOTHY - Super (Lazy) Optimization of Tricky Handwritten assemblY - is an assembly-level superoptimizer for:

  1. Instruction scheduling
  2. Register allocation
  3. Software pipelining (= periodic loop interleaving)

SLOTHY is generic in the target architecture and microarchitecture. This repository provides instantiations for:

  • Armv8.1-M+Helium: Cortex-M55, Cortex-M85
  • AArch64: Cortex-A55, and experimentally Cortex-A72, Cortex-X/Neoverse-V, Apple M1 (Firestorm, Icestorm)

SLOTHY is discussed in Fast and Clean: Auditable high-performance assembly via constraint solving.

Goal

SLOTHY enables a development workflow where developers write 'clean' assembly by hand, emphasizing the logic of the computation, while SLOTHY automates microarchitecture-specific micro-optimizations. This accelerates development, keeps manually written code artifacts maintainable, and allows to split efforts for formal verification into the separate verification of the clean code and the micro-optimizations.

How it works

SLOTHY is essentially a constraint solver frontend: It converts the input source into a data flow graph and builds a constraint model capturing valid instruction schedulings, register renamings, and periodic loop interleavings. The model is passed to an external constraint solver and, upon success, a satisfying assignment converted back into the final code. Currently, SLOTHY uses Google OR-Tools as its constraint solver backend.

Performance

As a rough rule of thumb, SLOTHY typically optimizes workloads of <50 instructions in seconds to minutes, workloads up to 150 instructions in minutes to hours, while for larger kernels some heuristics are necessary.

Applications

SLOTHY has been used to provide the fastest known implementations of various cryptographic and DSP primitives: For example, the SLOTHY paper discusses the NTTs underlying ML-KEM and ML-DSA for Cortex-{A55, A72, M55, M85}, the FFT for Cortex-{M55,M85}, and the X25519 scalar multiplication for Cortex-A55. You find the clean and optimized source code for those examples in paper/.

Getting started

Have a look at the SLOTHY tutorial for a hands-on and example-based introduction to SLOTHY.

Real world uses

Installation

Requirements

SLOTHY has been successfully used on

  • Ubuntu-21.10 and up (64-bit),
  • macOS Monterey 12.6 and up.

SLOTHY requires Python >= 3.10. See requirements.txt for package requirements, and install via pip install -r requirements.txt.

Note: requirements.txt pins versions for reproducibility. If you already have newer versions of some dependencies installed and don't want them downgraded, consider using a virtual environment:

python3 -m venv venv
./venv/bin/python3 -m pip install -r requirements.txt

Then, enter the virtual environment via source venv/bin/activate prior to running SLOTHY.

Docker

A dockerfile for an Ubuntu-22.04 based Docker image with all dependencies of SLOTHY and the PQMX+PQAX test environments setup can be found in paper/artifact/slothy.dockerfile. See paper/artifact/README.md for instructions.

Quick check

To check that your setup is complete, try the following from the base directory:

% python3 example.py --examples aarch64_simple0_a55

You should see something like the following:

* Example: aarch64_simple0_a55...
INFO:aarch64_simple0_a55:Instructions in body: 20
INFO:aarch64_simple0_a55.slothy:Perform internal binary search for minimal number of stalls...
INFO:aarch64_simple0_a55.slothy:Attempt optimization with max 32 stalls...
INFO:aarch64_simple0_a55.slothy:Objective: minimize number of stalls
INFO:aarch64_simple0_a55.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.7.2996) ...
INFO:aarch64_simple0_a55.slothy:[0.0721s]: Found 1 solutions so far... objective 19.0, bound 8.0 (minimize number of stalls)
INFO:aarch64_simple0_a55.slothy:[0.0765s]: Found 2 solutions so far... objective 18.0, bound 12.0 (minimize number of stalls)
INFO:aarch64_simple0_a55.slothy:OPTIMAL, wall time: 0.155224 s
INFO:aarch64_simple0_a55.slothy:Booleans in result: 509
INFO:aarch64_simple0_a55.slothy.selfcheck:OK!
INFO:aarch64_simple0_a55.slothy:Minimum number of stalls: 18

Examples

The SLOTHY Tutorial and the examples directory contain numerous exemplary assembly snippets. To try them, use python3 example.py --examples={YOUR_EXAMPLE}. See python3 example.py --help for the list of all available examples.

The use of SLOTHY from the command line is illustrated in scripts/ supporting the real-world optimizations for the NTT, FFT and X25519 discussed in Fast and Clean: Auditable high-performance assembly via constraint solving.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.