August 🪓

August is an assembler written from scratch in Ink, built so I could learn about assemblers, linkers, executable file formats, and compiler backends. It currently assembles and links (in a single step) x86_64 ELF binaries for Linux, and might in the future support ELF executables for ARM, RISC-V, and x86 architectures. In the long term, August might also become the code generation backend for a compiler, written in Ink, for some small subset of C if I feel adventurous. For now, August is an educational project that assembles a subset of x86_64 to a Linux ELF binary.

August currently supports the following features:

  • A good portable subset of the integer x86_64 instruction set
  • Support for arguments as immediates, registers, and labels
  • Embedded read-only data segments
  • Symbol tables for debugging and disassembly

You can see some example assembly code that August can assemble and link under test/.

Design

August provides a CLI, ./src/cli.ink, that currently takes a single assembly program and emits a single statically-linked x86_64 ELF executable. Under the hood, August reads the assembly program, parses it into a simple representation of symbols and sections in the source, assembles it into machine code, and links it all together with a minimal ELF linker.
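To make the parsing step concrete, here is a minimal sketch of turning an assembly program into sections and symbols. It is written in Python purely for illustration (August itself is written in Ink), and the `parse` function, the tuple layout, and the comment handling are all assumptions, not August's actual representation.

```python
import shlex

def parse(source):
    """Group labels and instructions by section (hypothetical representation)."""
    sections = {".text": []}  # .text is the implicit default section
    current = ".text"
    for raw in source.splitlines():
        # Drop comments. Naive: does not handle ';' inside string literals.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        tokens = shlex.split(line)  # keeps "Hello, World!" as one token
        if tokens[0] == "section":
            current = tokens[1]
            sections.setdefault(current, [])
        elif len(tokens) == 1 and tokens[0].endswith(":"):
            sections[current].append(("label", tokens[0][:-1]))
        else:
            sections[current].append(("inst", tokens[0], tokens[1:]))
    return sections

SOURCE = """
section .text   ; implicit
_start:
    mov eax 0x1     ; write syscall
    mov edi 0x1     ; stdout
    mov esi msg     ; string to print
    mov edx len     ; length
    syscall
exit:
    mov eax 60      ; exit syscall
    mov edi 0       ; exit code
    syscall
section .rodata
msg:
    db "Hello, World!" 0xa
len:
    eq 14
"""

parsed = parse(SOURCE)
for name, items in parsed.items():
    print(name, [item[1] for item in items if item[0] == "label"])
```

The sketch only separates labels from instructions; encoding and symbol resolution would happen in later passes.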

At the moment, the assembler and linker are pretty tightly integrated. The ELF linker assumes that only two sections are used, .text and .rodata, and the assembler generates code with that assumption. The virtual address table for the generated executable is also currently hard-coded into the linker and relied on by the assembler when resolving symbols.
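As a rough illustration of what resolving symbols against hard-coded virtual addresses can look like, the Python sketch below maps labels to absolute addresses. The two base addresses are read off the objdump output further down in this README; the `SECTION_BASE` table and `resolve` function are hypothetical names, and August's Ink implementation may be organized quite differently.

```python
# Base virtual addresses inferred from the objdump output in this README.
# The table itself is an assumption, not August's internal data structure.
SECTION_BASE = {".text": 0x401000, ".rodata": 0x6B5000}

def resolve(symbols):
    """Map each label to base-of-section + offset-within-section."""
    return {name: SECTION_BASE[sec] + off for name, (sec, off) in symbols.items()}

# Offsets for the Hello World program: `exit` starts 0x16 bytes into .text,
# and `msg` is the first item in .rodata.
symbols = {
    "_start": (".text", 0x00),
    "exit": (".text", 0x16),
    "msg": (".rodata", 0x00),
}

addrs = resolve(symbols)
print({name: hex(addr) for name, addr in addrs.items()})
```

Because the bases are fixed ahead of time, the assembler can substitute a label's final address while encoding, without a separate relocation pass.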

Here's a transcript of a shell session that demonstrates what August can do today. We take a bare-bones Hello World program for Linux on x86_64, assemble it with August, run it, and disassemble the generated executable with objdump.

$ cat test/asm/004-sym.asm
; Hello World

section .text   ; implicit

_start:
    mov eax 0x1     ; write syscall
    mov edi 0x1     ; stdout
    mov esi msg     ; string to print
    mov edx len     ; length
    syscall

exit:
    mov eax 60      ; exit syscall
    mov edi 0       ; exit code
    syscall

section .rodata

msg:
    db "Hello, World!" 0xa
len:
    eq 14

Assemble the program with August and run the emitted executable, which prints "Hello, World!" and exits cleanly.

$ august test/asm/004-sym.asm ./hello-world
executable written.

$ ./hello-world
Hello, World!

$ echo $?
0

If we disassemble the generated executable, we find the assembly we began with.

$ objdump -d ./hello-world

./hello-world:     file format elf64-x86-64

Disassembly of section .text:

0000000000401000 <_start>:
  401000:       b8 01 00 00 00          mov    eax,0x1
  401005:       bf 01 00 00 00          mov    edi,0x1
  40100a:       be 00 50 6b 00          mov    esi,0x6b5000
  40100f:       ba 0e 00 00 00          mov    edx,0xe
  401014:       0f 05                   syscall

0000000000401016 <exit>:
  401016:       b8 3c 00 00 00          mov    eax,0x3c
  40101b:       bf 00 00 00 00          mov    edi,0x0
  401020:       0f 05                   syscall
        ...

Assembler

The instruction encoding is handled by the ./src/asm.ink library within the project. Currently, August can assemble simple programs that work with 32-bit registers and the ALU, handle branches and jumps, make system calls and function calls per the x86 calling convention, and read from or write to memory. Even with these basic building blocks, we can write programs that do interesting things like loop, manipulate memory, and make recursive calls. You can check out some examples in test/asm/.
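For a feel of what instruction encoding involves, here is a small Python sketch (not August's Ink code) that encodes the first half of the Hello World program. It uses the standard x86 encoding of `mov r32, imm32` as opcode `B8+rd` followed by a little-endian 32-bit immediate; the helper name `mov_r32_imm32` is made up for this example.

```python
import struct

# Register numbers used by the B8+rd form of `mov r32, imm32`.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def mov_r32_imm32(reg, imm):
    """Encode `mov r32, imm32`: one opcode byte, then imm32 little-endian."""
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", imm & 0xFFFFFFFF)

SYSCALL = bytes([0x0F, 0x05])

# The _start block from the Hello World example, with labels already resolved.
code = (
    mov_r32_imm32("eax", 0x1)         # write syscall
    + mov_r32_imm32("edi", 0x1)       # stdout
    + mov_r32_imm32("esi", 0x6B5000)  # address of msg
    + mov_r32_imm32("edx", 14)        # length
    + SYSCALL
)
print(code.hex(" "))
```

The bytes produced match the `_start` block in the objdump output above, and the block is 0x16 bytes long, which is why `exit` lands at 0x401016.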

ELF Linker

August uses a library for constructing ELF executable files located at ./src/elf.ink. The ELF files this library generates currently make use of three sections:

  • .text containing the program text, i.e. the machine code assembled from x86_64 assembly
  • .rodata containing read-only data, loaded into process memory as read-only
  • .shstrtab containing the section header string table, i.e. the names of the sections
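To show what a section name table like .shstrtab holds, here is a minimal Python sketch. ELF string tables start with a null byte and store each name null-terminated; section headers then refer to names by byte offset. The `build_strtab` helper is hypothetical, and the ordering of names is an assumption rather than what August necessarily emits.

```python
def build_strtab(names):
    """Build an ELF string table and the byte offset of each name in it."""
    table = b"\x00"  # index 0 is reserved for the empty name
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode("ascii") + b"\x00"
    return table, offsets

table, offsets = build_strtab([".text", ".rodata", ".shstrtab"])
print(table)
print(offsets)
```

Each section header's name field would then hold the matching offset into this table.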

The content of .text and .rodata sections can be provided to the ELF library, which will return a fully linked ELF binary as the result. All labels found in the assembly code are treated as local function symbols and placed into the generated symbol table.
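As a sketch of what "local function symbol" means at the byte level, the Python snippet below packs one ELF64 symbol table entry (`Elf64_Sym`, 24 bytes). The field layout and the `STB_LOCAL` / `STT_FUNC` constants come from the ELF specification; the `elf64_sym` helper and the example argument values (name offset, size, section index) are assumptions for illustration, not values taken from August.

```python
import struct

STB_LOCAL = 0  # symbol binding: local
STT_FUNC = 2   # symbol type: function

def elf64_sym(name_offset, value, size, shndx):
    """Pack an Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size."""
    st_info = (STB_LOCAL << 4) | STT_FUNC
    return struct.pack("<IBBHQQ", name_offset, st_info, 0, shndx, value, size)

# Hypothetical entry for the `exit` label from the Hello World example.
# name_offset, size, and shndx are made-up illustrative values.
entry = elf64_sym(name_offset=1, value=0x401016, size=12, shndx=1)
print(len(entry), entry.hex())
```

Tools like objdump use entries of this shape to print labels such as `<_start>` and `<exit>` next to addresses in the disassembly.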

References and further reading

The ELF file format is quite well documented, especially in source bases of various linkers, assemblers, and kernels, but the available reference material for implementing an ELF linker is not...what you would call super accessible. In the process of building August, I've found the following references particularly helpful.

In writing an x86/x64 assembler, the following were especially helpful to get me up to speed.

Development

To work on August, you obviously need Ink installed. Inkfmt is also useful for auto-formatting code, which you can run with make format or make f.

When I work on August (especially the instruction encoder), I usually have two other panes open, running:

  • ls test/asm/*.asm lib/*.ink src/*.ink | entr -cr make so every file change assembles and runs a program to test
  • ls ./b.out | entr -cr objdump -d -Mintel ./b.out so that every time the executable is re-compiled, I can see the disassembly of the executable and check it against the intended assembly code.

There is a growing test suite for the assembler / x86 instruction encoder, which you can run with make check or make t.
