A C library and binary for generating machine code from x86_64 assembly language and executing it on the fly, without invoking another compiler, assembler or linker.
note: refer to /src/supported_instructions.h for a complete list of supported instructions
$ ./configure
or
$ CFLAGS='-g -O3' ./configure
to generate Makefiles.
$ make
to compile.
$ make install prefix=$(pwd)
to install it locally, or
$ sudo make install
to install globally.
$ gcc -o executable your_program.c -lassemblyline
to compile a C program using assemblyline.
note: refer to /src/assemblyline.h for more information
- Include the required header files and preprocessor definitions
#include <stdint.h>
#include <sys/mman.h>
#include <assemblyline.h>
#define BUFFER_SIZE 300
- Allocate an executable buffer of sufficient size (> 20 bytes) using mmap
// the machine code will be written to this location
uint8_t *mybuffer = mmap(NULL, sizeof(uint8_t) * BUFFER_SIZE,
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
- Create an instance of assemblyline_t and attach mybuffer, or set it to NULL for internal memory allocation (it will realloc if the buffer is too small)
// external memory allocation
assemblyline_t al = asm_create_instance(mybuffer, BUFFER_SIZE);
// internal memory allocation
assemblyline_t al = asm_create_instance(NULL, 0);
- OPTIONAL: Enable debug mode to print the machine code in hex to stdout.
asm_set_debug(al, true);
- OPTIONAL: Set a chunk size boundary to ensure that no instruction opcode will cross the specified chunk boundary length.
note: refer to the instructions nop, nop2, ..., nop11
// It will use the appropriate `nop` instruction for the remaining bytes to fill the chunk boundary.
int chunk_size = 16;
asm_set_chunk_size(al, chunk_size);
- Assemble a file or string containing x64 assembly code. The machine code will be written to mybuffer or the internal buffer. You can call these functions sequentially; the new machine code will be appended at the end.
assemble_file(al, "./path/to/x64_file.asm");
assemble_str(al, "mov rax, 0x0\nadd rax, 0x2; adds two");
assemble_str(al, "sub rax, 0x1; subs one\nret");
- Get the start address of the buffer containing the assembled program
// cast the buffer to a function pointer that returns int
int (*func)(void) = (int (*)(void))(asm_get_code(al));
// you can then call the function
int result = func();
- Free all memory associated with assemblyline (the external buffer is not freed)
asm_destroy_instance(al);
- Full example:
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <assemblyline.h>
#define BUFFER_SIZE 300

int main(void) {
    uint8_t *mybuffer = mmap(NULL, sizeof(uint8_t) * BUFFER_SIZE,
                             PROT_READ | PROT_WRITE | PROT_EXEC,
                             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    assemblyline_t al = asm_create_instance(mybuffer, BUFFER_SIZE);
    asm_set_chunk_size(al, 16);
    assemble_str(al, "mov rax, 0x0\nadd rax, 0x2; adds two");
    assemble_str(al, "sub rax, 0x1; subs one\nret");
    // cast to a function pointer returning int so the result can be read
    int (*func)(void) = (int (*)(void))(asm_get_code(al));
    int result = func();
    printf("The result is: %d\n", result);
    // prints "The result is: 1\n"
    asm_destroy_instance(al);
    return 0;
}
$ make check
to run all test suites
- To run only one testsuite, run
TESTS=seto.asm make -e check
then check the log file seto.log
- Or run
./al_nasm_compare.sh seto.asm
- Adding a new test: add the test file, e.g. sub.asm, to the directory and add sub.asm to the TESTS variable in Makefile.am, then run
$ make check
Finally, add Makefile.am and sub.asm to git.
note: run $ asmline or $ asmline --help to view usage information
USAGE:
asmline [-r] [-p] [-c CHUNK_SIZE>1] [-o ELF_FILENAME_NO_EXT] path/to/file.asm
DESCRIPTION:
Generates machine code from a file containing x64 assembly instructions.
Machine code could be executed directly without the need for an executable file format.
Obtain command-line instructions for generating an ELF binary file from assembly code.
$ asmline -o FILENAME path/to/file.asm
to output the generated machine code into a binary file (FILENAME.bin)
-o --object FILENAME Generates a binary file from path/to/file.asm called FILENAME.bin in the current directory.
- The above call will generate a binary file FILENAME.bin; the commands below can then be used to create an ELF file.
$ objcopy --input-target=binary --globalize-symbol=FILENAME --rename-section .data=.text --output-target=elf64-x86-64 FILENAME.bin FILENAME.o
# link the ELF object file with a C program
$ gcc -o linker linker.c FILENAME.o
$ asmline -p path/to/file.asm
to write the generated machine code from file.asm to stdout
-p --print When assembling path/to/file.asm the corresponding machine code will be printed to stdout.
- The above call prints the machine code generated from path/to/file.asm in hexadecimal format.
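The exact layout of the hexadecimal output is not described here. Purely as an assumption for illustration, the sketch below renders each byte as two lowercase hex digits separated by single spaces. `format_hex` is a hypothetical helper and is not part of assemblyline.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes of `code` into `out` as
 * space-separated two-digit hex values, e.g. "48 83 c0 02".
 * Returns the number of characters written (excluding the terminator),
 * or -1 if `out_size` is too small. */
int format_hex(const uint8_t *code, size_t len, char *out, size_t out_size)
{
    /* Each byte needs "xx" plus either a separator or the terminator. */
    size_t needed = (len == 0) ? 1 : len * 3;
    if (out_size < needed)
        return -1;

    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        /* No trailing space after the last byte. */
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i + 1 < len ? "%02x " : "%02x", code[i]);
    }
    out[pos] = '\0';
    return (int)pos;
}
```

Formatting into a caller-supplied buffer rather than printing directly keeps the helper easy to check; a caller that wants the output on stdout can pass the result to `puts`.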
$ asmline -c CHUNK_SIZE>1 path/to/file.asm
to apply chunk size fitting when assembling path/to/file.asm.
-c --chunk CHUNK_SIZE>1 Sets a given CHUNK_SIZE boundary in bytes. Nop padding will be used to ensure no instruction opcode crosses the specified CHUNK_SIZE boundary.
- A specific chunk size within a memory block can be specified (the chunk size must be greater than 1).
- When a chunk size is given, assemblyline will ensure no instruction opcode crosses the chunk boundary by applying nop padding.
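The padding rule can be pictured with a small calculation. The sketch below is not assemblyline's implementation; `chunk_padding` is a hypothetical helper that only shows how many filler bytes would be needed so that an instruction does not straddle a chunk boundary, assuming the instruction is no longer than one chunk.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from assemblyline): number of padding
 * bytes to emit at `offset` so that an instruction of `instr_len` bytes
 * does not straddle a multiple of `chunk_size`.
 * Assumes chunk_size > 1 and instr_len <= chunk_size. */
size_t chunk_padding(size_t offset, size_t instr_len, size_t chunk_size)
{
    /* Bytes still available in the chunk that contains `offset`. */
    size_t room = chunk_size - (offset % chunk_size);

    /* If the instruction fits, no padding; otherwise fill the rest
     * of the chunk so the instruction starts at the next boundary. */
    return (instr_len <= room) ? 0 : room;
}
```

For example, with a 16-byte chunk, a 3-byte instruction at offset 14 would need 2 bytes of padding, while the same instruction at offset 13 ends exactly on the boundary and needs none.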
$ asmline --return path/to/file.asm
to directly execute path/to/file.asm, given the following option:
-r --return Executes assembly code and prints out the contents of the rax register (return value register).
-r executes the assembly program specified by path/to/file.asm and prints out the return value of that program.
- Get the instruction opcode layout and operand encoding format (please refer to: https://www.felixcloutier.com/x86/).
- Add the new instruction to the asm_instr enumerator set found in /src/enums.h.
- Add a new entry to INSTR_TABLE[] in /src/instructions.h for the specific instruction (see below for more details).
struct INSTR_TABLE[] {
/* null terminated string representation of an instruction ex: "mov"
* subsequent instructions of the same name with a different operand
* encoding must be placed contiguously with the first instance of the
* instruction and must have the '\0' string
*/
char instr_name[MAX_INSTR_LEN];
// asm_instr enumerator for uniquely identifying a single instruction (the one from enums.h)
int name;
/* contains the valid operand formats for an instruction that maps
* to the same operand encoding (at most 2 for a single operand encoding)
* ex: rr (instr reg,reg) && rm (instr reg, [mem]) -> RM
*/
int opd_format[VALID_OPERAND_FORMATS];
/* operand encoding format as an enumerator (determines how instruction operands will be encoded)
* in assemblyline the 'I' character op/en will be ignored unless it is standalone
* ex: MI -> M , RMI -> RM , I -> I
*/
operand_encoding encode_operand;
/* enumerator for defining the semantic type of an instruction
* if the instruction type is not known set this value to 'OTHER'
* refer to the link below to find the correct type for the instruction
* https://docs.oracle.com/cd/E36784_01/html/E36859/eoizp.html#scrolltoc
*/
instr_type type;
/* 'i' index of opcode[i] when a byte changes in the opcode depending
* on the register size for the instruction
* set this value to NA if not applicable to the instruction
*/
int op_offset_i;
/* 'i' index of opcode[i] when an offset is present for a REG value denoted as '+ rd'
* set this value to NA if not applicable to the instruction
*/
int rd_offset_i;
// used for instructions with a single register operand denoted as '/num'
int single_reg_r;
// length of instruction opcode excluding immediate and memory displacement
int instr_size;
// displacement for the W0 prefix (following byte after the vector extension prefix VEX)
int w0_disp;
/* opcode layout for an instruction ex: {REX,0x0f,0xa9,REG}
* REX and REG are placeholders for the prefix and register values
* more can be found in enums.h op_encoding
*/
unsigned int opcode[10];
}
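To make the `REX` and `REG` placeholders a little more concrete, here is a small sketch of how such bytes are formed in standard x86-64 encoding. The helper names `rex_prefix` and `modrm_byte` are hypothetical and only illustrate the bit layout; they are not taken from assemblyline's sources, and how the library maps its own placeholders onto these bytes is not shown here.

```c
#include <stdint.h>

/* REX prefix: 0100WRXB. W selects a 64-bit operand size; R, X and B
 * extend the ModRM.reg, SIB.index and ModRM.rm (or base) fields. */
uint8_t rex_prefix(int w, int r, int x, int b)
{
    return (uint8_t)(0x40 | ((w & 1) << 3) | ((r & 1) << 2) |
                     ((x & 1) << 1) | (b & 1));
}

/* ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits). */
uint8_t modrm_byte(int mod, int reg, int rm)
{
    return (uint8_t)(((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7));
}
```

With these, `add rax, 0x2` (opcode `83 /0 ib`) encodes as `48 83 c0 02`: `rex_prefix(1, 0, 0, 0)` gives `0x48` and `modrm_byte(3, 0, 0)` gives `0xc0`.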
- David Wu (University of Adelaide)
- Joel Kuepper (University of Adelaide)
- The Air Force Office of Scientific Research (AFOSR) under award number FA9550-20-1-0425
- An ARC Discovery Early Career Researcher Award (project number DE200101577)
- An ARC Discovery Project (project number DP210102670)
- The Blavatnik ICRC at Tel-Aviv University
- The Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL) under contracts FA8750-19-C-0531 and HR001120C0087
- The National Science Foundation under grant CNS-1954712
- Gifts from AMD, Google, and Intel