This project is an assembler written in C (standard C11) for the Little Computer 3 (LC-3), as specified in Introduction to Computing Systems: From Bits and Gates to C and Beyond.
There are multiple online implementations of LC-3 virtual machines that can be used to run the resulting binary code:
- GUI simulator that also includes its own assembler and operating system:
  wget https://acg.cis.upenn.edu/milom/cse240-Fall05/handouts/code/LC3sim.jar
  java -jar LC3sim.jar
- lc3tools, which contains several utilities to assemble and run the program
- or you can write your own virtual machine
Note: in addition to the instructions described in the specification, the assembler implemented in this project also supports JMPT and RRT. These instructions are variants of JMP and RET, respectively, that have the additional effect of setting the privilege bit in the PSR (Processor Status Register). There is no guarantee, though, that the virtual machines listed above support them.
Here are some other learning resources and references:
Run make lc3as CPPFLAGS=-DFAB_MAIN to create the executable lc3as.
Running lc3as on an assembly file (.asm) generates two files in the same folder as the .asm file:
- binary with extension .obj
- symbol table with extension .sym
To run the unit tests:
- install cmocka: brew install cmocka on macOS, or sudo apt-get install libcmocka-dev on Ubuntu
- run make unittest
To get a test coverage report with gcov and lcov:
- install lcov (brew install lcov on macOS)
- run make coverage_report
- the report will open in the default browser
The folder tools contains some debugging utilities used during the development of this assembler:
- lc3objdump is a version of objdump that prints the binary content of an object file generated by the LC-3 assembler; the Makefile shows how to run it
The following sections contain a brief description of the LC-3 architecture and the assembly language.
The LC-3 memory has an address space of 2^16 (65,536) locations, and an addressability of 16 bits.
The normal unit of data processed in the LC-3 is 16 bits. We refer to 16 bits as one word, and we say the LC-3 is word-addressable.
The LC-3 specifies eight general purpose registers, each identified by a 3-bit register number. They are referred to as R0, R1 ... R7.
Registers are used as memory locations to store information. The number of bits stored in each register is 16 (one word).
Registers can be accessed in a single machine cycle, whereas data in memory normally requires more than one cycle to access.
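The memory and register description above can be pictured as a small C data structure. This is only a sketch: the type `lc3_state` and the helpers `lc3_read` and `lc3_write` are hypothetical names chosen for illustration, not part of this project.

```c
#include <stdint.h>

#define LC3_MEMORY_WORDS  65536u /* 2^16 addressable locations */
#define LC3_NUM_REGISTERS 8      /* R0 through R7 */

/* Hypothetical container for the state described above. */
typedef struct {
    uint16_t memory[LC3_MEMORY_WORDS]; /* one 16-bit word per address */
    uint16_t reg[LC3_NUM_REGISTERS];   /* one 16-bit word per register */
} lc3_state;

/* Read the word stored at a 16-bit address. */
uint16_t lc3_read(const lc3_state *s, uint16_t address)
{
    return s->memory[address];
}

/* Store a word at a 16-bit address. */
void lc3_write(lc3_state *s, uint16_t address, uint16_t value)
{
    s->memory[address] = value;
}
```

Because an address is itself a `uint16_t`, every possible address value indexes a valid word, so no bounds check is needed in this model.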
An instruction is made up of two things: opcode and operands.
The instruction set of an ISA is defined by its set of opcodes, data types, and addressing modes. The addressing modes determine where the operands are located.
The LC-3 ISA has 15 instructions, each identified by its unique opcode. The opcode is specified by bits [15:12] of the instruction. Since four bits are used to specify the opcode, 16 distinct opcodes are possible. However, the LC-3 ISA specifies only 15 opcodes. The code 1101 has been left unspecified, reserved for some future need.
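As a minimal sketch of the bit layout just described, the opcode can be recovered from an instruction word with a shift and a mask. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* The one code point the ISA leaves unspecified (binary 1101). */
#define LC3_OPCODE_RESERVED 0xDu

/* Return the 4-bit opcode held in bits [15:12] of an instruction word. */
unsigned lc3_opcode(uint16_t instruction)
{
    return ((unsigned)instruction >> 12) & 0xFu;
}

/* Non-zero when the instruction word uses the reserved opcode. */
int lc3_opcode_is_reserved(uint16_t instruction)
{
    return lc3_opcode(instruction) == LC3_OPCODE_RESERVED;
}
```

For example, the word 0xF025 has opcode 1111 (TRAP), and any word of the form 0xDxxx uses the reserved code.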
There are three different types of instructions, which means three different types of opcodes:
- operate instructions: process information
- data movement instructions: move information between memory and the registers and between registers/memory and input/output devices
- control instructions: change the sequence of instructions that will be executed (instead of processing them sequentially according to their location in memory)
  - conditional branch
  - unconditional jump
  - subroutine (function) call
  - TRAP (system calls, PC changes to a memory address that is part of the operating system so that the operating system will perform some task on behalf of the program)
  - return from interrupt
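The three instruction types can be illustrated with a classifier over the opcode field. The opcode values below are the standard LC-3 ones, and placing LEA with the data movement instructions follows the usual textbook grouping; the enum and function names are hypothetical and not taken from this project.

```c
#include <stdint.h>

typedef enum {
    LC3_KIND_OPERATE,
    LC3_KIND_DATA_MOVEMENT,
    LC3_KIND_CONTROL,
    LC3_KIND_RESERVED
} lc3_kind;

/* Classify an instruction word by the opcode in bits [15:12]. */
lc3_kind lc3_classify(uint16_t instruction)
{
    switch (((unsigned)instruction >> 12) & 0xFu) {
    case 0x1: /* ADD */
    case 0x5: /* AND */
    case 0x9: /* NOT */
        return LC3_KIND_OPERATE;
    case 0x2: /* LD  */
    case 0x3: /* ST  */
    case 0x6: /* LDR */
    case 0x7: /* STR */
    case 0xA: /* LDI */
    case 0xB: /* STI */
    case 0xE: /* LEA */
        return LC3_KIND_DATA_MOVEMENT;
    case 0x0: /* BR: conditional branch */
    case 0x4: /* JSR/JSRR: subroutine call */
    case 0x8: /* RTI: return from interrupt */
    case 0xC: /* JMP/RET: unconditional jump */
    case 0xF: /* TRAP: system call */
        return LC3_KIND_CONTROL;
    default:  /* 0xD is unspecified */
        return LC3_KIND_RESERVED;
    }
}
```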
The data type of the operands is 16-bit 2's complement integers.
An addressing mode is a mechanism for specifying where the operand is located. For instance, a full 16-bit integer does not fit inside a 16-bit instruction alongside the opcode, so the only way an instruction can operate on such a value is to store it in memory or in a register and use a reference to that location as the operand.
An operand can generally be found in one of three places:
- in memory,
- in a register, or
- as a part of the instruction (in this case, the operand is called literal or immediate)
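To make the immediate case concrete, here is a sketch of how ADD with a literal operand could be encoded. It assumes the standard LC-3 layout for this form (opcode 0001, DR in bits [11:9], SR1 in bits [8:6], bit [5] set to 1, and a 5-bit two's complement immediate in bits [4:0]). The function names are hypothetical and this is not necessarily how this project's encoder is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when value fits in a 5-bit two's complement field (-16..15). */
bool lc3_fits_imm5(int value)
{
    return value >= -16 && value <= 15;
}

/* Encode ADD DR, SR1, #imm5 as 0001 DR SR1 1 imm5. */
uint16_t lc3_encode_add_imm(unsigned dr, unsigned sr1, int imm5)
{
    assert(dr < 8 && sr1 < 8);    /* register numbers are 3 bits */
    assert(lc3_fits_imm5(imm5));  /* larger literals need memory or a register */
    return (uint16_t)((0x1u << 12) | (dr << 9) | (sr1 << 6) | (1u << 5)
                      | ((unsigned)imm5 & 0x1Fu));
}
```

With this layout, ADD R1, R1, #-1 encodes to 0x127F. A literal such as #100 is rejected by `lc3_fits_imm5`, which is exactly the situation where the value has to live in memory or in a register instead.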
The LC-3 supports five addressing modes:
- immediate (or literal)
- register
- memory addressing modes:
  - PC-relative: bits [8:0] of the instruction specify an offset relative to the PC. The memory address is computed by sign-extending bits [8:0] to 16 bits and adding the result to the incremented PC.
  - indirect: an address is computed from bits [8:0] exactly as in PC-relative mode, but the location at that address does not contain the operand; it contains the memory address of the operand.
  - Base+offset: bits [5:0] of the instruction specify an offset relative to a base register. The memory address is computed by sign-extending bits [5:0] to 16 bits and adding the result to the base register.
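The address computations for the three memory addressing modes can be sketched as follows. For the indirect mode the sketch assumes the usual LC-3 behavior, in which the PC-relative address is computed first and the word stored there is the address of the operand. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 16 bits. */
uint16_t lc3_sign_extend(uint16_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint16_t field = (uint16_t)(value & ((1u << bits) - 1u));
    if (field & (1u << (bits - 1u))) {
        /* Top bit of the field is set: fill the upper bits with ones. */
        field = (uint16_t)(field | (0xFFFFu << bits));
    }
    return field;
}

/* PC-relative: incremented PC + SEXT(bits [8:0]). */
uint16_t lc3_pc_relative(uint16_t incremented_pc, uint16_t instruction)
{
    return (uint16_t)(incremented_pc + lc3_sign_extend(instruction, 9));
}

/* Base+offset: base register + SEXT(bits [5:0]). */
uint16_t lc3_base_offset(uint16_t base, uint16_t instruction)
{
    return (uint16_t)(base + lc3_sign_extend(instruction, 6));
}

/* Indirect: the word at the PC-relative address is the operand's address. */
uint16_t lc3_indirect(const uint16_t *memory, uint16_t incremented_pc,
                      uint16_t instruction)
{
    return memory[lc3_pc_relative(incremented_pc, instruction)];
}
```

The casts back to `uint16_t` make the additions wrap around modulo 2^16, which matches a 16-bit address space.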
Condition codes allow the instruction sequencing to change on the basis of a previously generated result.
The LC-3 has three single-bit registers (condition codes) that are set (set to 1) or cleared (set to 0) each time one of the eight general purpose registers is written, depending on whether the value written is negative, zero, or positive. The three single-bit registers are called N (negative), Z (zero) and P (positive).
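A sketch of the condition-code update, assuming the usual rule that exactly one of N, Z and P is set according to the sign of the 16-bit two's complement value just written. The type `lc3_cc` and the function `lc3_set_cc` are hypothetical names.

```c
#include <stdint.h>

/* The three single-bit condition codes. */
typedef struct {
    unsigned n : 1; /* result was negative */
    unsigned z : 1; /* result was zero     */
    unsigned p : 1; /* result was positive */
} lc3_cc;

/* Compute the condition codes for a value written to a register. */
lc3_cc lc3_set_cc(uint16_t written_value)
{
    lc3_cc cc = {0, 0, 0};
    if (written_value == 0) {
        cc.z = 1;
    } else if (written_value & 0x8000u) {
        cc.n = 1; /* bit 15 is the two's complement sign bit */
    } else {
        cc.p = 1;
    }
    return cc;
}
```

A conditional branch can then test these bits to decide whether to change the PC.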
This is the specification of the assembly language corresponding to the previously described ISA.
The LC-3 assembler is the program that takes as input a computer program written in LC-3 assembly language and translates it into a program in the ISA of the LC-3.
(LABEL) OPCODE OPERANDS (; COMMENTS)
- The OPCODE is a symbolic name for the opcode of the corresponding LC-3 instruction.
- The number of OPERANDS depends on the operation being performed; operands are separated by commas
- Labels are symbolic names that are used to identify memory locations that are referred to explicitly in the program
- Comments are identified by semicolons and are ignored by the assembler
Labels and comments can also appear on their own line (without accompanying any instruction). A label always refers to the memory location of the first instruction after the label. Two consecutive labels on the same line are considered illegal. However, two consecutive labels on different lines are permitted.
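One small piece of handling this line format is finding where a comment begins so the rest of the line can be ignored. The sketch below is an illustration only and is not this project's parser: the function name is hypothetical, and it assumes that a semicolon inside a double-quoted string (as in a .STRINGZ operand) should not start a comment. Escaped quotes are not handled.

```c
#include <stddef.h>

/* Return the index of the ';' that starts a comment, or the length of the
 * line if there is no comment. Semicolons inside double quotes are skipped. */
size_t lc3_comment_start(const char *line)
{
    int in_string = 0;
    size_t i;
    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] == '"') {
            in_string = !in_string;
        } else if (line[i] == ';' && !in_string) {
            break;
        }
    }
    return i;
}
```

Everything before the returned index is the part of the line that still has to be split into label, opcode and operands.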
The assembly language provides some aliases for the TRAP instructions:
- GETC: TRAP x20
- OUT: TRAP x21
- PUTS: TRAP x22
- IN: TRAP x23
- PUTSP: TRAP x24
- HALT: TRAP x25
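The alias list above maps naturally onto a lookup table. The sketch below assumes the standard TRAP encoding (opcode 1111, bits [11:8] zero, and the 8-bit trap vector in bits [7:0]). The table and function names are hypothetical, and the comparison is case-sensitive for simplicity.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;   /* alias mnemonic  */
    uint8_t     vector; /* 8-bit trap vector */
} lc3_trap_alias;

static const lc3_trap_alias LC3_TRAP_ALIASES[] = {
    {"GETC",  0x20},
    {"OUT",   0x21},
    {"PUTS",  0x22},
    {"IN",    0x23},
    {"PUTSP", 0x24},
    {"HALT",  0x25},
};

/* Return the encoded TRAP instruction for an alias, or 0 if the mnemonic
 * is not one of the aliases. */
uint16_t lc3_trap_alias_encode(const char *mnemonic)
{
    size_t count = sizeof LC3_TRAP_ALIASES / sizeof LC3_TRAP_ALIASES[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(mnemonic, LC3_TRAP_ALIASES[i].name) == 0) {
            return (uint16_t)(0xF000u | LC3_TRAP_ALIASES[i].vector);
        }
    }
    return 0;
}
```

Returning 0 for "not an alias" is safe here because every TRAP encoding has its top four bits set.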
An assembler directive (also called a pseudo-op) is a message that helps the assembler in the assembly process. Once the assembler handles the message, the pseudo-op is discarded.
- .ORIG: tells the assembler where in memory to place the LC-3 program
- .FILL: tells the assembler to set aside the next location in the program and initialize it with the value of the operand.
- .BLKW: tells the assembler to set aside some number of sequential memory locations (BLocK of Words) in the program
- .STRINGZ: tells the assembler to initialize a sequence of n + 1 memory locations; the argument is a sequence of n characters, inside double quotation marks; the first n words of memory are initialized with the zero-extended ASCII codes of the corresponding characters in the string; the final word of memory is initialized to 0.
- .END: tells the assembler where the program ends; any characters that come after .END are ignored by the assembler.
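As an illustration of the .STRINGZ rule, the following sketch expands a string of n characters into n + 1 words. The function name and the caller-supplied output buffer are assumptions made for the example; the project's own implementation may be organized differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the .STRINGZ expansion of `text` into `out`.
 * Returns the number of words written (n + 1), or 0 if `capacity`
 * is too small to hold the characters plus the terminating zero word. */
size_t lc3_emit_stringz(const char *text, uint16_t *out, size_t capacity)
{
    size_t n = strlen(text);
    if (n + 1 > capacity) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        /* Zero-extend each character code to a 16-bit word. */
        out[i] = (uint16_t)(unsigned char)text[i];
    }
    out[n] = 0; /* final word is the terminator */
    return n + 1;
}
```

The return value is also the amount by which an assembler's location counter would advance past the directive, which matters when computing the addresses of labels that follow it.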