New computer and new CPU PoC
The main processor, the AltairX K1, is an in-order VLIW CPU.
It has the following internal memories:
- 128 KiB L1 data scratchpad memory
- 128 KiB L1 instruction scratchpad memory
- 32 KiB L1 data cache (4-way)
- 1 MiB L2 cache (set-associative, 4/8 ways)
- 1 MiB L2 scratchpad memory
The processor has no branch prediction; instead, branches rely on a delay slot (1 cycle for fetch) plus 1 cycle for decode and jump (delay).
The number of instructions in a bundle is encoded with a "Pairing" bit: when it is 1, another instruction follows that is executed in parallel; 0 marks the end of the bundle.
The goal of this processor is to keep latency to a minimum and to work around the latency of the RAM.
To achieve this, the compiler has to do two things:
- resolve pipeline conflicts
- preload the data in advance with a DMA
This technique was used on consoles like the PlayStation 2 and 3. It relies on double buffering: the program executes and reads its data from buffer 1 while the next data is preloaded into buffer 2.
Then it executes from buffer 2 while buffer 1 is preloaded, and so on.
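The double-buffering scheme can be sketched as follows. This is only a software model: `dma_load` is a hypothetical stand-in that copies synchronously, whereas a real DMA transfer would run in the background and the program would wait for it to finish before swapping buffers. The names `process_double_buffered`, `ram`, `chunk_size` and `kernel` are illustrative and not part of AltairX.

```python
def process_double_buffered(ram, chunk_size, kernel):
    """Process `ram` chunk by chunk, alternating between two buffers."""
    buffers = [None, None]  # models two scratchpad buffers
    results = []
    n_chunks = (len(ram) + chunk_size - 1) // chunk_size

    def dma_load(index, slot):
        # Stand-in for an asynchronous DMA transfer from RAM to scratchpad.
        start = index * chunk_size
        buffers[slot] = ram[start:start + chunk_size]

    if n_chunks == 0:
        return results

    dma_load(0, 0)  # prime the first buffer
    for i in range(n_chunks):
        active = i % 2
        if i + 1 < n_chunks:
            dma_load(i + 1, 1 - active)  # preload the other buffer
        # Work on the active buffer while the other one is being filled.
        results.extend(kernel(x) for x in buffers[active])
    return results
```

On real hardware the benefit comes from the overlap: the time spent in `kernel` hides the transfer time of the next chunk.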
To help resolve pipeline conflicts, the ALU and the VFPU have an internal accumulator, which is register 61.
To avoid multiple writes to registers caused by unsynchronized pipelines, there are two special registers, P and Q (Product and Quotient), which are registers 62 and 63; they handle mul, div, sqrt and similar operations.
It also supports uncached accelerated access, which speeds up reads only (a cache miss takes half the time).
Floating-point numbers on AltairX are not 100% compatible with the IEEE 754 standard:
- Denormalized (subnormal) numbers are not handled (they are treated as zero).
- Infinities are not handled (they are replaced by the max value).
- NaN values are not handled (they are replaced by the max value).
- Rounding is always toward 0.
- Exceptions are not handled.
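The first three rules in the list above can be modelled on 32-bit bit patterns, as in the sketch below. It assumes single-precision values and assumes that the sign is preserved when infinities and NaN are replaced by the max value; the source does not specify either point. Rounding toward 0 is not modelled here, and the function names are illustrative.

```python
import struct

F32_MAX_BITS = 0x7F7FFFFF  # largest finite float32 magnitude

def canonicalize_f32_bits(bits):
    """Apply the subnormal, infinity and NaN rules to a float32 bit pattern."""
    sign = bits & 0x80000000
    exponent = (bits >> 23) & 0xFF
    if exponent == 0:     # zero or subnormal: flush to (signed) zero
        return sign
    if exponent == 0xFF:  # infinity or NaN: clamp to the max value
        return sign | F32_MAX_BITS  # keeping the sign is an assumption
    return bits           # normal numbers are unchanged

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a float, for inspection."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For instance, `canonicalize_f32_bits(0x7F800000)` (positive infinity) returns `0x7F7FFFFF`, the largest finite single-precision value.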
Its execution units are:
- 2 ALU + 4 LSU
- 1 VFPU/FDIV
- 1 DIV/MUL
- BRU/CMP
The advantage of this processor is its simple design: it needs few transistors for "high performance" and therefore consumes less power and costs less than out-of-order CISC/RISC processors.
- Finish the assembler program
- Make documentation (pdf / html) (ISA and hardware)
- Translate the IR code (Clang) for AltairX
- Make the virtual machine
Main core : AltairX K1 2.5 GHz
Sub core : AltairX K1 2.5 GHz, 6 cores
LPDDR4 3200 MHz, 8 GB of unified memory
GPU Aldebaran G1 1 GHz, 4 CU, 512 GFLOPS
AltairX K1 ISA : https://docs.google.com/spreadsheets/d/1AmdMslRcXIX9pKGBSRJJcx2IvRyzBLjA61SzxmlEYf8/edit?usp=sharing
AltairX K1 Pipeline : https://docs.google.com/spreadsheets/d/1u-XBjAyq8LOzAFcWMXsdAChMMzbmTIuZtzWQ7XDTRdk/edit?usp=sharing
AltairX K1 Memory Map : https://docs.google.com/spreadsheets/d/1UQ15KpRRWncc_Ouzhas0W1uWuSIfjAODw8KD-2-AoDA/edit?usp=sharing
AltairX IR ISA : https://docs.google.com/spreadsheets/d/19nOBbH_4KWaXxDSNA4JuZjaBble0VRrBxcVlEjTZ3iI/edit?usp=sharing
AltairX Executable Header : https://docs.google.com/spreadsheets/d/1g7mEhaBIVBJ75-5gJ_TrYiVJVTZHEJQnqN0XXUBX57g/edit?usp=sharing
Aldebaran G1 ISA : https://docs.google.com/spreadsheets/d/1LiSZbdd6wCpa-sZZ9uLg5eAyGxdpMl363waUP927xS4/edit?usp=sharing
GPU todo list : https://docs.google.com/spreadsheets/d/1eRX1vLHEJdrAsx2u1OiycSSz82G3cboVMcu8gBYkgGA/edit?usp=sharing