CeesPU - A homemade CPU for the mojo

A 32-bit CPU made for the Mojo V3 FPGA, written in Verilog. The design is largely based on the MIPS and MicroBlaze architectures. It consists of a four-stage pipeline and has thirty or so instructions.

CPU Architecture

The CPU is generally 32-bit: that is the width of the registers and the size of each instruction word. Memory locations are 16 bits wide because of the limited amount of memory available on the Mojo board (64 KB).

An instruction is 32 bits wide. The opcode is 6 bits, a register address is 5 bits, and an immediate is 16 bits. There are three main types of instructions:

TypeA:  Opcode RegD RegA RegB
TypeB:  Opcode RegD RegA Immediate
TypeC:  Opcode RegA RegB Immediate
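
As a rough illustration of how a TypeB word could be packed and unpacked, here is a minimal Python sketch. The README only gives the field widths, so the bit positions used here (opcode in the top 6 bits, then RegD, then RegA, then the immediate in the low 16 bits) are an assumption borrowed from MIPS-style layouts, as is the sign extension of the immediate. The function names are hypothetical.

```python
# Assumed field layout (not stated in the README): opcode in bits 31..26,
# RegD in bits 25..21, RegA in bits 20..16, immediate in bits 15..0.
OPCODE_SHIFT, RD_SHIFT, RA_SHIFT = 26, 21, 16


def encode_type_b(opcode: int, rd: int, ra: int, imm: int) -> int:
    """Pack a TypeB instruction (Opcode RegD RegA Immediate) into 32 bits."""
    assert 0 <= opcode < 64 and 0 <= rd < 32 and 0 <= ra < 32
    return (opcode << OPCODE_SHIFT) | (rd << RD_SHIFT) | (ra << RA_SHIFT) | (imm & 0xFFFF)


def decode_type_b(word: int) -> tuple[int, int, int, int]:
    """Unpack a 32-bit word into (opcode, rd, ra, signed immediate)."""
    opcode = (word >> OPCODE_SHIFT) & 0x3F
    rd = (word >> RD_SHIFT) & 0x1F
    ra = (word >> RA_SHIFT) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate (assumed to be signed)
        imm -= 0x10000
    return opcode, rd, ra, imm


# addi c20, c0, 23 using the ADDI opcode b000000
word = encode_type_b(0b000000, 20, 0, 23)
print(hex(word))            # 0x2800017
print(decode_type_b(word))  # (0, 20, 0, 23)
```

Note that 6 + 5 + 5 + 16 = 32, so a TypeB or TypeC instruction fills the word exactly, while a TypeA instruction leaves 11 bits unused by register fields.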

There are 32 32-bit registers.

  • c0 zero; unchangeable
  • c1-c11 saved registers for static variables (preserved across function calls)
  • c12-c17 scratch temporary registers
  • c18 stack pointer
  • c19 function return address
  • c20 function return value
  • c20-c25 function args
  • c25-c31 scratch temporary registers

Mnemonic  Encoding  Description                             Operation
ADD       b000000   Add two registers                       Rd = Ra + Rb
ADC       b000001   Addition with carry                     Rd = Ra + Rb + carry
SUB       b000010   Subtraction                             Rd = Rb - Ra
SBB       b000011   Subtraction with borrow                 Rd = Rb - Ra - carry
OR        b000100   Bitwise or                              Rd = Ra | Rb
AND       b000101   Bitwise and                             Rd = Ra & Rb
XOR       b000110   Bitwise exclusive or                    Rd = Ra ^ Rb
SHL       b001000   Bitshift to the left                    Rd = Ra << Rb
SHR       b001000   Bitshift to the right                   Rd = Ra >> Rb
SAR       b001000   Signed shift to the right               Rd = Ra >> Rb
MUL       b001001   Multiply                                Rd = Ra * Rb
SEB       b000111   Sign extend byte                        Rd = (int)(Ra & 0xff)
SEH       b000111   Sign extend halfword                    Rd = (int)(Ra & 0xffff)
ADDI      b000000   Add immediate                           Rd = Ra + Imm
ADCI      b000001   Addition with carry                     Rd = Ra + Imm + carry
SUBI      b000010   Subtraction                             Rd = Imm - Ra
SBBI      b000011   Subtraction with borrow                 Rd = Imm - Ra - carry
ORI       b000100   Bitwise or                              Rd = Ra | Imm
ANDI      b000101   Bitwise and                             Rd = Ra & Imm
XORI      b000110   Bitwise exclusive or                    Rd = Ra ^ Imm
SHLI      b001000   Bitshift to the left                    Rd = Ra << Imm
SHRI      b001000   Bitshift to the right                   Rd = Ra >> Imm
SARI      b001000   Signed shift to the right               Rd = Ra >> Imm
MULI      b001001   Multiply                                Rd = Ra * Imm
LW        b100000   Load word from memory                   Rd = Mem[Ra + Imm]
LH        b100000   Load halfword from memory               Rd = (short)Mem[Ra + Imm]
LB        b100000   Load byte from memory                   Rd = (char)Mem[Ra + Imm]
LHU       b100000   Load unsigned halfword from memory      Rd = (unsigned short)Mem[Ra + Imm]
LBU       b100000   Load unsigned byte from memory          Rd = (byte)Mem[Ra + Imm]
SW        b100000   Store word in memory                    Mem[Ra + Imm] = Rb
SH        b100000   Store halfword in memory                Mem[Ra + Imm] = (short)Rb
SB        b100000   Store byte in memory                    Mem[Ra + Imm] = (byte)Rb
BEQ       b111000   Branch if equal                         PC = Imm if (Ra == Rb)
BNE       b111001   Branch if not equal                     PC = Imm if (Ra != Rb)
BGU       b111010   Branch if greater (unsigned)            PC = Imm if (Ra > Rb)
BGEU      b111011   Branch if greater or equal (unsigned)   PC = Imm if (Ra >= Rb)
BG        b111100   Branch if greater (signed)              PC = Imm if (Ra > Rb)
BGE       b111101   Branch if greater or equal (signed)     PC = Imm if (Ra >= Rb)
BC        b111110   Branch if carry flag is set             PC = Imm if (carry)
B         b111111   Unconditional branch                    PC = Imm
BX        b111111   Branch to register                      PC = Ra
CALL      b111111   Branch and set the link register        LR = PC; PC = Imm
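
The add and subtract rows can be modelled in a few lines of Python. This is only a sketch of the arithmetic as listed, not the actual Verilog. How the carry flag is produced is not described in the README, so the rule used here (carry is set on overflow out of bit 31 for additions and on borrow for subtractions) is an assumption, and `alu` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def alu(op: str, ra: int, rb: int, carry: int = 0) -> tuple[int, int]:
    """Return (result, carry_out) for a few register-register instructions."""
    if op == "ADD":
        total = ra + rb
    elif op == "ADC":
        total = ra + rb + carry
    elif op == "SUB":
        total = rb - ra  # note the operand order: Rd = Rb - Ra
    elif op == "SBB":
        total = rb - ra - carry
    else:
        raise ValueError(f"unsupported op: {op}")
    # Assumption: carry is set when an add overflows 32 bits or a subtract borrows.
    carry_out = 1 if (total > MASK32 or total < 0) else 0
    return total & MASK32, carry_out


# 64-bit addition of 0x00000000_FFFFFFFF + 1 built from ADD then ADC
low, c = alu("ADD", 0xFFFFFFFF, 1)
high, _ = alu("ADC", 0, 0, c)
print(hex(high), hex(low))  # 0x1 0x0
```

The reversed operand order of SUB and SBB is easy to miss: `alu("SUB", 3, 10)` yields 7, not -7.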

Pipeline Stages

This is a rough overview of the processor. It consists of four pipeline stages. In the first clock cycle the PC is incremented and the instruction is loaded. In the second, the instruction is decoded and the processor decides what to do. In the third, the processor executes the instruction: the relevant calculations are performed and memory is accessed. In the fourth, the result is written back into a register and the instruction is done.
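
To make the overlap between stages concrete, the following Python sketch prints which instruction occupies which stage in each clock cycle. The stage names are labels chosen here for the four steps described above, and the model assumes an ideal pipeline with no stalls or branches.

```python
# Stage names are labels for the four steps described above (an assumption).
STAGES = ["fetch", "decode", "execute", "writeback"]


def pipeline_schedule(program: list[str]) -> list[dict[str, str]]:
    """For each clock cycle, map stage name -> instruction occupying it."""
    cycles = []
    total_cycles = len(program) + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction i reaches stage s in cycle i + s
            if 0 <= index < len(program):
                occupancy[stage] = program[index]
        cycles.append(occupancy)
    return cycles


schedule = pipeline_schedule(["addi c20, c0, 23", "addi c21, c20, 32"])
for cycle, occupancy in enumerate(schedule):
    print(cycle, occupancy)
```

With two instructions the schedule spans five cycles, and in cycle 1 the first instruction is being decoded while the second is being fetched.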

Hazards

To increase performance the processor is pipelined, which means that while one instruction is being decoded the next one is already being loaded. While this does increase performance, it also means that hazards arise. For example, an instruction might read a value before that value has been written back, resulting in an incorrect value being used:

0: addi c20, c0, 23
4: addi c21, c20, 32

Here the second instruction reads c20 before the first instruction has written its result back, so without countermeasures the result of the second instruction would be incorrect. To deal with this, a technique called forwarding is used: the ALU result is forwarded directly to the ALU input, bypassing the writeback stage. In cases where this is not possible, the processor is stalled until the instruction has completed.
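
The effect of forwarding on the two-instruction example can be reproduced with a toy Python model. This is a deliberately simplified sketch, not a description of the real hardware: it assumes that only the immediately preceding instruction is still in flight when the next one reads its operands, it supports only ADDI, and the `run` function and its tuple format are hypothetical.

```python
def run(program: list[tuple[int, int, int]], forwarding: bool) -> dict[int, int]:
    """Execute (rd, ra, imm) ADDI tuples back to back on a toy pipeline model."""
    regs = {n: 0 for n in range(32)}
    pending = None  # (rd, value) from the previous instruction, not yet written back
    for rd, ra, imm in program:
        operand = regs[ra]  # register file read; may be stale
        if forwarding and pending is not None and pending[0] == ra and ra != 0:
            operand = pending[1]  # bypass: take the ALU result directly
        result = (operand + imm) & 0xFFFFFFFF
        if pending is not None and pending[0] != 0:
            regs[pending[0]] = pending[1]  # writeback of the older instruction
        pending = (rd, result)
    if pending is not None and pending[0] != 0:
        regs[pending[0]] = pending[1]  # drain the last writeback; c0 stays zero
    return regs


# 0: addi c20, c0, 23
# 4: addi c21, c20, 32
PROGRAM = [(20, 0, 23), (21, 20, 32)]
print(run(PROGRAM, forwarding=False)[21])  # 32, stale c20 was read as 0
print(run(PROGRAM, forwarding=True)[21])   # 55, the forwarded value 23 was used
```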
