ucb-bar / midas

FPGA-Accelerated Simulation Framework Automatically Transforming Arbitrary RTL

License: Other

C++ 19.74% Scala 69.17% Python 2.70% Makefile 0.79% Verilog 7.54% C 0.06%

midas's Issues

Remote set url for zc706_MIG

The remote URL for zc706_MIG/fpga-images-zc706 is git@github.com:ucb-bar/fpga-images-zc706.git

This makes it inaccessible to anyone who is not a project member.

Fix:
Edit .gitmodules
git submodule sync

Remove requires where possible in the FASED functional model

The functional model needs to be more flexible since it's being driven by an edge which can vary widely from target-to-target.
E.g., non-power-of-two multi-queues:

Generating a Midas Memory Model
  Max Read Requests: 16
  Max Write Requests: 16
  Max Read Length: 8
  Max Write Length: 8
  Max Read ID Reuse: 3
  Max Write ID Reuse: 3
Timing Model Parameters
  Timing Model Class: Latency Bandwidth Pipe
  No LLC Model Instantiated
[error] (run-main-0) java.lang.IllegalArgumentException: requirement failed
[error] java.lang.IllegalArgumentException: requirement failed
[error]         at scala.Predef$.require(Predef.scala:264)
[error]         at midas.widgets.MultiQueue.<init>(Lib.scala:118)

midas is overdependent on rocketchip

For now, midas depends on rocketchip, so regardless of the target design, we have to import rocketchip to use midas. This is unacceptable in most cases, including midas-examples. midas should depend only on chisel and firrtl. Here are the stats for code shared from rocketchip:

config: 71 lines
arbiters: 68 lines for nasti
multi-width FIFO: 79 lines
misc: 74 lines (from util/Misc.scala)

Thus, we use fewer than 300 lines out of more than 10K lines of rocketchip. If we no longer need parameterized bundles, it is even less. I don't see any justification for the rocketchip dependency right now. We may use tilelink for midas, but that is an uncertain future. Also, don't tell me that submoduling rocketchip and its build time do not matter at all. Even avoiding the riscv-tools import is very hard without a script. Writing a build system for rocketchip is even harder.

Here's my plan to cut off the rocketchip dependency:

Maintain our own config system until parameterization is provided by chisel.
Just copy some util code. This code is unlikely to change. Copying code is very ugly, but I believe it is less ugly than importing the whole of rocketchip and spending effort on a bump whenever it changes. If some modules need more rocketchip code, they are in the wrong place (they should reside in firesim).

Of course, we should cut off the barstools dependency too.

Golden Gate Release Checklist

Endpoint-based black box
Conversion to FIRRTL stage
Blackbox support in FAME 1 transform
Dead code clean up

No tester dependency

With #6, ZynqShimTester is not working because of weird timing behavior of testers, so it's time for strober to graduate from chisel tester.

[DRAM Model] Better indicate that the functional model is undersized

The quick fix: add messages to assertions.

[DRAM Model] Accept target-address offset and memory system size in configuration

This will let us better allocate host-dram.

[DRAM model] queue-size setting by using mmReg

The DRAM FRFCFSModel and the PCRAM model use queues and buffers whose sizes are configurable through mmRegs. However, if, for example, transactionQueueDepth=8 is applied to the model, the model sets the queue depth to "000". As discussed with David, this may be caused by an overflow: if the configured maximum is also 8, log2Ceil(8) gives a 3-bit register, which cannot hold the value 8 (binary 1000). The code below should be modified to increase the register size.

===
class FirstReadyFCFSMMRegIO(val cfg: FirstReadyFCFSConfig) extends BaseDRAMMMRegIO(cfg) {
  val schedulerWindowSize = Input(UInt(log2Ceil(cfg.schedulerWindowSize).W))
  val transactionQueueDepth = Input(UInt(log2Ceil(cfg.transactionQueueDepth).W))
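
The suspected overflow can be checked with plain arithmetic. The sketch below is a stand-alone C++ model, not MIDAS code: log2_ceil mimics Chisel's log2Ceil, and truncate_to_width, narrow_width and wide_width are hypothetical helpers. It assumes the value being written (8) equals the configured maximum depth.

```cpp
#include <cassert>
#include <cstdint>

// Mimics Chisel's log2Ceil: the smallest w such that 2^w >= n (for n >= 1).
unsigned log2_ceil(uint64_t n) {
  assert(n >= 1);
  unsigned w = 0;
  while ((uint64_t{1} << w) < n) ++w;
  return w;
}

// Hypothetical helper: models storing `value` in a `width`-bit register
// by keeping only the low `width` bits.
uint64_t truncate_to_width(uint64_t value, unsigned width) {
  assert(width < 64);
  return value & ((uint64_t{1} << width) - 1);
}

// Width as in the snippet above: log2Ceil(max) bits.
// For a power-of-two max this only covers 0 .. max-1.
unsigned narrow_width(uint64_t max_depth) { return log2_ceil(max_depth); }

// Widened alternative: log2Ceil(max + 1) bits cover 0 .. max.
unsigned wide_width(uint64_t max_depth) { return log2_ceil(max_depth + 1); }
```

With a maximum of 8, narrow_width gives 3 bits and truncate_to_width(8, 3) is 0, which matches the observed "000". wide_width gives 4 bits, and the value 8 survives.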

Update README to explain deps on RISCV toolchain.

Allow endpoints to specify initial token count in their channels.

Currently, every target DecoupledIO between an endpoint and the transformed-RTL model is given a decoupled channel with latency = 1 (each channel is seeded with one initial token).

While this will be fixed in the new FAME compiler, a short-term solution would be to allow endpoints to specify what sort of channel (or latency) they'd like on the interconnect to and from the transformed RTL.
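
As a rough illustration of what "initial token count" means here, the following C++ toy model seeds a channel with a configurable number of tokens. TokenChannel and its methods are hypothetical names for this sketch only; the real channels are generated hardware, and this only shows the queueing behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy token channel (hypothetical). Seeding it with N initial tokens makes
// the consumer see each enqueued value N dequeues later.
class TokenChannel {
 public:
  explicit TokenChannel(std::size_t initial_tokens, uint32_t seed_value = 0)
      : tokens_(initial_tokens, seed_value) {}

  void enqueue(uint32_t value) { tokens_.push_back(value); }

  bool can_dequeue() const { return !tokens_.empty(); }

  uint32_t dequeue() {
    assert(can_dequeue());
    uint32_t value = tokens_.front();
    tokens_.pop_front();
    return value;
  }

 private:
  std::deque<uint32_t> tokens_;
};
```

A channel built as TokenChannel(1) corresponds to the current default: the consumer can dequeue immediately, and a newly enqueued value arrives one dequeue later. TokenChannel(0) models a channel in which the consumer must wait for the producer.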

Croak more obviously when the driver is mismatched with a bitstream.

[MIDAS 2] Bring up FASED Configuration Generation

The new endpoint system will break how this is currently invoked.

step() in peek poke interface in simif is deceptive

step() in simif can be deceptive for users who are familiar with Chisel's PeekPokeTesters. Consider the following example:

import chisel3._

class ShiftRegister extends Module {
  val io = IO(new Bundle {
    val in1  = Input(UInt(8.W))
    val in2  = Input(UInt(8.W))
    val out = Output(UInt(8.W))
    val enable = Input(Bool())
  })
  val out = RegInit(0.U(8.W))
  io.out := out
  when (io.enable) {
    out := io.in1 + io.in2
  }
}

#include "simif.h"

class ShiftRegister_t: virtual simif_t
{
public:
  void run() {
    std::vector<uint32_t> reg(4);
    target_reset();
    poke(io_enable, 0);
    step(5);
    poke(io_enable, 1);
    step(1);
    poke(io_in1, 10);
    poke(io_in2, 20);
    step(1);
    expect(io_out, 30);
  }
};

In chisel-testers, the poke-step-expect sequence works as expected, but since simif's "step" is really "fire one cycle of targetFire", the expect line actually dequeues the old value of io_out right before the posedge, which is different from what the chisel-testers do.

Suggestions: either document this, or perhaps rename as "targetFireStep()" or some other disambiguated name, or provide step(1) as an alias for targetFireStep(2), etc.
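
The difference can be illustrated with a toy model. The C++ sketch below is not simif code: ToyTarget, fire, step_fire and peek_after_edge are hypothetical names. It assumes that the output token for a cycle carries the value of io_out as seen before that cycle's clock edge updates the register.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the register example above (hypothetical, not part of simif).
struct ToyTarget {
  uint8_t in1 = 0, in2 = 0;
  bool enable = false;
  uint8_t out_reg = 0;     // corresponds to RegInit(0.U(8.W))
  uint8_t last_token = 0;  // output value captured by the most recent fire()

  // One target cycle: capture the output as seen before the clock edge,
  // then apply the edge.
  void fire() {
    last_token = out_reg;
    if (enable) out_reg = static_cast<uint8_t>(in1 + in2);
  }

  // simif-like step (assumed semantics): fire n target cycles. A following
  // expect() would compare against last_token.
  void step_fire(int n) {
    for (int i = 0; i < n; ++i) fire();
  }

  // chisel-testers-like peek: reads the register after the edge.
  uint8_t peek_after_edge() const { return out_reg; }
};
```

Replaying the sequence from the example on this model (enable low for 5 cycles, enable high for 1 cycle, then in1 = 10, in2 = 20 and one more cycle) leaves last_token at 0 while peek_after_edge() already returns 30. One further step_fire(1) makes last_token equal 30, which is the behavior the step(1)-as-targetFireStep(2) suggestion relies on.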

@davidbiancolin

Expose MSHRs as a runtime-configurable setting.

MSHRs can and should be made a runtime-configurable setting.

@farzadfch

SerialWidget's target-time behavior influenced by simulation stalls

In a modified design (modified to also stall when myStall is high), the SerialWidget's inBuf starts to fill while the stall signal is active. As a result, the inBuf already contains an element when the simulation resumes, and that element is sent to the target immediately. In the golden design, by contrast, the inBuf sends the element 2 rocket-chip cycles later than in the modified design, because the inBuf has to be filled first.

FAME-1 transforms should more intelligently bundle target I/O into FAME channels

I'm just going to start opening issues for things that need obvious improvement. It'll be easy to track them here.

Presently, all leaf signals are broken into fame decoupled bundles, despite the fact that all input tokens will be consumed on the same host cycle and all outputs are produced on the same cycle.

We should only create fame decoupled bundles for subsets of the output that can be produced on different cycles, and subsets of the input that can be consumed on different host cycles. An example of this would be a fame-1 decoupled target with multiple clock-domains (whose frequencies differ).

For fame-1 decoupled targets with a single clock, there should be only a single output and a single input fame-1 channel produced.

Synthesize Prints Cycle Count Mismatch

There seems to be a mismatch between the cycle count associated with synthesized prints and the target cycle count. This seems to manifest in the case of sparse prints (~370000 print statements out of 6246847756 target cycles). As a particular example, the last print statement in an experiment indicates CYCLE: 167551685485, while the end of simulation indicates Runs 6246847756 cycles.

Master doesn't properly measure runtime, reports wrong simulation frequency

For long-running workloads we see output like this:

==> spec-test/473.astar.test.err <==
SEED: 7282986
time elapsed: 18446744072974.2 s, simulation speed = 0.00 KHz
*** PASSED *** after 74901751853 cycles
Runs 74901751853 cycles
[PASS] MidasTop Test
SEED: 7282989

real    130m58.391s
user    0m0.237s
sys     0m0.526s
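
The reported elapsed time is just below 2^64 microseconds (about 1.8446744e13 s), which is what an unsigned 64-bit microsecond difference looks like after it goes slightly negative. Whether the driver actually computes the time this way is an assumption; the C++ sketch below only reproduces the symptom. Both function names and the example timestamps are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical: elapsed seconds from unsigned microsecond timestamps.
// If end_us < start_us (for example because a narrower counter wrapped),
// the subtraction wraps modulo 2^64 instead of going negative.
double elapsed_seconds_unsigned(uint64_t start_us, uint64_t end_us) {
  return static_cast<double>(end_us - start_us) / 1e6;
}

// A wrap-free alternative: std::chrono::steady_clock is monotonic, and
// duration<double> yields fractional seconds directly.
double elapsed_seconds_steady(std::chrono::steady_clock::time_point start,
                              std::chrono::steady_clock::time_point end) {
  return std::chrono::duration<double>(end - start).count();
}
```

For instance, elapsed_seconds_unsigned(735400000, 0), a difference of about -735.4 s, evaluates to roughly 18446744072974.15 s, the same magnitude as the figure in the log above.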

Issue with subclasses in widget matching

We recently made changes to IceNet/SimpleNIC that changed the NICIO to be a subclass of SerialIO instead of StreamIO. Unfortunately, this caused midas widget mapping to break, because the SerialWidget was matching on SerialIO and all its subclasses, so it mistakenly matched the NICIO with the SerialWidget instead of the SimpleNICWidget. The solution was to change SimSerialIO's matchType function to explicitly return false for NICIO. I'm not sure how exactly this issue could be avoided in the future.
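
Two general ways to guard against this kind of mismatch are to match on the exact type, or to test the most specific type first. The sketch below shows both ideas in C++ with stand-in classes. The real matching is done in Scala by each widget's matchType function; the classes and functions here are hypothetical and only borrow the names from the description above.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Stand-ins for the target-side port classes (hypothetical).
struct SerialIO { virtual ~SerialIO() = default; };
struct NICIO : SerialIO {};

// Subclass-inclusive match: also true for NICIO, which reproduces the bug.
bool serial_matches_loose(const SerialIO& port) {
  return dynamic_cast<const SerialIO*>(&port) != nullptr;
}

// Exact-type match: true only when the dynamic type is SerialIO itself.
bool serial_matches_exact(const SerialIO& port) {
  return typeid(port) == typeid(SerialIO);
}

// Most-specific-first dispatch: check the subclass before its base class.
std::string pick_widget(const SerialIO& port) {
  if (dynamic_cast<const NICIO*>(&port) != nullptr) return "SimpleNICWidget";
  return "SerialWidget";
}
```

The exact-type check stops a base-class matcher from claiming subclasses, at the cost of also rejecting subclasses that should legitimately share the widget. Most-specific-first ordering keeps subclass matching but depends on the order in which matchers are tried.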