zsim's People

Contributors

arodchen, gaomy3832, grantae, markcjeffrey, s5z, yang-yifan

zsim's Issues

Running the ZSim debugger / running ZSim with gdb

I compiled zsim in debug mode using "scons --d". Then I naively ran "gdb ./build/opt/zsim" from the zsim home folder and then "run ./tests/simple.cfg" from inside gdb. It seems to run correctly. However, when I run "info functions" inside gdb, many of zsim's functions are missing, specifically all the functions inside the libzsim.so shared library.

I did set sim.attachDebugger to true in the config file but that did not seem to make a difference to gdb.

Deadlock Issues

Hello,

Sometimes I have deadlock problems in my simulations (WARN: Deadlock detected, killing children). I can reproduce the problem with the following simple configuration file, running an unmodified version of zsim on a 2-core host machine. The deadlock occurs when simulating 16 cores; with 4 or 8 cores the simulation runs fine. Any clue about what is happening?

Thank you

sys = {
    cores = {
        oooCore = {
            cores = 16;
            type = "OOO";
            dcache = "l1d";
            icache = "l1i";
        };
    };
    lineSize = 64;
    caches = {
        l1d = {
            caches = 16;
            size = 256;
            parent = "l2";
        };
        l1i = {
            caches = 16;
            size = 256;
            parent = "l2";
        };
        l2 = {
            caches = 1;
            size = 8192;
            parent = "mem";
        };
    };
    mem = {
        type = "Simple";
        latency = 120;
    };
};

sim = {
    phaseLength = 10000;
    statsPhaseInterval = 1000;
};

process0 = {
    command = "/home/xxx/Projects/Benchmarks/parsec/parsec-3.0/ext/splash2x/kernels/fft/obj/amd64-linux.gcc/fft -p16";
};

Using Pinpoints with ZSim

I have been trying to integrate ZSim with PinPoints so that we can run simulations quickly.
https://software.intel.com/en-us/articles/pin-a-binary-instrumentation-tool-pinpoints

The pre-requisite to use Pinpoints is that the pintool (in our case libzsim.so ) should not change the
control flow of the program in any way.

Is this the case for ZSim? If so, I can work on the integration and send out a patch.
I thought I could ask this on the forum instead of reading through the entire ZSim code to
find any modifications to the control flow, if they are indeed there.

cCycles in zsim.out and core stall cycles

I find "cCycles" in the zsim.out file a little confusing. The comment says "Cycles due to contention stalls". What does this really mean? My understanding is that these are the cycles simulated in weave phases, that is, the extra cycles due to interference between cores that should be added to the bound phases. Also, these cycles are already included in the "cycles" counters. Right?

But I don't think "cCycles" covers all the stall cycles in each core, so I can't use it as the total stall cycles. For a timing core it is simple to assume peak CPI = 1, so non-stall cycles = instructions. Is there any simple way to estimate total stall cycles for an OOO core?

Thanks!
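
A minimal sketch of the timing-core arithmetic described in the question. It rests on the question's own assumptions, which are not verified against zsim: peak CPI is 1, and cCycles is already included in cycles. The function and parameter names are hypothetical, and the numbers are made up for illustration.

```python
def timing_core_stalls(cycles, c_cycles, instrs):
    """Split a timing core's cycles into stall components.

    Assumptions (taken from the question, not verified against zsim):
      * peak CPI is 1, so non-stall cycles equal the instruction count;
      * c_cycles (contention stalls) is already included in cycles.
    Returns (total_stall_cycles, non_contention_stall_cycles).
    """
    if instrs > cycles:
        raise ValueError("instrs > cycles contradicts the peak CPI = 1 assumption")
    total_stall = cycles - instrs
    other_stall = total_stall - c_cycles
    return total_stall, other_stall


# Made-up counter values, for illustration only.
total, other = timing_core_stalls(cycles=1000, c_cycles=150, instrs=600)
print("total stall cycles:", total)
print("stall cycles not due to contention:", other)
```

This does not carry over to an OOO core, where the peak CPI is below 1 and non-stall cycles cannot be read off the instruction count.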

how to enable hardware prefetcher

I have been using zsim for a while, thank you for this amazing simulator!
Currently I want to use the hardware prefetcher, but I do not know how to enable it. Could you please give me some advice?

For example, if I want to add a prefetcher for the L2 cache, should I set L2 as the parent of the prefetcher, or should I set the prefetcher as the parent of the L2 cache? I am not very familiar with the memory hierarchy configuration in zsim. If possible, could you please give me a sample cfg file?

Thank you very much.

Error when returning from syscall

Hello,

I am trying to use zsim on a c++ file that calls a shell/system call with popen and retrieves the output.

For some reason, I think zsim is not able to get the return value correctly.

I have made a small example that shows my confusion:

// -----------------------------------------------------
// syscall_test.cpp
// -----------------------------------------------------

#include <iostream>
#include <string>
#include <stdio.h>

// Run cmd through the shell and return everything it writes to stdout.
std::string exec(const char* cmd) {
    FILE* pipe = popen(cmd, "r");
    if (!pipe) return "ERROR";
    char buffer[128];
    std::string result = "";
    while (!feof(pipe)) {
        if (fgets(buffer, 128, pipe) != NULL)
            result += buffer;
    }
    pclose(pipe);
    return result;
}

int main(void) {
    std::string ret;
    const char* execstring = "echo 1234";
    ret = exec(execstring);

    // std::cout << ret;
    std::string s = "1234\n";
    if (ret.compare(s) != 0) std::cout << "error\n";

    return 0;
}
// -----------------------------------------------------

This file executes an "echo 1234" command and checks to see if the syscall returned "1234\n".
Running it directly, there is no error as we get 1234\n.
Running it with an arbitrary pintool (say tools/SegTrace), there is no error as we get 1234\n.
However, running it with zsim gets an error. I think it is a null return?

Any idea why this is happening?

Thanks!
Vinson

Running applications in zsim: specifying location of input files used by applications

zsim is not trace-driven, so programs have to run on the fly.

Many SPEC applications assume that certain input files are in the same folder as the one the application is run from. If I try to put those files in the main folder, it looks very cluttered, and even then I don't think it is able to read those files.
Is there an option to specify where to read the input files from?

I tried to use perProcessDir to specify a folder and put the input files there, but I still could not run, e.g., gamess.

Any help would be appreciated.

Understanding WayPart configuration for multicores

I tried to run a 16-threaded matrix multiplication application with UCP (shared 16-way set-associative L3) on the default het.cfg (16 cores). I only changed the replacement policy to sys.caches.l3.repl.type="WayPart". I did not specify umonWays or umonLines.
I got an error message:
[S 0] Failed assertion on build/opt/part_repl_policies.h:234 'curWay == ways' (with '97 == 16')
For a 32-way set-associative L3:
~/zsim/build/opt/zsim.cpp:1393 / InternalExceptionHandler
For a 64-way set-associative L3:
[S 0] Failed assertion on build/opt/part_repl_policies.h:234 'curWay == ways' (with '152 == 64')

From reading init.cpp:
ways = buckets (so 16 buckets for a 16-way L3).
Is umonWays the number of sampled UMON DSS ways (umonWays = ways, so 16 ways here)?
Is umonLines the number of sampled UMON DSS lines (umonWays x associativity), which by default is 256 (assuming 16 sets are sampled for a 16-way cache)?
May I get a sample UCP configuration, please?

Deadlock issue when running MPI applications

I use zsim and DRAMSim2 to run MPI applications, and I encounter a problem similar to #26 when I increase the number of processes (for example, to 36). I use het.cfg. I don't know whether it is the same problem. I tried to skip the fake leave in Scheduler::syscallLeave, but it doesn't help. If it is the same problem, could you tell me how to skip the fake leave in the SYS_futex call? Thanks for any suggestions.

[S 0] Detected possible stall due to fake leaves (4 current)
[S 0] [35/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [19/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [28/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [4/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] Detected possible stall due to fake leaves (4 current)
[S 0] [27/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [32/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [11/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [30/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] Detected possible stall due to fake leaves (4 current)
[S 0] [31/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [11/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [2/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 0] [30/0] sys_sched_yield (24) @ 0x7fffe30e6c05
[S 6] Time slice ended, context-switched 16 threads, runQueue size 4, available 0
[S 6] State: 1966080r 1048576r 524288r 1245184r 1835008r 1376256r 2097152r 131072r 720896r 655360r 262144r 1441792r 393216r 1572864r 917504r 2424832r
[S 0] Terminating scheduler watchdog thread
[S 0] Delaying termination until all other processes finish
[S 0] Caught termination condition on join, exiting
[S 0] Terminating FF control thread
[H] WARN: Stalled for 20 secs so far
[H] WARN: Stalled for 30 secs so far
[H] WARN: Stalled for 40 secs so far
[H] WARN: Stalled for 50 secs so far
[H] WARN: Stalled for 60 secs so far
[H] WARN: Stalled for 70 secs so far
[H] WARN: Stalled for 80 secs so far
[H] WARN: Stalled for 90 secs so far
[H] WARN: Stalled for 100 secs so far
[H] WARN: Stalled for 110 secs so far
[H] WARN: Stalled for 120 secs so far
[H] WARN: Stalled for 130 secs so far
[H] WARN: Deadlock detected, killing children
[H] Received interrupt
[H] Attempting graceful termination
[H] Killing process 15184
[H] Done sending kill signals
[H] WARN: Hard death at exit (1 children running), killing the whole process tree
Killed

stats for energy consumption and example cfg files

Hello :)

Is there any way to get stats for energy consumption?
I believe it was possible, because after some googling I have seen some ZSim energy stats in Stanford EE282 course materials.
But in the GitHub version, it seems that the code generating energy stats has been removed.

Do you have any plans to add example cfg files for beginners?
It is hard for beginners like me to write a script from reading the source code.
Some example scripts describing commercial machines would be very helpful for better understanding.

Thank You

mem ops unmatch when calling into application function from zsim

Hello,

In a Pin analysis routine I call into an application function instrumented by zsim via PIN_CallApplicationFunction; zsim then fails with the following message.

[S 0] Failed assertion on build/opt/ooo_core.cpp:358: nehalem-0: loadIdx(5) != loads (3)

This only happens when using OOO cores. It is fine when using Simple cores.

I guess calling application functions from zsim is not supported. But I need this mechanism for my simulation. Is there a way I can fix this?

Btw, are the cycles of application functions called in this way counted?

Cannot resolve 'gm_create failed shmget: Invalid argument'

I recently installed zsim on Ubuntu 12.04.5 LTS @ Intel i7 and when trying to launch a test run (./build/opt/zsim tests/simple.cfg) I am getting 'gm_create failed shmget: Invalid argument'.

I read the Memory Management section in README.md and, as far as I understand, I need to copy and replace some functions in some files. Unfortunately, I cannot figure out what to change and where, so I would like to ask

  • which files should be changed (including their relative locations) and
  • what changes should be made (i.e., I would like to get some examples of these changes).

Side note: there is no g_stl folder, mentioned in Memory Management section, in my zsim installation.
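
A possible first check, offered as a sketch rather than a confirmed diagnosis: shmget() returns EINVAL when the requested segment is larger than the kernel's SHMMAX limit, and zsim logs elsewhere on this page show it creating a 1024 MB global segment. The helper names below are hypothetical, and the 1024 MB size is an assumption taken from those logs, not from this report.

```python
def segment_fits(segment_bytes, shmmax_bytes):
    """True if a System V shared memory segment of this size is within SHMMAX."""
    return 0 < segment_bytes <= shmmax_bytes


def read_shmmax(path="/proc/sys/kernel/shmmax"):
    """Read the current SHMMAX limit in bytes (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


# Assumed global segment size: 1024 MB, as printed in zsim logs on this page.
SEGMENT_BYTES = 1024 * 1024 * 1024


def report():
    """Print whether the assumed segment size fits under the current limit."""
    limit = read_shmmax()
    if segment_fits(SEGMENT_BYTES, limit):
        print("kernel.shmmax = %d: large enough" % limit)
    else:
        print("kernel.shmmax = %d: smaller than %d bytes" % (limit, SEGMENT_BYTES))
```

Calling report() on the affected machine prints the comparison. If the limit turns out to be smaller than the segment, raising kernel.shmmax with sysctl (as root) is the usual remedy; whether that resolves this particular report is untested.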

synced fast forward philosophy and behavior

I find the behavior of synced fast forward a little confusing. First, what is its purpose? #27 said it is used between processes. Does it have any impact on single-process, multi-threaded programs?
#27 also said syncedFastForward should not be the default behavior. But if we don't specify it in the cfg file, the default is true, as in https://github.com/s5z/zsim/blob/master/src/process_tree.cpp#L175.

The main problem with syncedFastForward enabled is the ROI end operation, as in https://github.com/s5z/zsim/blob/master/src/zsim.cpp#L1158-1159. ROI end will be completely ignored. However, the lines below it are also confusing, as the branch in line 1166 will never be taken given line 1158. The inline comments and #27 explain that this is to avoid deadlock. I think we should either change the default syncedFastForward value or come up with a better mechanism for this.

Thanks!

xed-iclass-enum.h not found when compile zsim

/home/liuwj/pin-2.14-71313-gcc.4.4.7-linux/source/include/pin/level_base.PLH:83:29: fatal error: xed-iclass-enum.h: no file or directory found

Is it necessary to install the xed package for Pin?

MPI applications in zsim

Hello,

I have an issue running MPI application in zsim on a cluster. For some processes I get an error:
"libzsim.so address mismatch! text: 0x2aaaac60cd10 != 0x2aaaabcdcd10. Perform loader injection to homogenize offsets!"
After this, simulation gets stalled...

Did anybody have similar problems or does anybody have any suggestions for running MPI applications in zsim?

Thanks,
Darko

libhdf5 ver 1.8 compile issues

The latest zsim fails to compile with libhdf5 1.8. To fix this, add "-DH5_USE_16_API" to SConstruct.

diff --git a/SConstruct b/SConstruct
index b2d4fff..7fc2586 100644
--- a/SConstruct
+++ b/SConstruct
@@ -46,7 +46,7 @@ def buildSim(cppFlags, dir, type, pgo=None):
     # NOTE (dsm 10 Jan 2013): Tested with Pin 2.10 thru 2.12 as well
     # NOTE: Original Pin flags included -fno-strict-aliasing, but zsim does not do type punning
     env["CPPFLAGS"] += " -g -std=c++0x -Wall -Wno-unknown-pragmas -fomit-frame-pointer -fno-stack-protector"
-    env["CPPFLAGS"] += " -MMD -DBIGARRAY_MULTIPLIER=1 -DUSING_XED -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX"
+    env["CPPFLAGS"] += " -MMD -DBIGARRAY_MULTIPLIER=1 -DUSING_XED -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX -DH5_USE_16_API "

     # Pin 2.12+ kits have changed the layout of includes, detect whether we need
     # source/include/ or source/include/pin/

Issues while compiling Zsim on linux x86-64

I get the following 2 errors:
/usr/bin/ld: cannot find -lpthread
build/opt/virt/patchdefs.h:41:1: error: ‘SYS_getcpu’ was not declared in this scope

I tried to search on internet, but could not resolve. Here is the detailed log:

g++ -o build/opt/fftoggle -Wl,-R/home/esm/bin/libconfig/lib --static build/opt/fftoggle.o build/opt/config.o build/opt/galloc.o build/opt/log.o build/opt/pin_cmd.o -L/home/esm/bin/libconfig/lib -lconfig++ -lpthread
/usr/bin/ld: cannot find -lpthread
collect2: error: ld returned 1 exit status
g++ -o build/opt/virt/virt.os -c -fPIC -march=core2 -g -O3 -funroll-loops -g -std=c++0x -Wall -Wno-unknown-pragmas -fomit-frame-pointer -fno-stack-protector -MMD -DBIGARRAY_MULTIPLIER=1 -DUSING_XED -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX -Werror -DPIN_PATH="/home/esm/PIN/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/intel64/bin/pinbin" -DZSIM_PATH="/home/esm/Zsim/zsim/build/opt/libzsim.so" -DMT_SAFE_LOG -I/home/esm/PIN/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/extras/xed2-intel64/include -I/home/esm/PIN/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/source/include -I/home/esm/PIN/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/source/include/gen -I/home/esm/PIN/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/extras/components/include -I/home/esm/bin/libconfig/include -Ibuild/opt build/opt/virt/virt.cpp
echo "#define ZSIM_BUILDDATE ""date""\n#define ZSIM_BUILDVERSION ""python misc/gitver.py""" >>build/opt/version.h
scons: *** [build/opt/fftoggle] Error 1
In file included from build/opt/virt/virt.cpp:69:0:
build/opt/virt/patchdefs.h: In function ‘void VirtInit()’:
build/opt/virt/patchdefs.h:41:1: error: ‘SYS_getcpu’ was not declared in this scope
scons: *** [build/opt/virt/virt.os] Error 1
scons: building terminated because of errors.

Suppressing output of applications while running zsim

Since applications run on the fly in zsim, their output gets dumped on the screen. I searched the code and also tried '> filename' for output redirection, but could not stop this.

Could you please suggest a way to suppress the output of, say, SPEC applications? For example, for hmmer:

hmmcalibrate -- calibrate HMM search statistics
HMMER 2.3 (May 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
.......

waitUntilQueued time out

Hi,

I have made some modifications to zsim's scheduler.h/cpp to support pinning a thread to a specific core. To do this, I hook a function in zsim.cpp that calls zinfo->sched->leave() and clears the cid, and then calls zinfo->sched->join() and sets the cid. I also added some code to update the thread mask in the scheduler, and added a branch in sched->leave() to transition to BLOCKED instead of OUT (by doing deschedule(th); freeList.push_back(ctx); bar.leave(cid);), since OUT will always come back to the same core without checking the mask.

The simulation seems to run well, except that it produces a lot of waitUntilQueued time out warnings, especially when there are many threads (e.g., with 1024 threads, all but one or two threads get a waitUntilQueued time out). After some debugging, I found that the threads are not waiting on schedLock, so the while loop in waitUntilQueued() does not exit normally.

Looking at the other parts of the code in scheduler.h/cpp, I don't understand why the thread that is woken up needs to wait on schedLock. waitUntilQueued() is only called in wakeup(), which is called in leave() (ignoring sync() for now). In leave(), when a thread X leaves a core, if there is another thread Y waiting for this core (at waitForContext() in join()), we swap in thread Y by waking it up (lines 369 to 373 in scheduler.h). However, unless thread Y has needsJoin == true, after waking up it will just return from waitForContext() and join(), which has nothing to do with schedLock. So I wonder if I need to call wakeup() with needsJoin set to true. But in the original code, all the call sites of wakeup() pass needsJoin as false, which leaves me confused about the purpose of "needsJoin".

I have already read #8 and #9, but they help little with my problem. Maybe the story is too long. To sum up my questions:

  1. Why does waitUntilQueued need to wait for threads to wait on schedLock?
  2. When should the needsJoin argument be set to true when calling wakeup()?

At the end, I would like to report a minor bug: in scheduler.cpp, line 390:

while(!IsSleepingInFutex(th->linuxTid, th->linuxTid, (uintptr_t)&schedLock)) {

the first argument should be th->linuxPid instead of th->linuxTid.

Thanks!

Running GDB with zsim -- Cannot find bounds of function error

I get the following message while running gdb with zsim. I compiled zsim using scons --d but some debug information still seems to be missing.

It works correctly everywhere I tested except for the Indirect* functions (eg :IndirectBasicBlock, IndirectStoreSingle etc.)

[Screenshot: gdb output showing the "Cannot find bounds of function" error]

[H] Failed assertion on build/opt/zsim_harness.cpp:112: Wait should not fail, cpid=0

I'm getting this error while running some parsec benchmarks:

[H] Failed assertion on build/opt/zsim_harness.cpp:112: Wait should not fail, cpid=0
[H] WARN: Segmentation fault
[H] WARN: Hard death, killing the whole process tree
[H] Panic on build/opt/zsim_harness.cpp:161: SIGKILLs sent -- exiting
[H] WARN: Hard death at exit (1 children running), killing the whole process tree

What could cause this error?

Thank you

Deadlock in barrier.h

Hi,
I am running an application using 2 threads on a simulated 32-core system. Zsim deadlocks when one thread calls leave() while the other thread sleeps after calling syscall(FUTEX_WAIT) within sync(). The sleeping thread will never wake up, as the other thread has left. The application uses true spin_locks to synchronize between the threads, but no futexes. The leave is triggered by an application futex syscall (202) and a fake leave timeout, although I am not using futexes in my application (maybe a library does?).

(gdb) bt
#0 0x00007ffff6593454 in Barrier::leave (tid=, this=0xabba08cb98) at build/opt/barrier.h:167
#1 0x00007ffff6625b83 in Scheduler::leave (this=0xabba08cb80, pid=<optimized out>, tid=<optimized out>, cid=<optimized out>) at build/opt/scheduler.h:377
#2 0x00007ffff6623bdf in Scheduler::watchdogThreadFunc (this=0xabba08cb80) at build/opt/scheduler.cpp:157
#3 0x00000000307455b1 in ?? ()
#4 0x0000000000000002 in ?? ()
#5 0x0000004a96d7bf10 in ?? ()
#6 0x0000004a96d7bf78 in ?? ()
#7 0x0000000000000000 in ?? ()

(gdb) info threads
Id Target Id Frame
* 2 LWP 24696 "genomeSTM64" 0x00007ffff6593454 in Barrier::leave (tid=, this=0xabba08cb98) at build/opt/barrier.h:167
  1 LWP 24685 "genomeSTM64" 0x00007ffff6ec06e9 in ?? ()
(gdb) thread 1
[Switching to thread 1 (LWP 24685)]
#0 0x00007ffff6ec06e9 in ?? ()
(gdb) bt
#0 0x00007ffff6ec06e9 in ?? ()
#1 0x00007ffff6668164 in sync (schedLock=0xabba0960a0, tid=0, this=0xabba08cb98) at build/opt/barrier.h:193
#2 sync (cid=0, tid=0, pid=, this=0xabba08cb80) at build/opt/scheduler.h:388
#3 TakeBarrier (tid=tid@entry=0, cid=cid@entry=0) at build/opt/zsim.cpp:923
#4 0x00007ffff6602ccb in OOOCore::BblFunc (tid=0, bblAddr=, bblInfo=) at build/opt/ooo_core.cpp:524
#5 0x00000049a58cc1d0 in ?? ()
#6 0x0000000000000000 in ?? ()
(gdb)

Running het.cfg configuration example returns errors

Hello,

I am quite new to ZSim so I went through the documentation, issues and the examples. The simple ones seem to work but when I run the simulation with the het.cfg configuration, it returns some errors. I only left one process:

process0 = {
    command = "$ZSIMAPPSPATH/build/parsec/blackscholes 15 2000000";
    startFastForwarded = True;
};

and I checked beforehand that the Parsec benchmark is not the issue.
The error returned is:

[H] Creating global segment, 1024 MBs
[H] Global segment shmid = 4161551
[H] Deadlock detection ON
/build/parsec/blackscholes : No such file or directory
[H] Child 11756 done
[H] All children done, exiting

When I change the command to be run as an executable:

command = "./$ZSIMAPPSPATH/build/parsec/blackscholes 15 2000000";

I get the following:

[S 0] [0] Adjusting clocks, domain 0, de-ffwd 0
[H] Attached to global heap
[S 0] vDSO info initialized
[S 0] Started scheduler watchdog thread
[S 0] FF control Thread TID 11799
[S 0] FF thread 0 starting
[S 0] Started contention simulation thread 0
PARSEC Benchmark Suite
Usage:
.//build/parsec/blackscholes
[S 0] Shadow/NOP thread 0 finished
[S 0] Finished, code 1
[S 0] Dumping termination stats
[H] Child 11794 done
[H] All children done, exiting

Any help with this would be greatly appreciated.
Thanks.
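
One thing the first log hints at: the path is printed as "/build/parsec/blackscholes", which is what "$ZSIMAPPSPATH/build/parsec/blackscholes" becomes if ZSIMAPPSPATH expands to an empty string. That the variable was unset in the launching shell is an assumption, not something the log proves. A minimal sketch for checking a command string before a run; the helper name and the example path are hypothetical:

```python
from string import Template


def expand_command(command, env):
    """Expand $VARS in a command string, failing loudly on unset variables.

    env is a mapping such as os.environ. Unlike a silent expansion to an
    empty string, a missing variable raises ValueError here.
    """
    try:
        return Template(command).substitute(env)
    except KeyError as missing:
        raise ValueError("environment variable %s is not set" % missing)


cmd = "$ZSIMAPPSPATH/build/parsec/blackscholes 15 2000000"

# With the variable set (hypothetical location):
print(expand_command(cmd, {"ZSIMAPPSPATH": "/opt/zsim-apps"}))

# With the variable missing, the problem is reported instead of hidden:
try:
    expand_command(cmd, {})
except ValueError as err:
    print("error:", err)
```

Passing os.environ as env checks the environment the simulator would actually be launched from.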

How to fast-forward N instructions in zsim

With single-threaded apps, how can one fast-forward N instructions in zsim? I think it is related to sim.ffReinstrument, but I could not verify this; also, as far as I understand, there is currently no way to specify the exact number of instructions to fast-forward.

Your reply will be greatly appreciated.

Running zsim in Ubuntu 10.04

I have compiled zsim on Ubuntu 14.04 without problems, but now I need to run zsim on some machines with Ubuntu 10.04, gcc-4.4, etc. (and I don't have root permissions ...). Is there an easy way to do that? I'm trying to compile zsim statically on Ubuntu 14.04, but I have not been able to make it work so far ...

Thank you

compilation error due to hdf5-1.8.4

I am new to zsim and may be asking a simple question. I tried to install the simulator on my system with Ubuntu 12.04, Pin 2.12, and an Intel Core i5. I followed the instructions in the README.md file.
I tried installing hdf5 in 2 ways:
1st way) I installed hdf5 from the link below (through the software center); the error is pasted below:
link: http://packages.ubuntu.com/lucid/science/hdf5-tools
natarajan@ubuntu:~/zsim$ sudo scons -j16
build/opt/access_tracing.cpp:28:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
build/opt/access_tracing.cpp:28:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
build/opt/hdf5_stats.cpp:29:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
echo "#define ZSIM_BUILDDATE ""date""\n#define ZSIM_BUILDVERSION ""python misc/gitver.py""" >>build/opt/version.h
scons: *** [build/opt/access_tracing.os] Error 1
scons: *** [build/opt/access_tracing.ot] Error 1
scons: *** [build/opt/hdf5_stats.os] Error 1
scons: building terminated because of errors.

I wonder why it cannot find hdf5.h, because I am able to locate it:
natarajan@ubuntu:~$ locate hdf5.h
/home/natarajan/hdf5-1.8.4/src/hdf5.h
/home/natarajan/hdf5-1.8.4/test/testhdf5.h
/home/natarajan/hdf5-1.8.4/testpar/testphdf5.h
/usr/local/hdf5/include/hdf5.h

2nd way) I tried to install hdf5 from the source website, following the instructions at the link below,
link: http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL
and modified the path for hdf5.h and hdf5_hl.h in hdf5_stats.cpp and access_tracing.cpp.

I get the following error:
build/opt/access_tracing.cpp:138:101: error: too few arguments to function ‘hid_t H5Acreate2(hid_t, const char*, hid_t, hid_t, hid_t, hid_t)’
/home/natarajan/hdf5-1.8.4/src/H5Apublic.h:44:16: note: declared here
build/opt/access_tracing.cpp:142:97: error: too few arguments to function ‘hid_t H5Acreate2(hid_t, const char*, hid_t, hid_t, hid_t, hid_t)’
/home/natarajan/hdf5-1.8.4/src/H5Apublic.h:44:16: note: declared here
build/opt/access_tracing.cpp: In constructor ‘AccessTraceWriter::AccessTraceWriter(g_string, uint32_t)’:
build/opt/access_tracing.cpp:138:101: error: too few arguments to function ‘hid_t H5Acreate2(hid_t, const char*, hid_t, hid_t, hid_t, hid_t)’
/home/natarajan/hdf5-1.8.4/src/H5Apublic.h:44:16: note: declared here
build/opt/access_tracing.cpp:142:97: error: too few arguments to function ‘hid_t H5Acreate2(hid_t, const char*, hid_t, hid_t, hid_t, hid_t)’
/home/natarajan/hdf5-1.8.4/src/H5Apublic.h:44:16: note: declared here
echo "#define ZSIM_BUILDDATE ""date""\n#define ZSIM_BUILDVERSION ""python misc/gitver.py""" >>build/opt/version.h
scons: *** [build/opt/access_tracing.os] Error 1
scons: *** [build/opt/access_tracing.ot] Error 1
scons: building terminated because of errors.

Could anyone please help me with these errors?
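
The "too few arguments to function H5Acreate2" error matches the one in the "libhdf5 ver 1.8 compile issues" post on this page, where adding -DH5_USE_16_API to CPPFLAGS in SConstruct is given as the fix. The "hdf5.h: No such file or directory" error is the compiler not being given the directory that holds the header. A minimal sketch of both adjustments on a plain flags string; the helper is hypothetical and not part of zsim's SConstruct, and whether it resolves this particular setup is untested:

```python
def with_hdf5_flags(cppflags, include_dir=None):
    """Return cppflags with the HDF5 1.6-API define and an optional include path.

    Flags that are already present are not added a second time.
    """
    flags = cppflags.split()
    wanted = ["-DH5_USE_16_API"]
    if include_dir:
        wanted.append("-I" + include_dir)
    for flag in wanted:
        if flag not in flags:
            flags.append(flag)
    return " ".join(flags)


# The include directory is the one reported by "locate hdf5.h" in the post.
print(with_hdf5_flags("-MMD -fPIC -DTARGET_LINUX", "/usr/local/hdf5/include"))
```

Pointing the compiler at the installed include directory avoids editing the #include lines in hdf5_stats.cpp and access_tracing.cpp.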

request for sim configuration for multithreaded apps

I am able to simulate a matrix multiplication app with 2 threads on 2 in-order cores (simple.cfg). When I increase the core count to 4 (with 4 threads), 8 (with 8 threads), etc., I see that only 2 cores are used (2 threads executing simultaneously, one on each core); there are context switches between those 2 threads, and the other threads are idle. I don't understand what the exact scheduler configuration ("number of domains, contention threads and parallelism") should be to schedule N threads on N cores and execute them simultaneously without context switches. Any help would be greatly appreciated.

How to differentiate memory controllers?

I use ZSim + DRAMSim2 to simulate a heterogeneous memory system, so there are two memory controllers with different configurations. I checked the ZSim source code: the DRAMSimAccEvent is only created in DRAMSimMemory::access(), which is called according to the LLC's parent id? As I understand it, the only interface of DRAMSim2 is addTransaction, which is encapsulated in DRAMSimMemory::enqueue. I don't know what DRAMSimMemory::access() is used for. I want to do some scheduling between the two controllers. Could you please explain how to differentiate memory controllers? By DRAMSimAccEvent, parent id, or address? Thank you.

vdso error with linux kernel version larger than 3.14

Hi all,

I have been hitting a runtime error related to vdso for a long time, and it has still not been fixed. It produces the following error:

[S 0] Failed assertion on build/opt/zsim.cpp:746 'vdsoPatchData[tid].level' (with '0')
[S 0] [0] Internal exception detected:
[S 0] [0] Code: 1
[S 0] [0] Address: 0x7ffff67991a9
[S 0] [0] Description: Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7ffff67991a9. Access Type: UNKNOWN. Access Address = 0x000000000
[S 0] [0] Caused by invalid access to address 0x0

I notice that this error occurs when the Linux kernel version is newer than 3.14; otherwise it does not crash.

Can anyone share any advice?

Many Thanks,

Fast forward in multicore system

Hi all.
I configured the simulator with 4 cores, and I set each process to first fast-forward 40000000000 instructions and then simulate 10000000000 instructions by adding the following 2 lines to each process's configuration section:
ffiPoints = "40000000000 10000000000";
startFastForwarded = true;
My question is: is it possible that some processes have already finished simulation while others are still fast-forwarding? If so, how can I avoid this situation?
Another thing that confuses me is how these processes are scheduled on the 4 cores. I mean, if I have 4 cores and 4 processes, is each process statically scheduled on a different core?
Thank you very much.
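
A minimal sketch of how the two settings above combine, following the reading given in the question: each number in ffiPoints is the length of one interval in instructions, and the mode toggles between fast-forwarding and simulation at each point. That interpretation is the poster's and is not verified against zsim; the function name is hypothetical.

```python
def ffi_schedule(ffi_points, start_fast_forwarded):
    """Turn an ffiPoints string into a list of (mode, instructions) intervals.

    Assumes each number is an interval length and that the mode toggles
    between "ffwd" and "sim" after every interval.
    """
    lengths = [int(tok) for tok in ffi_points.split()]
    fast_forwarding = start_fast_forwarded
    schedule = []
    for n in lengths:
        schedule.append(("ffwd" if fast_forwarding else "sim", n))
        fast_forwarding = not fast_forwarding
    return schedule


for mode, n in ffi_schedule("40000000000 10000000000", True):
    print(mode, n)
```

Under this reading each process follows its own schedule, so processes that execute instructions at different rates reach their toggle points at different times.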

Accessing real-data in zsim

Zsim uses Pin, so I assume it should be possible to access the actual data of an application. In the current code, I think only addresses are simulated. Is there a simple way to get the actual data, at least with a few (or a single) core(s)?

Thanks.

Tiled multi-core configuration

I'm currently trying to specify a tiled multi-core configuration as in the "ZSim: Fast and Accurate ..." publication. I assume that the L3 cache is shared among all tiles in this example. Just some questions regarding this spec:

  1. As far as I understand the source so far, memory controllers are not specifically attached to tiles, so the configuration just has 4/16/64 controllers, right?
  2. How is the network specified? Is it specified as latencies between L3 banks? e.g., l3-0b1 l3-0b2 1. But then, how is 2 stage routing specified?

Thanks in advance

Increase DRAMsim 2 memory latency

Hello

I'm using Zsim with dramsim2. Comparing the real system to the simulation values I find that the memory latency in the simulation is smaller by about 130 cycles. I tried compensating that by increasing the latency value for memory in the mem section of the zsim config file. However, if the latency value is increased beyond a certain value, the error mentioned in issue #25 (ACCESS_INVALID_ADDRESS) occurs.

I'm taking the DRAMsim values from the specs of the manufacturer, but it seems there is some other factor in the system adding to the latency. I would like to know how can I add more latency to memory accesses in Zsim.

Regards

Huge Perf Diff with Zsim-dram model and Dramsim2

Hi,
While comparing the zsim-native DRAM model and zsim with DRAMSim2, I found a performance (IPC) difference and, more importantly, the instructions executed reported in zsim.out differ by ~20% in the 4T case.
[Image: perf-data]

Update:
Instruction counts for both models are the same if we set KMP_BLOCKTIME=0; the zsim-native instruction count comes down. Still, it's interesting that spin loops add instructions only with the native zsim DRAM model, not when running with DRAMSim2.
The IPC difference is still huge, ~7x.

Update 2: The previous data was with core frequency = 4800 MHz.
At core frequencies >= 4295 MHz, zsim reads very low latencies (~10) in readReturn_CB, while the latencies in DRAMSim2's vis file are reasonable.
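
A side observation on the 4295 MHz threshold: it is the first whole-MHz frequency whose value in Hz no longer fits in an unsigned 32-bit integer (2^32 - 1 = 4294967295). Whether zsim or DRAMSim2 actually stores the frequency in Hz in a 32-bit field is an assumption here and has not been checked against either code base; the sketch below only shows the arithmetic, with hypothetical helper names.

```python
UINT32_MAX = 2**32 - 1


def hz_as_uint32(freq_mhz):
    """Frequency in Hz after wrapping to 32 bits, as a C uint32_t would hold it."""
    return (freq_mhz * 10**6) & UINT32_MAX


def first_overflowing_mhz():
    """Smallest whole-MHz frequency whose value in Hz exceeds UINT32_MAX."""
    return UINT32_MAX // 10**6 + 1


print(first_overflowing_mhz())
for mhz in (2200, 4294, 4295, 4800):
    print(mhz, "MHz ->", hz_as_uint32(mhz), "Hz when held in 32 bits")
```

If a wrapped value like this were used to convert DRAM cycles to core cycles, the resulting latencies would come out far too small.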

I reran with Core frequency = 2200MHz.
[Image: results with core frequency = 2200 MHz]

There is still a 38% difference in IPC. Is there some latency or config I am missing?

I am using
controllers = 1;
type = "DDR";
tech = "DDR3-1333-CL10";
ranksPerChannel = 1;
banksPerRank = 8;
addrMapping = "col:bank:rank";
controllerLatency = 1;
closedPage = false;
queueDepth = 32;
pageSize = 1024;

I use the corresponding tech.ini file with scheme2 when running the DRAMSim2 model.
The machine config is 4 OOO cores, with L1D and L1I = 64 KB, 1 shared L2 = 1 MB, and 1 shared L3 = 4 MB.

FFI related problems

Hello,

I ran into FFI-related problems with the latest version of zsim.
The problem is that zsim re-enters fast-forwarding mode after a period of real execution.
This is the console output I got:

[H] Starting zsim, built Thu Jul 9 13:46:48 KST 2015 (rev master:69:ac6d5ad:1fc 2+ 2- 5753470a)
[H] Creating global segment, 1024 MBs
[H] Global segment shmid = 458753
[H] Deadlock detection ON
[S 0] Started instance
[S 0] Started RR scheduler, quantum=10000 phases
== Loading device model file '/home/smartcode/Workspace/zsim/ext/DRAMSim2/ini/2Gb_DDR3-1600_x8.ini' ==
== Loading system model file '/home/smartcode/Workspace/zsim/ext/DRAMSim2/system.ini' ==
===== MemorySystem 0 =====
CH. 0 TOTAL_STORAGE : 4096MB | 2 Ranks | 8 Devices per rank
===== MemorySystem 1 =====
CH. 1 TOTAL_STORAGE : 4096MB | 2 Ranks | 8 Devices per rank
[S 0] Hierarchy: [ l1i-0 l1d-0 ] -> l2-0
[S 0] Hierarchy: [ l1i-1 l1d-1 ] -> l2-1
[S 0] Hierarchy: [ l1i-2 l1d-2 ] -> l2-2
[S 0] Hierarchy: [ l1i-3 l1d-3 ] -> l2-3
[S 0] Hierarchy: [ l2-0 l2-1 l2-2 l2-3 ] -> l3-0b0..l3-0b3
[S 0] Initialized system
[S 0] HDF5 backend: Opening /home/smartcode/Workspace/zsim/run/429.mcf/zsim.h5
[S 3] Started instance
[S 1] Started instance
[S 0] HDF5 backend: Created table, 3752 bytes/record, 280 records/write
[S 0] HDF5 backend: Opening /home/smartcode/Workspace/zsim/run/429.mcf/zsim-ev.h5
[S 2] Started instance
[S 0] HDF5 backend: Created table, 3752 bytes/record, 35 records/write
[S 0] HDF5 backend: Opening /home/smartcode/Workspace/zsim/run/429.mcf/zsim-cmp.h5
[S 0] HDF5 backend: Created table, 2336 bytes/record, 1 records/write
[S 0] Initialization complete
[S 0] Started process, PID 27358
[S 1] Started process, PID 27359
[S 1] procMask: 0x400000000000000
[S 0] procMask: 0x0
[S 1] [1] Adjusting clocks, domain 0, de-ffwd 0
[S 1] FFI mode initialized, 2 ffiPoints
[S 0] FFI mode initialized, 2 ffiPoints
[S 2] Started process, PID 27360
[S 2] procMask: 0x800000000000000
[H] Attached to global heap
[S 2] FFI mode initialized, 2 ffiPoints
[S 3] Started process, PID 27361
[S 1] vDSO info initialized
[S 0] vDSO info initialized
[S 3] procMask: 0xc00000000000000
[S 3] FFI mode initialized, 2 ffiPoints
[S 2] vDSO info initialized
[S 3] vDSO info initialized
[S 3] FF control Thread TID 27375
[S 3] FF thread 0 starting
[S 1] FF control Thread TID 27372
[S 0] FF control Thread TID 27373
[S 0] Started contention simulation thread 0
[S 0] Started scheduler watchdog thread
[S 1] FF thread 0 starting
[S 0] FF thread 0 starting
[S 2] FF control Thread TID 27374
[S 2] FF thread 0 starting


MCF SPEC CPU2006 version 1.10
Copyright (c) 1998-2000 Zuse Institut Berlin (ZIB)
Copyright (c) 2000-2002 Andreas Loebel & ZIB
Copyright (c) 2003-2005 Andreas Loebel
MCF SPEC CPU2006 version 1.10

Copyright (c) 1998-2000 Zuse Institut Berlin (ZIB)
Copyright (c) 2000-2002 Andreas Loebel & ZIB
Copyright (c) 2003-2005 Andreas Loebel


MCF SPEC CPU2006 version 1.10
Copyright (c) 1998-2000 Zuse Institut Berlin (ZIB)
Copyright (c) 2000-2002 Andreas Loebel & ZIB
Copyright (c) 2003-2005 Andreas Loebel


MCF SPEC CPU2006 version 1.10
Copyright (c) 1998-2000 Zuse Institut Berlin (ZIB)
Copyright (c) 2000-2002 Andreas Loebel & ZIB
Copyright (c) 2003-2005 Andreas Loebel

nodes                      : 25137
active arcs                : 260767
nodes                      : 25137
active arcs                : 260767
nodes                      : 25137
active arcs                : 260767
nodes                      : 25137
active arcs                : 260767
[S 0] ffiPoint reached, 1000000001 instrs, limit 1000000000
[S 0] FFI: Exiting fast-forward
[S 0] [0] Adjusting clocks, domain 0, de-ffwd 1
[S 0] Thread 0 starting
[S 0] Simulation paused due to synced fast-forwarding
[S 3] ffiPoint reached, 1000000001 instrs, limit 1000000000
[S 3] FFI: Exiting fast-forward
[S 3] [3] Adjusting clocks, domain 0, de-ffwd 1
[S 3] Thread 0 starting
[S 1] ffiPoint reached, 1000000001 instrs, limit 1000000000
[S 1] FFI: Exiting fast-forward
[S 1] [1] Adjusting clocks, domain 0, de-ffwd 1
[S 1] Thread 0 starting
[S 2] ffiPoint reached, 1000000001 instrs, limit 1000000000
[S 2] FFI: Exiting fast-forward
[S 2] [2] Adjusting clocks, domain 0, de-ffwd 1
[S 2] Thread 0 starting
[S 0] Synced fast-forwarding done, resuming simulation
writing vis file to ./results/2Gb_DDR3-1600_x8/4GB.2Ch.2R.scheme7.close_page.64TQ.64CQ.RtB.pRank.vis
DRAMSim2 Clock Frequency =800000000Hz, CPU Clock Frequency=3200000000Hz
[S 0] FFI: Entering fast-forward for process 0
[S 0] Thread 0 entering fast-forward
[S 0] [0] Internal exception detected:
[S 0] [0]  Code: 1
[S 0] [0]  Address: 0x7ffff65cadc1
[S 0] [0]  Description: Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7ffff65cadc1. Access Type: UNKNOWN. Access Address = 0x000000000
[S 0] [0]  Caused by invalid access to address 0x0
[S 1] Simulation paused due to synced fast-forwarding
[S 0] [0] Backtrace (9/40 max frames)
[S 0] [0]  /home/smartcode/Workspace/zsim/build/opt/zsim.cpp:1397 / InternalExceptionHandler
[S 0] [0]  regstrlcpy.c:0 / LEVEL_PINCLIENT::IEH_CALLBACKS::NotifyInternalException(unsigned int, LEVEL_BASE::EXCEPTION_INFO*, LEVEL_VM::CONTEXT*)
[S 0] [0]  /home/smartcode/Workspace/zsim/ext/pin/pin-2.14-71313-gcc.4.4.7-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL19InternalHandlerSyncEiPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTEPPKNS_14SCT_ATTRIBUTESEPNS_5PCTXTEPj+0x444) [0x3043a9454]
[S 0] [0]  /home/smartcode/Workspace/zsim/ext/pin/pin-2.14-71313-gcc.4.4.7-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL20HandlePhysicalSignalEPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTE+0x124) [0x3043aa1f4]
[S 0] [0]  /home/smartcode/Workspace/zsim/ext/pin/pin-2.14-71313-gcc.4.4.7-linux/intel64/bin/pinbin(_ZN5PINVM28SIGNAL_DETAILS_LINUX_INTEL6415InternalHandlerEiPN7BARECRT8SIGXINFOEPv+0xe8) [0x304438c88]
[S 0] [0]  /home/smartcode/Workspace/zsim/ext/pin/pin-2.14-71313-gcc.4.4.7-linux/intel64/bin/pinbin(BARECRT_SigReturnRt+0) [0x30446603c]
[S 0] [0]  /home/smartcode/Workspace/zsim/build/opt/zsim.cpp:515 / TakeBarrier(unsigned int, unsigned int)
[S 0] [0]  /home/smartcode/Workspace/zsim/build/opt/ooo_core.cpp:518 / OOOCore::BblFunc(unsigned int, unsigned long, BblInfo*)
[S 0] [0]  [0x7fffe39bfa5c]
C: Tool (or Pin) caused signal 11 at PC 0x7ffff65cadc1
[H] Child 27358 done
[H] Panic on build/opt/zsim_harness.cpp:123: Child 27358 (idx 0) exit was anomalous, killing simulation
[H] WARN: Hard death at exit (3 children running), killing the whole process tree
Killed

Any comment would be appreciated.
Thank you.

"[S 0] WARN: waitUntilQueued for pid 0 tid 5 timed out" on nanosleep

Hi,
When the simulated application calls nanosleep() I get the following messages:
[S 0] WARN: waitUntilQueued for pid 0 tid 5 timed out

Is this because the thread slept across a phase? If this is the behavior I want, should I just ignore these warnings? The simulation time, at least, seems to increase significantly (because of waiting on phase barriers?).

possible stalls due to fake leaves

Hi,
I am running a single process with 2 threads and getting several of these:
[S 0] Detected possible stall due to fake leaves (1 current)
[S 0] [0/4] INVALID (202) @ 0x7ffff4cf524a
[S 0] Blacklisting from future fake leaves: [0] INVALID @ 0x7ffff4cf524a | arg0 0x4a96d8c2b0 arg1 0x80

What does this mean and how can I fix these errors? The program seems to run fine and to completion.

Regarding terminology in zsim

This is a minor point, but I want to check my understanding (given that the coherence protocol uses this terminology).

From this
/* Types of Access. An Access is a request that proceeds from lower to upper
 * levels of the hierarchy (core->l1->l2, etc.)
 */

I deduce that, in zsim's terminology, L1 is the lowest level of cache and L2 (or the LLC) is the highest. That is certainly a valid convention, but I think it is the opposite of the pyramid model of cache latency, where L1, with the lowest latency, is at the top and the LLC (and memory, disk) are at the bottom.

running spec06 on zsim

Hi community!
I am simulating using zsim but I came across the problem below.
[H] Starting zsim, built xxxx
[H] Removed 1 old logfiles
[H] Creating global segment, 1024 MBs
[H] Global segment shmid = 458758
[H] Pausing PIN to attach debugger, and not running deadlock detection
[H] Deadlock detection OFF
[H] Child 22741 done
[H] Panic on build/debug/zsim_harness.cpp:118: Child issued a panic, killing simulation
[H] WARN: Hard death at exit (1 children running), killing the whole process tree

My config file is as follows.

process0 = {
command = "/home/zhuguoliang/project/mybenchmark/401/run_base_ref_gcc43-64bit.0000/bzip2_base.gcc43-64bit /home/zhuguoliang/project/mybenchmark/401/run_base_ref_gcc43-64bit.0000/input.source 200";
};

This is my first time using zsim. Please give me a hint on this!

tid ==>args.tid in some places

I think that, in the files in the virt folder, calls such as

trace(TimeVirt, "[%d] Post-patching SYS_clock_nanosleep", tid);

should be replaced by

trace(TimeVirt, "[%d] Post-patching SYS_clock_nanosleep", args.tid);

run with dramsim2 : ACCESS_INVALID_ADDRESS

When run with DRAMSim2, zsim fails with the following error output:

[S 0] Failed assertion on build/opt/timing_event.h:161: startCycle 130 < minStartCycle 138 (17MissResponseEvent), preDelay 0 postDelay 0 numChildren 2 str
[S 0] [1] Internal exception detected:
[S 0] [1] Code: 1
[S 0] [1] Address: 0x7ffff66056a5
[S 0] [1] Description: Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7ffff66056a5. Access Type: UNKNOWN. Access Address = 0x000000000
[S 0] [1] Caused by invalid access to address 0x0
[S 0] [1] Backtrace (11/40 max frames)
[S 0] [1] /home/xxx/Workspace/ZSIM/zsim-src/build/opt/zsim.cpp:1417 / InternalExceptionHandler
[S 0] [1] :? / LEVEL_PINCLIENT::IEH_CALLBACKS::NotifyInternalException(unsigned int, LEVEL_BASE::EXCEPTION_INFO*, LEVEL_VM::CONTEXT*)
[S 0] [1] /home/xxx/Workspace/tools/pin/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL19InternalHandlerSyncEiPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTEPPKNS_14SCT_ATTRIBUTESEPNS_5PCTXTEPj+0x462) [0x3077e002]
[S 0] [1] /home/xxx/Workspace/tools/pin/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL20HandlePhysicalSignalEPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTE+0x200) [0x30783c90]
[S 0] [1] /home/xxx/Workspace/tools/pin/intel64/bin/pinbin(_ZN5PINVM28SIGNAL_DETAILS_LINUX_INTEL6415InternalHandlerEiPN7BARECRT8SIGXINFOEPv+0x9a) [0x308245ca]
[S 0] [1] /home/xxx/Workspace/tools/pin/intel64/bin/pinbin(BARECRT_SigReturnRt+0) [0x3084fa3c]
[S 0] [1] /home/xxx/Workspace/ZSIM/zsim-src/build/opt/timing_event.h:160 / TimingEvent::run(unsigned long)
[S 0] [1] /home/xxx/Workspace/ZSIM/zsim-src/build/opt/contention_sim.cpp:309 / ContentionSim::simulatePhaseThread(unsigned int)
[S 0] [1] /home/xxx/Workspace/ZSIM/zsim-src/build/opt/contention_sim.cpp:282 / ContentionSim::simThreadLoop(unsigned int)
[S 0] [1] /home/xxx/Workspace/tools/pin/intel64/bin/pinbin(_ZN8LEVEL_VM17VM_THREAD_DB_UNIX13THREAD_RUNNER9RunThreadEPN11OS_SERVICES7ITHREADE+0x291) [0x3078d8b1]
[S 0] [1] /home/xxx/Workspace/tools/pin/intel64/bin/pinbin(_ZN11OS_SERVICES6THREAD12RootFunctionEPv+0x26) [0x3084f036]
C:Tool (or Pin) caused signal 11 at PC 0x7ffff66056a5
[H] Child 12065 done
[H] Panic on build/opt/zsim_harness.cpp:123: Child 12065 (idx 0) exit was anomalous, killing simulation

Understanding the OOO core

I'm trying to understand and modify the OOO core, and I have some questions about how it accesses the memory hierarchy and about the recorder.

In ooo_core.cpp, when a load uop accesses the local L1 cache, "l1d->load(...)" is called first and the access is then recorded with "cRec.record(...)". Could you explain how the two relate? (Is l1d->load for calculating the delay of accessing that address in the memory hierarchy, and the recorder for scheduling an event for when the memory request is satisfied?)

I want to issue a memory request from ooo_core.cpp by calling "l1d->load(...)" and "cRec.record(...)", and additionally issue another request to fetch (or prefetch) another line into the L1 that the core is not going to consume. How could I do that?

Understanding the different kinds of cache

I have a question about what the different kinds of cache mean.
The following types are currently available in zsim:

  1. Simple

  2. Timing

There is also the filter cache for L1, but that seems to be more of a code optimization than a real cache in the memory hierarchy.

I noticed that the timing cache has significantly more detail than the simple cache, e.g., it has MSHRs and records more events. So it seems to be a better cache model than the simple cache. However, I also noticed that the "Simple" cache is used in pgo.cfg, which was used in the zsim paper.

Additionally, the timing cache gave an assertion failure at line 279 of ooo_core_recorder.cpp with one of my config files, which leads me to suspect that it does not support weave models.

So I am a bit confused about when each kind of cache should be used. Is the simple cache the correct one for most simulations, and the timing cache only for certain configurations that do not use the weave models?

Deadlock due to too many fake leave timeouts

I am running a workload in which the main thread and 1024 workers are synchronized by condition variables. At some point, all the workers call wait(). zsim does a fake leave for this SYS_futex syscall and, after a timeout period, does a real leave, as explained in #8.

The problem is that, since there are 1024 wait() calls, zsim needs to detect 1024 timeouts before it can blacklist this call when there is only 1 left. The time needed exceeds the deadlock detection threshold of 120 seconds (zsim resolves about 5 fake leaves per second, so 1024 fake leaves need about 200 seconds), which causes the simulation to terminate.

There are multiple solutions. First, I can increase the deadlock threshold or disable deadlock detection, but that still wastes time on those fake leaves. Another solution is to avoid fake leaves for SYS_futex in the first place and blindly do real leaves for every SYS_futex. This seems like a good solution to me, but I am curious whether there is any catch, and why you did not do this.
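
The timing argument above can be written out as a small calculation. This is only a sketch of the arithmetic using the numbers quoted in this post (1024 waiters, roughly 5 fake leaves resolved per second, a 120-second threshold); drain_seconds and exceeds_threshold are hypothetical helpers, not zsim functions.

```cpp
#include <cassert>
#include <cstdint>

// Seconds needed to time out `pending` fake leaves when `per_second` of them
// are resolved each second, rounded up. Both inputs are illustrative numbers
// taken from the post, not values read from zsim.
inline uint64_t drain_seconds(uint64_t pending, uint64_t per_second) {
    assert(per_second > 0);
    return (pending + per_second - 1) / per_second;
}

// True when draining takes longer than the deadlock-detection threshold.
inline bool exceeds_threshold(uint64_t pending, uint64_t per_second,
                              uint64_t threshold_seconds) {
    return drain_seconds(pending, per_second) > threshold_seconds;
}

inline void check_fake_leave_timing() {
    assert(drain_seconds(1024, 5) == 205);     // about 200 seconds, as estimated
    assert(exceeds_threshold(1024, 5, 120));   // longer than the 120 s threshold
    assert(!exceeds_threshold(600, 5, 120));   // 600 waiters would just fit
}
```

At the quoted rate, anything beyond about 600 simultaneous waiters would not drain within 120 seconds, which is consistent with the failure appearing at 1024 workers.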

Thanks!

Only one last-level cache allowed, found: [ l3 l2 l1i l1d ]

I have recently gotten this error message:

Panic on build/opt/init.cpp:470: Only one last-level cache allowed, found: [ l3 l2 l1i l1d ]

This happens even though the configuration is exactly like the one found here: https://lagunita.stanford.edu/courses/Engineering/CS144/Introduction_to_Computer_Networking/wiki/CS316/; the only difference is the path to the program being run. This config ran just fine before, but now it suddenly does not.

It might be related to #37, as we recently updated to a new kernel here as well.

What can I do?

Thanks!

Is there any example configuration on DRAMSim2?

Following init.cpp, I inserted these lines into the cfg file:
mem = {
type = "DRAMSim";
techIni = "ini/DDR3_micron_32M_8B_x4_sg125.ini";
systemIni = "system.ini";
outputDir = "zsimSPEC";
}
But DRAMSim does not seem to be working.
How do I set the result directory?

Multiple sockets and execution time questions

I have 2 questions regarding zsim:

  1. Is it possible to specify multi-socket systems where, say, each socket has 4 cores -> L1 -> L2 -> L3 -> mem? I tried specifying this, but zsim states that there needs to be a single cache with parent mem. Does this mean that zsim requires a cache layer shared between all cores and mem?

  2. Is the execution time of the ROI (wall clock) part of the stats? It seems that all stats are given in cycles; is that correct?
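
On the second question: if only cycle counts are available, simulated time can be derived from them and the configured core frequency. The sketch below assumes the frequency is expressed in MHz, as the frequency = 2270 setting in a config elsewhere on this page suggests; cycles_to_seconds and check_cycles_to_seconds are hypothetical helpers, not part of zsim.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of zsim): converts a simulated cycle count to
// simulated seconds, assuming freq_mhz is the core clock in MHz.
inline double cycles_to_seconds(uint64_t cycles, uint32_t freq_mhz) {
    assert(freq_mhz > 0);
    return static_cast<double>(cycles) / (static_cast<double>(freq_mhz) * 1e6);
}

inline void check_cycles_to_seconds() {
    // At 2270 MHz, 2.27 billion cycles correspond to one simulated second.
    assert(cycles_to_seconds(2270000000ULL, 2270) == 1.0);
    assert(cycles_to_seconds(1135000000ULL, 2270) == 0.5);
}
```

This gives time on the simulated machine, which is different from the wall-clock time the host spends running the simulation.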

Thanks in advance.

ZSim prefetcher bug

I get the following error when I run a simulation using one of the SPEC benchmarks.
What does this mean? The error vanishes if I remove the prefetchers between the l1 and l2 levels.

[S 0] Failed assertion on build/opt/ooo_core_recorder.cpp:279
[S 0] [0] Internal exception detected:
[S 0] [0] Code: 1
[S 0] [0] Address: 0x7ffff68426ff
[S 0] [0] Description: Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7ffff68426ff. Access Type: UNKNOWN. Access Address = 0x000000000

followed by this backtrace:

[S 3] [0] Caused by invalid access to address 0x0
[S 3] [0] Backtrace (10/40 max frames)
[S 3] [0] /home/kartik/Prefetch_Simulator/zsim_baseline_inclusive/build/opt/zsim.cpp:1394 / InternalExceptionHandler
[S 3] [0] :? / LEVEL_PINCLIENT::IEH_CALLBACKS::NotifyInternalException(unsigned int, LEVEL_BASE::EXCEPTION_INFO*, LEVEL_VM::CONTEXT*)
[S 3] [0] /home/kartik/Prefetch_Simulator/pinplay-1.4-pin-2.14-67254-gcc.4.4.7-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL19InternalHandlerSyncEiPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTEPPKNS_14SCT_ATTRIBUTESEPNS_5PCTXTEPj+0x444) [0x306f40a4]
[S 3] [0] /home/kartik/Prefetch_Simulator/pinplay-1.4-pin-2.14-67254-gcc.4.4.7-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL20HandlePhysicalSignalEPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTE+0x124) [0x306f4e44]
[S 3] [0] /home/kartik/Prefetch_Simulator/pinplay-1.4-pin-2.14-67254-gcc.4.4.7-linux/intel64/bin/pinbin(_ZN5PINVM28SIGNAL_DETAILS_LINUX_INTEL6415InternalHandlerEiPN7BARECRT8SIGXINFOEPv+0xe8) [0x3077f7f8]
[S 3] [0] /home/kartik/Prefetch_Simulator/pinplay-1.4-pin-2.14-67254-gcc.4.4.7-linux/intel64/bin/pinbin(BARECRT_SigReturnRt+0) [0x307abc3c]
[S 3] [0] /home/kartik/Prefetch_Simulator/zsim_baseline_inclusive/build/opt/ooo_core_recorder.cpp:235 (discriminator 1) / OOOCoreRecorder::recordAccess(unsigned long, unsigned long, unsigned long)
[S 3] [0] /home/kartik/Prefetch_Simulator/zsim_baseline_inclusive/build/opt/ooo_core_recorder.h:93 / OOOCoreRecorder::record(unsigned long, unsigned long, unsigned long)
[S 3] [0] /home/kartik/Prefetch_Simulator/zsim_baseline_inclusive/build/opt/ooo_core.cpp:506 / OOOCore::BblFunc(unsigned int, unsigned long, BblInfo*)
[S 3] [0] [0x7fffe48139d4]
C:Tool (or Pin) caused signal 11 at PC 0x7ffff68426ff

My config file is below :

// Used for the PGO compile flow
// based on zephyr3 [email protected]

process0 = {
startFastForwarded=true;
ffiPoints = "1000000000 2000000000 ";
command = "/home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/Xalan_base.amd64-m64-gcc42-nn -v /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/t5.xml /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/xalanc.xsl ";
};

process1 = {
startFastForwarded=true;
ffiPoints = "1000000000 2000000000 ";
command = "/home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/Xalan_base.amd64-m64-gcc42-nn -v /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/t5.xml /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/xalanc.xsl ";
};

process2 = {
startFastForwarded=true;
ffiPoints = "1000000000 2000000000 ";
command = "/home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/Xalan_base.amd64-m64-gcc42-nn -v /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/t5.xml /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/xalanc.xsl ";
};

process3 = {
startFastForwarded=true;
ffiPoints = "1000000000 2000000000 ";
command = "/home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/Xalan_base.amd64-m64-gcc42-nn -v /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/t5.xml /home/kartik/Prefetching_Benchmarks/benchspec/CPU2006/483.xalancbmk/run/run_base_ref_amd64-m64-gcc42-nn.0003/xalanc.xsl ";
};

sim = {
//maxTotalInstrs = 100000000L;
phaseLength = 10000;
statsPhaseInterval = 0;
simName = "_sim_pgo_test";
};

sys = {
caches = {
l1d = {
array = {
type = "SetAssoc";
ways = 8;
};
//caches = 4;
caches = 4;
latency = 4;
parent = "l2prefetcher";
size = 32768;
};

// l1d.parent = "l2prefetcher";

l1i = {
  array = {
    type = "SetAssoc";
    ways = 4;
  };
  //caches = 1;
  caches = 4;
  latency = 3;
  parent = "l2";
  size = 32768;
};

l2prefetcher = {
isPrefetcher = true;
parent = "l2";
prefetchers = 4;
};

l2 = {
  array = {
    type = "SetAssoc";
    ways = 8;
  };
  //caches = 1;
  caches = 4;
  latency = 7;
  parent = "l3";
  size = 262144;
};

l3 = {
  array = {
    hash = "H3";
    type = "SetAssoc";
    ways = 16;
  };
  banks = 6;
  caches = 1;
  latency = 27;
  parent = "mem";
  size = 12582912;
  nonInclusiveHack=true;
};

};

cores = {

westmere = {
  //cores = 1;
  cores = 4;
  dcache = "l1d";
  icache = "l1i";
  type = "OOO";
};

};

frequency = 2270;
lineSize = 64;

mem = {
controllers = 3;
type = "DDR";
controllerLatency = 40;
};
};

ZSim prefetcher correctness

The following code is in prefetch.h.

As seen in the code, on an entry miss candScore is set to -1. We then check, for each element of the array, whether array[i].ts (the timestamp) is less than candScore, assuming that the current access is not more than 500 cycles after the previous access.
However, array[i].ts is always positive because it is a timestamp, which cannot be negative. Therefore, it seems that the "if" check (on line 12 of the code below) always fails. In that case, we might never be able to pick an LRU candidate to replace in the array on an entry miss.

I would appreciate any comments on this, in case I am not seeing this correctly.

if (idx == 16) {  // entry miss
    uint32_t cand = 16;
    uint64_t candScore = -1;
    //uint64_t candScore = 0;
    for (uint32_t i = 0; i < 16; i++) {
        if (array[i].lastCycle > reqCycle + 500) continue;  // warm prefetches, not even a candidate
        /*uint64_t score = (reqCycle - array[i].lastCycle)*(3 - array[i].conf.counter());
        if (score > candScore) {
            cand = i;
            candScore = score;
        }*/
        if (array[i].ts < candScore) {  // just LRU
            cand = i;
            candScore = array[i].ts;
        }
    }
    if (cand < 16) {
        idx = cand;
        array[idx].alloc(reqCycle);
        array[idx].lastPos = pos;
        array[idx].ts = timestamp++;
        tag[idx] = pageAddr;
    }
    DBG("%s: MISS alloc idx %d", name.c_str(), idx);
}
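
One C++ detail bears on this question. Initializing a uint64_t from -1 does not store a negative number: the value is converted modulo 2^64 and becomes the largest value the type can hold. Under that rule, a comparison like the one on line 12 succeeds for any smaller timestamp. The standalone sketch below shows this minimum-search pattern in isolation; lru_candidate and check_lru_candidate are hypothetical names, and this is not the zsim source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Standalone sketch, not zsim code: returns the index of the smallest
// timestamp among n entries, or n if there are none. The search starts from
// uint64_t(-1), mirroring the initial candScore in the snippet above.
inline uint32_t lru_candidate(const uint64_t* ts, uint32_t n) {
    uint32_t cand = n;
    uint64_t candScore = -1;  // converts to 2^64 - 1, the largest uint64_t
    assert(candScore == std::numeric_limits<uint64_t>::max());
    for (uint32_t i = 0; i < n; i++) {
        if (ts[i] < candScore) {  // true for any timestamp below 2^64 - 1
            cand = i;
            candScore = ts[i];
        }
    }
    return cand;
}

inline void check_lru_candidate() {
    const uint64_t ts[4] = {7, 3, 9, 5};
    assert(lru_candidate(ts, 4) == 1);  // entry 1 holds the oldest timestamp
    assert(lru_candidate(ts, 0) == 0);  // no entries: returns n
}
```

So starting from -1 in an unsigned variable is a common way to write "start from the maximum" when searching for a minimum.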

Modeling the register file ports in the OoO core

Hi,

If I understand correctly, in the OoO core there are 3 read ports in the physical register file (RF_READS_PER_CYCLE = 3), and the effects of read stalls are also simulated (if (curCycleRFReads > RF_READS_PER_CYCLE) { ... ).
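
To make the read-port limit concrete, here is a simplified, hypothetical model of how a per-cycle read budget turns into stall cycles. It is not zsim's implementation, which is only quoted in part above; rf_read_stall_cycles and its parameters are names made up for illustration, and only the value 3 for RF_READS_PER_CYCLE comes from this post.

```cpp
#include <cassert>
#include <cstdint>

// Value quoted in the post; everything else in this block is illustrative.
constexpr uint32_t RF_READS_PER_CYCLE = 3;

// Hypothetical model: if `reads` register reads all want to happen in the
// same cycle and only `ports` can be served per cycle, returns the number of
// extra cycles needed, i.e. ceil(reads / ports) - 1 for reads > ports.
inline uint32_t rf_read_stall_cycles(uint32_t reads,
                                     uint32_t ports = RF_READS_PER_CYCLE) {
    assert(ports > 0);
    if (reads <= ports) return 0;
    return (reads - 1) / ports;
}

inline void check_rf_read_stalls() {
    assert(rf_read_stall_cycles(3) == 0);  // fits in one cycle
    assert(rf_read_stall_cycles(4) == 1);  // one read spills into the next cycle
    assert(rf_read_stall_cycles(7) == 2);  // 3 + 3 + 1 reads over three cycles
}
```

A write-port limit could be modeled the same way with a separate budget, but whether and how zsim does that is exactly what this issue asks.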

But how many write ports are there, and how are they simulated?

Thank you
