harvard-acc/smaug
SMAUG: Simulating Machine Learning Applications Using Gem5-Aladdin
Home Page: https://harvard-acc.github.io/smaug_docs
License: BSD 3-Clause "New" or "Revised" License
https://travis-ci.org/github/harvard-acc/smaug/builds/702716460 failed because one floating-point element was off by ~0.002 (more than the required 3 decimal places of accuracy):
======================================================================
FAIL: test_bahdanau_attention (__main__.AttentionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./build/smaug/python/ops/attention_test.py", line 67, in test_bahdanau_attention
self.runAndValidate(graph, tf_attention)
File "/home/travis/build/harvard-acc/smaug/smaug/python/smaug_test.py", line 85, in runAndValidate
assert_array_almost_equal(expected_output, sg_output, decimal=3)
File "/home/travis/.local/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 1044, in assert_array_almost_equal
precision=decimal)
File "/home/travis/.local/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 840, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 3 decimals
Mismatched elements: 1 / 64 (1.56%)
Max absolute difference: 0.001953
Max relative difference: 0.02615
x: array([[-0.566, -0.007, -0.665, -0.778, -0.135, -0.739, -0.725, -0.221,
-0.307, -0.094, -0.493, 0.492, 0.282, -0.381, -0.438, -0.242,
0.327, -0.43 , 0.454, -0.639, 0.295, -0.207, 1.404, -0.821,...
y: array([[-0.566, -0.008, -0.665, -0.779, -0.135, -0.74 , -0.725, -0.222,
-0.307, -0.094, -0.493, 0.491, 0.283, -0.38 , -0.438, -0.241,
0.327, -0.431, 0.453, -0.639, 0.296, -0.207, 1.402, -0.82 ,...
The difference is between -0.221 and -0.222. This test did pass before, though, so it's worth tracking down where the flakiness comes from. If it's code we own, it should be possible to always reproduce the exact same numbers; if the difference comes from TF code, it will be harder to debug, in which case we should just loosen the required accuracy.
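If we do end up loosening the required accuracy, a minimal sketch of what that would tolerate (the values below are illustrative, roughly taken from the log above; numpy's check allows absolute differences up to 1.5 * 10**-decimal):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

expected = np.array([-0.221, 1.404])
actual = np.array([-0.223, 1.402])  # elements off by ~0.002, like the flaky run

# decimal=3 only tolerates differences below 1.5e-3, so this raises.
try:
    assert_array_almost_equal(expected, actual, decimal=3)
    raise RuntimeError("expected a mismatch")
except AssertionError:
    pass  # reproduces the CI failure mode

# Loosening to decimal=2 (tolerance 1.5e-2) lets the same outputs pass.
assert_array_almost_equal(expected, actual, decimal=2)
```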
Currently we put everything in types.proto, but the generated "types_pb2" is a very generic name and isn't the most logical home for everything it contains. It's worth moving the types into separate protos. For example, OpType could go in ops.proto, DataLayout could go in tensor.proto, etc.
The current reference backend doesn't do any tiling on the input, so it's only good for verifying functional correctness; it's not well suited for simulation (although simulation is possible). It would be helpful for users to have a reference implementation with some very simple tiling, so that we can have a fixed simulation configuration.
We currently install doxygen and Sphinx/Breathe at CI time. We should bake this into the Docker image instead. Then we can simplify the CI config, and we can test the build in the image too.
Yuan - assigning to you. I've lost SSH access to the lab machines again so it will be easier for you to do.
Hi,
I'm trying to replace the private spad in the accelerators with a private cache, following Sam's suggestion in this thread. I also learned from that thread that the AllCache policy is currently removed, so I tried using the AllAcp policy.
I modified smv-accel.cfg, replacing every line starting with "partition,cyclic" with "cache,xxx,4" (where xxx is the name of the array, and I guess the 4 means 4 bytes, since Aladdin uses fp32). However, the last line of stderr gives me something like:
Unknown array "inputs". Please ensure that you have declared this array in your Aladdin configuration file with the correct partition type, size, and partition factor, and that you have not renamed pointers in your code (except through function calls).
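For concreteness, this is the shape of the edit being described; the array name "inputs" comes from the error message, and the partition-line field values are placeholders, not taken from the actual smv-accel.cfg:

```
# before: array backed by a cyclically partitioned scratchpad
partition,cyclic,inputs,<size_in_bytes>,4,<partition_factor>
# after: array backed by the private cache, 4-byte word size
cache,inputs,4
```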
I also tried not using a cache (i.e., leaving smv-accel.cfg unmodified, so the accelerators keep their private spads and transfer data with the rest of the memory system over an ACP port). That case does not throw an error and outputs all the stats. So I suspect I've done something wrong with the cache + ACP setting. I'm attaching the zipped directories of both cases for diagnostics.
I would appreciate any suggestions!
Workspace is the global container of all operators and tensors in the network, and we generally just assume that there is only one. Should we enforce this property by making Workspace a singleton? I am not sure if it will impact the unit tests (since there may be a good reason to have multiple Workspaces in tests).
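For illustration, the singleton idea (plus an escape hatch for unit tests) could look like the following; this is a Python sketch of the pattern only, since the real Workspace is C++ and these names are hypothetical:

```python
class Workspace:
    """Sketch of singleton enforcement for the global tensor/operator container."""
    _instance = None

    def __new__(cls):
        # Always hand back the same instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    @classmethod
    def _reset_for_tests(cls):
        # Escape hatch so unit tests can start from a fresh workspace.
        cls._instance = None

assert Workspace() is Workspace()
```

The `_reset_for_tests` hook is one way to keep the singleton property in production code without breaking tests that need multiple fresh workspaces.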
Hi,
I would like to obtain the power trace from gem5-aladdin. I checked the Aladdin source code and was able to get xxx_power_stats.txt in standalone Aladdin via the --DDEBUG flag before building Aladdin. But for the gem5-aladdin build, I ran the command python2.7 `which scons` build/X86/gem5.opt PROTOCOL=MESI_Two_Level_aladdin --debug-aladdin -j8 and got this error:
scons: Reading SConscript files ...
Warning: Warning: Your compiler doesn't support incremental linking and lto at the same time, so lto is being disabled. To force lto on anyway, use the --force-lto option. That will disable partial linking.
Info: Using Python config: /usr/bin/python2.7-config
Checking for C header file Python.h... yes
Checking for C library python2.7... yes
Checking for C library pthread... yes
Checking for C library dl... yes
Checking for C library util... yes
Checking for C library m... yes
Checking for accept(0,0,0) in C++ library None... yes
Checking for zlibVersion() in C++ library z... yes
Checking for GOOGLE_PROTOBUF_VERIFY_VERSION in C++ library protobuf... yes
Checking for C header file valgrind/valgrind.h... no
Checking for clock_nanosleep(0,0,NULL,NULL) in C library None... yes
Checking for timer_create(CLOCK_MONOTONIC, NULL, NULL) in C library None... no
Checking for timer_create(CLOCK_MONOTONIC, NULL, NULL) in C library rt... yes
Checking for C library tcmalloc... yes
Checking for C library readline... yes
Checking for char temp; backtrace_symbols_fd((void*)&temp, 0, 0) in C library None... yes
Checking for C library sqlite3... yes
Checking for C header file fenv.h... no
Warning: Header file <fenv.h> not found.
This host has no IEEE FP rounding mode control.
Checking for C header file png.h... no
Warning: Header file <png.h> not found.
This host has no libpng library.
Disabling support for PNG framebuffers.
Checking for C header file linux/kvm.h... no
Info: Compatible header file <linux/kvm.h> not found, disabling KVM support.
Checking for C header file linux/if_tun.h... no
Info: Compatible header file <linux/if_tun.h> not found.
Checking size of struct kvm_xsave ... no
Checking for member exclude_host in struct perf_event_attr...no
Checking for hdf5-serial using pkg-config... pkg-config not found
Checking for hdf5 using pkg-config... pkg-config not found
Checking for H5Fcreate("", 0, 0, 0) in C library hdf5... no
Warning: Couldn't find any HDF5 C++ libraries. Disabling
HDF5 support.
Checking whether i386 is declared... no
Checking whether x86_64 is declared... no
Warning: Unrecognized architecture for systemc.
Building in /workspace/gem5-aladdin/build/X86
Using saved variables file /workspace/gem5-aladdin/build/variables/X86
Warning: No IEEE FP rounding mode control in X86.
FP results may deviate slightly from other platforms.
scons: done reading SConscript files.
scons: Building targets ...
[ CXX] X86/sim/main.cc -> .o
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
[ CXX] X86/dev/io_device.cc -> .o
[ CXX] X86/dev/isa_fake.cc -> .o
[ CXX] X86/dev/dma_device.cc -> .o
[ CXX] X86/dev/intpin.cc -> .o
[ CXX] X86/dev/platform.cc -> .o
[ CXX] X86/dev/baddev.cc -> .o
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
[ CXX] X86/dev/intel_8254_timer.cc -> .o
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/io_device.cc:44:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/io_device.o] Error 1
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/isa_fake.hh:40,
from build/X86/dev/isa_fake.cc:35:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/dma_device.hh:53,
from build/X86/dev/dma_device.cc:46:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/isa_fake.o] Error 1
scons: *** [build/X86/dev/dma_device.o] Error 1
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/IntrControl.hh:6:0,
from build/X86/params/Platform.hh:6,
from build/X86/dev/platform.hh:43,
from build/X86/dev/platform.cc:32:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/platform.o] Error 1
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/baddev.hh:39,
from build/X86/dev/baddev.cc:35:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/baddev.o] Error 1
cc1plus: all warnings being treated as errors
scons: *** [build/X86/sim/main.o] Error 1
cc1plus: all warnings being treated as errors
scons: *** [build/X86/dev/intpin.o] Error 1
cc1plus: all warnings being treated as errors
scons: *** [build/X86/dev/intel_8254_timer.o] Error 1
scons: building terminated because of errors.
Here are my questions:
Hey,
Thanks for the amazing work! I would like to implement hardware-friendly DNNs like MobileNets, etc. Is there any upcoming support for a depthwise convolution layer? If not, how can I add it?
Workspace::addTensor can blindly overwrite an existing Tensor with the same name, causing memory leaks. We should enforce that no two Tensors in the Network can have the same name. The same goes for addTiledTensor.
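A Python sketch of the proposed check (the real Workspace::addTensor is C++; the names below are hypothetical and for illustration only):

```python
class Workspace:
    """Reject duplicate tensor names instead of silently overwriting."""

    def __init__(self):
        self._tensors = {}

    def add_tensor(self, name, tensor):
        if name in self._tensors:
            raise ValueError(f"A Tensor named '{name}' already exists")
        self._tensors[name] = tensor

ws = Workspace()
ws.add_tensor("conv0/weights", object())
try:
    ws.add_tensor("conv0/weights", object())  # duplicate: rejected, not leaked
except ValueError:
    pass
```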
The JUnit reporters in Catch2 require the entire test to be run and all data to be buffered in memory before anything is written. For the large tiling tests, this can exceed several GB of RAM usage, causing tests to OOM in continuous integration.
Catch2 has an alternative XML format that, with a bit of work, should be convertible into JUnit XML. Filing this issue to keep track of it. For now we're going to just disable JUnit reports so CI can pass.
In this forum post, a user ran into an issue with data types because it was not clear from the documentation (either on the site or in the code) that the SMV backend only supports float16 as the data type for certain operators. This needs to be made clearer.
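In docs terms, the key point to state is that float32 data must be converted explicitly before being handed to those SMV operators. A hedged sketch of the conversion (pure numpy; not SMAUG's actual API):

```python
import numpy as np

# SMV expects float16 data for certain operators (per this issue), so a
# float32 array must be converted before being fed into the graph.
weights = np.random.rand(8, 8).astype(np.float32)
weights_f16 = weights.astype(np.float16)
assert weights_f16.dtype == np.float16
```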
Hello, I think that in the current framework, the cycle time of the accelerator is stored as an integer, so the maximum accelerator clock frequency is 1 GHz. Is it possible to simulate the accelerator at a higher frequency?
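The arithmetic behind the 1 GHz ceiling, assuming the cycle time is stored as an integer number of nanoseconds (an assumption about the current implementation):

```python
# With integer nanoseconds, the smallest representable cycle time is 1 ns,
# i.e. a 1 GHz ceiling. A 2 GHz clock would need 0.5 ns, which an integer
# field cannot hold.
for cycle_time_ns in (1, 2, 5):
    freq_ghz = 1.0 / cycle_time_ns
    print(f"{cycle_time_ns} ns cycle -> {freq_ghz} GHz")
```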
Sometimes tensors need to be manually padded in some dimensions. Add an operator like numpy.pad.
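The desired semantics match numpy.pad; a small example of what the operator should do for an NHWC tensor (pure numpy, illustrating behavior only):

```python
import numpy as np

x = np.ones((1, 2, 2, 1), dtype=np.float32)  # NHWC tensor
# Pad H and W by one zero on each side; leave N and C untouched.
padded = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)), constant_values=0)
assert padded.shape == (1, 4, 4, 1)
```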
Hi, I have posted on the gem5-aladdin group, but to reach you faster I am also posting here.
If the feature map size is, say, 1HWC and the kernel size is NHWC, with HWC being the same numbers, the systolic array will loop.
I am reporting this here since I use SMAUG, but it probably belongs under gem5-aladdin.
In the datapath, there is DMA setup time modeling, which delays the DMA transaction to account for CPU cache flush or invalidation latency. However, to my knowledge, the directory controller of the MESI_Two_Level protocol sends an invalidate signal to the line owner when it receives a DMA read or write request for a line in the modified state (transition(M, DMA_READ, M_DRD) or transition(M, DMA_WRITE, M_DWR) in dir.sm). I think the CPU cache flush/invalidation latency is counted twice if ignore_cache_flush is set to false. What do you think?
Having a Docker image is almost the only way we can make SMAUG usable.
Reported by user [email protected]:
While examining the SMAUG simulator, I noticed that three memory interfaces are possible: DMA, ACP, and cache.
It seems that ACP supports I/O coherency, and cache supports full coherency.
I ran the Minerva sample model with the ACP interface successfully, but there is a problem when the memory interface is set to cache.
Following is the procedure I took:
1. Generate the Minerva pbtxt and pb files with the 'AllCache' memory policy by modifying the Python model script like below:
with sg.Graph(name="minerva_smv_cache", backend="SMV", mem_policy=sg.AllCache) as graph:
2. Modify model_files so that 'topo_file' and 'params_file' point to the generated pbtxt and pb files.
3. Generate the dynamic_trace_acc0.gz file with the trace.sh script.
4. Modify 'memory_type' in gem5.cfg to cache.
When the simulation started, page mapping occurred as below:
40086105600: system.acc0_datapath: Setting host_a to memory type cache.
40086199200: system.acc0_datapath: Setting host_b to memory type cache.
40086235200: system.acc0_datapath: Setting host_results to memory type cache.
40090416000: system.acc0_datapath: Inserting array label mapping host_results -> vpn 0x3739a0, size 512.
40090416000: system.acc0_datapath: Mapping vaddr 0x3739a0 -> paddr 0x1a839a0.
40090416000: system.acc0_datapath: Inserting TLB entry vpn 0x373000 -> ppn 0x1a83000.
40092144000: system.acc0_datapath: Inserting array label mapping host_a -> vpn 0x3aafa0, size 1568.
40092144000: system.acc0_datapath: Mapping vaddr 0x3aafa0 -> paddr 0x1abafa0.
40092144000: system.acc0_datapath: Inserting TLB entry vpn 0x3aa000 -> ppn 0x1aba000.
40092144000: system.acc0_datapath: Mapping vaddr 0x3abfa0 -> paddr 0x1abbfa0.
40092144000: system.acc0_datapath: Inserting TLB entry vpn 0x3ab000 -> ppn 0x1abb000.
40093368000: system.acc0_datapath: Inserting array label mapping host_b -> vpn 0x3e4e80, size 25088.
40093368000: system.acc0_datapath: Mapping vaddr 0x3e4e80 -> paddr 0x1bd7e80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3e4000 -> ppn 0x1bd7000.
40093368000: system.acc0_datapath: Mapping vaddr 0x3e5e80 -> paddr 0x1bd8e80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3e5000 -> ppn 0x1bd8000.
...
40093368000: system.acc0_datapath: Mapping vaddr 0x3ebe80 -> paddr 0x1bdee80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3eb000 -> ppn 0x1bde000.
However, at the start of execution, a memory access to a strange address occurs:
40094148000: system.acc0_datapath: issueTLBRequestTiming for trace addr: 0xd3a8c0
Thus, simulation fails with the error message below:
fatal: An error occurred during cache access to trace virtual address 0xd3a8c0 at node 70: Could not find a virtual address mapping for array "". Please ensure that you have called mapArrayToAccelerator() with the correct array name parameter.
Did I configure something wrong or misunderstand how SMAUG operates?
I would really appreciate any advice.
I am able to reproduce this issue. It looks like it is mostly due to not correctly looking up the array name for a host memory access when the access goes directly through virtual memory (i.e., caching). I suspect that since this memory policy has not been used very heavily in the past, the code regressed relative to DMA and ACP, which have seen heavier use. Still investigating.
Can we run multiple instances of the 4-layer Minerva model at a time? If yes, how?
All of the user-input errors on the Python side are currently handled by assertions. We should use the idiomatic Python approach of raising exceptions instead, and leave assertions to enforce invariants of the code whose violation would indicate a bug.
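A sketch of the proposed style, using a hypothetical op wrapper (not SMAUG's actual API); note that bare asserts also vanish under `python -O`, so user errors should raise:

```python
def matmul_shapes(shape_a, shape_b):
    """Validate user input by raising, not asserting."""
    if shape_a[1] != shape_b[0]:
        raise ValueError(
            f"matmul expects inner dimensions to match, got {shape_a} and {shape_b}")
    return (shape_a[0], shape_b[1])

assert matmul_shapes((2, 3), (3, 4)) == (2, 4)
try:
    matmul_shapes((2, 3), (4, 4))  # user error: reported as ValueError
except ValueError:
    pass
```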
I am trying to run the example provided in the SMAUG tutorial (Run a model in gem5-Aladdin simulation).
When I run the command "build/bin/smaug-instrumented my_model_topo.pbtxt my_model_params.pb", I get the following output. I tried running the above command with other models like lenet5 and cifar, but got the same response. Can you please help me with this?
Loading the network model...
Summary of the network.
======================================================
Layer (type)                              Parameters
data_2 (Data)                                      0
data_1 (Data)                                      0
data (Data)                                        0
conv (Convolution3d)                             288
max_pool (MaxPooling)                              0
reorder (Reorder)                                  0
mat_mul (InnerProduct)                         62720
smaug-instrumented: build/smaug/operators/smv/smv_convolution_tiling.cpp:241: static smaug::smv::TilingConfig smaug::smv::conv::TilingOptimizer::computeBasicTileShapes(smaug::SmvConvolutionOp *): Assertion `maxIt != fullConfigs.end() && "Failed to get best tiling config!"' failed.
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted
We need a full tutorial of how to implement a new network and simulate it, going all the way from Python to C++ to gem5-Aladdin.
The current syntax to build a simple model looks like:
from smaug.python.ops import array_ops, math_ops, nn_ops
from smaug.python.tensor import Tensor
from smaug.core import types_pb2
x = Tensor()
y = math_ops.add(x, x)
z = nn_ops.matmul(..., activation=types_pb2.RELU)
This isn't exactly the prettiest API. If this is the simple model, the complex ones are going to be kind of scary. Can we transform this to look something more like:
from smaug.ops import math, array, nn, cond # these are module names
from smaug.core import Tensor
x = Tensor()
y = math.add(x, x)
z = nn.matmul(..., activation = nn.ReLU)
w = nn.lstm(...)
a = cond.if(math.less(x, y), lambda x: x, lambda y: y)
Essentially, a bit less hierarchy and less boilerplate. There's a lot of _ops in the module names floating around, which isn't super nice to look at, and having to use the proto types directly from types_pb2 is also not super friendly.
Some of this can be accomplished just by renaming a file from nn_ops.py to nn.py. Others may require a bit more work (e.g. hiding proto enums behind Python enums until graph serialization time).
We should clarify which namespaces we want to have and what goes into each of them.
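The proto-enum-hiding idea could be sketched like this; the enum members and the mapping table below are made up for illustration, with the translation to the real types_pb2 value deferred to serialization time:

```python
import enum

class Activation(enum.Enum):
    """User-facing enum; users never touch types_pb2 directly."""
    RELU = "RELU"
    TANH = "TANH"

def to_proto_value(act, proto_enum_values):
    # At serialization time, proto_enum_values would be something like
    # {"RELU": types_pb2.RELU, "TANH": types_pb2.TANH} (hypothetical).
    return proto_enum_values[act.value]

assert to_proto_value(Activation.RELU, {"RELU": 1, "TANH": 2}) == 1
```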
The current README hasn't been updated in nearly two years and none of it is applicable anymore after the rewrite. Start with a clean slate, mark it as alpha software, and we'll go from there.
Hello, I'm trying to port the RL MADDPG algorithm to SMAUG. Is there any documentation I can follow for integrating a new algorithm?
Thanks
In the course of updating docs, I've come across a number of public APIs that should either be removed or hidden. Listing them here to keep track and fix if necessary.
- allocateStorage(dataType) dispatches to an underlying templated allocateStorage<T> method. Is there any reason why the templated method needs to be public too?
- TiledTensor has a ctor with no shape. Why?
- Should _useRawTensor be part of the public API for TiledTensor? It's an optimization that is rarely useful and difficult for the user to understand.
- Why do we need copyTensorData, which copies a linear region of memory, if we already have copyTensorRegion?
- generateTiledTensor requires an Operator param, but we only use the name field. We should just pass the name directly.
- generateTiledTensorAndCopyData - I believe we only need the variadic templated one.
- REGISTER_SPECIAL_OP is a macro used for the ref backend operators, but not for SMV. We may want to get rid of this, since a user who comes across it may assume it is all that is required to add their operator to SMAUG.
- MAYBE_UNUSED appears... unused.
- ConvolutionOp implements an empty run method. Should this be pure virtual?
During simulation with ResNet, a segmentation fault occurs in gem5.
I created the ResNet pb and pbtxt files by running smaug/experiments/models/imagenet-resnet/resnet_network.py.
All configuration files are the same as in the minerva example; only model_files was modified so that it points to the generated pb and pbtxt files.
The input trace was generated by running trace.sh.
Below is the stdout log at the end.
Scheduling data (Data).
Scheduling data_1 (Data).
Scheduling data_10 (Data).
Scheduling data_100 (Data).
Scheduling data_101 (Data).
Scheduling data_102 (Data).
Scheduling data_103 (Data).
Scheduling data_104 (Data).
Scheduling data_105 (Data).
Scheduling data_106 (Data).
Scheduling data_107 (Data).
Scheduling data_108 (Data).
Scheduling data_109 (Data).
stderr log before the backtrace shows the following message.
gem5 has encountered a segmentation fault!
Please let me know if I configured something wrong.
Thanks.
There are still quite a few files left over from SMAUGv1 that we have not yet cleaned up. For example, we still have all the SMIV operators (and arch files!), compression/decompression code in utility, and other stuff too. We should either clean them up or decide what else to do with them.