harvard-acc/smaug
SMAUG: Simulating Machine Learning Applications Using Gem5-Aladdin
Home Page: https://harvard-acc.github.io/smaug_docs
License: BSD 3-Clause "New" or "Revised" License
https://travis-ci.org/github/harvard-acc/smaug/builds/702716460 failed because one floating-point element was off by ~0.002 (more than the required 3 decimal places of accuracy):
======================================================================
FAIL: test_bahdanau_attention (__main__.AttentionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./build/smaug/python/ops/attention_test.py", line 67, in test_bahdanau_attention
self.runAndValidate(graph, tf_attention)
File "/home/travis/build/harvard-acc/smaug/smaug/python/smaug_test.py", line 85, in runAndValidate
assert_array_almost_equal(expected_output, sg_output, decimal=3)
File "/home/travis/.local/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 1044, in assert_array_almost_equal
precision=decimal)
File "/home/travis/.local/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 840, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 3 decimals
Mismatched elements: 1 / 64 (1.56%)
Max absolute difference: 0.001953
Max relative difference: 0.02615
x: array([[-0.566, -0.007, -0.665, -0.778, -0.135, -0.739, -0.725, -0.221,
-0.307, -0.094, -0.493, 0.492, 0.282, -0.381, -0.438, -0.242,
0.327, -0.43 , 0.454, -0.639, 0.295, -0.207, 1.404, -0.821,...
y: array([[-0.566, -0.008, -0.665, -0.779, -0.135, -0.74 , -0.725, -0.222,
-0.307, -0.094, -0.493, 0.491, 0.283, -0.38 , -0.438, -0.241,
0.327, -0.431, 0.453, -0.639, 0.296, -0.207, 1.402, -0.82 ,...
The difference is between -0.221 and -0.222. This test did pass before, though, so it's worth tracking down where the flakiness comes from. If it's code we own, it should be possible to always reproduce the exact same numbers; if the difference comes from TF code, it will be harder to debug, in which case we should just loosen the required accuracy.
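If we do end up loosening the required accuracy, a minimal sketch of what that would tolerate (the values below are illustrative, roughly taken from the log above; numpy's check allows absolute differences up to 1.5 * 10**-decimal):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

expected = np.array([-0.221, 1.404])
actual = np.array([-0.223, 1.402])  # elements off by ~0.002, like the flaky run

# decimal=3 only tolerates differences below 1.5e-3, so this raises.
try:
    assert_array_almost_equal(expected, actual, decimal=3)
    raise RuntimeError("expected a mismatch")
except AssertionError:
    pass  # reproduces the CI failure mode

# Loosening to decimal=2 (tolerance 1.5e-2) lets the same outputs pass.
assert_array_almost_equal(expected, actual, decimal=2)
```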
Currently we put everything in types.proto, but the generated "types_pb2" is a very generic name and isn't the most logical home for everything it contains. It's worth moving the types into separate protos. For example, OpType could go in ops.proto, DataLayout could go in tensor.proto, etc.
The current reference backend doesn't do any tiling on the input, so it's only good for verifying functional correctness; it's not well suited for simulation (although simulation is possible). It would be helpful for users to have a reference implementation with some very simple tiling, so that we can have a fixed simulation configuration.
We currently install doxygen and Sphinx/Breathe at CI time. We should bake this into the Docker image instead. Then we can simplify the CI config, and we can test the build in the image too.
Yuan - assigning to you. I've lost SSH access to the lab machines again so it will be easier for you to do.
Hi,
I'm trying to replace the private spad in the accelerators with a private cache, following Sam's suggestion in this thread. I also learned from that thread that the AllCache policy is currently removed, so I tried using the AllAcp policy.
I modified smv-accel.cfg, replacing every line starting with "partition,cyclic" with "cache,xxx,4" (where xxx is the name of the array, and I guess the 4 means 4 bytes, since Aladdin uses fp32). However, the last line of stderr gives me something like:
Unknown array "inputs". Please ensure that you have declared this array in your Aladdin configuration file with the correct partition type, size, and partition factor, and that you have not renamed pointers in your code (except through function calls).
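For concreteness, this is the shape of the edit being described; the array name "inputs" comes from the error message, and the partition-line field values are placeholders, not taken from the actual smv-accel.cfg:

```
# before: array backed by a cyclically partitioned scratchpad
partition,cyclic,inputs,<size_in_bytes>,4,<partition_factor>
# after: array backed by the private cache, 4-byte word size
cache,inputs,4
```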
I also tried not using a cache (i.e., leaving smv-accel.cfg unmodified, so the accelerators keep their private spads and transfer data with the rest of the memory system over an ACP port). That case does not throw an error and outputs all the stats. So I suspect I've done something wrong with the cache + ACP setting. I'm attaching the zipped directories of both cases for diagnostics.
I would appreciate any suggestions!
Workspace is the global container of all operators and tensors in the network, and we generally just assume that there is only one. Should we enforce this property by making Workspace a singleton? I am not sure if it will impact the unit tests (since there may be a good reason to have multiple Workspaces in tests).
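For illustration, the singleton idea (plus an escape hatch for unit tests) could look like the following; this is a Python sketch of the pattern only, since the real Workspace is C++ and these names are hypothetical:

```python
class Workspace:
    """Sketch of singleton enforcement for the global tensor/operator container."""
    _instance = None

    def __new__(cls):
        # Always hand back the same instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    @classmethod
    def _reset_for_tests(cls):
        # Escape hatch so unit tests can start from a fresh workspace.
        cls._instance = None

assert Workspace() is Workspace()
```

The `_reset_for_tests` hook is one way to keep the singleton property in production code without breaking tests that need multiple fresh workspaces.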
Hi,
I would like to obtain the power trace from gem5-aladdin. I checked the Aladdin source code and was able to get xxx_power_stats.txt in standalone Aladdin via the --DDEBUG flag before building Aladdin. But for the gem5-aladdin build, I ran the command python2.7 `which scons` build/X86/gem5.opt PROTOCOL=MESI_Two_Level_aladdin --debug-aladdin -j8 and got this error:
scons: Reading SConscript files ...
Warning: Warning: Your compiler doesn't support incremental linking and lto at the same time, so lto is being disabled. To force lto on anyway, use the --force-lto option. That will disable partial linking.
Info: Using Python config: /usr/bin/python2.7-config
Checking for C header file Python.h... yes
Checking for C library python2.7... yes
Checking for C library pthread... yes
Checking for C library dl... yes
Checking for C library util... yes
Checking for C library m... yes
Checking for accept(0,0,0) in C++ library None... yes
Checking for zlibVersion() in C++ library z... yes
Checking for GOOGLE_PROTOBUF_VERIFY_VERSION in C++ library protobuf... yes
Checking for C header file valgrind/valgrind.h... no
Checking for clock_nanosleep(0,0,NULL,NULL) in C library None... yes
Checking for timer_create(CLOCK_MONOTONIC, NULL, NULL) in C library None... no
Checking for timer_create(CLOCK_MONOTONIC, NULL, NULL) in C library rt... yes
Checking for C library tcmalloc... yes
Checking for C library readline... yes
Checking for char temp; backtrace_symbols_fd((void*)&temp, 0, 0) in C library None... yes
Checking for C library sqlite3... yes
Checking for C header file fenv.h... no
Warning: Header file <fenv.h> not found.
This host has no IEEE FP rounding mode control.
Checking for C header file png.h... no
Warning: Header file <png.h> not found.
This host has no libpng library.
Disabling support for PNG framebuffers.
Checking for C header file linux/kvm.h... no
Info: Compatible header file <linux/kvm.h> not found, disabling KVM support.
Checking for C header file linux/if_tun.h... no
Info: Compatible header file <linux/if_tun.h> not found.
Checking size of struct kvm_xsave ... no
Checking for member exclude_host in struct perf_event_attr...no
Checking for hdf5-serial using pkg-config... pkg-config not found
Checking for hdf5 using pkg-config... pkg-config not found
Checking for H5Fcreate("", 0, 0, 0) in C library hdf5... no
Warning: Couldn't find any HDF5 C++ libraries. Disabling
HDF5 support.
Checking whether i386 is declared... no
Checking whether x86_64 is declared... no
Warning: Unrecognized architecture for systemc.
Building in /workspace/gem5-aladdin/build/X86
Using saved variables file /workspace/gem5-aladdin/build/variables/X86
Warning: No IEEE FP rounding mode control in X86.
FP results may deviate slightly from other platforms.
scons: done reading SConscript files.
scons: Building targets ...
[ CXX] X86/sim/main.cc -> .o
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
[ CXX] X86/dev/io_device.cc -> .o
[ CXX] X86/dev/isa_fake.cc -> .o
[ CXX] X86/dev/dma_device.cc -> .o
[ CXX] X86/dev/intpin.cc -> .o
[ CXX] X86/dev/platform.cc -> .o
[ CXX] X86/dev/baddev.cc -> .o
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
[ CXX] X86/dev/intel_8254_timer.cc -> .o
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/io_device.cc:44:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/io_device.o] Error 1
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/isa_fake.hh:40,
from build/X86/dev/isa_fake.cc:35:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/dma_device.hh:53,
from build/X86/dev/dma_device.cc:46:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/isa_fake.o] Error 1
scons: *** [build/X86/dev/dma_device.o] Error 1
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/IntrControl.hh:6:0,
from build/X86/params/Platform.hh:6,
from build/X86/dev/platform.hh:43,
from build/X86/dev/platform.cc:32:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/platform.o] Error 1
:0:6: error: ISO C++11 requires whitespace after the macro name [-Werror]
In file included from build/X86/params/PioDevice.hh:6:0,
from build/X86/params/BasicPioDevice.hh:10,
from build/X86/dev/io_device.hh:48,
from build/X86/dev/baddev.hh:39,
from build/X86/dev/baddev.cc:35:
build/X86/params/System.hh:20:10: fatal error: params/KvmVM.hh: No such file or directory
#include "params/KvmVM.hh"
^~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
compilation terminated.
scons: *** [build/X86/dev/baddev.o] Error 1
cc1plus: all warnings being treated as errors
scons: *** [build/X86/sim/main.o] Error 1
cc1plus: all warnings being treated as errors
scons: *** [build/X86/dev/intpin.o] Error 1
cc1plus: all warnings being treated as errors
scons: *** [build/X86/dev/intel_8254_timer.o] Error 1
scons: building terminated because of errors.
Here are my questions:
Hey,
Thanks for the amazing work! I would like to implement hardware-friendly DNNs like MobileNets, etc. Is there any upcoming support for a depthwise convolution layer? If not, how can I add it?
Workspace::addTensor can blindly overwrite an existing Tensor with the same name, causing memory leaks. We should enforce that no two Tensors in the Network can have the same name. The same goes for addTiledTensor.
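A Python sketch of the proposed check (the real Workspace::addTensor is C++; the names below are hypothetical and for illustration only):

```python
class Workspace:
    """Reject duplicate tensor names instead of silently overwriting."""

    def __init__(self):
        self._tensors = {}

    def add_tensor(self, name, tensor):
        if name in self._tensors:
            raise ValueError(f"A Tensor named '{name}' already exists")
        self._tensors[name] = tensor

ws = Workspace()
ws.add_tensor("conv0/weights", object())
try:
    ws.add_tensor("conv0/weights", object())  # duplicate: rejected, not leaked
except ValueError:
    pass
```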
The JUnit reporters in Catch2 require the entire test to be run and all data to be buffered in memory before anything is written. For the large tiling tests, this can exceed several GB of RAM usage, causing tests to OOM in continuous integration.
Catch2 has an alternative XML format that, with a bit of work, should be convertible into JUnit XML. Filing this issue to keep track of it. For now we're going to just disable JUnit reports so CI can pass.
In this forum post, a user ran into an issue with data types because it was not clear from the documentation (either on the site or in the code) that the SMV backend only supports float16 as the data type for certain operators. This needs to be made clearer.
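In docs terms, the key point to state is that float32 data must be converted explicitly before being handed to those SMV operators. A hedged sketch of the conversion (pure numpy; not SMAUG's actual API):

```python
import numpy as np

# SMV expects float16 data for certain operators (per this issue), so a
# float32 array must be converted before being fed into the graph.
weights = np.random.rand(8, 8).astype(np.float32)
weights_f16 = weights.astype(np.float16)
assert weights_f16.dtype == np.float16
```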
Hello, I think that in the current framework, the cycle time of the accelerator is stored as an integer, so the maximum accelerator clock frequency is 1 GHz. Is it possible to simulate the accelerator at a higher frequency?
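The arithmetic behind the 1 GHz ceiling, assuming the cycle time is stored as an integer number of nanoseconds (an assumption about the current implementation):

```python
# With integer nanoseconds, the smallest representable cycle time is 1 ns,
# i.e. a 1 GHz ceiling. A 2 GHz clock would need 0.5 ns, which an integer
# field cannot hold.
for cycle_time_ns in (1, 2, 5):
    freq_ghz = 1.0 / cycle_time_ns
    print(f"{cycle_time_ns} ns cycle -> {freq_ghz} GHz")
```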
Sometimes tensors need to be manually padded in some dimensions. Add an operator like numpy.pad.
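The desired semantics match numpy.pad; a small example of what the operator should do for an NHWC tensor (pure numpy, illustrating behavior only):

```python
import numpy as np

x = np.ones((1, 2, 2, 1), dtype=np.float32)  # NHWC tensor
# Pad H and W by one zero on each side; leave N and C untouched.
padded = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)), constant_values=0)
assert padded.shape == (1, 4, 4, 1)
```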
Hi, I have posted on the gem5-aladdin group, but to reach you faster I am also posting here.
If the feature map size is, say, 1HWC and the kernel size is NHWC, with HWC being the same numbers, the systolic array will loop.
I am reporting this here since I use SMAUG, but it probably belongs under gem5-aladdin.
In the datapath, there is DMA setup time modeling, which delays the DMA transaction to account for CPU cache flush or invalidation latency. However, to my knowledge, the directory controller of the MESI_Two_Level protocol sends an invalidate signal to the line owner when it receives a DMA read or write request for a line in the modified state (transition(M, DMA_READ, M_DRD) or transition(M, DMA_WRITE, M_DWR) in dir.sm). I think the CPU cache flush/invalidation latency is counted twice if ignore_cache_flush is set to false. What do you think?
Having a Docker image is almost the only way we can make SMAUG usable.
Reported by user [email protected]:
While examining the SMAUG simulator, I noticed that three memory interfaces are possible: DMA, ACP, and cache.
It seems that ACP supports I/O coherency, and cache supports full coherency.
I ran the Minerva sample model with the ACP interface successfully, but there is a problem when the memory interface is set to cache.
Following is the procedure I took:
1. Generate the Minerva pbtxt and pb files with the 'AllCache' memory policy by modifying the Python model script like below:
with sg.Graph(name="minerva_smv_cache", backend="SMV", mem_policy=sg.AllCache) as graph:
2. Modify model_files so that 'topo_file' and 'params_file' point to the generated pbtxt and pb files.
3. Generate the dynamic_trace_acc0.gz file with the trace.sh script.
4. Modify 'memory_type' in gem5.cfg to cache.
When the simulation started, page mapping occurred as below:
40086105600: system.acc0_datapath: Setting host_a to memory type cache.
40086199200: system.acc0_datapath: Setting host_b to memory type cache.
40086235200: system.acc0_datapath: Setting host_results to memory type cache.
40090416000: system.acc0_datapath: Inserting array label mapping host_results -> vpn 0x3739a0, size 512.
40090416000: system.acc0_datapath: Mapping vaddr 0x3739a0 -> paddr 0x1a839a0.
40090416000: system.acc0_datapath: Inserting TLB entry vpn 0x373000 -> ppn 0x1a83000.
40092144000: system.acc0_datapath: Inserting array label mapping host_a -> vpn 0x3aafa0, size 1568.
40092144000: system.acc0_datapath: Mapping vaddr 0x3aafa0 -> paddr 0x1abafa0.
40092144000: system.acc0_datapath: Inserting TLB entry vpn 0x3aa000 -> ppn 0x1aba000.
40092144000: system.acc0_datapath: Mapping vaddr 0x3abfa0 -> paddr 0x1abbfa0.
40092144000: system.acc0_datapath: Inserting TLB entry vpn 0x3ab000 -> ppn 0x1abb000.
40093368000: system.acc0_datapath: Inserting array label mapping host_b -> vpn 0x3e4e80, size 25088.
40093368000: system.acc0_datapath: Mapping vaddr 0x3e4e80 -> paddr 0x1bd7e80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3e4000 -> ppn 0x1bd7000.
40093368000: system.acc0_datapath: Mapping vaddr 0x3e5e80 -> paddr 0x1bd8e80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3e5000 -> ppn 0x1bd8000.
...
40093368000: system.acc0_datapath: Mapping vaddr 0x3ebe80 -> paddr 0x1bdee80.
40093368000: system.acc0_datapath: Inserting TLB entry vpn 0x3eb000 -> ppn 0x1bde000.
However, at the start of execution, a memory access to a strange address occurs:
40094148000: system.acc0_datapath: issueTLBRequestTiming for trace addr: 0xd3a8c0
Thus, simulation fails with the error message below:
fatal: An error occurred during cache access to trace virtual address 0xd3a8c0 at node 70: Could not find a virtual address mapping for array "". Please ensure that you have called mapArrayToAccelerator() with the correct array name parameter.
Did I configure something wrong or misunderstand how SMAUG operates?
I would really appreciate any advice.
I am able to reproduce this issue. It looks like it is mostly due to not correctly looking up the array name for a host memory access when the access goes directly through virtual memory (i.e., caching). I suspect that since this memory policy has not been used very heavily in the past, the code regressed relative to DMA and ACP, which have seen heavier use. Still investigating.
Can we run multiple instances of the 4-layer Minerva model at a time? If yes, how?
All of the user-input errors on the Python side are currently handled by assertions. We should use the idiomatic Python approach of raising exceptions instead, and leave assertions to enforce invariants of the code whose violation would indicate a bug.
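A sketch of the proposed style, using a hypothetical op wrapper (not SMAUG's actual API); note that bare asserts also vanish under `python -O`, so user errors should raise:

```python
def matmul_shapes(shape_a, shape_b):
    """Validate user input by raising, not asserting."""
    if shape_a[1] != shape_b[0]:
        raise ValueError(
            f"matmul expects inner dimensions to match, got {shape_a} and {shape_b}")
    return (shape_a[0], shape_b[1])

assert matmul_shapes((2, 3), (3, 4)) == (2, 4)
try:
    matmul_shapes((2, 3), (4, 4))  # user error: reported as ValueError
except ValueError:
    pass
```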
I am trying to run the example provided in the SMAUG tutorial (Run a model in gem5-Aladdin simulation).
When I run the command "build/bin/smaug-instrumented my_model_topo.pbtxt my_model_params.pb", I get the following output. I tried running the above command with other models like lenet5 and cifar, but got the same response. Can you please help me with this?
Loading the network model...
Summary of the network.
======================================================
Layer (type)                              Parameters
data_2 (Data)                                      0
data_1 (Data)                                      0
data (Data)                                        0
conv (Convolution3d)                             288
max_pool (MaxPooling)                              0
reorder (Reorder)                                  0
mat_mul (InnerProduct)                         62720
smaug-instrumented: build/smaug/operators/smv/smv_convolution_tiling.cpp:241: static smaug::smv::TilingConfig smaug::smv::conv::TilingOptimizer::computeBasicTileShapes(smaug::SmvConvolutionOp *): Assertion `maxIt != fullConfigs.end() && "Failed to get best tiling config!"' failed.
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted
We need a full tutorial of how to implement a new network and simulate it, going all the way from Python to C++ to gem5-Aladdin.
The current syntax to build a simple model looks like:
from smaug.python.ops import array_ops, math_ops, nn_ops
from smaug.python.tensor import Tensor
from smaug.core import types_pb2
x = Tensor()
y = math_ops.add(x, x)
z = nn_ops.matmul(..., activation=types_pb2.RELU)
This isn't exactly the prettiest API. If this is the simple model, the complex ones are going to be kind of scary. Can we transform this to look something more like:
from smaug.ops import math, array, nn, cond # these are module names
from smaug.core import Tensor
x = Tensor()
y = math.add(x, x)
z = nn.matmul(..., activation = nn.ReLU)
w = nn.lstm(...)
a = cond.if(math.less(x, y), lambda x: x, lambda y: y)
Essentially, a bit less hierarchy and less boilerplate. There's a lot of _ops in the module names floating around, which isn't super nice to look at, and having to use the proto types directly from types_pb2 is also not super friendly.
Some of this can be accomplished just by renaming a file from nn_ops.py to nn.py. Others may require a bit more work (e.g. hiding proto enums behind Python enums until graph serialization time).
We should clarify which namespaces we want to have and what goes into each of them.
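The proto-enum-hiding idea could be sketched like this; the enum members and the mapping table below are made up for illustration, with the translation to the real types_pb2 value deferred to serialization time:

```python
import enum

class Activation(enum.Enum):
    """User-facing enum; users never touch types_pb2 directly."""
    RELU = "RELU"
    TANH = "TANH"

def to_proto_value(act, proto_enum_values):
    # At serialization time, proto_enum_values would be something like
    # {"RELU": types_pb2.RELU, "TANH": types_pb2.TANH} (hypothetical).
    return proto_enum_values[act.value]

assert to_proto_value(Activation.RELU, {"RELU": 1, "TANH": 2}) == 1
```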
The current README hasn't been updated in nearly two years and none of it is applicable anymore after the rewrite. Start with a clean slate, mark it as alpha software, and we'll go from there.
Hello, I'm trying to port the RL MADDPG algorithm to SMAUG. Is there any documentation I can follow for integrating a new algorithm?
Thanks
In the course of updating docs, I've come across a number of public APIs that should either be removed or hidden. Listing them here to keep track and fix if necessary.
- allocateStorage(dataType) dispatches to an underlying templated allocateStorage<T> method. Is there any reason why the templated method needs to be public too?
- TiledTensor has a ctor with no shape. Why?
- Should _useRawTensor be part of the public API for TiledTensor? It's an optimization that is rarely useful and difficult for the user to understand.
- Why do we need copyTensorData, which copies a linear region of memory, if we already have copyTensorRegion?
- generateTiledTensor requires an Operator param, but we only use the name field. We should just pass the name directly.
- generateTiledTensorAndCopyData - I believe we only need the variadic templated one.
- REGISTER_SPECIAL_OP is a macro used for the ref backend operators, but not for SMV. We may want to get rid of this, since a user who comes across it may assume it is all that is required to add their operator to SMAUG.
- MAYBE_UNUSED appears... unused.
- ConvolutionOp implements an empty run method. Should this be pure virtual?
During simulation with ResNet, a segmentation fault occurs in gem5.
I created the ResNet pb and pbtxt files by running smaug/experiments/models/imagenet-resnet/resnet_network.py.
All configuration files are the same as in the minerva example; only model_files was modified so that it points to the generated pb and pbtxt files.
The input trace was generated by running trace.sh.
Below is the stdout log at the end.
Scheduling data (Data).
Scheduling data_1 (Data).
Scheduling data_10 (Data).
Scheduling data_100 (Data).
Scheduling data_101 (Data).
Scheduling data_102 (Data).
Scheduling data_103 (Data).
Scheduling data_104 (Data).
Scheduling data_105 (Data).
Scheduling data_106 (Data).
Scheduling data_107 (Data).
Scheduling data_108 (Data).
Scheduling data_109 (Data).
stderr log before the backtrace shows the following message.
gem5 has encountered a segmentation fault!
Please let me know if I configured something wrong.
Thanks.
There are still quite a few files left over from SMAUGv1 that we have not yet cleaned up. For example, we still have all the SMIV operators (and arch files!), compression/decompression code in utility, and other stuff too. We should either clean them up or decide what else to do with them.