
STONNE: A Simulation Tool for Neural Networks Engines

Bibtex

If you use STONNE, please cite us:

@INPROCEEDINGS{STONNE21,
  author =       {Francisco Mu{\~n}oz-Mart{\'i}nez and Jos{\'e} L. Abell{\'a}n and Manuel E. Acacio and Tushar Krishna},
  title =        {STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators},
  booktitle =    {2021 IEEE International Symposium on Workload Characterization (IISWC)}, 
  year =         {2021},
  volume =       {},
  number =       {},
  pages =        {},
}

Docker image

We have created a Docker image for STONNE! Everything is installed in the image, so using the simulator is much easier. The image also comes with the OMEGA and SST-STONNE simulators. More information can be found in our Dockerfile repository.

To pull and run the container, just type the following command:

docker run -it stonnesimulator/stonne-simulators

What is STONNE

The design of specialized architectures for accelerating the inference procedure of Deep Neural Networks (DNNs) is a booming area of research nowadays. While first-generation accelerator proposals used simple fixed dataflows tailored for dense DNNs, more recent architectures have argued for flexibility to efficiently support a wide variety of layer types, dimensions, and sparsity. As the complexity of these accelerators grows, it becomes more and more appealing for researchers to have cycle-level simulation tools at their disposal to allow for fast and accurate design-space exploration, and rapid quantification of the efficacy of architectural enhancements during the early stages of a design. To this end, we present STONNE (Simulation TOol of Neural Network Engines), a cycle-level, highly-modular and highly-extensible simulation framework that can plug into any high-level DNN framework as an accelerator device and perform end-to-end evaluation of flexible accelerator microarchitectures with sparsity support, running complete DNN models.

Design of STONNE

[Figure: High-level overview of the STONNE design]

STONNE is a cycle-level microarchitectural simulator for flexible DNN inference accelerators. To allow for end-to-end evaluations, the simulator is connected with a Deep Learning (DL) framework (Caffe and PyTorch in the current version). Therefore, STONNE can fully execute any dense or sparse DNN model supported by the DL framework that it uses as its front-end. The simulator has been written entirely in C++, following the well-known GRASP and SOLID principles of object-oriented design. This has simplified its development and makes it easier to implement any kind of DNN inference accelerator microarchitecture, tile configuration mapping and dataflow.

The figure presented above shows a high-level view of STONNE with the three major modules involved in the end-to-end simulation flow:

Flexible DNN Architecture

This constitutes the principal block of STONNE (see the central block in the figure), and it is mainly composed of the modeled microarchitecture of the flexible DNN accelerator to simulate (the Simulation Engine). The source code of this block is the foundation of the simulator and is located in the 'stonne/src' folder. It contains different classes for the different components of the architecture, as well as the principal class 'STONNEModel', which begins the execution of the simulator. This class exposes the main functions to call the simulator and can therefore be viewed as the 'STONNE API': the way the input module interacts with the simulated accelerator, configuring its simulation engine according to the user configuration file, enabling different execution modes such as LFF, loading a certain layer and tile, and transferring the weights and the inputs to the simulator memory.

Input Module

Due to the flexibility that the 'STONNE API' provides, the simulator can easily be fed using any of the well-known DL frameworks already available. Currently the simulator supports both the Caffe and PyTorch DL frameworks, and both front-ends, along with their connections, are located in the folders 'pytorch-frontend' and 'caffe-frontend' respectively. Later in this file, we explain how to install and run each framework on STONNE.

Furthermore, since these DL frameworks require a more involved installation and usage, apart from this mode of execution we have also enabled the "STONNE User Interface", which facilitates the execution of STONNE. Through this mode, the user is presented with a prompt to load any layer and tile parameters onto a selected instance of the simulator, and run it with random weights and input values. This allows for faster executions, facilitating rapid prototyping and debugging. This interface can be launched directly from the 'stonne' folder once compiled, and its code is located in the 'src/main.cpp' file. Basically, it is a command line that, given hardware and dimension parameters, runs the simulator with random tensors.

Output module

Once a simulation for a certain layer has been completed, this module is used for reporting simulation statistics such as performance, compute unit utilization, number of accesses to SRAM, wires and FIFOs, etc. Besides, this output module also reports the amount of energy consumed and the on-chip area required by the simulated architecture. These statistics obviously depend on the particular data format (e.g., fp16 or int8) used to represent the DNN model's parameters, so STONNE supports different data formats in the whole end-to-end evaluation process and statistics report. For estimating both area and energy consumption, STONNE uses a table-based area and energy model, computing total energy from the cycle-level activity statistics of each module. For the current table-based numbers shipped with STONNE (see the 'stonne/energy_tables/' path), we ran synthesis using Synopsys Design-Compiler and place-and-route using Cadence Innovus on each module inside the MAERI and SIGMA RTL to populate the table. Users can plug in the numbers for their own implementations as well.
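
As a rough illustration of how such a table-based model works (this is a minimal sketch with hypothetical file formats, not STONNE's actual parsers), the total energy is simply the sum over all components of their activity count times a per-access energy cost:

def load_key_value_file(path):
    # Assumed format: one '<component> <value>' pair per line.
    table = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                table[parts[0]] = float(parts[1])
    return table

def total_energy(counters_path, energy_table_path):
    counters = load_key_value_file(counters_path)     # cycle-level activity counts
    cost = load_key_value_file(energy_table_path)     # energy per access of each component
    return sum(n * cost.get(name, 0.0) for name, n in counters.items())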

STONNE Mapper: Module for automatic tile generation

Flexible architectures need to be configured before executing any layer, commonly through a tile specification. The chosen tile drastically changes the results obtained in each execution, so choosing a good configuration is an important step when running simulations. However, finding an optimal configuration is a task that requires either a lot of time or deep knowledge of how each mapping would behave.

For this reason, STONNE includes a module called STONNE Mapper. Given the hardware configuration and the specification of the layer to be simulated, it automatically generates a tile configuration that will be close to the optimal one in most cases. When used, the module also generates an output file with some extra information about the process used to select the tile. STONNE Mapper is an excellent alternative for fast prototyping and a good helper for finding the optimal tile configuration when the search space is very large.

Currently, STONNE Mapper supports generating mappings for CONV, FC/DenseGEMM and SparseDense layers (how to use it in each case is explained later). For CONV mappings it integrates and uses mRNA, another mapper designed for MAERI and presented at ISPASS 2019. For FC/DenseGEMM and SparseDense it implements its own algorithms. In most cases, the generated mappings achieve more than 90% of the performance of the optimal configuration (based on the results obtained in our benchmarks), so it can be trusted to a high degree.

This module and all of its implementations were developed as part of a Final Degree Project in June 2022. Any questions about its use or implementation can be addressed by contacting the author directly (@Adrian-2105).

Supported Architectures

STONNE models all the major components required to build both first-generation rigid accelerators and next-generation flexible DNN accelerators. All the on-chip components are interconnected by a three-tier network fabric composed of a Distribution Network (DN), a Multiplier Network (MN) and a Reduce Network (RN), inspired by the taxonomy of on-chip communication flows within DNN accelerators. These networks can be configured to support any topology. Next, we describe the different topologies of the three networks (DN, MN and RN) currently supported in STONNE, which are the basic building blocks of state-of-the-art accelerators such as Google's TPU, Eyeriss-v2, ShiDianNao, SCNN, MAERI and SIGMA, among others. These building blocks can also be seen in the figure presented below:

[Figure: Building blocks of the architectures supported by STONNE]

STONNE User Interface. How to run STONNE quickly.

The STONNE User Interface facilitates the execution of STONNE. Through this mode, the user is presented with a prompt to load any layer and tile parameters onto a selected instance of the simulator, and run it with random tensors.

Installation

The installation of STONNE, along with its user interface, can be carried out by running the following commands:

cd stonne
make all

These commands will generate a binary file stonne/stonne. This binary can be executed to run layers and GEMMs with any dimensions and any hardware configuration. All the tensors are filled with random numbers.

How to run STONNE

Currently, STONNE runs 5 types of operations: Convolution layers, FC layers, Dense GEMMs, Sparse GEMMs and SparseDense GEMMs. Please note that almost any kernel can, in the end, be mapped using these operations. Other operations such as pooling layers will be supported in the future. However, these are the operations that usually dominate the execution time in machine learning applications, so we believe they are enough to perform a comprehensive and realistic exploration. Besides, note that a sparse convolution can also be supported, since any convolution layer can be converted into a GEMM operation using the im2col algorithm (see the sketch below).
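
To illustrate the last point, the following minimal NumPy sketch (illustrative code, not part of STONNE) shows the im2col idea: every receptive field of the input is unrolled into a column, so the convolution becomes a single GEMM between the flattened filters and the unrolled input.

import numpy as np

def conv2d_as_gemm(x, w, stride=1):
    # x: input of shape (C, X, Y); w: filters of shape (K, C, R, S)
    K, C, R, S = w.shape
    _, X, Y = x.shape
    out_h = (X - R) // stride + 1
    out_w = (Y - S) // stride + 1
    cols = np.empty((C * R * S, out_h * out_w))
    col = 0
    for i in range(0, X - R + 1, stride):
        for j in range(0, Y - S + 1, stride):
            cols[:, col] = x[:, i:i + R, j:j + S].reshape(-1)
            col += 1
    # The convolution is now a (K x C*R*S) by (C*R*S x out_h*out_w) GEMM.
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, out_h, out_w)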

The syntax of a STONNE user interface command to run any of the available operations is as follows:

./stonne [-h | -CONV | -FC | -DenseGEMM | -SparseGEMM | -SparseDense] [Hardware parameters] [Dimension and tile parameters]

Help Menu

A help menu will be shown when running the following command:

./stonne -h: Obtain further information to run STONNE

Hardware Parameters

The hardware parameters are common to all the kernels. Other parameters can easily be added to the simulator. Some parameters are tailored to specific architectures.

  • num_ms = [x]

    Number of multiplier switches (must be power of 2) (Flexible architecture like MAERI or SIGMA)

  • dn_bw = [x]

    Number of read ports in the SDMemory (must be power of 2) (All architectures)

  • rn_bw = [x]

    Number of write ports in the SDMemory (must be power of 2) (All architectures)

  • rn_type = [0=ASNETWORK, 1=FENETWORK, 2=TEMPORALRN]

    Type of the ReduceNetwork to be used (Not supported for SparseGEMM)

  • mn_type = [0=LINEAR, 1=OS_MESH]

    Type of Multiplier network to be used. Linear is for flexible architectures, OS_MESH for rigid architectures like TPU.

  • mem_ctrl = [MAERI_DENSE_WORKLOAD, SIGMA_SPARSE_GEMM, TPU_OS_DENSE, MAGMA_SPARSE_DENSE]

    Type of memory controller to be used

  • accumulation_buffer = [0,1]

    Enables the accumulation buffer. Mandatory in Rigid architectures. Also needs to be set to 1 for SparseDense (SpMM) execution.

  • print_stats = [0,1]

    Flag that enables the printing of the statistics

Dimension and tile Parameters

Obviously, the dimensions of the kernel depend on the type of operation that is going to be run.

If you intend to use STONNE Mapper to generate the tile configuration, note that the tile parameters (T_x) will be ignored and STONNE will only use the configuration it generates. Likewise, if you use STONNE Mapper, there is no need to manually specify the tile parameters.

Next, we describe the different parameters for each supported operation:

  • CONV

    • layer_name = [CONV]

      Name of the layer to run. The output statistic file will be named accordingly

    • R = [x]

      Number of filter rows

    • S = [x]

      Number of filter columns

    • C =[x]

      Number of filter and input channels

    • K = [x]

      Number of filters and output channels

    • G = [x]

      Number of groups

    • N = [x]

      Number of inputs (Only 1 is supported so far)

    • X = [x]

      Number of input rows

    • Y = [x]

      Number of input columns

    • strides = [x]

      Stride value used in the layer

    • T_R = [x]

      Number of filter rows mapped at a time

    • T_S = [x]

      Number of filter columns mapped at a time

    • T_C = [x]

      Number of filter and input channels per group mapped at a time

    • T_K = [x]

      Number of filters and output channels per group mapped at a time

    • T_G = [x]

      Number of groups mapped at a time

    • T_N = [x]

      Number of inputs mapped at a time (Only 1 is supported so far)

    • T_X_ = [x]

      Number of input rows mapped at a time

    • T_Y_ = [x]

Number of input columns mapped at a time

    • STONNE Mapper

      • If used, the following parameters can be skipped: strides, T_R, T_S, T_C, T_K, T_G, T_N, T_X_ and T_Y_.

      • When using it, it is mandatory to also use the option -accumulation_buffer=1 to ensure that the tile configuration can adjust to the hardware resources.

      • generate_tile = [0 | none, 1 | performance, 2 | energy, 3 | energy_efficiency]

STONNE Mapper is disabled by default (0, none). To use it you must specify a target (1, 2 or 3; the names can also be used). The targets for tile generation on CONV layers are: performance (1) to maximize performance, energy (2) to minimize energy consumption, and energy_efficiency (3) to balance performance and energy.

      • generator = [Auto, mRNA]

[Testing option] At the moment, only the mRNA algorithm is supported for this type of layer.

    • Constraints

Please make sure that the following constraints are satisfied (i.e., each tile dimension must be a multiple of its corresponding dimension):

      If the architecture to be run is flexible (MAERI or SIGMA):

      1. T_R % R = 0
      2. T_S % S = 0
      3. T_C % C = 0
      4. T_K % K = 0
      5. T_G % G = 0
      6. T_X_ % ((X - R + strides) / strides) = 0
      7. T_Y_ % ((Y - S + strides) / strides) = 0
  • FC

    • layer_name = [FC]

      Name of the layer to run. The output statistic file will be called by this name

    • M = [x]

      Number of output neurons

    • N = [x]

      Batch size

    • K = [x]

      Number of input neurons

    • T_M = [x]

      Number of output neurons mapped at a time

    • T_N = [x]

      Number of batches mapped at a time

    • T_K = [x]

      Number of input neurons mapped at a time

    • STONNE Mapper

      • If used, the following parameters can be skipped: T_M, T_N and T_K.

      • When using it, it is mandatory to also use the option -accumulation_buffer=1 to ensure that the tile configuration can adjust to the hardware resources.

      • generate_tile = [0 | none, 1 | performance]

STONNE Mapper is disabled by default (0, none). To use it you must specify a target (1; the name can also be used). The only target available at the moment for tile generation on FC/DenseGEMM layers is performance (1), which maximizes performance. However, the generated mapping is also the best mapping for the other targets for this type of layer.

      • generator = [Auto, StonneMapper, mRNA]

[Testing option] The user can select which algorithm to use to generate the mapping. By default, StonneMapper is always used because it gets better results in all cases (it is a direct improvement over mRNA). This option should only be used if you need to test the mRNA tile generation for this type of layer.

  • DenseGEMM

    • layer_name = [DenseGEMM]

      Name of the layer to run. The output statistic file will be called by this name

    • M = [x]

      Number of rows MK matrix

    • N = [x]

      Number of columns KN matrix

    • K = [x]

      Number of columns MK and rows KN matrix (cluster size)

    • T_M = [x]

      Number of M rows mapped at a time

    • T_N = [x]

      Number of N columns at a time

    • T_K = [x]

      Number of K elements mapped at a time

    • STONNE Mapper

      • If used, the following parameters can be skipped: T_M, T_N and T_K.

      • When using it, it is mandatory to also use the option -accumulation_buffer=1 to ensure that the tile configuration can adjust to the hardware resources.

      • generate_tile = [0 | none, 1 | performance]

STONNE Mapper is disabled by default (0, none). To use it you must specify a target (1; the name can also be used). The only target available at the moment for tile generation on FC/DenseGEMM layers is performance (1), which maximizes performance. However, the generated mapping is also the best mapping for the other targets for this type of layer.

      • generator = [Auto, StonneMapper, mRNA]

[Testing option] The user can select which algorithm to use to generate the mapping. By default, StonneMapper is always used because it gets better results in all cases (it is a direct improvement over mRNA). This option should only be used if you need to test the mRNA tile generation for this type of layer.

  • SparseGEMM

    • layer_name = [SparseGEMM]

      Name of the layer to run. The output statistic file will be called by this name

    • M = [x]

      Number of rows MK matrix

    • N = [x]

      Number of columns KN matrix

    • K = [x]

      Number of columns MK and rows KN matrix (cluster size)

    • MK_sparsity = [x]

      Percentage of sparsity MK matrix (0-100)

    • KN_sparsity = [x]

      Percentage of sparsity KN matrix (0-100)

    • dataflow = [MK_STA_KN_STR, MK_STR_KN_STA]

      Dataflow to use during operations

    • optimize = [0,1]

      Apply compiler-based optimizations

  • SparseDense

    • layer_name = [SparseDense]

      Name of the layer to run. The output statistic file will be called by this name

    • M = [x]

      Number of rows MK matrix

    • N = [x]

      Number of columns KN matrix

    • K = [x]

      Number of columns MK and rows KN matrix (cluster size)

    • MK_sparsity = [x]

      Percentage of sparsity MK matrix (0-100)

    • T_N = [x]

      Number of N columns mapped at a time

    • T_K = [x]

      Number of K elements mapped at a time

    • STONNE Mapper

      • If used, the following parameters can be skipped: T_N and T_K.

      • When using it, it is mandatory to also use the option -accumulation_buffer=1 to ensure that the tile configuration can adjust to the hardware resources.

      • generate_tile = [0 | none, 1 | performance]

STONNE Mapper is disabled by default (0, none). To use it you must specify a target (1; the name can also be used). The only target available at the moment for tile generation on SparseDense layers is performance (1), which maximizes performance. However, the generated mapping is also the best mapping for the other targets for this type of layer.

      • generator = [Auto, StonneMapper]

[Testing option] At the moment, only the StonneMapper algorithm is supported for this type of layer.

Examples

Example running a CONV layer (manual mapping):

./stonne -CONV -R=3 -S=3 -C=6 -G=1 -K=6 -N=1 -X=20 -Y=20 -T_R=3 -T_S=3 -T_C=1 -T_G=1 -T_K=1 -T_N=1 -T_X_=3 -T_Y_=1 -num_ms=64 -dn_bw=8 -rn_bw=8

Example running a CONV layer generating the tile with STONNE Mapper (energy target):

./stonne -CONV -R=3 -S=3 -C=6 -G=1 -K=6 -N=1 -X=20 -Y=20 -generate_tile=energy -num_ms=64 -dn_bw=8 -rn_bw=8 -accumulation_buffer=1

Example running a FC layer (manual mapping):

./stonne -FC -M=20 -N=20 -K=256 -num_ms=256 -dn_bw=64 -rn_bw=64 -T_K=64 -T_M=2 -T_N=1

Example running a FC layer generating the tile with STONNE Mapper (with mRNA algorithm and performance target):

./stonne -FC -M=20 -N=20 -K=256 -generate_tile=performance -generator=mRNA -num_ms=256 -dn_bw=64 -rn_bw=64 -accumulation_buffer=1

Example of running a DenseGEMM (manual mapping):

./stonne -DenseGEMM -M=20 -N=20 -K=256 -num_ms=256 -dn_bw=64 -rn_bw=64 -T_K=64 -T_M=2 -T_N=1

Example of running a DenseGEMM over TPU:

./stonne -DenseGEMM -M=4 -N=4 -K=16 -ms_rows=4 -ms_cols=4 -dn_bw=8 -rn_bw=16  -T_N=4 -T_M=1 -T_K=1 -accumulation_buffer=1 -rn_type="TEMPORALRN" -mn_type="OS_MESH" -mem_ctrl="TPU_OS_DENSE"

Example of running a SparseGEMM:

./stonne -SparseGEMM -M=20 -N=20 -K=256 -num_ms=128 -dn_bw=64 -rn_bw=64  -MK_sparsity=80 -KN_sparsity=10 -dataflow=MK_STA_KN_STR

Example of running a SparseDense (manual mapping):

./stonne -SparseDense -M=20 -N=20 -K=256 -MK_sparsity=80 -T_N=4 -T_K=32 -num_ms=128 -dn_bw=64 -rn_bw=64 -accumulation_buffer=1

Note that the accumulation buffer needs to be set to 1 for the SparseDense case to work.

Example of running a SparseDense generating the tile with STONNE Mapper (performance [1] target):

./stonne -SparseDense -M=20 -N=20 -K=256 -MK_sparsity=80 -generate_tile=1 -num_ms=128 -dn_bw=64 -rn_bw=64 -accumulation_buffer=1

Output

Every layer execution generates three files in the path from which the simulator has been executed (the environment variable OUTPUT_DIR can be set to indicate another output path):

  • A JSON file with all the hardware statistics generated during the execution.

A counters file with the number of uses of every component of the architecture. This can be used to generate the energy numbers.

  • [Only if STONNE Mapper was used] A brief report about the process made by the module to select an efficient tile and the tile used.

Note that after the execution, the results obtained in the output tensor by the simulator are compared against a CPU algorithm to ensure the correctness of the simulator. If the simulator does not produce the correct results, an assertion will be raised at the end of the execution.
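
As a quick illustration, the JSON statistics file can be inspected directly from Python. The file name below is hypothetical (STONNE derives it from the layer_name parameter) and no particular schema is assumed:

import json

with open("Conv1_stats.json") as f:   # assumed name for a layer called 'Conv1'
    stats = json.load(f)

for key in stats:                     # print the top-level statistic groups
    print(key)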

Generating Energy Numbers

In order to estimate the energy consumption of an execution, we have developed a Python script that takes the counters file generated during the execution and a table-based energy model. The script is located in the energy_tables folder and can be run with the following command:

./calculate_energy.py [-v] -table_file=<Energy numbers file> -counter_file=<Runtime counters file> -[out_file=<output file>]

The current energy numbers are located in the file energy_tables/energy_model.txt. We obtained them through synthesis using Synopsys Design-Compiler and place-and-route using Cadence Innovus on each module inside the MAERI and SIGMA RTL to populate the table. Users can plug in the numbers for their own implementations as well.
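
For example, assuming the counters file from a previous run is named Conv1_counters.txt (a hypothetical name), the invocation would look like this:

./calculate_energy.py -table_file=energy_tables/energy_model.txt -counter_file=Conv1_counters.txt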

PyTorch Frontend

At this point, the user should be familiar with the usage of STONNE and the set of statistics that the tool is able to output. However, with the STONNE User Interface presented previously, the inputs and outputs of the simulator are random. Here we explain how to run real DNN models using PyTorch with STONNE as a computing device.

The pytorch-frontend is located in the folder 'pytorch-frontend' and basically contains the official PyTorch code (version 1.7) with some extra files that create the simulation operations and link them with the 'stonne/src' code. The frontend is organized so that running a PyTorch DNN model on STONNE is straightforward.

Installation

First, you will need Python 3.6 or higher and a C++14 compiler. We also highly recommend creating an Anaconda environment before continuing. Learn how to install it in its official documentation. Once installed, you can create and activate a new virtual environment with the following commands:

conda create --name stonne python=3.8 
conda activate stonne

Once the environment is ready, you have to install some build dependencies before continuing. You can install them with the following command:

pip install pyyaml==6.0 setuptools==45.2.0 numpy==1.23.5 

Now you can start installing the PyTorch frontend. First, if you don't have CUDA on your computer, export the following variables:

export MAX_JOBS=1
export NO_CUDA=YES
export NO_CUDA=1

Second, you can build and install PyTorch (torch) from source using the following commands (it takes around 20 minutes):

cd pytorch-frontend/
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install

Third, you can build and install our PyTorch frontend (torch_stonne) package using the following commands:

cd stonne_connection/
python setup.py install

Finally, to be able to run all the benchmarks, you will need to install some extra dependencies. We recommend installing the specific versions listed below in order to avoid package dependency problems and overwriting the previous torch installation. You can install them with the following commands:

pip install transformers==4.25.1
pip install torchvision==0.8.2 --no-deps

You can check that the installation was successful by running the following test simulation:

python $STONNE_ROOT/pytorch-frontend/stonne_connection/test_simulation.py

Running PyTorch in STONNE

Running PyTorch with STONNE as a device is almost straightforward. Let's assume we define a DNN model using PyTorch, composed of a single, simple convolutional layer:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(5,5,5, groups=1) # in_channels=5, out_channels=5, filter_size=5
    def forward(self, x):
        x = self.conv1(x)
        return x

This code can easily be run on the CPU just by creating an object of type Net and running the forward method with an input tensor of the correct shape.

net = Net()
print(net)
input_test = torch.randn(5,50,50).view(-1,5,50,50)
result  = net(input_test)

Migrating this model to STONNE is as simple as turning the Conv2d operation into a SimulatedConv2d operation, as shown in the following example:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.SimulatedConv2d(5,5,5,'$PATH_TO_STONNE/simulation_files/maeri_128mses_128_bw.cfg', 'dogsandcats_tile.txt', sparsity_ratio=0.0, stats_path='.', groups=1) 
    def forward(self, x):
        x = self.conv1(x)
        return x

As we can see, we have inserted 4 new parameters:

  • sim_file (str)

    This is the path to the configuration file of STONNE. This file defines the hardware to be simulated in every execution of STONNE. You can see multiple examples in the folder 'simulation_files'.

  • tile (str)

This is the path to a file that defines the tile to be used to partition the layer. An example of this file can be found in minibenchmarks/dogsandcats_tile.txt (note that an example of a linear tile file can be found in minibenchmarks/dogsandcats_tile_fc.txt). An example that uses STONNE Mapper to automatically generate a tile can also be found in minibenchmarks/dogsandcats_tile_stonnemapper.txt (same parameters as when used from the CLI). This parameter is only meaningful if the hardware configuration file contains a dense memory controller. If the memory controller is sparse, then the execution does not require tiling, as explained in the SIGMA paper.

  • sparsity_ratio (float 0.0-1.0)

This is the sparsity ratio used to prune the weight tensor. This parameter only makes sense if a sparsity controller is used in the hardware configuration file; otherwise it will be ignored. The way to proceed in the current version of STONNE is to indicate this parameter; then, prior to the simulation, the weight tensor is pruned according to that parameter and the bitmaps are created accordingly. Note that the weights are not retrained, and therefore this will affect the accuracy of the model. However, from a simulation perspective, this lower accuracy does not matter. Obviously, this is just one way to proceed. It is possible, with little effort, to run an already pruned and re-trained model; to do so, the code has to be briefly modified to remove the pruning functions and use the real values as they are. At the moment, STONNE only allows a bitmap representation of sparsity. If you have a model with another compression format, you could either code your own memory controller to support it or code a simple function to turn your representation format into a bitmap representation (see the sketch after this list).

  • stats_path

    This is an optional parameter and points to a folder in which the stats of the simulation of that layer will be stored.

The addition of these 4 parameters and the modification of the function will let PyTorch run the layer on STONNE, obtaining the real output tensors.
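
As a minimal sketch of what such a bitmap-conversion helper might look like (illustrative PyTorch code, not part of STONNE), a pruned dense weight tensor can be turned into a bitmap plus its compacted non-zero values:

import torch

def dense_to_bitmap(weights):
    # 1 where a non-zero value exists, 0 elsewhere, plus the non-zero values in row-major order.
    bitmap = (weights != 0).to(torch.uint8)
    values = weights[weights != 0]
    return bitmap, values

# Example: prune roughly 80% of a random weight matrix and build its bitmap representation.
w = torch.randn(4, 8)
w[torch.rand_like(w) < 0.8] = 0.0
bitmap, values = dense_to_bitmap(w)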

In the current version of the pytorch-frontend we also support the nn.SimulatedLinear and torch_stonne.SimulatedMatmul operations, which correspond to the nn.Linear and torch.matmul operations of the original PyTorch framework. The only thing needed is to change the names of the functions and indicate the 3 extra parameters. We still do not support SparseDense operations in the pytorch-frontend.
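
As an illustration, a linear layer could be migrated in the same way. The exact argument order below is an assumption made by analogy with the SimulatedConv2d example above, so it should be checked against simulatedLinear.py before use:

import torch.nn as nn  # requires the STONNE-patched torch build

class NetFC(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical call: in_features=256, out_features=20, followed by the STONNE
        # hardware configuration file, the tile file and the sparsity ratio.
        self.fc1 = nn.SimulatedLinear(256, 20,
                                      '$PATH_TO_STONNE/simulation_files/maeri_128mses_128_bw.cfg',
                                      'minibenchmarks/dogsandcats_tile_fc.txt',
                                      sparsity_ratio=0.0)

    def forward(self, x):
        return self.fc1(x)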

Simulation with real benchmarks

In order to reduce the effort required from the user, we have already migrated some models to STONNE. At the moment, we have 4 DNN benchmarks in this framework: Alexnet, SSD-mobilenets, SSD-Resnets1.5 and BERT. All of them are in the folder 'benchmarks'. Note that to migrate these models we had to understand the code of each of them, locate the main kernels (i.e., convolution, linear and matrix multiplication operations) and turn those functions into their simulated versions. That is the effort required to migrate a new model. We will update this list over time.

Running these models is straightforward, as we have prepared a script (the benchmarks/run_benchmarks.py file) that performs all the tasks automatically. Next, we present one example for each network:

cd benchmarks
  • Running BERT:
python run_benchmarks.py "bert" "../simulation_files/sigma_128mses_64_bw.cfg" "NLP/BERT/tiles/128_mses/" "0.0" ""
  • Running SSD-Mobilenets
python run_benchmarks.py "ssd_mobilenets" "../simulation_files/sigma_128mses_64_bw.cfg" "object_detection/ssd-mobilenets/tiles/128_mses" "0.0" ""
  • Running SSD-Resnets:
 python run_benchmarks.py "ssd_resnets" "../simulation_files/sigma_128mses_64_bw.cfg" "object_detection/ssd-mobilenets/tiles/128_mses" "0.0" "" 


stonne's Issues

Error when trying to integrate the MADDPG with Stonne

Hello,
I'm trying to execute the MADDPG RL algorithm with STONNE, but I receive the following error. It says the tile cannot be parsed. MADDPG uses an MLP neural network with nn.Linear, and I just switched to nn.SimulatedLinear. So, can you let me know where the problem might be?
python main.py simple_spread simple_spread --n_episodes 60000 --discrete_action
Episodes 1-2 of 60000
Traceback (most recent call last):
File "main.py", line 251, in
run(config)
File "main.py", line 125, in run
torch_agent_actions, step_1_cnt, step_2_cnt = maddpg.step(torch_obs, step_1_cnt, step_2_cnt, explore=True)
File "/home/kailash/Desktop/maddpg-pytorch-copy/algorithms/maddpg.py", line 92, in step
action, counter_list = a.step(obs, counter_list, explore=explore)
File "/home/kailash/Desktop/maddpg-pytorch-copy/utils/agents.py", line 75, in step
action = self.policy(obs)
File "/home/kailash/anaconda3/envs/maddpg-pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/kailash/Desktop/maddpg-pytorch-copy/utils/networks.py", line 47, in forward
h1 = self.nonlin(self.fc1(self.in_fn(X)))
File "/home/kailash/anaconda3/envs/maddpg-pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/kailash/anaconda3/envs/maddpg-pytorch/lib/python3.6/site-packages/torch/nn/modules/simulatedLinear.py", line 76, in forward
output = torch_stonne.simulated_linear_forward(self.__class__.__name__, input, self.weight, self.path_to_arch_file, self.path_to_tile, self.sparsity_ratio, True) # The last True is to transpose the matrices
RuntimeError: dogsandcats_tile_fc.txt could not be opened for parsing

Can we implement training on STONNE?

Hi Francisco Muñoz-Martínez,
Can we train an algorithm on STONNE? If not, can you point me to ways (any documentation) to change the simulator to accelerate training?

STT-RAM Implementation

Hi, I plan on implementing part of the STONNE framework but was wondering how feasible it would be to implement an emerging non-volatile memory technology such as STT-RAM. This would replace the SDMemory module and we would hope to track the behavior/performance of the memory technology on the python frontend module. Any advice as to how one could patch in a similar piece of software would be great.

Thank you, Michael

PyTorch frontend

Hello, I have 2 questions:

  1. What is the version of the PyTorch frontend that you have included in the repository? [v1.7.0 or v1.7.1]
  2. Could you provide a file list with the customizations you made in PyTorch? I ran diff using both PyTorch v1.7.0 and v1.7.1 but I couldn't get the result I was looking for. It seems that almost all files were changed.

Steps to Install STONNE successfully

Hi all,

I have recently started using STONNE for my research. It is an awesome tool for modeling deep neural network accelerators. I am grateful to the authors for making it open-source. However, I ran into several issues while installing STONNE. I want to share the sequences of steps and commands that lead to a successful installation for me. The below steps have worked well for my lab mates too.

Below are the steps:

  1. Use a VM or WSL with Ubuntu 18.04
  2. Install Anaconda
  3. Setup a virtual environment using conda
    conda create --name stonne python=3.8
  4. Activate stonne environment
    conda activate stonne
  5. Clone Stonne
    git clone https://github.com/stonne-simulator/stonne.git
  6. Build STONNE user interface
cd stonne
make all

The main issues with the installation occur during the Pytorch-frontend installation of STONNE.

  7. Before beginning the frontend installation, install numpy in the stonne env:
    conda install -c anaconda numpy
  8. If you don't have CUDA access (e.g., no graphics card), you need to export these environment variables before the frontend installation:

export MAX_JOBS=1
export NO_CUDA=YES
export NO_CUDA=1

  9. For the Pytorch-frontend installation, it is important to set the CMAKE_PREFIX_PATH:
cd pytorch-frontend/
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
  10. You will encounter some missing-module errors. You can use conda to install the modules one after another depending on the error, or install all of them at once using a requirements file.
    The above step should build the STONNE torch; the installation takes somewhere between 15-20 minutes depending on your machine.

  11. Once the installation is complete, you need to connect the torch build with stonne:

cd stonne_connection/
python setup.py install
  12. You need torchvision for accessing various CNN models and datasets. The authors recommend building vision from source, but it throws several errors. A simple workaround that works without messing up the installed pytorch-stonne is the command below:
    pip install torchvision --no-deps
  13. You can check the installation by running test-2.py in the folder stonne-connection.

Hope this helps in installing STONNE smoothly.

I can't solve this error: "libtorch_python.so: undefined symbol"

Hello, I'm trying to build stonne-simulator on my conda environment(env name: Test) with python 3.7.
Here is what I stepped before.

  1. Clone the repository and create the environment (conda create -n Test python=3.7)
  2. change directory to pytorch-frontend and run setup.py (python setup.py install)
  3. change directory to pytorch-frontend/stonne_connection and run setup.py (python setup.py install)
    -> In step 3, the following error occurred:
    Traceback (most recent call last):
    File "setup.py", line 2, in
    from torch.utils import cpp_extension
    File "/home/inje/anaconda3/envs/Test/lib/python3.7/site-packages/torch/init.py", line 189, in
    from torch._C import *
    ImportError: /home/inje/anaconda3/envs/Test/lib/python3.7/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZN5torch3jit7testing9FileCheck10check_nextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

The error also occurred on prebuilted torch environment(python3.7+ torch1.7+cuda102).
Is there anything I am missing? I hope to hear your comments!
Thanks for reading

Question about the 'simulated_matmul()'

I have a question about the simulated_matmul() function.

The function seems to port the matmul() function of PyTorch to STONNE.

Do torch.matmul() and torch_stonne.simulated_matmul() work exactly the same?

According to the notes, it should work the same way.

However, it does not seem to perform exactly the same operation in the following situation.

import torch
import torch_stonne

a = torch.tensor([[3],[0],[1],[2]])  # 1x4 matrix
b = torch.tensor([[0,2,2,3]])  # 4x2 matrix

r=torch_stonne.simulated_matmul("", a,  b, "../../simulation_files/maeri_128mses_128_bw.cfg", "test_tile.txt", 0)
print(r)
print(torch.matmul(a,b))

I want you to check it out

Questions about list of simulated functions

Thank you for always answering my questions kindly.

I have recently started using the STONNE simulator again.

In my case, basic matrix arithmetic functions such as torch.add, torch.sub and torch.matmul are used a lot.

For matmul, I confirmed that there is a function simulated by the STONNE engine.

However, there seems to be no add/sub. I wonder if I understand this correctly.

In addition, I wonder where I can find the list of functions simulated by the STONNE engine.

Thank you for releasing useful software.

I hope you are always happy.

Python-frontend installation errors

Hi, Thanks for this great work,
I encountered the errors when I tried to build pytorch-frontend.
I know there must be some path error I made,
but I can't find it. Could you please help me resolve this?

Pytorch Frontend Install Error

Hello.
I'm Jongsang Yoo, and I am studying HW accelerators.
First of all, thank you for providing a good program as open source. I think it will be very helpful for my studies.

There is a problem with the PyTorch Frontend installation.
I have executed all the commands in the README file you provided.


(stonne) jognsang@DESKTOP-TV817D3:~/lab/STONNE/stonne/pytorch-frontend$ python setup.py install
Building wheel torch-1.7.0a0+633fa53
-- Building version 1.7.0a0+633fa53
cmake -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/jognsang/lab/STONNE/stonne/pytorch-frontend/torch -DCMAKE_PREFIX_PATH=/home/jognsang/anaconda3/envs/stonne -DNUMPY_INCLUDE_DIR=/home/jognsang/anaconda3/envs/stonne/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/jognsang/anaconda3/envs/stonne/bin/python -DPYTHON_INCLUDE_DIR=/home/jognsang/anaconda3/envs/stonne/include/python3.8 -DPYTHON_LIBRARY=/home/jognsang/anaconda3/envs/stonne/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=1.7.0a0+633fa53 -DUSE_NUMPY=True /home/jognsang/lab/STONNE/stonne/pytorch-frontend
Traceback (most recent call last):
  File "setup.py", line 748, in <module>
    build_deps()
  File "setup.py", line 327, in build_deps
    build_caffe2(version=version,
  File "/home/jognsang/lab/STONNE/stonne/pytorch-frontend/tools/build_pytorch_libs.py", line 54, in build_caffe2
    cmake.generate(version,
  File "/home/jognsang/lab/STONNE/stonne/pytorch-frontend/tools/setup_helpers/cmake.py", line 329, in generate
    self.run(args, env=my_env)
  File "/home/jognsang/lab/STONNE/stonne/pytorch-frontend/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/home/jognsang/anaconda3/envs/stonne/lib/python3.8/subprocess.py", line 359, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/home/jognsang/anaconda3/envs/stonne/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/jognsang/anaconda3/envs/stonne/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/jognsang/anaconda3/envs/stonne/lib/python3.8/subprocess.py", line 1720, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'build'

However, the error shown above occurs.
How can I solve this problem?
Please give me some advice.

My environment is Windows 10 + WSL 2 (Ubuntu 22.04).

ps. I confirmed that STONNE works with the STONNE User Interface.

Printing individual time breakup values for the Simulator

Inside STONNEModel.cpp, lines 542 to 547, the statements that print individual values do not reference the current object. Is that by design? The benchmarked time values are stored earlier in the variables associated with that object. See the code snippet below:

std::cout << "Time mem(ms): " << time_mem << std::endl;

As opposed to

std::cout << "Time mem(ms): " << this->time_mem << std::endl;

Missing building blocks of SIGMA

Hi Francisco Muñoz-Martínez,

I have read your awesome paper and now I plan to simulate SIGMA on STONNE. After reading your codes, I found that in your SIGMA configuration file:

[MSNetwork]
ms_size=128
[ReduceNetwork]
type="ASNETWORK"
[SDMemory]
dn_bw=128
rn_bw=128
controller_type="SIGMA_SPARSE_GEMM"

According to the code, I think it uses the default tree-like DSNetworkTop Distribution Network and an Augmented Reduction Tree (ART) as the Reduction Network. But in the SIGMA paper, they use a Benes-topology Distribution Network and a Forwarding Adder Network (FAN) as the Reduction Network.

I wonder if these two building blocks of SIGMA are implemented in the current repo, and whether the current configuration is capable of simulating the correct behaviour (timing, energy consumed and on-chip area) of SIGMA?

Thanks a lot.

stonne_connection compile fails when PyTorch is compiled with clang++

Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /Users/hao.chen/work/ai/stonne/pytorch-frontend/stonne_connection/build/temp.macosx-10.9-x86_64-3.8/torch_stonne.o.d -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/hao.chen/opt/anaconda3/envs/stonne/include -arch x86_64 -I/Users/hao.chen/opt/anaconda3/envs/stonne/include -arch x86_64 -I/Users/hao.chen/work/ai/stonne/pytorch-frontend/stonne_connection/../../stonne/include/ -I/Users/hao.chen/work/ai/stonne/pytorch-frontend/stonne_connection/../../stonne/external/ -I/Users/hao.chen/opt/anaconda3/envs/stonne/lib/python3.8/site-packages/torch/include -I/Users/hao.chen/opt/anaconda3/envs/stonne/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/Users/hao.chen/opt/anaconda3/envs/stonne/lib/python3.8/site-packages/torch/include/TH -I/Users/hao.chen/opt/anaconda3/envs/stonne/lib/python3.8/site-packages/torch/include/THC -I/Users/hao.chen/opt/anaconda3/envs/stonne/include/python3.8 -c -c /Users/hao.chen/work/ai/stonne/pytorch-frontend/stonne_connection/torch_stonne.cpp -o /Users/hao.chen/work/ai/stonne/pytorch-frontend/stonne_connection/build/temp.macosx-10.9-x86_64-3.8/torch_stonne.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_clang"' '-DPYBIND11_STDLIB="_libcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1002"' -DTORCH_EXTENSION_NAME=torch_stonne -D_GLIBCXX_USE_CXX11_ABI=0 -std=gnu++14
FAILED: /Users/hao.chen/work/ai/stonne/pytorch-frontend/stonne_connection/build/temp.macosx-10.9-x86_64-3.8/torch_stonne.o

and I get a warning like the following:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++) is not compatible with the compiler Pytorch was
built with for this platform, which is clang++ on darwin. Please
use clang++ to to compile your extension. Alternatively, you may
compile PyTorch from source using g++, and then you can also use
g++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

So I changed the command to CXX=clang++ python setup.py install, but it still failed.

Error in installation of vision

Hello,
Is it possible to install STONNE on Ubuntu 18.04? Which OS is recommended?
When I tried to install vision, this error occurred (I have attached it).
How can I solve this problem? Please help.

Attachment: error.docx

Parameter stats_path='.' missing for the SimulatedConv2d

Hello,

We are currently testing STONNE to simulate some neural network architectures. We have a question: in the README.md file you mention a parameter called stats_path inside the SimulatedConv2d operation (and I guess it is in SimulatedLinear as well). However, we could not find any parameter with this name in SimulatedConv.py or in simulatedLinear.py. Could you please let us know if there are plans to implement this functionality? Your help is much appreciated.

Regards,

Isai Roman.

Three questions about Utilization

Hi,
Sorry for bothering you,
I have three questions about the memory hierarchy assumptions, tiling, and sparsity support.

  1. Memory: if we are calling a SimulatedConv2d or simply running ./stonne with arguments, my understanding is that we are feeding the PyTorch tensor to an emulator. If that is correct, what kind of memory hierarchy should we assume? Do we assume that everything is already prepared in the scratchpad, or can we assume that the bandwidth is sufficient to feed all the computation needs?
  2. Tiling: I saw that STONNE has hooks for users to customize the tiling. I wondered whether there is any philosophy behind the tiling that allows a tradeoff between latency and throughput, or whether the tiling depends entirely on the size of the accelerator and the size of the computation tensors.
  3. Sparsity support: I saw that STONNE connects directly to PyTorch tensors; in that case, all the sparsity is represented as a bitmap (am I right?), so my question is: does STONNE support other sparsity data structures for now?

The unit of data and frequency of the stonne

Hi Francisco,

I hope you are doing well.

Sorry to bother you again, but I found that the data in the output file doesn't have units. Would you mind telling me the units of these values, such as energy and area, in the output file?

Besides, I cannot find the total execution time for the whole operation (it just tells me how many cycles were spent running a convolution layer), nor the frequency. If there is a default frequency, could you please tell me?

Can you also tell me the data size it reads from the GlobalBuffer? Should it be 8, 32, or 64 bits?

BEST
Yile

DRAM modeling

Hi Francisco,

I looked through the accelerator code and tried to understand how the interaction between the STONNE engine and DRAM is modeled.
Can you briefly explain how DRAM bandwidth/latency is modeled?
From the paper, you mentioned Gbuffer has double buffering and DRAM is modeled with DRAMsim.
Can you also point me to the corresponding codes?

Thanks.
Xin

`weights_tensorflow.pb` is in owner's trash

Hello,

I tried to download the weights for the mobilenet from benchmarks > object_detection > ssd-mobilenets using the download_weights script, and I got an error message. I think that the file with the weights was deleted from a contributor's google drive.

`NLP/BERT/` directory is missing

I tried to run the BERT benchmark and unfortunately the required files were deleted from the repository. Was there some bug, or are you planning on re-uploading them soon?

Memory leaks during big simulation runs - Destructors not being called

Hello Adrian,

I've been having serious memory issues with big simulation runs (for instance, take MobileNetV2 on a TPU 256x256, with the corresponding architecture file attached). The process was killed by the OOM killer on our lab servers because of memory starvation (please check the picture; the memory needs are on the order of TB). So I started digging in and partially solved the biggest memory issues by changing the abstract classes of the hardware components. Since they do not have any virtual destructor, the destructors of the derived classes were not called (a reference about this topic: https://www.quantstart.com/articles/C-Virtual-Destructors-How-to-Avoid-Memory-Leaks/). Could you please help us fix this in your master branch? If you need to check my commit, please let me know.

Regards,

Isai.

photo_2023-06-10_13-09-39
tpu_256_256_pes.cfg.zip
