

HW2VEC: A Graph Learning Tool for Automating Hardware Security


HW2VEC is an open-source graph learning tool for hardware security applications. It provides an automated pipeline for extracting a graph representation (abstract syntax tree or data-flow graph) from a hardware design at various abstraction levels (RTL or gate-level netlist). In addition, HW2VEC includes graph learning components that allow users to apply graph learning approaches to these hardware designs in non-Euclidean form according to their problem settings. In this README, we demonstrate how to use HW2VEC and provide use cases for two hardware security applications: hardware Trojan detection and IP piracy detection. We hope HW2VEC can be helpful to researchers and practitioners in the hardware security community. In this repo, we integrate Pyverilog into our graph extraction pipeline (HW2GRAPH) and PyTorch Geometric into our graph learning pipeline (GRAPH2VEC). The architecture of HW2VEC is shown below. (architecture figure omitted)

To Get Started

We recommend using a Linux system with Anaconda as the virtual environment manager. The environment requirements for hw2vec are as follows:

  • python >= 3.6
  • torch == 1.6.0
  • torch_geometric == 1.6.1
  • pygraphviz

You can install hw2vec either from PyPI or by cloning our repo. Here is one recommended command sequence:

# run the following commands if you choose to install hw2vec from PyPI instead of cloning our repo.
$ conda create --name hw2vec python=3.6
$ conda activate hw2vec
$ python -m pip install hw2vec 

# install pygraphviz. 
$ python -m pip install pygraphviz

# install torch and torch_geometric.
$ conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch

This set of commands assumes you have CUDA 10.1 installed locally and are using Linux. Please refer to the installation guides of torch and torch_geometric if your CUDA setup differs. If installing pygraphviz on Windows, please refer to this issue for more information.

Use Case Examples

Use Case 1: Transforming a hardware design to a graph then to a graph embedding

In this use case, we demonstrate how to use HW2VEC to transform a hardware design into a graph and then into an embedding with a pre-trained model. In the sample script examples/use_case_1.py, HW2GRAPH first uses its preprocessing and graph generation modules to convert the hardware design p into the corresponding graph g. Then g is fed to GRAPH2VEC, whose Data Processing module generates X and A. X and A are passed through the pre-trained model's Graph Convolution layers, Graph Pooling layers, and Graph Readout operations to produce the graph embedding hg. The resulting hg can be further inspected with the utilities of the model.
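
To make X and A concrete: X is the node-feature matrix and A the adjacency (edge) information of the extracted graph. Below is a minimal sketch in plain Python using a hypothetical toy data-flow graph; the real Data Processing module derives these from the DFG/AST that HW2GRAPH extracts.

```python
# Minimal sketch (hypothetical toy graph): build the node-feature matrix X
# (one-hot node-type encodings) and the edge list A that a GRAPH2VEC-style
# pipeline consumes. Names and node types here are illustrative only.

node_types = ["input", "xor", "output"]          # toy node-type vocabulary
nodes = [("a", "input"), ("b", "input"), ("x", "xor"), ("y", "output")]
edges = [("a", "x"), ("b", "x"), ("x", "y")]     # toy data-flow edges

type_index = {t: i for i, t in enumerate(node_types)}
node_index = {name: i for i, (name, _) in enumerate(nodes)}

# X: one row per node, one-hot over node types.
X = [[1.0 if type_index[t] == j else 0.0 for j in range(len(node_types))]
     for _, t in nodes]

# A: edge list in (source, destination) index form, analogous to
# PyTorch Geometric's edge_index.
A = [(node_index[s], node_index[d]) for s, d in edges]

print(X)
print(A)
```
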

To run this use case, unzip the assets/datasets.zip then use the following commands:

$ cd examples
$ python use_case_1.py

Users can refer to the following code piece in use_case_1.py to configure the hardware design code path and the pre-trained model weight and config paths.

hw_design_dir_path = Path("../assets/TJ-RTL-toy/TjFree/det_1011/") # Change this path to other hardware design folder path.
pretrained_model_weight_path = "../assets/pretrained_DFG_TJ_RTL/model.pth" # Change this path to your desired pretrained model weight path.
pretrained_model_cfg_path = "../assets/pretrained_DFG_TJ_RTL/model.cfg" # Change this path to your desired pretrained model config path.
cfg.graph_type = "DFG" # each pretrained model is bundled with one graph type, so change this to match the pretrained model in use.

The expected embedding hg is:

tensor([[1.5581e-02, 2.5182e-01, 2.1535e-02, 3.0264e-02, 3.3349e-02, 4.6067e-02,
         5.5791e-02, 3.6810e-02, 8.3800e-02, 5.1623e-02, 1.1715e-03, 1.2781e-01,
         ...      
         1.2203e-01, 1.2821e-01]], grad_fn=<CppNode<ScatterMax>>)

Use Case 2: Hardware Trojan Detection

In this use case, we demonstrate how to use HW2VEC to detect hardware Trojans (HTs), which are intentional, malicious modifications of circuits by attackers. examples/use_case_2.py implements a GNN-based approach to model the circuit's behavior and identify the presence of HTs. The dataset used in this use case is obtained from a well-known Trojan benchmark; the converted hardware DFG dataset can be downloaded from here.

To realize the model with HW2VEC, we first use HW2GRAPH to convert each hardware design p into a graph g. Then, we transform each g to a graph embedding hg. Lastly, hg is used to make a prediction with an MLP layer. To train the model, the cross-entropy loss L is calculated collectively for all the graphs in the training set.
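
The collective cross-entropy loss described above can be sketched in plain Python. The logits and labels below are hypothetical; the actual trainer uses torch.nn.CrossEntropyLoss over mini-batches of graph embeddings.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label]."""
    m = max(logits)                              # shift for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

# Hypothetical MLP outputs (logits) for 3 graphs and their HT labels.
batch_logits = [[2.0, -1.0], [0.5, 0.5], [-1.5, 2.5]]
batch_labels = [0, 1, 1]

# The loss is computed collectively (averaged) over the graphs in the batch.
loss = sum(cross_entropy(l, y)
           for l, y in zip(batch_logits, batch_labels)) / len(batch_labels)
print(round(loss, 4))
```
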

To run this use case, use the script examples/use_case_2.py with the toy dataset (assets/TJ-RTL-toy). To train the model on the dataset, we provide the following commands.

$ cd examples
# for running HT detection on our toy RTL dataset using DFG graph type
$ python use_case_2.py --yaml_path ./example_gnn4tj.yaml --raw_dataset_path ../assets/TJ-RTL-toy --data_pkl_path dfg_tj_rtl.pkl --graph_type DFG (--device cuda)

# for running HT detection on our toy RTL dataset using AST graph type
$ python use_case_2.py --yaml_path ./example_gnn4tj.yaml --raw_dataset_path ../assets/TJ-RTL-toy --data_pkl_path ast_tj_rtl.pkl --graph_type AST (--device cuda)

Users can adjust the configuration (example_gnn4tj.yaml) to experiment with the model's hyperparameters.

---
learning_rate: 0.001 # The initial learning rate for the model.
seed: 0 # Random seed.
epochs: 200 # Number of epochs to train.
hidden: 200 # Number of hidden units.
dropout: 0.5 # Dropout rate (1 - keep probability).
batch_size: 4 # Number of graphs in a batch.
num_layer: 2 # Number of layers in the neural network.
test_step: 10 # Interval between evaluations during training.
pooling_type: "topk" # Graph pooling type.
readout_type: "max" # Readout type.
ratio: 0.8 # Dataset splitting ratio
poolratio: 0.8 # Ratio for graph pooling.
embed_dim: 2 # The dimension of graph embeddings.

Users can refer to the following code piece in use_case_2.py for model configuration and modify it directly.

convs = [
    GRAPH_CONV("gcn", data_proc.num_node_labels, cfg.hidden),
    GRAPH_CONV("gcn", cfg.hidden, cfg.hidden)
]
model.set_graph_conv(convs)

pool = GRAPH_POOL("sagpool", cfg.hidden, cfg.poolratio)
model.set_graph_pool(pool)

readout = GRAPH_READOUT("max")
model.set_graph_readout(readout)

output = nn.Linear(cfg.hidden, cfg.embed_dim)
model.set_output_layer(output)

The performance metrics we obtained are as follows:

Graph Type | Precision | Recall | F1 Score
-----------|-----------|--------|---------
DFG        | 0.8750    | 1.0000 | 0.9333
AST        | 0.8750    | 1.0000 | 0.9333
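
As a quick sanity check (not part of the original scripts), the F1 score in each row is the harmonic mean of the precision and recall:

```python
# F1 = 2PR / (P + R); with P = 0.8750 and R = 1.0000 this reproduces
# the 0.9333 reported for both the DFG and AST rows.
precision, recall = 0.8750, 1.0000
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9333
```
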

Use Case 3: IP Piracy Detection

This use case demonstrates how to use HW2VEC to confront IP piracy, i.e., determining whether one of two hardware designs is stolen from the other. The implemented method addresses IP piracy by assessing the similarity between hardware designs with a GNN-based model. The dataset used in this use case is obtained from a well-known benchmark; the converted hardware DFG dataset can be downloaded from here.

To implement the proposed approach, the GNN model has to be trained with a graph-pair classification trainer in GRAPH2VEC. The first step is to convert a pair of circuit designs p1, p2 into a pair of graphs g1, g2 with HW2GRAPH. Then, GRAPH2VEC transforms them into graph embeddings hg1, hg2. To assess the similarity of hg1 and hg2, their cosine similarity is computed as the final prediction of piracy.
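
The final similarity assessment boils down to a cosine similarity between the two embeddings. Here is a minimal sketch in plain Python with hypothetical embeddings and an illustrative decision threshold (the actual model computes this with torch.nn.functional.cosine_similarity; the 0.5 threshold below is not from the paper):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-d graph embeddings of two designs.
hg1 = [0.9, 0.1, 0.4, 0.2]
hg2 = [0.8, 0.2, 0.5, 0.1]

sim = cosine_similarity(hg1, hg2)
# A similarity above the decision threshold flags the pair as piracy
# (the threshold here is illustrative only).
is_piracy = sim > 0.5
print(round(sim, 4), is_piracy)
```
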

To run this use case, unzip the assets/datasets.zip and use the script examples/use_case_3.py with a toy dataset (assets/IP-RTL-toy or assets/IP-Netlist-toy). To train the model, we provide the following command sequences.

$ cd examples
# for running IP piracy detection on our toy RTL dataset using DFG graph type
$ python use_case_3.py --yaml_path ./example_gnn4ip.yaml --raw_dataset_path ../assets/IP-RTL-toy --data_pkl_path dfg_ip_rtl.pkl --graph_type DFG (--device cuda)
# for running IP piracy detection on our toy RTL dataset using AST graph type
$ python use_case_3.py --yaml_path ./example_gnn4ip.yaml --raw_dataset_path ../assets/IP-RTL-toy --data_pkl_path ast_ip_rtl.pkl --graph_type AST (--device cuda)
# for running IP piracy detection on our toy Netlist dataset using DFG graph type
$ python use_case_3.py --yaml_path ./example_gnn4ip.yaml --raw_dataset_path ../assets/IP-Netlist-toy --data_pkl_path dfg_ip_netlist.pkl --graph_type DFG (--device cuda)
# for running IP piracy detection on our toy Netlist dataset using AST graph type
$ python use_case_3.py --yaml_path ./example_gnn4ip.yaml --raw_dataset_path ../assets/IP-Netlist-toy --data_pkl_path ast_ip_netlist.pkl --graph_type AST (--device cuda)

Users can adjust the configuration (example_gnn4ip.yaml) to experiment with the model's hyperparameters.

---
learning_rate: 0.001 # The initial learning rate for the model.
seed: 0 # Random seed.
epochs: 200 # Number of epochs to train.
hidden: 16 # Number of hidden units.
dropout: 0.5 # Dropout rate (1 - keep probability).
batch_size: 64 # Number of graphs in a batch.
num_layer: 5 # Number of layers in the neural network.
test_step: 10 # Interval between evaluations during training.
pooling_type: "sagpool" # Graph pooling type.
readout_type: "max" # Readout type.
ratio: 0.8 # Dataset splitting ratio
poolratio: 0.5 # Ratio for graph pooling.
embed_dim: 16 # The dimension of graph embeddings.

Users can refer to the following code piece for model configuration in use_case_3.py and modify it directly.

convs = [
    GRAPH_CONV("gcn", data_proc.num_node_labels, cfg.hidden),
    GRAPH_CONV("gcn", cfg.hidden, cfg.hidden)
]
model.set_graph_conv(convs)

pool = GRAPH_POOL("sagpool", cfg.hidden, cfg.poolratio)
model.set_graph_pool(pool)

readout = GRAPH_READOUT("max")
model.set_graph_readout(readout)

output = nn.Linear(cfg.hidden, cfg.embed_dim)
model.set_output_layer(output)

The performance metrics we obtained are as follows:

Graph Type | Dataset | Accuracy | F1 Score
-----------|---------|----------|---------
DFG        | RTL     | 0.9841   | 0.9783
AST        | RTL     | 0.8557   | 0.8333

Citation

If you find our tool useful in your research, please consider citing our papers.

@misc{yu2021hw2vec,
      title={HW2VEC: A Graph Learning Tool for Automating Hardware Security}, 
      author={Shih-Yuan Yu and Rozhin Yasaei and Qingrong Zhou and Tommy Nguyen and Mohammad Abdullah Al Faruque},
      year={2021},
      eprint={2107.12328},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}
@TECHREPORT{UCI-TR-21-02,
  AUTHOR      = {Yasamin Moghaddas and Tommy Nguyen and Shih-Yuan Yu and Rozhin Yasaei and Mohammad Abdullah Al Faruque},
  TITLE       = {Technical Report for HW2VEC – A Graph Learning Tool for Automating Hardware Security},
  NUMBER      = {TR-21-02},
  INSTITUTION = {Center for Embedded and Cyber-Physical Systems, University of California, Irvine},
  ADDRESS     = {Irvine, CA 92697-2620, USA},
  MONTH       = {July},
  YEAR        = {2021},
  URL         = {http://cecs.uci.edu/files/2021/07/TR-21-02.pdf}
}

hw2vec's People

Contributors: arkdu, aungthu17593, dreamingsarah, louisccc, rozhinys, tthenguyen, ymoghadd


hw2vec's Issues

Why use both log_softmax and CrossEntropyLoss?

def train_epoch_tj(self, data):
    output, _ = self.model.embed_graph(data.x, data.edge_index, data.batch)
    output = self.model.mlp(output)
    output = F.log_softmax(output, dim=1)

    loss_train = self.loss_func(output, data.label)
    return loss_train

def __init__(self, cfg, class_weights=None):
    super().__init__(cfg)
    self.task = "TJ"
    if class_weights.shape[0] < 2:
        self.loss_func = nn.CrossEntropyLoss()
    else:    
        self.loss_func = nn.CrossEntropyLoss(weight=class_weights.float().to(cfg.device))

That means log_softmax is applied twice (CrossEntropyLoss already applies log_softmax internally), which might introduce some negative effects.
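
For reference, log_softmax is idempotent: its output is a vector of normalized log-probabilities whose exponentials sum to 1, so a second application subtracts log(1) = 0 and leaves the values unchanged (up to floating point). The redundancy is therefore mathematically harmless, though confusing and worth removing. A pure-Python check:

```python
import math

def log_softmax(xs):
    # numerically stable log-softmax
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]

logits = [2.0, -1.0, 0.5]
once = log_softmax(logits)
twice = log_softmax(once)

# exp(once) sums to 1, so the second application subtracts log(1) = 0
# and returns the same values CrossEntropyLoss would compute internally.
print([round(v, 4) for v in once])
print([round(v, 4) for v in twice])
```
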

PyTorch Geometric on MacOS

# Works on OS X, with conda installed.

# Create conda environment for PyTorch Geometric
echo "Creating pyg environment"
conda create -n pyg python=3.6

echo "Activate pyg Env"
source activate pyg

# PyTorch Conda Installation
echo "Installing PyTorch"
conda install pytorch torchvision -c pytorch

# Change of Compilers
echo "Compiler Changing on OS X"
conda install -y clang_osx-64 clangxx_osx-64 gfortran_osx-64
export MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++

# Install dependencies
echo "Installing PyG Dependencies"
pip install torch_scatter
pip install torch_sparse
pip install torch_cluster
pip install torch_geometric

Run "Use_Case_1.py" in CPU mode

Hi,

My environment is an ubuntu virtual machine, and I have installed the pytorch cpu version.
I tried to run use_case_1.py, and changed line 105 of models.py.
torch.load(model_weight_path, map_location='cpu')
However, I get an error (screenshot omitted).

Then I changed the keys in the dictionary to make the names match, and ran it again with another error (screenshot omitted).

Then I used the reshape function to convert (37, 200) to (200,37).
The program can output the data, but the output data is different every time I run it.

Is there a step I'm missing?
(output screenshots omitted)

Looking forward to your reply.

Benoit

pygraphviz on MacOS 13.3.1

Use the below url to install pygraphviz on Mac (tested and working).

https://ports.macports.org/port/py38-pygraphviz/

(hw2vec) rahulvishwakarma@Rahuls-Air examples % python
Python 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:43)
[GCC Clang 11.1.0] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygraphviz as pgv
>>> G = pgv.AGraph()
>>> G.add_node("a")
>>> G.add_edge("b", "c")
>>> print(G)
strict graph "" {
        a;
        b -- c;
}

Normalization

Normalization should be enabled by default; this would also reduce some code in DataProcessor.
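
As an illustration of the kind of feature normalization meant here, below is a hypothetical per-column min-max scaling in plain Python; the exact scheme DataProcessor should adopt is this issue's open question.

```python
def min_max_normalize(features):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*features))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in features]

# Hypothetical node features with very different column scales.
features = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
print(min_max_normalize(features))
```
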

Error while building hw2vec

Hi, I cloned the repo and then ran python3 setup.py install, and it throws these errors:

(error screenshot omitted)

Can someone help me out?

.yaml parameters

Parameters unrelated to model building or training should be moved to config.py

Use_Case_1.py in GPU mode

Use_Case_1.py in GPU mode throws the same error as when run with only the CPU.

!python use_case_1.py

Traceback (most recent call last):
  File "/content/hw2vec/examples/use_case_1.py", line 33, in <module>
    graph_emb = use_case_1(cfg, hw_design_dir_path,\
  File "/content/hw2vec/examples/use_case_1.py", line 18, in use_case_1
    model.load_model(pretrained_model_cfg_path, pretrained_model_weight_path)
  File "/content/hw2vec/hw2vec/graph2vec/models.py", line 105, in load_model
    self.load_state_dict(torch.load(model_weight_path, map_location=torch.device('cpu')))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GRAPH2VEC:
	Missing key(s) in state_dict: "layers.0.graph_conv.lin.weight", "layers.1.graph_conv.lin.weight", "pool1.graph_pool.gnn.lin_rel.weight", "pool1.graph_pool.gnn.lin_rel.bias", "pool1.graph_pool.gnn.lin_root.weight". 
	Unexpected key(s) in state_dict: "layers.0.graph_conv.weight", "layers.1.graph_conv.weight", "pool1.graph_pool.gnn.lin_l.weight", "pool1.graph_pool.gnn.lin_l.bias", "pool1.graph_pool.gnn.lin_r.weight". 

Can you please help with this issue?

use_case_2.py and use_case_3.py work fine as documented on the GitHub page.
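
The mismatch above is typical of loading a checkpoint saved under an older torch_geometric into a newer one, where internal parameter names changed (e.g. graph_conv.weight vs. graph_conv.lin.weight, lin_l/lin_r vs. lin_rel/lin_root). Below is a hedged workaround sketch that renames keys in the checkpoint dict before load_state_dict; the mapping is inferred from the error message only, and the earlier CPU-mode issue suggests the GCN weight may also need transposing, so renaming alone may not suffice.

```python
# Sketch: rename old-style checkpoint keys to the new names reported as
# "missing" in the error above. Mapping inferred from the error message,
# not verified against both torch_geometric versions.
RENAMES = [
    ("graph_conv.weight", "graph_conv.lin.weight"),
    ("lin_l.", "lin_rel."),
    ("lin_r.", "lin_root."),
]

def remap_state_dict(state_dict):
    remapped = {}
    for key, value in state_dict.items():
        for old, new in RENAMES:
            if old in key:
                key = key.replace(old, new)
        remapped[key] = value
    return remapped

# Usage with torch would be:
#   model.load_state_dict(remap_state_dict(torch.load(path, map_location="cpu")))
old_keys = {"layers.0.graph_conv.weight": "W0",
            "pool1.graph_pool.gnn.lin_l.weight": "Wl"}
print(sorted(remap_state_dict(old_keys)))
```
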
