yhassanzadeh13 / opera

Offline thread-based emulator local network for distributed systems

Java 99.88% Makefile 0.12%

opera's Introduction

Overview

The Distributed Systems simulator is an offline simulator for distributed systems. It provides an easy-to-integrate interface for running and testing your distributed system on various underlays, as well as for extracting metrics that can be visualized in Grafana.

Architecture Overview

The simulator provides every node with a channel for sending messages to, and receiving messages from, the other nodes in the cluster. These messages are events, and they are used to invoke certain actions in the destination node.

Install

Under the master branch, you will find a Maven project which you can clone and use directly.
Additionally, the simulator requires Docker to be installed on your machine.
Docker is available for free on its official website.

Usage

simulator setup
Integrating node class
Integrating communication events
Interaction with the simulator
Starting the simulation
Registering Prometheus metrics
Visualizing metrics using Grafana
Supporting a new communication protocol

simulator Setup

Load the simulator.simulator package into your project.

Integrating node class

Your node class should implement the BaseNode interface from the simulator.simulator package. Every node has a unique Identifier, which the simulator generates and passes to the node.
Five methods need to be overridden:

onCreate: this is where you set up your node. Every node's onCreate method is called before any node starts processing. Once the node finishes its setup, it should declare itself ready by calling network.ready(), where network is an instance of MiddleLayer that the simulator passes to the node upon creation.
onStart: starts the node's initial process. The simulator calls this method after all the nodes in the cluster are ready.
onStop: the simulator calls this method once the node terminates. This method can be used for garbage collection.
onNewMessage: the node receives all event requests through this method. Every event request is delivered in a separate thread.
newInstance: this method serves as a node factory method. Given an Identifier and a MiddleLayer network layer, it should return a new node instance.
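
The five callbacks can be sketched as follows. This is a minimal illustration, not the simulator's actual code: the `Event`, `MiddleLayer`, and `BaseNode` declarations are stand-ins written here so the snippet compiles on its own, and their parameter lists (for example the list of identifiers passed to `onCreate`) are assumptions. `RecordingNode` is a hypothetical name.

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-ins for the simulator's types, declared only so this sketch compiles on
// its own. The real declarations in simulator.simulator may differ.
interface Event {}

interface MiddleLayer {
  void ready();
  void done();
}

interface BaseNode {
  void onCreate(List<UUID> allIds);
  void onStart();
  void onStop();
  void onNewMessage(UUID originId, Event event);
  BaseNode newInstance(UUID selfId, MiddleLayer network);
}

// Hypothetical node that only records which lifecycle callbacks it has seen.
class RecordingNode implements BaseNode {
  private final UUID selfId;
  private final MiddleLayer network;
  // Events arrive on separate threads, so shared state must be thread-safe.
  final List<String> calls = new CopyOnWriteArrayList<>();

  // Fixture constructor: the fixture instance is only used to call newInstance.
  RecordingNode() {
    this(null, null);
  }

  RecordingNode(UUID selfId, MiddleLayer network) {
    this.selfId = selfId;
    this.network = network;
  }

  UUID id() {
    return selfId;
  }

  @Override
  public void onCreate(List<UUID> allIds) {
    calls.add("onCreate");
    network.ready(); // setup is finished: declare this node ready
  }

  @Override
  public void onStart() {
    calls.add("onStart");
  }

  @Override
  public void onStop() {
    calls.add("onStop");
  }

  @Override
  public void onNewMessage(UUID originId, Event event) {
    calls.add("onNewMessage");
  }

  @Override
  public BaseNode newInstance(UUID selfId, MiddleLayer network) {
    return new RecordingNode(selfId, network);
  }
}
```

The no-argument constructor exists only for the fixture instance; every node that actually takes part in the simulation is produced through `newInstance`.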

Integrating communication events

All the event classes in the network should implement the Event interface from the simulator.simulator package. You need these events to send messages between nodes through the simulator. Two methods should be implemented:

actionPerformed: receives the instance of the host node on which the event is performed. Once the destination node receives the event, it can activate the action by calling event.actionPerformed(this).
logMessage: should return a message describing the state of the event. It is used for logging purposes.
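
A hedged sketch of such an event is shown here. The `BaseNode` and `Event` declarations are stand-ins so the snippet is self-contained, and the `boolean` return type of `actionPerformed` is an assumption; `GreeterNode` and `SayHelloEvent` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-ins for the simulator's interfaces so the sketch is self-contained;
// the real BaseNode and Event declarations may differ.
interface BaseNode {}

interface Event {
  boolean actionPerformed(BaseNode hostNode);

  String logMessage();
}

// Hypothetical node that counts the greetings it has received.
class GreeterNode implements BaseNode {
  private final AtomicInteger greetings = new AtomicInteger();

  void greet() {
    greetings.incrementAndGet();
  }

  int greetingCount() {
    return greetings.get();
  }

  // What the node's onNewMessage handler would do when an event is delivered.
  void deliver(Event event) {
    event.actionPerformed(this);
  }
}

// Hypothetical event: the receiver hands itself to the event, and the event
// decides what to do with that node.
class SayHelloEvent implements Event {
  private final String text;

  SayHelloEvent(String text) {
    this.text = text;
  }

  @Override
  public boolean actionPerformed(BaseNode hostNode) {
    if (!(hostNode instanceof GreeterNode)) {
      return false; // this event only knows how to act on GreeterNode
    }
    ((GreeterNode) hostNode).greet();
    return true;
  }

  @Override
  public String logMessage() {
    return "SayHelloEvent(" + text + ")";
  }
}
```

Keeping the behavior inside the event means the node's message handler stays a one-liner, while each event type carries its own logic.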

Interaction with the simulator

The simulator provides a simulated network underlay for the sake of the nodes' communication. It provides the following methods:

network.ready: lets the node declare itself ready after it finishes its setup.
network.send(BaseNode targetNode, Event event): sends an event from one node to another.
network.done: lets the node terminate itself. The simulator deletes this node from the network and calls the node's onStop method.

The static logger of simulator.simulator can also be accessed using simulator.simulator.getLogger().
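
The ordering guarantee behind network.ready (no node is started before every node has declared itself ready) can be illustrated with a `CountDownLatch`. This is only a sketch of the idea under the assumption that each node runs its setup on its own thread; it is not taken from the simulator's implementation, and `ReadyBarrierSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ReadyBarrierSketch {
  // Runs the setup of nodeCount nodes in parallel, waits until all of them have
  // reported ready, and only then records the start phase of each node.
  static List<String> run(int nodeCount) {
    List<String> phases = Collections.synchronizedList(new ArrayList<>());
    CountDownLatch allReady = new CountDownLatch(nodeCount);

    for (int i = 0; i < nodeCount; i++) {
      int index = i;
      new Thread(() -> {
        phases.add("ready " + index); // stands for the node's setup work
        allReady.countDown();         // stands for network.ready()
      }).start();
    }

    try {
      if (!allReady.await(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("some nodes never became ready");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for nodes", e);
    }

    for (int i = 0; i < nodeCount; i++) {
      phases.add("start " + i); // stands for calling onStart on node i
    }
    return phases;
  }
}
```

Because every setup thread records its phase before counting down, all "ready" entries are guaranteed to precede the first "start" entry in the returned list.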

Starting the simulation

Suppose you have a myNode class and want to run a simulation with a fixed number of nodes (five in the snippet that follows).
Create a new simulator instance, passing it a fixture node, the number of nodes in the simulation, and the communication protocol. Subsequently, you can either start a constant simulation using constantSimulation(duration) or start a simulation with churn using churnSimulation(Long simulationTime, BaseGenerator InterArrivalTime, BaseGenerator sessionLength).
Various types of distributions can be accessed from the Generator package.

myNode fixtureNode = new myNode();
simulator<myNode> simulation = new simulator<myNode>(fixtureNode, 5, "tcp");

simulation.constantSimulation(10000);

simulation.churnSimulation(10000, new UniformGenerator(100, 500),
        new WeibullGenerator(1000, 3000, 1, 4));

Supported communication protocols are: **tcp**, **javaRMI**, **udp**, and **mockNetwork**.
The output log of the simulation will be generated in a `log.out` file under your project's directory.
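
To make the two churn parameters concrete, the sketch below plans arrivals and departures from an inter-arrival generator and a session-length generator. It is an assumption-laden illustration, not the simulator's scheduler: `DurationGenerator`, `UniformDuration`, and `ChurnPlanner` are hypothetical names, the time unit is left abstract, and the real generators in the Generator package may sample differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a generator from the Generator package.
interface DurationGenerator {
  long next();
}

// Draws integer durations uniformly from [min, max], both ends inclusive.
class UniformDuration implements DurationGenerator {
  private final long min;
  private final long max;
  private final Random random;

  UniformDuration(long min, long max, long seed) {
    if (min <= 0 || max < min) {
      throw new IllegalArgumentException("need 0 < min <= max");
    }
    this.min = min;
    this.max = max;
    this.random = new Random(seed);
  }

  @Override
  public long next() {
    return min + (long) (random.nextDouble() * (max - min + 1));
  }
}

class ChurnPlanner {
  // Returns one {arrivalTime, departureTime} pair per session. Arrivals are
  // separated by inter-arrival samples; each session lasts one session-length
  // sample, cut off at the end of the simulation. Generators must return
  // positive values, otherwise the clock would never advance.
  static List<long[]> plan(long simulationTime,
                           DurationGenerator interArrival,
                           DurationGenerator sessionLength) {
    List<long[]> sessions = new ArrayList<>();
    long clock = interArrival.next();
    while (clock < simulationTime) {
      long departure = Math.min(clock + sessionLength.next(), simulationTime);
      sessions.add(new long[] {clock, departure});
      clock += interArrival.next();
    }
    return sessions;
  }
}
```

For example, `ChurnPlanner.plan(10000, new UniformDuration(100, 500, 42L), new UniformDuration(1000, 3000, 7L))` yields a list of sessions whose arrivals are 100 to 500 time units apart.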

Registering Prometheus metrics

The simulator provides three metric types under the Metric package: SimulatorCounter, SimulatorGauge, and SimulatorHistogram.
The static register method can be used to register a new metric:

SimulatorCounter.register("MetricName")

The simulator provides basic default metrics such as packets delay, number of sent messages, number of received messages, session length, inter-arrival time.

Visualizing simulator metrics in Grafana

The simulator uses a Docker container for Prometheus and configures it automatically with Grafana. You can directly access Prometheus on localhost:9090, and Grafana on localhost:3030.

Example of visualizing the default metrics

Access Grafana on localhost:3030. The default username and password are both admin. Create a new dashboard and add a new panel. Enter your metric in the Metrics field. To obtain metrics for a specific node, specify the corresponding Identifier for that node. Under Visualization, select the type of visualization that you want.

Example of obtaining the session length metric for a specific node

You can add multiple panels to a dashboard, and save it.

Example of a sample dashboard

Supporting a new communication protocol

Create a new communication protocol class that extends the underlay superclass. Additionally, add your protocol name and class name to the underlayTypes.yml file. Supported communication protocols are: tcp, javaRMI, udp, and mockNetwork.

Simulation Examples

Two simulation examples are provided under the SimulatorExamples package.

HelloServers

Simulates a basic interaction between server nodes, where every node sends "Hello" to a randomly selected node, and that node replies with "Thank You".

ServersBattle

A slightly more complicated example that illustrates the nodes' setup and termination, the interaction between the events and the nodes' parameters, and thread safety.
It demonstrates a battle between the servers.

Every node starts with an initial random power level.
Every node sends a BattleInvitation for a battle with a random duration to a randomly selected node.
The invited node either confirms the battle or declines it (in case it is involved in another fight) by sending a BattleConfirmation event.
In case of confirmation, the host node either starts the fight or aborts the game in case it became involved in another fight while the invitation was pending.
Once the fight has started, the node with the higher power level wins. A node gains 5 points for winning, loses 10 points for losing, and gains 1 point for a draw. The host node lets the opponent node know the result by sending a BattleResult event.
When a node reaches a zero power level, it dies and sends a goodbye message. The simulation continues until either the simulation duration elapses or a winner (a single remaining node) is declared.
Prometheus metrics for the fight duration, health level, and number of fights are provided for each node.
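
The scoring rule can be written down as a small worked example. This is a sketch rather than the code of the ServersBattle example: `BattleScoring` is a hypothetical name, and clamping the level at zero (the point at which a node dies) is an assumption.

```java
class BattleScoring {
  static final int WIN_DELTA = 5;
  static final int LOSS_DELTA = -10;
  static final int DRAW_DELTA = 1;

  // Returns {new host level, new opponent level} after one fight.
  // The node with the higher level wins; equal levels are a draw.
  static int[] resolve(int hostLevel, int opponentLevel) {
    int hostDelta;
    int opponentDelta;
    if (hostLevel > opponentLevel) {
      hostDelta = WIN_DELTA;
      opponentDelta = LOSS_DELTA;
    } else if (hostLevel < opponentLevel) {
      hostDelta = LOSS_DELTA;
      opponentDelta = WIN_DELTA;
    } else {
      hostDelta = DRAW_DELTA;
      opponentDelta = DRAW_DELTA;
    }
    // Assumption: a level never drops below zero, the level at which a node dies.
    return new int[] {
      Math.max(0, hostLevel + hostDelta),
      Math.max(0, opponentLevel + opponentDelta)
    };
  }
}
```

With these rules, a fight between levels 30 and 20 ends at 35 and 10, a draw at 20 and 20 ends at 21 and 21, and a node at level 5 that loses drops to 0.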

Documentation

The Javadoc documentation can be found in the doc directory under the project directory.

Running tests

All tests: mvn test
Tests in class: mvn test -Dtest="underlay.UnderlayTest"
Specific test: mvn test -Dtest="underlay.UnderlayTest#A_testTCP"

Setting up Development Environment

IntelliJ IDEA

Make sure you have the latest version of Java JDK installed on your machine.

Clone the project repository using its GitHub address.

Import the project as a Maven project into your IntelliJ IDEA.

Navigate to the folder the project is cloned to.

Create a project from the existing sources.

Proceed with the remaining steps as usual. Once the project is created, IDEA shows an Add as Maven Project prompt at the bottom of the window. Click on it.

Wait until IDEA finishes the indexing process. The final project structure will look like below.

Now, to confirm that your setup works, run make test in your terminal. If you do not run into any errors, you are good to go!

opera's Issues

[Lint] Fix checkstyle issues

Context:

Run Checkstyle (currently available through make lint) and resolve all errors and warnings it finds.

How to proceed:

Create a branch <yourname>/<issue-number>-<issue-name> from feature/fix-broken-test, not from master.
Run make lint and resolve the reported errors and warnings one by one. Run all tests through mvn test after each change to make sure that the lint fixes are not breaking changes.
The tests should pass without any change to their original implementation.

Definition of Done:

Checkstyle issues have been resolved across the entire repo, make lint passes, and the PR passes CI.

[Fix-tests] UnderlayTest.C_testRMI

[Fix-tests] UnderlayTest.A_testTCP

[Metrics] size gauge

Context:

Develop a MessageSizeGauger class that, given any Java Object, returns its size in bytes.

Definition of Done:

Fork a branch from master, name it <your name>/<issue number>-<issue name>, develop the issue, make a PR, and fix CI pipeline issues if raised.
Implement the feature in main/java/Metrics/
Define the Gauger as an interface with a Gauge(Object obj) method.
Implement an instance of that as MessageSizeGauger.
Implementation should be backed with tests.

Reference:

https://stackoverflow.com/questions/9368764/calculate-size-of-object-in-java
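
One possible shape for this interface is sketched below. It measures the serialized size of an object, which is one of several ways to estimate size and is not the same as the object's heap footprint; it is also restricted to `Serializable` objects, which is narrower than "any Object". The method is named `gauge` in lower case to follow Java naming conventions, whereas the issue text writes `Gauge`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

interface Gauger {
  // Returns the size of the given object in bytes.
  long gauge(Object obj);
}

// Sketch: reports the number of bytes produced by Java serialization.
// This approximates the size of a message on the wire, not its size in memory.
class MessageSizeGauger implements Gauger {
  @Override
  public long gauge(Object obj) {
    if (!(obj instanceof Serializable)) {
      // Also rejects null, since null is not an instance of any type.
      throw new IllegalArgumentException("object must be non-null and Serializable");
    }
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    } catch (IOException e) {
      // For example, a non-serializable field nested inside the object.
      throw new UncheckedIOException(e);
    }
    return buffer.size();
  }
}
```

A quick sanity check is that `new MessageSizeGauger().gauge(new byte[1000])` is larger than the result for `new byte[10]`, since the serialized form contains the array contents plus a stream header.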

[Integrita] Develop MVP client and server

Context:

Implement a single IntegritaNode that can act as either a client or a server depending on its identifier. Note that, at the current stage, both the client and the server implementation must be part of the same node class. Otherwise, it cannot be integrated with opera.

For this issue, always consider a simulation scenario involving only two nodes. So the ArrayList<UUID> allID that is passed to the nodes at their OnCreate life-cycle event handler is always an ArrayList of size two. The node having the first identifier in allID (i.e., allID[0]) should act as the client, while the other node (i.e., allID[1]) should act as the server. The behavior of the client and the server is defined by the operations they perform. Operations in opera are defined by the messages that nodes pass to each other. The actionPerformed event handler of a message defines the operations that the receiver performs on receiving the message.

Client and Server Logic

The following messages enable the interaction between a user and a server as defined in Algorithms 1, 2, and 3 of the Integrita paper. The structure of the messages is simplified, but as the project evolves, more fields will be introduced to these basic data types.

The following messages are generated by the user.

GetStatus:
- requestID: a unique integer identifier for each GetStatus message, generated by the user
Pull:
- nodeID
- requestID: a unique integer identifier for each Pull message, generated by the user
Push:
- nodeID
- requestID: a unique integer identifier for each Push message, generated by the user
- operation number: an integer

A server sends the following messages as the response to the user-generated messages:

GetStatusResponse: this message is sent from the server to a user as the response to the GetStatus message. It has the following fields:
- index: An index of type integer
- requestID: an integer value, this is exactly the same as the requestID of the received GetStatus message
PullResponse: this message is sent from the server to a user as the response to the Pull message. It has the following fields:
- record: A byte array of size 256
- requestID: an integer value, this is exactly the same as the requestID of the received Pull message
PushResponse: this message is sent from the server to a user as the response to the Push message. It has the following fields:
- status: this field has boolean type: this field is true if the push is successful otherwise false. For now, we assume it always returns true.
- requestID: an integer value, this is exactly the same as the requestID of the received Push message

Testing Scenarios

A user sends a GetStatus message and in return receives a GetStatusResponse. The request IDs of the sent and received messages should match.
A user sends a Push message and in return receives a PushResponse. The request IDs of the sent and received messages should match. The status field of PushResponse should be true.
A user sends a Pull message and in return receives a PullResponse. The request IDs of the sent and received messages should match. The record field of PullResponse should be a byte array of size 256 initialized with zeros.
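
The message types and the three scenarios can be sketched with plain data classes. This is an illustration under assumptions, not the requested implementation: it leaves out the opera Event interface entirely, uses `UUID` for nodeID (the type is not specified in the issue), returns a placeholder index of 0, and `IntegritaServerSketch` is a hypothetical name.

```java
import java.util.UUID;

// Requests generated by the user.
class GetStatus {
  final int requestId;
  GetStatus(int requestId) { this.requestId = requestId; }
}

class Pull {
  final UUID nodeId; // assumption: node identifiers are UUIDs
  final int requestId;
  Pull(UUID nodeId, int requestId) { this.nodeId = nodeId; this.requestId = requestId; }
}

class Push {
  final UUID nodeId;
  final int requestId;
  final int operationNumber;
  Push(UUID nodeId, int requestId, int operationNumber) {
    this.nodeId = nodeId;
    this.requestId = requestId;
    this.operationNumber = operationNumber;
  }
}

// Responses sent by the server; each echoes the requestId of its request.
class GetStatusResponse {
  final int index;
  final int requestId;
  GetStatusResponse(int index, int requestId) { this.index = index; this.requestId = requestId; }
}

class PullResponse {
  final byte[] record;
  final int requestId;
  PullResponse(byte[] record, int requestId) { this.record = record; this.requestId = requestId; }
}

class PushResponse {
  final boolean status;
  final int requestId;
  PushResponse(boolean status, int requestId) { this.status = status; this.requestId = requestId; }
}

// Hypothetical server logic covering the three testing scenarios.
class IntegritaServerSketch {
  static final int RECORD_SIZE = 256;

  GetStatusResponse handle(GetStatus request) {
    return new GetStatusResponse(0, request.requestId); // placeholder index
  }

  PullResponse handle(Pull request) {
    // A new Java byte array is zero-initialized.
    return new PullResponse(new byte[RECORD_SIZE], request.requestId);
  }

  PushResponse handle(Push request) {
    return new PushResponse(true, request.requestId); // always succeeds for now
  }
}
```

In the actual node, each request and response would be an opera event whose actionPerformed handler calls the corresponding logic on the receiving node.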

Definition of Done

#23
Implement the client and server behavior as specified, together with the requested testing scenarios, in src/main/java/scenario/Integrita. Feel free to develop additional tests.
Open a PR against master and make sure that all tests on your branch pass (you can check by running make test in the root directory of opera). Currently, all tests pass on the created branch.

[Simulator] supporting multiple types

Context:

Currently, Opera is only capable of running simulations for a single type of node through the fixture nodes:

LightChainNode fixtureNode = new LightChainNode();
Simulator<LightChainNode> simulator = new Simulator<>(fixtureNode, numNodes, "mockNetwork");

The purpose of this issue is to make it capable of running a given number of instances of different types of nodes, by defining a node factory:

LightChainNode fixtureNode = new LightChainNode();
RegistryNode registryNode = new RegistryNode();

NodeFactory factory = new NodeFactory();
factory.put(fixtureNode, 21);
factory.put(registryNode, 1);

Simulator<LightChainNode> simulator = new Simulator<>(factory, "mockNetwork");

Definition of Done:

Fork a branch from master, name it <your name>/<issue number>-<issue name>, develop the issue, make a PR, and fix CI pipeline issues if raised.
Simulator receives a node factory instead of a fixture.
This change is propagated across the existing code, and any broken code and tests are fixed.
This new feature is supported by tests that make sure that two different types of nodes can send and receive messages during the simulation.
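
One way the proposed factory could work internally is sketched below. The `put` method mirrors the snippet in this issue; everything else (`createAll`, `totalNodes`, the stand-in `BaseNode` and `MiddleLayer` interfaces, and supplying one network layer per node) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Stand-ins so the sketch compiles on its own; the real interfaces are larger.
interface MiddleLayer {}

interface BaseNode {
  BaseNode newInstance(UUID selfId, MiddleLayer network);
}

class NodeFactory {
  // Fixture node mapped to how many instances of its type the simulation needs.
  private final Map<BaseNode, Integer> fixtures = new LinkedHashMap<>();

  void put(BaseNode fixture, int count) {
    if (count <= 0) {
      throw new IllegalArgumentException("count must be positive");
    }
    fixtures.merge(fixture, count, Integer::sum);
  }

  int totalNodes() {
    int total = 0;
    for (int count : fixtures.values()) {
      total += count;
    }
    return total;
  }

  // Hypothetical helper: asks every fixture for the requested number of instances,
  // giving each new node its own identifier and network layer.
  List<BaseNode> createAll(Supplier<MiddleLayer> networks) {
    List<BaseNode> nodes = new ArrayList<>();
    for (Map.Entry<BaseNode, Integer> entry : fixtures.entrySet()) {
      for (int i = 0; i < entry.getValue(); i++) {
        nodes.add(entry.getKey().newInstance(UUID.randomUUID(), networks.get()));
      }
    }
    return nodes;
  }
}

// Minimal node types matching the names used in this issue.
class LightChainNode implements BaseNode {
  @Override
  public BaseNode newInstance(UUID selfId, MiddleLayer network) {
    return new LightChainNode();
  }
}

class RegistryNode implements BaseNode {
  @Override
  public BaseNode newInstance(UUID selfId, MiddleLayer network) {
    return new RegistryNode();
  }
}
```

Because the factory only relies on `newInstance`, the simulator does not need to know the concrete node types it is running.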

[Simulator] Refactor loggings

Context:

This issue covers the work to improve logging in Opera by replacing log4j with log4joop while following this predefined set of best practices:

Refactor actionPerformed method of Event to also accept a logger for logging the internal events:

boolean actionPerformed(BaseNode hostNode, Logger logger);

Methods invoked in actionPerformed are not allowed to log internally. Rather, their result should be logged at info level in actionPerformed, for example:

Bad Practice:
The following method is called in actionPerformed, hence it should not have any internal logging (but unfortunately it does).

  public Block getLatestBlock(UUID requester) {
    logger.info("[Registry] Getting Latest Block for node " + requester);

    // some operations ...

    logger.info("[Registry] " + this.insertedBlocks.size() + " blocks found");

    logger.info("[Registry] Sending Latest Block " + latestBlock.getID() + " to node " + requester);
    this.network.send(requester, new DeliverLatestBlockEvent(latestBlock));

    return chosenBlock;
  }

Good Practice:
We drop all logging from the method internally, and only log its result.

  public Block getLatestBlock(UUID requester) {
    // some operations ...
    return chosenBlock;
  }

Block block = node.getLatestBlock(this.requester);
// log at info level here with some UNIQUE attribute of result, e.g., block ID.
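
Put together, the refactored handler could look like the sketch below. It uses `java.util.logging.Logger` purely as a stand-in so the snippet runs without extra dependencies; the issue itself targets log4joop. The `BaseNode`, `Event`, `Block`, and `RegistryNode` declarations are simplified stand-ins, and `GetLatestBlockEvent` is a hypothetical name.

```java
import java.util.UUID;
import java.util.logging.Logger;

// Simplified stand-ins so the sketch is self-contained.
interface BaseNode {}

interface Event {
  boolean actionPerformed(BaseNode hostNode, Logger logger);
}

class Block {
  private final UUID id = UUID.randomUUID();

  UUID getID() {
    return id;
  }
}

class RegistryNode implements BaseNode {
  private final Block latestBlock = new Block();

  // No logging in here: the caller in actionPerformed logs the result.
  Block getLatestBlock(UUID requester) {
    return latestBlock;
  }
}

// Hypothetical event that asks the registry for its latest block.
class GetLatestBlockEvent implements Event {
  private final UUID requester;

  GetLatestBlockEvent(UUID requester) {
    this.requester = requester;
  }

  @Override
  public boolean actionPerformed(BaseNode hostNode, Logger logger) {
    if (!(hostNode instanceof RegistryNode)) {
      return false;
    }
    Block block = ((RegistryNode) hostNode).getLatestBlock(requester);
    // Logged after the operation, in past tense, with a unique attribute of the result.
    logger.info("retrieved latest block " + block.getID() + " for node " + requester);
    return true;
  }
}
```

Passing the logger into actionPerformed keeps all logging for one event in a single place, which makes the log lines easier to correlate with the event that caused them.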

For classes that implement BaseNode, only overridden methods are allowed to log. Other methods in such classes must not log messages; the lifecycle caller should log on their behalf. As an exception, thread methods are allowed to log internally.
All logs should be in the simple past tense and be emitted only after the operation they describe is done. They should also include all necessary information about that operation.

Bad Practice:
The following log is bad because:
1. It logs the event before it happens.
2. It does not include context information about the log, i.e., every time the node runs the loop it results in the same log.
  The validator and tx change on every invocation, yet this contextual information remains concealed from the user.
```
      logger.info("Node " + this.uuid + " is requesting validators");
      for (UUID validator : validators) {
        // send an asynchronous validation request
        network.send(validator, new ValidateTransactionEvent(tx));
      }
```
Good Practice:
⚠️ Note: for simplicity, this example does not use log4joop (although it should).
```
      
      for (UUID validator : validators) {
        // send an asynchronous validation request
        network.send(validator, new ValidateTransactionEvent(tx));
        logger.info("Node " + this.uuid + " requested validation from " + validator + " for transaction: " + tx.getID());
      }
```
Log level conventions:
- All logs in scenario or SimulatorExamples should be at info level.
- All catch blocks of exceptions should log the stack trace at fatal level.
- All other logs should be at debug level.

Definition of Done:

Fork a branch from master, name it <your name>/<issue number>-<issue name>, develop the issue, make a PR, and fix CI pipeline issues if raised.
Break the PR into several smaller ones if they change more than 10 files.

Checkout `integrita/22-client-server-mvp` branch (this branch has already been created, do not create a new branch).

[Metrics] Define and group metrics by simulation id

Currently, we are aggregating metrics by the opera job.

This diminishes the clarity between successive simulations, as the sole differentiating factor between two groups of simulations is the time interval separating them.

The objective of this concern is to establish the concept of a unique simulation ID (a distinct and unambiguous identifier) and consolidate all relevant metrics associated with a particular simulation ID in Grafana.
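
The idea of a simulation ID can be illustrated with a small in-memory counter registry. This sketch deliberately avoids any Prometheus API: in a Prometheus-backed setup the ID would more naturally be attached to each metric as a label so that Grafana can filter on it. `SimulationScopedCounters` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Keeps one counter per (simulation ID, metric name) pair, so that two
// successive simulations never share a time series.
class SimulationScopedCounters {
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  // A random UUID is unique enough to tell successive simulations apart.
  static String newSimulationId() {
    return UUID.randomUUID().toString();
  }

  void increment(String simulationId, String metricName) {
    counters.computeIfAbsent(key(simulationId, metricName), k -> new LongAdder()).increment();
  }

  long value(String simulationId, String metricName) {
    LongAdder adder = counters.get(key(simulationId, metricName));
    return adder == null ? 0 : adder.sum();
  }

  private static String key(String simulationId, String metricName) {
    return simulationId + "/" + metricName;
  }
}
```

With this grouping, the differentiating factor between two simulations becomes the ID itself rather than the time interval between them.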

Fixing broken tests

Context:

Running mvn test as part of our CI was supposed to run all tests and fail CI if any test failed. However, we noticed that it is currently not configured properly, and hence there are a handful of broken tests sitting on the master branch.
The purpose of this issue is to configure the Maven plugins in the project properly so that any broken test fails the CI. As a candidate, we suggest adopting the Maven Surefire plugin in our project.
Once the plugin is set up properly, the broken tests should be identified, their root causes investigated, and fixed one by one.

How to proceed?

Create a branch <yourname>/<issue-number>-<issue-name>.
Make sure that the Surefire plugin is set up correctly so that mvn test runs all tests in the src/test/java/** packages. Develop a simple test and intentionally let it fail to verify that Surefire catches it properly.
Identify all broken tests, start with the easiest one to fix, and repeat this process until all tests pass.

Definition of Done:

Surefire plugin has been configured properly.
mvn test runs all tests in the specified package and detects any broken test.
All broken tests have been identified and fixed one by one.
