
Transformer Pointer-Critic

This repo contains the source code for Attention-Based Model and Deep Reinforcement Learning for Distribution of Event Processing Tasks. If this code is useful for your work, please cite our paper:

@article{MAZAYEV2022100563,
	title = {Attention-based model and deep reinforcement learning for distribution of event processing tasks},
	journal = {Internet of Things},
	pages = {100563},
	year = {2022},
	issn = {2542-6605},
	doi = {https://doi.org/10.1016/j.iot.2022.100563},
	url = {https://www.sciencedirect.com/science/article/pii/S2542660522000580},
	author = {Andriy Mazayev and Faroq Al-Tam and Noélia Correia}
}

Contents

Installation

python3 -m venv --system-site-packages ./venv
source ./venv/bin/activate
pip install --upgrade pip

# Install the actual deps
pip install -r requirements.txt

For more info, check TensorFlow's installation guide.
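
To quickly verify the installation (a minimal sanity check, not part of the repo), the following should print a 2.x version:

# Sanity check: confirm TensorFlow 2 is importable and whether a GPU is visible
import tensorflow as tf

print(tf.__version__)                           # should print a 2.x version
print(tf.config.list_physical_devices("GPU"))   # empty list means CPU-only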

Architecture

Simple Overview (architecture diagram: simple_arch)

Problem Statement

Goal

Given a set of tasks (a.k.a. `Rules` or web `Resources`), decide on the best `Rule` distribution across a set of devices, a.k.a. `Nodes` (each having a random amount of CPU, RAM and storage resources), while taking QoS into account.

Three QoS objectives are considered:

  • Greedy Distribution - Place as many Rules as possible
  • Fair Distribution - Place as many Rules as possible, but ensure that all Nodes receive a fair share of Rules to process
  • Cost Distribution - Place as many Rules as possible, but minimize the number of Nodes used while doing it
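
As a rough illustration only (the actual reward implementations live in environment/custom/resource_v3/reward.py; all names below are hypothetical):

# Hypothetical illustration of the three QoS objectives, not the repo's reward.py.
# `placements` maps each Rule index to the Node index it was assigned to;
# node index 0 is the dummy node that collects rejected Rules.

def greedy_score(placements):
    # Greedy: maximize the number of Rules placed on real Nodes.
    return sum(1 for node in placements.values() if node != 0)

def fair_score(placements, num_nodes):
    # Fair: place Rules while keeping the per-Node load balanced.
    loads = [0] * (num_nodes + 1)
    for node in placements.values():
        loads[node] += 1
    placed = sum(loads[1:])
    spread = max(loads[1:]) - min(loads[1:])  # smaller spread = fairer
    return placed - spread

def cost_score(placements):
    # Cost: place Rules while using as few distinct Nodes as possible.
    placed = sum(1 for node in placements.values() if node != 0)
    used_nodes = len({node for node in placements.values() if node != 0})
    return placed - used_nodes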

Input Representation

The input has two parts: Nodes and Rules. Each entry in the Nodes part describes the amount of available resources in the node, while each entry in the Rules part describes the demanded resources. The Nodes part has a dummy node that receives rejected Rules.

Input example with 2 Nodes and 2 Rules to distribute

array([
    [ 0.00, 0.00, 0.00],  -> Dummy node. Rejected `Rules` are "placed" here
    [ 0.70, 0.80, 0.40],  -> Node 1. Available CPU: 0.70 | Available RAM: 0.80 | Available Storage: 0.40
    [ 0.50, 0.40, 0.20],  -> Node 2. Available CPU: 0.50 | Available RAM: 0.40 | Available Storage: 0.20
    [ 0.10, 0.12, 0.17],  -> Rule 1. Required CPU: 0.10 | Required RAM: 0.12 | Required Storage: 0.17
    [ 0.18, 0.32, 0.16]   -> Rule 2. Required CPU: 0.18 | Required RAM: 0.32 | Required Storage: 0.16
    ],
    dtype=float32, shape=(5, 3))
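
A minimal NumPy sketch of how such a state could be assembled (illustrative only; the environment builds batched versions of these tensors internally):

import numpy as np

# Illustrative only: build a single problem instance in the layout shown above.
dummy_node = np.array([[0.00, 0.00, 0.00]], dtype=np.float32)    # collects rejected Rules
nodes = np.array([[0.70, 0.80, 0.40],                            # available CPU, RAM, storage
                  [0.50, 0.40, 0.20]], dtype=np.float32)
rules = np.array([[0.10, 0.12, 0.17],                            # required CPU, RAM, storage
                  [0.18, 0.32, 0.16]], dtype=np.float32)

state = np.concatenate([dummy_node, nodes, rules], axis=0)
print(state.shape)  # (5, 3)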

Repo structure

.
├── agents
│ ├── agent.py - Agent Class implementation
│ ├── models
│ │ └── transformer - Contains the actor and the critic models
│ ├── plotter.py - Plots the losses and the stats
│ └── trainer.py - Training function
├── configs - Centralized location for configuring the Agent, Env., training and testing functions
│ ├── configs.py - Loader helper method
│ └── ResourceV3.json - Actual configs file
├── environment
│ ├── env_factory.py - Helper method to init and load the environment
│ ├── custom
│ │ ├── resource_v3
│ │ │ ├── attention_plotter.py - Plots attention (for potential policy analysis)
│ │ │ ├── env.py - Environment Class implementation
│ │ │ ├── heuristic - Contains implementation of baseline heuristics
│ │ │ ├── misc - Contains helper functions
│ │ │ ├── node.py - Node Class implementation
│ │ │ ├── resource.py - Task/`Rule`/`Resource` Class implementation
│ │ │ ├── reward.py - Contains the implementation of different rewards
│ │ │ └── tester.py - Testing function
├── main.py
├── requirements.txt
├── results
├── test_agent.py
└── tests - Unit and integration tests
    ├── runner.py
    └── unit

Configuration

The configuration of the Env., Agent, and the training and testing functions is centralized in configs/ResourceV3.json:

{
    "trainer_config": {
        "description": "Trainer function configs.",
        "n_iterations": 100000,
        "n_steps_to_update": 30,

        "export_stats": {
            "export_stats": true,
            "folder": "training"
        },

        "store_model_weights": {
            "export_weights": true,
            "folder": "model",
            "filename": "actor"
        }
    },

    "tester_config": {
        "description": "Testing function configs.",

        "add_brakes": false,

        "show_per_test_stats": true,
        "show_inference_progress": true,
        "show_solutions": false,
        "show_detailed_solutions": false,
        "plot_attentions": false,
        "batch_size": 1,

        "testbed" : {
            "num_tests": 100,
            "node_sample_configs": {
                "min": 5,
                "max": 50,
                "step": 5
            },
            "node_available_resources": {
                "min": 0,
                "max": 100,
                "step": 100
            },
            "request_sample_configs": {
                "min": 10,
                "max": 100,
                "step": 10
            }
        },

        "heuristic": {
            "dominant_resource": {
                "generate_params_combos": true,
                "resource_sort_descending": true,
                "node_sort_descending": true
            },
            "random": {},
            "cplex_greedy_and_critical": {
                // Disabled by default. If needed, you need to have docplex lib installed
                // More info: https://ibmdecisionoptimization.github.io/docplex-doc/
                "use": false,
                "greedy_with_critical_resource": false,
                "time_limit_ms": 60000,
                "num_threads": 8
            },
            "cplex_node_reduction": {
                // Disabled by default. If needed, you need to have docplex lib installed
                // More info: https://ibmdecisionoptimization.github.io/docplex-doc/
                "use": false,
                "time_limit_ms": 60000,
                "num_threads": 8
            }
        },

        "export_stats": {
            "global_stats": {
                "export_stats": true,
                "folder": "tests",
                "filename": "test"
            },
            "per_problem_stats": {
                "export_stats": false,
                "folder": "tests/per_instance"
            }
        }
    },

    "env_config": {
        "description": "Environment configs.",

        "batch_size": 128,
        "mask_nodes_in_mha": true,
        "generate_request_on_the_fly": false,

        "seed_value": 1235,

        "normalization_factor": 100,
        "decimal_precision": 2,

        "num_features": 3,
        "num_profiles": 1000,

        "profiles_sample_size": 20,
        "node_sample_size": 10,

        "EOS_CODE": -2,
        "req_min_val": 1,
        "req_max_val": 30,

        "node_min_val": 0,
        "node_max_val": 100,

        "reward": {
            "type": "greedy",
            "greedy": {},
            "single_node_dominant": {
                "rejection_penalty": -2
            },
            "global_dominant": {
                "rejection_penalty": -2
            },
            "reduced_node_usage": {
                "rejection_penalty": -2,
                "use_new_node_penalty": -1
            }
        }
    },

    "tpc": {
        "description": "Transformer Pointer Critic Agent configs.",
        "agent_config": {

            "gamma": 0.99,
            "values_loss_coefficient": 1.0,
            "entropy_coefficient": 0.01,
            "stochastic_action_selection": true,

            "actor": {
                "use_default_initializer": true,
                "num_layers": 1,
                "dim_model": 128,
                "num_heads": 8,
                "inner_layer_dim": 128,
                "encoder_embedding_time_distributed": true,
                "attention_dense_units": 128,
                "logit_clipping_C": 10.0,

                "learning_rate": 0.0001,
                "clipnorm": 1.0
            },

            "critic": {
                "use_default_initializer": true,
                "num_layers": 3,
                "dim_model": 128,
                "num_heads": 8,
                "inner_layer_dim": 512,
                "encoder_embedding_time_distributed": true,
                "last_layer_units": 128,
                "last_layer_activation": "linear",

                "learning_rate": 0.0005,
                "clipnorm": 1.0
            }
        }
    }
}
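
A minimal sketch of tweaking the config before a run. The repo ships its own loader helper in configs/configs.py; the snippet below just uses the standard json module and assumes the on-disk file is strict JSON (the // comment lines in the listing above are explanatory):

import json

# Load the centralized config, adjust a couple of values, and write it back.
with open("configs/ResourceV3.json") as f:
    cfg = json.load(f)

cfg["trainer_config"]["n_iterations"] = 50000     # shorter training run
cfg["env_config"]["reward"]["type"] = "greedy"    # pick the QoS objective

with open("configs/ResourceV3.json", "w") as f:
    json.dump(cfg, f, indent=4)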

Training and Testing

After configuring (see Configuration), run python main.py. It will train and test the agent. If configured, it will also solve the problem instances with "classic" heuristics and store the overall results in the results folder. Upon completion you will see an End... Goodbye! message.

Results

The images below show the performance of the agent. CPLEX (with a time limit of 60 seconds) and several simple heuristics are used as baselines. Green highlighted areas show the configurations where CPLEX was able to obtain optimal solutions.

Greedy Results (figure)

Critical-Aware Results (figure)

Cost-Aware Results (figure)

Useful Links

Pointer Critic

Unit Test and Coverage

python tests/runner.py

or to generate an HTML-based coverage file

coverage run tests/runner.py && coverage html --omit=*/venv/*,*/usr/*,*/lib/*,*/tests/* -i

Potential Improvements and Interesting ToDos

Implement Self-Critic

Instead of using a dedicated network (the Critic) to estimate the state values used as a baseline, use a greedy rollout baseline. The greedy rollout baseline in Attention, Learn to Solve Routing Problems! shows promising results.

How to do it

The easiest (though not the cleanest) way to implement it is to create an agents/baseline_trainer.py file with two environment instances (env and env_baseline) and two agents (agent and agent_baseline); a rough sketch follows the list below.

Then:

  • When we sample a state from env, copy its state into env_baseline.
  • Delete the critic model from agent and agent_baseline, as it is no longer necessary.
  • Copy the network weights from the agent actor into the agent_baseline actor.
  • Set agent_baseline.stochastic_action_selection to False. This way the baseline agent will select actions greedily.
  • The agent will gather rewards from env and agent_baseline will do the same with env_baseline.
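
A very rough sketch of the idea. Every name below (set_state, act, step, update, ...) is a hypothetical placeholder, not the repo's actual env/agent API:

# agents/baseline_trainer.py -- rough sketch of the greedy rollout baseline idea.
import copy

def rollout(agent, environment, state):
    # Run one full episode from `state` and return the accumulated reward.
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                      # stochastic or greedy, per agent config
        state, reward, done = environment.step(action)
        total_reward += reward
    return total_reward

def train_with_greedy_baseline(env, agent, n_iterations):
    env_baseline = copy.deepcopy(env)                  # mirrors the sampled problem instances
    agent_baseline = copy.deepcopy(agent)              # same actor weights; the critic is not needed
    agent_baseline.stochastic_action_selection = False # greedy action selection

    for it in range(n_iterations):
        state = env.reset()                                   # sample a fresh problem instance
        env_baseline.set_state(copy.deepcopy(state))          # mirror it in the baseline env

        sampled_reward = rollout(agent, env, state)                            # stochastic rollout
        greedy_reward = rollout(agent_baseline, env_baseline, copy.deepcopy(state))  # greedy rollout

        advantage = sampled_reward - greedy_reward     # replaces the critic's value estimate
        agent.update(advantage)                        # placeholder for the policy-gradient step

        if it % 100 == 0:                              # periodically refresh the baseline actor
            agent_baseline.actor.set_weights(agent.actor.get_weights())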

Implement Vehicle Routing Problem environment

It would be interesting to see how the network performs on the VRP.

How to do it

  • Look at the KnapsackV2 and ResourceV3 environments in environment/custom and adapt them to the VRP (a hypothetical skeleton follows this list)
  • Add the VRP env to environment/env_factory.py
  • Add the JSON config file into the configs folder.
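
A hypothetical skeleton of what such an environment might look like; the method names mirror a typical RL env and may differ from the actual interface used by KnapsackV2/ResourceV3:

# Hypothetical VRP environment skeleton (not the repo's actual base class).
import numpy as np

class VRPEnvironment:
    def __init__(self, config):
        self.batch_size = config["batch_size"]
        self.num_customers = config.get("num_customers", 20)
        self.vehicle_capacity = config.get("vehicle_capacity", 1.0)

    def reset(self):
        # Depot at index 0, followed by customer coordinates and demands.
        coords = np.random.rand(self.batch_size, self.num_customers + 1, 2).astype(np.float32)
        demands = np.random.rand(self.batch_size, self.num_customers + 1, 1).astype(np.float32)
        demands[:, 0, :] = 0.0                          # the depot has no demand
        self.state = np.concatenate([coords, demands], axis=-1)
        return self.state

    def step(self, action):
        # Apply the pointed-to customer to the current route, update the remaining
        # capacity, and return (next_state, reward, done) -- details omitted.
        raise NotImplementedError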
