This project is a fork of cpwan/rlor.

Reinforcement learning for operation research problems with OpenAI Gym and CleanRL

Home Page: https://cpwan.github.io/RLOR/

License: Other

RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

1️⃣ First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

⚡ Speeds up training of the Attention Model by 8 times (25 hours → 3 hours)

🔎 A flexible framework for developing models, algorithms, environments, and search methods for operation research

News

  • 13/04/2023: We release a web demo on Hugging Face 🤗!
  • 24/03/2023: We release our paper on arXiv!
  • 20/03/2023: We release a Jupyter Lab demo and pretrained checkpoints!
  • 10/03/2023: We release our codebase!

Demo

We provide inference demos as Colab notebooks:

Environment   Search         Demo
TSP           Greedy         Open In Colab
CVRP          Multi-Greedy   Open In Colab

Installation

Conda

conda env create -n <env name> -f environment.yml
# The environment.yml was generated from
# conda env export --no-builds > environment.yml

It can take a few minutes.
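Once the environment is created, activate it with conda activate <env name> before running the commands below.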

Optional dependency

wandb

Refer to their quick start guide for installation.

File structures

All the major implementations are under the rlor folder.

./rlor
├── envs
│   ├── tsp_data.py # load pre-generated data for evaluation
│   ├── tsp_vector_env.py # define the (vectorized) gym environment
│   ├── cvrp_data.py 
│   └── cvrp_vector_env.py 
├── models
│   ├── attention_model_wrapper.py # wraps the refactored attention model for CleanRL (see the interface sketch below)
│   └── nets # contains the refactored attention model
└── ppo_or.py # implementation of PPO with the attention model for operation research problems
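
For orientation, here is a minimal sketch of the interface a CleanRL-compatible agent exposes (get_value / get_action_and_value). It uses a toy placeholder network; the repo's actual models/attention_model_wrapper.py wraps the refactored attention model instead.

import torch
import torch.nn as nn
from torch.distributions import Categorical

class ToyAgent(nn.Module):
    # Placeholder standing in for models/attention_model_wrapper.py; the real
    # wrapper uses the refactored attention model, not a single linear encoder.
    def __init__(self, embed_dim=128, n_nodes=50):
        super().__init__()
        self.encoder = nn.Linear(2, embed_dim)      # node coordinates -> embeddings
        self.actor = nn.Linear(embed_dim, n_nodes)  # logits over the next node to visit
        self.critic = nn.Linear(embed_dim, 1)       # state-value estimate

    def get_value(self, obs):
        return self.critic(self.encoder(obs).mean(dim=1))

    def get_action_and_value(self, obs, action=None):
        h = self.encoder(obs).mean(dim=1)           # obs: (batch, n_nodes, 2)
        dist = Categorical(logits=self.actor(h))
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), dist.entropy(), self.critic(h)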

The ppo_or.py script was modified from cleanrl/ppo.py. To see what changed, use diff:

# apt install diffutils   # provides the diff command (often preinstalled)
diff --color ppo.py ppo_or.py

Training OR model with PPO

TSP

python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp

CVRP

python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
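
For reference, here is a minimal sketch (not taken from the repo) of how the --env-id and --env-entry-point values above could be wired into OpenAI Gym's registry; any constructor arguments the environments may require are omitted.

import gym
from gym.envs.registration import register

register(
    id="tsp-v0",                                     # matches --env-id
    entry_point="envs.tsp_vector_env:TSPVectorEnv",  # matches --env-entry-point
)

env = gym.make("tsp-v0")
obs = env.reset()  # return value depends on the Gym version pinned in environment.yml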

Enable WandB

python ppo_or.py ... --track

Add the --track argument to enable experiment tracking with WandB.
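
Under the hood, CleanRL-style scripts call wandb.init when --track is set. The sketch below shows the general pattern with placeholder project and run names; it is not the exact code in ppo_or.py.

import wandb

run = wandb.init(
    project="rlor",                # placeholder project name
    name="tsp-v0__ppo_or__seed1",  # placeholder run name
    config={"env_id": "tsp-v0", "num_steps": 51, "problem": "tsp"},  # example hyperparameters
    sync_tensorboard=True,         # mirror TensorBoard scalars logged during training
    save_code=True,
)
run.finish()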

Where is the TSP data?

It can be generated from the official repo of the attention-learn-to-route paper. You may modify ./envs/tsp_data.py to update the data path accordingly.
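
As a rough illustration, assuming the attention-learn-to-route format of a pickled list of coordinate lists and a hypothetical file name, the evaluation data can be inspected like this:

import pickle

# Hypothetical path; point ./envs/tsp_data.py at the file you generated.
path = "data/tsp/tsp50_test_seed1234.pkl"
with open(path, "rb") as f:
    dataset = pickle.load(f)

print(len(dataset), "instances;", len(dataset[0]), "nodes in the first instance")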

Acknowledgements

The neural network model is refactored and developed from Attention, Learn to Solve Routing Problems!.

The idea of multiple-trajectory training/inference is from POMO: Policy Optimization with Multiple Optima for Reinforcement Learning.

The RL environments are defined with OpenAI Gym.

The PPO algorithm implementation is based on CleanRL.

