Giter Site home page Giter Site logo

dqnts's Introduction

DQNTS: Deep Q-Learning Tabu Search

Introduction

In this project, I've tried to face the Max-Mean Dispersion Problem using Tabu Search guided by a deep reinforcement learning algorithm during the dispersion phase. Given a complete graph G(V, E) where each edge has associated a distance or affinity $d_{ij}$, the Max Mean Dispersion Problem is the search of a subset of vertex $M \subset V, |M| >= 2$ which maximize the mean dispersion, calculated as follow:

$$ MeanDispersion(M) = \frac{\sum_{i,j \in M}d_{ij}}{|M|} $$

This problem is known to be strongly NP-hard and has the characteristic that the subset $M$ does not have a fixed size, but can vary between $2$ and $|V|$.

RLTS

In 2020, Nijimbere et al. proposed an approach based on the combination of reinforcement learning and tabu search, named RLTS. The main idea is to use Q-Learning to build a high-quality initial solution, then improving it with a one-flip tabu search. Then Q-Learning algorithm is trained during the search of the solution, by giving a positive or negative reward if the node remain or not in the solution after the tabu search.

DQNTS

My proposed solution is to use a Deep Q-Network to generate the initial solution and tabu search to improve it, hence the name DQNTS. The approach is similar to \cite{nijimbere2020tabu}, the network learns a heuristic to build the initial solution, then uses a one-flip tabu search to improve that solution. The difference is that the DQN is pre-trained, hence there is no need to start with a random solution and I can generate a high-quality initial solution from the beginning.

Network Architecture

The network architecture is based on "Learning combinatorial optimization algorithms over graphs" by Dai et al. The state2tens embedding is done with 4 features extracted from each node, which are:

  • 1 if the node in the solution, 0 otherwise
  • the sum of all edges connected to the node
  • the sum of all edges connected to the node and the solution nodes
  • the sum of all edges connected to the node and the nodes not in the solution

Network training

To train the network, first, we construct a feasible solution using an $\epsilon$-greedy strategy, stopping when no positive rewards are predicted by the network. Then the solution is given to the one-flip tabu search which improves it and returns the best solution. For every node in the initial solution which remains in the final solution after tabu search, a reward of +1 is given, otherwise, the reward is -1. The network has been trained over 10 different instances (MDPIx_35 and MDPIIx_35, $1 <= x <= 10$) for 5001 episodes.

Tabu Search

The tabu search implementation is the same as in \cite{nijimbere2020tabu}, with the only difference in the parameter $\alpha = 100$ instead of $\alpha = 50000$. My implementation couldn't finish even a single iteration in the time limit imposed with the parameter $\alpha$ proposed in the paper. This made me think that my implementation is way slower, but still, I left the same time constraints as the results were good enough.

General Algorithm

DQN is used during the construction of the initial solution: for all the nodes, the network estimates the reward, then all the values estimated get interpolated in the range $[-1,+1]$. All the nodes with a value $>= 0$ are named as "good nodes". Among these "good nodes", a random amount is taken to construct the initial solution. Picking one node at a time would result in a better solution, but this approach was too slow for large graphs. This solution is then processed with a one-flip tabu search until no best solutions are found for $\alpha = 100$ iterations in a row. Now a new initial solution is generated and the process is repeated until the time limit is not violated. Finally, the best solution found is returned.

Result

Results can be seen on the pdf presentation uploaded in this repository under the pdf directory.

dqnts's People

Contributors

lucalumetti avatar

Stargazers

Alessandro Lucantonio avatar  avatar

Watchers

James Cloos avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.