tomdbar / eco-dqn Goto Github PK

View Code? Open in Web Editor NEW

79.0 79.0 24.0 33.48 MB

Implementation of ECO-DQN as reported in "Exploratory Combinatorial Optimization with Reinforcement Learning".

Home Page: https://arxiv.org/abs/1909.04063

License: MIT License

Python 100.00%

combinatorial-optimization publication-code pytorch reinforcement-learning

eco-dqn's People

Contributors

Stargazers

Watchers

eco-dqn's Issues

Question about observations invented in your paper

So, according to your paper, you invented 7 observations:

Vertex state, i.e. if v is currently in the solution set, S.
Immediate cut change if vertex state is changed.
Steps since the vertex state was last changed.
Difference of current cut-value from the best observed.
Distance of current solution set from the best observed.
Number of available actions that immediately increase the cut-value.
Steps remaining in the episode.

And you said that (4-6) ensures the extrinsic and intrinsic rewards are Markovian, however I do not get the point.
Could you explain it ?
Thx!

norm.max() when doing batched validation

Shouldn't this line in mpnn.py:

edge_embeddings = F.relu(self.edge_feature_NN(torch.cat([embedded_edges, norm / norm.max()],dim=-1)))

be replaced with something like:
max_values, _ = torch.max(norm, dim=1, keepdim=True)
edge_embeddings = F.relu(self.edge_feature_NN(torch.cat([embedded_edges, norm / max_values], dim=-1))

so that the batches are each divided by the max of their part of the norm tensor?

Training on different sizes of graphs

The current implementation trains using sets of graphs with constant size. How difficult would it be to generalise the training to train on graphs of various sizes (say in a range according to uniform distribution)?

Setting time out

Hello, I would like to set a time out for the evaluation of eco-dqn. How can I do that?

what is the probability for an edge to have negative weight in your dataset

It looks like your data has negative weight edge, is that true? If so, what is the probability for an edge to have negative weight in your dataset, I did not find the probability in your report?

Thank you very much for help.

Best,
Jinye

About computing optimality

Hi, Thanks for sharing this awesome code. I am wondering how can I compute the optimality reported in the paper.

I guess I should read the graph info pickle file to get the optimality answer and compute w.r.t to them?

It would be really helpful if you could share the related code.

Thanks!

Add Observations

Hi , tomdbar .

If I want to Add some observations , which part of your code shell I change.

Work out with different results

Hi, I am working on your code, and I am confused about your data.

For data in _graphs/validation/opts/cuts_BA_20spin_m4_100graphs.pkl
The average value is 17.7, but it seems that this is not the max cut of graph.

I solved the max-cut over these graph using cplex and got result with 46.69.
What does the data in _graphs/validation/opts/cuts_BA_20spin_m4_100graphs.pkl mean? The best cut - init cut?

The code below is the max-cut solver


import re
import random
import networkx as nx
import dgl
import cplex
import torch

import numpy as np

def max_cut_solve(graphs):

    prob = cplex.Cplex()
    prob.set_log_stream(None)
    prob.set_error_stream(None)
    prob.set_warning_stream(None)
    prob.set_results_stream(None)
    prob.set_problem_name("MAX Cut")
    prob.set_problem_type(cplex.Cplex.problem_type.LP)
    prob.objective.set_sense(prob.objective.sense.maximize)
    # names: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    n = graphs.number_of_nodes()
    edges = [(i, j) if i > j else (j, i) for i, j in zip(graphs.edges()[0].numpy(), graphs.edges()[1].numpy())]


    names = []
    w_obj = []
    # edge ij, i > j
    for i in range(n):
        for j in range(i):
            names.append(str(i)+str(j))
            w_obj.append(int((i, j) in edges))



    low_bnd = [0 for x in range(n*(n-1)//2)]
    upr_bnd = [1 for x in range(n*(n-1)//2)]

    prob.variables.add(names=names, obj=w_obj, lb=low_bnd, ub=upr_bnd)
    all_int = [(var, prob.variables.type.integer) for var in names]
    prob.variables.set_types(all_int)
    constraints = []
    rhs = []
    for i in range(n):
        for j in range(i):
            for k in range(j):
                constraints.append([[str(i)+str(j), str(i)+str(k), str(j)+str(k)], [-1, 1, 1]])
                rhs.append(0)
                constraints.append([[str(i)+str(j), str(i)+str(k), str(j)+str(k)], [-1, -1, -1]])
                rhs.append(-2)


    # for src, dist in zip(graphs.edges()[0].numpy(), graphs.edges()[1].numpy()):
    #     constraints.append([[str(src), str(dist)], [1, 1]])

    constraint_names = ["".join(x[0]) for x in constraints]
    # rhs = [1] * len(constraints)
    constraint_senses = ["G"] * len(constraints)
    prob.linear_constraints.add(names=constraint_names,
                                lin_expr=constraints,
                                senses=constraint_senses,
                                rhs=rhs)
    prob.solve()


    # print(prob.solution.get_values())
    # print(prob.solution.get_values())
    # print(prob.solution.get_values())
    # print(prob.solution)
    cut_edges = [i and j for i, j in zip(prob.solution.get_values(), w_obj)]
    cut_u = []
    cut_v = []
    cnt = 0
    for i in range(n):
        for j in range(i):
            if cut_edges[cnt]:
                cut_u.append(i)
                cut_u.append(j)
                cut_v.append(j)
                cut_v.append(i)
            cnt += 1


    bg = dgl.to_bidirected(graphs)
    tmp = bg.edge_ids(torch.tensor(cut_u), torch.tensor(cut_v))

    # print(bg)
    bg.remove_edges(tmp)
    # print(bg)
    nx_g = bg.to_networkx().to_undirected()

    assignment = [0 for i in range(n)]
    for cc in max(nx.connected_components(nx_g), key=len):
        assignment[cc] = 1
    assignment.append(sum(cut_edges))
    return assignment





# u, v = torch.tensor([0, 0, 0, 1]), torch.tensor([1, 2, 3, 3])
# g = dgl.graph((u, v))
# print(g)
# print(max_cut_solve(g))

Difference between testing during training and testing after training

Looking at the implementation I see that testing during training to produce the learning curve is implemented in a different way compared to testing after training. Are there any technical differences between the two implementations?

tomdbar / eco-dqn Goto Github PK

eco-dqn's People

Contributors

Stargazers

Watchers

Forkers

eco-dqn's Issues

Question about observations invented in your paper

norm.max() when doing batched validation

Training on different sizes of graphs

Setting time out

what is the probability for an edge to have negative weight in your dataset

About computing optimality

Add Observations

Work out with different results

Difference between testing during training and testing after training

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent