

neural-combinatorial-rl-pytorch

PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning (https://arxiv.org/abs/1611.09940).

I have implemented the basic RL pretraining model with greedy decoding from the paper. An implementation of the supervised learning baseline model is available here. Instead of a critic network, I obtained the TSP results below using an exponential moving average critic; the critic network is currently commented out in the code. From correspondence with a few others, we determined that the exponential moving average critic significantly improves results.
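For reference, here is a minimal sketch of what an exponential moving average baseline looks like. The decay value is an assumption for illustration, not necessarily what this repo uses:

class EMABaseline:
    """Exponential moving average critic: a running mean of the reward,
    used as the REINFORCE baseline in place of a learned critic network."""

    def __init__(self, beta=0.8):
        self.beta = beta    # decay rate (assumed value, for illustration only)
        self.value = None

    def update(self, mean_reward):
        if self.value is None:
            self.value = mean_reward
        else:
            self.value = self.beta * self.value + (1.0 - self.beta) * mean_reward
        return self.value

# The policy-gradient advantage is then: reward - baseline.update(batch_mean_reward)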

During training, my implementation uses a stochastic decoding policy in the pointer network, realized via PyTorch's torch.multinomial(); at test time, it uses beam search for decoding (not yet finished; it currently supports only 1 beam, i.e., greedy decoding).
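As a toy illustration of the two decoding modes (a standalone sketch; shapes and names here are made up and are not the repo's decoder):

import torch

# Fake pointer probabilities: batch of 2, distribution over 5 input positions.
probs = torch.softmax(torch.randn(2, 5), dim=1)

# Training: sample the next pointer index stochastically.
train_idxs = torch.multinomial(probs, num_samples=1).squeeze(1)

# Testing with beam width 1: greedy decoding reduces to an argmax.
test_idxs = probs.argmax(dim=1)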

Currently, there is support for a sorting task and the planar symmetric Euclidean TSP.

See main.sh for an example of how to run the code.

Use the --load_path $LOAD_PATH and --is_train False flags to load a saved model.

To load a saved model and view the pointer network's attention layer, also use the --plot_attention True flag.

Please feel free to notify me if you encounter any errors, or submit a pull request to improve this implementation.

Adding other tasks

This implementation can be extended to support other combinatorial optimization problems. See sorting_task.py and tsp_task.py for examples of how to add one. The key is to provide a dataset class and a reward function that takes in a sample solution, selected by the pointer network from the input, and returns a scalar reward. For the sorting task, the agent receives a reward proportional to the length of the longest strictly increasing subsequence in the decoded output (e.g., [1, 3, 5, 2, 4] -> 3/5 = 0.6).
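Here is a minimal sketch of such a reward, as a standalone re-implementation of the idea rather than the actual code in sorting_task.py:

import torch
from bisect import bisect_left

def sort_reward(solution):
    """Reward for the sorting task: length of the longest strictly
    increasing subsequence of the decoded output, normalized by the
    sequence length, so a perfectly sorted output scores 1.0."""
    seq = solution.tolist()
    tails = []  # tails[k] = smallest tail of any increasing subsequence of length k+1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails) / len(seq)

assert abs(sort_reward(torch.tensor([1, 3, 5, 2, 4])) - 0.6) < 1e-9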

Dependencies

  • Python=3.6 (should be OK with v >= 3.4)
  • PyTorch=0.2 and 0.3
  • tqdm
  • matplotlib
  • tensorboard_logger

PyTorch 0.4 compatibility is available on branch pytorch-0.4.

TSP Results

Results for 1 random seed over 50 epochs (each epoch is 10,000 batches of size 128). After each epoch, I validated performance on 1,000 held-out graphs. I used the same hyperparameters as the paper, as can be seen in main.sh. The dashed line shows the value reported in Table 2 of Bello et al. for comparison. The training-reward plot uses a log-scale x-axis to show how quickly the tour length drops early on.

[Figures: TSP 20 train and validation reward curves; TSP 50 train and validation reward curves]

Sort Results

I trained a model on sort10 for 4 epochs of 1,000,000 randomly generated samples and tested it on a dataset of 10,000 samples. I then tested the same model on sort15 and sort20 to assess its generalization capabilities.

Test results on 10,000 samples (a reward of 1.0 means the network perfectly sorted the input):

task     average reward   variance
sort10   0.9966           0.0005
sort15   0.7484           0.0177
sort20   0.5586           0.0060

Example prediction on sort10:

input: [4, 7, 5, 0, 3, 2, 6, 8, 9, 1]
output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Attention visualization

Plot the pointer network's attention layer with the --plot_attention True argument.

TODO

  • Add RL pretraining-Sampling
  • Add RL pretraining-Active Search
  • Active Search
  • Asynchronous training a la A3C
  • Refactor USE_CUDA variable
  • Finish implementing beam search decoding to support > 1 beam
  • Add support for variable length inputs

Acknowledgements

Special thanks to the repos devsisters/neural-combinatorial-rl-tensorflow and MaximumEntropy/Seq2Seq-PyTorch for getting me started, and to @ricgama for figuring out that weird bug with clone().


neural-combinatorial-rl-pytorch's Issues

How to run the program

This is a paper I was interested in, and I am very grateful for your program. Sorry for the rudimentary question: I have read the README, but I don't know how to run the program because I have no knowledge of networking, shell commands, etc. Can you please tell me how to train and test the program?

variable length inputs

Hi, thanks a lot for your code. I haven't had a chance to look through it in detail yet, but it seems like a pretty solid implementation. A question, though: in your to-do list, I see the item:

"Add support for variable length inputs"

I just wanted to clarify what this means, since in general a pointer net can deal with arbitrary-size inputs. Which inputs does this refer to that apparently have a fixed length?

Thanks!

n_epochs > 1: multinomial sampling error, probs < 0

Hi, for both the sorting task and the TSP task, when I run with any n_epochs > 1, toward the beginning of the 2nd epoch I get the following error in the stochastic decoder in neural_combinatorial_rl.py, related to multinomial distribution sampling. When I print out "probs", I see that they very quickly collapse to one-hot distributions (all 0's and a single 1), and on the following iteration they are all NaNs. Any idea what's going on here, or did you just avoid this by running a single epoch with many iterations?

Thanks!

(using the pytorch-0.4 branch)

(below is shown for the sort_10 task)

probs
tensor([[0.0000, 0.0000, 0.0000, 0.5416, 0.4584, 0.0000],
[0.0000, 0.0000, 0.5079, 0.4921, 0.0000, 0.0000],
[0.0000, 0.5320, 0.0000, 0.0000, 0.0000, 0.4680],
[0.0000, 0.0000, 0.0000, 0.5146, 0.4854, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.5275, 0.4725],
[0.0000, 0.0000, 0.4794, 0.0000, 0.0000, 0.5206],
[0.4945, 0.5055, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.5195, 0.0000, 0.4805, 0.0000],
[0.5135, 0.0000, 0.0000, 0.0000, 0.4865, 0.0000],
[0.4877, 0.0000, 0.5123, 0.0000, 0.0000, 0.0000],
[0.5207, 0.0000, 0.0000, 0.0000, 0.0000, 0.4793],
[0.0000, 0.5231, 0.0000, 0.0000, 0.0000, 0.4769]],
grad_fn=)
probs
tensor([[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 1.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 1.],
[0., 0., 1., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 1.]], grad_fn=)
probs
tensor([[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan]], grad_fn=)

... leads to...

Traceback (most recent call last):
  File "trainer.py", line 295, in <module>
    R, probs, actions, actions_idxs = model(bat)
  File "C:\Users\me\Anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\me\Desktop\neural-combinatorial-rl-pytorch\neural_combinatorial_rl.py", line 514, in forward
    probs_, action_idxs = self.actor_net(embedded_inputs)
  File "C:\Users\me\Anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\me\Desktop\neural-combinatorial-rl-pytorch\neural_combinatorial_rl.py", line 374, in forward
    enc_h)
  File "C:\Users\me\Anaconda3\envs\myenv\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\me\Desktop\neural-combinatorial-rl-pytorch\neural_combinatorial_rl.py", line 189, in forward
    selections)
  File "C:\Users\me\Desktop\neural-combinatorial-rl-pytorch\neural_combinatorial_rl.py", line 255, in decode_stochastic
    idxs = c.sample()
  File "C:\Users\me\Anaconda3\envs\myenv\lib\site-packages\torch\distributions\categorical.py", line 90, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at c:\programdata\miniconda3\conda-bld\pytorch_1533090623466\work\aten\src\th\generic/THTensorRandom.cpp:407
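For what it's worth, a generic guard like the following keeps torch.multinomial from crashing on degenerate rows. This is a sketch assuming a recent PyTorch (torch.nan_to_num), not a fix from this repo; the usual cure for collapsing or NaN probabilities is gradient clipping or a lower learning rate, and this only stops the sampler from raising:

import torch

def safe_sample(probs, eps=1e-8):
    # Replace NaNs, forbid negative entries, and avoid all-zero rows,
    # then renormalize before sampling one index per row.
    probs = torch.nan_to_num(probs, nan=0.0).clamp(min=0.0) + eps
    probs = probs / probs.sum(dim=1, keepdim=True)
    return torch.multinomial(probs, 1).squeeze(1)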

request returns 400

When running the download_google_drive_file function, the request returns a 400 error.

RuntimeError

Hi, when I run
python3 trainer.py
I get the following error:

Traceback (most recent call last):
  File "trainer.py", line 203, in <module>
    R, probs, actions, actions_idxs = model(bat)
  File "/home/maroon/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/maroon/PycharmProjects/neural-combinatorial-rl-pytorch/neural_combinatorial_rl.py", line 539, in forward
    R = self.objective_fn(actions, self.use_cuda)
  File "/home/maroon/PycharmProjects/neural-combinatorial-rl-pytorch/sorting_task.py", line 49, in reward
    current += res.float()
RuntimeError: invalid argument 1: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at /pytorch/torch/lib/TH/generic/THTensor.c:299

How can I solve it? Thank you.

Subset selection

Is this implementation usable for tasks where not all of the input is selected, but only some subset of it? I have seen a partial implementation (such as the finish_symbol variable), but I believe it's not finished?

plot_reward.py: reward_csv?

Hi,
plot_reward.py looks like the script you used to create the figures in the README showing the average TSP tour length as a function of training progress.
In that script, there are two required arguments (paths to two things):

  • data
  • reward_csv

But in trainer.py I don't see these being saved (for example, I don't see any pandas import or CSV saving). So perhaps you had another script that parses the logfiles and produces this reward_csv? If so, could you provide it?

Thanks!

beam search

The To Do mentions "Finish implementing beam search decoding to support > 1 beam".

It looks like you have various beam-width variables in different places in the code. Is your note there because:

  1. you implemented beam search, but it didn't seem to work as well when width > 1 and you wanted to investigate further, or
  2. there are still some places in the code that would need to be modified before beam search can be used?

If 2), then which places would need to be modified?

Thanks!

a question

Hi,

first thank you for your repo!

I just want to ask: is there a source where I can learn more about the architecture of your work, besides the paper mentioned? Maybe you wrote a blog post about your algorithm or something like that.

Kind regards,
Paddy

update:

Also, could you please tell me how I can reuse the model I've trained? I mean, I finished training and now want to apply the model to sorting a list, e.g.:

python model.py [3,4,6,5,1,2]

And the output would be a sorted list:

[1,2,3,4,5,6]

mask

Hello @pemami4911,

The problem really was with the mask. I've fixed it, and the network has started to learn. My Decoder is now:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# NB: Var is a helper from my notebook (not shown here) that creates a
# trainable parameter of the given shape.

class Decoder(nn.Module):
    def __init__(self, features_dim, hidden_size, n_layers=1):
        super(Decoder, self).__init__()

        self.W1 = Var(hidden_size, hidden_size)
        self.W2 = Var(hidden_size, hidden_size)
        self.b2 = Var(hidden_size)
        self.V = Var(hidden_size)

        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True, num_layers=n_layers)

    def forward(self, input, hidden, enc_outputs, mask, prev_idxs):
        # Additive (Bahdanau-style) attention over the encoder outputs
        w1e = torch.matmul(enc_outputs, self.W1)
        w2h = (torch.matmul(hidden[0][-1], self.W2) + self.b2).unsqueeze(1)

        u = F.tanh(w1e + w2h)
        a = torch.matmul(u, self.V)

        a, mask = self.apply_mask(a, mask, prev_idxs)
        a = F.softmax(a)
        res, hidden = self.lstm(input, hidden)
        return a, hidden, mask

    def apply_mask(self, attentions, mask, prev_idxs):
        if mask is None:
            mask = Variable(torch.ones(attentions.size())).cuda()

        maskk = mask.clone()

        if prev_idxs is not None:
            # Zero out positions that have already been selected
            for i, j in zip(range(attentions.size(0)), prev_idxs.data):
                maskk[i, j[0]] = 0

            # maskk.log() is -inf at masked positions, so the subsequent
            # softmax assigns them zero probability
            masked = maskk * attentions + maskk.log()
        else:
            masked = attentions

        return masked, maskk

For n=10, I'm obtaining the following while training the AC version:


Step 0
Average train model loss:  -81.07959747314453
Average train critic loss:  29.443866729736328
Average train pred-reward:  -0.08028505742549896
Average train reward:  5.2866997718811035
Average loss:  -15.106804847717285
------------------------
Step  1000
Average train model loss:  -0.7814755792869255
Average train critic loss:  0.7740849611759186
Average train pred-reward:  4.219553744537756
Average train reward:  4.272005982398987
Average loss:  -6.201663847446442
------------------------

(...)

Step  19000
Average train model loss:  -0.06441724334075116
Average train critic loss:  0.1361817416474223
Average train pred-reward:  3.0573679950237276
Average train reward:  3.0583059163093567
Average loss:  -1.5689961900115013

I've checked that the returned solutions are all feasible, so it seems that it is really converging.
I will clean up my code, and I hope that by the end of the week I will have some training-history plots and test-set validation.
If you like, I can share my notebook.

Best regards

Lots of deprecation warnings with a newer version of PyTorch. Do you have any ideas?

Hi, thank you for your repo! It's an awesome implementation and helps me a lot. However, when running it with a newer version of PyTorch, it raises lots of deprecation warnings like this:

IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.

I searched around the project but found uint8 nowhere. Do you have any idea how to fix this? Or do I have to run it on version 0.4? (I currently use 1.2, since the API hasn't changed much since 0.4.)
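The uint8 likely comes implicitly from comparison ops or masks created under old PyTorch semantics rather than from a literal uint8 in the source. Casting the mask to bool at the point of indexing is the generic remedy; a sketch of the pattern, not a specific patch for this repo:

import torch

x = torch.randn(4)
byte_mask = (x > 0).to(torch.uint8)  # old-style ByteTensor mask

y = x[byte_mask]         # triggers the deprecation warning on PyTorch >= 1.2
y = x[byte_mask.bool()]  # casting to bool silences it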

TSP_20 task reward function

Hi, thanks for sharing your code.

In the file "tsp_task.py", I see the TSP reward defined in the usual way as the sum of distances between consecutive vertices in the cylce. But it looks like you also started to generalize this, or maybe had to manually tune things a bit for tsp_20? I see this:

# For TSP_20 - map to a number between 0 and 1
# min_len = 3.5
# max_len = 10.
# TODO: generalize this for any TSP size
#tour_len = -0.1538*tour_len + 1.538 
#tour_len[tour_len < 0.] = 0.
return tour_len

So for the results you showed for the tsp_20 and tsp_50 tasks, did you use the usual sum of distances, which would run as-is in the current uncommented code, or did you find it helped to apply the affine-then-clip operations you have commented out? Are those values derived theoretically or empirically?

Any other thoughts on generalizing the reward to work with arbitrary tour lengths?

Thanks!
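For reference, the plain sum-of-distances reward discussed above amounts to something like this, sketched for a single (n, 2) tour in visit order (the actual tsp_task.py reward presumably operates on batches):

import torch

def tour_length(tour):
    # tour: (n, 2) tensor of city coordinates in visit order.
    # Pair each city with its successor, wrapping around so the
    # closing edge back to the start is included.
    nxt = torch.cat((tour[1:], tour[:1]), dim=0)
    return (tour - nxt).norm(dim=1).sum()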

Error when executing main.sh

When I ran main.sh (bash ./main.sh), I got this error.

Validating

0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./trainer.py", line 297, in <module>
    R, probs, actions, action_idxs = model(bat)
  File "/home/riko/anaconda3/envs/pytorch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/riko/notebook/neural-combinatorial-rl-pytorch/neural-combinatorial-rl-pytorch/neural_combinatorial_rl.py", line 512, in forward
    probs_, action_idxs = self.actor_net(embedded_inputs)
  File "/home/riko/anaconda3/envs/pytorch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/riko/notebook/neural-combinatorial-rl-pytorch/neural-combinatorial-rl-pytorch/neural_combinatorial_rl.py", line 372, in forward
    enc_h)
  File "/home/riko/anaconda3/envs/pytorch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/riko/notebook/neural-combinatorial-rl-pytorch/neural-combinatorial-rl-pytorch/neural_combinatorial_rl.py", line 219, in forward
    embedded_inputs, beam, batch_size, n_best, i)
  File "/home/riko/notebook/neural-combinatorial-rl-pytorch/neural-combinatorial-rl-pytorch/neural_combinatorial_rl.py", line 295, in decode_beam
    if all_idxs.size(0) > 1:
RuntimeError: dimension specified as 0 but tensor has no dimensions

I changed the argument on line 253 of neural_combinatorial_rl.py to 1.
Before: idxs = probs.multinomial().squeeze(1)
After: idxs = probs.multinomial(1).squeeze(1)

In addition, I removed the $ mark from RANDOM_SEED=$1 because there was also an error in main.sh.
