thiagopbueno / tf-plan Goto Github PK

View Code? Open in Web Editor NEW

15.0 4.0 6.0 304 KB

Planning through backpropagation using TensorFlow.

Home Page: https://tf-plan.readthedocs.io/

License: GNU General Public License v3.0

Python 99.65% JavaScript 0.06% Makefile 0.29%

planning tensorflow rddl

tf-plan's Introduction

tf-plan

Planning via gradient-based optimization in continuous MDPs using TensorFlow.

tf-plan is an implementation based on the NIPS 2017 paper:

Wu Ga, Buser Say, and Scott Sanner, 2017
Scalable Planning with Tensorflow for Hybrid Nonlinear Domains.
In Advances in Neural Information Processing Systems (pp. 6273-6283).

Quickstart

tf-plan is a Python3.5+ package available in PyPI.

$ pip3 install -U tf-plan

Features

tf-plan solves discrete time MDPs with continuous state-action spaces and deterministic transitions.

The domains/instances are specified using the RDDL language.

It is built on Python3's RDDL toolkit:

pyrddl: RDDL lexer/parser in Python3.
rddl2tf: RDDL2TensorFlow compiler.
rddlgym: A toolkit for working with RDDL domains in Python3.

Please refer to the projects' documentation for further details.

Usage

$ tfplan --help
Usage: tfplan [OPTIONS] [tensorplan|straightline|hindsight] RDDL

  Planning via gradient-based optimization in TensorFlow.

  Args:
      RDDL Filename or rddlgym domain/instance id.

Options:
  -b, --batch-size INTEGER        Number of trajectories in a batch.
                                  [default: 128]
  -hr, --horizon INTEGER          Number of timesteps.  [default: 40]
  -e, --epochs INTEGER            Number of training epochs.  [default: 500]
  --optimizer [Adadelta|Adagrad|Adam|GradientDescent|ProximalGradientDescent|ProximalAdagrad|RMSProp]
                                  [default: GradientDescent]
  -lr, --learning-rate FLOAT      Optimizer's learning rate.  [default: 0.001]
  -n, --num-samples INTEGER       Number of runs.  [default: 1]
  --num-workers INTEGER RANGE     Number of worker processes (min=1, max=12).
                                  [default: 1]
  --logdir PATH                   Directory used for logging training
                                  summaries.  [default: /tmp/tfplan/]
  -v, --verbose                   Verbosity flag.
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Examples

Navigation

$ tfplan tensorplan Navigation-v1 -b 32 -hr 20 -e 200 --optimizer RMSProp -lr 0.05
    location(x)  location(y)   move(x)   move(y)  distance(z1)  distance(z2)  deceleration(z1)  deceleration(z2)     reward  done
0      0.000000     1.000000  0.997812  0.251873      6.103278      2.500000          0.998212          0.905148 -11.313708   0.0
1      0.901553     1.227575  0.997849  0.928313      5.244620      1.870730          0.995207          0.808415 -10.526089   0.0
2      1.704362     1.974440  0.997174  0.994867      4.152069      1.045723          0.983263          0.556283  -9.433639   0.0
3      2.249789     2.518605  0.991678  0.996339      3.389630      0.891024          0.960244          0.488901  -8.664492   0.0
4      2.715346     2.986350  0.891563  0.996716      2.740580      1.215423          0.917949          0.622608  -8.005720   0.0
5      3.224894     3.555996  0.421230  0.996869      2.010508      1.812289          0.819746          0.795918  -7.241465   0.0
6      3.499726     4.206403  0.444045  0.996535      1.528732      2.335447          0.705929          0.885625  -6.575032   0.0
7      3.777337     4.829425  0.924151  0.995804      1.266264      2.921140          0.621903          0.941680  -5.935030   0.0
8      4.318551     5.412601  0.984080  0.992210      1.138953      3.710104          0.574958          0.976961  -5.140282   0.0
9      4.871320     5.969937  0.985487  0.985184      1.475559      4.492920          0.690257          0.990931  -4.355447   0.0
10     5.545391     6.643800  0.975556  0.964462      2.212087      5.444489          0.854325          0.997096  -3.402467   0.0
11     6.376412     7.465371  0.940539  0.905565      3.269241      6.612030          0.954476          0.999284  -2.234082   0.0
12     7.273490     8.329092  0.739812  0.695562      4.453168      7.856998          0.988132          0.999839  -0.988905   0.0
13     8.004405     9.016289  0.029843  0.117345      5.424326      8.860193          0.996100          0.999952  -0.016874   0.0
14     8.034130     9.133170 -0.087813 -0.059488      5.538250      8.961620          0.996578          0.999957  -0.137474   0.0
15     7.946621     9.073888  0.021886 -0.044026      5.440866      8.857259          0.996173          0.999952  -0.091152   0.0
16     7.968422     9.030033  0.021429 -0.015050      5.415970      8.843177          0.996062          0.999951  -0.043579   0.0
17     7.989765     9.015043 -0.013826  0.016563      5.415193      8.848604          0.996059          0.999951  -0.018195   0.0
18     7.975994     9.031540 -0.028730  0.037703      5.421383      8.849745          0.996087          0.999951  -0.039636   0.0
19     7.947378     9.069093  0.010496 -0.030616      5.437246      8.854523          0.996157          0.999951  -0.086850   1.0

Documentation

Please refer to https://tf-plan.readthedocs.io/ for the code documentation.

Support

If you are having issues with tf-plan, please let me know at: [email protected].

License

tf-plan is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

tf-plan is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with tf-plan. If not, see http://www.gnu.org/licenses/.

tf-plan's People

Contributors

Stargazers

Watchers

Forkers

miquelramirez ramonpereira afcarl nadia-el siowmeng danielbdias

tf-plan's Issues

Report training losses curves for each evaluation timestep

Add monitor phase in online planning to attempt executing some previouly-planned actions

Add separate sub-parsers for online and offline tfplan script arguments and options

Build straight-line value function ops

Add option to share action meta-parameters across planning scenarios. Share tf.Variables in open-loop policy across parallel planners to jointly optimize a single meta-action sequence in online mode (e.g., straight-line planning).

Add the total reward for each online step to trajectory stats in stochastic planners

Add time profiling for stochastic planners

Add wall clock uptime for each online step to trajectory stats for stochastic planners

Fix sampled noise inputs between online execution steps

Refactor online planner to avoid re-building the RNN dynamic unrolling ops each timestep

Specify the sequence_length parameter to be used in the tf.nn.dynamic_rnn() function.

Refactor simulation modules used in tensorplan and stochastic online tfplan and add it to train package

Report experiment summary

Refactor ActionOptimizer class to define general interface for Planner class

Re-define ActionOptimizer to be mode-agnostic. Might need to accept training hooks and optional arguments for building corresponding optimization and solution graphs.

Remove dependencies to tf-rddlsim package

Add arguments to ActionOptimizer to select and configure inner optimizer

Add hyperparameter options to select and pre-configure TensorFlow optimizer used inside tfplan.train.ActionOptimizer class.

Add logging capabilities for trajectory debugging

Build the hindsight value function ops

Separate in the online mode the computation of the action at current timestep and the independent remaining actions corresponding to the planning batch. Share meta-action variables in the current timestep to approximate the hindsight value function for the current state.

Sample a batch of trajectory scenarios via re-parameterization

Sample a batch of scenarios as inputs to RNNs at the beginning of training epoch and fix the corresponding re-parameterized noise ops between training epochs.

Apply discount factor over rewards for online planning

Create separate click app for tensorplan

Report trajectory results for each trial

Unroll trajectory graph with actions and noise as inputs

Add an RNN with action fluents and reparameterization noise as inputs in order to symbolically approximate the value function for a given start state.

Dump nonfluents in CSV file before running any planner

Unify test/evaluator.py and planners/environment.py modules

Define single tfplan.test.evaluator (mode-agnostic) interface by unifying offline trajectory evaluation and online environment step interaction.

Separate planning horizon and evaluation horizon

Add regularization term over the l2-norm of the policy variables

GPU vs CPU performance

Hi @thiagopbueno,

I'm also working with @ramonpereira and @miquelramirez and I have been trying to run tf-plan in a Linux box with GPUs. However, in our experiments (the same domains as in issue #2 ), it seems that running the planner with tensorflow-gpu installed instead of plain tensorflow takes substantially longer.

In the example running

time tfplan lqr_instance_problem0.rddl -hr 100 -e 1000 -v --viz=generic -b 128 -lr 0.005 -m online

I have two different times at the end of the process, depending on whether I'm running with GPUs or not.
First, with GPUs (either 1 or 3 GPUs does not change the times substantially)

real    193m34.868s
user    405m10.476s
sys     40m27.540s

Whereas the times for running with CPUs are:

real    66m31.023s
user    208m0.132s
sys     55m38.096s

This is really weird to me, as the GPU time is almost 3 times slower than running on CPUs.

I don't know if I would classify this as a bug, I would think this is a call for enhancement. Given that many of the domains one would use RDDL to solve are very complex, it would be great to be able to leverage Tensorflow's GPU speed up to the max.

Issue with tf-plan

Hi @thiagopbueno,

I'm working with @miquelramirez in a project here in Melbourne, and we're using your tf-plan planner to play with some of the linear (and non-linear) domains we have formalized. Actually, we're trying to use some RDDL domains generated/translated from Tarski as input into your planner, and the planner has returned the following error for one of the domains (lqg_nav_1d_001.rddl in the zip file):

Running tf-plan v0.4.5 ...
>> RDDL:            ../rddl/deterministic/lqg_nav_1d_001.rddl
>> Planning mode:   offline
>> Horizon:         10
>> Batch size:      32
>> Training epochs: 1000
>> Learning rate:   0.01

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1626, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes must be equal rank, but are 1 and 2 for 'action_optimizer/trajectory/while/reward/Select' (op: 'Select') with input shapes: [32,1], [32], [32].

We are not sure, but we think that it might be related to the cpfs{} defined in our domains.
We have tried to use other RDDL domains and the planner has worked properly.
If you have time, please try to use the other domains in the zip file.

rddls_from_tarski.zip

Regards.
Ramon F. P.

thiagopbueno / tf-plan Goto Github PK

tf-plan's Introduction

tf-plan

Quickstart

Features

Usage

Examples

Navigation

Documentation

Support

License

tf-plan's People

Contributors

Stargazers

Watchers

Forkers

tf-plan's Issues

Recommend Projects

Recommend Topics

Recommend Org