nrontsis / pilco


Bayesian Reinforcement Learning in Tensorflow

License: MIT License

Python 58.49% MATLAB 41.51%

reinforcement-learning model-based-rl gaussian-processes tensorflow machine-learning

pilco's Introduction

Probabilistic Inference for Learning Control (PILCO)

A modern & clean implementation of the PILCO Algorithm in TensorFlow v2.

Unlike PILCO's original implementation, which was written as a self-contained MATLAB package, this repository aims to provide a clean implementation by making heavy use of modern machine learning libraries.

In particular, we use TensorFlow v2 to avoid the need for hardcoded gradients and scale to GPU architectures. Moreover, we use GPflow v2 for Gaussian Process Regression.

The core functionality is tested against the original MATLAB implementation.

Example of usage

Before using PILCO you have to install it by running:

git clone https://github.com/nrontsis/PILCO && cd PILCO
python setup.py develop

It is recommended to install everything in a fresh conda environment with python>=3.7.

The examples included in this repo use OpenAI gym 0.15.3 and mujoco-py 2.0.2.7. These dependencies should be installed manually. Then, you can run one of the examples as follows:

python examples/inverted_pendulum.py

Example Extension: Safe PILCO

As an example of the extensibility of the framework, we include in the folder safe_pilco_extension an extension of the standard PILCO algorithm that takes safety constraints (defined on the environment's state space) into account as in https://arxiv.org/abs/1712.05556. The safe_swimmer_run.py and safe_cars_run.py in the examples folder demonstrate the use of this extension.

Credits:

The following people have been involved in the development of this package:

References

See the following publications for a description of the algorithm: 1, 2, 3


pilco's Issues

Continuous Integration

Use Travis for continuous integration.

Cost for trajectory following

Hi, I'm trying to use PILCO for path tracking in my graduation thesis, but so far the control results are not ideal.
I think they could be improved with a reward designed for trajectory following.
Do you know an easy way to do this?
Thanks a lot for the help

Stefan

Reference for predicting with uncertain inputs with SMGPR

Do you have a reference available for the derivation of prediction with uncertain inputs with a sparse GP? I haven't been able to find one anywhere.

Initialisation of ExponentialReward: eye(.) vs ones(.)

The initialisation of the weights W of the exponential reward has a default initialisation of np.ones(.), as defined in the following line:

PILCO/pilco/rewards.py

Line 25 in 6ebcc7d

self.W = Param(np.ones((state_dim, state_dim)), trainable=False)

But when I evaluate the mean reward at points on a circle centered on the target,

import matplotlib.pyplot as plt
from pilco.rewards import ExponentialReward
import tensorflow as tf
import numpy as np
with tf.Session(graph=tf.Graph()) as sess:
    R = ExponentialReward(state_dim=2, t=np.array([0.,0.]),W=np.ones((2,2)))
    muRs = lambda th:R.compute_reward(np.array([np.sin(th),np.cos(th)]), s=np.eye(2))
    left = [i/36*2*np.pi for i in range(36)]
    height = np.array([muRs(th)[0].eval()[0][0] for th in left])
plt.xlabel('rad')
plt.ylabel('mean of reward')
plt.plot(left, height,label='ones')
plt.legend()

the score is not constant, as can be observed in the following plot:

So I think np.eye(state_dim) is better, i.e. having

self.W = Param(np.eye(state_dim), trainable=False)

which gives the following score across theta:
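
The effect of W can also be checked without TensorFlow. The sketch below assumes the standard closed form for the expectation of exp(-(x-t)^T W (x-t) / 2) under x ~ N(m, s); the helper name expected_exp_reward is hypothetical and is not part of the pilco package.

```python
import numpy as np

def expected_exp_reward(m, s, t, W):
    """E[exp(-0.5 (x-t)^T W (x-t))] for x ~ N(m, s) (hypothetical helper)."""
    A = np.eye(len(m)) + s @ W                   # I + sW
    diff = m - t
    quad = diff @ W @ np.linalg.solve(A, diff)   # (m-t)^T W (I+sW)^{-1} (m-t)
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(A))

# Same setup as the issue: 36 means on the unit circle, target at the origin.
thetas = np.arange(36) / 36 * 2 * np.pi
means = np.stack([np.sin(thetas), np.cos(thetas)], axis=1)
target, s = np.zeros(2), np.eye(2)

reward_ones = np.array([expected_exp_reward(m, s, target, np.ones((2, 2))) for m in means])
reward_eye = np.array([expected_exp_reward(m, s, target, np.eye(2)) for m in means])

# Spread of the mean reward around the circle for each choice of W.
spread_ones = reward_ones.max() - reward_ones.min()
spread_eye = reward_eye.max() - reward_eye.min()
print(spread_ones, spread_eye)
```

With W = ones the quadratic form only depends on the sum of the two coordinates, which would explain why the reward varies with theta, whereas W = eye depends on the squared distance to the target.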

Extra noise in controller's input covariance

I've noticed that the original implementation of PILCO adds an extra variance, equal to the noise of the underlying GPs, in the controller's input. This is done in this line.

Currently, we don't do this, so I disabled the relevant line for the unit tests to pass.

@kyr-pol what do you think? Should we add this to our code?

mujoco-py needed

Hi,
does anyone else have the problem that mujoco-py is missing? Installing it is a little bit annoying because of the registration and fees. I'm working with Python 3 and Ubuntu 18.04.
Best regards,
ilja_stas

Bugs in model update?

Hello,
I found a strange behavior in model optimization of mgpr.py.

(1)
Is best_params["k_lengthscales"] = model.kernel.lengthscales
best_params["lengthscales"] = model.kernel.lengthscales ?

(2)
It seems that best_params is updated whenever optimizer.minimize(model.training_loss, model.trainable_variables) is executed. This means that the values in best_params always change, regardless of whether loss < best_loss is True or False.
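
One possible explanation for behaviour (2) is aliasing: if the dictionary stores the live parameter object rather than a copy of its values, every later optimiser step changes what the dictionary appears to contain. The sketch below illustrates this with a hypothetical FakeParameter class that stands in for a trainable parameter; it is an assumption about the cause, not a confirmed diagnosis of mgpr.py.

```python
import numpy as np

class FakeParameter:
    """Hypothetical stand-in for a trainable parameter that is updated in place."""

    def __init__(self, value):
        self._value = np.array(value, dtype=float)

    def assign(self, value):
        self._value[...] = value      # in-place update, like an optimiser step

    def numpy(self):
        return self._value.copy()     # independent snapshot of the current values

lengthscales = FakeParameter([1.0, 1.0])

aliased = {"lengthscales": lengthscales}            # stores the live object
snapshot = {"lengthscales": lengthscales.numpy()}   # stores a copy of the values

lengthscales.assign([5.0, 7.0])                     # a later optimisation step

print(aliased["lengthscales"].numpy())   # follows the latest values
print(snapshot["lengthscales"])          # keeps the values at the time of saving
```

Under this assumption, saving a copy of the values (rather than the parameter object) only when loss < best_loss would keep the best parameters intact.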

My environment is;
Python 3.7.12
tensorflow 2.9.1
gpflow 2.5.2
gym 0.18.0

Thanks,

Improve numerical stability

Currently, we get Cholesky decomposition failures, especially in the RBF Controller (see #5).

We could try the following:

Constrain the noise (jitter) parameters to a positive, sufficiently large value, e.g. 1e-4, via gpflow transforms.
Initialise the RBF inputs/outputs better. In my experience, the current np.random.rand initialisation can lead to very bad conditioning. One option would be to use the same initialisation as the MATLAB implementation. E.g. in the cart-pole example they have:

policy.p.inputs = gaussian(mm(poli), ss(poli,poli), nc)';  % init. location of 
                                                           % basis functions
policy.p.targets = 0.1*randn(nc, length(policy.maxU));  % init. policy targets 
                                                        % (close to zero)
policy.p.hyp = log([1 1 1 1 0.7 0.7 0.7 0.7 1 0.01]');  % initialize policy
                                                        % hyper-parameters

Notice that they also initialise the hyper-parameters and manually set the cost and other settings, so a user might need to do the same for the algorithm to work.
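
The idea behind the first suggestion (keeping a minimum amount of noise on the diagonal) can be illustrated in plain NumPy. The sketch below does not use GPflow transforms; it is a swapped-in, simpler technique that retries the Cholesky factorisation with increasing jitter. The helper cholesky_with_jitter and its defaults are hypothetical.

```python
import numpy as np

def cholesky_with_jitter(K, initial_jitter=1e-6, max_tries=5):
    """Cholesky factor of K + jitter * I, growing the jitter tenfold on failure.

    Hypothetical helper, not part of pilco or GPflow.
    Returns the factor and the jitter that was actually used.
    """
    eye = np.eye(K.shape[-1])
    jitter = 0.0
    for attempt in range(max_tries + 1):
        try:
            return np.linalg.cholesky(K + jitter * eye), jitter
        except np.linalg.LinAlgError:
            jitter = initial_jitter * 10.0 ** attempt
    raise np.linalg.LinAlgError("matrix is not positive definite even with jitter")

# A matrix that is very slightly indefinite, as can happen through rounding.
K = np.array([[1.0, 1.0], [1.0, 1.0 - 1e-8]])
L, used_jitter = cholesky_with_jitter(K)
print(used_jitter)
```

A fixed lower bound on the noise, as proposed above, has the same effect without the retry loop, at the cost of always perturbing the model slightly.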

pendulum_swing_up.py example doesn't solve the task

Attached is a plot of the time domain performance of the controller after the 8 episodes. As you can see, it is not close to stabilizing about 0 or +/-2pi.

Cannot run on low-profile GPU

I tried to run the code on various platforms, including a PC, a laptop and embedded systems. There was no problem on my PC with a powerful GPU (RTX 2080). However, with the mediocre GPU on my laptop (Quadro M1000M) and on an embedded system (Jetson TX2), the program constantly complained about "too many resources requested for launch". The complete error message, which was triggered in the function predict_given_factorizations in mgpr.py, is shown below.

F tensorflow/core/kernels/determinant_op_gpu.cu.cc:137] Non-OK-status: CudaLaunchKernel( DeterminantFromPivotedLUKernel<Scalar, false>, config.block_count, config.thread_per_block, 0, device.stream(), config.virtual_thread_count, n, lu_factor.data(), pivots, nullptr, output.data()) status: Internal: too many resources requested for launch

I wonder if anyone encountered this problem before. Any help will be greatly appreciated.

calculate_factorizations question

I'm a little confused by the following line. K has dimensions (target_dim, n, n), where n is the number of training points.
line 71 python
L = tf.cholesky(K + self.noise[:, None, None]*batched_eye)
In the reference paper and implementation, they do moment matching for every target dimension.

line 50 matlab

for i=1:E % compute K and inv(K)
inp = bsxfun(@rdivide,gpmodel.inputs,exp(X(1:D,i)'));
K(:,:,i) = exp(2*X(D+1,i)-maha(inp,inp)/2);
if isfield(gpmodel,'nigp')
L = chol(K(:,:,i) + exp(2*X(D+2,i))*eye(n) + diag(gpmodel.nigp(:,i)))';
else
L = chol(K(:,:,i) + exp(2*X(D+2,i))*eye(n))';
end
iK(:,:,i) = L'\(L\eye(n));
beta(:,i) = L'\(L\gpmodel.targets(:,i));
end

Is finding the inverse of K + noise for all target dimensions at once the same as stacking the inverses computed for each target dimension inside the for loop? I know you tested the code, but I'm still confused and couldn't find any information regarding the TensorFlow Cholesky decomposition and solve for 3-dimensional (batched) tensors.

The same question applies for the rest of the moment matching implementation.
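
Batched linear algebra routines treat the leading dimension as a batch and factorise each (n, n) slice independently; this holds for tf.linalg.cholesky and for np.linalg.cholesky alike. The NumPy sketch below compares the batched call with a MATLAB-style loop. The matrices are random stand-ins for the kernel matrices, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
E, n = 3, 5                                   # target dimensions, training points
A = rng.standard_normal((E, n, n))
K = A @ np.transpose(A, (0, 2, 1))            # one PSD "kernel" matrix per target dimension
noise = np.array([0.1, 0.2, 0.3])             # one noise variance per target dimension
batched_eye = np.broadcast_to(np.eye(n), (E, n, n))

# Batched: a single call factorises every (n, n) slice on its own.
L_batched = np.linalg.cholesky(K + noise[:, None, None] * batched_eye)
iK_batched = np.linalg.inv(K + noise[:, None, None] * batched_eye)

# Loop: the per-dimension computation, as in the MATLAB code.
L_loop = np.stack([np.linalg.cholesky(K[i] + noise[i] * np.eye(n)) for i in range(E)])
iK_loop = np.stack([np.linalg.inv(K[i] + noise[i] * np.eye(n)) for i in range(E)])

print(np.allclose(L_batched, L_loop), np.allclose(iK_batched, iK_loop))
```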

Regards

Test thoroughly against MATLAB's implementation.

This can be done in CI with Octave, oct2py and pytest.

Error with cloudpickle

PILCO for time delay

Hi,
I'm thinking of using PILCO for a system with a significant time delay (called dead time in German).
Do you know of any publications on Gaussian Processes for such a system?
Do you think PILCO is able to control a highly nonlinear system with a dozen inputs and a relevant time delay?
Thanks,
iljastas

[BUG] mountain_car.py fails due to missing import

from gpflow import set_trainable
seems to be missing in examples/mountain_car.py which results in NameError on line 52

AttributeError: 'Parameter' object has no attribute 'value'

When I run mountain_car.py from the examples folder, it works fine at the beginning, but after a while it reports an error. The error is

model GP1 is right
Traceback (most recent call last):
  File "C:/Users/xxwan/.mujoco/mujoco-py/PILCO-master/examples/mountain_car.py", line 57, in <module>
    pilco.optimize_models()
  File "c:\users\xxwan\desktop\tendontrack_pilco\pilco\models\pilco.py", line 62, in optimize_models
    self.mgpr.optimize(restarts=restarts)
  File "c:\users\xxwan\desktop\tendontrack_pilco\pilco\models\mgpr.py", line 89, in optimize
    "lengthscales": model.kernel.lengthscales.value(),
AttributeError: 'Parameter' object has no attribute 'value'

How can I solve this problem? Thanks.

Resetting different components of PILCO: Models, controller and reward function

In many cases the user might want to re-initialise some component, while keeping the rest as they are, for example:

Restarting the model or the controller optimisation, to avoid getting trapped in local minima. We might want to restart one of the two (and keep the other intact) or restart both and make one or multiple restarts and keep the most promising version etc.
By changing the reward function (while keeping the same model) and optimising the controller we can use previous episodes to solve tasks with new goals (possible approach for gym's Reacher-v2 environment, or for transfer learning demos).

Since this might interfere with the tensorflow graph, (see https://github.com/GPflow/GPflow/issues/756 and https://github.com/GPflow/GPflow/issues/719) we might want to provide a method that takes care of it cleanly.

Computation of cross-covariance of state and action

From only looking at the docstrings of the relevant functions, I think I noticed a discrepancy to the paper. I am writing this without checking the math in the code so I may be wrong.

V returned in RbfController.compute_action() in controllers.py
corresponds to Cov[x,u]

From backtracking to MGPR.predict_given_factorizations() in models/mgpr.py, I think the docstrings indicate that:

V = cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]

where I call pi the action before squashing

From section 5.5 of the 2015 paper, it says:

V = cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]

Are these expressions equivalent, or have I misread something? Thanks!
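
A small linear-Gaussian example can help compare the two expressions. The sketch below assumes that each stage returns inv(input covariance) times the input-output covariance, which is how the third output is described in a docstring quoted elsewhere on this page. The linear maps A and B are hypothetical stand-ins for the policy and the squashing step.

```python
import numpy as np

rng = np.random.default_rng(1)
S = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]])  # cov[x, x]
A = rng.standard_normal((2, 3))   # pi = A x  (hypothetical linear pre-squash policy)
B = rng.standard_normal((2, 2))   # u = B pi  (hypothetical linear "squashing")

cov_x_pi = S @ A.T
cov_pi_pi = A @ S @ A.T
cov_pi_u = cov_pi_pi @ B.T
cov_x_u = S @ A.T @ B.T           # ground truth for this linear chain

# Paper-style expression: cov[x,pi] cov[pi,pi]^{-1} cov[pi,u] gives cov[x,u].
paper = cov_x_pi @ np.linalg.solve(cov_pi_pi, cov_pi_u)

# "inv(s) times covariance" convention: each stage returns V = inv(input cov) @ cov.
V1 = np.linalg.solve(S, cov_x_pi)           # inv(cov[x,x]) cov[x,pi]
V2 = np.linalg.solve(cov_pi_pi, cov_pi_u)   # inv(cov[pi,pi]) cov[pi,u]
chained = V1 @ V2                           # inv(cov[x,x]) cov[x,u]

print(np.allclose(paper, cov_x_u), np.allclose(S @ chained, cov_x_u))
```

Under that assumption the product of the per-stage outputs differs from the paper's expression only by the leading inv(cov[x,x]) factor, so both describe the same cross-covariance.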

Add a more elaborate example

Potentially acrobot, according to Deisenroth's suggestion.

Fix RBF controller

Please provide an example of using the RBF controller.
In PILCO's 'init' function, neither the controller nor the reward is assigned to the object when it is not 'None'.
Thanks.

Computation time for policy optimization

I find that the computation time for policy optimization gradually increases, and the run is eventually terminated by a TensorFlow ResourceExhaustedError.

Third output of function 'predict_given_factorizations' in mgpr.py

Hi!

I am trying to read the code, but I can't understand what the third output of predict_given_factorizations in mgpr.py is.

It's described as inv(s) * input-output covariance, but can you explain it in more detail, and why
do we need it?

Thanks!

Performance issue in the definition of create_models, pilco/controllers.py(P1)

Hello, I found a performance issue in the definition of create_models in pilco/controllers.py: tf.ones is evaluated repeatedly during program execution, which reduces efficiency. I think the tensor should be created before the loop.

The same issue exists in tf.ones in line 31 and tf.ones in line 19.

Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.

Is squash_sin() right?

Hello,
In squash_sin(), M seems to be E[u_max sin(pi_tilde)], not E[u_max (9/8 sin(pi_tilde) + 1/8 sin(3 pi_tilde))].

Missing \delta^2 in C = max_action * tf.diag( tf.exp(-tf.diag_part(s)/2) * tf.cos(m))

It seems C_ii = max_action_i * (E[X_i*sin(X_i)] - E[X_i]*E[sin(X_i)])
E[X_i*sin(X_i)] = exp(-var(X_i) / 2) * (var(X_i) * cos(mean(X_i)) + mean(X_i) * sin(mean(X_i)))
However, there is no var(X_i) in the formula for C in the code

I think it should be C = max_action * tf.diag(tf.exp(-tf.diag_part(s) / 2) * tf.diag_part(s) * tf.cos(m))
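
The moments in question can be checked numerically in the scalar case with max_action = 1. The sketch below uses Gauss-Hermite quadrature (the helper gaussian_expectation is hypothetical) and compares the result with the closed forms. Whether the code's C should contain var(X_i) depends on its definition: if C denotes inv(s) times the input-output covariance, as a docstring quoted elsewhere on this page suggests, the variance factor cancels. That reading is an assumption, not something verified against the repository.

```python
import numpy as np

def gaussian_expectation(f, mu, var, order=60):
    """E[f(X)] for X ~ N(mu, var) by Gauss-Hermite quadrature (hypothetical helper)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(order)
    return np.sum(weights * f(mu + np.sqrt(var) * nodes)) / np.sqrt(2.0 * np.pi)

mu, var = 0.4, 0.49

# Numerical moments of sin(X).
mean_sin = gaussian_expectation(np.sin, mu, var)
mean_x_sin = gaussian_expectation(lambda x: x * np.sin(x), mu, var)
cov_x_sin = mean_x_sin - mu * mean_sin

# Closed forms.
closed_mean = np.exp(-var / 2) * np.sin(mu)
closed_cov = var * np.exp(-var / 2) * np.cos(mu)       # cov[X, sin(X)]
closed_inv_s_cov = np.exp(-var / 2) * np.cos(mu)       # cov[X, sin(X)] / var

print(mean_sin - closed_mean, cov_x_sin - closed_cov)
```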

SMGPR : the induced points are different for each model

In the implementation of calculate_factorization in the class SMGPR,
Z_0 (the inducing points of models[0]) is used for all the SGPR models, but since the
models are optimized separately, the sets Z_i are all different.
In the original PILCO implementation, all the models share the same inducing points.
Could this affect the performance of SMGPR?

I saw that GPflow seems to be able to handle shared inducing inputs.
https://gpflow.readthedocs.io/en/master/notebooks/advanced/multioutput.html

How does it work?

In the example inverted_pendulum.py, is it just learning the dynamics of the pendulum?

How can I test whether it learns well?

Extra control dimension for varying target values

Hey,
I'm a student from TUM using your PILCO implementation. I want to optimize the controller for various target values, depending on an input target state. I'm planning to add the difference between the target value and the current value (of one of the states) as an extra control dimension.
As a result, the model would depend on the states, but the controller would depend on the states and e.g. the difference x1_target - x1.
By setting the target value of the extra control dimension to zero and feeding in data with different target states, I should be able to optimize the controller for different inputs.
Do you know an easy way to do this, or something similar that accounts for different targets, e.g. a vehicle controller where you set different curvatures?

Thanks for the help,
Manuel

Question about MGPR.

Hi,

Is there any reference for multi input/output Gaussian Process Regression?
I'd like to understand the detail.

Thanks

Gradient based policy optimisation.

Hello,

if I understood correctly, the authors of PILCO use a gradient-based method
for optimising the policy. In the current implementation this doesn't seem to be the
case: you use L-BFGS-B without supplying the computation of the Jacobian.

Did you run any experiments using a gradient-based method?
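
For context, L-BFGS-B is itself a gradient-based quasi-Newton method, and SciPy accepts the gradient either as a separate callable or, with jac=True, as the second return value of the objective. A traceback further down this page shows GPflow's Scipy wrapper calling SciPy with jac=True and unpacking loss, grad from a TensorFlow function, which suggests that the gradient comes from automatic differentiation. The sketch below shows only the SciPy side, with a hand-written gradient and a hypothetical quadratic objective.

```python
import numpy as np
from scipy.optimize import minimize

def loss_and_grad(x):
    """Return the objective and its gradient together, as jac=True expects."""
    target = np.array([1.0, -2.0])       # hypothetical optimum, for illustration only
    diff = x - target
    return float(diff @ diff), 2.0 * diff

# With jac=True, SciPy uses the returned gradient instead of finite differences.
result = minimize(loss_and_grad, x0=np.zeros(2), jac=True, method="L-BFGS-B")
print(result.x)
```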

local variable 'model' referenced before assignment

When I run inverted_pendulum.py, the error below comes up.

-----Learned models------
---Lengthscales---
        GP0       GP1         GP2        GP3
0  7531.310  5412.474   42419.628  14051.520
1  8967.154  5576.802      42.032      4.307
2     6.223    18.859     143.548      8.425
3  9088.658    42.614  165237.366    178.836
4    28.919    58.716      18.404     15.033
---Variances---
     GP0    GP1     GP2     GP3
0  0.027  0.595  14.387  38.815
---Noises---
         GP0        GP1        GP2        GP3
0  1.000e-06  1.000e-06  1.000e-06  1.000e-06
Controller's optimization: done in 22.0 seconds with reward=34.581.
No of ops: 5245

Rollout: 1
Traceback (most recent call last):
  File "inverted_pendulum.py", line 57, in <module>
    pilco.optimize_models()
  File "/home/wonchul/Desktop/PILCO-master_/pilco/models/pilco.py", line 57, in optimize_models
    self.mgpr.optimize(restarts=restarts)
  File "/home/wonchul/Desktop/PILCO-master_/pilco/models/mgpr.py", line 51, in optimize
    best_parameters = model.read_values(session=session)
UnboundLocalError: local variable 'model' referenced before assignment

How can I fix it?

Cholesky decomposition was not successful. The input might not be valid.

I am using my own gym env to test PILCO as a baseline, but this problem always occurs after about 2-3 iterations.

  File "/home/lab/Github/PILCO/examples/gym_tracking_tendon.py", line 196, in <module>
    pilco.optimize_policy()
  File "/home/lab/Github/PILCO/pilco/models/pilco.py", line 96, in optimize_policy
    try:
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/gpflow-2.0.0-py3.7.egg/gpflow/optimizers/scipy.py", line 73, in minimize
    func, initial_params, jac=True, method=method, **scipy_kwargs
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 610, in minimize
    callback=callback, **options)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 345, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 295, in func_and_grad
    f = fun(x, *args)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 327, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 65, in __call__
    fg = self.fun(x, *args)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/gpflow-2.0.0-py3.7.egg/gpflow/optimizers/scipy.py", line 95, in _eval
    loss, grad = _tf_eval(tf.convert_to_tensor(x))
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 638, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/lab/anaconda3/envs/pilco/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Cholesky decomposition was not successful. The input might not be valid.
	 [[{{node while/body/_1/Cholesky}}]] [Op:__inference__tf_eval_1032837]

The main function just mimics the inv_double_pendulum example:

if __name__ == '__main__':
    env = TendonGymEnv()
    e = np.array([[1]])   # Max control input. Set too low can lead to Cholesky failures.


    X, Y, _, _ = rollout(env=env, pilco=None, random=True, timesteps=40, render=False)
    for i in range(1, 5):
        X_, Y_, _, _ = rollout(env=env, pilco=None, random=True, timesteps=40, render=False)
        X = np.vstack((X, X_))
        Y = np.vstack((Y, Y_))

    state_dim = Y.shape[1]
    control_dim = X.shape[1] - state_dim
    # controller = RbfController(state_dim=state_dim, control_dim=control_dim, num_basis_functions=10)
    controller = LinearController(state_dim=state_dim, control_dim=control_dim)

    pilco = PILCO((X, Y), controller=controller, horizon=40)
    pilco.controller.max_action = e

    # # for numerical stability
    # for model in pilco.mgpr.models:
    #     model.likelihood.variance.assign(0.001)
    #     set_trainable(model.likelihood.variance, False)
    #     model.likelihood.fixed=True

    return_lst = []
    for rollouts in range(100):
        print("**** ITERATION no.", rollouts, " ****")
        try:
            pilco.optimize_models()
        except:
            pdb.set_trace()

        pilco.optimize_policy()
        # import pdb

        # pdb.set_trace()
        X_new, Y_new, _, sum_return = rollout(env=env, pilco=pilco, timesteps=300, render=False)
        return_lst.append(sum_return)
        # Update dataset
        X = np.vstack((X, X_new))
        Y = np.vstack((Y, Y_new))
        pilco.mgpr.set_data((X, Y))

I checked the inputs X and Y carefully, and there is no NaN in the arrays. This has been bothering me for a long time, so I would appreciate any help very much.

Can it be applied to 'Pendulum-v0'?? && memory problem

Hi!

I tried to apply it to the Pendulum-v0 environment.
However, I don't think it worked well at all.
Could you give me some advice?

Also, when I run it, I get an error because of a memory shortage.
Is there some way I can manage the memory?

Implement `predict_on_noisy_input` with MCMC in gpytorch

I recently started re-implementing PILCO in PyTorch for better integration with my other work. To leverage the fast prediction (KISS-GP) in gpytorch, I decided to use an MCMC sampling approach to implement the core function in mgpr.py, predict_on_noisy_inputs, which uses moment matching in the original paper.

However, the result I get from sampling is dramatically different from moment matching. I wonder if anyone can help me identify the problem. The following code shows both optimize and predict_on_noisy_inputs.

    def optimize(self,restarts=1, training_iter = 200):
        self.likelihood.train()
        self.model.train()

        # Use the adam optimizer
        optimizer = torch.optim.Adam([
            {'params': self.model.parameters()},  # Includes GaussianLikelihood parameters
            ], lr=self.lr)
        # "Loss" for GPs - the marginal log likelihood
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)
        for i in range(training_iter):
            # Zero gradients from previous iteration
            optimizer.zero_grad()
            # Output from model
            output = self.model(self.X)
             # Calc loss and backprop gradients
            loss = -mll(output, self.Y).sum()
            loss.backward()
            print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iter, loss.item()))
            optimizer.step()




    def predict_on_noisy_inputs(self, m, s, num_samps=500):
        """
        Approximate GP regression at noisy inputs via moment matching
        IN: mean (m) (row vector) and (s) variance of the state
        OUT: mean (M) (row vector), variance (S) of the action
             and inv(s)*input-ouputcovariance

        We adopt the sampling approach by leveraging the power of GPU
        """
        assert(m.shape[1] == self.num_dims and s.shape == (self.num_dims,self.num_dims))
        self.likelihood.eval()
        self.model.eval()

        if self.cuda == True:
            m = torch.tensor(m).float().cuda()
            s = torch.tensor(s).float().cuda()
            inv_s = torch.inverse(s)

        sample_model = torch.distributions.MultivariateNormal(m,s)
        pred_inputs = sample_model.sample((num_samps,)).float()
        pred_inputs[pred_inputs != pred_inputs] = 0
        pred_inputs,_ = torch.sort(pred_inputs,dim=0)
        pred_inputs = pred_inputs.reshape(num_samps,self.num_dims).repeat(self.num_outputs,1,1)

        #centralize X ?
        # self.model.set_train_data(self.centralized_input(m),self.Y)
        with torch.no_grad(), gpytorch.settings.fast_pred_var():
            pred_outputs = self.model(pred_inputs)




        #Calculate mean, variance and inv(s)* input-output covariance
        M = torch.mean(pred_outputs.mean,1)[None,:]
        V_ = torch.cat((pred_inputs[0].t(),pred_outputs.mean),0)
        fact = 1.0 / (V_.size(1) - 1)
        V_ -= torch.mean(V_, dim=1, keepdim=True)
        V_t = V_.t()  # if complex: mt = m.t().conj()
        covs =  fact * V_.matmul(V_t).squeeze()
        V = covs[0:self.num_dims,self.num_dims:]
        V = inv_s @ V
        S = covs[self.num_dims:,self.num_dims:]


        return M, S, V
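
For comparison, a sample-based estimate of the three outputs can be written in a few lines of NumPy. Two things in the code above may be worth checking against it, although neither is a confirmed cause: torch.sort along dim 0 sorts every column independently, which breaks the pairing between input dimensions, and using only the predictive means leaves out the GP's own predictive variance, which by the law of total variance also contributes to S. The sketch uses a deterministic function in place of the GP, and the helper sample_moments is hypothetical.

```python
import numpy as np

def sample_moments(f, m, s, num_samples=200_000, seed=0):
    """Monte Carlo estimates of M = E[f(x)], S = cov[f(x)], V = inv(s) cov[x, f(x)].

    Hypothetical helper; f maps an (N, D) array of inputs to an (N, E) array.
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(m, s, size=num_samples)   # rows stay paired: no sorting
    y = f(x)
    M = y.mean(axis=0)
    xc, yc = x - x.mean(axis=0), y - M
    S = yc.T @ yc / (num_samples - 1)
    V = np.linalg.solve(s, xc.T @ yc / (num_samples - 1))
    return M, S, V

# Linear test function y = x @ A, for which the exact moments are known.
A = np.array([[1.0, 0.5], [-1.0, 2.0]])
m = np.array([0.5, -1.0])
s = np.array([[1.0, 0.3], [0.3, 0.5]])
M, S, V = sample_moments(lambda x: x @ A, m, s)

print(M - m @ A)          # error in the mean
print(S - A.T @ s @ A)    # error in the output covariance
print(V - A)              # error in inv(s) @ cov[x, y]
```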

Use Matlab's implementation to train OpenAI gym tasks

This will allow easier debugging and comparison between the two implementations.

Matrix contain inf or nan

hello,

when I use mgpr to predict the next observation, a "matrix contains infs or nans" error sometimes occurs. I found that it first appears in predict_given_factorizations in mgpr.py, in the line c = self.variance / tf.sqrt(tf.linalg.det(B)). It turns out that tf.linalg.det(B) is negative. What should I do about this?
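
If B is symmetric positive definite in exact arithmetic (an assumption about how B is constructed, not checked against mgpr.py), a negative determinant can only come from rounding. In that case 1 / sqrt(det(B)) can be computed from a Cholesky factor instead, which fails loudly rather than returning a negative value. The helper below is a hypothetical NumPy sketch of that idea.

```python
import numpy as np

def inv_sqrt_det_spd(B):
    """1 / sqrt(det(B)) for symmetric positive-definite B, via its Cholesky factor.

    Hypothetical helper. Since det(B) = prod(diag(L))**2 for B = L L^T,
    the result equals exp(-sum(log(diag(L)))). Works on batches of matrices.
    """
    L = np.linalg.cholesky(B)   # raises LinAlgError if B is not positive definite
    return np.exp(-np.sum(np.log(np.diagonal(L, axis1=-2, axis2=-1)), axis=-1))

# Batch of well-conditioned SPD matrices for a sanity check.
rng = np.random.default_rng(0)
R = rng.standard_normal((4, 3, 3))
B = np.eye(3) + R @ np.transpose(R, (0, 2, 1))

print(inv_sqrt_det_spd(B))
print(1.0 / np.sqrt(np.linalg.det(B)))
```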

Thanks

Make trigonometric augmentation user friendly

MATLAB's implementation augments the variables that represent angles with two new states that are simply their cos and sin.

This is important for the performance of the algorithm, and it can be done in the current implementation by simply passing an augmented dataset to the PILCO object.

Investigate how to do this in a more user friendly way.
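
As a starting point, the augmentation itself is a small array transformation. The sketch below replaces each angle column by its sine and cosine; the function name augment_angles and the column ordering are hypothetical choices, not an existing pilco API.

```python
import numpy as np

def augment_angles(X, angle_dims):
    """Replace each angle column of X by its sine and cosine (hypothetical helper).

    X has shape (N, D); the result has shape (N, D + len(angle_dims)).
    Non-angle columns come first, followed by sin/cos pairs in the order given.
    """
    X = np.asarray(X, dtype=float)
    keep = [d for d in range(X.shape[1]) if d not in angle_dims]
    trig = [fn(X[:, d]) for d in angle_dims for fn in (np.sin, np.cos)]
    return np.column_stack([X[:, keep]] + trig)

# Two samples with one angle (column 0) and two ordinary state variables.
X = np.array([[0.0, 1.0, 2.0],
              [np.pi / 2, 3.0, 4.0]])
X_aug = augment_angles(X, angle_dims=[0])
print(X_aug)
```

A user-friendly version would also need to apply the same mapping to states during rollouts and to express the reward target in the augmented coordinates.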

What is the V for in the predict_given_factorizations

Hey there,

I would like to ask what the V value returned from predict_given_factorizations is. It seems like the next mean multiplied by something, but I am not sure what it means.

Would you please enlighten me?

Thanks.
Regards.
TIng

Implement squashing function

Support upper/lower limits to the controller's output via the use of a squashing sinusoidal function, similarly to the original MATLAB implementation.

How do you save your trained model?

I've tried using pickle but it's unable to serialize the PILCO object after training.

Is there a builtin save method? Or does another kind of serialization work? Thanks.

NotImplementedError: Cannot convert a symbolic (graph mode) `DeferredTensor` to a numpy array.

Hello,
I followed the instructions given in the readme file and installed all the modules and libraries from requirements.txt. But the model runs for the first iteration and then gives the following error (during optimisation, in pilco.optimize_policy()):

NotImplementedError: Cannot convert a symbolic (graph mode) DeferredTensor to a numpy array.

Investigate performance of tf.linalg.det in predict_given_factorizations

See this comment

Could you please share exact version of some dependency packages

Hello，

I have installed the dependency packages required in requirements.txt, but the code doesn't run well in my environment.

When I run the example safe_swimmer_run.py, I get the error OperatorNotAllowedInGraphError: iterating over tf.Tensor is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature. I guess that this may be caused by a wrong version of some packages.

The versions of my main packages are
tensorflow 2.4.1
gpflow 2.1.4
gym 0.18.0
mujoco-py 1.50.1.0
numpy 1.19.2

and my python is 3.7

thanks in advance!

outputs from mgpr.predict_on_noisy_inputs and pilco.propagate confusion

Hi,

Thanks for some great code. I'm trying to read through it and add descriptive comments, since there's a lot of dense code and several layers of wrappers and abstractions over TF :)

My question is an attempt to nail down the behavior of mgpr.predict_on_noisy_inputs (namely mgpr.predict_given_factorizations).

If I look at the propagate function in pilco.py, I see that the returned value is a delta, since the line immediately after the call to predict_on_noisy_inputs is
M_x = M_dx + m_x

However, it's not clear to me where this delta is computed or predicted. The description of mgpr.predict_given_factorizations says it returns a mean and variance of X, and I don't see the line where it might be returning only a "change" in X. It would make sense to me if only X were returned, since both the system model and the controller seem to be of the form x(t+1)=f(x(t),u(t)) and u(t)=g(x(t)) respectively.
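
One reading that is consistent with the line M_x = M_dx + m_x is that the GPs are trained on state differences as targets, so the prediction function never subtracts anything; the "delta" comes from the training data. Under that assumption, the next-state moments follow from the delta moments and the inv(s) times cross-covariance output. The sketch below checks this with a hypothetical linear delta model dx = A x in place of the GPs.

```python
import numpy as np

s = np.array([[1.0, 0.2], [0.2, 0.5]])       # covariance of the current state x
m = np.array([0.3, -0.1])                    # mean of the current state x
A = np.array([[0.1, -0.05], [0.02, 0.3]])    # hypothetical linear delta model: dx = A x

# What a delta model would report for the input distribution N(m, s).
M_dx = A @ m                       # mean of dx
S_dx = A @ s @ A.T                 # covariance of dx
V = np.linalg.solve(s, s @ A.T)    # inv(s) @ cov[x, dx]

# Moments of the next state x' = x + dx.
M_x = M_dx + m
S_x = S_dx + s + s @ V + (s @ V).T   # cov[dx] + cov[x] + cov[x, dx] + cov[dx, x]

# Direct computation for the linear map x' = (I + A) x.
I_plus_A = np.eye(2) + A
print(np.allclose(M_x, I_plus_A @ m), np.allclose(S_x, I_plus_A @ s @ I_plus_A.T))
```

This also shows one use of the third output: multiplying it by s recovers the cross-covariance between the input and the predicted delta, which is needed for the covariance of the sum.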

Thanks!

How to derive the closed form for M, S, and C in squash_sin(m, s, max_action=None)

Hi guys,
I'm trying to analytically derive the closed forms for M, S and C used in the squash_sin function. Deisenroth's thesis only points to some integrals in Appendix A.1, but I don't know how he came up with the form used in the code.
Do you happen to know how to derive the forms? Any pointers would be much appreciated.

Thanks.

installation: issue with gast, tensorflow

Starting up in a new virtualenv with Python 3.7.5 (arch linux)

(had to run sudo otherwise setup.py gives a permission denied)

Get this as the final error: error: gast 0.2.2 is installed but gast>=0.3.2 is required by {'tensorflow-probability'}

It seems there might be a dependency here that's underspecified (maybe tensorflow version is too high)

Full log:

[env]em@toaster:~/school/cpsc515/proj|master⚡
⇒  sudo python3.7 PILCO/setup.py install
running install
running bdist_egg
running egg_info
writing pilco.egg-info/PKG-INFO
writing dependency_links to pilco.egg-info/dependency_links.txt
writing requirements to pilco.egg-info/requires.txt
writing top-level names to pilco.egg-info/top_level.txt
reading manifest file 'pilco.egg-info/SOURCES.txt'
writing manifest file 'pilco.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
warning: install_lib: 'build/lib' does not exist -- no Python modules to install

creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying pilco.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pilco.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pilco.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pilco.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pilco.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/pilco-0.1-py3.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing pilco-0.1-py3.7.egg
Removing /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/pilco-0.1-py3.7.egg
Copying pilco-0.1-py3.7.egg to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
pilco 0.1 is already the active version in easy-install.pth

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/pilco-0.1-py3.7.egg
Processing dependencies for pilco==0.1
Searching for tensorflow
Reading https://pypi.org/simple/tensorflow/
Downloading https://files.pythonhosted.org/packages/57/bb/e690554331d46e35e47032ff0f7a231061cd71e82e3616dbd42ba1be9474/tensorflow-2.4.0rc0-cp37-cp37m-manylinux2010_x86_64.whl#sha256=8785ee37a48014273ddcb32034569059c83c144b15e6c34e485ef5b001ef373e
Best match: tensorflow 2.4.0rc0
Processing tensorflow-2.4.0rc0-cp37-cp37m-manylinux2010_x86_64.whl
Installing tensorflow-2.4.0rc0-cp37-cp37m-manylinux2010_x86_64.whl to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
Adding tensorflow 2.4.0rc0 to easy-install.pth file
Installing estimator_ckpt_converter script to /home/em/school/cpsc515/proj/env/bin
Installing import_pb_to_tensorboard script to /home/em/school/cpsc515/proj/env/bin
Installing saved_model_cli script to /home/em/school/cpsc515/proj/env/bin
Installing tensorboard script to /home/em/school/cpsc515/proj/env/bin
Installing tf_upgrade_v2 script to /home/em/school/cpsc515/proj/env/bin
Installing tflite_convert script to /home/em/school/cpsc515/proj/env/bin
Installing toco script to /home/em/school/cpsc515/proj/env/bin
Installing toco_from_protos script to /home/em/school/cpsc515/proj/env/bin

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/tensorflow-2.4.0rc0-py3.7-linux-x86_64.egg
Searching for tabulate
Reading https://pypi.org/simple/tabulate/
Downloading https://files.pythonhosted.org/packages/c4/f4/770ae9385990f5a19a91431163d262182d3203662ea2b5739d0fcfc080f1/tabulate-0.8.7-py3-none-any.whl#sha256=ac64cb76d53b1231d364babcd72abbb16855adac7de6665122f97b593f1eb2ba
Best match: tabulate 0.8.7
Processing tabulate-0.8.7-py3-none-any.whl
Installing tabulate-0.8.7-py3-none-any.whl to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
Adding tabulate 0.8.7 to easy-install.pth file
Installing tabulate script to /home/em/school/cpsc515/proj/env/bin

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/tabulate-0.8.7-py3.7.egg
Searching for scipy>=0.18.0
Reading https://pypi.org/simple/scipy/
Downloading https://files.pythonhosted.org/packages/fa/cf/94686c3e2b21cba82904a2bbb014f7529d483021802a0116c3a256b00563/scipy-1.5.3-cp37-cp37m-manylinux1_x86_64.whl#sha256=aebb69bcdec209d874fc4b0c7ac36f509d50418a431c1422465fa34c2c0143ea
Best match: scipy 1.5.3
Processing scipy-1.5.3-cp37-cp37m-manylinux1_x86_64.whl
Installing scipy-1.5.3-cp37-cp37m-manylinux1_x86_64.whl to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
Adding scipy 1.5.3 to easy-install.pth file

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/scipy-1.5.3-py3.7-linux-x86_64.egg
Searching for multipledispatch>=0.6
Reading https://pypi.org/simple/multipledispatch/
Downloading https://files.pythonhosted.org/packages/89/79/429ecef45fd5e4504f7474d4c3c3c4668c267be3370e4c2fd33e61506833/multipledispatch-0.6.0-py3-none-any.whl#sha256=a55c512128fb3f7c2efd2533f2550accb93c35f1045242ef74645fc92a2c3cba
Best match: multipledispatch 0.6.0
Processing multipledispatch-0.6.0-py3-none-any.whl
Installing multipledispatch-0.6.0-py3-none-any.whl to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
Adding multipledispatch 0.6.0 to easy-install.pth file

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/multipledispatch-0.6.0-py3.7.egg
Searching for gast<0.3,>=0.2.2
Reading https://pypi.org/simple/gast/
Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz#sha256=fe939df4583692f0512161ec1c880e0a10e71e6a232da045ab8edd3756fbadf0
Best match: gast 0.2.2
Processing gast-0.2.2.tar.gz
Writing /tmp/easy_install-nvkje8bv/gast-0.2.2/setup.cfg
Running gast-0.2.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-nvkje8bv/gast-0.2.2/egg-dist-tmp-zy8nd5d1
zip_safe flag not set; analyzing archive contents...
Moving gast-0.2.2-py3.7.egg to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
Adding gast 0.2.2 to easy-install.pth file

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/gast-0.2.2-py3.7.egg
Searching for dataclasses
Reading https://pypi.org/simple/dataclasses/
Downloading https://files.pythonhosted.org/packages/e1/d2/6f02df2616fd4016075f60157c7a0452b38d8f7938ae94343911e0fb0b09/dataclasses-0.7-py3-none-any.whl#sha256=3459118f7ede7c8bea0fe795bff7c6c2ce287d01dd226202f7c9ebc0610a7836
Best match: dataclasses 0.7
Processing dataclasses-0.7-py3-none-any.whl
Installing dataclasses-0.7-py3-none-any.whl to /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages
Adding dataclasses 0.7 to easy-install.pth file

Installed /home/em/school/cpsc515/proj/env/lib/python3.7/site-packages/dataclasses-0.7-py3.7.egg
error: gast 0.2.2 is installed but gast>=0.3.2 is required by {'tensorflow-probability'}
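
For anyone reading this log: the final error follows from two constraints that appear above. The installer searched for `gast<0.3,>=0.2.2` and installed 0.2.2, while `tensorflow-probability` asks for `gast>=0.3.2`. A minimal sketch of why no single version can satisfy both is given below. The helper names `parse_version` and `satisfies` are hypothetical and written only for this illustration. The candidate version strings are illustrative, and the log does not show which package imposes the `<0.3` pin.

```python
import operator

# Comparison operators supported by this simplified checker (longest symbols first).
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
]


def parse_version(text, width=3):
    """Turn '0.2.2' into (0, 2, 2); pad short versions such as '0.3' with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause in `specifier`.

    Simplified: handles plain numeric versions only, not pre-releases or wildcards.
    """
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not compare(candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Constraints copied from the log above.
pinned = "<0.3,>=0.2.2"   # from "Searching for gast<0.3,>=0.2.2"
needed = ">=0.3.2"        # from the final error (tensorflow-probability)

# Illustrative candidates, not a listing of what is published on PyPI.
candidates = ["0.2.2", "0.3.0", "0.3.2", "0.4.0"]
compatible = [v for v in candidates if satisfies(v, pinned) and satisfies(v, needed)]
print(compatible)
```

Because the two ranges do not overlap, the list stays empty for any candidate. Under these assumptions, the conflict can only go away if one of the two requirements changes, for example by installing versions of the dependent packages whose `gast` ranges intersect.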