cbfinn / gps

Guided Policy Search

Home Page: http://rll.berkeley.edu/gps/

License: Other

Shell 0.08% Python 80.17% CMake 0.62% C++ 18.86% Protocol Buffer 0.28%

robotics reinforcement-learning reinforcement-learning-algorithms deep-learning deep-reinforcement-learning

gps's Issues

ImportError: No module named Box2D (I had installed pybox2d)

Hi Chelsea,
Thank you for your awesome work on GPS. I followed the installation instructions at http://rll.berkeley.edu/gps/, but I get an import error, "No module named Box2D", when I run:
python python/gps/gps_main.py box2d_pointmass_example

This is how I set up pybox2d (the second step is different, because the URL http://pybox2d.googlecode.com/svn/trunk/ was not found):

sudo apt-get install build-essential python-dev swig python-pygame subversion
git clone https://github.com/pybox2d/pybox2d
python setup.py build
sudo python setup.py install

So I thought I had installed pybox2d correctly, but I always get the error "ImportError: No module named Box2D".

Could you please give me any advice about this issue? Thank you.

Dong Li
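
One possible cause of this kind of error (an assumption, not confirmed in the report) is that the package was installed for a different Python interpreter than the one that runs gps_main.py. A minimal sketch for checking which interpreter is active and where a module is loaded from follows; `locate_module` is a hypothetical helper name, not part of GPS or pybox2d.

```python
import importlib
import sys


def locate_module(name):
    """Return the file a module is loaded from, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules have no __file__, so fall back to a label.
    return getattr(module, "__file__", "<built-in>")


# Show the interpreter in use and whether it can see Box2D.
print("interpreter: %s" % sys.executable)
print("Box2D found at: %s" % locate_module("Box2D"))
```

If the second line prints `None`, the interpreter shown on the first line cannot see the package, and comparing that path with the interpreter used for `setup.py install` may narrow the problem down.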

mjcpy.so returns error when run "python python/gps/gps_main.py mjc_example"

Hi,
I need help running the MuJoCo example of GPS.
I can run the Box2D examples, but I am stuck on the MuJoCo ones.

The error is:

ImportError: /home/baek/gps/build/lib/mjcpy.so: undefined symbol: _ZN3osg8Geometry16setTexCoordArrayEjPNS_5ArrayE

I followed exactly what is described at https://github.com/cbfinn/gps.
It looks like OSG_LIBRARIES is somehow wrong.
In two of the OSG_LIBRARIES entries,

/usr/lib/x86_64-linux-gnu/libosg.so;
/usr/lib/x86_64-linux-gnu/libosgViewer.so;

I found

_ZN3osg8Geometry16setTexCoordArrayEjPNS_5ArrayENS1_7BindingE

by the command

objdump -tT /usr/lib/x86_64-linux-gnu/libosg.so |grep _ZN3osg8Geometry16setTexCoordArrayEjPNS_5ArrayE

which looks similar to the missing symbol above, but with an additional "NS1_7BindingE" at the end.

How should I resolve this issue?
Any help, please?

Crash in the second iteration

Hi Finn,
Thank you for your excellent work; it is a really exciting innovation.
All the demos work well except the last one. When I run "python python/gps/gps_main.py pr2_badmm_example", it reports errors like this:

I0430 02:03:56.217406   978 solver.cpp:408]     Test net output #5: InnerProduct3 = 0
I0430 02:03:56.217413   978 solver.cpp:408]     Test net output #6: InnerProduct3 = 0
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "python/gps/gps_main.py", line 366, in <lambda>
    target=lambda: gps.run(itr_load=resume_training_itr)
  File "python/gps/gps_main.py", line 69, in run
    self._log_data(itr, traj_sample_lists, pol_sample_lists)
  File "python/gps/gps_main.py", line 240, in _log_data
    copy.copy(self.algorithm)
  File "python/gps/utility/data_logger.py", line 25, in pickle
    pickle.dump(data, open(filename, 'wb'))
  File "/usr/lib/python2.7/copy_reg.py", line 84, in _reduce_ex
    dict = getstate()
  File "python/gps/algorithm/policy_opt/policy_opt_caffe.py", line 233, in __getstate__
    self.solver.snapshot()
AttributeError: 'AdamSolver' object has no attribute 'snapshot'

Also, when I run "python python/gps/gps_main.py pr2_example", it sometimes reports the following error:

LinAlgError: 2-th leading minor not positive definite ... 
    raise LinAlgError("%d-th leading minor not positive definite" % info)
LinAlgError: 2-th leading minor not positive definite

Do you have any idea about these two problems? Looking forward to your answers. Thank you very much.

GPS for humanoid running experiments (cost function derivative)

Has anyone tried the humanoid walking or running experiments?

If the state vector consists of the joint angles and joint velocities, how can I get the derivative dp/dq (where p is the height of the pelvis above the ground)?

For a robot arm, the end-effector position is a function f(q) of the joint angles. For humanoid walking or running, however:

the pelvis position is not a function of the joint angles alone (when the support foot is not in full contact with the ground, or the robot is falling down);
the leg joint angles are not independent in the double-support phase (when both feet are in full contact).

How should I deal with this problem?
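
When a quantity such as the pelvis height can be evaluated for any state (for example by querying a simulator) but has no convenient closed form, one generic option is to approximate its derivative by finite differences. The sketch below is only an illustration under that assumption: `numerical_gradient` and `toy_height` are hypothetical names, and `toy_height` is a two-link stand-in, not a humanoid model. For a floating-base robot, the vector `q` would presumably need to include the base pose as well as the joint angles.

```python
import math


def numerical_gradient(f, q, eps=1e-6):
    """Central-difference estimate of df/dq for a scalar function f of a list q."""
    grad = []
    for i in range(len(q)):
        q_plus = list(q)
        q_minus = list(q)
        q_plus[i] += eps
        q_minus[i] -= eps
        grad.append((f(q_plus) - f(q_minus)) / (2.0 * eps))
    return grad


def toy_height(q):
    """Hypothetical stand-in for a height function p(q): a planar two-link chain."""
    return math.cos(q[0]) + 0.5 * math.cos(q[0] + q[1])


q = [0.3, 0.2]
dp_dq = numerical_gradient(toy_height, q)
# Analytic gradient of the toy function, for comparison.
exact = [-math.sin(0.3) - 0.5 * math.sin(0.5), -0.5 * math.sin(0.5)]
```

Each gradient costs two function evaluations per state dimension, which may be expensive if every evaluation is a simulator call.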

Errors compiling with mujoco pro 150

MuJoCo bindings fail

Box2d examples from your repository work fine and MuJoCo is also working on my machine using open-ai bindings. I was trying to build your bindings and ran into the following cmake-related issue:

~/gps/build$ cmake ../src/3rdparty
-- The C compiler identification is GNU 4.9.3
-- The CXX compiler identification is GNU 4.9.3
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Boost version: 1.58.0
-- Found the following Boost libraries:
--   python
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython2.7.so (found suitable version "2.7.12", minimum required is "2.7") 
CMake Error at /usr/share/cmake-3.5/Modules/FindPythonLibs.cmake:64 (get_filename_component):
  get_filename_component called with incorrect number of arguments
Call Stack (most recent call first):
  Boost.NumPy/CMakeLists.txt:14 (find_package)


-- Found PythonInterp: python (found version "2.7.12") 
-- Found NumPy: version "1.12.0" /home/henryk/.local/lib/python2.7/site-packages/numpy/core/include
-- Boost version: 1.58.0
-- Found the following Boost libraries:
--   python
-- found boost:
INCLUDE: /usr/include
LIB: /usr/lib/x86_64-linux-gnu/libboost_python.so
-- Found osg: /usr/lib/x86_64-linux-gnu/libosg.so  
-- Found osgViewer: /usr/lib/x86_64-linux-gnu/libosgViewer.so  
-- Found OpenThreads: /usr/lib/x86_64-linux-gnu/libOpenThreads.so  
-- Found osgGA: /usr/lib/x86_64-linux-gnu/libosgGA.so  
osg includes: /usr/include
-- Configuring incomplete, errors occurred!
See also "/home/henryk/gps/build/CMakeFiles/CMakeOutput.log".

I would be very grateful if you can help with this error. This is a generic installation of Ubuntu 16.04 with cmake 3.5.1. I also tried another equally generic Ubuntu 16.04 box with MuJoCo installed and I have got the same error.

ValueError

Hi, I encountered the following error when I run the guided policy search algorithm:

ValueError: Failed to find PD solution even for very large eta (check that dynamics and cost are reasonably well conditioned)!

Is there any solution?

P.S.
The above error occurred with the newest version of GPS.
A few months ago, I ran the same algorithm after changing the number of iterations from 12 to 30 in hyperparams.py:

algorithm = {
・・・
#'iterations': 12,
'iterations': 30,
・・・
}

Then the trajectory of the ILQG controller (Trajectory Samples) became very different from that of the neural network (Policy Samples). At that time, I encountered the following warning:
"Final KL divergence after DGD convergence is too high."

Compiling on Mac with MuJoCo

I was able to run ./compile_proto.sh successfully; it created all the necessary directories.

To set up the integration with MuJoCo 1.31, I kept the mjpro directory inside the 3rdparty directory and put the key inside it. This is my log file:
CMakeOutput.txt

I then tried the following and got this log for cmake and make -j:

$ cmake ../src/3rdparty
-- The C compiler identification is AppleClang 7.3.0.7030031
-- The CXX compiler identification is AppleClang 7.3.0.7030031
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Boost version: 1.60.0
-- Found the following Boost libraries:
--   python
-- Found PythonLibs: /usr/lib/libpython2.7.dylib (found suitable version "2.7.10", minimum required is "2.7") 
CMake Error at /usr/local/Cellar/cmake/3.5.2/share/cmake/Modules/FindPythonLibs.cmake:64 (get_filename_component):
  get_filename_component called with incorrect number of arguments
Call Stack (most recent call first):
  Boost.NumPy/CMakeLists.txt:14 (find_package)


-- Found PythonInterp: python (found version "2.7.11") 
-- Found NumPy: version "1.11.0" /usr/local/lib/python2.7/site-packages/numpy/core/include
-- Boost version: 1.60.0
-- Found the following Boost libraries:
--   python
-- found boost:
INCLUDE: /usr/local/include
LIB: /usr/local/lib/libboost_python-mt.dylib
osg includes: 
-- Configuring incomplete, errors occurred!
See also "/Users/michaelmathew/gps/build/CMakeFiles/CMakeOutput.log".
Michaels-MacBook-Pro:build michaelmathew$ make -j
make: *** No targets specified and no makefile found.  Stop.

Is there anything else that needs to be done when installing on Mac? I am using OS X El Capitan.

Resource not found: gazebo_worlds

Hi,
I finally managed to install GPS together with ROS. (I think versions newer than Jade don't work, because some PR2 packages are missing. Jade also does not run on the newest Ubuntu versions, so be careful.)

I have Ubuntu 14.04 with ROS Jade.

I am now trying to start the simulated PR2, but I get the error "ResourceNotFound: gazebo_worlds". As you explained, I tried editing pr2_gazebo_no_controller.launch and uncommenting this line:

<include file="$(find gazebo_ros)/launch/empty_world.launch" />

But I still get the same error. Any ideas?

Thanks a lot!

How can I use GPS for problems where the target point is changing?

Can I use GPS to train a model that generalizes to different target points?

Where is gps.proto.gps_pb2?

I cannot find gps.proto.gps_pb2. Where is it?

Is there any docker image?

I'm struggling to install some of the dependencies

Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python

Steps to reproduce the error:

1. Clone this repo.
2. Run python python/gps/gps_main.py box2d_pointmass_pigps_example
3. Get the following error:

python/gps/gui/textbox.py:64: MatplotlibDeprecationWarning: The set_axis_bgcolor function was deprecated in version 2.0. Use set_facecolor instead.
  self._ax.set_axis_bgcolor(ColorConverter().to_rgba(color, alpha))
python/gps/gui/textbox.py:68: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  color, alpha = self._ax.get_axis_bgcolor(), self._ax.get_alpha()
python/gps/gui/textbox.py:69: MatplotlibDeprecationWarning: The set_axis_bgcolor function was deprecated in version 2.0. Use set_facecolor instead.
  self._ax.set_axis_bgcolor(mpl.rcParams['figure.facecolor'])
python/gps/gui/textbox.py:71: MatplotlibDeprecationWarning: The set_axis_bgcolor function was deprecated in version 2.0. Use set_facecolor instead.
  self._ax.set_axis_bgcolor(ColorConverter().to_rgba(color, alpha))
DEBUG:tm._add: /camera/rgb/image_color, sensor_msgs/Image, sub
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1214 19:30:53.622043 21186 solver.cpp:44] Initializing solver from parameters: 
test_iter: 1
test_iter: 1
test_interval: 1000000
base_lr: 0.001
display: 0
lr_policy: "fixed"
momentum: 0.9
weight_decay: 0.005
snapshot_prefix: "python/../experiments/box2d_pointmass_pigps_example/policy"
random_seed: 1
train_net_param {
  layer {
    name: "Python1"
    type: "Python"
    top: "Python1"
    top: "Python2"
    top: "Python3"
    python_param {
      module: "policy_layers"
      layer: "PolicyDataLayer"
      param_str: "{\"shape\": [{\"dim\": [25, 6]}, {\"dim\": [25, 2]}, {\"dim\": [25, 2, 2]}]}"
    }
  }
  layer {
    name: "InnerProduct1"
    type: "InnerProduct"
    bottom: "Python1"
    top: "InnerProduct1"
    inner_product_param {
      num_output: 20
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }
  layer {
    name: "ReLU1"
    type: "ReLU"
    bottom: "InnerProduct1"
    top: "InnerProduct1"
  }
  layer {
    name: "InnerProduct2"
    type: "InnerProduct"
    bottom: "InnerProduct1"
    top: "InnerProduct2"
    inner_product_param {
      num_output: 2
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }
  layer {
    name: "Python4"
    type: "Python"
    bottom: "InnerProduct2"
    bottom: "Python2"
    bottom: "Python3"
    top: "Python4"
    loss_weight: 1
    python_param {
      module: "policy_layers"
      layer: "WeightedEuclideanLoss"
    }
  }
}
test_net_param {
  layer {
    name: "Python1"
    type: "Python"
    top: "Python1"
    python_param {
      module: "policy_layers"
      layer: "PolicyDataLayer"
      param_str: "{\"shape\": [{\"dim\": [1, 6]}]}"
    }
  }
  layer {
    name: "InnerProduct1"
    type: "InnerProduct"
    bottom: "Python1"
    top: "InnerProduct1"
    inner_product_param {
      num_output: 20
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }
  layer {
    name: "ReLU1"
    type: "ReLU"
    bottom: "InnerProduct1"
    top: "InnerProduct1"
  }
  layer {
    name: "InnerProduct2"
    type: "InnerProduct"
    bottom: "InnerProduct1"
    top: "InnerProduct2"
    inner_product_param {
      num_output: 2
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }
}
test_net_param {
  layer {
    name: "DummyData1"
    type: "DummyData"
    top: "DummyData1"
    dummy_data_param {
      shape {
        dim: 1
        dim: 6
      }
    }
  }
  layer {
    name: "InnerProduct1"
    type: "InnerProduct"
    bottom: "DummyData1"
    top: "InnerProduct1"
    inner_product_param {
      num_output: 20
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }
  layer {
    name: "ReLU1"
    type: "ReLU"
    bottom: "InnerProduct1"
    top: "InnerProduct1"
  }
  layer {
    name: "InnerProduct2"
    type: "InnerProduct"
    bottom: "InnerProduct1"
    top: "InnerProduct2"
    inner_product_param {
      num_output: 2
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
  }
}
type: "Adam"
I1214 19:30:53.622179 21186 solver.cpp:73] Creating training net specified in train_net_param.
I1214 19:30:53.622264 21186 net.cpp:51] Initializing net from parameters: 
state {
  phase: TRAIN
}
layer {
  name: "Python1"
  type: "Python"
  top: "Python1"
  top: "Python2"
  top: "Python3"
  python_param {
    module: "policy_layers"
    layer: "PolicyDataLayer"
    param_str: "{\"shape\": [{\"dim\": [25, 6]}, {\"dim\": [25, 2]}, {\"dim\": [25, 2, 2]}]}"
  }
}
layer {
  name: "InnerProduct1"
  type: "InnerProduct"
  bottom: "Python1"
  top: "InnerProduct1"
  inner_product_param {
    num_output: 20
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "ReLU1"
  type: "ReLU"
  bottom: "InnerProduct1"
  top: "InnerProduct1"
}
layer {
  name: "InnerProduct2"
  type: "InnerProduct"
  bottom: "InnerProduct1"
  top: "InnerProduct2"
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "Python4"
  type: "Python"
  bottom: "InnerProduct2"
  bottom: "Python2"
  bottom: "Python3"
  top: "Python4"
  loss_weight: 1
  python_param {
    module: "policy_layers"
    layer: "WeightedEuclideanLoss"
  }
}
I1214 19:30:53.622318 21186 layer_factory.hpp:77] Creating layer Python1
F1214 19:30:53.622359 21186 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Parameter, Pooling, Power, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)

The system used was Ubuntu 16.04 x64, with a pycaffe compiled from source.
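
The "known types" list in the failure message contains no "Python" entry, which suggests that this pycaffe build was compiled without Python layer support (in Caffe's Makefile-based build this is controlled by `WITH_PYTHON_LAYER := 1` in Makefile.config). A small sketch for pulling the list out of such a log line and checking it follows; `known_layer_types` is a hypothetical helper, and the sample line is shortened from the log above.

```python
import re


def known_layer_types(log_line):
    """Extract the layer names listed after 'known types:' in a Caffe check failure."""
    match = re.search(r"known types:\s*([^)]*)\)", log_line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Shortened sample of the failure line; the real list is much longer.
sample = ("Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: "
          "Python (known types: AbsVal, Accuracy, InnerProduct, ReLU)")
types = known_layer_types(sample)
has_python_layer = "Python" in types
```

If `has_python_layer` is false for the full log line, rebuilding Caffe with Python layer support enabled is the likely next step.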

'module' object has no attribute 'RAND_LIMIT'

Currently, running python python/gps/gps_main.py box2d_arm_example
leads to the following error:

DEBUG:No gps_agent_pkg: gps_agent_pkg
ROS path [0]=/opt/ros/lunar/share/ros
ROS path [1]=/opt/ros/lunar/share
Traceback (most recent call last):
  File "python/gps/gps_main.py", line 414, in <module>
    main()
  File "python/gps/gps_main.py", line 348, in main
    hyperparams = imp.load_source('hyperparams', hyperparams_file)
  File "/home/gabriel/gps/experiments/box2d_arm_example/hyperparams.py", line 10, in <module>
    from gps.agent.box2d.arm_world import ArmWorld
  File "python/gps/agent/box2d/arm_world.py", line 2, in <module>
    import Box2D as b2
  File "/usr/local/lib/python2.7/dist-packages/Box2D/__init__.py", line 20, in <module>
    from .Box2D import *
  File "/usr/local/lib/python2.7/dist-packages/Box2D/Box2D.py", line 434, in <module>
    RAND_LIMIT = _Box2D.RAND_LIMIT
AttributeError: 'module' object has no attribute 'RAND_LIMIT'
INFO:signal_shutdown [atexit]

This is probably a problem with my installation of box2d; still, please take a look.

PILQR: is a ROS installation needed to run the door opening task?

Hi,

Thanks for sharing the code. Here I see a PR2 configuration, while in the PILQR paper this task is described as a MuJoCo simulation task. I am a little confused by this; could anyone help me clear this up?

Setting up a camera for the PR2 robot and an implementation for my own robot

Hi all,
Right now I am trying to set up a camera for the PR2 robot and transfer the PR2 work to my own robot.
Since I am not familiar with MuJoCo, I am not sure whether I should follow the MuJoCo approach for setting up the camera, the observation data types, and possibly many other things.
Is there any documentation on setting up a camera for the PR2, or a hyperparams file I can follow? Thanks in advance for any help.

Got a "LinAlgError" while running "python python/gps/gps_main.py box2d_badmm_example"

Hi,

First, thanks a lot for your excellent work.

I tried to run "python python/gps/gps_main.py box2d_badmm_example" and got a "LinAlgError".
By the way, I have no problem running "python python/gps/gps_main.py box2d_arm_example".

The output log:

$ python python/gps/gps_main.py box2d_badmm_example
<...SNIP...>
I0615 17:57:23.078760 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:23.219300 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:24.172405 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:24.303740 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:28.371433 25761 solver.cpp:337] Iteration 0, Testing net (#0)
I0615 17:57:28.371523 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:28.371626 25761 solver.cpp:404] Test net output #0: InnerProduct3 = -0.000267899
I0615 17:57:28.371691 25761 solver.cpp:404] Test net output #1: InnerProduct3 = -8.7297e-05
I0615 17:57:28.371724 25761 solver.cpp:337] Iteration 0, Testing net (#1)
I0615 17:57:28.371752 25761 net.cpp:684] Ignoring source layer Python1
I0615 17:57:28.371785 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:28.371830 25761 solver.cpp:404] Test net output #0: InnerProduct3 = 0
I0615 17:57:28.371863 25761 solver.cpp:404] Test net output #1: InnerProduct3 = 0
I0615 17:57:36.898833 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:36.901408 25761 net.cpp:684] Ignoring source layer Python4
I0615 17:57:37.029466 25761 net.cpp:684] Ignoring source layer Python4
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "python/gps/gps_main.py", line 387, in <lambda>
    target=lambda: gps.run(itr_load=resume_training_itr)
  File "python/gps/gps_main.py", line 85, in run
    self._take_iteration(itr, traj_sample_lists)
  File "python/gps/gps_main.py", line 213, in _take_iteration
    self.algorithm.iteration(sample_lists)
  File "python/gps/algorithm/algorithm_badmm.py", line 59, in iteration
    self._update_policy_fit(m) # Update policy priors.
  File "python/gps/algorithm/algorithm_badmm.py", line 180, in _update_policy_fit
    SampleList(self.cur[m].pol_info.policy_samples)
  File "python/gps/algorithm/policy/policy_prior_gmm.py", line 83, in update
    self.gmm.update(XU, K)
  File "python/gps/utility/gmm.py", line 175, in update
    logobs = self.estep(data)
  File "python/gps/utility/gmm.py", line 76, in estep
    check_finite=False)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
    check_finite=check_finite)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 30, in _cholesky
    raise LinAlgError("%d-th leading minor not positive definite" % info)
LinAlgError: 1-th leading minor not positive definite

Thanks,
Peter
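
The traceback ends in a Cholesky factorization of a covariance matrix that is not numerically positive definite. A generic workaround (not the repository's own fix, and only a sketch) is to retry the factorization with a small, growing multiple of the identity added to the diagonal. `cholesky_with_jitter` and its parameters are hypothetical names, and the example uses `numpy.linalg.cholesky` rather than the SciPy call shown in the log.

```python
import numpy as np


def cholesky_with_jitter(sigma, initial_jitter=1e-8, max_tries=10):
    """Try a Cholesky factorization, adding growing diagonal jitter on failure."""
    sigma = 0.5 * (sigma + sigma.T)  # Remove small asymmetries first.
    jitter = 0.0
    for _ in range(max_tries):
        try:
            factor = np.linalg.cholesky(sigma + jitter * np.eye(sigma.shape[0]))
            return factor, jitter
        except np.linalg.LinAlgError:
            # First failure starts at initial_jitter, later ones grow tenfold.
            jitter = initial_jitter if jitter == 0.0 else 10.0 * jitter
    raise np.linalg.LinAlgError("matrix is not positive definite even with jitter")


# A rank-deficient covariance, as can arise from too few or duplicated samples.
singular = np.array([[1.0, 1.0], [1.0, 1.0]])
factor, used_jitter = cholesky_with_jitter(singular)
```

The returned factor belongs to the regularized matrix, not the original one, so a large `used_jitter` is a sign that the underlying data (for example too few distinct samples per cluster) should be examined instead.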

Segmentation fault: 11

python python/gps/gps_main.py box2d_pointmass_example

When I run this, I get:

Segmentation fault: 11

Versions:
Python 2.7.13
macOS High Sierra 10.13.3

Has anyone else encountered this problem?

multi_modal_network return values not compatible with policy_opt_tf

In policy_opt_tf, around line 70:

tf_map_generator = self._hyperparams['network_model']
tf_map, fc_vars, last_conv_vars = tf_map_generator(dim_input=self._dO, dim_output=self._dU,
    batch_size=self.batch_size, network_config=self._hyperparams['network_params'])

The return values are expected in the form that multi_modal_network_fp provides (tf_model_example.py, line 268):

return nnet, fc_vars, last_conv_vars

However, multi_modal_network in that same file returns a different expression (line 166):
return TfMap.init_from_lists([nn_input, action, precision], [fc_output], [loss])

which makes policy_opt_tf raise the following error:

*** TypeError: iteration over non-sequence

Running Tensorflow 0.8.0 on Ubuntu 16.04
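
The mismatch is that one builder returns a three-element tuple while the other returns a single map object, so tuple unpacking fails on the latter. One way to tolerate both shapes at the call site could look like the sketch below. `unpack_network_outputs` is a hypothetical helper, and whether the rest of policy_opt_tf accepts `None` for the two variable lists is an untested assumption.

```python
def unpack_network_outputs(result):
    """Normalize a network builder's return value to (tf_map, fc_vars, last_conv_vars)."""
    if isinstance(result, (tuple, list)):
        if len(result) != 3:
            raise ValueError("expected 3 return values, got %d" % len(result))
        return tuple(result)
    # A bare map object: no variable lists were returned alongside it.
    return result, None, None


class FakeMap(object):
    """Stand-in for the map object returned by a network builder."""


single = FakeMap()
tf_map, fc_vars, last_conv_vars = unpack_network_outputs(single)
```

The alternative is to change multi_modal_network itself so that it returns the same three values as multi_modal_network_fp, which keeps the call site unchanged.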

Convolutional Neural Networks

I look forward to the new version that includes support for images and convolutional networks. When do you think that will be?

Compile Error in mujoco setup

Hi,
I get compile errors when setting up MuJoCo, as shown below:

error: ‘mjModel {aka struct _mjModel}’ has no member named ‘sensor_scale’
out["sensor_scale"] = toNdarray2<mjtNum>(m_model->sensor_scale, m_model->nsensor, 1);

error: ‘mjData {aka struct _mjData}’ has no member named ‘maxstackuse’
_cadihk(d, "sensor_scale", m_model->sensor_scale);

and so on.
I guess the auto-generated file has a version mismatch.

I would appreciate it if somebody could help me :)

Trajectory optimization not stable

Hi there,

Thanks for your excellent code. I am running it with my own MuJoCo model to do peg-in-hole insertion using algorithm_traj_opt only (no neural net yet). The first 15 iterations seem okay and the trajectory converges.
However, things suddenly get worse after that. The Laplace estimate of the improvement produces a very large value, so the new eta grows very fast. Then the program crashes with a non-PD error.

I checked the iLQR paper, and it does not seem to include a Laplace estimate. Also, Qtt (the combination of Qxx, Qxu, Quu) has a very different form from the equation in traj_opt_lqr_python.py. The iLQR paper I read is this: https://homes.cs.washington.edu/~todorov/papers/TassaIROS12.pdf

Can you point me to the paper describing the Laplace estimate and to the iLQR paper your implementation follows? I appreciate it!

Problem creating more MuJoCo worlds

In agent/agent_mjc.py, when I tried to create several different worlds in the function _setup_world(self, filename), it threw errors like this:

��[��[V: not found sh: 2: [ : not found ��[��[V: not found sh: 2: [ : not found ��[��[V: not found sh: 2: Syntax error: "(" unexpected ��[��[V: not found sh: 2: Syntax error: "(" unexpected ERROR: Invalid activation key

I am sure that my MuJoCo key is correct.
Has anyone else encountered this problem?

ROS Problem

Could you tell me your ROS version?

Boost.Python.ArgumentError: Python argument types did not match C++ signature

Can you help with the following boost-related error?

$ python python/gps/gps_main.py mjc_example
DEBUG:No ROS enabled: No module named rospkg
Traceback (most recent call last):
  File "python/gps/gps_main.py", line 410, in <module>
    main()
  File "python/gps/gps_main.py", line 395, in main
    gps = GPSMain(hyperparams.config, args.quit)
  File "python/gps/gps_main.py", line 46, in __init__
    self.agent = config['agent']['type'](config['agent'])
  File "python/gps/agent/mjc/agent_mjc.py", line 30, in __init__
    self._setup_world(hyperparams['filename'])
  File "python/gps/agent/mjc/agent_mjc.py", line 54, in _setup_world
    world = mjcpy.MJCWorld(filename)
Boost.Python.ArgumentError: Python argument types in
    MJCWorld.__init__(MJCWorld, str)
did not match C++ signature:
    __init__(_object*, std::string)

My machine is a generic Ubuntu 16.04 box with Boost 1.58, MuJoCo 131.

Problem in using MuJoCo131 trial license

Hi, I am trying to use a MuJoCo131 trial license with GPS, but running the mjc_example experiment outputs:

ERROR: Invalid activation key

The trial license works well with MuJoCo's own simulate program.

Does anyone know the reason? Thanks!

Error running mjc_peg_images

I am using TF v0.12 and get the following error when I try to run mjc_peg_images.
I also tried TF v0.11 and got the same error.
Which version of TF works?

DEBUG:No gps_agent_pkg: gps_agent_pkg
ROS path [0]=/opt/ros/kinetic/share/ros
ROS path [1]=/opt/ros/kinetic/share
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
  File "python/gps/gps_main.py", line 410, in <module>
    main()
  File "python/gps/gps_main.py", line 395, in main
    gps = GPSMain(hyperparams.config, args.quit)
  File "python/gps/gps_main.py", line 51, in __init__
    self.algorithm = config['algorithm']['type'](config['algorithm'])
  File "python/gps/algorithm/algorithm_badmm.py", line 34, in __init__
    self._hyperparams['policy_opt'], self.dO, self.dU
  File "python/gps/algorithm/policy_opt/policy_opt_tf.py", line 47, in __init__
    self.init_network()
  File "python/gps/algorithm/policy_opt/policy_opt_tf.py", line 71, in init_network
    network_config=self._hyperparams['network_params'])
  File "python/gps/algorithm/policy_opt/tf_model_example.py", line 147, in multi_modal_network
    'bc2': init_bias([num_filters[1]]),
  File "python/gps/algorithm/policy_opt/tf_model_example.py", line 13, in init_bias
    return tf.get_variable(name, initializer=tf.zeros(shape, dtype='float'))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 987, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 889, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 347, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 332, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 638, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable None already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "python/gps/algorithm/policy_opt/tf_model_example.py", line 13, in init_bias
    return tf.get_variable(name, initializer=tf.zeros(shape, dtype='float'))
  File "python/gps/algorithm/policy_opt/tf_model_example.py", line 146, in multi_modal_network
    'bc1': init_bias([num_filters[0]]),
  File "python/gps/algorithm/policy_opt/policy_opt_tf.py", line 71, in init_network
    network_config=self._hyperparams['network_params'])

INFO:signal_shutdown [atexit]

The file gps.algorithm.policy_opt.config_tf does not exist

When running test_policy_opt_tf.py, I get the following error:

from gps.algorithm.policy_opt.config_tf import POLICY_OPT_TF
ImportError: No module named config_tf

Thanks

Wrong normalization in GMM-sigma initialization?

Hi there,

Reading the code at https://github.com/cbfinn/gps/blob/master/python/gps/utility/gmm.py#L167

            # Initialize.
            for i in range(K):
                cluster_idx = (cidx == i)[0]
                mu = np.mean(data[cluster_idx, :], axis=0)
                diff = (data[cluster_idx, :] - mu).T
                sigma = (1.0 / K) * (diff.dot(diff.T))  # <<<<<<<<<< THIS LINE
                self.mu[i, :] = mu
                self.sigma[i, :, :] = sigma + np.eye(Do) * 2e-6

and the documentation about GMM at [1], I am wondering whether the sigma value should be normalized by the number of data points in cluster i instead of the total number of clusters K. That is:

sigma = (1.0 / cluster_idx.shape[0]) * (diff.dot(diff.T))  # Note the 'cluster_idx.shape[0]'

What do you think?
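
One detail that may be worth checking in the proposed line: if `cluster_idx` is a boolean mask, as the `(cidx == i)[0]` expression suggests, then `cluster_idx.shape[0]` is the total number of points N rather than the size of cluster i. Below is a minimal sketch of the per-cluster normalization under that assumption, using made-up data. The helper name `cluster_covariance` is hypothetical and not part of gmm.py.

```python
import numpy as np


def cluster_covariance(data, mask, reg=2e-6):
    """Maximum-likelihood covariance of the rows of `data` selected by `mask`.

    `mask` is assumed to be a boolean array of length N (an assumption
    based on the `(cidx == i)[0]` expression in the snippet above).
    """
    points = data[mask, :]
    n_i = points.shape[0]            # number of points in this cluster
    mu = points.mean(axis=0)
    diff = (points - mu).T           # shape (Do, n_i)
    sigma = diff.dot(diff.T) / n_i   # normalize by the cluster size
    return mu, sigma + np.eye(data.shape[1]) * reg


rng = np.random.RandomState(0)
N, Do, K = 200, 3, 4
data = rng.randn(N, Do)              # made-up data for illustration
cidx = rng.randint(0, K, size=(1, N))

cluster_idx = (cidx == 0)[0]
mu0, sigma0 = cluster_covariance(data, cluster_idx)

# The mask has one entry per data point, so its length is N,
# while the number of True entries is the cluster size.
print(cluster_idx.shape[0], int(np.sum(cluster_idx)))
```

If the mask assumption holds, `np.sum(cluster_idx)` would be the count to divide by.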

Best regards,

Julian

Unable to install Box2d

Hi @dongleecsu, I tried the steps that you mentioned, but I'm not able to get Box2D working. I still get this error:

from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
File "lunar_lander_dl_player.py", line 13, in <module>
import Box2D
ModuleNotFoundError: No module named 'Box2D'

Could you help me with this? Thanks in advance.

Something goes wrong when running python python/gps/gps_main.py mjc_badmm_example

I have set up the Caffe environment, but when I run this, the following error occurs:
Traceback (most recent call last):
File "python/gps/gps_main.py", line 414, in <module>
main()
File "python/gps/gps_main.py", line 348, in main
hyperparams = imp.load_source('hyperparams', hyperparams_file)
File "/home/wq/gps/experiments/mjc_badmm_example/hyperparams.py", line 19, in <module>
from gps.algorithm.policy_opt.policy_opt_caffe import PolicyOptCaffe
File "python/gps/algorithm/policy_opt/policy_opt_caffe.py", line 8, in <module>
import caffe
File "python/gps/algorithm/policy_opt/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
File "python/gps/algorithm/policy_opt/caffe/pycaffe.py", line 15, in <module>
import caffe.io
ImportError: No module named caffe.io
I see that when setting up ROS, there is something to compile with:
cd src/gps_agent_pkg/
cmake . -DUSE_CAFFE=1 -DCAFFE_INCLUDE_PATH=/path/to/caffe/distribute/include -DCAFFE_LIBRARY_PATH=/path/to/caffe/build/lib
make -j
But there isn't anything similar to configure when setting up MuJoCo. So what is my problem?

The "right" way to calculate the overall covariance for GMM?

Hi, thanks for publishing this code. It is helping us a lot.

I want to ask about the part where you calculate the overall covariance for GMM.
If my understanding is correct, the overall covariance for GMM with weight w_i is:

So I think the corresponding change to your current implementation is:

--- a/python/gps/utility/gmm.py
+++ b/python/gps/utility/gmm.py
@@ -89,7 +89,7 @@ class GMM(object):
         # For some reason this version works way better than the "right"
         # one... could we be computing xxt wrong?
         diff = self.mu - np.expand_dims(mu, axis=0)
-        diff_expand = np.expand_dims(diff, axis=1) * \
+        diff_expand = np.expand_dims(self.mu, axis=1) * \
                 np.expand_dims(diff, axis=2)
         wts_expand = np.expand_dims(wts, axis=2)
         sigma = np.sum((self.sigma + diff_expand) * wts_expand, axis=0)

The comment in your code suggests that you're aware that the current code is not theoretically derived, so I'd like to know if my modified version is what you tried as the "right" version, and if so, how it performed "worse" compared to the current version.

I did try the above modification with mjc_badmm_example, and I couldn't find any significant difference from the original version.
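
For reference, the two forms can be compared numerically. When the weights sum to one and mu is the weighted mean, sum_i w_i (mu_i - mu) = 0, so replacing one factor of the outer product with mu_i should not change the weighted sum. This is a minimal sketch with random, made-up mixture parameters. It uses standalone variables rather than the GMM class, so it only checks the algebra, not the behavior of the full algorithm.

```python
import numpy as np

rng = np.random.RandomState(0)
K, D = 5, 3

# Made-up mixture parameters for illustration only.
mus = rng.randn(K, D)                       # component means
A = rng.randn(K, D, D)
sigmas = np.einsum('kij,klj->kil', A, A)    # symmetric PSD component covariances
wts = rng.rand(K, 1)
wts /= wts.sum()                            # weights must sum to one

mu = np.sum(mus * wts, axis=0)              # overall mean
diff = mus - np.expand_dims(mu, axis=0)
wts_expand = np.expand_dims(wts, axis=2)

# Current form: outer products (mu_i - mu)(mu_i - mu)^T.
outer_diff = np.expand_dims(diff, axis=1) * np.expand_dims(diff, axis=2)
sigma_current = np.sum((sigmas + outer_diff) * wts_expand, axis=0)

# Modified form from the diff above: (mu_i - mu) mu_i^T.
outer_mixed = np.expand_dims(mus, axis=1) * np.expand_dims(diff, axis=2)
sigma_modified = np.sum((sigmas + outer_mixed) * wts_expand, axis=0)

# Textbook form: sum_i w_i (Sigma_i + mu_i mu_i^T) - mu mu^T.
outer_mu = np.expand_dims(mus, axis=1) * np.expand_dims(mus, axis=2)
sigma_textbook = (np.sum((sigmas + outer_mu) * wts_expand, axis=0)
                  - np.outer(mu, mu))

print(np.allclose(sigma_current, sigma_modified))
print(np.allclose(sigma_current, sigma_textbook))
```

Under these assumptions all three expressions agree up to floating-point error, which would be consistent with seeing no significant difference in the experiment. The forms differ only if the weights are not normalized.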

Thanks!

How were the PR2 gains found for the PR2 example?

Hi,

We wrote a Baxter ROS package for GPS to test the same code on Baxter for a reaching task (torque control mode). Though the cost converges, the arm does not reach the goal location very well. We suspect the gain values are the cause, so it would be very helpful to know how you identified the PR2 gains given in the hyperparams file of the pr2_example.

Another issue is that the pr2_example does not behave correctly when given a target location that is slightly farther away. The example uses a target location very close to the starting location. Is there a rule of thumb for how far the target location should be from the start location? I suspect this is caused by a lack of data when the target is far away.

Any helpful suggestion is appreciated.

Best,

Mike

Using ROS Interface for Gazebo Baxter

I was wondering if there is a quick way of interfacing the library with Baxter in Gazebo by changing src code within gps_agent_pkg and the corresponding launch files?

mjc_example fails on second iteration

When running the peg insertion mjc_example, the trajectory optimizer fails on the second step because it cannot regularize with very large eta:

ValueError: Failed to find PD solution even for very large eta (check that dynamics and cost are reasonably well conditioned)!

The issue seems to be condition 3, which actually increases cost in the first update:

Include observation in reward function without using it during state feedback

Hello,

I was wondering whether it is possible to use a feature or signal for reward calculation without explicitly having the trajectory optimizer using it as part of the state feedback.

For example, if I want to use an object's pose to calculate a reward, but only want to use the robot's joints for the feedback controller, how would I go about doing that? If I do not include the object's pose as a state, the algorithm throws an error if I want to use it for reward computation.

Box2d BADMM Examples

In the box2d_badmm_example, I've run the experiment three times now, but the arm never settles at the vertical position. Instead it overshoots the vertical position by a significant angle, so the objective of the experiment is not met.

I tried it with the mdgps algorithm as well and got the same result. Wondering if you had the same experience?

ImportError: Failed to import any qt binding

Does anyone know why this problem occurs? Please help!

How is the "trajectory divergence term" calculated in the compute_costs function in algorithm_traj_opt.py?

Could you point me to a reference for this part of the code? Thank you!

def compute_costs(self, m, eta, augment=True):
    """ Compute cost estimates used in the LQR backward pass. """
    traj_info, traj_distr = self.cur[m].traj_info, self.cur[m].traj_distr
    if not augment:  # Whether to augment cost with term to penalize KL
        return traj_info.Cm, traj_info.cv

    multiplier = self._hyperparams['max_ent_traj']
    fCm, fcv = traj_info.Cm / (eta + multiplier), traj_info.cv / (eta + multiplier)
    K, ipc, k = traj_distr.K, traj_distr.inv_pol_covar, traj_distr.k

    # Add in the trajectory divergence term.
    for t in range(self.T - 1, -1, -1):
        fCm[t, :, :] += eta / (eta + multiplier) * np.vstack([
            np.hstack([
                K[t, :, :].T.dot(ipc[t, :, :]).dot(K[t, :, :]),
                -K[t, :, :].T.dot(ipc[t, :, :])
            ]),
            np.hstack([
                -ipc[t, :, :].dot(K[t, :, :]), ipc[t, :, :]
            ])
        ])
        fcv[t, :] += eta / (eta + multiplier) * np.hstack([
            K[t, :, :].T.dot(ipc[t, :, :]).dot(k[t, :]),
            -ipc[t, :, :].dot(k[t, :])
        ])

    return fCm, fcv
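
One way to read the blocks above, offered as an interpretation rather than a confirmed derivation: for a linear-Gaussian controller u ~ N(Kx + k, Sigma), the negative log-density is, up to a constant, 0.5 (u - Kx - k)^T Sigma^{-1} (u - Kx - k). Expanding that expression in the joint vector [x; u] gives the same quadratic and linear blocks that are added to fCm and fcv. The sketch below checks this expansion numerically for a single time step with made-up values. All variable names are local to the example.

```python
import numpy as np

rng = np.random.RandomState(0)
dX, dU = 4, 2

# Made-up controller parameters for one time step.
K = rng.randn(dU, dX)
k = rng.randn(dU)
B = rng.randn(dU, dU)
ipc = B.dot(B.T) + np.eye(dU)   # inverse policy covariance (positive definite)

# Quadratic and linear blocks, built the same way as in the loop above.
Cm_div = np.vstack([
    np.hstack([K.T.dot(ipc).dot(K), -K.T.dot(ipc)]),
    np.hstack([-ipc.dot(K), ipc]),
])
cv_div = np.hstack([K.T.dot(ipc).dot(k), -ipc.dot(k)])

# Evaluate both expressions at a random state-action pair.
x, u = rng.randn(dX), rng.randn(dU)
xu = np.concatenate([x, u])
resid = u - K.dot(x) - k

direct = 0.5 * resid.dot(ipc).dot(resid)
expanded = (0.5 * xu.dot(Cm_div).dot(xu) + xu.dot(cv_div)
            + 0.5 * k.dot(ipc).dot(k))   # constant term dropped by the LQR pass

print(np.isclose(direct, expanded))
```

Under this reading, the factor eta / (eta + multiplier) weights the divergence from the previous controller against the (rescaled) task cost.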

Talk

Is anybody using this algorithm on a robot other than the PR2, such as a common manipulator arm or a DIY build?

'data': aux_x0 not defined

I got the following error

NameError: name 'aux_x0' is not defined

when running

python python/gps/gps_main.py pr2_badmm_example

I checked the code at
https://github.com/cbfinn/gps/blob/master/experiments/pr2_badmm_example/hyperparams.py#l89.
Should I change aux_x0 to ja_aux?

Error in import mjcpy with mujoco150

import mjcpy
ImportError: .../gps/src/3rdparty/mjpro/bin/libmujoco150.so: undefined symbol: __glewBlitFramebuffer
INFO:signal_shutdown [atexit]

I have already installed OpenGL, libglew-dev, and GLUT. I also added them to 3rdparty/mjcpy2/CMakeLists.txt as follows:

else()
find_package( osg REQUIRED)
find_package( osgViewer REQUIRED)
find_package( OpenThreads REQUIRED)
find_package(osgGA REQUIRED)
find_package(OpenGL REQUIRED)
find_package(GLUT REQUIRED)
find_package(GLEW REQUIRED)
include_directories(SYSTEM ${GLEW_INCLUDE_DIRS} ${OPENGL_INCLUDE_DIRS} ${GLUT_INCLUDE_DIRS} )
# link_directories(${OSG})
set(OSG_LIBRARIES ${GLEW_LIBRARIES} ${OPENGL_LIBRARIES} ${GLUT_LIBRARY} ${OSG_LIBRARY} ${OSGVIEWER_LIBRARY} ${OPENTHREADS_LIBRARY} ${OSGGA_LIBRARY} )
endif()

Unimplemented Geom Type: -1

I am installing GPS on Ubuntu 16.04. Everything works fine with the Box2D examples. The compilation of mjcpy is also successful (I have generated mjcpy.so). However, when I execute the example code

python python/gps/gps_main.py mjc_example

the error occurs

DEBUG:No ROS enabled: No module named rospkg
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
unimplemented geom type: -1
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
Segmentation fault (core dumped)

I have not found any reports of this error related to MuJoCo online. Could it be caused by mjcpy? Many thanks for any help.

Negative numbers as "average cost". Is that normal in the case of mjc_peg_images?

Can someone confirm that these results are normal and expected?

After a few iterations of negative average cost, some NaNs start to appear, and the whole simulation blows up after approximately 15 iterations. The problem is specific to mjc_peg_images; reacher_images works just fine and the average cost is always positive.

Problem occurs when running "python python/gps/gps_main.py mjc_badmm_example"

When running "python python/gps/gps_main.py mjc_badmm_example", I got "ImportError: cannot import name 'QtCore'". Please help me.

Catkin version of ROS package & Docker image

Hey,

Thanks for making this code public. Are there plans to release a Docker image of this setup anytime soon? And is there a reason why you chose the non-catkinized option for the ROS implementation?
It would be nice to have, I think.

CostAction first order terms

In https://github.com/cbfinn/gps/blob/master/python/gps/algorithm/cost/cost_action.py#L28, the term dc/du is computed as (line 28):

lu = self._hyperparams['wu'] * sample_u

Should there be a negative sign in front, since this should compute the expansion of 0.5*(u-uref)^T*R*(u-uref)?
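
A quick numerical check may help here. The sketch below assumes the cost has the form 0.5 * sum(wu * u**2), that is, a diagonal R and no reference action. This is an assumption, since the rest of cost_action.py is not quoted above. It compares wu * u against a central finite-difference gradient of that cost.

```python
import numpy as np


def action_cost(u, wu):
    """Quadratic action penalty 0.5 * u^T diag(wu) u (assumed form)."""
    return 0.5 * np.sum(wu * u ** 2)


def finite_difference_grad(f, u, eps=1e-6):
    """Central-difference gradient of a scalar function f at u."""
    grad = np.zeros_like(u)
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = eps
        grad[i] = (f(u + step) - f(u - step)) / (2.0 * eps)
    return grad


# Made-up weights and action for illustration.
wu = np.array([1.0, 0.5, 2.0])
u = np.array([0.3, -1.2, 0.7])

analytic = wu * u
numeric = finite_difference_grad(lambda v: action_cost(v, wu), u)
print(np.allclose(analytic, numeric, atol=1e-6))
```

With a nonzero reference, the same reasoning gives R (u - uref) for the gradient with respect to u, so the sign on u stays positive and the minus sign attaches only to uref.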

Bug

When running python python/gps/gps_main.py box2d_badmm_example, the following error occurs during the Calculating step:
/home/prafly/desktop/gps/python/gps/algorithm/policy_opt/policy_opt_caffe.py:140: RuntimeWarning: divide by zero encountered in divide
self.policy.scale = np.diag(1.0 / np.std(obs, axis=0))
I0511 10:33:26.268626 17880 solver.cpp:341] Iteration 0, Testing net (#0)
I0511 10:33:26.268664 17880 net.cpp:748] Ignoring source layer Python4
I0511 10:33:26.268708 17880 solver.cpp:409] Test net output #0: InnerProduct3 = 0.000162208
I0511 10:33:26.268729 17880 solver.cpp:409] Test net output #1: InnerProduct3 = -0.00041328
I0511 10:33:26.268740 17880 solver.cpp:341] Iteration 0, Testing net (#1)
I0511 10:33:26.268748 17880 net.cpp:748] Ignoring source layer Python1
I0511 10:33:26.268760 17880 net.cpp:748] Ignoring source layer Python4
I0511 10:33:26.268780 17880 solver.cpp:409] Test net output #0: InnerProduct3 = 0
I0511 10:33:26.268792 17880 solver.cpp:409] Test net output #1: InnerProduct3 = 0
I0511 10:33:29.947541 17880 net.cpp:748] Ignoring source layer Python4
I0511 10:33:29.948006 17880 net.cpp:748] Ignoring source layer Python4
I0511 10:33:30.041903 17880 net.cpp:748] Ignoring source layer Python4
/home/prafly/desktop/gps/python/gps/utility/gmm.py:202: RuntimeWarning: invalid value encountered in less
w[:, (self.mass < (1.0 / K) * 1e-4)[:, 0]] = 1.0 / N
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "python/gps/gps_main.py", line 369, in <lambda>
target=lambda: gps.run(itr_load=resume_training_itr)
File "python/gps/gps_main.py", line 67, in run
self._take_iteration(itr, traj_sample_lists)
File "python/gps/gps_main.py", line 195, in _take_iteration
self.algorithm.iteration(sample_lists)
File "/home/prafly/desktop/gps/python/gps/algorithm/algorithm_badmm.py", line 59, in iteration
self._update_policy_fit(m) # Update policy priors.
File "/home/prafly/desktop/gps/python/gps/algorithm/algorithm_badmm.py", line 180, in _update_policy_fit
SampleList(self.cur[m].pol_info.policy_samples)
File "/home/prafly/desktop/gps/python/gps/algorithm/policy/policy_prior_gmm.py", line 83, in update
self.gmm.update(XU, K)
File "/home/prafly/desktop/gps/python/gps/utility/gmm.py", line 174, in update
logobs = self.estep(data)
File "/home/prafly/desktop/gps/python/gps/utility/gmm.py", line 75, in estep
check_finite=False)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
check_finite=check_finite)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 30, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
LinAlgError: 1-th leading minor not positive definite
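
The first warning in the log suggests that at least one observation dimension has zero standard deviation, so 1.0 / np.std(...) becomes inf. Non-finite values may then reach the GMM update, although the log alone does not confirm that this is the cause of the Cholesky failure. Below is a minimal sketch of one possible guard, assuming it is acceptable to clamp the standard deviation from below. The helper name `safe_scale` and the floor of 1e-3 are illustrative choices, not taken from the repository.

```python
import numpy as np


def safe_scale(obs, min_std=1e-3):
    """Diagonal scaling matrix 1/std per column, with std clamped from below.

    `min_std` is an illustrative floor, not a value from the repository.
    """
    std = np.maximum(np.std(obs, axis=0), min_std)
    return np.diag(1.0 / std)


# Made-up observations where the last column is constant (std == 0).
obs = np.array([
    [0.0, 1.0, 5.0],
    [1.0, 3.0, 5.0],
    [2.0, 5.0, 5.0],
])

scale = safe_scale(obs)
print(np.all(np.isfinite(scale)))
```

Whether clamping is appropriate depends on why the observation is constant. A dimension that never changes might also indicate a problem in the observation setup.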

Doubt on 'conditions' parameter

I am unable to find out what exactly the role of 'conditions': 4 is in the hyperparameters file. Could someone please help?

Support tensorflow 1.2

Hi,
thanks for the great work!
I tweaked a few lines of the code in order to run GPS with the newest version of TensorFlow. Would you be interested in a pull request? It's mainly about replacing deprecated functions.

Cheers,
Phil