A reimplementation of Model Predictive Path Integral (MPPI) control from the paper "Information Theoretic MPC for Model-Based Reinforcement Learning" (Williams et al., 2017) for the OpenAI Gym pendulum environment.
I have a quick question on the implementation though.
In this line, you are just summing the state costs along each trajectory, while Algorithm 2 in the paper adds an extra term at every timestep: `\lambda u_{t-1}^T \Sigma^{-1} \epsilon^k_{t-1}`.
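For concreteness, here is a minimal sketch of how that control-coupling term could be folded into the per-rollout cost accumulation. All names here (`rollout_cost`, `state_costs`, `eps_k`, `sigma_inv`, `lam`) are hypothetical, not identifiers from this repo:

```python
import numpy as np

def rollout_cost(state_costs, u, eps_k, sigma_inv, lam):
    """Cost of one sampled rollout, per Algorithm 2 of Williams et al. (2017).

    state_costs : (T,) running state costs q(x_t) along the rollout
    u           : (T, m) nominal control sequence
    eps_k       : (T, m) noise sequence sampled for this rollout
    sigma_inv   : (m, m) inverse of the control noise covariance
    lam         : temperature lambda
    """
    total = 0.0
    for t in range(len(state_costs)):
        # State cost plus the control-coupling term
        # lambda * u_t^T Sigma^{-1} eps_t that Algorithm 2 adds.
        total += state_costs[t] + lam * u[t] @ sigma_inv @ eps_k[t]
    return total
```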
Are there any plans to extend this to approximate dynamics (e.g., a neural network) and to use importance sampling, instead of sampling trajectories directly from the environment?
(i.e., replace the `__init__` `env` argument with a `dynamics` argument, and take `env` only in the control method, for actual stepping)
That would actually match the contributions from the 2017 paper and make it more broadly applicable. I would like to use this in an environment where I can't reset the state of the simulator, so trajectories have to be generated with the model.
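A minimal sketch of what that separation could look like, assuming the old Gym step API and scalar actions as in Pendulum; all names here (`MPPI`, `dynamics`, `cost_fn`, `act`, `control`) are hypothetical, not the repo's actual interface:

```python
import numpy as np

class MPPI:
    """Hypothetical interface sketch; names and signatures are assumptions."""

    def __init__(self, dynamics, cost_fn, horizon=30, n_samples=100,
                 lam=1.0, sigma=1.0):
        # dynamics: callable (state, action) -> next_state, e.g. a learned
        # NN model, so planning rollouts never touch the real environment.
        self.dynamics = dynamics
        self.cost_fn = cost_fn
        self.horizon = horizon
        self.n_samples = n_samples
        self.lam = lam
        self.sigma = sigma
        self.u = np.zeros(horizon)  # nominal (scalar) control sequence

    def act(self, state):
        eps = np.random.normal(0.0, self.sigma,
                               (self.n_samples, self.horizon))
        costs = np.zeros(self.n_samples)
        for k in range(self.n_samples):
            x = state
            for t in range(self.horizon):
                x = self.dynamics(x, self.u[t] + eps[k, t])  # model rollout
                # State cost plus lambda * u^T Sigma^{-1} eps (scalar case).
                costs[k] += (self.cost_fn(x)
                             + self.lam * self.u[t] * eps[k, t]
                             / self.sigma ** 2)
        # Information-theoretic weights and control update (Algorithm 2).
        w = np.exp(-(costs - costs.min()) / self.lam)
        w /= w.sum()
        self.u += w @ eps
        action = self.u[0]
        self.u = np.roll(self.u, -1)
        self.u[-1] = 0.0
        return action

    def control(self, env, steps=200):
        # The real env appears only here, for actual stepping.
        state = env.reset()
        for _ in range(steps):
            state, _, done, _ = env.step(np.array([self.act(state)]))
            if done:
                break
```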