Structured Object-Aware Physics Prediction for Video Modelling and Planning

See arxiv.org/1910.02425 or openreview.net/forum?id=B1e-kxSKDH for further information.

Example Animations

Video Prediction

[Animation panels, left to right: Real, Ours, VRNN, SQAIR, DDPAE, Linear, Supervised]

The above depicts the reconstruction and prediction errors of the various models. The models are given 8 frames of video as input, which they reconstruct. Conditioned on this input, all models predict the following 92 frames. Only STOVE manages to generate visually convincing physical behavior over longer timeframes. 10 different sequences of length 100 are shown.
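The split into 8 observed frames and 92 predicted frames can be sketched as follows. This is a toy illustration and not STOVE's code: `step` is a hypothetical stand-in for a learned dynamics model (here a single ball bouncing in a 1D box), and every name in the block is an assumption made for the example.

```python
# Toy sketch of "observe 8 frames, then predict the next 92".
# All names are illustrative; `step` stands in for a learned dynamics model.

NUM_OBSERVED = 8
NUM_PREDICTED = 92


def step(position, velocity, box=1.0):
    """Advance a 1D ball by one frame, reflecting it off the walls at 0 and `box`."""
    position += velocity
    if position < 0.0:
        position, velocity = -position, -velocity
    elif position > box:
        position, velocity = 2.0 * box - position, -velocity
    return position, velocity


def simulate(position, velocity, num_frames):
    """Generate a ground-truth sequence of ball positions."""
    frames = []
    for _ in range(num_frames):
        frames.append(position)
        position, velocity = step(position, velocity)
    return frames


def predict(observed, num_predicted):
    """Infer the state from the observed frames, then roll the model forward."""
    position = observed[-1]
    velocity = observed[-1] - observed[-2]  # finite-difference velocity estimate
    predictions = []
    for _ in range(num_predicted):
        position, velocity = step(position, velocity)
        predictions.append(position)
    return predictions


sequence = simulate(position=0.1, velocity=0.037,
                    num_frames=NUM_OBSERVED + NUM_PREDICTED)
observed, target = sequence[:NUM_OBSERVED], sequence[NUM_OBSERVED:]
predicted = predict(observed, NUM_PREDICTED)

# Largest deviation between the rollout and the ground truth.
max_error = max(abs(p - t) for p, t in zip(predicted, target))
```

Because the toy model matches the toy ground truth exactly, `max_error` stays at floating-point noise. With a learned model, the error typically grows with the rollout length, which is why long-horizon stability is the interesting comparison.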

Model-Based Control

[Animation panels, left to right: MCTS on STOVE, MCTS on Real Env., PPO on Env. States, PPO on Env. Images]

The above shows the performance of the compared models in the interactive environments. The agent controls the red ball, and a negative reward is given whenever the red ball collides with any other ball. STOVE is used as a world model, predicting future states, frames, and rewards. MCTS can then be used on STOVE for model-based control. We compare to MCTS on the real environment states, as well as PPO on the environment states and raw images. Again, 10 different sequences of length 100 are shown.
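The idea of planning on a world model can be sketched with a much simpler planner than MCTS. The block below uses exhaustive lookahead over short action sequences, which is a plainly swapped-in technique and not what the paper uses. The 1D avoidance task, `model_step`, and `plan` are all hypothetical names and assumptions made for this example; in STOVE the learned model would take the place of `model_step`.

```python
from itertools import product

ACTIONS = (-1, 0, 1)      # move left, stay, move right
LOW, HIGH = 3, 7          # the obstacle oscillates between these positions


def model_step(state, action):
    """Hypothetical world model: predict the next state and reward.

    state = (agent_position, obstacle_position, obstacle_velocity) on a line.
    The reward is -1 when the agent is within distance 1 of the obstacle.
    """
    agent, obstacle, velocity = state
    agent = max(0, min(10, agent + action))
    obstacle += velocity
    if obstacle <= LOW or obstacle >= HIGH:
        velocity = -velocity
    reward = -1.0 if abs(agent - obstacle) <= 1 else 0.0
    return (agent, obstacle, velocity), reward


def plan(state, horizon=5):
    """Simulate every action sequence of length `horizon` in the model and
    return the first action of the sequence with the highest predicted return."""
    best_return, best_action = float("-inf"), 0
    for candidate in product(ACTIONS, repeat=horizon):
        sim_state, total = state, 0.0
        for action in candidate:
            sim_state, reward = model_step(sim_state, action)
            total += reward
        if total > best_return:
            best_return, best_action = total, candidate[0]
    return best_action


# Closed-loop control: replan at every step, execute only the first action.
state = (5, 7, -1)
first_action = plan(state)
total_reward = 0.0
for _ in range(20):
    action = plan(state)
    # In this toy setup the "real" environment is identical to the model.
    state, reward = model_step(state, action)
    total_reward += reward
```

Only the first action of the best sequence is executed before replanning, so prediction errors of the model do not accumulate over the whole episode. MCTS replaces the exhaustive enumeration with a guided tree search, which becomes necessary once the action space or the horizon grows.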

Abstract

Humans can easily predict future outcomes even in settings with complicated interactions between objects. For computers, however, learning models of interactions from videos in an unsupervised fashion is hard. In this paper, we demonstrate that structure and compositionality are key to solving this problem. Specifically, we develop a novel video prediction model from two distinct components: a scene model and a physics model. Both models are compositional and exploit repeating elements wherever possible. We impose a highly structured bottleneck between model components to allow for fast learning and clearly interpretable functionality, without losing any generality or performance. This fully compositional approach yields a strong video prediction model, which clearly outperforms relevant baselines. We produce realistic looking physical behaviour over a possibly infinite time frame and perform competitively even compared to a supervised approach. Finally, we demonstrate the strength of our model as a simulator for sample efficient model-based reinforcement learning in tasks with heavily interacting objects.

Other animations

STOVE's rollouts are stable for a possibly infinite number of timesteps. (2000 frames of rollout are shown; we tested up to 100000.)

All components of STOVE scale well to videos with a larger number of objects!

Citation

Please cite our work here and at arxiv.org/1910.02425 as

@inproceedings{kossen2020structured,
  title={Structured Object-Aware Physics Prediction for Video Modeling and Planning},
  author={Kossen, Jannik and Stelzner, Karl and Hussing, Marcel and Voelcker, Claas and Kersting, Kristian},
  booktitle={Proceedings of the International Conference on Learning Representations},
  year={2020}
}

Data

Run run_scripts.py --create-data to generate billiards and gravity data. This also generates the random data collected in the RL setting, which is used to train the action-conditioned world model.

Model

Run bash run_files/run_models.sh to train the model on billiards and gravity data. This also trains an action-conditioned world model on the avoidance task.

Interactive

python run_scripts.py --interactive lets you play either in a live environment or in a model simulation of the environment.

Questions

If you have any questions or problems regarding the code or paper, do not hesitate to contact us.
