bco's Introduction

Behavioral Cloning from Observation [Paper]

Update

2019/11/28:

  1. Implemented a TensorFlow 2.0 version and pushed it to the tf2.0 branch

Introduction

This is an implementation of BCO in TensorFlow on the CartPole environment.

BCO has two phases: (1) an inverse dynamics model, learned in a self-supervised fashion from the agent's own experience; (2) a policy model, trained by behavioral cloning on expert demonstrations that contain states but no actions, with the missing actions inferred by the model from (1).

Overview

Algorithm

As described above, BCO has two phases. Lines 5-9 of the algorithm are phase 1, which improves the inverse dynamics model. Lines 10-12 are phase 2, which improves the policy model by behavioral cloning.

[Figure: BCO algorithm pseudocode]
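The two phases can be sketched in plain Python on a toy problem. This is not the repository's TensorFlow code: the chain environment, the count-based models, and every name below (step, collect, fit_idm, fit_policy, bco) are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6  # toy chain with states 0..5, a hypothetical stand-in for CartPole


def step(state, action):
    """Action 1 moves right, action 0 moves left; the walls clamp the state."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)


def collect(policy, n_steps, rng):
    """Roll out `policy` and record (s, a, s') transitions."""
    data, s = [], rng.randrange(N_STATES)
    for _ in range(n_steps):
        a = policy(s)
        s_next = step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data


def fit_idm(transitions):
    """Inverse dynamics model: most frequent action seen for each (s, s')."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, s_next)][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def fit_policy(labelled):
    """Behavioral cloning: most frequent inferred action for each state."""
    counts = defaultdict(Counter)
    for s, a in labelled:
        counts[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def bco(expert_pairs, iterations=3, n_pre=200, n_post=50, seed=0):
    rng = random.Random(seed)
    # Pre-demonstration: gather experience with a uniformly random policy.
    experience = collect(lambda s: rng.randrange(2), n_pre, rng)
    policy_table = {}
    for _ in range(iterations):
        # Phase 1: improve the inverse dynamics model from own experience.
        idm = fit_idm(experience)
        # Label the action-free expert transitions with inferred actions.
        labelled = [(s, idm[(s, s2)]) for s, s2 in expert_pairs if (s, s2) in idm]
        # Phase 2: improve the policy by behavioral cloning.
        policy_table = fit_policy(labelled)
        # Post-demonstration: gather more experience with the current policy.
        policy = lambda s, t=policy_table: t.get(s, rng.randrange(2))
        experience += collect(policy, n_post, rng)
    return policy_table


# The expert always moves right, but only its state pairs are observed.
expert_pairs = [(s, s + 1) for s in range(N_STATES - 1)]
print(bco(expert_pairs))
```

In this toy setting the only action that produces a transition from s to s + 1 is "right", so the cloned policy recovers the expert's behavior without ever seeing its actions.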

Training BCO

After collecting observations of the expert demonstration, you can train BCO. The training scripts are in the scripts directory. To train BCO on CartPole:

./scripts/run_bco_cartpole.sh
# python3 models/bco_cartpole.py --mode=train --input_file=demonstration/cartpole/cartpole.txt --model_dir=model/cartpole/

List of Args:

--input_filename   - the demonstration inputs
--mode             - train or test
--model_dir        - where to save/restore the model
--maxits           - the number of training iterations
--M                - the number of post demonstration examples
--batch_size       - number of examples in batch
--lr               - initial learning rate for the Adam optimizer
--save_freq        - save model every save_freq iterations, 0 to disable
--print_freq       - print reward and loss every print_freq iterations, 0 to disable
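For readers wiring these options into their own script, the list above could be declared with argparse roughly as follows. The flag names come from the list; the types and default values are placeholders chosen for this sketch, not the repository's actual defaults.

```python
import argparse


def build_parser():
    """Declare the options listed above; all defaults here are hypothetical."""
    p = argparse.ArgumentParser(description="BCO options (illustrative sketch)")
    p.add_argument("--input_filename", default=None,
                   help="the demonstration inputs")
    p.add_argument("--mode", choices=["train", "test"], default="train",
                   help="train or test")
    p.add_argument("--model_dir", default="model/",
                   help="where to save/restore the model")
    p.add_argument("--maxits", type=int, default=1000,
                   help="the number of training iterations")
    p.add_argument("--M", type=int, default=50,
                   help="the number of post demonstration examples")
    p.add_argument("--batch_size", type=int, default=32,
                   help="number of examples in batch")
    p.add_argument("--lr", type=float, default=1e-3,
                   help="initial learning rate")
    p.add_argument("--save_freq", type=int, default=0,
                   help="save model every save_freq iterations, 0 to disable")
    p.add_argument("--print_freq", type=int, default=0,
                   help="print reward and loss every print_freq iterations, "
                        "0 to disable")
    return p


args = build_parser().parse_args(
    ["--mode=train", "--input_file=demonstration/cartpole/cartpole.txt"]
)
print(args.mode, args.input_filename)
```

By default argparse accepts any unambiguous prefix of a long option, so in this sketch --input_file resolves to --input_filename.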

Evaluation

Run the test script in the scripts directory to get evaluation results. For example, to evaluate BCO on CartPole:

./scripts/test_bco_cartpole.sh
# python3 models/bco_cartpole.py --mode=test --model_dir=model/cartpole/

Training on your own dataset and architectures

Prepare your own data

The code expects trajectories in a text file, one transition per line, in the form:

[state] [next_state]

Each line in the file represents one observed transition from state to next_state. The file must contain only demonstrations in this form. See the demonstration directory for examples.
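A loader for such a file might look like the sketch below. It assumes each line holds whitespace-separated numbers, with the first half forming state and the second half next_state (four values each for CartPole), and it tolerates optional brackets or commas. The exact on-disk layout is an assumption here, so check the files in the demonstration directory; the function names are hypothetical.

```python
def parse_demonstration_line(line, state_dim=4):
    """Split one line into (state, next_state) lists of floats."""
    # Treat brackets and commas as separators so both "1 2 3 4" and
    # "[1, 2] [3, 4]" style lines are accepted (assumed formats).
    cleaned = line.replace("[", " ").replace("]", " ").replace(",", " ")
    values = [float(token) for token in cleaned.split()]
    if len(values) != 2 * state_dim:
        raise ValueError(
            f"expected {2 * state_dim} numbers, got {len(values)}: {line!r}"
        )
    return values[:state_dim], values[state_dim:]


def load_demonstrations(lines, state_dim=4):
    """Parse every non-empty line and return (states, next_states)."""
    pairs = [parse_demonstration_line(l, state_dim) for l in lines if l.strip()]
    states = [state for state, _ in pairs]
    next_states = [next_state for _, next_state in pairs]
    return states, next_states


sample = ["0.01 0.02 0.03 0.04 0.02 0.21 0.03 -0.25"]
states, next_states = load_demonstrations(sample)
print(states, next_states)
```

To read a real file, pass an open file object, for example load_demonstrations(open(path)), since iterating over a file yields its lines.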

Using your own architecture

First, specify your own environment in the __init__ function.

self.env = gym.make('CartPole-v0') # replace with any environment of your own

Then implement the following functions, which define your model and how it interacts with your environment.

  1. Implement your model.

    build_policy_model builds your own policy model.

    build_idm_model builds your own inverse dynamics model.

  2. Interact with the environment.

    pre_demonstration samples actions uniformly at random to generate (s_t, s_t+1) and action pairs.

    post_demonstration uses the current policy to generate (s_t, s_t+1) and action pairs.

    eval_rwd_policy evaluates the policy and returns its reward.

See bco_cartpole.py in the models directory for an example.
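The interaction functions listed above can be sketched without TensorFlow or Gym. In the sketch below, ToyEnv is a hypothetical stand-in that mimics the classic Gym reset/step interface (step returning observation, reward, done, info), and the method signatures of BCOSkeleton are assumptions for illustration; the signatures in bco_cartpole.py may differ.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment with a classic Gym-style interface."""
    n_actions = 2

    def reset(self):
        self.t, self.x = 0, 0.0
        return [self.x]

    def step(self, action):
        self.x += 1.0 if action == 1 else -1.0
        self.t += 1
        # Reward of 1 per step; the episode ends after 10 steps.
        return [self.x], 1.0, self.t >= 10, {}


class BCOSkeleton:
    def __init__(self, env, seed=0):
        self.env = env  # swap in your own environment here
        self.rng = random.Random(seed)

    def build_policy_model(self):
        raise NotImplementedError("define your policy model here")

    def build_idm_model(self):
        raise NotImplementedError("define your inverse dynamics model here")

    def _rollout(self, choose_action, n_samples):
        """Collect n_samples of (s_t, s_t+1) pairs and the actions taken."""
        states, next_states, actions = [], [], []
        state = self.env.reset()
        while len(states) < n_samples:
            action = choose_action(state)
            next_state, _, done, _ = self.env.step(action)
            states.append(state)
            next_states.append(next_state)
            actions.append(action)
            state = self.env.reset() if done else next_state
        return states, next_states, actions

    def pre_demonstration(self, n_samples):
        """Sample actions uniformly at random."""
        uniform = lambda s: self.rng.randrange(self.env.n_actions)
        return self._rollout(uniform, n_samples)

    def post_demonstration(self, policy, n_samples):
        """Sample actions from the current policy (a callable on states)."""
        return self._rollout(policy, n_samples)

    def eval_rwd_policy(self, policy):
        """Run one episode and return the total reward."""
        state, total, done = self.env.reset(), 0.0, False
        while not done:
            state, reward, done, _ = self.env.step(policy(state))
            total += reward
        return total


agent = BCOSkeleton(ToyEnv())
states, next_states, actions = agent.pre_demonstration(25)
print(len(states), agent.eval_rwd_policy(lambda state: 1))
```

Sharing one rollout helper between pre_demonstration and post_demonstration keeps the two data-collection paths identical except for how the action is chosen.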

Citation

@article{torabi2018behavioral,
  title={Behavioral Cloning from Observation},
  author={Torabi, Faraz and Warnell, Garrett and Stone, Peter},
  journal={arXiv preprint arXiv:1805.01954},
  year={2018}
}
