Giter Site home page Giter Site logo

wangjksjtu / rl-perturbed-reward Goto Github PK

View Code? Open in Web Editor NEW
27.0 4.0 3.0 163 KB

Reinforcement Learning with Perturbed Reward, AAAI 2020

Home Page: https://arxiv.org/abs/1810.01032

License: MIT License

Python 86.24% Shell 13.76%
reinforcement-learning noisy-labels confusion-matrix openai-gym atari-games aaai2020

rl-perturbed-reward's Introduction

RL with Perturbed Rewards

This is the tensorflow implementation of Reinforcement Learning with Perturbed Rewards as described in the following AAAI 2020 paper (Spotlight):

@inproceedings{wang2020rlnoisy,
  title={Reinforcement Learning with Perturbed Rewards},
  author={Wang, Jingkang and Liu, Yang and Li, Bo},
  booktitle={AAAI},
  year={2020}
}

The implementation is based on keras-rl and OpenAI baselines frameworks. Thanks to the original authors!

  • gym-control: Classic control games
  • gym-atari: Atari-2600 games

Dependencies

  • python 3.5
  • tensorflow 1.10.0, keras 2.1.0
  • gym, scipy, scipy, joblib, keras
  • progressbar2, mpi4py, cloudpickle, opencv-python, h5py, pandas

Note: make sure that you have successfully installed the baseline package and other packages following (using virtualenvwrapper to create virtual environment):

mkvirtualenv rl-noisy --python==/usr/bin/python3
pip install -r requirements.txt
cd gym-atari/baselines
pip install -e .

Examples

  • Classic control (DQN on Cartpole)
cd gym-control
python cem_cartpole.py                                           # true reward
python dqn_cartpole.py --error_positive 0.1 --reward noisy       # perturbed reward
python dqn_cartpole.py --error_positive 0.1 --reward surrogate   # surrogate reward (estimated)
  • Atari-2600 (PPO on Phoenix)
cd gym-atari/baselines
python -m baselines.run --alg=ppo2 --env=PhoenixNoFrameskip-v4 \  # true reward
       --num_timesteps=5e7 --normal=True                          
python -m baselines.run --alg=ppo2 --env=PhoenixNoFrameskip-v4 \  # noisy reward
       --num_timesteps=5e7 --save_path=logs-phoenix/phoenix/ppo2_50M_noisy_0.2 \
       --weight=0.2 --normal=False --surrogate=False --noise_type=anti_iden
python -m baselines.run --alg=ppo2 --env=PhoenixNoFrameskip-v4 \  # surrogate reward (estimated)
       --num_timesteps=5e7 --save_path=logs-phoenix/phoenix/ppo2_50M_noisy_0.2 \
       --weight=0.2 --normal=False --surrogate=True --noise_type=anti_iden

Reproduce the Results

To reproduce all the results reported in the paper, please refer to scripts/ folders in rl-noisy-reward-control and rl-noisy-reward-atari:

  • gym-control/scripts
    • Cartpole
      • train-cem.sh (CEM)
      • train-dqn.sh (DQN)
      • train-duel-dqn.sh (Dueling-DQN)
      • train-qlearn.sh (Q-Learning)
      • train-sarsa.sh (Deep SARSA)
    • Pendulum
      • train-ddpg.sh (DDPG)
      • train-naf.sh (NAF)
  • gym-atari/scripts
    • train-alien.sh (Alien)
    • train-carnival.sh (Carnival)
    • train-mspacman.sh (MsPacman)
    • train-phoenix.sh (Phoenix)
    • train-pong.sh (Pong)
    • train-seaquest.sh (Seaquest)
    • train-normal.sh (Training with true rewards)

If you have eight available GPUs (Memory > 8GB), you can directly run the *.sh scripts one at a time. Otherwise, you can follow the instructions in the scripts and run the experiments. It ususally takes one or two days (GTX-1080 Ti) to train the policy.

cd rl-noisy-reward-atari/baselines
sh scripts/train-alien.sh

The logs and models will be saved automatically. We provide results_single.py for getting the averaged scores:

python -m baselines.results_single --log_dir logs-alien

Citation

Please cite our paper if you use this code in your research work.

Questions/Bugs

Please submit a Github issue or contact [email protected] if you have any questions or find any bugs.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.