On The Effectiveness of World Models For Continual Reinforcement Learning

This repository contains code to reproduce the experiments in our 2023 CoLLAs paper: [On The Effectiveness of World Models For Continual Reinforcement Learning](https://arxiv.org/abs/2211.15944). We show that world models, in particular DreamerV2, are effective at retaining skills during continual reinforcement learning.

@article{kessler2022surprising,
  title={The surprising effectiveness of latent world models for continual reinforcement learning},
  author={Kessler, Samuel and Mi{\l}o{\'s}, Piotr and Parker-Holder, Jack and Roberts, Stephen J},
  journal={arXiv preprint arXiv:2211.15944},
  year={2022}
}

Using the Package

Create a conda environment with python==3.8 and install the packages and versions listed in requirements.txt.

To install nle on a machine without root access these steps are helpful.

Minigrid

To install gym-minigrid:

cd gym_minigrid
pip install -e .
cd ..

To run dreamerv2, dreamerv2 + state bonus, and dreamerv2 + p2e (Plan2Explore):

# DV2
python train_minigrid.py --cl --num_tasks=3 --tag=mg_new_cl_s1_1M --steps=750000 --seed=1 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --wandb_proj_name=minigrid_new --minlen=5 --rssm_full_recon
# DV2 + state_bonus
python train_minigrid.py --cl --num_tasks=3 --tag=mg_sb_fr_cl_s0_1M --wandb_proj_name=minigrid_state_bonus --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --wandb_proj_name=minigrid_new --minlen=5 --state_bonus --rssm_full_recon
# DV2 + p2e
#     + random sampling of expl replay buffer
#     + grad heads for obs only
#     + same expl and eval policies
python train_minigrid.py --wandb_proj_name=minigrid_new --cl --num_tasks=3 --tag=mg_new_cl_p2e0.9_s6_1M --steps=750000 --seed=6 --plan2explore --expl_intr_scale=0.9 --expl_extr_scale=0.9 --logdir=logs --del_exp_replay --minlen=50 --rssm_full_recon --sep_exp_eval_policies

Training on Minihack

To run dreamerv2 on cl-small:

# DV2 
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_s0_1M --wandb_proj_name=minihack_task_dist --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --rssm_full_recon --minlen=5 --replay_capacity=1000000
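
To repeat this run over several seeds, the command can be generated programmatically. The following is a minimal sketch, not part of the repository: the helper name build_command is hypothetical, and the tag pattern mh_cl_small_s{seed}_1M is an assumption inferred from the tags used in this README. All flags are copied from the DV2 command above.

```python
import shlex

# Flags copied from the DV2 cl-small command above; only the seed and tag vary.
BASE_FLAGS = [
    "--cl", "--cl_small", "--num_tasks=4",
    "--wandb_proj_name=minihack_task_dist", "--steps=1000000",
    "--logdir=logs_cl", "--del_exp_replay", "--sep_exp_eval_policies",
    "--rssm_full_recon", "--minlen=5", "--replay_capacity=1000000",
]


def build_command(seed):
    """Return the argument list for one seed (hypothetical helper)."""
    # Assumed tag convention: the seed is encoded as s<seed> in the tag.
    tag = f"mh_cl_small_s{seed}_1M"
    return ["python", "train_minihack.py", *BASE_FLAGS,
            f"--tag={tag}", f"--seed={seed}"]


# Print one shell-ready command per seed; run them manually or via a scheduler.
for seed in range(3):
    print(shlex.join(build_command(seed)))
```

The sketch only prints the commands, so nothing is launched until you copy them into a shell or a job script.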

To run dreamerv2 + p2e on cl-small:

# DV2 + p2e
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_s0_1M --wandb_proj_name=minihack --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --minlen=5 --replay_capacity=1000000  --plan2explore --expl_intr_scale=0.9 --expl_extr_scale=0.9

To run dreamerv2 + rs (reservoir sampling), aka continual-dreamer, on cl-small:

# DV2 + rs 
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_rs_s0_1M --wandb_proj_name=minihack_task_dist --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --minlen=5 --replay_capacity=1000000 --reservoir_sampling
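
For readers unfamiliar with reservoir sampling, the general idea can be sketched as follows. This is a generic illustration of the textbook algorithm (Algorithm R), assuming episodes are stored as opaque items. The class name ReservoirBuffer is hypothetical and this is not the repository's replay buffer code.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity buffer holding a uniform sample of all items seen."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always store the item.
            self.items.append(item)
            return
        # Buffer full: keep the new item with probability capacity / num_seen,
        # overwriting a uniformly chosen slot.
        slot = self.rng.randrange(self.num_seen)
        if slot < self.capacity:
            self.items[slot] = item


buffer = ReservoirBuffer(capacity=100)
for episode_id in range(10_000):
    buffer.add(episode_id)
```

After the loop, every episode seen so far has the same chance of being in the buffer, so earlier tasks stay represented instead of being overwritten by the most recent task as in a first-in-first-out buffer.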

To run dreamerv2 + p2e + rs, aka continual-dreamer, on cl-small:

# DV2 + p2e + rs
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_rs_s0_1M --wandb_proj_name=minihack --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --minlen=5 --replay_capacity=1000000  --plan2explore --expl_intr_scale=0.9 --expl_extr_scale=0.9 --reservoir_sampling

To run dreamerv2 + reservoir sampling + 50:50 on cl-small:

# DV2 + reservoir sampling + 50:50
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_rs_5050_s0_1M --wandb_proj_name=minihack --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --minlen=5 --replay_capacity=1000000 --reservoir_sampling --recent_past_sampl_thres=0.5 
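
The exact semantics of --recent_past_sampl_thres are not documented in this README. One plausible reading of 50:50 sampling is that each training sample comes from recent experience with probability 0.5 and from the whole buffer otherwise. The sketch below illustrates that reading only. The function name, the recent_fraction parameter, and the definition of "recent" are all assumptions, not the repository's implementation.

```python
import random


def sample_recent_or_past(episodes, batch_size, recent_fraction=0.1,
                          recent_prob=0.5, rng=None):
    """Mix recent and uniform sampling (hypothetical illustration).

    episodes is assumed to be ordered from oldest to newest.
    """
    rng = rng or random.Random()
    # Assumption: "recent" means the newest recent_fraction of the buffer.
    num_recent = max(1, int(len(episodes) * recent_fraction))
    recent = episodes[-num_recent:]
    batch = []
    for _ in range(batch_size):
        # With probability recent_prob draw from the recent window,
        # otherwise draw uniformly from the whole buffer.
        pool = recent if rng.random() < recent_prob else episodes
        batch.append(rng.choice(pool))
    return batch


episodes = list(range(100))
batch = sample_recent_or_past(episodes, batch_size=16, rng=random.Random(0))
```

Setting recent_prob to 0.0 recovers plain uniform sampling, and setting it to 1.0 samples only from the recent window.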

To run dreamerv2 + coverage maximization on cl-small:

# DV2 + coverage maximization
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_cm_s0_1M --wandb_proj_name=minihack_task_dist --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --minlen=5 --replay_capacity=1000000 --coverage_sampling

To run dreamerv2 + reward sampling on cl-small:

# DV2 + reward sampling
python train_minihack.py --cl --cl_small --num_tasks=4 --tag=mh_cl_small_rwd_new_s0_1M --wandb_proj_name=minihack_task_dist --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --minlen=5 --replay_capacity=1000000 --reward_sampling

8-task Minihack

# DV2
python train_minihack.py --cl --num_tasks=8 --tag=mh_cl_s0_1M --wandb_proj_name=cl_8_tasks_RS --wandb_group=dv2 --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --sep_exp_eval_policies --minlen=5 --replay_capacity=2000000
# DV2 + p2e
python train_minihack.py --cl --num_tasks=8 --tag=mh_cl_p2e_s0_1M --wandb_proj_name=cl_8_tasks_RS --steps=1000000 --seed=0 --logdir=logs_cl --del_exp_replay --minlen=5 --replay_capacity=2000000 --reservoir_sampling
