OR2AC

This is the implementation for my undergrad Final Year Project “Offline Risk-Averse Actor-Critic with Curriculum Learning”.

Project Description

Offline RL has emerged as a preferable approach in real-world scenarios because it learns policies solely from historical data, with no need to interact with the environment during learning. However, deploying offline RL presents several challenges, particularly policy safety, handling of out-of-distribution state-action pairs, and policy generalization. To address these, this project develops risk-averse and generalizable offline RL algorithms to improve real-world applicability.

This code implements several RL algorithms as baselines for training risk-sensitive RL agents, including SAC, DSAC, and CODAC (see the model directory).

Here is the overview of the proposed method:

Fig.1 Overview of OR2AC

The process involves setting the performance metric and difficulty adjustment mechanism in the Curriculum Scheduler. Then, an online algorithm is selected to train the data collector and collect transitions for offline training. The collected data is used to train the offline learner using an offline algorithm from the Model Zoo. The Curriculum Scheduler controls the environment difficulty throughout the training process.
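The loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `CurriculumScheduler`, `collect_transitions`, `train_offline`, and `run_curriculum` are hypothetical names, the data collector and offline learner are toy stand-ins, and the "advance one level when the metric reaches a threshold" rule is only one assumed difficulty adjustment mechanism.

```python
import random
from dataclasses import dataclass


@dataclass
class CurriculumScheduler:
    """Hypothetical scheduler: advance one level when the metric reaches a threshold."""
    threshold: float   # performance metric value required to advance
    max_level: int     # hardest difficulty level
    level: int = 0     # current difficulty level

    def update(self, metric: float) -> int:
        if metric >= self.threshold and self.level < self.max_level:
            self.level += 1
        return self.level


def collect_transitions(level: int, n: int, rng: random.Random) -> list:
    """Stand-in for the online data collector; returns dummy (s, a, r, s', done) tuples."""
    data = []
    for _ in range(n):
        s, a = rng.random(), rng.uniform(-1.0, 1.0)
        r = -abs(a) * (level + 1)  # toy reward: harder levels penalize actions more
        data.append((s, a, r, s + a, False))
    return data


def train_offline(dataset: list) -> float:
    """Stand-in for the offline learner; returns the mean dataset reward as a toy metric."""
    return sum(t[2] for t in dataset) / len(dataset)


def run_curriculum(rounds: int = 5, seed: int = 666) -> list:
    """Alternate data collection and offline training while the scheduler sets difficulty."""
    rng = random.Random(seed)
    scheduler = CurriculumScheduler(threshold=-1.0, max_level=3)
    history = []
    for _ in range(rounds):
        dataset = collect_transitions(scheduler.level, n=100, rng=rng)
        metric = train_offline(dataset)
        history.append((scheduler.level, metric))
        scheduler.update(metric)  # the scheduler decides the next round's difficulty
    return history


if __name__ == "__main__":
    for level, metric in run_curriculum():
        print(f"level={level} metric={metric:.3f}")
```

In the actual project the stand-ins would correspond to an online algorithm such as SAC and an offline algorithm from the Model Zoo such as CODAC; the sketch only shows how the scheduler sits around those two stages.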

Installation

To run the code on your machine:

  1. Create and activate a conda environment, and install packages:

    conda create -n OR2AC python=3.9
    conda activate OR2AC
    pip install -r requirements.txt
  2. Run experiments. First, run train_online.py to generate the dataset for offline training:

    python train_online.py --task_name online --env riskymassrandom --algo sac --seed 666

    Then run train_offline.py to train the offline learner on that dataset:

    python train_offline.py --task_name offline --env riskymassrandom --algo codac --seed 666 --risk_prob 0.9 --risk_penalty 50.0 --risk_type cvar --risk_param 0.1 --tau_type iqn
  3. After both runs, the file structure should look like this. You can then test your model using visualize.py:

     .
     ├── env
     ├── model
     │   ├── sac.py
     │   ├── dsac.py
     │   ├── codac.py
     │   ├── networks.py
     │   └── utils.py
     ├── dataset
     │   └── task
     │       └── level
     ├── saved_policies
     │   └── task
     │       ├── online
     │       └── offline
     ├── README.md
     ├── train_online.py
     ├── train_offline.py
     ├── replay_memory.py
     └── visualize.py
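The train_offline.py command above passes `--risk_type cvar --risk_param 0.1`. As a rough illustration of what a CVaR objective measures, the sketch below estimates CVaR from a set of sampled returns by averaging the worst fraction of them. It assumes that `--risk_param` plays the role of the CVaR level alpha; the function name `cvar_from_samples` and the sample returns are hypothetical, and the repository's own computation over learned return quantiles may differ.

```python
import math


def cvar_from_samples(returns, alpha):
    """Estimate CVaR_alpha as the mean of the worst alpha-fraction of sampled returns."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if len(returns) == 0:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # ascending, so the worst outcomes come first
    # The small tolerance guards against float round-off pushing ceil() one step too far.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    # Toy returns: one rare, heavily penalized outcome among otherwise good episodes.
    returns = [-50.0] + [10.0] * 9
    mean_return = sum(returns) / len(returns)
    print("mean return:", mean_return)                       # 4.0
    print("CVaR at 0.1:", cvar_from_samples(returns, 0.1))   # -50.0
```

The mean hides the rare bad outcome, while CVaR at a small alpha is dominated by it, which is why a risk-averse agent optimizing CVaR tends to avoid actions with heavy lower tails. With alpha equal to 1.0 the estimate reduces to the ordinary mean.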
    

Acknowledgement

The code in this repository is based on and inspired by the work of the authors and contributors of CODAC and DSAC.
