
moral_rl's Introduction

Multi-Objective Reinforced Active Learning

Dependencies

  • wandb
  • tqdm
  • pytorch >= 1.7.0
  • numpy >= 1.20.0
  • scipy >= 1.1.0
  • pycolab == 1.2

Weights and Biases

Our code depends on Weights and Biases for visualizing and logging results during training. As a result, we call wandb.init(), which will prompt you for an API key that links the training runs to your personal wandb account. To do this, paste your WANDB_API_KEY into the prompt when running the code for the first time.

Environments

Our gridworlds (Emergency: randomized_v2.py, Delivery: randomized_v3.py) build on the Pycolab game engine, with a custom wrapper that provides functionality similar to gym environments. The engine comes with a user interface, and any environment can be played in the console by running python environment.py, using the arrow keys and w, a, s, d as controls.

Training

There are four training scripts for

  • manually training a PPO agent on custom rewards (ppo_train.py),
  • training AIRL on a single expert dataset (airl_train.py),
  • active MORL with custom/automatic preferences (moral_train.py) and
  • training DRLHP with custom/automatic preferences (drlhp_train.py).

When using automatic preferences, a desired ratio can be passed as an argument. For example,

python moral_train.py --ratio a b c

will run MORAL using a (real-valued) ratio of a:b:c among the three explicit objectives in Delivery.
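The following sketch shows how a three-valued --ratio flag could be parsed and rescaled. It is an assumption, not a copy of moral_train.py, which may define the flag, its validation, and any normalization differently. parse_ratio is a hypothetical helper.

```python
import argparse
from typing import List, Optional


def parse_ratio(argv: Optional[List[str]] = None) -> List[float]:
    """Parse `--ratio a b c` and rescale it to sum to 1 (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="MORAL with automatic preferences (sketch)")
    parser.add_argument(
        "--ratio",
        type=float,
        nargs=3,
        required=True,
        metavar=("A", "B", "C"),
        help="desired real-valued ratio a:b:c among the three Delivery objectives",
    )
    args = parser.parse_args(argv)
    # Assumed validation: a ratio needs non-negative parts that are not all zero.
    if any(r < 0 for r in args.ratio) or sum(args.ratio) == 0:
        parser.error("--ratio expects non-negative values that are not all zero")
    total = sum(args.ratio)
    # Only the proportions matter, so 1 2 1 and 2 4 2 describe the same target.
    return [r / total for r in args.ratio]
```

For example, parse_ratio(["--ratio", "1", "2", "1"]) and parse_ratio(["--ratio", "2", "4", "2"]) both return [0.25, 0.5, 0.25]. Rescaling is one way to make equivalent ratios compare equal.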

Hyperparameters

Hyperparameters are passed as arguments to wandb.init() and can be changed by modifying the respective training files.

moral_rl's People

Contributors

mlpeschl

moral_rl's Issues

Backbone RL algorithm

Hello! Thanks for sharing the code.

I am a beginner in RL, and I am wondering whether the active learning method can be built on top of other RL algorithms, such as SAC. If so, could you give some advice on how to modify your code?

Why are log action probabilities compared with advantages?

I am working with this code for my bachelor's thesis, and I am confused as to why, when computing classification accuracy, the code compares advantages (which can be negative or positive) with log action probabilities (which are never positive).

Relevant snippet (update_discriminator() in airl.py):

class_predictions = torch.cat([torch.log(action_probabilities).unsqueeze(1), advantages], dim=1)
# Compute Loss function
loss = criterion(class_predictions, labels)
# Compute Accuracies
label_predictions = torch.argmax(class_predictions, dim=1)  # takes a max between probabilities and advantages
predicted_fake = (label_predictions[labels == 0] == 0).float()
predicted_expert = (label_predictions[labels == 1] == 1).float()

It seems to me that this code can never predict 0 in label_predictions for good actions: their advantage will be greater than 0, while the log probability cannot exceed 0, since log(1) = 0.

In my experiments, I also observe that 'Fake accuracy' is consistently lower than 'Real accuracy'.

Could you please point me to an explanation? Thank you in advance.
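One possible reading of the snippet is sketched below. It assumes that criterion is torch.nn.CrossEntropyLoss and that advantages holds the AIRL quantity f(s, a), which is advantage-like, rather than a generic RL advantage estimate. Under the standard AIRL discriminator D(s, a) = exp(f) / (exp(f) + π(a|s)) (Fu et al., 2018), the two columns [log π, f] behave as logits whose softmax gives D for the expert class. The toy tensors are made up for illustration.

```python
import torch
import torch.nn.functional as F


def airl_discriminator(f: torch.Tensor, action_probabilities: torch.Tensor) -> torch.Tensor:
    # AIRL discriminator: D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a | s)).
    return torch.exp(f) / (torch.exp(f) + action_probabilities)


# Toy batch: `f` stands in for `advantages`, `pi` for `action_probabilities`.
f = torch.tensor([1.5, -0.2, -3.0, 0.4])
pi = torch.tensor([0.9, 0.5, 0.2, 0.3])
labels = torch.tensor([1, 0, 0, 1])  # 1 = expert, 0 = policy ("fake")

# Same layout as update_discriminator(): column 0 = log pi, column 1 = f.
class_predictions = torch.stack([torch.log(pi), f], dim=1)

# Softmax over the two columns gives exp(f) / (pi + exp(f)) for column 1, i.e. D.
d_softmax = F.softmax(class_predictions, dim=1)[:, 1]
d_direct = airl_discriminator(f, pi)

# Cross-entropy over the two logits equals binary cross-entropy on D.
ce_loss = F.cross_entropy(class_predictions, labels)
bce_loss = F.binary_cross_entropy(d_direct, labels.float())

# argmax selects column 1 ("expert") exactly when f > log pi, i.e. when D > 0.5.
label_predictions = torch.argmax(class_predictions, dim=1)
```

Under this reading, the argmax applies the threshold D > 0.5. Any sample with f > 0 is therefore labeled expert for every π, which is consistent with the observation above. Whether this fully accounts for the gap between fake and real accuracy is not established by the sketch.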
