
aiueola / neurips2023-future-dependent-ope


(NeurIPS2023) "Future-Dependent Value-Based Off-Policy Evaluation in POMDPs"

License: Apache License 2.0

Python 100.00%
off-policy-evaluation reinforcement-learning research


Future-Dependent Value-Based Off-Policy Evaluation in POMDPs


About

This repository contains the code to replicate the semi-synthetic experiments conducted in the paper "Future-Dependent Value-Based Off-Policy Evaluation in POMDPs" by Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, and Wen Sun, which has been accepted to NeurIPS2023 as a Spotlight.

Abstract

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is close to the true policy value as long as futures and histories contain sufficient information about latent states and Bellman completeness holds.
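As a rough sketch, the three objects named in the abstract can be written out as below. The notation is an assumption made for this illustration and is not defined anywhere in this README: F and H are the future and history proxies, S is the latent state, (O, A, R) are the observation, action, and reward, F' is the next future proxy, \mu is the importance ratio between the evaluation policy \pi^e and the behavior policy \pi^b (taken here to be memoryless), \gamma is the discount factor, \mathcal{Q} and \Xi are function classes, \lambda > 0 is a regularization weight, and \nu_{\mathcal{F}} is the distribution of initial futures. The paper should be consulted for the exact statements and conditions.

```latex
% Future-dependent value function g_V: matches the latent-state value function in expectation.
\mathbb{E}\left[ g_V(F) \mid S \right] = V^{\pi^e}(S)

% Off-policy Bellman equation as a conditional moment restriction, with H as the instrument.
\mathbb{E}\left[ \mu(O, A) \{ R + \gamma\, b_V(F') \} - b_V(F) \mid H \right] = 0,
\qquad \mu(O, A) = \frac{\pi^e(A \mid O)}{\pi^b(A \mid O)}

% Minimax estimator over the classes Q and Xi, followed by the policy value estimate.
\hat{b}_V = \arg\min_{q \in \mathcal{Q}} \max_{\xi \in \Xi}
  \mathbb{E}_{\mathcal{D}}\left[ \left( \mu(O, A) \{ R + \gamma\, q(F') \} - q(F) \right) \xi(H)
  - \tfrac{\lambda}{2}\, \xi(H)^2 \right],
\qquad
\hat{J} = \mathbb{E}_{f \sim \nu_{\mathcal{F}}}\left[ \hat{b}_V(f) \right]
```

The first line involves the unobserved state S, so it cannot be checked from data directly. The second line replaces it with a condition on observables only, which is what makes the minimax step in the third line learnable from logged data.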

If you find this code useful in your research, please cite:

@article{uehara2023future,
  author = {Masatoshi Uehara and Haruka Kiyohara and Andrew Bennett and Victor Chernozhukov and Nan Jiang and Nathan Kallus and Chengchun Shi and Wen Sun},
  title = {Future-Dependent Value-Based Off-Policy Evaluation in POMDPs},
  journal = {Advances in Neural Information Processing Systems},
  volume = {xxx},
  pages = {xxx -- xxx},
  year = {2023},
}

Dependencies

This repository supports Python 3.7 or newer.

  • numpy==1.22.4
  • pandas==1.5.3
  • scikit-learn==1.0.2
  • matplotlib==3.7.1
  • torch==2.0.0
  • d3rlpy==1.1.1
  • hydra-core==1.3.2
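Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches before running anything. The following is a minimal sketch, not part of this repository: `PINNED` and `check_pins` are hypothetical names introduced here, and the check relies on `importlib.metadata`, which is in the standard library only from Python 3.8 onward.

```python
from importlib import metadata  # standard library in Python 3.8+

# Pins copied from the dependency list above.
PINNED = {
    "numpy": "1.22.4",
    "pandas": "1.5.3",
    "scikit-learn": "1.0.2",
    "matplotlib": "3.7.1",
    "torch": "2.0.0",
    "d3rlpy": "1.1.1",
    "hydra-core": "1.3.2",
}


def check_pins(pinned=PINNED):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu117" when comparing.
        base = installed.split("+")[0] if installed is not None else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `check_pins()` returns an empty dictionary when all seven packages are installed at the pinned versions; otherwise each entry shows the expected version next to the one found.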

Running the code

To conduct the experiments with CartPole, run the following commands in order.

(i) learning policies

python src/online_learning.py online_learning.noise_param=0.0
python src/online_learning.py online_learning.noise_param=0.3

(ii) OPE

python src/evaluation_neural.py
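The two steps above can also be chained from a small driver script. This is a sketch under assumptions rather than a script shipped with the repository: `build_commands` and `run_all` are hypothetical helpers, the `key=value` arguments are passed through unchanged (they appear to be Hydra-style overrides, given that hydra-core is a dependency), and the ordering assumes that step (ii) depends on the policies produced in step (i).

```python
import subprocess
import sys

# Noise levels used in step (i) above.
NOISE_PARAMS = (0.0, 0.3)


def build_commands(noise_params=NOISE_PARAMS, python=sys.executable):
    """Return the commands of steps (i) and (ii) as argument lists, in order."""
    commands = [
        [python, "src/online_learning.py", f"online_learning.noise_param={p}"]
        for p in noise_params
    ]
    commands.append([python, "src/evaluation_neural.py"])
    return commands


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Running `run_all(build_commands())` from the repository root executes both policy-learning runs and then the OPE step. Because of `check=True`, the evaluation is not started if either learning run exits with an error.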
