policy-gradient-pytorch's Introduction

policy-gradient-baseline-pytorch

An AI agent that solves the CartPole and LunarLander environments in OpenAI Gym using the vanilla policy gradient method. The agent uses the average reward as a baseline.

Training:

  • It uses a Monte Carlo method for learning (the agent waits until the end of an episode to learn).
  • During an episode, the trajectory of states, actions, and rewards is stored. At the end of the episode, the neural network approximates the probability distribution over actions for each state in the trajectory.
  • The loss is the sum, over the trajectory, of the products of each action's log probability and its discounted reward.
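
The loss described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names `discounted_returns` and `policy_gradient_loss` are hypothetical, the baseline is assumed to be the mean of the discounted returns within one episode, and the leading minus sign is the usual convention for turning the objective into a quantity to minimize. In an actual PyTorch training loop the log probabilities would be tensors so that gradients can flow back to the network.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * r_{t+1} + ... for each step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_gradient_loss(log_probs: List[float],
                         returns: List[float],
                         use_baseline: bool = True) -> float:
    """Negative sum of log-probability times (return - baseline).

    Assumption: the baseline is the average discounted return of the episode.
    """
    if len(log_probs) != len(returns):
        raise ValueError("log_probs and returns must have the same length")
    baseline = sum(returns) / len(returns) if (use_baseline and returns) else 0.0
    return -sum(lp * (g - baseline) for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]              # one reward per step, as in CartPole
    returns = discounted_returns(rewards, gamma=0.5)
    print(returns)                          # [1.75, 1.5, 1.0]
    log_probs = [-0.5, -0.5, -0.5]          # made-up log probabilities
    print(policy_gradient_loss(log_probs, returns, use_baseline=False))
    print(policy_gradient_loss(log_probs, returns, use_baseline=True))
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is one common explanation for the more stable learning reported under Result.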

Result

The agent with a baseline performed better than the agent without one.

[Figures: Policy Gradient on CartPole-v1; Policy Gradient, baseline vs. no baseline]

Usage

Command line arguments:

  • --env : environment (CartPole-v1 or LunarLander-v2)
  • --learn : train the agent
  • --play : make the agent play in the environment
  • -ep : number of episodes to play or train
  • -g : discount factor gamma
  • -lr : learning rate

  • To train the agent, run: python agent.py --env LunarLander-v2 --learn -ep 1000
  • To play, run: python agent.py --env LunarLander-v2 --play -ep 5
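
For readers who want to see how such flags could be wired up, here is a rough `argparse` sketch. It is not taken from agent.py: the `build_parser` helper, the `dest` names, the restriction of `--env` to the two listed environments, and every default value are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this README."""
    parser = argparse.ArgumentParser(description="Policy gradient agent (sketch)")
    parser.add_argument("--env", choices=["CartPole-v1", "LunarLander-v2"],
                        default="CartPole-v1",   # hypothetical default
                        help="environment to use")
    parser.add_argument("--learn", action="store_true", help="train the agent")
    parser.add_argument("--play", action="store_true",
                        help="make the agent play in the environment")
    parser.add_argument("-ep", dest="episodes", type=int,
                        default=1000,            # hypothetical default
                        help="number of episodes to play or train")
    parser.add_argument("-g", dest="gamma", type=float,
                        default=0.99,            # hypothetical default
                        help="discount factor gamma")
    parser.add_argument("-lr", dest="learning_rate", type=float,
                        default=1e-3,            # hypothetical default
                        help="learning rate")
    return parser


if __name__ == "__main__":
    # Parse the training command shown above instead of the real sys.argv.
    args = build_parser().parse_args(
        ["--env", "LunarLander-v2", "--learn", "-ep", "1000"])
    print(args.env, args.learn, args.play, args.episodes)
```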

Requirements

Contributors

iamvigneshwars
