LunarLanderII

Update Sept 2021: This project was a homework assignment in Carnegie Mellon University's 10-701 Introduction to Machine Learning (PhD) course, and future versions of the course may have similar homework, so the code for this project should not be made public. The Python script has therefore been removed. Sorry for the inconvenience.

0. Basic Information

  • Name: Lunar Lander II, OpenAI Gym Reinforcement Learning Project
  • Author: Yutian Wang
  • Date: April 2021
  • Version: v1.0

1. Introduction

This project aims to solve the Lunar Lander v2 reinforcement learning problem provided by OpenAI Gym.

In the Lunar Lander v2 environment, we aim to automate the landing of a lunar lander. The lander has three boosters: a main booster, which fires downward to slow the descent, and two side boosters (one on the left and one on the right) that help adjust the lander's position. The whole simulation is 2-dimensional.

The lunar surface, like the real Moon's, is not flat, and its shape differs in each round. The surface is visible to the player or the lander. There are two flags on the surface, and the aim is to land the lander in the area between them. The landing fails if the lander either lands outside the area between the flags or crashes.

Below is a frame from the example video.

LunarLanderV2

More detailed information about the environment can be found here.

2. Algorithm

This project uses the REINFORCE algorithm to train the automated lander. REINFORCE is a policy gradient algorithm: it adjusts the parameters of a policy directly so that actions leading to higher reward become more likely. In this case, at each time step, the lander can fire any of the three boosters.
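As a rough illustration of how a stochastic policy picks an action, here is a minimal sketch in plain Python. It is not the removed script. The action labels are an assumption based on the standard Gym environment, which lists four discrete actions (do nothing plus the three engines), and the names softmax and sample_action are hypothetical. In the real project, the scores would come from the neural network.

```python
import math
import random

# Assumed action labels (not taken from the removed script): the standard
# Gym environment exposes "do nothing" plus the three engines.
ACTIONS = ["noop", "left", "main", "right"]


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample an action index from softmax(logits).

    Returns the index and the log-probability of the sampled action,
    which a policy gradient method needs later for its update.
    """
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, math.log(probs[idx])


if __name__ == "__main__":
    # Hypothetical scores, standing in for the network's output.
    idx, logp = sample_action([0.1, -0.3, 1.2, 0.0])
    print(ACTIONS[idx], logp)
```

Sampling, rather than always taking the highest-scoring action, keeps the policy exploring during training.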

We want to train a policy that decides which engine (booster) to fire at each time step. We build a deep neural network to make that decision. In each trial, we calculate the reward as specified in the environment description, compute the cost (see function compute_expected_cost), run backpropagation and gradient descent on the neural network, and update the weights accordingly. We trained for 5000 trials, taking over 6 hours, until the reward converged.
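The per-trial cost in a typical REINFORCE setup can be sketched as follows. This is a generic textbook formulation in plain Python, not the contents of compute_expected_cost, which is no longer available. The function names and the discount factor gamma are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate cost for one trial: -sum_t log pi(a_t | s_t) * G_t.

    Minimizing this by gradient descent raises the probability of
    actions that were followed by high discounted returns.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    # Hypothetical two-step trial: log-probabilities of the chosen actions
    # and the rewards received after each one.
    print(discounted_returns([1.0, 1.0], gamma=0.5))
    print(reinforce_loss([-1.0, -2.0], [1.0, 1.0], gamma=0.5))
```

In a deep learning framework, the log-probabilities would be tensors produced by the network, so that the gradient of this cost flows back to the weights.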

3. Result

We ran the trained lander for another 100 episodes, and it landed successfully 94 times, a success rate of 94%.

4. Files in this repository

  • README.md: this file.
  • reinforce.py: Python script for building and training the model. (Removed)
  • mypolicy.pth: the trained weights for the neural network.
  • lunarlander2.png: a picture used in this file for illustration.

5. About and Acknowledgement

This project was a homework project in 10-701 Introduction to Machine Learning at Carnegie Mellon University. Special thanks to the professors and TAs for teaching this course, designing this assignment, and providing the starter code and debugging help.
