Giter Site home page Giter Site logo

openai-mountain-car-v0's Introduction

openai-mountain-car-v0

OpenAI challenges repository

I tested various approaches and found that properly tuned DQN plus cross-entropy pool solves this problem in the fastest way.

By DQN+CE I mean common DQN technique, but batches sampled each time for experience reply are selected proportionally to how good their appropriate episode was compared to the worst one with -200 total reward.

In common cross-entropy we basically select the best episodes and learn network to correctly predict action based on those steps. This drops experience for the wrong/non-existing steps and actions, which might be good to learn too.

Another approach I tested was main/follower networks, i.e. when you 'read' from the follower network which is slowly follows main network used for learning. This does not really change convergence noticeably, maybe reduces oscillations, which always happen likely because of overfitting.

Another approach was a3c and actor-critic model. This never reached performance of DQN and more commonly rises serious questions about its applicability for this task, since there is no clear convergence logic and proper reward match - it is always possible to create a set of steps which will have the same discounted reward, but will not lead to the winning strategy.

DQN is a simple network consisting of 3 layers: 50, 190 and 3 neurons, the first two have biases. I use tanh nonlinearity, relu never congested and in my opinion it is very overvalued function. Tanh has a nice property of being positive and negative, which is useful in this problem. The last layer uses linear activation function since Q-values are large enough.

I use l1+l2 regularization and RMSProp optimizer.

By tuning number of neurons you can heavily affect performance, so I believe my numbers can further be improved.

Code for all mentioned ideas is included in the repository https://github.com/bioothod/openai-mountain-car-v0, it is implemented in pure python + TF. To run the winning solution just type

$ python mc0.py

It will create mc0 and mc0_wrappers directories in the current dir, the former can be used

openai-mountain-car-v0's People

Contributors

bioothod avatar

Stargazers

Hao avatar  avatar Lei Mao avatar

Watchers

James Cloos avatar  avatar  avatar

Forkers

auserj

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.