PAPIOnline

Until now, humankind has solved many complex problems by reasoning about them. Partially observable environments are among these problems: over time, people learn what to do in an environment they cannot fully observe, and they act by making predictions based on their experience. Artificial intelligence techniques take this behavior as a model. Reinforcement learning tries to reach the best solution through trial and error: it associates actions with the observations received from the environment and tries to maximize the cumulative reward. Since the number of observations that can be obtained in a partially observable environment is limited, establishing this relationship is very difficult, which makes the reward mechanism extremely important. An agent that must complete many tasks to reach its goal has to be guided by reward shaping. Because the importance of actions and tasks changes dynamically, reward functions become more and more complicated, and such calculations are very difficult to maintain and reuse.

In this thesis, we present a method that can replace complex reward calculations in partially observable environments. We built a system that feeds reinforcement learning with rewards derived from Monte Carlo Tree Search (MCTS), a heuristic search algorithm. When the agent takes the best action in a given situation, it should receive the highest reward. But how do we decide that an action was the best one? Since the same action may produce different results in different situations, the reward should depend on the outcome, and evaluating and prioritizing outcomes is a daunting task. MCTS finds the best action for a given state through forward simulations: it simulates the game many times and evaluates the results, so the outcome-oriented approach just described applies to MCTS as well. We therefore built a system that uses MCTS to evaluate the actions taken by RL. After each action, we ran MCTS to find the best action for the current state, then gave the RL agent a positive reward if its selected action matched MCTS's best action and a negative reward otherwise.
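
The reward rule itself fits in a few lines. The following is a minimal sketch of that rule, not the repository's actual code; the `IGameState`, `Mcts`, and `RewardFromMcts` names are illustrative assumptions for this example:

```csharp
// Sketch of the MCTS-based reward strategy described above.
// All names here are illustrative stand-ins, not the project's API.

public interface IGameState
{
    // Legal actions, state transitions, and terminal checks would live here.
}

public class Mcts
{
    // Placeholder: a real implementation would run forward simulations
    // (selection, expansion, rollout, backpropagation) from the given state.
    public int FindBestAction(IGameState state)
    {
        return 0;
    }
}

public static class RewardShaping
{
    // Positive reward when the RL policy's action matches MCTS's choice,
    // negative reward otherwise, as described in the text.
    public static float RewardFromMcts(Mcts mcts, IGameState state, int selectedAction)
    {
        int bestAction = mcts.FindBestAction(state);
        return selectedAction == bestAction ? 1.0f : -1.0f;
    }
}
```

In an ML-Agents setup such as the one listed under Dependency Versions, a value like this would typically be passed to the agent's AddReward call after each decision step.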

To test the method we developed, we chose the Player versus Player (PvP) mode of an MMORPG, which is a partially observable environment. We trained the same agent with self-play deep reinforcement learning under different reward strategies on a sample MMORPG implementation we wrote: the first agent received manually shaped rewards, and the second received rewards according to the MCTS result. We then had them battle each other, and the results show that the MCTS strategy outperformed the manual one.
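
As an illustration, the head-to-head comparison reduces to a simple loop. This sketch assumes hypothetical `IPolicy` and `PlayBattle` helpers; the actual match logic is game-specific and not part of this example:

```csharp
// Sketch of the head-to-head evaluation described above.
// IPolicy and PlayBattle are hypothetical stand-ins.

public interface IPolicy
{
    int SelectAction(object gameState);
}

public static class Evaluation
{
    public static void CompareStrategies(IPolicy manualAgent, IPolicy mctsAgent, int battles)
    {
        int mctsWins = 0;
        for (int i = 0; i < battles; i++)
        {
            // PlayBattle returns true if the MCTS-rewarded agent won the match.
            if (PlayBattle(manualAgent, mctsAgent))
            {
                mctsWins++;
            }
        }
        System.Console.WriteLine($"MCTS-rewarded agent won {mctsWins}/{battles} battles");
    }

    // Placeholder for a full PvP battle between the two trained policies.
    private static bool PlayBattle(IPolicy a, IPolicy b)
    {
        return false;
    }
}
```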

Dependency Versions

  • Unity v2019.3.13f1
  • ML-Agents v1.0.2
  • TensorFlow v2.0.2

Contributors

tarikkarsi