Time-Travel

Introduction

In this project, we create a custom environment. The environment is simple and gives the agent a "game" with defined rules and layout, including a starting point and an end point. The agent's goal is to reach the end point and, with the help of reinforcement learning, possibly go back several decisions to improve its performance. What we ultimately want to imitate is how a human draws on experience when making decisions, guided by a reward signal. Accordingly, our model is built around a reward system: the agent receives a reward score for every decision it makes, which measures how strongly it is motivated to make a similar decision again. The methods and models are described in the following sections.

Method

  • Reinforcement learning
  • PPO
  • LSTM
  • Actor-critic
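The list above only names the methods. As a reminder of what PPO optimizes, here is a minimal sketch of the standard clipped surrogate objective for a single sample. The function name and the `clip_eps=0.2` default are assumptions for illustration, not values taken from this project.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Clipped PPO surrogate for one sample (illustrative sketch).

    ratio     -- new_policy_prob / old_policy_prob for the taken action
    advantage -- advantage estimate for that action (e.g. from the critic)
    clip_eps  -- clipping range; 0.2 is an assumed, commonly used value
    """
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped by the clip.
    print(ppo_clip_objective(1.5, 1.0))
    # A small ratio with a negative advantage is also bounded.
    print(ppo_clip_objective(0.5, -1.0))
```

The clipping keeps a single update from moving the policy too far from the one that collected the data, which is the main reason PPO tends to train stably.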

Environment

The environment is based on the following GitHub repository. We use gym-minigrid for the project: https://github.com/utnnproject/gym-minigrid

Custom environment

We made our own custom environment to support time-travel. Our simple environment for the project is defined in the customs.py file.

Time-Travel

We used a method called time-travel to make the agent learn more efficiently. With time-travel, the agent has the option to go back a few frames from the current frame and change its decisions in order to reach the end point more efficiently. The time-travel function is defined in the minigrid.py file.

Past frames are stored in a list, which is updated every frame. Once the list holds more than 7 frames, the agent can choose to go back in time.
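The frame list described above can be sketched as follows. This is a minimal illustration, not the code in minigrid.py: the class name `FrameHistory`, its methods, and the `max_len` cap are hypothetical. Only the "more than 7 frames" threshold comes from the text.

```python
import copy
from collections import deque

MIN_HISTORY = 7  # from the text: time-travel unlocks once the list holds more than 7 frames


class FrameHistory:
    """Hypothetical helper that stores past states and allows rewinding."""

    def __init__(self, max_len=50):
        # max_len is an assumption; the README does not say whether the list is capped.
        self.frames = deque(maxlen=max_len)

    def record(self, state):
        # Store a copy so later changes to the live state do not alter history.
        self.frames.append(copy.deepcopy(state))

    def can_travel(self):
        return len(self.frames) > MIN_HISTORY

    def travel_back(self, steps):
        """Drop the most recent `steps` frames and return the state to resume from."""
        if not self.can_travel():
            raise RuntimeError("not enough history to time-travel")
        # Always keep at least one frame to return to.
        steps = max(1, min(steps, len(self.frames) - 1))
        for _ in range(steps):
            self.frames.pop()
        return copy.deepcopy(self.frames[-1])


if __name__ == "__main__":
    history = FrameHistory()
    for position in range(9):
        history.record({"pos": position})
    print(history.can_travel())
    print(history.travel_back(3))
```

Rewinding also shortens the history, so in this sketch the agent has to accumulate more than 7 frames again before it can travel back a second time. Whether the project behaves the same way is not stated in the text.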

Training

To train an agent, run the command below. The --recurrence x option adds memory to the model during training. Replace <environment_name> with the environment ID (e.g. MiniGrid-Customs-LineCorridor-v0).

python -m scripts.train --algo ppo --env <environment_name> --model <model_name> --recurrence 4 --save-interval 10 --frames 5000000 --lr 0.0003 --discount 0.95
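To give a feel for what `--discount 0.95` means, here is a small sketch that computes discounted returns for a reward sequence. The function `discounted_returns` is written for illustration only and is not part of the training scripts.

```python
def discounted_returns(rewards, discount=0.95):
    """Return G_t = r_t + discount * G_{t+1} for every step t (illustrative sketch)."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # A reward of 1 that arrives two steps later is worth 0.95 ** 2 now.
    print(discounted_returns([0.0, 0.0, 1.0]))
```

With a discount below 1, a reward reached in fewer steps is worth more, which is one reason shorter paths to the end point are preferred.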

Result of Time-Travel

This is exploration without time-travel. You can see that the agent explores across the map to find the end point.

[Animation: exploration without time-travel]

This is exploration with time-travel. The agent goes back in time so that it can reach the end point with less exploration.

[Animation: exploration with time-travel]

The following figure is a heatmap of the positions where time-travel was used. The agent often uses time-travel when it sees the left end or the right end.

The following figure is a heatmap of the agent's position. The agent returns to the center of the map by using time-travel.
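A heatmap like the ones described above can be built by counting how often each grid cell is visited. The sketch below is an assumption about how such counts could be collected. The function `visit_counts` and the `(x, y)` position format are hypothetical and not taken from the project code.

```python
def visit_counts(positions, width, height):
    """Count visits per grid cell; grid[y][x] holds the count for cell (x, y)."""
    grid = [[0] * width for _ in range(height)]
    for x, y in positions:
        grid[y][x] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical trajectory on a 3x2 grid.
    trajectory = [(0, 0), (1, 0), (1, 0), (2, 1)]
    for row in visit_counts(trajectory, width=3, height=2):
        print(row)
```

The same counting works for the time-travel heatmap if only the positions where the agent chose to travel back are passed in.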

The following figure shows the reward the agent gains per frame. The agent gains no reward at first, but after 2 million frames it finds the optimal solution and the reward becomes stable.

References

[1] https://github.com/DLR-RM/stable-baselines3

[2] https://github.com/DLR-RM/rl-baselines3-zoo

[3] https://github.com/lcswillems/rl-starter-files

[4] https://github.com/lcswillems/torch-ac

Contributors

handykurniawan, shumpeimorimoto
