The Wumpus World Agents (Naive, Probabilistic, Deep Q-learning)

In this project I built an environment simulator and three different agents for the Wumpus World, a partially observable game environment used in AI.

Naive Agent

Jupyter notebook: naive_agent.ipynb

The Naive Agent chooses its next action uniformly at random from the six possible actions (Forward, Turn Left, Turn Right, Shoot, Grab and Climb).
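A minimal sketch of this selection rule, not the notebook's actual code. The action names and the function name `naive_next_action` are assumptions made for illustration.

```python
import random

# The six actions named above; the exact identifiers are assumed here.
ACTIONS = ["Forward", "TurnLeft", "TurnRight", "Shoot", "Grab", "Climb"]


def naive_next_action(rng=random):
    """Return one of the six actions, each with probability 1/6."""
    return rng.choice(ACTIONS)


if __name__ == "__main__":
    print(naive_next_action())
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.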

Probabilistic Agent (ProbAgent)

Jupyter notebook: prob_agent_collect_experience.ipynb

  • ProbAgent uses probabilistic reasoning to search the grid of squares for the gold as safely as possible. Any grid size can be used.
  • Bayesian networks were created using the Pomegranate library to make inferences about the probability of danger at new locations.
  • The Python NetworkX library was used to build a graph of safe locations and find the shortest safe path to the target location.
  • During 10,000 games with a 4x4 grid, the agent won 40% of the games. The average score per game was 266.
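To illustrate the kind of inference involved, here is a rough sketch that computes the probability of a pit by brute-force enumeration. It is a stand-in for the Pomegranate Bayesian network, not the notebook's implementation, and it only handles pits (not the Wumpus). The function names and the `breeze_at` dictionary format are assumptions.

```python
from itertools import product


def neighbors(square, width, height):
    """Directly (not diagonally) adjacent squares inside the grid."""
    x, y = square
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < width and 0 <= b < height]


def pit_probability(query, breeze_at, width=4, height=4, pit_prob=0.2):
    """P(pit at query | breezes observed so far).

    breeze_at maps each visited (hence pit-free) square to True/False.
    Assumes the observations are consistent with at least one pit layout.
    """
    visited = set(breeze_at)
    if query in visited:
        return 0.0
    frontier = sorted(
        {n for v in visited for n in neighbors(v, width, height)} - visited
    )
    if query not in frontier:
        return pit_prob  # no evidence about this square yet: prior only
    total = with_pit = 0.0
    for assignment in product([False, True], repeat=len(frontier)):
        pits = {sq for sq, has_pit in zip(frontier, assignment) if has_pit}
        # Keep only layouts that reproduce every observed breeze / no-breeze.
        consistent = all(
            any(n in pits for n in neighbors(v, width, height)) == breeze
            for v, breeze in breeze_at.items()
        )
        if not consistent:
            continue
        weight = 1.0
        for has_pit in assignment:
            weight *= pit_prob if has_pit else 1.0 - pit_prob
        total += weight
        if query in pits:
            with_pit += weight
    return with_pit / total


if __name__ == "__main__":
    # Breeze felt at (0, 1) and (1, 0), none at the start square (0, 0).
    observed = {(0, 0): False, (0, 1): True, (1, 0): True}
    print(round(pit_probability((0, 2), observed), 2))  # 0.31
    print(round(pit_probability((1, 1), observed), 2))  # 0.86
```

Enumeration is exponential in the number of frontier squares, so it is only practical for small grids.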

Q-Learning Agent (DeepQAgent)

Jupyter notebook: q_agent_two_input_network.ipynb

  • Q-learning with an epsilon-greedy policy.
  • An action-value network with two inputs (the state and the proposed action) and one output (the action-value) was used. The encoded state (a 3-D tensor with 13 feature planes) goes through several convolutional layers. The proposed action goes into a separate input. The output of the convolutional layers is combined with the proposed action and passed through a dense layer.
  • The experience data generated by the probabilistic agent ProbAgent was used as the first experience set to train the DeepQAgent. The DeepQAgent learned to climb out without gold.
  • Evaluation of the updated agent (1,000 games with a 4x4 grid): the average score per game was about -85 with epsilon=0.5 and about -1.4 with epsilon=0.0. The win rate was about 0.3% (epsilon=0.5). The agent needs further training to improve.
  • The network and DeepQAgent can also be used for larger grids.
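The epsilon-greedy policy can be sketched as follows. This is an illustration only: `q_value` stands in for the two-input action-value network (it takes a state and a proposed action and returns one number), and `toy_q`, the action names and the function signature are all hypothetical.

```python
import random

# Action names are assumed for illustration.
ACTIONS = ["Forward", "TurnLeft", "TurnRight", "Shoot", "Grab", "Climb"]


def epsilon_greedy(q_value, state, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily.

    q_value(state, action) -> float plays the role of the network, which
    scores one (state, proposed action) pair per call.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: q_value(state, action))


def toy_q(state, action):
    """Hypothetical action-value function that always prefers Climb."""
    return 1.0 if action == "Climb" else 0.0


if __name__ == "__main__":
    print(epsilon_greedy(toy_q, state=None, epsilon=0.0))  # Climb
    print(epsilon_greedy(toy_q, state=None, epsilon=0.5))
```

With epsilon=0.0 the agent always takes the highest-valued action, which is why the evaluation above reports scores for both epsilon=0.5 and epsilon=0.0.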

The Wumpus World Environment - Rules

The rules of the environment were mostly taken from Russell and Norvig, Artificial Intelligence: A Modern Approach.

[Figure: Example Grid (4x4)]

The Wumpus World is a grid of squares surrounded by walls, representing a cave. Each square can contain agents and objects.

  • The Agent always starts in the lower left corner, labelled (0, 0) in the code, facing right (orientation East).
  • The Agent dies if it enters a square containing a pit or a live Wumpus. It is safe to enter a square with a dead Wumpus.
  • The Agent's goal is to find the gold, bring it back to the start as quickly as possible without being killed, and climb out of the cave. Depending on the environment settings, the Agent may also be allowed to climb out of the cave without the gold.
  • The game ends either when the Agent dies or when the Agent climbs out of the cave.

Locations of the Wumpus, gold and pits: The locations of the gold and the Wumpus are chosen randomly, with a uniform distribution, from the squares other than the start square. In addition, each square other than the start square can contain a pit, with probability pit_prob.
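A short sketch of this placement rule. The function name `place_objects` and its return format are hypothetical. It also assumes that the Wumpus and the gold are drawn independently (so they may share a square) and that a pit may coincide with either; the text above does not say.

```python
import random


def place_objects(width=4, height=4, pit_prob=0.2, rng=random):
    """Return (wumpus, gold, pits) for one new game."""
    start = (0, 0)
    candidates = [
        (x, y) for x in range(width) for y in range(height) if (x, y) != start
    ]
    # Uniform choice over every square except the start square.
    wumpus = rng.choice(candidates)
    gold = rng.choice(candidates)  # assumption: independent of the Wumpus
    # Each non-start square independently contains a pit with prob. pit_prob.
    pits = {square for square in candidates if rng.random() < pit_prob}
    return wumpus, gold, pits


if __name__ == "__main__":
    wumpus, gold, pits = place_objects()
    print("Wumpus:", wumpus, "Gold:", gold, "Pits:", sorted(pits))
```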

The Agent is facing one of four possible directions (Agent’s orientation): North, South, East or West.

The Agent’s Actions:

  • The action Forward moves the Agent ahead in the direction it is facing
  • The action Turn Right turns the Agent right by 90°
  • The action Turn Left turns the Agent left by 90°
  • The action Grab can be used to pick up the gold if it is in the same square as the Agent
  • The action Shoot can be used to fire an arrow in a straight line in the direction the Agent is facing. The arrow continues until it either kills the Wumpus or hits a wall. The Agent has only one arrow
  • The action Climb can be used to climb out of the cave, but only from the start square
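The movement actions can be sketched like this, using (0, 0) as the lower left corner so that North increases y. The names `ORIENTATIONS`, `STEP`, `turn_right`, `turn_left` and `forward` are illustrative and not taken from the notebooks. The sketch assumes that Forward moves exactly one square and that the Agent stays in place when it hits a wall.

```python
# Orientations listed clockwise, so a right turn is +1 and a left turn is -1.
ORIENTATIONS = ["North", "East", "South", "West"]
STEP = {"North": (0, 1), "East": (1, 0), "South": (0, -1), "West": (-1, 0)}


def turn_right(orientation):
    """Rotate 90 degrees clockwise."""
    return ORIENTATIONS[(ORIENTATIONS.index(orientation) + 1) % 4]


def turn_left(orientation):
    """Rotate 90 degrees counter-clockwise."""
    return ORIENTATIONS[(ORIENTATIONS.index(orientation) - 1) % 4]


def forward(location, orientation, width=4, height=4):
    """Return (new_location, bump); the Agent stays put if it hits a wall."""
    x, y = location
    dx, dy = STEP[orientation]
    nx, ny = x + dx, y + dy
    if 0 <= nx < width and 0 <= ny < height:
        return (nx, ny), False
    return location, True


if __name__ == "__main__":
    print(turn_right("East"))         # South
    print(forward((0, 0), "East"))    # ((1, 0), False)
    print(forward((0, 0), "West"))    # ((0, 0), True)
```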

The Agent’s Percepts:

  • In the square containing the Wumpus and in the directly (not diagonally) adjacent squares, the Agent will perceive a Stench
  • In the squares directly adjacent to a pit, the Agent will perceive a Breeze
  • In the square where the gold is, the Agent will perceive a Glitter
  • When the Agent walks into a wall, it will perceive a Bump
  • When the Wumpus is killed, it emits a woeful Scream that can be perceived anywhere in the cave
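The location-based percepts (Stench, Breeze, Glitter) can be computed from the Agent's square alone. Below is a minimal sketch; the function names and the dictionary format are assumptions. Bump and Scream depend on the action just taken, so they are left out.

```python
def adjacent(a, b):
    """True if squares a and b share an edge (diagonals do not count)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def percepts_at(location, wumpus, pits, gold):
    """Percepts that depend only on where the Agent stands."""
    return {
        "stench": location == wumpus or adjacent(location, wumpus),
        "breeze": any(adjacent(location, pit) for pit in pits),
        "glitter": location == gold,
    }


if __name__ == "__main__":
    print(percepts_at((0, 1), wumpus=(0, 2), pits={(2, 2)}, gold=(3, 3)))
    # {'stench': True, 'breeze': False, 'glitter': False}
```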

The Percept also contains the reward calculated by the environment after each of the Agent's actions: +1000 for climbing out of the cave with the gold, -1000 for falling into a pit or being eaten by the Wumpus, -1 for each action taken and -10 for using the arrow.
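As a worked example of the reward rule, the sketch below adds up the components for a single action. It assumes the components are additive (so shooting the arrow costs -1 - 10 = -11 in total), which the description does not state explicitly. The function name and its parameters are hypothetical.

```python
def step_reward(action, has_arrow=True, died=False, climbed_with_gold=False):
    """Reward for one action, assuming the listed components simply add up."""
    reward = -1  # every action costs 1
    if action == "Shoot" and has_arrow:
        reward -= 10  # using the single arrow
    if died:
        reward -= 1000  # fell into a pit or was eaten by the Wumpus
    if climbed_with_gold:
        reward += 1000  # climbed out of the cave carrying the gold
    return reward


if __name__ == "__main__":
    print(step_reward("Forward"))                         # -1
    print(step_reward("Shoot"))                           # -11
    print(step_reward("Forward", died=True))              # -1001
    print(step_reward("Climb", climbed_with_gold=True))   # 999
```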

An environment is initialized with the following parameters:

  • width of the grid
  • height of the grid
  • allow climb without gold
  • pit probability: the probability of a pit being added to each square except (0, 0)

The standard game initializes the environment with (4, 4, True, 0.2): a 4x4 grid, climbing out without gold allowed, and a pit probability of 0.2.
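One way to hold these four parameters is a small dataclass, shown here as a sketch. The class name `EnvConfig` and its field names are assumptions; the environment in the notebooks may take the parameters differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """The four initialization parameters, defaulting to the standard game."""
    width: int = 4
    height: int = 4
    allow_climb_without_gold: bool = True
    pit_prob: float = 0.2

    def __post_init__(self):
        if self.width < 1 or self.height < 1:
            raise ValueError("grid dimensions must be positive")
        if not 0.0 <= self.pit_prob <= 1.0:
            raise ValueError("pit_prob must be between 0 and 1")


if __name__ == "__main__":
    standard = EnvConfig(4, 4, True, 0.2)
    larger = EnvConfig(width=8, height=8)
    print(standard)
    print(larger)
```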

Notes:

  • The Agent must only have access to the Percepts. The Agent should not be able to access any other information about the state of the Environment (where the Wumpus is, whether there is a pit in a location, etc.)
  • The Wumpus and pits do not move during a game, but their locations change from one game to the next

Contributors

izlata