The Wumpus World Agents (Naive, Probabilistic, Deep Q-learning)

In this project I built an environment simulator and three different agents for the Wumpus World, a partially observable game environment.

Naive Agent

Jupyter notebook: naive_agent.ipynb

The Naive Agent chooses the next action uniformly at random from the six possible actions (Forward, Turn Left, Turn Right, Shoot, Grab and Climb).
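A minimal sketch of this behaviour, for illustration only. The names `Action`, `NaiveAgent` and `next_action` are assumptions and may not match the notebook.

```python
import random
from enum import Enum


class Action(Enum):
    FORWARD = "Forward"
    TURN_LEFT = "Turn Left"
    TURN_RIGHT = "Turn Right"
    SHOOT = "Shoot"
    GRAB = "Grab"
    CLIMB = "Climb"


class NaiveAgent:
    """Ignores the percept and picks one of the six actions uniformly at random."""

    def __init__(self, seed=None):
        # A private generator keeps runs reproducible when a seed is given.
        self._rng = random.Random(seed)

    def next_action(self, percept=None):
        return self._rng.choice(list(Action))
```

Passing a seed makes a game repeatable, which helps when comparing this baseline against the other agents.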

Probabilistic Agent (ProbAgent)

Jupyter notebook: prob_agent_collect_experience.ipynb

ProbAgent uses probabilistic reasoning to search the grid of squares for the gold as safely as possible. Any grid size can be used.
Bayesian networks were created using the Pomegranate library to make inferences about the probability of danger at new locations.
The Python NetworkX library was used to build a graph of safe locations and find the shortest safe path to the target location.
During 10,000 games with a 4x4 grid, the agent won 40% of the games. The average score per game was 266.
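To illustrate the kind of inference involved, here is a dependency-free sketch that computes the probability of a pit by brute-force enumeration. It is not the Pomegranate network from the notebook: the function names, the restriction to pits (no Wumpus) and the enumeration approach are all assumptions made for this example.

```python
from itertools import product


def neighbors(square, width, height):
    """Directly (not diagonally) adjacent squares inside the grid."""
    x, y = square
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < width and 0 <= b < height]


def pit_probability(query, breeze_at, width=4, height=4, pit_prob=0.2):
    """P(pit at `query` | breeze observations), by enumerating pit layouts.

    `breeze_at` maps each visited square to True/False (Breeze perceived or not).
    Visited squares are pit-free because the agent survived them.
    """
    visited = set(breeze_at)
    if query in visited:
        return 0.0
    # Only unvisited squares next to a visited square can explain the evidence.
    frontier = sorted(
        {n for v in visited for n in neighbors(v, width, height)} - visited
    )
    if query not in frontier:
        return pit_prob  # no observation touches this square, so the prior applies

    weight_pit = weight_total = 0.0
    for layout in product([False, True], repeat=len(frontier)):
        pits = {sq for sq, has_pit in zip(frontier, layout) if has_pit}
        # Keep the layout only if it reproduces every observed Breeze exactly.
        consistent = all(
            any(n in pits for n in neighbors(v, width, height)) == breeze
            for v, breeze in breeze_at.items()
        )
        if not consistent:
            continue
        prior = 1.0
        for has_pit in layout:
            prior *= pit_prob if has_pit else 1.0 - pit_prob
        weight_total += prior
        if query in pits:
            weight_pit += prior
    if weight_total == 0.0:
        raise ValueError("observations are inconsistent with every pit layout")
    return weight_pit / weight_total
```

For example, with no Breeze at (0, 0) and a Breeze at both (1, 0) and (0, 1), the call `pit_probability((1, 1), {(0, 0): False, (1, 0): True, (0, 1): True})` gives about 0.86, while (2, 0) comes out at about 0.31, so (2, 0) is the less risky square to explore.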

Q-Learning Agent (DeepQAgent)

Jupyter notebook: q_agent_two_input_network.ipynb

Q-learning with an epsilon-greedy policy.
An action-value network with two inputs (the state and the proposed action) and one output (the action-value) was used. The encoded state (a 3-D tensor with 13 feature planes) passes through several convolutional layers, while the proposed action enters through a separate input. The output of the convolutional layers is combined with the proposed action and passed through a dense layer.
The experience data generated by ProbAgent was used as the first experience set to train the DeepQAgent. The DeepQAgent learned to climb out without gold.
Evaluating the updated agent over 1,000 games on a 4x4 grid: the average score per game was about -85 with epsilon=0.5 and -1.4 with epsilon=0.0. The win rate was about 0.3% (epsilon=0.5). The agent needs further training to improve.
The network and DeepQAgent can be used for larger grids.
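The sketch below shows how an epsilon-greedy policy and the standard Q-learning target can work with a network that scores one (state, action) pair at a time. Here `q_value` is a hypothetical stand-in for the trained network, and the discount factor `gamma` is an assumed value, not one taken from the notebook.

```python
import random

ACTIONS = ["Forward", "Turn Left", "Turn Right", "Shoot", "Grab", "Climb"]


def epsilon_greedy(q_value, state, epsilon, rng=random, actions=ACTIONS):
    """Pick a random action with probability epsilon, else the best-scoring one.

    `q_value(state, action)` scores a single proposed action, so the greedy
    step calls it once per candidate action and keeps the maximum.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda action: q_value(state, action))


def td_target(q_value, reward, next_state, done, gamma=0.99, actions=ACTIONS):
    """Q-learning target: reward plus the discounted best next action-value."""
    if done:
        return reward  # no future value once the game has ended
    return reward + gamma * max(q_value(next_state, a) for a in actions)
```

With epsilon=0.0 the policy is purely greedy, which is the setting behind the -1.4 average score reported above; epsilon=0.5 takes a random action about half the time.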

The Wumpus World Environment - Rules

The rules of the environment were mostly taken from Russell and Norvig, Artificial Intelligence: A Modern Approach.

Example Grid (4x4):

The Wumpus World is a cave, represented as a grid of squares surrounded by walls, where each square can contain agents and objects.

The Agent always starts in the lower-left corner, labelled (0, 0) in the code, facing right (orientation East).
The Agent dies if it enters a square containing a pit or a live monster Wumpus. It is safe to enter a square with a dead Wumpus.
The Agent's goal is to find the gold, bring it back to the start as quickly as possible without being killed, and climb out of the cave. Depending on the configuration, the Agent may also be allowed to climb out of the cave without the gold.
The game ends either when the Agent dies or when the Agent climbs out of the cave.

Locations of the Wumpus, gold and pits: The locations of the gold and the Wumpus are chosen uniformly at random from the squares other than the start square. In addition, each square other than the start is a pit with probability pit_prob.
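The placement rule can be sketched as follows. The function name `generate_world` and the dictionary layout are assumptions. The sketch also assumes that the gold, the Wumpus and the pits are drawn independently, so they may share a square; the rules above do not say whether the simulator allows that.

```python
import random


def generate_world(width=4, height=4, pit_prob=0.2, rng=None):
    """Randomly place the Wumpus, the gold and the pits, never on the start square."""
    rng = rng or random.Random()
    start = (0, 0)
    others = [
        (x, y) for x in range(width) for y in range(height) if (x, y) != start
    ]
    wumpus = rng.choice(others)  # uniform over the non-start squares
    gold = rng.choice(others)    # drawn independently of the Wumpus (assumption)
    # Each non-start square becomes a pit independently with probability pit_prob.
    pits = {square for square in others if rng.random() < pit_prob}
    return {"wumpus": wumpus, "gold": gold, "pits": pits}
```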

The Agent always faces one of four directions (its orientation): North, South, East or West.

The Agent’s Actions:

The Agent can go Forward
Turn Right by 90°
Turn Left by 90°
The action Grab can be used to pick up the gold if it is in the same square as the Agent
The action Shoot can be used to fire an arrow in a straight line in the direction the Agent is facing. The arrow continues until it either kills the Wumpus or hits a wall. The Agent has only one arrow
The action Climb can be used to climb out of the cave, but only from the start square
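The movement rules above can be sketched as below. All names are hypothetical, and the coordinate convention (x grows to the East, y grows to the North, with (0, 0) in the lower-left corner) is an assumption consistent with the start square described earlier.

```python
from enum import Enum


class Orientation(Enum):
    # Each value is the (dx, dy) step taken when moving Forward.
    NORTH = (0, 1)
    EAST = (1, 0)
    SOUTH = (0, -1)
    WEST = (-1, 0)


# Clockwise order, so turning right means moving one step ahead in this list.
CLOCKWISE = [Orientation.NORTH, Orientation.EAST, Orientation.SOUTH, Orientation.WEST]


def turn_right(orientation):
    return CLOCKWISE[(CLOCKWISE.index(orientation) + 1) % 4]


def turn_left(orientation):
    return CLOCKWISE[(CLOCKWISE.index(orientation) - 1) % 4]


def forward(location, orientation, width=4, height=4):
    """Return (new_location, bump). Walking into a wall leaves the Agent in place."""
    dx, dy = orientation.value
    x, y = location[0] + dx, location[1] + dy
    if 0 <= x < width and 0 <= y < height:
        return (x, y), False
    return location, True


def arrow_kills_wumpus(location, orientation, wumpus, width=4, height=4):
    """Follow the arrow square by square until it meets the Wumpus or a wall."""
    dx, dy = orientation.value
    x, y = location[0] + dx, location[1] + dy
    while 0 <= x < width and 0 <= y < height:
        if (x, y) == wumpus:
            return True
        x, y = x + dx, y + dy
    return False
```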

The Agent’s Percepts:

In the square containing the Wumpus and in the directly (not diagonally) adjacent squares, the Agent will perceive a Stench
In the squares directly adjacent to a pit, the Agent will perceive a Breeze
In the square where the gold is, the Agent will perceive a Glitter
When an Agent walks into a wall it will perceive a Bump
When the Wumpus is killed, it emits a woeful Scream that can be perceived anywhere in the cave
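A sketch of how these percepts could be computed from the hidden state. The function and key names are assumptions, as is the convention that `gold` is `None` once it has been picked up. Bump and Scream depend on the action just taken, so they are passed in rather than derived.

```python
def adjacent(a, b):
    """True when two squares share an edge (diagonals do not count)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def sense(location, wumpus, pits, gold, bump=False, scream=False):
    """Build the percept for the Agent's current square."""
    return {
        "stench": location == wumpus or adjacent(location, wumpus),
        "breeze": any(adjacent(location, pit) for pit in pits),
        "glitter": gold is not None and location == gold,
        "bump": bump,
        "scream": scream,
    }
```

Only this dictionary would be handed to the Agent; the positions of the Wumpus, pits and gold stay inside the environment.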

The Percept also contains the reward calculated by the environment after each of the Agent's actions: +1000 for climbing out of the cave with the gold, -1000 for falling into a pit or being eaten by the Wumpus, -1 for each action taken and -10 for using the arrow.
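As a worked example of the arithmetic, the sketch below assumes the components are additive, so the -1 action cost also applies to the final action (a winning Climb then yields 999). The function name and flags are hypothetical, and the simulator may combine the terms differently.

```python
def step_reward(arrow_used=False, died=False, climbed_out_with_gold=False):
    """Reward for a single action, assuming the listed components add up."""
    reward = -1  # every action costs 1
    if arrow_used:
        reward -= 10  # firing the single arrow
    if died:
        reward -= 1000  # fell into a pit or was eaten by the Wumpus
    if climbed_out_with_gold:
        reward += 1000
    return reward
```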

An environment is initialized with the following parameters:

width of the grid
height of the grid
allow climb without gold
pit probability: the probability of a pit being added to each square except (0, 0)

The standard game is initialized with (4, 4, True, 0.2): a 4x4 grid, climbing out without gold allowed, and a pit probability of 0.2.
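One way to hold these four parameters is a small immutable record. The class and field names here are assumptions; only the parameter order and the standard values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    width: int = 4
    height: int = 4
    allow_climb_without_gold: bool = True
    pit_prob: float = 0.2

    def __post_init__(self):
        # Reject settings that cannot describe a valid game.
        if self.width < 1 or self.height < 1:
            raise ValueError("grid must be at least 1x1")
        if not 0.0 <= self.pit_prob <= 1.0:
            raise ValueError("pit_prob must be between 0 and 1")


# The standard game, written positionally in the order listed above.
standard = EnvConfig(4, 4, True, 0.2)
```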

Notes:

The Agent must only have access to the Percepts. The Agent should not be able to access any other information about the state of the Environment (where the Wumpus is, whether there is a pit in a location, etc.)
The Wumpus and pits do not move during a game, but they will move from one game to the next
