
multi-agent-market-rl's People

Contributors

besuter, jan-engelmann

multi-agent-market-rl's Issues

Rework Tests

Write new tests compatible with the current environment structure.

Rethink reward for buyers

Currently, the reward for buyers is maximal if they do not buy anything at all...

Example:
s_act  tensor([[28., 14.]])
b_act  tensor([[30., 13.]])
s_deal tensor([ 0., 22.])
b_deal tensor([22.,  0.])  --> Buyer no. 2 does not make a deal
s_rew  tensor([-10., 11.])
b_rew  tensor([ 8., 29.])  --> Buyer no. 2 achieves the highest reward of this round... :/

Quick fix ideas:

  • Currently b_rew = b_reservation - b_deals --> maximal for b_deals == zeros()
  • Add a bonus reward if the buyer managed to buy something.
  • We want the maximum reward when a buyer manages to make a deal without spending much of their budget.
  • Maybe write a reward plugin allowing for different approaches to calculating the buyer reward.
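One way the bonus idea could look, as a rough sketch (the function name and the bonus value are made up for illustration; the numbers reuse the example above, where buyer no. 2 made no deal):

```python
import torch

def buyer_reward(b_reservation, b_deals, bonus=10.0):
    """Hypothetical buyer reward: surplus (reservation minus price paid)
    plus a fixed bonus, both granted only when a deal was actually made.

    b_reservation, b_deals: tensors of shape (n_buyers,); a deal price
    of zero means the buyer did not trade this round.
    """
    made_deal = (b_deals > 0).float()                 # 1.0 where a deal happened
    surplus = (b_reservation - b_deals) * made_deal   # no surplus without a deal
    return surplus + bonus * made_deal

# Numbers from the example above: buyer no. 2 made no deal.
b_reservation = torch.tensor([30.0, 29.0])
b_deals = torch.tensor([22.0, 0.0])
rewards = buyer_reward(b_reservation, b_deals)  # buyer 1 now out-earns buyer 2
```

With this shape, not trading yields zero reward, and cheap successful deals yield the most.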

Coherent Market Engine initialization

We need to decide how we want to initialize the Market Engine.
Do we want n_sellers and n_buyers to be integers giving the number of sellers and buyers?
Or do we want n_sellers and n_buyers to be lists of the corresponding agent ids?

Currently, MarketMatchHiLo(...) overwrites n_sellers and n_buyers with lists of the corresponding agent ids. This breaks the assert statements in info_setting.py, where n_sellers and n_buyers are expected to be integers.
If we need the agent ids, we should generate the integer counts separately via len(seller_ids) and len(buyer_ids).
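A minimal sketch of the integer-count convention (class and argument names are illustrative, not the actual Market Engine API):

```python
# Sketch of one possible convention: keep n_sellers / n_buyers as
# integers everywhere and derive them from the id lists, rather than
# overwriting them with the lists themselves.
class MarketEngine:
    def __init__(self, seller_ids, buyer_ids):
        self.seller_ids = list(seller_ids)
        self.buyer_ids = list(buyer_ids)
        # Integer counts stay integers, so asserts such as those in
        # info_setting.py keep working.
        self.n_sellers = len(self.seller_ids)
        self.n_buyers = len(self.buyer_ids)

engine = MarketEngine(seller_ids=["s0", "s1"], buyer_ids=["b0", "b1", "b2"])
assert isinstance(engine.n_sellers, int) and engine.n_sellers == 2
```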

Disregard done agents from weight update

We should not use 'mock' decisions from done agents to update their weights.

Currently, the loss is computed over a batch of previous actions for all agents in one go.
We have Q_value_targets and Q_values with shape (batch_size, n_agents).

--> Loss = (Q_value_targets - Q_values)**2 averaged over batch_size --> shape of loss is (n_agents,)

But a sampled batch will contain environment states where a subgroup of the agents was already done. The corresponding Q-values of done agents carry no meaning and should not be part of the loss.
Because the loss needs a fixed dimension, we can't simply remove these Q-values from the calculation.

Quick fix ideas:

  • Mask the actions of done agents to the action corresponding to 'no action'. This sets the target Q-value to the Q-value of 'no action', teaching the agent to stop participating after having achieved a deal.
  • Compute the loss for every agent individually and only consider Q-values from states where the agent was not yet done. (Random sampling can yield batches containing only states where a given agent is done... Also, this is probably much slower.)
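A third variant, masking the loss terms directly instead of the actions, could look like this (a sketch with made-up shapes and a made-up done mask; it keeps the fixed (n_agents,) loss shape while letting done agents contribute nothing):

```python
import torch

batch_size, n_agents = 4, 3
q_values = torch.randn(batch_size, n_agents, requires_grad=True)
q_targets = torch.randn(batch_size, n_agents)
# True = this agent was already done in this sampled state.
done = torch.tensor([[0, 0, 1],
                     [0, 1, 1],
                     [0, 0, 0],
                     [1, 0, 1]], dtype=torch.bool)

not_done = (~done).float()
sq_err = (q_targets - q_values) ** 2 * not_done   # done agents contribute 0
# Average each agent's loss over only its not-done samples; clamp guards
# against a batch in which an agent is done everywhere.
loss = sq_err.sum(dim=0) / not_done.sum(dim=0).clamp(min=1.0)  # shape (n_agents,)
```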

Gradient of Loss is None

STATUS:
The training loop runs without errors, but the agent weights are not being updated because the gradient coming from the loss is None.

TODO:

Go through all steps involved in generating the loss (agent_actions, deals, loss_calc) and make sure that no in-place operations are breaking the dynamic computation graph.
A gradient equal to None is a strong indication that the dynamic computation graph was broken.
Related to this, rethink the way we compute deals (a lot of in-place operations are used; they may be the root cause of the error) --> MarketMatchHiLo() in markets.py
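A minimal reproduction of how the graph can end up broken (not the repo's actual code; .item() is just one example of a graph-breaking operation, alongside in-place writes into freshly created, detached tensors):

```python
import torch

prices = torch.tensor([3.0, 5.0], requires_grad=True)

# Broken: writing into a brand-new tensor via .item() detaches the
# values from the graph, so no gradient can flow back to `prices`.
deals_bad = torch.zeros(2)
for i in range(2):
    deals_bad[i] = prices[i].item()   # .item() drops gradient info
loss_bad = deals_bad.sum()
# loss_bad.backward() would fail / leave prices.grad as None.

# Fixed: stay inside differentiable ops (e.g. torch.stack).
deals_ok = torch.stack([prices[0], prices[1]])
loss_ok = deals_ok.sum()
loss_ok.backward()
assert prices.grad is not None        # gradient flows again
```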

Tianshou ReplayBuffer not compatible with ndim reward tensor

The ReplayBuffer expects the reward to be a one-element tensor so that it can be converted into a Python scalar. That is not possible with our n-dimensional tensors holding the rewards of all agents.

Solutions:

  • Look at the Tianshou source code and adapt our implementation to resolve the problem
  • Implement our own ReplayBuffer with collections.deque or something similar
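The second option could start from something like this deque-based sketch (class and method names are made up); since we store whole Python objects per transition, the multi-agent reward tensor is kept as-is instead of being squeezed into a scalar:

```python
import random
from collections import deque

class MultiAgentReplayBuffer:
    """Minimal replay buffer that stores the full per-agent reward
    tensor per transition instead of a single scalar."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, actions, rewards, next_obs, done):
        # rewards may be any shape, e.g. a tensor of shape (n_agents,).
        self.buffer.append((obs, actions, rewards, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one batch.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```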

Reservation Prices

Implement a correct way of initializing the reservation prices of sellers and buyers (this should be an easy fix).
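One simple initialization scheme, purely as an assumption about what "correct" could mean here (sellers' costs drawn below buyers' valuations so that mutually beneficial deals exist; the function name and bounds are arbitrary):

```python
import torch

def init_reservation_prices(n_sellers, n_buyers, low=10.0, high=30.0):
    """Draw seller reservation prices (costs) from the lower half of
    [low, high] and buyer reservation prices (valuations) from the
    upper half, so every buyer values the good above every seller's cost."""
    mid = (low + high) / 2
    s_reservation = torch.empty(n_sellers).uniform_(low, mid)
    b_reservation = torch.empty(n_buyers).uniform_(mid, high)
    return s_reservation, b_reservation
```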
