
multi-agent-market-rl's People

Contributors

besuter, jan-engelmann

multi-agent-market-rl's Issues

Rework Tests

Write new tests compatible with the current environment structure.

Rethink reward for buyers

Currently, the reward for buyers is maximal if they do not buy anything at all...

Example:
s_act  tensor([[28., 14.]])
b_act  tensor([[30., 13.]])
s_deal tensor([ 0., 22.])
b_deal tensor([22.,  0.])  --> Buyer no. 2 does not make a deal
s_rew  tensor([-10., 11.])
b_rew  tensor([ 8., 29.])  --> Buyer no. 2 achieves the highest reward of this round... :/

Quick fix ideas:

  • Currently b_rew = b_reservation - b_deals --> maximal for b_deals == zeros()
  • Add a bonus reward if the buyer managed to buy something.
  • We want the maximum reward when a buyer manages to make a deal without spending much of their budget.
  • Maybe write a reward plugin allowing for different approaches to calculating the buyer reward.
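One way the bonus idea could look, as a rough sketch (the function name and the bonus value are made up for illustration; the numbers reuse the example above, where buyer no. 2 made no deal):

```python
import torch

def buyer_reward(b_reservation, b_deals, bonus=10.0):
    """Hypothetical buyer reward: surplus (reservation minus price paid)
    plus a fixed bonus, both granted only when a deal was actually made.

    b_reservation, b_deals: tensors of shape (n_buyers,); a deal price
    of zero means the buyer did not trade this round.
    """
    made_deal = (b_deals > 0).float()                 # 1.0 where a deal happened
    surplus = (b_reservation - b_deals) * made_deal   # no surplus without a deal
    return surplus + bonus * made_deal

# Numbers from the example above: buyer no. 2 made no deal.
b_reservation = torch.tensor([30.0, 29.0])
b_deals = torch.tensor([22.0, 0.0])
rewards = buyer_reward(b_reservation, b_deals)  # buyer 1 now out-earns buyer 2
```

With this shape, not trading yields zero reward, and cheap successful deals yield the most.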

Coherent Market Engine initialization

We need to decide how we want to initialize the Market Engine.
Do we want n_sellers and n_buyers to be integers giving the number of sellers and buyers?
Or do we want n_sellers and n_buyers to be lists of the corresponding agent ids?

Currently, MarketMatchHiLo(...) overwrites n_sellers and n_buyers with lists of the corresponding agent ids. This breaks the assert statements in info_setting.py, where n_sellers and n_buyers are expected to be integers.
If we need the agent ids, we should generate the integer counts separately via len(seller_ids) and len(buyer_ids).
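A minimal sketch of the integer-count convention (class and argument names are illustrative, not the actual Market Engine API):

```python
# Sketch of one possible convention: keep n_sellers / n_buyers as
# integers everywhere and derive them from the id lists, rather than
# overwriting them with the lists themselves.
class MarketEngine:
    def __init__(self, seller_ids, buyer_ids):
        self.seller_ids = list(seller_ids)
        self.buyer_ids = list(buyer_ids)
        # Integer counts stay integers, so asserts such as those in
        # info_setting.py keep working.
        self.n_sellers = len(self.seller_ids)
        self.n_buyers = len(self.buyer_ids)

engine = MarketEngine(seller_ids=["s0", "s1"], buyer_ids=["b0", "b1", "b2"])
assert isinstance(engine.n_sellers, int) and engine.n_sellers == 2
```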

Disregard done agents from weight update

We should not use 'mock' decisions from done agents to update their weights.

Currently, the loss is computed over a batch of previous actions for all agents in one go.
We have Q_value_targets and Q_values with shape (batch_size, n_agents).

--> Loss = (Q_value_targets - Q_values)**2 averaged over batch_size --> shape of loss is (n_agents,)

But a sampled batch will contain environment states where a subgroup of the agents was already done. The corresponding Q-values of done agents carry no meaning and should not be part of the loss.
Because the loss needs a fixed dimension, we can't simply remove these Q-values from the calculation.

Quick fix ideas:

  • Mask the actions of done agents to the action corresponding to 'no action'. This sets the target Q-value to the Q-value of 'no action', teaching the agent to stop participating after having achieved a deal.
  • Compute the loss for every agent individually and only consider Q-values from states where the agent was not yet done. (Random sampling can yield batches containing only states where a given agent is done... Also, this is probably much slower.)
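A third variant, masking the loss terms directly instead of the actions, could look like this (a sketch with made-up shapes and a made-up done mask; it keeps the fixed (n_agents,) loss shape while letting done agents contribute nothing):

```python
import torch

batch_size, n_agents = 4, 3
q_values = torch.randn(batch_size, n_agents, requires_grad=True)
q_targets = torch.randn(batch_size, n_agents)
# True = this agent was already done in this sampled state.
done = torch.tensor([[0, 0, 1],
                     [0, 1, 1],
                     [0, 0, 0],
                     [1, 0, 1]], dtype=torch.bool)

not_done = (~done).float()
sq_err = (q_targets - q_values) ** 2 * not_done   # done agents contribute 0
# Average each agent's loss over only its not-done samples; clamp guards
# against a batch in which an agent is done everywhere.
loss = sq_err.sum(dim=0) / not_done.sum(dim=0).clamp(min=1.0)  # shape (n_agents,)
```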

Gradient of Loss is None

STATUS:
The training loop runs without errors, but the agent weights are not being updated because the gradient coming from the loss is None.

TODO:

Go through all steps involved in generating the loss (agent_actions, deals, loss_calc) and make sure that no in-place operations are breaking the dynamic computation graph.
A gradient equal to None is a strong indication that the dynamic computation graph was broken.
Related to this, rethink the way we compute deals (a lot of in-place operations are used; they may be the root cause of the error) --> MarketMatchHiLo() in markets.py
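A minimal reproduction of how the graph can end up broken (not the repo's actual code; .item() is just one example of a graph-breaking operation, alongside in-place writes into freshly created, detached tensors):

```python
import torch

prices = torch.tensor([3.0, 5.0], requires_grad=True)

# Broken: writing into a brand-new tensor via .item() detaches the
# values from the graph, so no gradient can flow back to `prices`.
deals_bad = torch.zeros(2)
for i in range(2):
    deals_bad[i] = prices[i].item()   # .item() drops gradient info
loss_bad = deals_bad.sum()
# loss_bad.backward() would fail / leave prices.grad as None.

# Fixed: stay inside differentiable ops (e.g. torch.stack).
deals_ok = torch.stack([prices[0], prices[1]])
loss_ok = deals_ok.sum()
loss_ok.backward()
assert prices.grad is not None        # gradient flows again
```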

Tianshou ReplayBuffer not compatible with ndim reward tensor

The ReplayBuffer expects the reward to be a one-element tensor so that it can be converted into a Python scalar. That is not possible with our n-dimensional tensors holding the rewards of all agents.

Solutions:

  • Look at the Tianshou source code and adapt our implementation to resolve the problem
  • Implement our own ReplayBuffer with collections.deque or something similar
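The second option could start from something like this deque-based sketch (class and method names are made up); since we store whole Python objects per transition, the multi-agent reward tensor is kept as-is instead of being squeezed into a scalar:

```python
import random
from collections import deque

class MultiAgentReplayBuffer:
    """Minimal replay buffer that stores the full per-agent reward
    tensor per transition instead of a single scalar."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, actions, rewards, next_obs, done):
        # rewards may be any shape, e.g. a tensor of shape (n_agents,).
        self.buffer.append((obs, actions, rewards, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one batch.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```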

Reservation Prices

Implement a correct way of initializing the reservation prices of sellers and buyers (this should be an easy fix).
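One simple initialization scheme, purely as an assumption about what "correct" could mean here (sellers' costs drawn below buyers' valuations so that mutually beneficial deals exist; the function name and bounds are arbitrary):

```python
import torch

def init_reservation_prices(n_sellers, n_buyers, low=10.0, high=30.0):
    """Draw seller reservation prices (costs) from the lower half of
    [low, high] and buyer reservation prices (valuations) from the
    upper half, so every buyer values the good above every seller's cost."""
    mid = (low + high) / 2
    s_reservation = torch.empty(n_sellers).uniform_(low, mid)
    b_reservation = torch.empty(n_buyers).uniform_(mid, high)
    return s_reservation, b_reservation
```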
