jan-engelmann / multi-agent-market-rl
Creating a multi-agent reinforcement learning environment for two-sided auction markets.
Write new tests compatible with the current environment structure.
Currently, the reward for buyers is maximal if they do not buy anything at all...
Example:
s_act tensor([[28., 14.]])
b_act tensor([[30., 13.]])
s_deal tensor([ 0., 22.])
b_deal tensor([22., 0.]) --> Buyer nr.2 does not make a deal
s_rew tensor([-10., 11.])
b_rew tensor([ 8., 29.]) --> Buyer nr.2 achieves the highest reward of this round... :/
Quick fix:
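One possible quick fix, sketched below under an assumption: a deal price of 0. in b_deal means "no deal" (as in the example above), and a buyer who makes no deal should receive zero reward rather than a positive one. This is only an illustration of masking, not the project's actual reward function.

```python
import torch

# Values taken from the example above.
b_deal = torch.tensor([22., 0.])
b_rew = torch.tensor([8., 29.])

# Assumption: a deal price of 0. means "no deal".
no_deal_mask = b_deal == 0.

# Zero out the reward of buyers that made no deal, so sitting out
# is never the best strategy.
b_rew = torch.where(no_deal_mask, torch.zeros_like(b_rew), b_rew)
```

After masking, buyer nr.2 no longer collects the highest reward of the round for doing nothing.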
We need to decide how we want to initialize the Market Engine.
Do we want n_sellers and n_buyers to be an integer representation of the number of sellers and buyers?
Or do we want n_sellers and n_buyers to represent a list of the corresponding agent ids?
Currently MarketMatchHiLo(...) overwrites n_sellers and n_buyers with lists of the corresponding agent ids. This breaks the assert statements in info_setting.py, where n_sellers and n_buyers are expected to be integers.
If we need the agent ids, we should keep n_sellers and n_buyers as integers and derive them separately via len(seller_ids)....
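A minimal sketch of the integer-count option (variable names hypothetical): pass the id lists around explicitly and derive the integer counts from them, so the asserts in info_setting.py keep working.

```python
# Hypothetical id lists; in the real code these would come from the engine setup.
seller_ids = ["s0", "s1", "s2"]
buyer_ids = ["b0", "b1"]

# Keep n_sellers / n_buyers as plain integers and store the id lists separately,
# instead of overwriting the counts with the lists themselves.
n_sellers = len(seller_ids)
n_buyers = len(buyer_ids)

assert isinstance(n_sellers, int) and isinstance(n_buyers, int)
```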
We should not use 'mock' decisions from done agents to update their weights.
Currently, the loss is computed over a batch of previous actions for all agents in one go.
We have Q_value_targets and Q_values with shape (batch_size, n_agents)
--> Loss = (Q_value_targets - Q_values)**2 averaged over batch_size --> shape of loss is (n_agents,)
But a sampled batch will contain environment states in which a subset of the agents was already done. The corresponding Q-values of done agents do not carry any meaning and should not be part of the loss.
Because the loss must keep a fixed (batch_size, n_agents) shape, we can't simply remove these Q-values from the loss calculation.
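A minimal sketch of masking instead of removing, assuming a boolean done_mask of shape (batch_size, n_agents) is available (the name is hypothetical): zeroing the squared error of done agents preserves the fixed shape while keeping their meaningless Q-values out of the loss.

```python
import torch

batch_size, n_agents = 4, 3
q_values = torch.randn(batch_size, n_agents, requires_grad=True)
q_targets = torch.randn(batch_size, n_agents)

# done_mask[b, a] is True if agent a was already done in sampled state b.
done_mask = torch.tensor([[False, True,  False],
                          [False, False, False],
                          [True,  True,  False],
                          [False, True,  True]])

# Zero out the error of done agents instead of removing it,
# so the (batch_size, n_agents) shape is preserved.
sq_err = (q_targets - q_values) ** 2
sq_err = sq_err.masked_fill(done_mask, 0.0)

# Average each agent's loss only over transitions where it was active.
active_counts = (~done_mask).sum(dim=0).clamp(min=1)  # shape (n_agents,)
loss = sq_err.sum(dim=0) / active_counts              # shape (n_agents,)
```

Because masked_fill is applied out of place, gradients still flow for active agents, while masked entries contribute exactly zero gradient.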
STATUS:
The training loop runs without errors, but agent weights are not being updated because the loss yields None gradients.
TODO:
Go through all steps involved in generating the loss (agent_actions, deals, loss_calc) and make sure that no in-place operations are breaking the dynamic computation graph.
A gradient equal to None is a strong indication that the dynamic computation graph was broken.
Related to this, rethink the way we compute deals (many in-place operations are used; they may be the root cause of the error) --> MarketMatchHiLo() in markets.py
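A small, self-contained illustration of both symptoms (generic PyTorch, not the project's actual code): an in-place write on a tensor that autograd still needs raises at backward time, and a parameter whose .grad is still None after backward never entered the graph at all.

```python
import torch

# Symptom 1: in-place write on a tensor autograd still needs.
x = torch.ones(3, requires_grad=True)
y = x.exp()          # exp saves its output for the backward pass
y[0] = 0.0           # in-place write invalidates that saved output
try:
    y.sum().backward()
    broke_graph = False
except RuntimeError:
    broke_graph = True  # autograd detects the version mismatch and raises

# Symptom 2: a parameter that never entered the graph gets no gradient.
w = torch.ones(2, requires_grad=True)
other = torch.ones(2, requires_grad=True)
loss = (other * 3.0).sum()   # loss that never touches w
loss.backward()
w_detached = w.grad is None  # True -> w received no gradient
```

If the deal computation builds tensors in place or detaches them, the agent parameters end up in the second situation, matching the None gradients observed above.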
I think each optimizer should take part in only one environment. If that is right, the current environment implementation makes no sense; fix this.
ReplayBuffer expects the reward to be a one-element tensor so that it can be converted into a Python scalar. This is not possible with our multi-dimensional tensors holding the rewards of all agents.
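A minimal sketch (class and method names hypothetical) of a buffer that stores the full per-agent reward tensor instead of converting it to a Python scalar, so sampling returns rewards of shape (batch_size, n_agents):

```python
import random
import torch

class MultiAgentReplayBuffer:
    """Replay buffer that keeps rewards as (n_agents,) tensors."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def push(self, obs, actions, rewards, next_obs, done):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # drop the oldest transition
        # Keep rewards as a tensor; no .item() conversion.
        self.storage.append((obs, actions, rewards.clone(), next_obs, done))

    def sample(self, batch_size):
        batch = random.sample(self.storage, batch_size)
        obs, actions, rewards, next_obs, done = zip(*batch)
        # Stacking yields rewards of shape (batch_size, n_agents).
        return obs, actions, torch.stack(rewards), next_obs, done
```

Usage: pushing transitions with a two-agent reward tensor and sampling three of them yields a (3, 2) reward batch.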
Solutions:
Implement a correct way of initializing the reservation prices of sellers and buyers (should be an easy fix)