Hello there! I've done a little bit of work and research on the Kaggle Halite IV

Going further into deep MARL with halite and HandyRL

dena commented on May 18, 2024

Going further into deep MARL with halite and HandyRL


Comments (5)

YuriCat commented on May 18, 2024

Thank you for trying a new application!
I believe RL agents can play such games well, although some additional implementation work and substantial computational resources would be necessary, and I have no idea whether it can outperform other approaches.

There seem to be two problem settings for this environment:

two player game
many players' game

and you prefer the second one, don't you?

In the second setting, you can compute the model for each component (ship?).
Some more work may be needed in generation.py (and train.py?) for this setting.

In the first problem setting, one of the easiest approaches is to compute the policy for all components with a single CNN.
The input shape is (H x W x features) and the output shape is (H x W x actions), so we can decide actions for all ships at once.
Strictly speaking, we should decide the actions sequentially, but that is computationally heavy.
Therefore, I think running the CNN only once or a few times (we can paint the board in a checkered pattern with several colors and then compute actions color by color) is a realistic, if not strictly correct, approach.
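
A minimal sketch of the single-pass idea, in plain Python with nested lists standing in for the CNN output. The function name `actions_from_grid`, the `ACTIONS` list, and the ship-position format are assumptions made for illustration; they are not part of HandyRL or the Halite API.

```python
# Hypothetical action set; the real Halite action names may differ.
ACTIONS = ["STAY", "NORTH", "EAST", "SOUTH", "WEST"]


def actions_from_grid(logits, ship_positions):
    """Pick one action per ship from a per-cell policy map.

    logits: nested list of shape H x W x len(ACTIONS), e.g. the output
        of one CNN forward pass.
    ship_positions: iterable of (row, col) cells holding one of our ships.
    """
    chosen = {}
    for row, col in ship_positions:
        cell = logits[row][col]
        # Greedy choice: index of the largest logit in this cell.
        best = max(range(len(ACTIONS)), key=lambda a: cell[a])
        chosen[(row, col)] = ACTIONS[best]
    return chosen


# 2 x 2 toy board with hand-written logits.
logits = [
    [[0.1, 2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 3.0]],
]
print(actions_from_grid(logits, [(0, 0), (1, 1)]))
```

Every ship reads its action from the same forward pass, which is what makes this cheap, and also why no ship can react to what another ship has just decided.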


Jogima-cyber commented on May 18, 2024

Thank you for your quick answer. There are indeed two problem settings, and I was referring to the first one, the two-player game, to begin with. What do you mean by deciding actions sequentially?


YuriCat commented on May 18, 2024

I may have caused a misunderstanding.
The problem settings are

a game between two players
a game among ships

and I thought the setting you were considering was the latter, since you used the term "multi agent". Is this right?

In the latter case, one available approach is to treat this game as one with up to 21 x 21 players (ships).
I think the current HandyRL can already handle this setting.

In the former case (a game between two players), my idea is the following.

When the board size is 4 x 4, we first paint the board as:

0101
2323
1010
3232

Then we first decide the actions of the ships painted "0".
Next, we decide the actions of the ships painted "1".
Next, "2".
Finally, "3".

As a result, we can decide the actions of all ships by running the CNN 4 times.
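
The painting above can be sketched in plain Python as follows. The formula in `color_board` is one way to reproduce the 4 x 4 pattern shown and extend it to other sizes; the function names and the ship-position format are assumptions for illustration, not HandyRL code.

```python
def color_board(height, width):
    """Return a height x width grid of colors in {0, 1, 2, 3}.

    Even rows use colors 0/1 and odd rows use 2/3; the alternation
    shifts by one column every two rows.
    """
    return [
        [2 * (r % 2) + (c + r // 2) % 2 for c in range(width)]
        for r in range(height)
    ]


def ships_by_color(ship_positions, colors):
    """Group ship cells into 4 decision rounds, one per color."""
    rounds = [[] for _ in range(4)]
    for r, c in ship_positions:
        rounds[colors[r][c]].append((r, c))
    return rounds


colors = color_board(4, 4)
for row in colors:
    print("".join(str(v) for v in row))

# Ships at these cells are split into rounds 0, 1, 2, 3.
rounds = ships_by_color([(0, 0), (0, 1), (1, 0), (2, 1)], colors)
print(rounds)
```

With this pattern, two cells of the same color are never directly adjacent, so ships that decide in the same round are at least two steps apart. Each round would then correspond to one CNN call, presumably with the actions chosen in earlier rounds fed back into the input.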


Jogima-cyber commented on May 18, 2024

Actually, I don't mind using one setting or the other; I would just like to "make it work". But the only way I see to handle setting one (two players) is with a centralized network that outputs the actions for all of a player's agents. The problem is that I don't know of any way to take several actions from the output of the net. Models like IMPALA (and actually every model I know) are designed to take one action per policy (or Q-value) head.
That's why I think handling the problem as a MARL problem is better. This means considering as many agents as there actually are and calling the net for each of them to determine its next action.

Do you mean, with your painted board, that we have one net outputting actions for each kind of unit (own ships, own shipyards, opponent ships, opponent shipyards)? So 4 agents running?

I also have a question regarding the net you're using for Hungry Geese:

def forward(self, x, _=None):
        h = F.relu_(self.conv0(x))
        for block in self.blocks:
            h = F.relu_(h + block(h))
        h_head = (h * x[:,:1]).view(h.size(0), h.size(1), -1).sum(-1)
        h_avg = h.view(h.size(0), h.size(1), -1).mean(-1)
        p = self.head_p(h_head)
        v = torch.tanh(self.head_v(torch.cat([h_head, h_avg], 1)))

        return {'policy': p, 'value': v}

Why do you do this operation:

h * x[:,:1]

This is the element-wise product of h, of size (batch_size, 32, 7, 11), and x[:,:1], of size (batch_size, 1, 7, 11), broadcast along the channel axis; x[:,:1] marks the head position of the goose concerned. Do you have an intuitive explanation of the idea behind this?


YuriCat commented on May 18, 2024

This operation gathers the features at the position of the goose's head.

In Hungry Geese, the state around the head of a goose is generally the most important for selecting an action and for detecting whether the goose survives, and the farther a pixel is from the head, the less important its state is.

For value estimation, however, global information, which includes the length of each goose, is also important. That's why head features and averaged features are concatenated before the last layer of the value head.
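
To make the two branches concrete, here is a small plain-Python illustration for a single sample, with nested lists standing in for tensors. The helper names `gather_head_features` and `average_features` are made up for this sketch; the real network does the same thing with the tensor expressions quoted above.

```python
def gather_head_features(h, head_mask):
    """Masked sum over the board for one sample.

    Plain-Python analogue of (h * x[:, :1]).view(B, C, -1).sum(-1).
    h: C x H x W feature maps; head_mask: H x W one-hot map of the head.
    """
    return [
        sum(
            h_c[i][j] * head_mask[i][j]
            for i in range(len(head_mask))
            for j in range(len(head_mask[0]))
        )
        for h_c in h
    ]


def average_features(h):
    """Mean of every channel over all cells (the h_avg branch)."""
    return [
        sum(sum(row) for row in h_c) / (len(h_c) * len(h_c[0]))
        for h_c in h
    ]


# Two channels on a 2 x 3 board; the head sits at row 1, column 2.
h = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]],
]
head_mask = [[0, 0, 0], [0, 0, 1]]

head_feats = gather_head_features(h, head_mask)  # features at the head cell
avg_feats = average_features(h)                  # board-wide averages
print(head_feats, avg_feats)
```

Because the mask is one-hot, the masked sum simply picks out each channel's value at the head cell, while the average keeps a summary of the whole board.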

(I also posted this explanation on the thread of our code in Kaggle.)

Selecting actions for several components is not difficult in itself.
You can output a tensor of shape (batch_size, n_components, n_actions), compute a softmax over its last axis, and then decide an action for each component.
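
A minimal sketch of that per-component selection for one batch element, using only the Python standard library. The function names and the toy logits are illustrative assumptions; in practice the logits would come from the network.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_actions(component_logits, rng):
    """Sample one action index per component.

    component_logits: n_components x n_actions logits for one batch element.
    rng: a random.Random instance, passed in for reproducibility.
    """
    actions = []
    for logits in component_logits:
        probs = softmax(logits)
        actions.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return actions


rng = random.Random(0)
component_logits = [[0.0, 0.0, 5.0], [3.0, 0.0, 0.0]]
actions = sample_actions(component_logits, rng)
print(actions)
```

Each component is sampled independently of the others here, given the same network output.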

However, this procedure is problematic when we want the components to cooperate with each other.
So I thought of a way to avoid deciding, at the same time, the actions of components that are a small Manhattan distance apart.
