Comments (5)

YuriCat commented on May 18, 2024

Thank you for trying a new application!
I believe RL agents can play such games well, although some additional implementation work and substantial computational resources would be necessary, and I have no idea whether they can outperform other approaches.

There seem to be two problem settings for this environment:

  • a two-player game
  • a many-player game

and you prefer the second one, don't you?

In the second setting, you can run the model once for each component (ship?).
Some more work may be needed in generation.py (and train.py?) for this setting.

In the first problem setting, one of the easiest approaches is to compute the policy for all components with a single CNN.
The input shape is (H x W x features) and the output shape is (H x W x actions), so we can decide the actions for all ships at once.
Needless to say, we should in principle decide the actions sequentially, but that is computationally heavy.
Therefore, I think running the CNN only once or a few times (we can paint the board in a checkered pattern with several colors and then compute the actions color by color) is a (wrong but) realistic way.
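
A minimal sketch of the single-CNN idea, for illustration only. The class name PerCellPolicyNet, the helper select_actions, the layer sizes, and the feature and action counts are all assumptions and are not part of HandyRL. Note that PyTorch stores tensors channels-first, so the (H x W x actions) output appears here as (B, n_actions, H, W).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerCellPolicyNet(nn.Module):
    """Hypothetical fully convolutional net: one set of action logits per board cell."""

    def __init__(self, n_features, n_actions, n_filters=32):
        super().__init__()
        self.conv0 = nn.Conv2d(n_features, n_filters, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(n_filters, n_filters, kernel_size=3, padding=1)
        # A 1x1 convolution maps each cell's features to that cell's action logits.
        self.head_p = nn.Conv2d(n_filters, n_actions, kernel_size=1)

    def forward(self, x):
        h = F.relu(self.conv0(x))
        h = F.relu(self.conv1(h))
        return self.head_p(h)  # logits, shape (B, n_actions, H, W)


def select_actions(logits, ship_mask):
    """Sample one action for every cell that holds a ship.

    logits:    (B, n_actions, H, W)
    ship_mask: (B, H, W) bool, True where one of our ships is
    returns:   (B, H, W) long, sampled action at ship cells and -1 elsewhere
    """
    per_cell = logits.permute(0, 2, 3, 1)  # (B, H, W, n_actions)
    sampled = torch.distributions.Categorical(logits=per_cell).sample()
    return torch.where(ship_mask, sampled, torch.full_like(sampled, -1))


torch.manual_seed(0)
net = PerCellPolicyNet(n_features=8, n_actions=5)  # placeholder sizes
x = torch.randn(2, 8, 21, 21)                      # random stand-in observation
ship_mask = torch.zeros(2, 21, 21, dtype=torch.bool)
ship_mask[0, 3, 4] = True
ship_mask[1, 10, 10] = True
actions = select_actions(net(x), ship_mask)
print(actions.shape)
```

All ships are sampled independently from one forward pass here, which is exactly the simplification described above as "wrong but realistic".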

from handyrl.

Jogima-cyber commented on May 18, 2024

Thank you for your quick answer. There are indeed two problem settings, and I was referring to the first one, the two-player game, to begin with. What do you mean by deciding actions sequentially?

YuriCat commented on May 18, 2024

I might have caused a misunderstanding.
The problem settings are:

  • a game between two players
  • a game among ships

and I thought the setting you are considering was the latter, since you used the term "multi agent". Is this right?

In the latter case, one available approach is to treat this game as a game with (at most) 21 x 21 players (ships).
I think we can already handle this setting in the current HandyRL.

In the former case, the following is my idea.

When the board size is 4 x 4, we first paint the board as:

0101
2323
1010
3232

Then we first decide the actions of the ships painted "0".
Next, we decide the actions of the ships painted "1".
Next, "2".
Finally, "3".

As a result, we can decide the actions of all ships by computing the CNN 4 times.
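
The painting and the color-by-color loop can be sketched as follows. This is only an illustration: paint reproduces the 4 x 4 pattern above with a formula chosen to match it, and decide is a hypothetical callback standing in for one CNN evaluation that sees the actions already fixed for earlier colors.

```python
def paint(height, width):
    """Return a height x width grid of colors 0..3 that repeats the pattern above."""
    return [[2 * (r % 2) + (c + r // 2) % 2 for c in range(width)]
            for r in range(height)]


def decide_in_color_order(ships, colors, decide):
    """Decide ship actions one color group at a time.

    ships:  list of (row, col) positions
    colors: grid returned by paint()
    decide: hypothetical callback(group, already_decided) -> {position: action}
    """
    decided = {}
    for color in range(4):
        group = [pos for pos in ships if colors[pos[0]][pos[1]] == color]
        if group:  # skip colors with no ship, so at most 4 evaluations happen
            decided.update(decide(group, dict(decided)))
    return decided


colors = paint(4, 4)
for row in colors:
    print("".join(map(str, row)))

ships = [(0, 0), (0, 1), (1, 0), (2, 1)]
calls = []


def dummy_decide(group, already_decided):
    # Placeholder policy: record the group and make every ship stay put.
    calls.append(list(group))
    return {pos: "stay" for pos in group}


actions = decide_in_color_order(ships, colors, dummy_decide)
print(calls)
```

In this example the ships at (0, 0) and (2, 1) share color 0 and are decided together, then (0, 1), then (1, 0).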

Jogima-cyber commented on May 18, 2024

Actually, I don't mind using either setting; I would just like to "make it work". But the only way I see to handle setting one (two players) is a centralized network that outputs the actions for all of a player's agents. The problem is that I don't know of any way to take several actions from the output of the net. Models like IMPALA (and actually every model I know) are made to take one action per policy (or Q-value) head.
That's why I think handling the problem as a MARL problem is better. This means considering as many agents as there actually are and calling the net for each of them to determine its next action.

Do you mean, with your painted board, that we have one net outputting actions for each kind of ship (own ships, own shipyards, opponent ships, opponent shipyards)? So 4 agents running?

I also have a question regarding the net you're using for Hungry Geese:

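def forward(self, x, _=None):
    # Shared convolutional trunk with residual blocks.
    h = F.relu_(self.conv0(x))
    for block in self.blocks:
        h = F.relu_(h + block(h))
    # Mask the feature map with input plane 0 and sum over the board: (B, C).
    h_head = (h * x[:,:1]).view(h.size(0), h.size(1), -1).sum(-1)
    # Average each channel over the whole board: (B, C).
    h_avg = h.view(h.size(0), h.size(1), -1).mean(-1)
    p = self.head_p(h_head)
    v = torch.tanh(self.head_v(torch.cat([h_head, h_avg], 1)))

    return {'policy': p, 'value': v}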

Why do you do this operation:

h * x[:,:1]

It is an element-wise product of h, of size (Batch_size, 32, 11, 7), and x[:,0], of size (Batch_size, 11, 7), broadcast along the channel axis, where x[:,0] represents the head position of the goose concerned. Do you have an intuitive explanation of the idea behind this?

YuriCat commented on May 18, 2024

This operation gathers the features at the position of the goose's head.

In Hungry Geese, the state around a goose's head is generally the most important for selecting an action and for detecting whether the goose survives, and the farther a pixel is from the head, the less important its state is.

For value estimation, however, global information, including the length of each goose, is also important. That's why the head features and the averaged features are concatenated before the last layer of the value estimation.
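
A small check of what the masked sum does, using random stand-in tensors. The sizes, the number of input planes, and the head positions below are placeholders, and plane 0 is assumed to be a one-hot map of the head position, as described in the question.

```python
import torch

torch.manual_seed(0)
B, C, H, W = 2, 32, 7, 11            # placeholder sizes

h = torch.randn(B, C, H, W)          # stand-in for the feature map after the blocks
x = torch.zeros(B, 4, H, W)          # stand-in observation with 4 planes
head_pos = [(3, 5), (0, 10)]         # assumed head cell for each batch element
for b, (r, c) in enumerate(head_pos):
    x[b, 0, r, c] = 1.0              # plane 0: one-hot head position

# Same expressions as in forward(): x[:, :1] has shape (B, 1, H, W) and
# broadcasts over the C channels, zeroing every cell except the head.
h_head = (h * x[:, :1]).view(B, C, -1).sum(-1)   # (B, C)
h_avg = h.view(B, C, -1).mean(-1)                # (B, C)

# Reading the feature vector at the head cell directly gives the same result.
picked = torch.stack([h[b, :, r, c] for b, (r, c) in enumerate(head_pos)])
print(torch.allclose(h_head, picked))
print(torch.cat([h_head, h_avg], 1).shape)
```

With a one-hot mask, the multiply-and-sum is therefore just an indexing operation written in a batched, differentiable form.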

(I also posted this explanation on the thread of our code in Kaggle.)

Selecting actions for several components is not difficult in itself.
You can output a tensor of shape (batch_size, n_components, n_actions), compute a softmax over its last axis, and then decide an action for each component.
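
As a rough sketch, with random logits standing in for the network output and placeholder sizes, this could look like the following. The joint log-probability line is an addition for illustration and assumes the components are sampled independently.

```python
import torch

torch.manual_seed(0)
batch_size, n_components, n_actions = 2, 3, 5   # placeholder sizes

# Stand-in for the network output.
logits = torch.randn(batch_size, n_components, n_actions)

# Softmax over the last axis: one action distribution per component.
probs = torch.softmax(logits, dim=-1)

# Sample one action per component: shape (batch_size, n_components).
actions = torch.distributions.Categorical(probs=probs).sample()

# Under independent sampling, the joint log-probability of all chosen
# actions is the sum of the per-component log-probabilities.
chosen = probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
log_prob = torch.log(chosen).sum(-1)            # shape (batch_size,)
print(actions.shape, log_prob.shape)
```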

However, this procedure is problematic when we want the components to cooperate with each other.
So I thought of a way to avoid deciding, at the same time, the actions of components that are within a small Manhattan distance of each other.
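
One way to check this property for the four-color pattern from the earlier comment is to compute the smallest Manhattan distance between any two cells of the same color. The formula in color_of is an assumption chosen to reproduce that pattern, and the distance here ignores any wrap-around at the board edges.

```python
from itertools import combinations


def color_of(r, c):
    """Color 0..3 of cell (r, c), matching the 0101 / 2323 / 1010 / 3232 pattern."""
    return 2 * (r % 2) + (c + r // 2) % 2


def min_same_color_distance(height, width):
    """Smallest Manhattan distance between two distinct cells sharing a color."""
    cells = [(r, c) for r in range(height) for c in range(width)]
    return min(abs(r1 - r2) + abs(c1 - c2)
               for (r1, c1), (r2, c2) in combinations(cells, 2)
               if color_of(r1, c1) == color_of(r2, c2))


print(min_same_color_distance(4, 4))  # → 2
```

A result of 2 means that two ships decided in the same pass are never directly adjacent, although they can still be two steps apart.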
