Had a discussion with @ylfdq1118 today. Here is the conclusion we have reached: <

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

<a class="user-mention notranslate" data-hovercard-type="user" data-hover

In previous discussion with <a class="user-mention notranslate" data-hovercard-type="u

[DISCUSSION] [RFC] Multi-edge support about dgl HOT 11 CLOSED

dmlc commented on August 15, 2024

[DISCUSSION] [RFC] Multi-edge support

from dgl.

Comments (11)

lingfanyu commented on August 15, 2024

For RGCN, I think the weight W should not be in the message signature, and message function should directly reference it (for example, using closure).

Here is what the code might look like:

class MessageFunc(nn.Module):
    def __init__(self):
        self.w = torch.Tensor(num_relations, in_feat, out_feat)

    def forward(self, src, edge):
        return torch.bmm(src['h'], self.w[edge['type']])

Then the propagation looks like:
g.update_all(msg_func, fn.sum(), update_func)

And we need to lower the spmv executor from update_all, send_and_recv api to recv api so that all builtin reduce can be done as a spmv from edge space to node space. And if message func is also builtin, we can provide a shortcut to rewrite as a spmv from node space to node space. This is related to executor redesign @jermainewang and I previously discussed.

from dgl.

zheng-da commented on August 15, 2024

@BarclayII @ylfdq1118 Thanks for the proposal. I agree that it's better to use incident lists to support multi-graphs.

Each edge has a unique ID. It shouldn't be a big problem for the API of DGLGraph, such as get_e_repr and set_e_repr.

I wonder if it's too expensive to return lists of lists for get_eids, batch_in_edges and batch_out_edges. What is the problem of returning them as a single list?

How do you want to use the filtering function with other APIs? One possible way is to pass them to the APIs, such as update_all, to remove some nodes and edges on the fly during the computation. The question is whether this is necessary given the subgraph API? One advantage may be that filtering at the runtime may give more information to users to decide when to sample an edge or a vertex.

from dgl.

BarclayII commented on August 15, 2024

@BarclayII @ylfdq1118 Thanks for the proposal. I agree that it's better to use incident lists to support multi-graphs.

Each edge has a unique ID. It shouldn't be a big problem for the API of DGLGraph, such as get_e_repr and set_e_repr.

The current semantics of get_e_repr and set_e_repr takes in pairs of source/target vertices. Under this semantics, if the number of edges between those pairs are different, then the number of rows of the edge tensor will be different than the number of source/target pairs. Users will have to align the rows to which pair of nodes in this case, and we believe that doing so would be quite a hassle. They may as well use get/set_e_repr_by_id in this case. So we believe that adapting the current get/set_e_repr to multigraph is subtle and/or redundant.

I wonder if it's too expensive to return lists of lists for get_eids, batch_in_edges and batch_out_edges. What is the problem of returning them as a single list?

My mistake. I checked the implementation of batched inbound/outbound edge queries and there should be no problem, since we are returning a list of (source, target, edge ID) tuples. I removed that proposal.

That being said, do we need to support batched queries of predecessors/successors? I feel that this should be a nice and convenient addition, but I'm unsure whether we need it right now or in the near future.

How do you want to use the filtering function with other APIs? One possible way is to pass them to the APIs, such as update_all, to remove some nodes and edges on the fly during the computation. The question is whether this is necessary given the subgraph API? One advantage may be that filtering at the runtime may give more information to users to decide when to sample an edge or a vertex.

The point of having the filtering functions is to support selecting a subset of nodes/edges using predicates. For example, one might want to "sample all the nodes with the maximum hidden state component greater than 1" (as in my second example). The resulting node subset can as well change dynamically.

That being said, I believe that users can do the same thing manually (again in my second example). But personally I would prefer our framework to somehow support querying by predicates conveniently, by returning the selected node/edge IDs, subgraphing, or any other way.

from dgl.

jermainewang commented on August 15, 2024

Could some one briefly describes what is an "incident list"? What is the difference with the current adjacency list here?

from dgl.

zheng-da commented on August 15, 2024

My understanding is that "incident list" is more or less the same as "adjacency list". The only difference is that it allows duplicated edges. I simply use their term. @BarclayII @ylfdq1118 is my understanding correct?

from dgl.

BarclayII commented on August 15, 2024

Technically "adjacency list" is about "which nodes are the direct successors to the given node", while "incidence list" is about "which edges are outbound from the given node". So the C++ storage itself also contains everything we need for incidence lists. This is good news since we don't need to change the storage (sorry I missed that), and we only need to tweak our code to allow duplicate succ's.

from dgl.

BarclayII commented on August 15, 2024

If we don't have any further problems then I assume that I'll go ahead and implement these changes?

from dgl.

lingfanyu commented on August 15, 2024

In previous discussion with @jermainewang , we think the query API is running at relatively high level (meaning in python code) because query API involves frame storage. But since GraphIdx is decoupled from storage, I am not sure about the exact structure we are going to use in GraphIdx for efficient edge type querying (maybe just one extra layer of indirection). Also, we have concerns about whether MultiGraphIdx should be a separate C++ class.

I think we need to be crystal clear about the design since GraphIdx affects a lot of stuff. So just hold on for a while until I get time to work on this. Also, we need to discuss more and push a detailed design doc on GraphIdx for others to comment on.

from dgl.

BarclayII commented on August 15, 2024

Re query API: I'm thinking of an extra layer of indirection too. Basically right now the frame storage only allows indexing by column. We can additionally have a "row indexer" for row slicing, following the same design as pandas.DataFrame. Then we only need to pass in the row-sliced "subframe" into the predicate.

Re MultiGraphIdx: I'm not sure what's the concern of having GraphIdx handling both simple- and multi-graph. In my mind GraphIdx can be mostly kept as it is right now (except edge_id and edge_ids). But I do agree that we need to inspect it in more detail.

from dgl.

jermainewang commented on August 15, 2024

I see. I think the multi-edge design should be at the same level as C++ GraphIndex. That's why I'm really surprised when I heard from Lingfan that your guy's design has something with the frame storage. Currently, Lingfan and I are too busy (will finally be released after tomorrow). But I think there are several things worth thinking:

Read through the current GraphIndex class and play with it. You need to understand how it is used in the python side. The python binding is already complete so you can actually run it in python.
In the high level, start from the application/example, think about how the python side API will look like for users.
From the high level APIs, summarize a narrow low-level interface that needs to be implemented in C++.
Given the low-level APIs, how does the other existing packages implement them? (e.g. BGL, igraph, SGraph, graph-tool, etc.)
From then, we can try to formalize the design and implement it.

from dgl.

BarclayII commented on August 15, 2024

A cleaned up proposal is available at
https://docs.google.com/document/d/12gSTTMUNGVt-ZJkj4VCxUUOYVdSh07tUHKcM3_NU7-c/edit?usp=sharing

Closing this issue for now.

from dgl.

[DISCUSSION] [RFC] Multi-edge support about dgl HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent