In general, gonum packages that are computationally intensive (e.g. mat64, blas, stat,

graph: Graph.NodeList should not demand that returned []Node be newly allocated about graph HOT 19 CLOSED

kortschak commented on August 15, 2024

graph: Graph.NodeList should not demand that returned []Node be newly allocated

from graph.

Comments (19)

kortschak commented on August 15, 2024

An alternative to returning a mutable []Node would be to return a new type NodeIterator:

type NodeIterator interface {
    Len() int
    Next() bool
    Node() Node
}

This would be used like so:

it := g.NodeList()
for it.Next() {
    n := it.Node()
}

The Len method allows an easy way to query graph order.

A similar mechanism could be used for the Neighbor and Successor and Predecessor methods.

from graph.

UserAB1236872 commented on August 15, 2024

I prefer the iterator, but I think it may be nice to have:

type NodeIterator interface {
Remaining() (low, hi int, inf bool)
//...
}

Where low is a lower bound (e.g. "at least one more node"), hi is an upper
bound (e.g. "at most 4 more nodes"), and inf is true if there is no (known)
upper bound. If inf is false and low==hi, we can assume there are exactly
that many nodes.

For a graph that could conceivably implement FiniteGraph, hi==low and inf
== false, but this allows us to use the same interface for infinite, or
finite-but-unknown-cardinality graphs as well.

On Thu, Apr 30, 2015 at 4:18 PM, Dan Kortschak [email protected]
wrote:

An alternative to returning a mutable []Node would be to return a new type
NodeIterator:

type NodeIterator interface {
Len() int
Next() bool
Node() Node
}

This would be used like so:

it := g.NodeList()
for it.Next() {
n := it.Node()
}

The Len method allows an easy way to query graph order.

A similar mechanism could be used for the Neighbor and Successor and
Predecessor methods.

—
Reply to this email directly or view it on GitHub
#46 (comment).

from graph.

kortschak commented on August 15, 2024

The Len method could be optional. This gives a simple check for whether the series is finite or not. I would prefer this to a multi-return.

I'm not wedded to the iterator over a filled slice or an exposed internal slice and I thin the choice needs to come with some evidence from performance numbers.

Any particular reason you prefer the iterator approach - I expect it to be slower than a filled/exposed slice. This is pretty important to me.

A possible median would be to define a struct { Nodes []Node } type that satisfies NodeIterator that can be switched on to allow the fast path, and have perfoamce critical graph types do both by return that type.

from graph.

kortschak commented on August 15, 2024

A reasonable interim approach is to change the wording to: "The caller may arbitrarily change elements in the returned slice, but those changes may be reflected to other callers."

This allows the fast path and indicates that currently the safe way to return a []Node is as a copy.

In the future, if we return a node iterator interface we could do something like this:

// Nodes is a Node iterator.
type Nodes interface {
    // Next advances the iterator and returns whether
    // the next call to Node will return a non-nil Node.
    Next() bool

    // Node returns the current Node from the iterator.
    Node() Node
}

type NodeSlicer() interface {
    // NodeSlice returns the set of nodes to be iterated
    // by a Nodes iterator.
    // The holder of the iterator may arbitrarily
    // change elements in the returned slice, but
    // those changes may be reflected to other
    // iterators.
    NodeSlice() []Node
}

// nodeIterator implements the Nodes and NodeSlicer interfaces.
type nodeIterator struct {
    i     int
    nodes []Node
}

func (n *nodeIterator) Next() bool {
    n.i++
    return n.i < len(n.nodes)
}

func (n *nodeIterator) Node() Node {
    if n.i < len(n.nodes) {
        return n.nodes[n.i]
    }
    return nil
}

func (n *nodeIterator) NodeSlice() []Node {
    return n.nodes
}

This means that if a client does it := g.Nodes() they off the bat get the capacity to do this:

for it.Next() {
    doStuffWith(it.Node())
}

but they can also:

switch it := it.(type) {
case NodeSlicer:
    for _, n := range it.NodeSlice() {
        doStuffWith(n)
    }
default:
    // as above
}

If a graph implementation does not want clients altering the underlying slice (or that slice does not exist because the graph is implicit), then the implementor simply omits the NodeSlice method.

from graph.

UserAB1236872 commented on August 15, 2024

I think if we're doing the iterator approach, we should probably do it like the iterator-ish things in the standard library (which admittedly are mostly on bytes and strings); Reader and such. I think NodeSlice should act more like returning the remaining values in the iterator.

This should be fairly trivial to implement. If there's an underlying slice, you can use downslice using the iterator's internal i. If you're generating nodes until some stopping point you just for !finalState and append until you have the slice.

from graph.

UserAB1236872 commented on August 15, 2024

Though on second thought, I'm not sure how an iterator is going to work for some algorithms. Kruskal's, for instance, needs an edge list sorted by weight. You can't really get around that with an EdgeIterator unless you start trying to reproduce Functional-like iterator adapters in Go which would likely be a nightmare.

from graph.

kortschak commented on August 15, 2024

No, that still works, it just changes where the slice gets created. So, it the graph is not happy with the edge list being modified that it would return to the client, then it gives the iterator as a pure iterator and then the []Edge is created by Kruskal so that it can be sorted. If the graph is happy with the edges being modified, then it bundles it up in an EdgeSlicer and the slice can be used directly. We could conceivably write a helper func that takes an iterator and returns the slice, either by populating the slice or just returning bundled slice. We use this kind of approach to hiding type details in cayley.

from graph.

kortschak commented on August 15, 2024

That's a minor implementation detail, but sure, that sounds like a perfectly reasonable approach.

On 02/05/2015, at 10:15 PM, "Jeff Juozapaitis" [email protected] wrote:

I think NodeSlice should act more like returning the remaining values in the iterator.

from graph.

leoneu commented on August 15, 2024

I like the two interface solution (Nodes and NodeSlicer). For the iterator, we may want to add an error. The backend will have I/O errors so we want to be able to handle the error immediately. I found these guidelines for Google APIs. I think we can use the same pattern.

it := Iterator()
for {
    var node Node
    node, err := it.Next()
    if err == iterator.Done {
        break
    }
    if err != nil {
        // handle the error.
    }
    // use the node
}

Should we reopen this issue and start experimenting?

from graph.

kortschak commented on August 15, 2024

This issue is still open. Maybe open a new issue in gonum/gonum and point to here.

from graph.

kortschak commented on August 15, 2024

The subtleties as I saw them were threefold:

Performance - an iterator is slower than a []T.
Code duplication - to avoid performance issue, a []T and an iterator need to be handled by all graph analysis functions.
Feasibility - some functions require the entire set to be loaded for the analysis to be feasible, with arbitrarily large (up to the range of int64) this is not possible.

With the addition of errorability we have a new problem. Part of the motivation behind adding the graphql package and changing from int to int64 IDs was to allow at least subsets of massive graphs to be easily accessed. I'd really like to be able to resolve this issue, but the reason it's more than two years old is that I haven't yet seen a good solution. If you can come up with one that doesn't have a disproportionate impact on other functionality of the graph packages, that would be really good.

from graph.

leoneu commented on August 15, 2024

Not sure I understand why we use int64 instead of uint64 for the node id.

I ran a benchmark and don't think this will be an issue in any real application. Access to a node is going to be a small fraction of the compute time.
Actually, I would drop the slice and use iterator instead. The speed difference will be minimal, it will be dominated by other operations. In my benchmark, just doing one operation on the node brings the speed difference to 5x. We need to profile before making decisions.
For example? Can't imagine a case that cannot be implemented using an iterator.

In summary, I really don't see why using an iterator is a problem, any data system would use an iterator to scale. For example, a database API would not return a slice of all objects for a query. Not making the change makes the project unusable for large graphs so I think it is justified to trade off some simplicity and performance for future usability.

from graph.

kortschak commented on August 15, 2024

The use of int64 over uint64 allows easy use of sign in graph algorithms. This has significant benefits. I guess you could argue that for those that they can type convert, but then the same argument holds the other way.

I'd need to see that in the context of a real application, but note that From and To would also result in interators, much of the work is following paths, which hits these a lot.
See 1.
Have a look at most of the functions in network and many of the functions in path.

I though of a partial alternative; in many cases a complete list is not necessary, particularly when the graphs are large/indeterminate. All you need is a start node or set of start nodes.

The issue of errorability is still a significant concern.

I'm very happy for you to got ahead with experiments though, making sure the suite passes in all packages is not going to be a small task.

from graph.

leoneu commented on August 15, 2024

What about returning a slice for From and To? I would expect large graphs in real applications to have max degree much less than the size of the graph.

from graph.

kortschak commented on August 15, 2024

Yes, that would work.

from graph.

kortschak commented on August 15, 2024

Although it's not clear to me how that would work with I/O failability.

from graph.

leoneu commented on August 15, 2024

Good point.

from graph.

kortschak commented on August 15, 2024

This is why I focused on the graphql parsing. Under that model, you do some preliminary graph analysis using the graphdb to get a manageable graph segment that you can then do finer detail on.

from graph.

kortschak commented on August 15, 2024

One thing to consider to ease the change is to provide a helper in graph that converts a graph.Nodes, potentially without an allocation of a []graph.Node, to a []graph.Node. As an optimisation, a graph.Nodes that knows its length can report this and NodesOfwill make use of the information.

func NodesOf(it Nodes) []Node {
    var n []graph.Node
    switch it := it.(type) {
    case NodeSlicer:
        return []graph.Node(n) 
    case Lener:
        n = make([]graph.Node, 0, it.Len())
    default:
        for it.Next() {
                n = append(n, it.Node())
        }
        return nodes
    }
}

In the context of use, this would allow expression of, e.g. in network.PageRank:

        nodes := graph.NodesOf(g.Nodes())
        indexOf := make(map[int64]int, len(nodes))
        for i, n := range nodes {
                indexOf[n.ID()] = i
        }

which would make the transition much easier. Removal of the []graph.Node slice can then happen on a case-by-case basis as needed/allowed.

If it's deemed useful, the optional Len() int method on the iterator could be added as a requirement.

from graph.

graph: Graph.NodeList should not demand that returned []Node be newly allocated about graph HOT 19 CLOSED

Comments (19)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent