yggdrasil-network / yggdrasil-go
An experiment in scalable routing as an encrypted IPv6 overlay network
Home Page: https://yggdrasil-network.github.io
License: Other
Right now, when using -autoconf on BSD, the configuration defaults to using IfName as /dev/tap0, which obviously fails when /dev/tap0 is in use by some other process.
In this case, it makes sense that when IfName is auto, it should iterate through the TAP devices on the system to find a free one. This would allow -autoconf to start properly.
Once done, we can default IfName to auto on BSD.
If there's an error in the config file, such as a field where a value of the wrong type is used, the parser returns an error. Currently, no attempt is made to check the error and deal with it gracefully; we just panic(err).
There may be some cases when we can recover (if it's an obvious mistake) or continue running with a sane default (and warn the user instead of crashing).
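A sketch of the graceful path (encoding/json stands in here for the real hjson-go parser, and the Config fields are trimmed-down stand-ins for the real node configuration):

```go
package main

import (
	"encoding/json"
	"log"
)

// Config is a trimmed-down stand-in for the real node configuration.
type Config struct {
	Listen string
	IfName string
}

// defaultConfig returns sane defaults to fall back on.
func defaultConfig() Config {
	return Config{Listen: "[::]:0", IfName: "auto"}
}

// loadConfig parses the config and, instead of panicking on a parse
// error, warns the user and falls back to the defaults.
func loadConfig(data []byte) Config {
	cfg := defaultConfig()
	if err := json.Unmarshal(data, &cfg); err != nil {
		log.Printf("Warning: config could not be parsed (%v), using defaults", err)
		return defaultConfig()
	}
	return cfg
}
```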
Create symlink from Dockerfile to contrib/docker/Dockerfile
The DHT bootstraps by periodically adding peers (one-hop neighbors) to the DHT, if the peer is not already present.
The DHT has a fixed bucket size, and removes any old nodes if a new node is added to an already full bucket (dropping the new node instead of the old one completely breaks DHT bootstrapping unless dedicated bootstrap servers are used, and nodes bootstrap off the same set of bootstrap servers, so this is a complete non-option IMO).
There's probably a bad interaction here, where a node with many peers can push one of their peers out of their DHT without ever pinging them, due to the DHT size limits. If two nodes with a lot of peers both push each other out, then they'll never talk to each other over the DHT. If those nodes are both gateways into otherwise-separate networks, then it would cause the DHT to diverge into a broken state, I think, in which nodes from one network can't find nodes from the other. I haven't seen this problem occur, but it should happen if the periodic peer re-inserts are timed just right with the periodic DHT maintenance traffic.
I think one possible fix here is for the DHT to check if a node is a peer, and not count peers towards the maximum bucket size (or evict them if the bucket is full). In which case, peers would only be removed if they fail to respond to too many pings--it probably makes sense for the peer struct to send the DHT a message when a timeout occurs at the peer layer, so the DHT can immediately remove that peer without wasting pings on it. There may also be some way to fix it by throttling how often peers are added to the DHT, and using something like a round-robin strategy, but I'm not entirely convinced that it would fix the problem in all cases.
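The proposed fix could look roughly like this bucket-insert rule (a toy sketch, not the actual DHT code; bucketSize is shrunk for illustration):

```go
package main

const bucketSize = 2 // illustrative; the real constant differs

type nodeInfo struct {
	key    string
	isPeer bool
}

type bucket struct {
	nodes []nodeInfo
}

// insert sketches the proposed rule: direct peers never count against
// the bucket size and are never evicted to make room, so two heavily
// peered nodes cannot silently push each other out of the DHT.
func (b *bucket) insert(n nodeInfo) bool {
	for _, old := range b.nodes {
		if old.key == n.key {
			return false // already present
		}
	}
	if n.isPeer {
		b.nodes = append(b.nodes, n) // peers always fit
		return true
	}
	nonPeers := 0
	for _, old := range b.nodes {
		if !old.isPeer {
			nonPeers++
		}
	}
	if nonPeers < bucketSize {
		b.nodes = append(b.nodes, n)
		return true
	}
	return false // bucket full of non-peers; drop the newcomer
}
```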
General issue to monitor bandwidth usage and figure out how to reduce it.
Mobile nodes, or anything battery-powered (e.g. sensor networks) need to keep idle bandwidth usage as low as reasonably possible.
The main consumers of bandwidth for idle nodes are:
We should find ways of keeping all of these to a minimum. A few thoughts:
For 1, we can probably throttle back idle traffic on idle stable links, so we'd tend to only send updates when the tree updates the root timestamp. We could do this by progressively increasing the time between announcements (by 1 second per announcement, until some max, maybe 30 seconds or so). We would need to reset the time if the coords change, and/or require acknowledgements of some kind when we start relaying traffic that isn't part of the spanning tree (to make sure we're not sending data or DHT traffic into the void).
For 2, I think the DHT's correctness should be insensitive to timing, so we can probably throttle things back if the network is stable (a node would reset if its coords change, maybe?). We should wait and see what effect recent patches have had before we do anything here. Ideally, a node that isn't trying to send any traffic would participate in the bare minimum amount of DHT activity needed to keep the rest of the network running.
For 3, we need to be careful about how we change things, to make sure we don't accidentally cause blackholes. Ideally, the less active a node is, the fewer other nodes should have that node in their buckets (so it would mainly need to deal with traffic related to its keyspace neighbors), so this may interact with 2 in weird ways. It would also be good if the DHT were caching--if lots of nodes are looking up X, then X should appear on lots of DHT nodes (the ones that looked up X, and maybe the ones that were involved in looking up X). I'm not sure how well caching works in practice for DHTs, so that needs some research before deciding whether or not to try it.
All of these things have tradeoffs between reaction/convergence time and idle bandwidth usage. The current implementation mostly uses hard coded and arbitrary timeouts, so there's hopefully some room for improvement here.
When using -autoconf, or alternatively when the Listen address is configured with a :0 port number, the TCP and UDP sockets come up with different port numbers.
We can either write manual logic to randomly select the port number for both protocols (and extra logic to re-bind to a new random port if one or other protocol has that port number already in use), or we can leave the behaviour as-is and let the operating system assign unique ports for each protocol as it does today.
If the latter, both the TCP and the UDP listen addresses/ports should be written out to the console when Yggdrasil starts (whereas right now only the TCP port is).
Created in response to two TODOs in switch.go.
Instead of checking the distance to every destination every time, maintain an array of structs, indexed by the first coordinate that differs from our own. Each struct stores the best port to forward to and a map from the next coord to another struct. Move to the struct, then iterate over the coord maps until you hit a dead end. The last port before the dead end should be the closest.
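The description above can be sketched as a small trie (illustrative types; the real switch uses its own coord and port representations):

```go
package main

// lookupNode mirrors the structure described above: the best port to
// forward to for this prefix of coords, plus a map from the next coord
// to a deeper node.
type lookupNode struct {
	port uint64
	next map[uint64]*lookupNode
}

// lookup walks the coord maps until it hits a dead end; the last port
// seen before the dead end belongs to the closest known next hop.
func lookup(root *lookupNode, coords []uint64) uint64 {
	here := root
	best := here.port
	for _, c := range coords {
		child, ok := here.next[c]
		if !ok {
			break // dead end
		}
		here = child
		best = here.port
	}
	return best
}
```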
Sending build context to Docker daemon 1.857MB
Step 1/11 : FROM docker.io/golang:alpine as builder
alpine: Pulling from library/golang
4fe2ade4980c: Already exists
2e793f0ebe8a: Pull complete
77995fba1918: Pull complete
6b343150750a: Pull complete
517b41dec3aa: Pull complete
Digest: sha256:c2c6c46c11319fd458a42aa3fc3b45e16bacb49e3f33f1e2a783f0122a9d8471
Status: Downloaded newer image for golang:alpine
---> c283ac5a8f78
Step 2/11 : COPY . /src
---> cf9ea15c0cb2
Step 3/11 : WORKDIR /src
---> Running in dc146e239edb
Removing intermediate container dc146e239edb
---> dc31857e35f0
Step 4/11 : RUN apk add git && ./build
---> Running in 233c86601e75
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/6) Installing nghttp2-libs (1.32.0-r0)
(2/6) Installing libssh2 (1.8.0-r3)
(3/6) Installing libcurl (7.61.1-r1)
(4/6) Installing expat (2.2.5-r0)
(5/6) Installing pcre2 (10.31-r0)
(6/6) Installing git (2.18.1-r0)
Executing busybox-1.28.4-r1.trigger
OK: 19 MiB in 20 packages
Building: yggdrasil
Fetching https://golang.org/x/crypto?go-get=1
Fetching https://golang.org/x/net?go-get=1
Fetching https://golang.org/x/text?go-get=1
Fetching https://golang.org/x/sys?go-get=1
go: finding github.com/kardianos/minwinsvc v0.0.0-20151122163309-cad6b2b879b0
go: finding github.com/neilalexander/hjson-go v0.0.0-20180509131856-23267a251165
go: finding github.com/docker/libcontainer v2.2.1+incompatible
go: finding github.com/mitchellh/mapstructure v1.1.2
go: finding github.com/songgao/packets v0.0.0-20160404182456-549a10cd4091
go: finding github.com/yggdrasil-network/water v0.0.0-20180615095340-f732c88f34ae
Parsing meta tags from https://golang.org/x/sys?go-get=1 (status code 200)
get "golang.org/x/sys": found meta tag get.metaImport{Prefix:"golang.org/x/sys", VCS:"git", RepoRoot:"https://go.googlesource.com/sys"} at https://golang.org/x/sys?go-get=1
Parsing meta tags from https://golang.org/x/crypto?go-get=1 (status code 200)
Parsing meta tags from https://golang.org/x/net?go-get=1 (status code 200)
get "golang.org/x/crypto": found meta tag get.metaImport{Prefix:"golang.org/x/crypto", VCS:"git", RepoRoot:"https://go.googlesource.com/crypto"} at https://golang.org/x/crypto?go-get=1
get "golang.org/x/net": found meta tag get.metaImport{Prefix:"golang.org/x/net", VCS:"git", RepoRoot:"https://go.googlesource.com/net"} at https://golang.org/x/net?go-get=1
go: finding golang.org/x/sys v0.0.0-20181206074257-70b957f3b65e
go: finding golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9
go: finding golang.org/x/net v0.0.0-20181207154023-610586996380
Parsing meta tags from https://golang.org/x/text?go-get=1 (status code 200)
get "golang.org/x/text": found meta tag get.metaImport{Prefix:"golang.org/x/text", VCS:"git", RepoRoot:"https://go.googlesource.com/text"} at https://golang.org/x/text?go-get=1
go: finding golang.org/x/text v0.3.0
go: downloading github.com/mitchellh/mapstructure v1.1.2
go: downloading golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9
go: downloading github.com/songgao/packets v0.0.0-20160404182456-549a10cd4091
go: downloading github.com/kardianos/minwinsvc v0.0.0-20151122163309-cad6b2b879b0
go: downloading golang.org/x/text v0.3.0
go: downloading golang.org/x/net v0.0.0-20181207154023-610586996380
go: downloading golang.org/x/sys v0.0.0-20181206074257-70b957f3b65e
go: downloading github.com/yggdrasil-network/water v0.0.0-20180615095340-f732c88f34ae
go: downloading github.com/docker/libcontainer v2.2.1+incompatible
go: downloading github.com/neilalexander/hjson-go v0.0.0-20180509131856-23267a251165
github.com/kardianos/minwinsvc
golang_org/x/net/dns/dnsmessage
github.com/yggdrasil-network/yggdrasil-go/src/config
github.com/neilalexander/hjson-go
github.com/yggdrasil-network/yggdrasil-go/src/defaults
github.com/yggdrasil-network/water
golang.org/x/crypto/ed25519/internal/edwards25519
golang.org/x/crypto/curve25519
golang.org/x/crypto/internal/subtle
golang.org/x/crypto/poly1305
net
golang.org/x/crypto/ed25519
# net
exec: "gcc": executable file not found in $PATH
golang.org/x/crypto/salsa20/salsa
golang.org/x/net/internal/iana
golang.org/x/net/bpf
golang.org/x/text/encoding/internal/identifier
golang.org/x/text/transform
golang.org/x/crypto/nacl/secretbox
golang.org/x/text/internal/utf8internal
golang.org/x/crypto/nacl/box
golang.org/x/text/encoding
golang.org/x/text/runes
golang.org/x/text/encoding/internal
golang.org/x/text/encoding/unicode
Building: yggdrasilctl
net
# net
exec: "gcc": executable file not found in $PATH
Removing intermediate container 233c86601e75
---> ee4a9e848da0
Step 5/11 : FROM docker.io/alpine
---> 11cd0b38bc3c
Step 6/11 : LABEL maintainer="Christer Waren/CWINFO <[email protected]>"
---> Running in b81b1895d3e4
Removing intermediate container b81b1895d3e4
---> 98d0d93df1d1
Step 7/11 : COPY --from=builder /src/yggdrasil /usr/bin/yggdrasil
COPY failed: stat /var/lib/docker/overlay2/77c4f922b88880a5e7bc391f1b4827f93ba3c9a1c1476e5c6af7c1ed8a053441/merged/src/yggdrasil: no such file or directory
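Judging from the log, the build inside golang:alpine fails because cgo wants gcc, which the image doesn't ship, and the later COPY fails because no binary was produced. A hedged sketch of two common fixes for the builder stage (assuming the project builds cleanly without cgo):

```dockerfile
FROM docker.io/golang:alpine as builder
COPY . /src
WORKDIR /src
# Option 1: provide a C toolchain so cgo can build the net package:
#   RUN apk add git gcc musl-dev && ./build
# Option 2: disable cgo entirely, using the pure-Go fallbacks:
RUN apk add git && CGO_ENABLED=0 ./build
```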
Currently there is nothing to stop nodes from joining the network, peering with nodes that have unacceptable latency, and significantly impacting the latency of traffic flowing through these routes in the process. This issue largely exists to track this problem, and to keep a note of some potential fixes for the future.
Some possible ideas:
At the moment, when node coordinate selection takes place, link uptime/stability is the primary metric involved. This keeps the topology of the network relatively stable even if not all peerings are solid. However, there is nothing to account for the actual latency of each peering, so there is no guarantee that your chosen coordinates will be aligned to your physically closest node.
In the event that multiple peerings are considered "stable", we should aim to align node coordinates to the node with the lowest latency. This will at least encourage the worst-path route to follow the path of shortest latency. Some work will need to be done to identify how much weight should be given to latency vs stability when the link is not 100% stable.
We should also be cautious that we don't generate too much extra traffic as a result (#65).
Currently blocked by:
It might be possible to share latency information through backpressure notifications, if that happens as a part of #111.
Currently blocked by:
LatencyThreshold option for peerings
A non-default, user-configurable option, in ms, to define the maximum allowed latency on an incoming peering. This might be particularly useful for public nodes to prevent peerings being accepted from nodes that are far away. The ping RTT must fall beneath the LatencyThreshold value or the peering will be rejected/dropped by the TCP machinery before the peering negotiates successfully.
Currently blocked by:
Created in response to a FIXME in dht.go.
The DHT allocates a bunch of candidates, sorts them, and keeps the part it likes. It would be better to only track the part it likes to begin with.
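A sketch of the fix: insert each candidate into a bounded, sorted slice as it arrives, instead of accumulating everything and sorting afterwards (dist stands in for whatever keyspace metric the DHT uses):

```go
package main

import "sort"

// keepClosest maintains a bounded slice ordered by distance, inserting
// each candidate in place and dropping the worst entry when the slice
// is full, so we never allocate or sort more than we intend to keep.
func keepClosest(best []int, candidate, limit int, dist func(int) int) []int {
	i := sort.Search(len(best), func(j int) bool {
		return dist(best[j]) >= dist(candidate)
	})
	best = append(best, 0)
	copy(best[i+1:], best[i:])
	best[i] = candidate
	if len(best) > limit {
		best = best[:limit]
	}
	return best
}
```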
This and #41 may be mutually exclusive. This is definitely mutually exclusive with source routing, so we'd be breaking from the original intent of testing greedy routing as a possible pathfinding strategy for cjdns.
As of v0.2, we currently use a sort of local-only backpressure and LIFO queues to route around congestion in some cases, but we still require the greedy routing distance criteria to be met (to prevent routing loops). LIFO backpressure routing has some very interesting properties. If we added queue size announcements to the protocol, we could take advantage of the distance metric as a baseline pressure gradient to get around many of the normal issues (long delay, queues for every node) that prevent backpressure from being used in general-purpose packet-switched networks.
This would require reworking the lookup logic quite a bit, but there are several ways to do that which I think would work, and I'm not sure which one I'd want to go with. I'm also not sure when it would be appropriate to send queue size updates, or what format they should take (a queue per destination, or do we cluster regions of the tree together under one queue?).
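A toy sketch of what queue-size-aware forwarding could look like, keeping the greedy distance criterion for loop freedom and using advertised queue sizes only as a tie-breaker (all names are illustrative, not the actual switch logic):

```go
package main

type peerInfo struct {
	port      int
	treeDist  int // greedy distance from this peer to the destination
	queueSize int // advertised queue size, per the idea above
}

// nextHop picks a forwarding port: only peers that strictly reduce the
// tree distance are eligible (preserving loop freedom), and among those
// we prefer the one with the smallest advertised queue.
func nextHop(ourDist int, peers []peerInfo) (int, bool) {
	bestPort, bestQueue, found := 0, 0, false
	for _, p := range peers {
		if p.treeDist >= ourDist {
			continue // would not make greedy progress
		}
		if !found || p.queueSize < bestQueue {
			bestPort, bestQueue, found = p.port, p.queueSize, true
		}
	}
	return bestPort, found
}
```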
Specifying a peer by hostname, such as "dyn.example.net", doesn't even get attempted, according to the logs.
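Assuming the dialer currently only accepts literal addresses, resolving hostnames first might be sketched like this (resolvePeer is a hypothetical helper):

```go
package main

import "net"

// resolvePeer sketches resolving a hostname-based peer like
// "dyn.example.net:12345" to a literal address before dialing, so that
// DNS names in the Peers list are actually attempted. Literal IPs pass
// through unchanged, since LookupHost returns them as-is.
func resolvePeer(hostport string) (string, error) {
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		return "", err
	}
	addrs, err := net.LookupHost(host)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(addrs[0], port), nil
}
```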
We need to update some things in TCP to make the admin socket's removePeer function work.
While we're at it, we should make sure that addPeer and removePeer are both safe in case a bad string is used (they currently crash if a string shorter than 4 bytes is sent, or respond as if they worked--without doing anything--if a string without tcp: or udp: is used). There may be other edge cases where problems can occur.
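The input checking might be sketched as (validatePeerURI is a hypothetical helper; the real code would validate the address portion too):

```go
package main

import (
	"fmt"
	"strings"
)

// validatePeerURI sketches the checking that addPeer/removePeer need:
// reject strings that are too short or that lack a known scheme,
// instead of crashing or silently pretending to succeed.
func validatePeerURI(uri string) error {
	if len(uri) < len("tcp://") {
		return fmt.Errorf("peer URI too short: %q", uri)
	}
	switch {
	case strings.HasPrefix(uri, "tcp://"), strings.HasPrefix(uri, "udp://"):
		return nil
	default:
		return fmt.Errorf("peer URI must start with tcp:// or udp://: %q", uri)
	}
}
```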
OpenBSD does not have the ability to disable the 4-byte protocol information (PI) header from each packet to/from the TUN adapter.
This causes Yggdrasil to crash due to a write error on the TUN device, as we do not currently account for this anywhere.
TAP mode works fine on OpenBSD as the PI header is not included there.
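A sketch of handling the header on the TUN path (the AF_INET6 value of 24 is OpenBSD's and differs across BSDs, so treat it as an assumption to check against the system headers):

```go
package main

import "encoding/binary"

// afINET6 is OpenBSD's AF_INET6 (24); FreeBSD and Darwin use different
// values, so this must come from the platform-specific code.
const afINET6 = 24

// addPI prepends the 4-byte protocol information header that OpenBSD's
// tun requires on every packet: a network-byte-order address family.
func addPI(packet []byte) []byte {
	out := make([]byte, 4+len(packet))
	binary.BigEndian.PutUint32(out[:4], afINET6)
	copy(out[4:], packet)
	return out
}

// stripPI removes the PI header from a packet read from the tun device.
func stripPI(frame []byte) []byte {
	if len(frame) < 4 {
		return nil
	}
	return frame[4:]
}
```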
Created in response to a TODO in wire.go.
Currently the wire protocol is sensitive to order, which means that most changes to the wire protocol would be breaking changes.
Some possibilities here:
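One common way to make a wire format order-insensitive (offered as a generic illustration, not necessarily one of the intended possibilities) is type-length-value encoding, where unknown field types can be skipped by length, so adding fields is no longer a breaking change:

```go
package main

import "fmt"

// putTLV appends one type-length-value field; the one-byte length caps
// values at 255 bytes, which is just a simplification for this sketch.
func putTLV(dst []byte, typ byte, val []byte) []byte {
	dst = append(dst, typ, byte(len(val)))
	return append(dst, val...)
}

// parseTLV returns a map from field type to value, skipping (rather
// than choking on) field types the decoder does not recognize.
func parseTLV(src []byte) (map[byte][]byte, error) {
	fields := make(map[byte][]byte)
	for len(src) > 0 {
		if len(src) < 2 || len(src) < 2+int(src[1]) {
			return nil, fmt.Errorf("truncated TLV field")
		}
		fields[src[0]] = src[2 : 2+src[1]]
		src = src[2+src[1]:]
	}
	return fields, nil
}
```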
Some of the configuration options in nodeconfig may not be very obvious and could do with some review.
Some obvious candidates for change:
BoxPub and BoxPriv
SigPub and SigPriv
LinkLocal
The main readme is in pretty good shape, but the rest of the documentation is in a sorry state.
Suggestions:
hjson format for the config file.
$ yggdrasilctl getself -v
2018/12/17 10:57:12 Found platform default config file /etc/yggdrasil.conf
2018/12/17 10:57:12 Using endpoint tcp://localhost:9001 from AdminListen
2018/12/17 10:57:12 Connecting to TCP socket localhost:9001
2018/12/17 10:57:12 Connected
2018/12/17 10:57:12 Sending request: getself
2018/12/17 10:57:12 Fatal error: runtime error: index out of range
I am getting this on Sedric and Relpda:
{ "name": "y.relpda.mikaela.info",
"type": "server",
"location": "i-83.net, Gravelines, France",
"peering": "Ask me, AllowedEncryptionPublicKeys is set",
"contact": {
"name": "Mikaela Suomalainen",
"xmpp": "[email protected]",
"public keys": "https://mikaela.info/keys/"}
}
{ "name": "y.sedric.mikaela.info",
"type": "laptop",
"contact": {
"name": "Mikaela Suomalainen",
"xmpp": "[email protected]",
"public keys": "https://mikaela.info/keys/"}
}
Right now the admin socket listens on localhost:9001 by default. This can also be changed in the config.
It would be worth checking if UNIX domain sockets are a more sane option on some platforms.
Right now icmpv6.go has 0xFD hard-coded and does not use the global address_prefix variable as defined in address.go. That should be fixed so that an address space change only needs to occur in one place in the codebase.
After following the instructions for installing from the internet repository, the package download fails with a checksum mismatch error.
Currently those lines only include the key. I wonder if it would be a good idea to begin the public key with something like pub: (I am taking inspiration from GPG), so it would become pub:TheLongKeyStringHere and there would be less room for user error while sharing the EncryptionPublicKey.
My use case is that I keep a list of the keys I care about in a public git repository here, and I find it scary to wonder whether I made a mistake after all and copied the private key regardless of all my checking, as git diff won't assure me that all lines begin with pub: or otherwise make it clear which is which.
I guess that if this gets implemented, it would probably be necessary to also support strings without the prefix. I wonder if the config sanitation could handle this update too?
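A sketch of backwards-compatible parsing (the priv:/sec: markers shown here are hypothetical, just to illustrate rejecting something that declares itself private):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePublicKey accepts keys with or without an optional "pub:"
// prefix, stripping the prefix when present so existing configs keep
// working, and loudly rejects anything marked as a private key.
func parsePublicKey(s string) (string, error) {
	if strings.HasPrefix(s, "priv:") || strings.HasPrefix(s, "sec:") {
		return "", fmt.Errorf("refusing what looks like a private key")
	}
	return strings.TrimPrefix(s, "pub:"), nil
}
```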
I don't have an IPv6 address from my provider; I don't have any IPv6 address at all. But if I launch Yggdrasil, it connects properly to the nodes and I can ping within the Yggdrasil network. However, I can't ping any clearnet IPv4 addresses. What am I doing wrong? How can I fix it?
Currently peering connections are not authenticated in any way - if you know the address and port number of a Yggdrasil node, you can connect to it.
It would be good to allow optional whitelisting, perhaps by allowing specific public keys or by shared passwords, for non-multicast-configured peerings.
Add a public-node-specific option to limit peerings, to prevent oversaturation of that node.
A public node that is struggling could then manually set a limit to prevent new peerings, preserving the stability of the network.
maybe this could be set as:
peer-limit = 40
If there are already more peers than the limit, the newest could be disconnected.
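A sketch of the proposed cap, turning away the newest peering once the limit is reached (names and the zero-means-unlimited convention are assumptions):

```go
package main

// peerLimiter sketches the proposed option: a configurable cap on
// peerings (0 meaning unlimited), checked when a new peer connects.
type peerLimiter struct {
	limit int // e.g. the suggested peer-limit = 40
	count int
}

// tryAdd reports whether a new peering may be accepted; at the limit,
// the newest connection is the one turned away.
func (p *peerLimiter) tryAdd() bool {
	if p.limit > 0 && p.count >= p.limit {
		return false
	}
	p.count++
	return true
}

// remove releases a slot when an existing peering goes away.
func (p *peerLimiter) remove() {
	if p.count > 0 {
		p.count--
	}
}
```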
When running multiple instances on the same machine, the admin port should not be static. It would be nice to increment it by one when the port is already reserved, instead of panicking 😄, and then automatically write the new free port to the configuration. This can be temporarily worked around by editing the config manually after the first instance has started.
2018/12/05 21:00:52 Starting up...
2018/12/05 21:00:52 Found 5 multicast interface(s)
2018/12/05 21:00:52 Starting switch
2018/12/05 21:00:52 Starting router
2018/12/05 21:00:52 Multicast discovery is enabled
2018/12/05 21:00:52 Failed to start multicast interface
2018/12/05 21:00:52 An error occurred during startup
panic: listen udp6 [::]:9001: bind: address already in use
goroutine 1 [running]:
main.main()
/src/yggdrasil.go:227 +0x15bc
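A sketch of the suggested behaviour (listenWithFallback is a hypothetical helper; the real admin socket code would also need to report the chosen port back to the config):

```go
package main

import (
	"fmt"
	"net"
)

// listenWithFallback keeps incrementing the port by one until a bind
// succeeds, instead of panicking, and returns the port actually used.
func listenWithFallback(host string, port, attempts int) (net.Listener, int, error) {
	for i := 0; i < attempts; i++ {
		l, err := net.Listen("tcp", fmt.Sprintf("%s:%d", host, port+i))
		if err == nil {
			return l, port + i, nil
		}
	}
	return nil, 0, fmt.Errorf("no free port in %d-%d", port, port+attempts-1)
}
```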
Currently, the only tree information we exchange with a peer (one-hop neighbors) is to receive that peer's signed locator info and to send them our own.
The very first simulation version included exchanging info about other nodes we know about (other peers, peers of peers, etc) if our stretch to them via the tree would have been higher than some threshold relative to the path being advertised by a peer. This proved to be wasteful in practice on the kinds of networks I'm concerned with (even if you only exchanged info about, say, 10 extra nodes, that extra info eats up 10x the bandwidth of the current updates, not accounting for the fact that each non-peer update would probably be a little larger on average than a peer update).
However, there may be some peers with which you don't care about the cost of exchanging these extra updates. If you run a network of nodes, you may be fine with exchanging extra information between those nodes to make sure that every node knows about the peers of each node acting as a gateway into other networks.
If you simply say which nodes are gateway nodes, then that works for adjacent networks, but not through adjacent networks, so it recovers something similar to the peering agreements we see between networks running BGP. I don't think we want to be limited to only that, so we would probably want to do something more elaborate. We'd want some way of deciding that certain peers should be given certain additional info (or telling certain peers that we're interested in said info). In any case, we'd probably want to keep the default behavior to be exchanging nothing extra, like we do currently, since that's much lower cost and still seems to work pretty well (based on simulation).
It would be good to figure out if/how this could be done while keeping the cost reasonable, and if we decide this is something we want, then try to lay any necessary ground work for the changes into the protocol updates for the next release.
Note that there may be incentive to not do this, even if it's easy/cheap to do. If it only improves performance in e.g. Autonomous System networks, then that encourages more centralization around a relatively small number of large AS networks that only link together at designated gateway nodes. That seems like it would revert back to what we have now, where there's BGP as an EGP and anything else as an IGP--which is probably fine from a performance standpoint, but the whole point of this project is to not do that and see how far we can get. If it's doable in a way that works for individual nodes / isn't biased towards something like the AS model, then I'm a lot more inclined to take the approach.
Right now there is no strict formatting requirement on either inputs or outputs to/from the admin socket. This is not ideal for interfacing with the admin socket programmatically.
Investigate whether there is a more appropriate format to use - ideally one that remains as easy for a human to interact with/to understand as it is to be parsed by an external program:
Currently the FreeBSD and OpenBSD code makes calls to ifconfig to set the interface address. It should be possible to adapt the same SIOCAIFADDR_IN6 code from tun_darwin.go to set the interface addresses instead, without relying on ifconfig.
Using latest darwin binary from here:
https://circleci.com/api/v1.1/project/github/yggdrasil-network/yggdrasil-go/latest/artifacts
binary link: https://156-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.2.22-darwin-amd64
Running from the ~/Downloads directory:
$ sudo ./yggdrasil -useconf < yggdrasil.conf
2018/06/15 20:14:04 Starting up...
2018/06/15 20:14:04 Found 10 multicast interface(s)
2018/06/15 20:14:04 Starting router
2018/06/15 20:14:04 Multicast discovery is enabled
2018/06/15 20:14:04 Listening for TCP on: [::]:36845
2018/06/15 20:14:04 Interface name: utun3
2018/06/15 20:14:04 Interface IPv6: 203:5af4:178a:4da5:dbd:5380:32e3:47ed/7
2018/06/15 20:14:04 Interface MTU: 65535
2018/06/15 20:14:04 Startup complete
2018/06/15 20:14:04 Your IPv6 address is 203:5af4:178a:4da5:dbd:5380:32e3:47ed
2018/06/15 20:14:04 Your IPv6 subnet is 303:5af4:178a:4da5::/64
2018/06/15 20:14:04 Admin socket listening on 127.0.0.1:9001
$ ping6 203:5af4:178a:4da5:dbd:5380:32e3:47ed
PING6(56=40+8+8 bytes) 203:5af4:178a:4da5:dbd:5380:32e3:47ed --> 203:5af4:178a:4da5:dbd:5380:32e3:47ed
^C
--- 203:5af4:178a:4da5:dbd:5380:32e3:47ed ping6 statistics ---
34 packets transmitted, 0 packets received, 100.0% packet loss
$ ping6 203:5af4:178a:4da5:dbd:5380:32e3:47ed
PING6(56=40+8+8 bytes) 203:5af4:178a:4da5:dbd:5380:32e3:47ed --> 203:5af4:178a:4da5:dbd:5380:32e3:47ed
^C
--- 203:5af4:178a:4da5:dbd:5380:32e3:47ed ping6 statistics ---
8 packets transmitted, 0 packets received, 100.0% packet loss
$ ping6 203:5af4:178a:4da5:dbd:5380:32e3:47ed
PING6(56=40+8+8 bytes) 203:5af4:178a:4da5:dbd:5380:32e3:47ed --> 203:5af4:178a:4da5:dbd:5380:32e3:47ed
Going to try doing ping to other IPv6 after testing peering.
-- Satinder
When calling tun.close(), the tun goroutine can be left running, as it's currently not signalled that the TUN/TAP adapter is going away. This hasn't usually been a problem because the only time we generally call tun.close() is when the process is already exiting.
However, this points to another problem: it may cause panics if the TUN/TAP adapter disappears for some other reason, instead of trying to recover gracefully.
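A minimal sketch of the missing signalling, using a quit channel (names are illustrative, not the actual tun struct):

```go
package main

// tunDevice sketches the missing shutdown signalling: close() closes a
// quit channel, and the read loop selects on it so the goroutine exits
// cleanly instead of panicking when the adapter disappears.
type tunDevice struct {
	quit    chan struct{}
	packets chan []byte
}

func newTunDevice() *tunDevice {
	return &tunDevice{quit: make(chan struct{}), packets: make(chan []byte)}
}

// close tells the read loop that the adapter is going away.
func (t *tunDevice) close() {
	close(t.quit)
}

// readLoop stands in for the goroutine that services the TUN/TAP fd.
func (t *tunDevice) readLoop() {
	for {
		select {
		case <-t.quit:
			return // adapter is going away; exit cleanly
		case pkt := <-t.packets:
			_ = pkt // hand the packet to the router in the real code
		}
	}
}
```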
I am using 0.2.7 (as a side note, yggdrasil --version doesn't exist) on Debian Testing.
https://yggdrasil-network.github.io/admin.html has at least getSessions and seems to include a longer list than:
Usage of yggdrasilctl:
-endpoint string
Admin socket endpoint (default "tcp://localhost:9001")
-json
Output in JSON format
I also thought that I saw getPeers in the help text.
fields
addPeer [uri [interface]]
getSelf []
getSwitchPeers []
getSwitchQueues []
removePeer [port]
addAllowedEncryptionPublicKey [key]
getAllowedEncryptionPublicKeys []
dot []
getMulticastInterfaces []
getTunTap []
help <nil>
removeAllowedEncryptionPublicKey [key]
getDHT []
getPeers []
getSessions []
setTunTap [name [tap_mode] [mtu]]
This problem has been especially apparent on the latest version of Windows 10.
If you run yggdrasil.exe -genconf > Yggdrasil.conf from a Command Prompt, the file is written and can be read by yggdrasil.exe -useconffile and yggdrasil.exe -useconf.
If you follow the same steps in PowerShell instead of Command Prompt, the encoding of the file is different (UTF-16, perhaps?) and Yggdrasil can no longer interpret it. (On closer inspection, it seems that Yggdrasil reads every other byte of the file as 0x00, which trips up hjson.)
Also, if you edit the yggdrasil.conf file in Notepad and then save, once again the encoding is transformed and Yggdrasil can no longer make sense of it (same symptoms as with PowerShell).
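A minimal, stdlib-only sketch of detecting a UTF-16 BOM and converting the config back to UTF-8 before it reaches the hjson parser (BOM-less UTF-16 would need an extra heuristic, such as counting 0x00 bytes):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"unicode/utf16"
)

// normalizeConfig makes the config reader encoding-tolerant: PowerShell
// redirection and Notepad tend to write UTF-16 with a BOM, which hjson
// sees as every other byte being 0x00. Detect the BOM and convert back
// to UTF-8 before parsing.
func normalizeConfig(data []byte) []byte {
	var order binary.ByteOrder
	switch {
	case bytes.HasPrefix(data, []byte{0xff, 0xfe}):
		order, data = binary.LittleEndian, data[2:]
	case bytes.HasPrefix(data, []byte{0xfe, 0xff}):
		order, data = binary.BigEndian, data[2:]
	default:
		return data // assume it's already UTF-8
	}
	units := make([]uint16, 0, len(data)/2)
	for i := 0; i+1 < len(data); i += 2 {
		units = append(units, order.Uint16(data[i:i+2]))
	}
	return []byte(string(utf16.Decode(units)))
}
```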
I recently wanted to see whether traffic sent over Yggdrasil is actually encrypted.
To do this I pointed Wireshark at the tun0 interface.
To my surprise, the traffic which was sent over Yggdrasil appeared to be unencrypted.
Am I missing something or doing something wrong?
Currently we are limited to sending only IP packets because Yggdrasil has no means to separate IP packets from anything else.
If we have a field that describes the protocol type (i.e. having a value for IP, and other arbitrary values for other things) then we can add some kind of API to transport arbitrary data through Yggdrasil in the future.
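A one-byte protocol type prefix would be enough; the constants and helper names below are hypothetical, just to illustrate the framing:

```go
package main

import "fmt"

// Hypothetical protocol type values; only IPv6 traffic exists today.
const (
	protoIPv6  byte = 0
	protoOther byte = 1 // arbitrary non-IP payloads via some future API
)

// wrapPayload prefixes the payload with a one-byte protocol type, which
// is all that's needed to separate IP packets from anything else.
func wrapPayload(proto byte, payload []byte) []byte {
	return append([]byte{proto}, payload...)
}

// unwrapPayload splits a received frame back into type and payload.
func unwrapPayload(frame []byte) (byte, []byte, error) {
	if len(frame) < 1 {
		return 0, nil, fmt.Errorf("empty frame")
	}
	return frame[0], frame[1:], nil
}
```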
Right now, Water simply selects the first tap0901 adapter that it finds in the registry. This may conflict with OpenVPN or other software using the tap0901 driver at the same time.
The capability exists to generate more than one TAP adapter - we should investigate how to select a specific one, or how to avoid conflicts.
Two real nodes, A and B, are both peered with X and only X. X has a route to the root, so A and B both make X their parent.
Running iperf between A and B, they get something like 7 Mbps. If I connect them directly to each other over GbE, the speed jumps up to ~850+ Mbps after about 2 seconds. That part works great.
Now I unplug the cable between A and B. The iperf stream's throughput drops to 0 bps until the moment the connection between A and B times out (edit: about 6 seconds), at which point it goes back to 7 Mbps. What I want to have happen is for the speed to drop down to ~7 Mbps almost instantly, without waiting for the peer to time out. This is a prerequisite for #111 I think.
Node coords shouldn't be changing (they keep the same parent, X, the whole time). What I would have expected is for the stack for link A->B to grow, and for the network to switch to sending over A->X->B instead. For some reason, as soon as the write call blocks for the disconnected peer, the node stops trying to send to that peer (the node stops sending to the buffered channel that feeds the TCP write worker, which would normally read from the channel to build up the LIFO stack and then pop from the top of the stack when there's no more traffic on the Go channel). Pings also fail to make it through, so I don't think it's an issue related to the inner iperf stream backing off or delaying retransmission attempts.
Separate some of the TUN/TAP and IP logic out of src/yggdrasil so that we can reuse it as a library.
Perhaps move it into src/yggdrasilnode so that yggdrasil.go can still be a simple wrapper that does nothing but feed in config.
When attempting to run in a network namespace, I came across an unusual bug:
$ ip netns add node1
$ ip link add veth0 type veth peer name veth1
$ ip link set veth0 up
$ ip link set veth1 netns node1 up
$ ip netns exec node1 su $USER
$ ip link set lo up
$ ./yggdrasil --autoconf
2018/04/26 20:03:17 Initializing...
2018/04/26 20:03:17 Starting interface...
2018/04/26 20:03:17 Started interface
2018/04/26 20:03:17 Starting admin socket...
2018/04/26 20:03:17 Started admin socket
2018/04/26 20:03:17 Starting TUN/TAP...
2018/04/26 20:03:17 Listening for TCP on: [::]:33859
2018/04/26 20:03:17 Listening for UDP on: [::]:44995
2018/04/26 20:03:17 Admin socket listening on [::1]:9001
2018/04/26 20:03:17 Started...
2018/04/26 20:03:19 Connected: fd01:1cc6:49a1:36a2:7706:7358:eb44:ec9@fe80::a41c:8fff:fe05:6072%veth1
# Then attempt to ping the node
panic: write /dev/net/tun: input/output error
goroutine 25 [running]:
yggdrasil.(*tunDevice).write(0xc4201a62e0, 0x0, 0x0)
/home/arceliar/Misc/mesh/yggdrasil-go/src/yggdrasil/tun.go:59 +0x242
yggdrasil.(*Core).DEBUG_startTunWithMTU.func2(0xc4201a0000)
/home/arceliar/Misc/mesh/yggdrasil-go/src/yggdrasil/debug.go:241 +0x33
created by yggdrasil.(*Core).DEBUG_startTunWithMTU
/home/arceliar/Misc/mesh/yggdrasil-go/src/yggdrasil/debug.go:241 +0x146
On closer inspection, it appears that the tun device was never brought up, and the fd00::/8 address within the node1 namespace was assigned to veth1.
I'm guessing this happened as part of the update to use github.com/docker/libcontainer/netlink instead of calling ip from the command line to set addresses and bring interfaces up. Ideally, netns should work, given its usefulness in the past for testing, but the bare minimum requirement is to raise an error when the tun fails to come up, and to avoid setting the address on the wrong interface.
A few different things we could try.
Chord-like DHTs have a nice property that kad-like DHTs are missing: your successor (and/or predecessor) is a valid next-hop for any lookup. This means we can focus on maintaining good info about a very small (potentially constant) number of non-peer nodes, and then be comparatively lazy about checking for other nodes in the network (since they're just a performance optimization). In kad, by contrast, you could have a bucket with no working nodes, and nothing forces any nodes in any other buckets into still being a valid next hop for that region of keyspace. I have chord code working already, more or less, in a branch of my repo, but it needs some cleanup / testing / optimizing, assuming we decide to go that route at all.
Hypothetically, we could store some kind of connection info about public peers in the DHT. The idea is that, if a node is configured to auto-connect to public peers, and manages to connect to the network at all, then they should be able to find and connect to a couple of "useful" public peers somewhere in the network. The idea would be to help "gateway" nodes (local mesh nodes that peer to other parts of the network via another network, such as the internet) get more peers and avoid needlessly routing DHT lookups back and forth across the planet. In the short term, I'm more concerned with figuring out if this is a practical / good idea than with deciding exactly what "useful" means, or how we store/lookup/communicate public peer info. I'm just assuming that the DHT is likely to be involved in any scalable and decentralized approach, so I'm including it in this issue.
Currently, the DHT only stores info about nodes it needs to know about. The same is true for the chord-like DHT. Hypothetically, we could also add in info about every node we get a response from when doing a search, and/or every node we have an open session with. We wouldn't add DHT pings for these nodes, unless we decide that they're important for the DHT independent of any use in searches or sessions. This means they'd get timed out if the session ends or we stop using those nodes as parts of new searches. This is info we come across anyway, so I'm just wondering if we can find something better to do with it than nothing at all. This could hypothetically cause a popular node to become more popular, leading to an imbalance of DHT traffic, so it needs some further study before we commit to doing or not doing anything with that information.
Right now the CircleCI build pipeline numbers the build artifacts in the format `X.Y.ZZZZ`, where `X` is the major version number (currently 0), `Y` is the minor version number (currently 1) and `ZZZZ` is the build number, arbitrarily derived from the number of Git commits on that branch.
This was just enough to make the builds look distinct, but doesn't actually match Semantic Versioning guidelines.
Breaking protocol changes should probably be identified by the minor version number, but in that case, to strictly meet the guidelines, the build number should count the number of builds since the last minor version rather than overall, as it does today. (We would also need to drop the leading zeroes!)
If we include a version number in the wire format, as described in #42, should it be derived from the major and minor version numbers?
More open-ended question: is Semantic Versioning the right approach for us, and how worried about this are we right now?
This might be handy for a server that reaches capacity: having the option to quickly add all current peers to the allowed list would prevent further peering while keeping the current peers.
More than one person has mentioned to me in the last couple of days the use of the `fd00::/8` address space, as this is active space for private allocations and results in direct collisions. I think I agree that `fd00::/8` is not really a good candidate - even my own network has overlap as a result of this.
Some suggestions have come in the form of:

- `fc00::/8`, and accept that conflicts with cjdns will arise - is it actually in the scope of our project to deliberately avoid collisions with cjdns?
- `4000::/3`, as this is space that is soft-assigned to the ITU for regional allocation, and apparently the Internet Architecture Board thinks this is never going to happen because their plan is insane
- `0200::/7`, as this has been deprecated since 2004, pending a change to the NSAP-IPv6-whatever RFC that nobody has apparently been bothered enough to update in the last 14 years

Currently, the DHT does recursive searches. This consists of sending a search packet into the DHT, which is forwarded until it reaches a dead end, which then responds to let the origin know the search result.
Recursive searches are theoretically faster when they work, but an intermediate node with out-of-date DHT information can cause the search packets to be dropped. An iterative approach can find its way around such errors.
Recursive searches were chosen deliberately *because* they are fragile: iterative searches route not only around temporary DHT problems, but also around some small DHT blackholes, which would hide them. If blackholes exist, that's a design problem, so we want to surface them rather than mask them.
I've seen no evidence of blackholes either in the wild or recent simulation tests. At some point, we should switch to doing iterative searches (reusing the existing DHT lookup packet types), possibly with parallel lookups, like a proper kademlia implementation.
Once iterative searches are in, I'm not sure if it makes sense to keep support for recursive searches or to remove them.
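As a rough sketch of the iterative approach (a toy model, not yggdrasil's actual packet types or metric): the origin drives the lookup itself, querying the closest known candidate and learning closer candidates from each response, so a single stale or unresponsive node cannot silently drop the whole search.

```go
// Toy iterative lookup over a hard-coded "network". Distance here is
// plain integer distance; a real DHT would use an XOR or ring metric.
package main

import "fmt"

// Each node's known neighbours, standing in for DHT lookup responses.
var neighbours = map[int][]int{
	1: {5, 9}, 5: {7, 9}, 7: {8}, 9: {8, 10}, 8: {10}, 10: {},
}

func distance(a, b int) int {
	if a > b {
		return a - b
	}
	return b - a
}

// iterativeLookup repeatedly queries the closest unvisited candidate,
// adding each response's neighbours to the candidate set, until no
// unvisited candidate remains. The origin stays in control throughout.
func iterativeLookup(start, target int) int {
	best := start
	visited := map[int]bool{}
	candidates := []int{start}
	for {
		closest := -1
		for _, c := range candidates {
			if !visited[c] && (closest == -1 || distance(c, target) < distance(closest, target)) {
				closest = c
			}
		}
		if closest == -1 {
			break // everything reachable has been queried
		}
		visited[closest] = true
		if distance(closest, target) < distance(best, target) {
			best = closest
		}
		candidates = append(candidates, neighbours[closest]...)
	}
	return best
}

func main() {
	fmt.Println(iterativeLookup(1, 10)) // 10
}
```

Parallel lookups would query the k closest unvisited candidates per round instead of one, as a proper kademlia implementation does.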
Sort-of related to #65 (but for usage-based resource consumption instead of idle).
Currently, if no traffic has been received for a session in at least 6 seconds, then the next time we send a packet we also start a search. This is to quickly detect whether a session has dropped due to coord changes.
What we should do instead is send sessionPings (throttled to a maximum of 1 per second per session). If an additional period T (3-6 seconds?) passes without receiving a sessionPong, then fall back to a search. Most of the time, a lack of traffic just means the session was idle, so a search is overkill.
Go has nice machinery for writing unit tests. We should probably be using it. The sim can already be run manually to sort-of test some things, but it would make sense to unit test some parts (encoding/decoding packets, address<->nodeID conversion, etc). We could maybe rewrite a part of the sim to use the normal Go testing framework, and then run it on a small hard-coded network as a vertical-slice test.
Affects at least OpenBSD.
On other platforms, listening on `[::]:12345` seems to bind to both IPv4 and IPv6 (I would have expected this to be IPv6-only, and `:12345` to be dual-stack, but apparently the latter is IPv4-only). On OpenBSD, these bind to IPv6 and IPv4 respectively.
We either need to figure out how to listen on a dual stack socket, or open separate sockets for IPv4 and IPv6. The latter option may be difficult, as it raises the question of which socket to use to get the port for multicast announcements to find local peers.
In order to correctly terminate a peering through the admin API (i.e. to add a removePeer command), a new wire packet needs to be defined to safely shut down and remove a peering from both sides.
In theory, in TCP mode the connection can simply be closed, but this is not possible with UDP.
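A minimal sketch of what such a packet could look like; the `wireClosePeer` type byte, the reason field, and the layout are entirely hypothetical, not part of yggdrasil's actual wire format:

```go
// Hypothetical close-peering packet. Carrying a reason code lets both
// sides log why the peering was torn down (admin removal, capacity, etc).
package main

import (
	"errors"
	"fmt"
)

const wireClosePeer = 0x7f // hypothetical packet type byte

// encodeClose builds a close packet with a one-byte reason code.
func encodeClose(reason byte) []byte {
	return []byte{wireClosePeer, reason}
}

// decodeClose validates and unpacks a close packet.
func decodeClose(pkt []byte) (byte, error) {
	if len(pkt) != 2 || pkt[0] != wireClosePeer {
		return 0, errors.New("not a close packet")
	}
	return pkt[1], nil
}

func main() {
	pkt := encodeClose(1) // e.g. reason 1 = "removed via admin API"
	reason, err := decodeClose(pkt)
	fmt.Println(reason, err)
}
```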
Right now the IPv6 address is generated by truncating a 512-bit value down to fewer than 128 bits to fit in an IPv6 address.
This raises questions of:
- /64s, which are truncated even further
- whether `fd00::/8` (which is already designated as ULA territory, unlike `fc00::/7`) is appropriate

```
$ yggdrasilctl getPeers --json
panic: runtime error: index out of range

goroutine 1 [running]:
main.main()
	/go/src/github.com/{{ORG_NAME}}/{{REPO_NAME}}/yggdrasilctl.go:63 +0x5700
```
I am using 0.2.7 on Debian Testing (as a side note, `yggdrasil --version` doesn't exist).