nlnog / nlnog-ring

NLNOG Server Ring Project

Home Page: http://ring.nlnog.net/

Python 65.00% Ruby 17.58% Shell 8.66% Perl 8.77%

nlnog-ring's Issues

Automatically deactivate unreachable nodes

11:03 @cmouse some kind of automatic disable for nodes that are persistently gone would be nice
11:03 @cmouse so that they would be removed from 'ring-all' etc.
11:03 @cmouse until fixed
11:04 @Teun yeah, they slow down many ring tools
11:10 @cmouse perhaps if node is unreachable for more than 2 days it would be automatically removed
11:10 @cmouse and notified
11:10 @cmouse would get reinserted if it stays reachable for 24 hours
13:39 < sid3windr> does this include revoking ssh keys after 7d unreach? ;>
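
The thresholds floated in the discussion above (removal after two days unreachable, reinsertion after 24 hours reachable) could be sketched roughly as follows. This is only an illustration: `NodeState` and `update` are hypothetical names, not part of any existing ring tooling, and how reachability is actually measured is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the IRC discussion; everything else is hypothetical.
DEACTIVATE_AFTER = timedelta(days=2)
REACTIVATE_AFTER = timedelta(hours=24)

@dataclass
class NodeState:
    active: bool      # currently listed for tools such as ring-all
    reachable: bool   # result of the most recent check
    since: datetime   # when `reachable` last changed value

def update(state: NodeState, reachable_now: bool, now: datetime) -> NodeState:
    """Return the new state after one reachability check."""
    if reachable_now != state.reachable:
        # Reachability flipped: restart the clock, keep the active flag.
        return NodeState(state.active, reachable_now, now)
    streak = now - state.since
    if state.active and not reachable_now and streak > DEACTIVATE_AFTER:
        return NodeState(False, reachable_now, state.since)
    if not state.active and reachable_now and streak >= REACTIVATE_AFTER:
        return NodeState(True, reachable_now, state.since)
    return state
```

A periodic job could call `update` once per node per check and regenerate the host list from the `active` flags; notifying the owner would hang off the two transitions.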

ring pastebin

make 'pastebinit' work with a ring pastebin

this pastebin is only accessible from ring IPs
good promotion for the ring
html viewer option (ring-all output is in markdown notation)

ring-scripts need manpages

Redesign sshkey control

It's too much work to maintain all SSH keys for all participants. Participants should be able to add and remove their SSH keys via a web interface, an email interface, or something else.

in -a mode show if IPv4 or IPv6 is used

apply MPLS and ASN lookup patch to mtr

two nice patches which could be useful for the mtr on the nodes:
http://www.roderick.triple-it.nl/blog/2010/12/29/mtr-with-aslookup/ (ASN lookups)
http://prolixium.com/files/code/mtr-patches/ (MPLS labels)

Suppress failure messages

Suppress the ssh login failure messages printed to stdout while ring-trace is running, as they give the impression that the application is failing when it is actually busy finishing its routine.

Reported by Ad Trouwborst [email protected]

ring-trace breaks when whois.cymru.com is down

$ ring-trace -n 2 ring.nlnog.net
ring-trace v1.8.1 - written by Teun Vink [email protected]
picked 2 hosts at random: voxel02 claranet06
Performing ICMP traceroutes towards ring.nlnog.net from 2 ring hosts, ssh-timeout is 10 seconds.
Failed to lookup ASNs at cymru.com.
Traceback (most recent call last):
  File "/usr/local/bin/ring-trace", line 1453, in <module>
    ns.show_country, ns.remove_broken, ns.highlight_ixp, ns.remove_duplicate, ns.transparent)
  File "/usr/local/bin/ring-trace", line 825, in graph
    "\n%s" % tracedata[ips[index]]['fqdn'] if (index == 0 or resolve) else "",
KeyError: 'fqdn'
$
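
The traceback suggests that a `tracedata` entry can lack the `'fqdn'` key when the lookup at cymru.com fails. A defensive way to build the label might look like the sketch below. `hop_label` is a hypothetical helper, and the shape of `tracedata` is inferred from the traceback only, not from the ring-trace source.

```python
def hop_label(tracedata, ip, show_name):
    """Build a graph label for one hop without assuming lookups succeeded."""
    entry = tracedata.get(ip, {})   # tolerate hops with no lookup data at all
    label = ip
    fqdn = entry.get("fqdn")        # None when the name was never resolved
    if show_name and fqdn:
        label += "\n%s" % fqdn
    return label
```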

outbound mode for ring-trace

It would be fantastic if some sort of 'outbound' mode is added to ring-trace, which would mean that ring-trace does an mtr on localhost towards all other RING nodes and makes a map from that. This way you can easily create a forward and a reverse ring-trace to assess the overall topology in both directions.

ring-trace crash

[11:02:10] host jump01 done in 8.1 seconds.
[11:02:10] host tenet01 done in 4.9 seconds.
[11:02:12] host yourorg01 done in 5.6 seconds.
[11:02:13] host voxel02 done in 6.6 seconds.
ssh: connect to host apnic01.ring.nlnog.net port 22: Connection timed out
[11:02:14] host apnic01 done in 13.5 seconds.
ssh: connect to host globalaxs02.ring.nlnog.net port 22: Connection timed out
[11:02:15] host globalaxs02 done in 13.5 seconds.
[11:02:16] host voxel01 done in 9.3 seconds.
ssh: connect to host ic-hosting01.ring.nlnog.net port 22: Connection timed out
[11:02:16] host ic-hosting01 done in 14.1 seconds.
ssh: connect to host nedzone01.ring.nlnog.net port 22: Connection timed out
[11:02:16] host nedzone01 done in 13.2 seconds.
[11:02:16] host gossamerthreads01 done in 14.6 seconds.
[11:02:16] host seeweb01 done in 12.1 seconds.
ssh: connect to host leaseweb01.ring.nlnog.net port 22: Connection timed out
[11:02:23] host leaseweb01 done in 20.3 seconds.
ssh: connect to host signet01.ring.nlnog.net port 22: Connection timed out
[11:02:25] host signet01 done in 20.4 seconds.
ssh: connect to host surfnet01.ring.nlnog.net port 22: Connection timed out
[11:02:26] host surfnet01 done in 20.3 seconds.
[11:02:26] analysing traces.
[11:02:27] Failed to parse ASN lookup line: Error: no ASN or IP match on line 633.
[11:02:27] Looking up DNS entries.
[11:02:38] 679 DNS lookups done.
[11:02:38] Checking IXP lists.
[11:02:50] generating color table.
[11:02:50] generating graphs.
Traceback (most recent call last):
  File "/usr/local/bin/ring-trace", line 1284, in <module>
    ns.show_country, ns.remove_broken, ns.highlight_ixp, ns.remove_duplicate)
  File "/usr/local/bin/ring-trace", line 745, in graph
    asns.insert(0, a[0])
IndexError: list index out of range
job@master01:~$

occaid01 is causing this crash, most probably due to the lack of IPv4.

ring-scripts should be packaged for easier deployment inside and outside of ring

ring-ping (irc): hide errors

  job:  !ring-ping -6 2a02:ce0:9::e2f8:47ff:fe13:3836
nlnog:  2a02:ce0:9::e2f8:47ff:fe13:3836: 99 servers: 181ms average
nlnog:  2a02:ce0:9::e2f8:47ff:fe13:3836: unreachable from: boxed-it01
nlnog:  ssh connection failed: apnic01

If one node out of ~100 fails, I don't think that's very interesting to know for each ping.

Proposal:
If fewer than x% (x=5?) of nodes are not responding (ssh connection failed etc.), just don't mention it at all, or perhaps just report a count on one of the first two lines, thus reducing noise.
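
A minimal sketch of the proposed threshold, assuming the bot already has the total node count and the list of nodes whose ssh connection failed. `failure_summary` and its parameters are hypothetical names; the 5% default is the value suggested above.

```python
def failure_summary(total_nodes, failed_nodes, threshold_pct=5.0):
    """Return a line to print about failed nodes, or None to stay quiet."""
    if total_nodes <= 0 or not failed_nodes:
        return None
    failed_pct = 100.0 * len(failed_nodes) / total_nodes
    if failed_pct < threshold_pct:
        return None  # below the noise threshold: say nothing
    return "ssh connection failed: %s" % " ".join(sorted(failed_nodes))
```

With roughly 100 nodes, a single failure such as apnic01 would stay silent, while five or more failures would still be reported.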

arrange firewalling for IPMI interfaces of master servers

need some firewalling

Crash when -a -B encounters incomplete trace

A broken path causes a crash when using -a and -B together:

pels@fizzix:~/Desktop$ ring-trace -R -6 -a -X -t png -n 0 -i as250net01 -B public01.infra.ring.nlnog.net
ring-trace v1.7.1 - written by Teun Vink [email protected]
Including 1 hosts: as250net01
Performing ICMP traceroutes towards public01.infra.ring.nlnog.net from 1 ring hosts, ssh-timeout is 10 seconds.
Traceback (most recent call last):
  File "/home/pels/bin/ring-trace", line 1320, in <module>
    ns.show_country, ns.remove_broken, ns.highlight_ixp, ns.remove_duplicate, ns.transparent)
  File "/home/pels/bin/ring-trace", line 765, in graph
    asns.insert(0, a[0])
IndexError: list index out of range

This is triggered by the unfinished traceroute:

artin@as250net01:~$ mtr --report www.ams-ix.net
HOST: as250net01 Loss% Snt Last Avg Best Wrst StDev
1.|-- 2001:4ce8:fbb0::f128 0.0% 11 0.1 0.2 0.1 1.6 0.5
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0

When using only -a or only -B no crash occurs.

do DNS lookups parallel to reduce runtime

Waiting for the DNS resolver to time out is often the longest wait; doing those lookups in parallel would be faster.
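
One way to do this with only the standard library is a thread pool, since the lookups are I/O bound. This is a sketch, not how ring-trace currently resolves names: `reverse_lookup` and `resolve_all` are hypothetical helpers, and the worker count of 32 is an arbitrary assumption.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def reverse_lookup(ip):
    """Return the PTR name for `ip`, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def resolve_all(ips, resolver=reverse_lookup, workers=32):
    """Resolve many addresses concurrently; returns {ip: name_or_None}."""
    unique = list(dict.fromkeys(ips))  # de-duplicate, keep order
    if not unique:
        return {}
    with ThreadPoolExecutor(max_workers=min(workers, len(unique))) as pool:
        return dict(zip(unique, pool.map(resolver, unique)))
```

With this shape, the total wait is bounded by the slowest batch of lookups rather than the sum of all timeouts.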

ignore IPv6-only hosts when doing a ring-ping to an IPv4-only destination

Ignore IPv6-only hosts when doing a ring-ping to an IPv4-only destination, so they do not distort the results.
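
A sketch of such a filter, assuming a node inventory that maps each node name to its addresses. That inventory format and the `eligible_nodes` name are assumptions for illustration, and the destination is taken to be a literal address (a hostname would need resolving first).

```python
import ipaddress

def eligible_nodes(nodes, destination):
    """Keep only nodes that have an address in the destination's family.

    `nodes` maps a node name to a list of its address strings (hypothetical
    inventory format); `destination` is a literal IPv4 or IPv6 address.
    """
    version = ipaddress.ip_address(destination).version
    return [
        name for name, addrs in nodes.items()
        if any(ipaddress.ip_address(a).version == version for a in addrs)
    ]
```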

remove specific IP addresses from trace

Feature request: remove specific addresses from ring traces, since routers with multiple interfaces sometimes mess up traces.

DNS LOC entries in ring.nlnog.net

It would be good to have DNS LOC entries, putting the GPS position of each node directly in DNS so that tools can use this information.

An example of such a thing is at http://hewgill.com/tools/dnsloc (try with nautile01.ring.nlnog.net).
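
If node coordinates are available as decimal degrees, the text form of a LOC record (degrees, minutes, seconds, hemisphere, altitude, as defined in RFC 1876) could be generated along these lines. `to_loc_rdata` is a hypothetical helper, and the optional size and precision fields are simply omitted.

```python
def to_loc_rdata(lat, lon, alt_m=0.0):
    """Format decimal-degree coordinates as LOC record text (RFC 1876 style)."""
    def dms(value, pos, neg):
        hemi = pos if value >= 0 else neg
        total_ms = round(abs(value) * 3600 * 1000)  # thousandths of an arc second
        deg, rem = divmod(total_ms, 3600 * 1000)
        minutes, ms = divmod(rem, 60 * 1000)
        return "%d %d %.3f %s" % (deg, minutes, ms / 1000.0, hemi)
    return "%s %s %.2fm" % (dms(lat, "N", "S"), dms(lon, "E", "W"), alt_m)
```

The returned string would go after `LOC` in the zone file entry for the node.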

setup AMP software on the ring

this stuff is mega cool - http://erg.wand.net.nz/amp/matrix.php/ipv4/latency/NZ/

timestamp ringtraces

Put a timestamp in the ringtrace comment.

create routeviews.ring.nlnog.net bgp looking glass

It would be nice if participants could (voluntarily) set up an eBGP multi-hop session to this BIRD instance and announce everything they have in their IPv4 and IPv6 tables.

Ring users could ssh to this machine and would drop straight into the looking glass software.

permit running of http daemon on node

[11:39] lochii:#ring wanted to know if possible to run any form of httpd on our ring nodes, serving nothing (just replying 200)
[11:39] lochii:#ring and by run, I mean puppeted
[11:39] lochii:#ring don't care what it is, as long as it can answer 200
[11:40] lochii:#ring reason being, want to integrate it with some external monitoring which is only able to do HTTP health checks
[11:46] lochii:#ring anything 2xx is fine
[12:36] job:#ring lochii: not a bad idea. Please email ring-admins as a reminder
[12:37] job:#ring Or open an issue on github
[12:38] lochii:#ring ack, thanks
[12:41] job:#ring And we can use those pages to promote the ring

add an additional first hop in -a mode

Add an additional - hop in -a mode so it's clear multiple boxes from one org are in the same ASN.

ring-trace -t txt -r does not do resolving

DNS resolving (-r) doesn't work for text output (-t txt):

tdc@tdc01:~$ ring-trace -4 -t txt -n 1 -r www.cisco.com
ring-trace v1.8.1 - written by Teun Vink <[email protected]>
picked 1 host at random: claranet05
Performing ICMP traceroutes towards www.cisco.com from 1 ring hosts, ssh-timeout is 10 seconds.
traceroutes to 23.78.47.242 generated by ring-trace 1.8.1 at 2017-05-30 19:12:43

Node: claranet05
----------------
1.|-- 62.240.228.2               0.0%     1    0.3   0.3   0.3   0.3   0.0
2.|-- 212.43.193.114             0.0%     1    4.9   4.9   4.9   4.9   0.0
3.|-- 62.240.250.213             0.0%     1    1.3   1.3   1.3   1.3   0.0
4.|-- 213.242.120.69             0.0%     1    1.9   1.9   1.9   1.9   0.0
5.|-- 4.69.140.30                0.0%     1    1.5   1.5   1.5   1.5   0.0
6.|-- 141.136.103.181            0.0%     1    1.3   1.3   1.3   1.3   0.0
7.|-- 89.149.181.146             0.0%     1    1.3   1.3   1.3   1.3   0.0
8.|-- 46.33.95.178               0.0%     1    1.6   1.6   1.6   1.6   0.0
9.|-- 78.152.47.136              0.0%     1   26.1  26.1  26.1  26.1   0.0
10.|-- 78.152.57.10               0.0%     1   29.1  29.1  29.1  29.1   0.0
11.|-- 23.78.47.242               0.0%     1   25.9  25.9  25.9  25.9   0.0

Filter/highlight IXP hops

Add a flag to filter IXP hops out of graphs to simplify them.
Also, highlighting IXP hops when they are not removed can be useful.

Needed: a list of IXP addresses, Rodecker might be able to provide this.

good tty logging

All shell commands, and as much else as possible, should be logged (over an encrypted connection) to one or two master servers and stored for future reference in case of abuse.

automatically add owner user to admin group on own machine

so we don't manually have to do 'sudo adduser $participant admin' on freshly provisioned machines

ring-trace needs better SSH output handling

ring-trace is quite ignorant of the result of the SSH command it runs. This can result in problems, e.g. when a new host key is offered and user input is needed, but also when the SSH connection fails.

Better handling of the actual SSH command should fix this.

Fix -i flag behaviour

It is not possible to trace only from a specific set of hosts. -n needs to be at least 1 at the moment:

$ ring-trace -i zylon01 -i interconnect01 -i solido01 -i prolocation01 -i totaalnet01 -i cyso01 -i msp01 128.0.0.1
ring-trace v0.9.9 - written by Teun Vink [email protected]
Including 7 hosts: zylon01 interconnect01 solido01 prolocation01 totaalnet01 cyso01 msp01
Performing ICMP traceroutes towards 128.0.0.1 from 49 ring hosts.

$ ring-trace -n 0 -i zylon01 -i interconnect01 -i solido01 -i prolocation01 -i totaalnet01 -i cyso01 -i msp01 128.0.0.1
ring-trace v0.9.9 - written by Teun Vink [email protected]
Including 7 hosts: zylon01 interconnect01 solido01 prolocation01 totaalnet01 cyso01 msp01
Performing ICMP traceroutes towards 128.0.0.1 from 49 ring hosts.

$ ring-trace -n 1 -i zylon01 -i interconnect01 -i solido01 -i prolocation01 -i totaalnet01 -i cyso01 -i msp01 128.0.0.1
ring-trace v0.9.9 - written by Teun Vink [email protected]
picked 1 host at random: portlane01
Including 7 hosts: zylon01 interconnect01 solido01 prolocation01 totaalnet01 cyso01 msp01
Performing ICMP traceroutes towards 128.0.0.1 from 8 ring hosts.
Created trace-128.0.0.1.jpg in 6.6 seconds.
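
The behaviour asked for here (trace from the `-i` hosts, plus `-n` random extras, with `-n 0` meaning none) could be sketched as below. `select_hosts` is a hypothetical function; the actual selection code in ring-trace is not shown in this issue.

```python
import random

def select_hosts(all_hosts, include, n_random, rng=random):
    """Return the included hosts plus `n_random` others picked at random."""
    included = [h for h in include if h in all_hosts]
    candidates = [h for h in all_hosts if h not in included]
    # Clamp so that 0 (or a negative value) adds no random hosts.
    picked = rng.sample(candidates, min(max(n_random, 0), len(candidates)))
    return included + picked
```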

whitelist all ring-ips in fail2ban

arrange with puppet facts

modify traceroute so all features can be used without root cap.

"traceroute -I " has been requested multiple times by now :-)

Pick a set of nodes based on traceroute characteristics

Feature request: make traces from all nodes, but only graph the top/bottom X hosts based on hop count, latency and possibly other characteristics of the trace.
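
As a rough illustration, ranking could happen after all traces are collected. The sketch below assumes `traces` maps a node name to its list of hops and ranks by hop count unless another metric is passed in; `pick_extremes` is a hypothetical name.

```python
def pick_extremes(traces, count, metric=len):
    """Return (lowest, highest) lists of `count` node names ranked by `metric`."""
    # The node name is a tie-breaker so the ordering is deterministic.
    ranked = sorted(traces, key=lambda node: (metric(traces[node]), node))
    if count <= 0:
        return [], []
    return ranked[:count], ranked[-count:]
```

A latency-based selection would pass a different `metric`, for example a function returning the last hop's average round-trip time.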

publish ring puppet repo on github

Give outsiders more access to how we manage the ring

Also master01 should pull from this repo after some checking has been done

broken hops are not graphed correctly

All broken hops (resulting in * * *) are graphed as one single node in the graph, though it might actually represent a large number of different hops.

Ignoring these hops results in improper graphs, so the best solution would be to create separate nodes (and edges) for these hops.
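
One possible way to get separate nodes is to give every unanswered hop an identifier derived from the tracing node and the hop position. This is a sketch under assumptions: `label_hops` is hypothetical, and `"???"` is used as the marker because that is what `mtr --report` prints for unanswered hops.

```python
def label_hops(node, hops, unknown="???"):
    """Give every unanswered hop its own graph node id.

    `hops` is the ordered list of hop addresses from one trace, with
    `unknown` marking hops that did not answer.
    """
    return [
        "%s-hop%d-unknown" % (node, index) if hop == unknown else hop
        for index, hop in enumerate(hops, start=1)
    ]
```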

ring-trace - combine command status with mtr output

We've been tracing a reachability issue trying to detect which ASN is performing DPI on incoming DNS packets towards one of our root servers and blocking some of them.

Ultimately we used ring-trace to build a view of the topology facing that server, and then had to manually run tests on the chosen nodes to find out which ones had the query blocked, and from that determine which common ASN was responsible for that blocking.

It would be really useful if ring-trace could run a secondary command alongside the mtr tracer, and then indicate in the 0th hop output what the return status of that command was. This would allow us at a single glance to identify which group of topologically connected nodes are causing an issue.

ring-trace crashes from time to time

Here are two identical runs of ring-trace, where the first one crashes and the second one went OK:

tore@stat:~$ /stat/local/nlnog-ring/scripts/ring-trace -ab4X -n 20 dns.i.bitbit.net
ring-trace v1.6.1 - written by Teun Vink <[email protected]>
picked 20 hosts at random: speedpartner01 rrbone01 rootlu01 dcsone01 nexellent01 nxs01 strato01 enestdata01 hosteam01 melbourne01 softlayer06 nuqe01 bluezonejordan01 ebayclassifiedsgroup01 sixdegrees01 onet02 poznan01 dyn01 softlayer01 openminds01
Performing ICMP traceroutes towards dns.i.bitbit.net from 20 ring hosts, ssh-timeout is 10 seconds.
ssh: connect to host softlayer06.ring.nlnog.net port 22: Connection refused
Host softlayer06 returned no trace.
Traceback (most recent call last):
  File "/stat/local/nlnog-ring/scripts/ring-trace", line 1293, in <module>
    ns.show_country, ns.remove_broken, ns.highlight_ixp, ns.remove_duplicate)
  File "/stat/local/nlnog-ring/scripts/ring-trace", line 737, in graph
    a = [(tracedata[ip].get("asn", "unknown"),tracedata[ip].get("ix", False),tracedata[ip].get("desc", "unknown"), "") for ip in ips if not (tracedata[ip].get("ix", False) and no_ixp)]
KeyError: ''
tore@stat:~$ /stat/local/nlnog-ring/scripts/ring-trace -ab4X -n 20 dns.i.bitbit.net
ring-trace v1.6.1 - written by Teun Vink <[email protected]>
picked 20 hosts at random: rackfish01 is01 ripe01 hostway01 rbnetwork02 afilias01 claranet06 signet01 softlayer04 dyn01 westnet01 atrato03 tetaneutral01 direcpath01 hurricane01 ualbany01 isc01 cambrium01 siminn01 ispservices01
Performing ICMP traceroutes towards dns.i.bitbit.net from 20 ring hosts, ssh-timeout is 10 seconds.

Image uploaded to https://ring.nlnog.net/paste/p/1v4d2w4tvrm2a4c4
Done in 26.4 seconds.

Transparent background for ring-trace images

For use in powerpoint

support .ring-tracerc for defaults

A .ring-tracerc file would be useful, so we can set default arguments. This might require a rewrite of the command-line arguments so that each option can be disabled as well as enabled.
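
A sketch of how defaults from such a file could feed the argument parser while keeping every boolean switchable in both directions. The `[defaults]` section, the `--resolve` and `--hosts` long options, and the fallback values are all hypothetical; `argparse.BooleanOptionalAction` needs Python 3.9 or newer.

```python
import argparse
import configparser

def load_defaults(text):
    """Parse rc-file contents and return its [defaults] section."""
    config = configparser.ConfigParser()
    config.read_string(text)
    if not config.has_section("defaults"):
        config.add_section("defaults")
    return config["defaults"]

def build_parser(defaults):
    """Build a parser whose defaults come from the rc file."""
    parser = argparse.ArgumentParser(prog="ring-trace")
    # BooleanOptionalAction generates both --resolve and --no-resolve.
    parser.add_argument("--resolve", action=argparse.BooleanOptionalAction,
                        default=defaults.getboolean("resolve", fallback=False))
    parser.add_argument("--hosts", type=int,
                        default=defaults.getint("hosts", fallback=10))
    return parser
```

In real use the text would be read from `~/.ring-tracerc` if it exists; command-line flags still override whatever the file sets.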

modify ping so it can do all features without root capabilities

show only the AS-path in ring-trace

Feature request: show only the AS-path trace instead of the actual IPs. This would result in simpler graphs.
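
Assuming each hop has already been mapped to an ASN, reducing a trace to its AS-path amounts to collapsing consecutive duplicates. `as_path` is a hypothetical helper shown only to illustrate the idea.

```python
from itertools import groupby

def as_path(hop_asns):
    """Collapse consecutive hops in the same ASN into one AS-path entry."""
    return [asn for asn, _ in groupby(hop_asns)]
```

For example, hops in `AS64500, AS64500, AS64501` would be drawn as two nodes instead of three.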

ring-change-sshkeys

Some kind of wrapper that can manage the sshkeys for users would be nice. I imagine the following:

The user types ring-change-sshkeys on a ring node, which opens their preferred editor; it should feel like typing 'crontab -e'.

The first few lines of the authorized_keys file that is displayed should be some comments which contain warnings and what the file means.

The user can then edit the file to their liking, and when they quit the editor the resulting data should be validated:

empty file is error
invalid keys error
etc

If an error is found the user should be presented with a choice: discard all changes or open the editor again.
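
The validation step could start from something like the sketch below. It checks that the file is not empty and that each key's base64 data decodes and begins with the same type name as the first field, which is how SSH public key blobs are laid out. `validate_authorized_keys` is a hypothetical function, and lines that start with options (such as `from="..."`) are not handled here.

```python
import base64
import binascii
import struct

def validate_authorized_keys(text):
    """Return a list of error strings; an empty list means the file looks valid."""
    errors = []
    key_lines = [(n, l.strip()) for n, l in enumerate(text.splitlines(), 1)
                 if l.strip() and not l.strip().startswith("#")]
    if not key_lines:
        return ["file contains no keys"]
    for number, line in key_lines:
        fields = line.split()
        if len(fields) < 2:
            errors.append("line %d: expected '<type> <base64> [comment]'" % number)
            continue
        key_type, blob = fields[0], fields[1]
        try:
            raw = base64.b64decode(blob, validate=True)
            # The blob starts with a 4-byte big-endian length and the type name.
            (length,) = struct.unpack(">I", raw[:4])
            embedded = raw[4:4 + length].decode("ascii")
        except (binascii.Error, struct.error, UnicodeDecodeError):
            errors.append("line %d: key data is not valid" % number)
            continue
        if embedded != key_type:
            errors.append("line %d: key type does not match key data" % number)
    return errors
```

A non-empty result would trigger the choice described above: discard the changes or reopen the editor.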

If the file is valid, it can be emailed to ring-admins@ in puppetized format and incorporated in the repos. In the future we can automate this if we have reason to trust the system.

sign ring.nlnog.net

work with the nlnog.net hostmaster to get our parent signed

make provisioning of machines less work

We require a DB backend for this; habbie has already written some Python to help in that space.

Current steps are

check machine cpu (64 bit?)
check ubuntu distribution
check v6 connectivity and configuration (should be static)
add CNAME in DNS on job's private server
add machine to /etc/puppet/files/etc/hosts so master can create smokeping configs and not error
add ring repo and apt-get update
puppetd --test
apt-get update
puppetd --test a few times
add machine to ring.nlnog.net txt record
add owner to admin group
add news item on ring.nlnog.net
notify participant they can subscribe to mailman
manually type email to participant to welcome them
manually type email to ring-users@ to announce the new node
notify teun a new node has arrived and he should add it to map

this should be easier...

hostname
owner
owner noc email address
location
ASN
