Comments (70)

pmelange commented on July 22, 2024

Here is some info from @Noki's router. It's an x86-64 virtual machine.

Fri Aug 11 14:12:07 2023 daemon.err td-client: [b.bbb-vpn.berlin.freifunk.net:8943] Connection lost.
Fri Aug 11 14:12:07 2023 daemon.info td-client: Performing broker selection...
Fri Aug 11 14:12:07 2023 daemon.info olsrd[23577]: Removing interface bbbdigger
Fri Aug 11 14:12:07 2023 daemon.notice netifd: bbbdigger (23281): udhcpc: received SIGTERM
Fri Aug 11 14:12:07 2023 daemon.notice netifd: bbbdigger (23281): udhcpc: unicasting a release of 10.36.193.52 to 10.36.193.6
Fri Aug 11 14:12:07 2023 daemon.notice netifd: bbbdigger (23281): udhcpc: sending release
Fri Aug 11 14:12:07 2023 daemon.notice netifd: bbbdigger (23281): udhcpc: connect: Network unreachable
Fri Aug 11 14:12:07 2023 daemon.notice netifd: bbbdigger (23281): udhcpc: entering released state
Fri Aug 11 14:12:07 2023 daemon.notice netifd: bbbdigger (23281): Command failed: Permission denied
Fri Aug 11 14:12:07 2023 daemon.notice netifd: Interface 'bbbdigger' is now down
Fri Aug 11 14:12:07 2023 daemon.notice netifd: Interface 'bbbdigger' is disabled
Fri Aug 11 14:12:08 2023 daemon.debug td-client: [b.bbb-vpn.berlin.freifunk.net:8943] Broker usage: 63
Fri Aug 11 14:12:08 2023 daemon.info td-client: Selected b.bbb-vpn.berlin.freifunk.net:8943 as the best broker.
Fri Aug 11 14:12:08 2023 daemon.info olsrd: /etc/init.d/olsrd: olsrd_setup_smartgw_rules() Notice: Inserting firewall rules for SmartGateway
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Tunnel tnl_600b1f0a removed, to -
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Tunnel tnl_a07f1f0a added, to 10.31.127.160
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Tunnel tnl_a07f1f0a removed, to -
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Tunnel tnl_a0821f0a added, to 10.31.130.160
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Tunnel tnl_a0821f0a removed, to -
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Writing '1' (was 0) to /proc/sys/net/ipv4/conf/all/send_redirects
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: Writing '1' (was 0) to /proc/sys/net/ipv4/conf/eth2/send_redirects
Fri Aug 11 14:12:08 2023 daemon.info olsrd[23577]: olsr.org - pre-0.9.9-git_0000000-hash_91d5ae30edfe8ee23769434a3154dcd7 stopped
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Writing '1' (was 1) to /proc/sys/net/ipv4/ip_forward
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/tunl0/rp_filter
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Writing '0' (was 1) to /proc/sys/net/ipv4/conf/all/send_redirects
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/all/rp_filter
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Writing '0' (was 1) to /proc/sys/net/ipv4/conf/eth2/send_redirects
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/eth2/rp_filter
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: Adding interface eth2
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: New main address: 10.31.21.185
Fri Aug 11 14:12:08 2023 daemon.info olsrd[24105]: olsr.org - pre-0.9.9-git_0000000-hash_91d5ae30edfe8ee23769434a3154dcd7 successfully started
Fri Aug 11 14:12:09 2023 daemon.info td-client: [b.bbb-vpn.berlin.freifunk.net:8943] Tunnel successfully established.
Fri Aug 11 14:12:09 2023 daemon.notice netifd: Interface 'bbbdigger' is enabled
Fri Aug 11 14:12:09 2023 daemon.notice netifd: Network device 'bbbdigger' link is up
Fri Aug 11 14:12:09 2023 daemon.notice netifd: Interface 'bbbdigger' has link connectivity
Fri Aug 11 14:12:09 2023 daemon.notice netifd: Interface 'bbbdigger' is setting up now
Fri Aug 11 14:12:09 2023 daemon.notice netifd: bbbdigger (24271): udhcpc: started, v1.33.2
Fri Aug 11 14:12:09 2023 daemon.notice netifd: bbbdigger (24271): udhcpc: sending discover
Fri Aug 11 14:12:09 2023 daemon.notice netifd: bbbdigger (24271): udhcpc: sending select for 10.36.193.52
Fri Aug 11 14:12:09 2023 daemon.notice netifd: bbbdigger (24271): udhcpc: lease of 10.36.193.52 obtained, lease time 1800
Fri Aug 11 14:12:09 2023 daemon.notice netifd: Interface 'bbbdigger' is now up
Fri Aug 11 14:12:09 2023 user.notice firewall: Reloading firewall due to ifup of bbbdigger (bbbdigger)
Fri Aug 11 14:12:09 2023 user.notice policyrouting: Add route: ip route add 10.36.193.0/24 dev bbbdigger table localnets
Fri Aug 11 14:12:09 2023 user.notice policyrouting: Add route: ip route add 10.36.193.0/24 dev bbbdigger table olsr
Fri Aug 11 14:12:09 2023 user.notice policyrouting: Use mesh gateway for interface bbbdigger (IPv4)
Fri Aug 11 14:12:10 2023 daemon.info olsrd: /etc/init.d/olsrd: olsrd_setup_smartgw_rules() Notice: Inserting firewall rules for SmartGateway
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24105]: Writing '1' (was 0) to /proc/sys/net/ipv4/conf/all/send_redirects
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24105]: Writing '1' (was 0) to /proc/sys/net/ipv4/conf/eth2/send_redirects
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24105]: olsr.org - pre-0.9.9-git_0000000-hash_91d5ae30edfe8ee23769434a3154dcd7 stopped
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '1' (was 1) to /proc/sys/net/ipv4/ip_forward
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/tunl0/rp_filter
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 1) to /proc/sys/net/ipv4/conf/all/send_redirects
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/all/rp_filter
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 1) to /proc/sys/net/ipv4/conf/bbbdigger/send_redirects
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/bbbdigger/rp_filter
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Adding interface bbbdigger
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: New main address: 10.36.193.52
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 1) to /proc/sys/net/ipv4/conf/eth2/send_redirects
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/eth2/rp_filter
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Adding interface eth2
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: Received netlink error code File exists (-17)
Fri Aug 11 14:12:10 2023 daemon.err olsrd[24533]: Error on inserting policy rule aimed to activate RtTable 112!
Fri Aug 11 14:12:10 2023 daemon.info olsrd[24533]: olsr.org - pre-0.9.9-git_0000000-hash_91d5ae30edfe8ee23769434a3154dcd7 successfully started
Fri Aug 11 14:12:14 2023 daemon.err olsrd[24546]: olsrd exit: main: Bad configuration
Fri Aug 11 14:12:17 2023 daemon.info olsrd[24533]: Tunnel tnl_a0821f0a added, to 10.31.130.160
Fri Aug 11 14:12:17 2023 daemon.info olsrd[24533]: Tunnel tnl_a0821f0a removed, to -
Fri Aug 11 14:12:17 2023 daemon.info olsrd[24533]: Tunnel tnl_38531f0a added, to 10.31.83.56
Fri Aug 11 14:12:29 2023 daemon.err olsrd[24670]: olsrd exit: main: Bad configuration
Fri Aug 11 14:12:44 2023 daemon.err olsrd[24673]: olsrd exit: main: Bad configuration
Fri Aug 11 14:13:10 2023 daemon.warn td-client: [b.bbb-vpn.berlin.freifunk.net:8943] Tunnel has timed out, closing down interface.
Fri Aug 11 14:13:10 2023 daemon.notice netifd: Network device 'bbbdigger' link is down
Fri Aug 11 14:13:10 2023 daemon.notice netifd: Interface 'bbbdigger' has link connectivity loss
Fri Aug 11 14:13:10 2023 daemon.err td-client: [b.bbb-vpn.berlin.freifunk.net:8943] Connection lost.

and @kaechele suggested getting the following info:

root@noki-test:~# cat /proc/net/nf_conntrack
ipv4     2 tcp      6 37 TIME_WAIT src=127.0.0.1 dst=127.0.0.1 sport=57290 dport=2006 packets=8 bytes=431 src=127.0.0.1 dst=127.0.0.1 sport=2006 dport=57290 packets=8 bytes=135529 [ASSURED] mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=149.112.112.112 sport=42202 dport=53 packets=1 bytes=71 [UNREPLIED] src=149.112.112.112 dst=172.31.226.250 sport=53 dport=42202 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 73 ESTABLISHED src=10.31.212.199 dst=20.93.28.56 sport=56461 dport=443 packets=10 bytes=410 [UNREPLIED] src=20.93.28.56 dst=172.31.226.250 sport=443 dport=56461 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 29 LAST_ACK src=10.31.212.199 dst=172.217.18.10 sport=56553 dport=443 packets=7 bytes=581 [UNREPLIED] src=172.217.18.10 dst=172.31.226.250 sport=443 dport=56553 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 54 src=10.31.212.38 dst=149.112.112.112 sport=42162 dport=53 packets=2 bytes=124 [UNREPLIED] src=149.112.112.112 dst=172.31.226.250 sport=53 dport=42162 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 13 src=10.31.212.38 dst=1.0.0.1 sport=46972 dport=53 packets=1 bytes=61 [UNREPLIED] src=1.0.0.1 dst=172.31.226.250 sport=53 dport=46972 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=1.0.0.1 sport=53905 dport=53 packets=1 bytes=65 [UNREPLIED] src=1.0.0.1 dst=172.31.226.250 sport=53 dport=53905 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 27 LAST_ACK src=10.31.212.199 dst=172.217.18.10 sport=56556 dport=443 packets=7 bytes=550 [UNREPLIED] src=172.217.18.10 dst=172.31.226.250 sport=443 dport=56556 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 38 src=192.168.189.3 dst=194.150.168.168 sport=51147 dport=53 packets=1 bytes=75 src=194.150.168.168 dst=192.168.189.3 sport=53 dport=51147 packets=1 bytes=120 mark=0 zone=0 use=2
ipv4     2 udp      17 15 src=10.31.212.199 dst=142.250.185.67 sport=49784 dport=443 packets=3 bytes=3834 [UNREPLIED] src=142.250.185.67 dst=172.31.226.250 sport=443 dport=49784 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 54 src=10.31.212.38 dst=9.9.9.9 sport=42162 dport=53 packets=2 bytes=124 [UNREPLIED] src=9.9.9.9 dst=172.31.226.250 sport=53 dport=42162 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 14 SYN_SENT src=10.31.212.199 dst=40.99.148.82 sport=56629 dport=443 packets=1 bytes=60 [UNREPLIED] src=40.99.148.82 dst=172.31.226.250 sport=443 dport=56629 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 59 src=10.31.212.38 dst=255.255.255.255 sport=698 dport=698 packets=10665 bytes=14656064 [UNREPLIED] src=255.255.255.255 dst=10.31.212.38 sport=698 dport=698 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 11 src=172.31.226.250 dst=172.31.224.1 sport=68 dport=67 packets=1 bytes=328 src=172.31.224.1 dst=172.31.226.250 sport=67 dport=68 packets=1 bytes=328 mark=0 zone=0 use=2
ipv4     2 udp      17 38 src=192.168.189.3 dst=46.182.19.48 sport=51147 dport=53 packets=1 bytes=75 src=46.182.19.48 dst=192.168.189.3 sport=53 dport=51147 packets=1 bytes=120 mark=0 zone=0 use=2
ipv4     2 tcp      6 28 SYN_SENT src=10.31.212.199 dst=2.19.213.33 sport=56630 dport=443 packets=1 bytes=60 [UNREPLIED] src=2.19.213.33 dst=172.31.226.250 sport=443 dport=56630 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 13 src=10.31.212.38 dst=1.1.1.1 sport=46972 dport=53 packets=2 bytes=122 [UNREPLIED] src=1.1.1.1 dst=172.31.226.250 sport=53 dport=46972 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 13 src=10.31.212.38 dst=1.0.0.1 sport=49873 dport=53 packets=1 bytes=61 [UNREPLIED] src=1.0.0.1 dst=172.31.226.250 sport=53 dport=49873 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 30 src=10.31.212.199 dst=142.250.180.106 sport=54932 dport=443 packets=8 bytes=8100 [UNREPLIED] src=142.250.180.106 dst=172.31.226.250 sport=443 dport=54932 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 33 src=192.168.189.3 dst=77.87.49.11 sport=41975 dport=8943 packets=93 bytes=59362 src=77.87.49.11 dst=192.168.189.3 sport=8943 dport=41975 packets=40 bytes=46366 [ASSURED] mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=9.9.9.9 sport=42202 dport=53 packets=1 bytes=71 [UNREPLIED] src=9.9.9.9 dst=172.31.226.250 sport=53 dport=42202 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 120 src=192.168.189.3 dst=77.87.49.11 sport=46999 dport=8943 packets=370 bytes=447620 src=77.87.49.11 dst=192.168.189.3 sport=8943 dport=46999 packets=171 bytes=193226 [ASSURED] mark=0 zone=0 use=2
ipv4     2 udp      17 13 src=10.31.212.38 dst=149.112.112.112 sport=49873 dport=53 packets=1 bytes=61 [UNREPLIED] src=149.112.112.112 dst=172.31.226.250 sport=53 dport=49873 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 49 ESTABLISHED src=10.31.212.199 dst=52.123.144.67 sport=56601 dport=443 packets=10 bytes=5822 [UNREPLIED] src=52.123.144.67 dst=172.31.226.250 sport=443 dport=56601 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 178 src=192.168.189.3 dst=77.87.49.11 sport=47108 dport=8943 packets=91 bytes=59074 src=77.87.49.11 dst=192.168.189.3 sport=8943 dport=47108 packets=107 bytes=116418 [ASSURED] mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=1.1.1.1 sport=53905 dport=53 packets=1 bytes=65 [UNREPLIED] src=1.1.1.1 dst=172.31.226.250 sport=53 dport=53905 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 34 src=10.31.212.199 dst=142.250.180.106 sport=56187 dport=443 packets=6 bytes=6497 [UNREPLIED] src=142.250.180.106 dst=172.31.226.250 sport=443 dport=56187 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 30 src=10.31.212.199 dst=142.250.180.99 sport=54138 dport=443 packets=5 bytes=6390 [UNREPLIED] src=142.250.180.99 dst=172.31.226.250 sport=443 dport=54138 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 29 src=10.31.212.199 dst=142.251.143.46 sport=50077 dport=443 packets=5 bytes=6390 [UNREPLIED] src=142.251.143.46 dst=172.31.226.250 sport=443 dport=50077 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 59 src=10.31.21.185 dst=255.255.255.255 sport=698 dport=698 packets=1266 bytes=117368 [UNREPLIED] src=255.255.255.255 dst=10.31.21.185 sport=698 dport=698 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 36 src=10.31.215.33 dst=77.87.50.8 sport=44899 dport=25826 packets=250 bytes=335807 [UNREPLIED] src=77.87.50.8 dst=172.31.226.250 sport=25826 dport=44899 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 58 src=10.31.212.38 dst=1.1.1.1 sport=56122 dport=53 packets=1 bytes=70 [UNREPLIED] src=1.1.1.1 dst=172.31.226.250 sport=53 dport=56122 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 56 SYN_SENT src=10.31.212.199 dst=52.98.240.146 sport=56634 dport=443 packets=1 bytes=60 [UNREPLIED] src=52.98.240.146 dst=172.31.226.250 sport=443 dport=56634 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 58 src=10.31.212.38 dst=149.112.112.112 sport=56122 dport=53 packets=1 bytes=70 [UNREPLIED] src=149.112.112.112 dst=172.31.226.250 sport=53 dport=56122 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 2 src=10.36.193.6 dst=10.36.193.52 sport=67 dport=68 packets=2 bytes=656 [UNREPLIED] src=10.36.193.52 dst=172.31.226.250 sport=68 dport=67 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 icmp     1 28 src=192.168.189.3 dst=192.36.148.17 type=8 code=0 id=4036 packets=213 bytes=5964 src=192.36.148.17 dst=192.168.189.3 type=0 code=0 id=4036 packets=213 bytes=5964 mark=0 zone=0 use=2
ipv4     2 udp      17 38 src=192.168.189.3 dst=9.9.9.10 sport=51147 dport=53 packets=1 bytes=75 src=9.9.9.10 dst=192.168.189.3 sport=53 dport=51147 packets=1 bytes=120 mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=9.9.9.9 sport=53905 dport=53 packets=1 bytes=65 [UNREPLIED] src=9.9.9.9 dst=172.31.226.250 sport=53 dport=53905 packets=0 bytes=0 mark=0 zone=0 use=2
ipv6     10 udp      17 38 src=0000:0000:0000:0000:0000:0000:0000:0001 dst=0000:0000:0000:0000:0000:0000:0000:0001 sport=36240 dport=53 packets=1 bytes=95 src=0000:0000:0000:0000:0000:0000:0000:0001 dst=0000:0000:0000:0000:0000:0000:0000:0001 sport=53 dport=36240 packets=1 bytes=140 mark=0 zone=0 use=2
ipv4     2 udp      17 22 src=10.31.212.38 dst=9.9.9.9 sport=39539 dport=53 packets=1 bytes=66 [UNREPLIED] src=9.9.9.9 dst=172.31.226.250 sport=53 dport=39539 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 icmp     1 4 src=172.31.226.250 dst=46.182.19.48 type=8 code=0 id=26379 packets=1 bytes=84 src=46.182.19.48 dst=172.31.226.250 type=0 code=0 id=26379 packets=1 bytes=84 mark=0 zone=0 use=2
ipv4     2 udp      17 38 src=192.168.189.3 dst=80.67.169.40 sport=51147 dport=53 packets=1 bytes=75 src=80.67.169.40 dst=192.168.189.3 sport=53 dport=51147 packets=1 bytes=120 mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=1.1.1.1 sport=42202 dport=53 packets=1 bytes=71 [UNREPLIED] src=1.1.1.1 dst=172.31.226.250 sport=53 dport=42202 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 67 TIME_WAIT src=127.0.0.1 dst=127.0.0.1 sport=58386 dport=2006 packets=5 bytes=275 src=127.0.0.1 dst=127.0.0.1 sport=2006 dport=58386 packets=5 bytes=14805 [ASSURED] mark=0 zone=0 use=2
ipv4     2 udp      17 54 src=10.31.212.38 dst=1.1.1.1 sport=42162 dport=53 packets=2 bytes=124 [UNREPLIED] src=1.1.1.1 dst=172.31.226.250 sport=53 dport=42162 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 1 src=127.0.0.1 dst=127.0.0.1 sport=35993 dport=53 packets=1 bytes=75 src=127.0.0.1 dst=127.0.0.1 sport=53 dport=35993 packets=1 bytes=91 mark=0 zone=0 use=2
ipv4     2 udp      17 13 src=10.31.212.38 dst=1.1.1.1 sport=49873 dport=53 packets=2 bytes=122 [UNREPLIED] src=1.1.1.1 dst=172.31.226.250 sport=53 dport=49873 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 37 src=192.168.189.3 dst=77.87.50.8 sport=44286 dport=25826 packets=88 bytes=118788 [UNREPLIED] src=77.87.50.8 dst=192.168.189.3 sport=25826 dport=44286 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 58 src=10.31.212.38 dst=9.9.9.9 sport=56122 dport=53 packets=1 bytes=70 [UNREPLIED] src=9.9.9.9 dst=172.31.226.250 sport=53 dport=56122 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 22 src=10.31.212.38 dst=149.112.112.112 sport=39539 dport=53 packets=1 bytes=66 [UNREPLIED] src=149.112.112.112 dst=172.31.226.250 sport=53 dport=39539 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 19 src=10.36.193.6 dst=255.255.255.255 sport=698 dport=698 packets=253 bytes=335244 [UNREPLIED] src=255.255.255.255 dst=10.36.193.6 sport=698 dport=698 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 300 ESTABLISHED src=10.31.212.38 dst=10.31.21.185 sport=37866 dport=22 packets=526 bytes=34882 src=10.31.21.185 dst=10.31.212.38 sport=22 dport=37866 packets=862 bytes=350958 [ASSURED] mark=0 zone=0 use=4
ipv4     2 udp      17 13 src=10.31.212.38 dst=9.9.9.9 sport=49873 dport=53 packets=1 bytes=61 [UNREPLIED] src=9.9.9.9 dst=172.31.226.250 sport=53 dport=49873 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 tcp      6 35 SYN_SENT src=10.31.212.199 dst=52.98.240.114 sport=56631 dport=443 packets=1 bytes=60 [UNREPLIED] src=52.98.240.114 dst=172.31.226.250 sport=443 dport=56631 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 39 src=192.168.189.3 dst=149.112.112.10 sport=51147 dport=53 packets=1 bytes=75 src=149.112.112.10 dst=192.168.189.3 sport=53 dport=51147 packets=1 bytes=120 mark=0 zone=0 use=2
ipv4     2 icmp     1 25 src=10.31.212.2 dst=1.1.1.1 type=8 code=0 id=65217 packets=164 bytes=13776 src=1.1.1.1 dst=172.31.226.250 type=0 code=0 id=65217 packets=164 bytes=13776 mark=0 zone=0 use=2
ipv4     2 udp      17 24 src=10.31.212.38 dst=1.0.0.1 sport=42202 dport=53 packets=1 bytes=71 [UNREPLIED] src=1.0.0.1 dst=172.31.226.250 sport=53 dport=42202 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 udp      17 13 src=10.31.212.38 dst=149.112.112.112 sport=46972 dport=53 packets=1 bytes=61 [UNREPLIED] src=149.112.112.112 dst=172.31.226.250 sport=53 dport=46972 packets=0 bytes=0 mark=0 zone=0 use=2
ipv4     2 icmp     1 26 src=192.168.189.3 dst=77.87.50.8 type=8 code=0 id=36208 packets=213 bytes=17892 src=77.87.50.8 dst=192.168.189.3 type=0 code=0 id=36208 packets=213 bytes=17892 mark=0 zone=0 use=2
ipv4     2 udp      17 58 src=10.31.212.38 dst=1.0.0.1 sport=56122 dport=53 packets=1 bytes=70 [UNREPLIED] src=1.0.0.1 dst=172.31.226.250 sport=53 dport=56122 packets=0 bytes=0 mark=0 zone=0 use=2
ipv6     10 udp      17 1 src=0000:0000:0000:0000:0000:0000:0000:0001 dst=0000:0000:0000:0000:0000:0000:0000:0001 sport=35993 dport=53 packets=1 bytes=95 src=0000:0000:0000:0000:0000:0000:0000:0001 dst=0000:0000:0000:0000:0000:0000:0000:0001 sport=53 dport=35993 packets=1 bytes=111 mark=0 zone=0 use=2

He also suggested maybe capturing some packets. But what exactly should I capture packets on? Tcpdump on the client, capturing packets going to the server? Is there anything else that could be helpful for debugging this problem?

I would also like to make it clear that this does not happen on older versions of tunneldigger-broker. Specifically, we are using c50ef46, which never causes this problem on multiple servers, each with approx. 90 clients.

RalfJung commented on July 22, 2024

Specifically, we are using c50ef46, which never causes this problem on multiple servers, each with approx. 90 clients.

That puts the regression range at c50ef46...4f72b30. The NAT removal is the first thing that sticks out. Old kernels are known to have problems with our multiple UDP sockets on the same port (see #126). What's the kernel version on the host?
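
For context, the "multiple UDP sockets on the same port" pattern works roughly like this (a minimal sketch, not the broker's actual code):

import socket

# All sockets share the listen port via SO_REUSEPORT; each per-client
# socket is additionally connect()ed, so the kernel steers that client's
# datagrams to it instead of to the listening socket.
def make_socket(bind_addr, port, peer=None):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((bind_addr, port))
    if peer is not None:
        s.connect(peer)
    return s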

Since it already happens with 2 clients, I am fairly sure this is a kernel issue -- that's usually how those manifest.

FWIW we have a big setup with 4 servers with 150 clients each, and I don't see a lot of these issues. (I can't say for sure none of this happens since how would I know, but we don't get user complaints en masse.) So the fact that you have servers that are unaffected doesn't say much.

reason=0x5

"ERROR_REASON_UNDEFINED". Not very useful... this is the default that will be given when something calls close() on the server without specifying a reason.

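The pattern, sketched (0x5 matches the reason seen in the log; the real method of course does more):

ERROR_REASON_UNDEFINED = 0x5

class Tunnel:
    def close(self, reason=ERROR_REASON_UNDEFINED):
        # Any caller that doesn't pass an explicit reason ends up
        # logging the uninformative 0x5.
        print("closing (reason=0x{:x})".format(reason))
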
RalfJung commented on July 22, 2024

I have the tunneldigger-broker running on OpenWrt (an x86-64 virtual machine) to test the latest commit to tunneldigger-broker, and so far we have 2 routers connecting to the broker.

How is that VM exposed to the outside such that the router can connect to it? In #126 (comment) an in-between NAT was mentioned. If that NAT screws up and doesn't properly map the ports back and forth then that could of course also explain the problem. Though I'm not sure how that should happen without screwing up all UDP completely... from the NAT perspective this is just two different clients talking to UDP port 10000 on the server, and it has to keep those data streams apart.

pmelange commented on July 22, 2024

Specifically, we are using c50ef46, which never causes this problem on multiple servers, each with approx. 90 clients.

That puts the regression range at c50ef46...4f72b30. The NAT removal is the first thing that sticks out. Old kernels are known to have problems with our multiple UDP sockets on the same port (see #126). What's the kernel version on the host?

Since it already happens with 2 clients, I am fairly sure this is a kernel issue -- that's usually how those manifest.

FWIW we have a big setup with 4 servers with 150 clients each, and I don't see a lot of these issues. (I can't say for sure none of this happens since how would I know, but we don't get user complaints en masse.) So the fact that you have servers that are unaffected doesn't say much.

reason=0x5

"ERROR_REASON_UNDEFINED". Not very useful... this is the default that will be given when something calls close() on the server without specifying a reason.

the client is

root@noki-test:~# uname -a
Linux noki-test 5.4.238 #0 SMP Mon Apr 17 13:15:36 2023 x86_64 GNU/Linux

the broker is

root@b-bbb-bvpn:~# uname -a
Linux b-bbb-bvpn 5.10.176 #0 SMP Thu Apr 27 20:28:15 2023 x86_64 GNU/Linux

pmelange commented on July 22, 2024

I have the tunneldigger-broker running on OpenWrt (an x86-64 virtual machine) to test the latest commit to tunneldigger-broker, and so far we have 2 routers connecting to the broker.

How is that VM exposed to the outside such that the router can connect to it? In #126 (comment) an in-between NAT was mentioned. If that NAT screws up and doesn't properly map the ports back and forth then that could of course also explain the problem. Though I'm not sure how that should happen without screwing up all UDP completely... from the NAT perspective this is just two different clients talking to UDP port 10000 on the server, and it has to keep those data streams apart.

The broker is directly connected to the internet. What I mentioned about NAT referred to the client device.

Feel free to make a tunneldigger connection to b.bbb-vpn.berlin.freifunk.net:8943
(using 8943 for now during the test phase, later it will be switched to 8942)

pmelange commented on July 22, 2024

the client is

root@noki-test:~# uname -a
Linux noki-test 5.4.238 #0 SMP Mon Apr 17 13:15:36 2023 x86_64 GNU/Linux

the broker is

root@b-bbb-bvpn:~# uname -a
Linux b-bbb-bvpn 5.10.176 #0 SMP Thu Apr 27 20:28:15 2023 x86_64 GNU/Linux

And the client which is not constantly reconnecting is:

root@ptest:~# uname -a
Linux ptest 5.4.238 #0 Mon Apr 17 13:15:36 2023 ppc GNU/Linux

RalfJung commented on July 22, 2024

The broker's kernel is the relevant one here. 5.10 should be new enough...

kaechele commented on July 22, 2024

It should even be new enough to include the fix for another bug that existed with UDP and SO_REUSEPORT: https://kernel.dance/#69421bf98482d089e50799f45e48b25ce4a8d154

kaechele commented on July 22, 2024

FYI, I have updated #126 with the above information. The new minimum kernel that should work with the master branch is therefore 5.10.152.

pmelange commented on July 22, 2024

FWIW we have a big setup with 4 servers with 150 clients each, and I don't see a lot of these issues. (I can't say for sure none of this happens since how would I know, but we don't get user complaints en masse.) So the fact that you have servers that are unaffected doesn't say much.

sudo journalctl -u tunneldigger.service | grep Clos

will show you how long the clients stayed connected. I would be suspicious of anything less than 100 seconds, but mostly I'm getting closings in the 5 to 30 second range.
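
To get a quick overview of the durations, something like this works (assuming the "Closing tunnel ... after N seconds" log format shown later in this thread):

sudo journalctl -u tunneldigger.service | grep Clos | grep -o 'after [0-9]* seconds' | sort -n -k 2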

Also, how else can I debug this issue? Any pointers would be helpful. I would like to get the new broker online.

RalfJung commented on July 22, 2024

will show you how long the clients stayed connected.

Hm yes, I am seeing some clients that struggle, and others that last very long. Interesting.

Also, how else can I debug this issue? Any pointers would be helpful. I would like to get the new broker online.

I think a good first step would be trying to figure out why the server is closing the tunnel. So for instance you could modify this function

def close(self, reason=protocol.ERROR_REASON_UNDEFINED):

to raise an exception if the reason is ERROR_REASON_UNDEFINED. Then the stack trace will tell us where that close call is coming from.
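
Roughly like this (an untested sketch; the exact context in tunnel.py may differ):

--- a/broker/src/tunneldigger_broker/tunnel.py
+++ b/broker/src/tunneldigger_broker/tunnel.py
@@ class Tunnel
     def close(self, reason=protocol.ERROR_REASON_UNDEFINED):
+        if reason == protocol.ERROR_REASON_UNDEFINED:
+            # Debugging aid: crash so the traceback shows the caller.
+            raise RuntimeError("close() called without an explicit reason")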

pmelange commented on July 22, 2024

root@b-bbb-bvpn:/usr/lib/python3.10/site-packages/tunneldigger_broker# /usr/bin/python -m tunneldigger_broker.main /var/etc/tunneldigger-broker.cfg
[INFO/tunneldigger.broker] Initializing the tunneldigger broker.
[INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/setup' for hook 'session.up'.
[INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/teardown' for hook 'session.pre-down'.
[INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/mtu-changed' for hook 'session.mtu-changed'.
[INFO/tunneldigger.broker] Maximum number of tunnels is 1024.
[INFO/tunneldigger.broker] Tunnel identifier base is 100.
[INFO/tunneldigger.broker] Listening on 77.87.49.11:8943.
[INFO/tunneldigger.broker] Broker initialized.
[INFO/tunneldigger.broker] Creating tunnel (b6:01:be:53:f0:d5:71:a2:3a:35) with id 100.
[INFO/tunneldigger.tunnel] Set tunnel 100 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 101.
[INFO/tunneldigger.tunnel] Set tunnel 101 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.tunnel] Closing tunnel 101 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 6 seconds (reason=0x5)
[INFO/tunneldigger.broker] Shutting down tunneldigger broker.
[INFO/tunneldigger.tunnel] Closing tunnel 100 (b6:01:be:53:f0:d5:71:a2:3a:35) after 8 seconds (reason=0x5)
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/broker.py", line 203, in close
    tunnel.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 247, in close
    raise
RuntimeError: No active exception to reraise
[INFO/tunneldigger.tunnel] Closing tunnel 101 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 6 seconds (reason=0x5)
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/broker.py", line 203, in close
    tunnel.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 247, in close
    raise
RuntimeError: No active exception to reraise
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 247, in close
    raise
RuntimeError: No active exception to reraise

I put the raise at the very beginning of the function, right after the logger call. If that's too aggressive, let me know.

RalfJung commented on July 22, 2024

Ah... something is catching the exception and then initiating a broker shutdown. And we don't get the original stacktrace somehow. Dang.

Is there a way in Python to just print a stack trace without raising an exception?
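
Something like the standard traceback module might do it:

import traceback
traceback.print_stack()  # writes the current call stack to stderr, no exception needed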

pmelange commented on July 22, 2024

I'm not a Python user...

I changed the raise to traceback.print_stack()

root@b-bbb-bvpn:/usr/lib/python3.10/site-packages/tunneldigger_broker# /usr/bin/python -m tunneldigger_broker.main /var/etc/tunneldigger-broker.cfg
[INFO/tunneldigger.broker] Initializing the tunneldigger broker.
[INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/setup' for hook 'session.up'.
[INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/teardown' for hook 'session.pre-down'.
[INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/mtu-changed' for hook 'session.mtu-changed'.
[INFO/tunneldigger.broker] Maximum number of tunnels is 1024.
[INFO/tunneldigger.broker] Tunnel identifier base is 100.
[INFO/tunneldigger.broker] Listening on 77.87.49.11:8943.
[INFO/tunneldigger.broker] Broker initialized.
[INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 100.
[INFO/tunneldigger.tunnel] Set tunnel 100 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.broker] Creating tunnel (b6:01:be:53:f0:d5:71:a2:3a:35) with id 101.
[INFO/tunneldigger.tunnel] Set tunnel 101 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.tunnel] Closing tunnel 100 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 12 seconds (reason=0x5)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 248, in close
    traceback.print_stack()
[INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
[INFO/tunneldigger.tunnel] Set tunnel 101 MTU to 1438.
[INFO/tunneldigger.hooks] Running hook 'session.mtu-changed' via script '/usr/lib/tunneldigger-broker/hooks/mtu-changed'.
[INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 102.
[INFO/tunneldigger.tunnel] Set tunnel 102 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.tunnel] Closing tunnel 102 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 12 seconds (reason=0x5)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 248, in close
    traceback.print_stack()
[INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.

It's basically the same trace as before. It's now line 248 in tunnel.py because I had to add an 'import traceback' above.

RalfJung commented on July 22, 2024

Oh, I guess you just wrote a bare raise in the code? That re-raises the current exception rather than raising a new one, so it leads to confusing errors. But yeah, print_stack sounds good. Here's the diff you should apply:

--- a/broker/src/tunneldigger_broker/tunnel.py
+++ b/broker/src/tunneldigger_broker/tunnel.py
@@ -244,6 +244,8 @@ class Tunnel(protocol.HandshakeProtocolMixin, network.Pollable):
         """
 
         logger.info("Closing tunnel {} ({}) after {} seconds (reason=0x{:x})".format(self.tunnel_id, self.uuid, int(time.time() - self.created_time), reason))
+        import traceback
+        traceback.print_stack()
 
         # Run pre-down hook.
         self.broker.hook_manager.run_hook(

RalfJung commented on July 22, 2024

Okay so this is where close gets called:

elif event & select.EPOLLERR or event & select.EPOLLHUP:
    pollable.close()

epoll is telling us that the socket went bust, so we're closing the socket. Strange. Sounds like this is related to #148. That PR was a bugfix for #143.
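
For reference, a toy sketch of how such an epoll loop sees these events (Linux-only standalone illustration, not the broker code):

import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))

poller = select.epoll()
poller.register(sock.fileno(), select.EPOLLIN)  # EPOLLERR/EPOLLHUP are reported regardless

for fd, event in poller.poll(timeout=1):
    if event & select.EPOLLIN:
        data, addr = sock.recvfrom(2048)  # a normal datagram arrived
    elif event & (select.EPOLLERR | select.EPOLLHUP):
        sock.close()  # the socket went bust; this is where the broker calls pollable.close()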

pmelange commented on July 22, 2024

Should I separate the EPOLLERR from the EPOLLHUP so we can see which we are dealing with?

pmelange commented on July 22, 2024

I did the separation...

[INFO/tunneldigger.broker] Broker initialized.
[INFO/tunneldigger.broker] Creating tunnel (b6:01:be:53:f0:d5:71:a2:3a:35) with id 100.
[INFO/tunneldigger.tunnel] Set tunnel 100 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.broker] Rejecting tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) due to global rate limiting: last tunnel was created too recently
[WARNING/tunneldigger.protocol] Failed to create tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) while processing prepare request.
[INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 101.
[INFO/tunneldigger.tunnel] Set tunnel 101 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.tunnel] Set tunnel 100 MTU to 1438.
[INFO/tunneldigger.hooks] Running hook 'session.mtu-changed' via script '/usr/lib/tunneldigger-broker/hooks/mtu-changed'.
[INFO/tunneldigger.tunnel] Closing tunnel 101 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 26 seconds (reason=0x5)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 248, in close
    traceback.print_stack()
[INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
[INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 102.
[INFO/tunneldigger.tunnel] Set tunnel 102 MTU to 1446.
[INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
[INFO/tunneldigger.tunnel] Closing tunnel 102 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 9 seconds (reason=0x5)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
    event_loop.start()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 61, in start
    pollable.close()
  File "/usr/lib/python3.10/site-packages/tunneldigger_broker/tunnel.py", line 248, in close
    traceback.print_stack()
[INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.

eventloop.py line 61 is the EPOLLERR branch:

60                    elif event & select.EPOLLERR:
61                        pollable.close()
62                    elif event & select.EPOLLHUP:
63                        pollable.close()

RalfJung commented on July 22, 2024

Right, so the socket closed in an unexpected way. Why the socket got closed... no idea; not sure if we can get the kernel to tell us a reason.

RalfJung commented on July 22, 2024

Can you capture a packet trace of the communication between that one client and the broker (as seen from the broker)?
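
For example, on the broker, something along these lines (interface name and client IP are placeholders):

tcpdump -ni eth0 -w dump.pcap 'udp port 8943 and host <client-ip>'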

pmelange commented on July 22, 2024

dump.pcap.gz

Here is a capture of 2 full sessions including the time up to when the client tries to reconnect.

I hope it brings some new info.

pmelange commented on July 22, 2024

[Screenshot attachments: Screenshot_2023-08-15_23-41-05, Screenshot_2023-08-15_23-41-26]

Might be interesting. The timing is pretty close. The first connection was 14 sec, the second 22.

RalfJung commented on July 22, 2024

Yeah that looks like an MTU problem. Can you try decreasing the MTU inside the network?

However, I didn't think it would just cut the connection then... does l2tp not do fragmentation?

mitar commented on July 22, 2024

No, l2tp does not do fragmentation; it is UDP based. This is why we have max packet size discovery in Tunneldigger. Not sure how you are running it, but we have preset several possible packet sizes and Tunneldigger picks the largest one that still works.
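
From memory, the discovery works roughly like this (a sketch; the candidate sizes other than 1438/1446 and the ack handling are assumptions, not the actual protocol):

import socket

CANDIDATE_SIZES = [1334, 1400, 1438, 1446, 1476, 1492]

def probe_pmtu(sock, addr):
    best = 0
    sock.settimeout(1.0)
    for size in CANDIDATE_SIZES:
        try:
            sock.sendto(b"\x00" * size, addr)  # zero-padded probe packet
            data, _ = sock.recvfrom(2048)      # assumed: the peer echoes probes it received
        except OSError:
            continue                           # dropped or locally rejected: too big for the path
        best = max(best, size)
    return best                                # largest size that made it through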

RalfJung commented on July 22, 2024

But is it really expected that the entire connection dies when a too-big packet is sent inside the connection?

mitar commented on July 22, 2024

Hm, the tunnel should not die. But TCP connections inside the tunnel might have issues (trying to resend, unsuccessfully). And if there is one too-big packet, there are probably more?

RalfJung commented on July 22, 2024

What seems to be happening is that the kernel sends us an EPOLLERR when a packet that went through the tunnel leads to an outside-the-tunnel "ICMP fragmentation needed" packet. tunneldigger used to entirely ignore EPOLLERR, but that caused #143. Now tunneldigger closes the connection on EPOLLERR, and that means such a "fragmentation needed" packet closes the connection. At least that's my current hypothesis.

Though it is also unclear what to do instead. We can drop the packet, but that will lead to timeouts. Ideally we'd generate an ICMP-fragmentation-needed inside the tunnel, but that's really something the kernel would have to do...
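
If we wanted to tell the cases apart, the pending error can at least be read off the socket; a sketch (that an ICMP fragmentation-needed surfaces as EMSGSIZE is my assumption):

import errno
import socket

def pending_error(sock):
    # getsockopt(SO_ERROR) fetches and clears the queued socket error.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err == 0:
        return "no pending error"
    if err == errno.EMSGSIZE:
        return "transient: datagram exceeded the path MTU"
    return "socket error: {}".format(errno.errorcode.get(err, err))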

pmelange commented on July 22, 2024

I prefer to have pmtu=0 in the config so that on the broker I can have a bridge for each MTU size. This works flawlessly with c50ef46.

As a test, I set pmtu=1446 and also pmtu=1438. In both cases the tunnel stays up for over 5 minutes.

Perhaps now that NAT is removed, the ICMP message from the client is not being processed in the same way that it was before?

But on the other hand, if I change the MTU on the WAN port of my stable client (with pmtu=0 on the broker), the PMTU feature works totally fine.

mitar commented on July 22, 2024

It has been ages since I have done any work around tunneldigger, so feel free to ignore my thoughts. But I think there might then be a mismatch with the MTU set on the inside/tunnel interface?

that on the broker I can have a bridge for each MTU size.

I think we have it like that as well. I think we also do not use the kernel's PMTU discovery, but the one from Tunneldigger (if I remember correctly). I am not sure anymore how we decide which interface to connect a client to, though.

Anyway, you two seem to have much more recent knowledge about all this, so I will leave you both to it. :-)

RalfJung commented on July 22, 2024

The message that is too big is also a strange one -- it is supposedly a control message, and filled up with lots of 0s!

msg_type is CONTROL_TYPE_PMTUD. Ah, so the PMTU probe is itself killing the connection... great.

RalfJung commented on July 22, 2024

Perhaps now that NAT is removed, the ICMP message from the client is not being processed in the same way that it was before?

I think this has nothing to do with the NAT patch; it is about #148. You could try commenting out these two lines:

elif event & select.EPOLLERR or event & select.EPOLLHUP:
    pollable.close()

However... just ignoring EPOLLERR led to problems in the past, so this is not a proper fix. But it would still be informative to see whether this helps.

pmelange commented on July 22, 2024

You could try commenting out these two lines

elif event & select.EPOLLERR or event & select.EPOLLHUP:
    pollable.close()

However... just ignoring EPOLLERR led to problems in the past, so this is not a proper fix. But it would still be informative to see whether this helps.

With the lines commented out, the problem client stays connected and successfully changes MTU and its associated bridge.

RalfJung commented on July 22, 2024

Okay, we learned something!

So... EPOLLERR can indicate a permanent error, as in #143, or a transient error, as in "the control packet you just wanted to send was too big". Dang.

pmelange commented on July 22, 2024

Dang

RalfJung commented on July 22, 2024

Could you try this branch? https://github.com/wlanslovenija/tunneldigger/tree/pmtu-fix
It should then ignore the first few EPOLLERR, and that count resets when a message is received (such as the keepalive). Hopefully that is enough to handle #143 while still also fixing your PMTU issues...
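
The idea, sketched (names and the threshold here are made up; only the reset-on-receive rule is the actual behaviour):

ERROR_BUDGET = 5

class Pollable:
    def __init__(self):
        self.error_count = 0

    def message_received(self, data):
        self.error_count = 0       # live traffic proves the socket still works

    def epoll_error(self):
        self.error_count += 1
        if self.error_count > ERROR_BUDGET:
            self.close()           # persistent errors: treat as a real failure

    def close(self):
        pass                       # tear the tunnel down (elided)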

FWIW we are not using PMTU discovery in our network so the short-lived clients for us must have a different reason. 🤷

pmelange commented on July 22, 2024

Wed Aug 16 11:39:06 2023 daemon.err python[4098]: [INFO/tunneldigger.broker] Shutting down tunneldigger broker.
Wed Aug 16 11:39:06 2023 daemon.err python[4098]: [INFO/tunneldigger.tunnel] Closing tunnel 101 (b6:01:be:53:f0:d5:71:a2:3a:35) after 20 seconds (reason=0x5)
Wed Aug 16 11:39:06 2023 daemon.err python[4098]: [INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.843694] br-digger1438: port 1(l2tp101-101) entered blocking state
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.843697] br-digger1438: port 1(l2tp101-101) entered disabled state
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.843779] device l2tp101-101 entered promiscuous mode
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.843829] br-digger1438: port 1(l2tp101-101) entered blocking state
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.843831] br-digger1438: port 1(l2tp101-101) entered forwarding state
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.848972] device l2tp101-101 left promiscuous mode
Wed Aug 16 11:39:06 2023 kern.info kernel: [497564.848994] br-digger1438: port 1(l2tp101-101) entered disabled state
Wed Aug 16 11:39:06 2023 daemon.notice netifd: bridge 'br-digger1438' link is down
Wed Aug 16 11:39:06 2023 daemon.notice netifd: Interface 'digger1438' has link connectivity loss
Wed Aug 16 11:39:06 2023 daemon.info olsrd[2611]: Removing interface br-digger1438
Wed Aug 16 11:39:06 2023 daemon.err python[4098]: Traceback (most recent call last):
Wed Aug 16 11:39:06 2023 daemon.err python[4098]:   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
Wed Aug 16 11:39:06 2023 daemon.err python[4098]:   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
Wed Aug 16 11:39:06 2023 daemon.err python[4098]:   File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 115, in <module>
Wed Aug 16 11:39:06 2023 daemon.err python[4098]:     event_loop.start()
Wed Aug 16 11:39:06 2023 daemon.err python[4098]:   File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 76, in start
Wed Aug 16 11:39:06 2023 daemon.err python[4098]:     pollable.close(reason=protocol.ERROR_REASON_FAILURE)
Wed Aug 16 11:39:06 2023 daemon.err python[4098]: TypeError: HookProcess.close() got an unexpected keyword argument 'reason'

pmelange commented on July 22, 2024

@RalfJung I sent you a private email yesterday to the address listed on your webpage. Did you get a chance to read it?

pmelange commented on July 22, 2024

Also, my stable client is losing its connection frequently with the above patch:

Wed Aug 16 11:47:57 2023 daemon.err python[4288]: [INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 106.
Wed Aug 16 11:47:57 2023 daemon.err python[4288]: [INFO/tunneldigger.tunnel] Set tunnel 106 MTU to 1446.
Wed Aug 16 11:47:57 2023 daemon.err python[4288]: [INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
Wed Aug 16 11:47:57 2023 daemon.notice netifd: bridge 'br-digger1446' link is up
Wed Aug 16 11:47:57 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/send_redirects
Wed Aug 16 11:47:57 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity
Wed Aug 16 11:47:57 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/rp_filter
Wed Aug 16 11:47:57 2023 daemon.info olsrd[2611]: Adding interface br-digger1446
Wed Aug 16 11:47:57 2023 kern.info kernel: [498096.085857] br-digger1446: port 1(l2tp106-106) entered blocking state
Wed Aug 16 11:47:57 2023 kern.info kernel: [498096.085859] br-digger1446: port 1(l2tp106-106) entered disabled state
Wed Aug 16 11:47:57 2023 kern.info kernel: [498096.085897] device l2tp106-106 entered promiscuous mode
Wed Aug 16 11:47:57 2023 kern.info kernel: [498096.085926] br-digger1446: port 1(l2tp106-106) entered blocking state
Wed Aug 16 11:47:57 2023 kern.info kernel: [498096.085927] br-digger1446: port 1(l2tp106-106) entered forwarding state
Wed Aug 16 11:48:09 2023 daemon.err python[4288]: [INFO/tunneldigger.tunnel] Closing tunnel 106 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 12 seconds (reason=0x4)
Wed Aug 16 11:48:09 2023 daemon.err python[4288]: [INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
Wed Aug 16 11:48:09 2023 kern.info kernel: [498108.098874] device l2tp106-106 left promiscuous mode
Wed Aug 16 11:48:09 2023 kern.info kernel: [498108.098911] br-digger1446: port 1(l2tp106-106) entered disabled state
Wed Aug 16 11:48:09 2023 daemon.notice netifd: bridge 'br-digger1446' link is down
Wed Aug 16 11:48:09 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity loss
Wed Aug 16 11:48:09 2023 daemon.info olsrd[2611]: Removing interface br-digger1446

RalfJung commented on July 22, 2024

HookProcess.close() got an unexpected keyword argument 'reason'

Did you completely switch to the branch? That function does have that argument...
Don't just cherry-pick one file. You need to be on the branch entirely.

pmelange commented on July 22, 2024

HookProcess.close() got an unexpected keyword argument 'reason'

Did you completely switch to the branch? That function does have that argument...

I copied https://raw.githubusercontent.com/wlanslovenija/tunneldigger/51be74a18ae7f0f64a8854ed6023646a338e5cb5/broker/src/tunneldigger_broker/eventloop.py directly into the running system. I will do a fresh install of the Python code now.

RalfJung commented on July 22, 2024

@RalfJung I sent you a private email yesterday to the address listed on your webpage. Did you get a chance to read it?

I got the mail, yeah. But I don't have the setup to connect a client to your server, so server SSH access isn't really helpful, sorry. But thanks for the offer!

pmelange commented on July 22, 2024

With a completely new checkout...

Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Initializing the tunneldigger broker.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/setup' for hook 'session.up'.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/teardown' for hook 'session.pre-down'.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Registered script '/usr/lib/tunneldigger-broker/hooks/mtu-changed' for hook 'session.mtu-changed'.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Maximum number of tunnels is 1024.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Tunnel identifier base is 100.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Listening on 77.87.49.11:8943.
Wed Aug 16 11:56:17 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Broker initialized.
Wed Aug 16 11:56:22 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 100.
Wed Aug 16 11:56:22 2023 daemon.err python[4795]: [INFO/tunneldigger.tunnel] Set tunnel 100 MTU to 1446.
Wed Aug 16 11:56:22 2023 daemon.err python[4795]: [INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
Wed Aug 16 11:56:22 2023 daemon.notice netifd: bridge 'br-digger1446' link is up
Wed Aug 16 11:56:22 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity
Wed Aug 16 11:56:22 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/send_redirects
Wed Aug 16 11:56:22 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/rp_filter
Wed Aug 16 11:56:22 2023 daemon.info olsrd[2611]: Adding interface br-digger1446
Wed Aug 16 11:56:22 2023 kern.info kernel: [498600.767594] br-digger1446: port 1(l2tp100-100) entered blocking state
Wed Aug 16 11:56:22 2023 kern.info kernel: [498600.767598] br-digger1446: port 1(l2tp100-100) entered disabled state
Wed Aug 16 11:56:22 2023 kern.info kernel: [498600.767648] device l2tp100-100 entered promiscuous mode
Wed Aug 16 11:56:22 2023 kern.info kernel: [498600.767691] br-digger1446: port 1(l2tp100-100) entered blocking state
Wed Aug 16 11:56:22 2023 kern.info kernel: [498600.767693] br-digger1446: port 1(l2tp100-100) entered forwarding state
Wed Aug 16 11:56:38 2023 daemon.err python[4795]: [INFO/tunneldigger.tunnel] Closing tunnel 100 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 16 seconds (reason=0x4)
Wed Aug 16 11:56:38 2023 daemon.err python[4795]: [INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
Wed Aug 16 11:56:38 2023 kern.info kernel: [498616.779932] device l2tp100-100 left promiscuous mode
Wed Aug 16 11:56:38 2023 kern.info kernel: [498616.779978] br-digger1446: port 1(l2tp100-100) entered disabled state
Wed Aug 16 11:56:38 2023 daemon.info olsrd[2611]: Removing interface br-digger1446
Wed Aug 16 11:56:38 2023 daemon.notice netifd: bridge 'br-digger1446' link is down
Wed Aug 16 11:56:38 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity loss
Wed Aug 16 11:56:47 2023 daemon.err python[4795]: [INFO/tunneldigger.broker] Creating tunnel (b6:01:be:53:f0:d5:71:a2:3a:35) with id 101.
Wed Aug 16 11:56:47 2023 daemon.err python[4795]: [INFO/tunneldigger.tunnel] Set tunnel 101 MTU to 1446.
Wed Aug 16 11:56:47 2023 daemon.err python[4795]: [INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
Wed Aug 16 11:56:47 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/send_redirects
Wed Aug 16 11:56:47 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/rp_filter
Wed Aug 16 11:56:47 2023 daemon.info olsrd[2611]: Adding interface br-digger1446
Wed Aug 16 11:56:47 2023 daemon.notice netifd: bridge 'br-digger1446' link is up
Wed Aug 16 11:56:47 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity
Wed Aug 16 11:56:48 2023 kern.info kernel: [498626.235669] br-digger1446: port 1(l2tp101-101) entered blocking state
Wed Aug 16 11:56:48 2023 kern.info kernel: [498626.235673] br-digger1446: port 1(l2tp101-101) entered disabled state
Wed Aug 16 11:56:48 2023 kern.info kernel: [498626.235719] device l2tp101-101 entered promiscuous mode
Wed Aug 16 11:56:48 2023 kern.info kernel: [498626.235757] br-digger1446: port 1(l2tp101-101) entered blocking state
Wed Aug 16 11:56:48 2023 kern.info kernel: [498626.235758] br-digger1446: port 1(l2tp101-101) entered forwarding state
Wed Aug 16 11:57:07 2023 daemon.err python[4795]: [INFO/tunneldigger.tunnel] Set tunnel 101 MTU to 1438.
Wed Aug 16 11:57:07 2023 daemon.err python[4795]: [INFO/tunneldigger.hooks] Running hook 'session.mtu-changed' via script '/usr/lib/tunneldigger-broker/hooks/mtu-changed'.
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.244150] device l2tp101-101 left promiscuous mode
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.244177] br-digger1446: port 1(l2tp101-101) entered disabled state
Wed Aug 16 11:57:08 2023 daemon.notice netifd: bridge 'br-digger1446' link is down
Wed Aug 16 11:57:08 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity loss
Wed Aug 16 11:57:08 2023 daemon.info olsrd[2611]: Removing interface br-digger1446
Wed Aug 16 11:57:08 2023 daemon.notice netifd: bridge 'br-digger1438' link is up
Wed Aug 16 11:57:08 2023 daemon.notice netifd: Interface 'digger1438' has link connectivity
Wed Aug 16 11:57:08 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1438/send_redirects
Wed Aug 16 11:57:08 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1438/rp_filter
Wed Aug 16 11:57:08 2023 daemon.info olsrd[2611]: Adding interface br-digger1438
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.283740] br-digger1438: port 1(l2tp101-101) entered blocking state
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.283743] br-digger1438: port 1(l2tp101-101) entered disabled state
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.283835] device l2tp101-101 entered promiscuous mode
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.283886] br-digger1438: port 1(l2tp101-101) entered blocking state
Wed Aug 16 11:57:08 2023 kern.info kernel: [498646.283888] br-digger1438: port 1(l2tp101-101) entered forwarding state
Wed Aug 16 11:57:40 2023 daemon.err python[4795]: [INFO/tunneldigger.tunnel] Set tunnel 102 MTU to 1446.
Wed Aug 16 11:57:40 2023 daemon.err python[4795]: [INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
Wed Aug 16 11:57:40 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/send_redirects
Wed Aug 16 11:57:40 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/rp_filter
Wed Aug 16 11:57:40 2023 daemon.info olsrd[2611]: Adding interface br-digger1446
Wed Aug 16 11:57:40 2023 daemon.notice netifd: bridge 'br-digger1446' link is up
Wed Aug 16 11:57:40 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity
Wed Aug 16 11:57:40 2023 kern.info kernel: [498678.901008] br-digger1446: port 1(l2tp102-102) entered blocking state
Wed Aug 16 11:57:40 2023 kern.info kernel: [498678.901012] br-digger1446: port 1(l2tp102-102) entered disabled state
Wed Aug 16 11:57:40 2023 kern.info kernel: [498678.901061] device l2tp102-102 entered promiscuous mode
Wed Aug 16 11:57:40 2023 kern.info kernel: [498678.901100] br-digger1446: port 1(l2tp102-102) entered blocking state
Wed Aug 16 11:57:40 2023 kern.info kernel: [498678.901101] br-digger1446: port 1(l2tp102-102) entered forwarding state
Wed Aug 16 11:57:51 2023 authpriv.info dropbear[4907]: Child connection from 161.35.114.58:37058
Wed Aug 16 11:57:51 2023 authpriv.warn dropbear[4907]: Login attempt for nonexistent user
Wed Aug 16 11:57:51 2023 authpriv.info dropbear[4907]: Exit before auth from <161.35.114.58:37058>: Exited normally
Wed Aug 16 11:57:57 2023 daemon.err python[4795]: [INFO/tunneldigger.tunnel] Closing tunnel 102 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 17 seconds (reason=0x4)
Wed Aug 16 11:57:57 2023 daemon.err python[4795]: [INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
Wed Aug 16 11:57:57 2023 kern.info kernel: [498695.912741] device l2tp102-102 left promiscuous mode
Wed Aug 16 11:57:57 2023 kern.info kernel: [498695.912767] br-digger1446: port 1(l2tp102-102) entered disabled state
Wed Aug 16 11:57:57 2023 daemon.notice netifd: bridge 'br-digger1446' link is down
Wed Aug 16 11:57:57 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity loss
Wed Aug 16 11:57:57 2023 daemon.info olsrd[2611]: Removing interface br-digger1446

Note the reason=0x4 in the tunnel close messages above.


RalfJung avatar RalfJung commented on July 22, 2024

0x4 is the new 0x5, I changed the error on "connection closed due to epoll error" from ERROR_REASON_UNDEFINED to ERROR_REASON_FAILURE.

But why you would now see this for both clients, I cannot explain.
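
For readers following along, a minimal sketch of what that change amounts to in the broker's protocol module (the constant names are taken from this thread; the numeric values are inferred from the reason=0x4 log lines and the remark above, not checked against the source):

    # protocol.py (sketch, values inferred from this thread)
    ERROR_REASON_UNDEFINED = 0x05   # what the epoll-error close used to report
    ERROR_REASON_FAILURE   = 0x04   # what it reports now

    # before: pollable.close(reason=protocol.ERROR_REASON_UNDEFINED)
    # after:  pollable.close(reason=protocol.ERROR_REASON_FAILURE)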


pmelange avatar pmelange commented on July 22, 2024

@RalfJung I sent you a private email yesterday to the address listed on your webpage. Did you get a chance to read it?

I got the mail, yeah. But I don't have the setup to connect a client to your server, so server SSH access isn't really helpful, sorry. But thanks for the offer!

I don't think it's necessary for you to have your own client. You can just rely on the one stable client and the one unstable client which are already connecting. But if you aren't interested in SSH access, that's also fine.


pmelange avatar pmelange commented on July 22, 2024

0x4 is the new 0x5, I changed the error on "connection closed due to epoll error" from ERROR_REASON_UNDEFINED to ERROR_REASON_FAILURE.

But why you would now see this for both clients, I cannot explain.

I already updated my comment above. I wrote too quickly. The client with uuid ending in 9f:41 is the unstable client. The client with uuid ending in 3a:35 is the stable client.


RalfJung avatar RalfJung commented on July 22, 2024

Ah okay. So it doesn't make things worse. But it also doesn't help. Dang.


pmelange avatar pmelange commented on July 22, 2024

I did a tcpdump on the working client, which changes MTU successfully. The "Destination unreachable (Fragmentation needed)" packet is not coming in.

So I assume that the ICMP message is not coming from the problem client but instead from some router between the problem client and the broker.

Just trying to make sure the situation is clear.
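
For anyone who wants to reproduce the capture: a tcpdump filter along these lines shows only the ICMP "Destination unreachable / Fragmentation needed" messages (ICMP type 3, code 4); the interface name is a placeholder:

    tcpdump -ni eth0 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'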


pmelange avatar pmelange commented on July 22, 2024

And I just checked to make sure: there are no ICMP messages going out of the problem client.


pmelange avatar pmelange commented on July 22, 2024

I got it working!

                        self.error_count[fd] += 1                                                      
                        if self.error_count[fd] >= 10:                                                 
                            pollable.close(reason=protocol.ERROR_REASON_FAILURE)                       
                        pollable.read(file_object)                                                     

Read the object to get it out of the buffer??? Anyhow, the problem client is changing MTUs and not disconnecting after just a few seconds.


pmelange avatar pmelange commented on July 22, 2024

Probably read is not the right call to make, because after another test I got the following error:

Wed Aug 16 13:14:28 2023 daemon.err python[7737]: [ERROR/tunneldigger.broker] Unhandled top-level exception:                                                                                                                      
Wed Aug 16 13:14:28 2023 daemon.err python[7737]: [ERROR/tunneldigger.broker] Traceback (most recent call last):
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:   File "/usr/lib/python3.10/site-packages/tunneldigger_broker/main.py", line 116, in <module>
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:     event_loop.start()                                                                                                                                                          
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:   File "/usr/lib/python3.10/site-packages/tunneldigger_broker/eventloop.py", line 79, in start
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:     pollable.read(file_object)
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:   File "/usr/lib/python3.10/site-packages/tunneldigger_broker/hooks.py", line 90, in read
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:     data = file_object.read()       
Wed Aug 16 13:14:28 2023 daemon.err python[7737]: ValueError: read of closed file   
Wed Aug 16 13:14:28 2023 daemon.err python[7737]:                                                                

And on the broker, I'm getting the "Fragmentation needed" message 12 times. Is there a better way to flush the queue than to read?


pmelange avatar pmelange commented on July 22, 2024

Ok, so now I separated the ERR and HUP cases, and it seems to work reliably:

                                                                                                                                 
                    if event & select.EPOLLIN:                                                                                   
                        pollable.read(file_object)                                                                               
                        # we got a message here, reset error count -- if there was an error, it probably was transient           
                        self.error_count[fd] = 0                                                                                 
                    elif event & select.EPOLLHUP:                                                                                
                        pollable.close(reason=protocol.ERROR_REASON_FAILURE)                                                     
                    elif event & select.EPOLLERR:                                                                                
                        # Here we have a problem. This can indicate permanent connection failure                                 
                        # (https://github.com/wlanslovenija/tunneldigger/issues/143), or it can                                  
                        # indicate that we sent a packet that was too big (e.g. the PMTU probe                                   
                        # reply, see https://github.com/wlanslovenija/tunneldigger/issues/171).                                  
                        # To distinguish the two, we count how many consecutive errors we see without a proper `read` in between.
                        # If that reaches a threshold, we consider this error permanent and close the connection.
                        # PMTU discovery sends 6 probes, so 10 should be safe as a threshold.          
                        self.error_count[fd] += 1                                                      
                        if self.error_count[fd] >= 10:                                                 
                            pollable.close(reason=protocol.ERROR_REASON_FAILURE)                       
                        pollable.read(file_object)                                                     

After 6 connections, manually re-established by me, there are no "read of closed file" errors anymore.


RalfJung avatar RalfJung commented on July 22, 2024

Hm, interesting that that helps. We certainly shouldn't close-then-read. But I am surprised that read makes any difference when there's no EPOLLIN...


pmelange avatar pmelange commented on July 22, 2024

We certainly shouldn't close-then-read.

Very true. Reworked:

                        self.error_count[fd] += 1                                                                
                        pollable.read(file_object)                                           
                        if self.error_count[fd] >= 10:                      
                            pollable.close(reason=protocol.ERROR_REASON_FAILURE)                
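
Putting the pieces from this thread together, the dispatch then looks roughly like this (a sketch; the event loop scaffolding, the error_count bookkeeping, and the protocol module are assumed from the broker source):

    if event & select.EPOLLIN:
        pollable.read(file_object)
        # a successful read means any earlier error was transient
        self.error_count[fd] = 0
    elif event & select.EPOLLHUP:
        pollable.close(reason=protocol.ERROR_REASON_FAILURE)
    elif event & select.EPOLLERR:
        self.error_count[fd] += 1
        # read first: this surfaces (and clears) the pending socket error
        pollable.read(file_object)
        if self.error_count[fd] >= 10:
            pollable.close(reason=protocol.ERROR_REASON_FAILURE)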


pmelange avatar pmelange commented on July 22, 2024

I am surprised that read makes any difference when there's no EPOLLIN

I'm not really sure either. But I guess the error is something which can somehow be read, and which should anyhow be treated differently from normal IN data.


mitar avatar mitar commented on July 22, 2024

So once you get an EPOLLERR event, you can get the underlying errno using getsockopt: https://stackoverflow.com/questions/13150377/how-to-get-errno-when-epoll-wait-returns-epollerr

Maybe that could help determine when the EPOLLERR is something to ignore and when it is something to act upon?
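
A minimal sketch of that suggestion in Python (where to hook this into the event loop, and the sock object itself, are assumptions; getsockopt with SO_ERROR returns, and clears, the error pending on a socket):

    import errno
    import socket

    def pending_errno(sock):
        # fetch (and clear) the error currently pending on the socket
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

    # hypothetical use in the EPOLLERR branch:
    # err = pending_errno(sock)
    # if err == errno.EMSGSIZE:
    #     pass  # oversized PMTU probe reply: harmless, do not count it
    # elif err != 0:
    #     ...   # count towards the threshold, or close immediately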


RalfJung avatar RalfJung commented on July 22, 2024

We're doing a read on the socket now, which also seems to return that pending error.


mitar avatar mitar commented on July 22, 2024

Sure, but shouldn't we count based on the error type and not use read to log it?


RalfJung avatar RalfJung commented on July 22, 2024

Not sure how I would factor the error type into the counting...
Do you mean, entirely ignore the error if it's EMSGSIZE, not even add it to the count? I guess we could, though I doubt it is necessary.


mitar avatar mitar commented on July 22, 2024

Do you mean, entirely ignore the error if it's EMSGSIZE, not even add it to the count?

Yea, for example. So maybe ignore it if the error is EMSGSIZE, but otherwise immediately close and not even count?


RalfJung avatar RalfJung commented on July 22, 2024

Other errors can be transient too... I'm seeing a bunch of [Errno 111] Connection refused and at least some of these clients actually have somewhat working connections... just occasionally one of the packets gets rejected for some reason.


RalfJung avatar RalfJung commented on July 22, 2024

And some connections see just a few of these errors initially and then they work fine.


pmelange avatar pmelange commented on July 22, 2024

So, after leaving the system on overnight (commit fdbb096), I can report that everything seems to be running stably.

Thank you everyone for helping to squash this bug!


pmelange avatar pmelange commented on July 22, 2024

Just updated to d538780 and it seems to be working well.

Maybe it's a good time to update

    VERSION = '0.4.0.dev1'
and tag this? Once that is done, I would make a PR to openwrt-packages to pull in the latest changes.


RalfJung avatar RalfJung commented on July 22, 2024

I've never been involved in a release, but yeah, I'm all for making another one. @mitar, any objections? I'd say we should wait a week to see if unexpected issues come up in production (I deployed d538780 to all our servers last night).

@pmelange do you want to make a PR changing the setup.py version and, if necessary, completing the changelog in HISTORY.rst?


mitar avatar mitar commented on July 22, 2024

No objections.


mitar avatar mitar commented on July 22, 2024

But probably we should write up a good changelog, covering the whole fact that NAT is not necessary and all the other stuff since the last release. What all has changed since the last release?

No objections. Tell me if you need any permissions for anything.


pmelange avatar pmelange commented on July 22, 2024

Are we talking about going to 0.4.0 or 0.4.0-dev2?


mitar avatar mitar commented on July 22, 2024

We could go with 0.4.0-dev2 for now and then in a few weeks also do 0.4.0 if people test it, assuming people will actually test a released 0.4.0-dev2. But my feeling is that we do not have such a community of eager beta testers, so it might be easier to just go with 0.4.0, and then if bugs are reported we release a bugfix release and that is it.


pmelange avatar pmelange commented on July 22, 2024

Ok, then I will work on 0.4.0. BTW, there are a LOT of commits since 0.3.0 (Sun Apr 2 13:05:23 2017). It's going to take a while to go through all the commits and create a changelog :/

    git diff v0.3.0 --stat
    49 files changed, 1332 insertions(+), 1026 deletions(-)
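
One way to get an overview without reading every diff (standard git invocations, nothing project-specific):

    git log v0.3.0..HEAD --oneline --no-merges
    git shortlog v0.3.0..HEAD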


RalfJung avatar RalfJung commented on July 22, 2024

I thought 0.4.0-dev1 was just a name for "somewhere along the way to 0.4.0". I don't see the point of tagging a dev version (and there are no such tags in the repo).

