open-balena-vpn's Introduction

openBalena VPN

Description

openBalena VPN augments an OpenVPN server with the following components/features:

  • open-balena-connect-proxy is an HTTP CONNECT proxy that handles connections through the VPN to services on connected devices, used by external services such as balena-proxy
  • open-balena-vpn-api provides an internal API for handling authentication and tracking device state, and spawns openvpn server instances
  • haproxy used for balancing new connections between openvpn instances
  • libnss-openvpn is used to handle dns lookups of devices for connections via open-balena-connect-proxy

Networking

Networking is configured by a number of environmental variables:

  • VPN_GATEWAY (optional) dictates the server end of the p2p connection
  • VPN_BASE_SUBNET in CIDR notation is the entire subnet used for all servers
  • VPN_INSTANCE_SUBNET_BITMASK is the VLSM used to split VPN_BASE_SUBNET into per-instance subnets
  • VPN_BASE_PORT and VPN_BASE_MANAGEMENT_PORT are the base ports from which each instance's ports are calculated

Given a base subnet of 100.64.0.0/10 and a per-instance VLSM of 20, the first instance subnet would be 100.64.0.0/20, the second would be 100.64.16.0/20, and so forth up to 100.127.240.0/20 for the 1024th instance.
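
The subnet arithmetic above can be sketched in TypeScript as follows. This is an illustrative sketch rather than code from open-balena-vpn: the function names are hypothetical, and it assumes the base subnet is aligned to its prefix and that instance IDs are 1-indexed, as they are for ports.

```typescript
// Hypothetical helpers for illustration; not part of open-balena-vpn.

// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipToInt(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Convert an unsigned 32-bit number back to dotted-quad notation.
function intToIp(n: number): string {
  return [24, 16, 8, 0]
    .map((shift) => Math.floor(n / 2 ** shift) % 256)
    .join('.');
}

// Return the subnet (in CIDR notation) for a 1-indexed instance ID.
function instanceSubnet(
  baseSubnet: string,
  instanceBitmask: number,
  instanceId: number,
): string {
  const [baseIp, basePrefix] = baseSubnet.split('/');
  // Number of instance subnets that fit in the base subnet.
  const instanceCount = 2 ** (instanceBitmask - Number(basePrefix));
  if (instanceId < 1 || instanceId > instanceCount) {
    throw new RangeError(`instance ID must be between 1 and ${instanceCount}`);
  }
  // Number of addresses in each instance subnet.
  const subnetSize = 2 ** (32 - instanceBitmask);
  const network = ipToInt(baseIp) + (instanceId - 1) * subnetSize;
  return `${intToIp(network)}/${instanceBitmask}`;
}
```

With these definitions, instanceSubnet('100.64.0.0/10', 20, 2) evaluates to 100.64.16.0/20 and instanceSubnet('100.64.0.0/10', 20, 1024) to 100.127.240.0/20.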

If VPN_GATEWAY is not defined then the first usable address of the instance subnet will be used in its place. This address, and the second usable address, are used to facilitate the virtual p2p connections by openvpn.

The rest of the subnet, the third usable address to the last usable address, is used as a DHCP pool for devices.
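
As a rough illustration of the address layout described above, the sketch below derives the gateway, the second p2p address, and the DHCP pool bounds from an instance subnet. The function and field names are hypothetical, and the sketch assumes VPN_GATEWAY is unset so that the first usable address acts as the gateway.

```typescript
// Hypothetical helpers for illustration; not part of open-balena-vpn.

function ipToInt(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIp(n: number): string {
  return [24, 16, 8, 0]
    .map((shift) => Math.floor(n / 2 ** shift) % 256)
    .join('.');
}

interface InstanceLayout {
  gateway: string; // first usable address (used when VPN_GATEWAY is unset)
  peer: string; // second usable address, also used for the virtual p2p link
  poolStart: string; // third usable address, start of the DHCP pool
  poolEnd: string; // last usable address, end of the DHCP pool
}

function instanceLayout(subnet: string): InstanceLayout {
  const [networkIp, prefix] = subnet.split('/');
  const network = ipToInt(networkIp);
  // The broadcast address is the last address in the subnet and is not usable.
  const broadcast = network + 2 ** (32 - Number(prefix)) - 1;
  return {
    gateway: intToIp(network + 1),
    peer: intToIp(network + 2),
    poolStart: intToIp(network + 3),
    poolEnd: intToIp(broadcast - 1),
  };
}
```

For 100.64.0.0/20 this yields a gateway of 100.64.0.1 and a pool running from 100.64.0.3 to 100.64.15.254.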

Note that the DHCP pool size also dictates the maximum number of clients per process, so the maximum number of clients per server is max_clients_per_instance * VPN_INSTANCE_COUNT, not the size of the base subnet. A VLSM of 20 will allow for 4,094 clients per instance, and a base subnet of size /10 will allow for a total of 4,194,302 clients.

Base ports are incremented by the process instance ID (1-indexed) to calculate the port for that instance.
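
A minimal sketch of the port calculation, assuming both base ports are offset in the same way. The function name and the example port values in the usage note are hypothetical and are not defaults taken from the project.

```typescript
// Hypothetical helper for illustration; not part of open-balena-vpn.
interface InstancePorts {
  port: number;
  managementPort: number;
}

function instancePorts(
  basePort: number,
  baseManagementPort: number,
  instanceId: number,
): InstancePorts {
  // Instance IDs are 1-indexed, so the first instance uses base + 1.
  if (instanceId < 1 || Math.floor(instanceId) !== instanceId) {
    throw new RangeError('instance ID must be a positive integer');
  }
  return {
    port: basePort + instanceId,
    managementPort: baseManagementPort + instanceId,
  };
}
```

For example, with a base port of 10000 and a base management port of 20000 (illustrative values), instance 3 would listen on 10003 and expose its management interface on 20003.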

DNS

OpenVPN writes connected client information to /var/run/openvpn/server-${id}.status, which is interrogated by libnss-openvpn, allowing for lookup of connected device VPN addresses via UUID.
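
To illustrate the kind of lookup involved, the sketch below finds a device's virtual address in the text of a status file. It is not how libnss-openvpn is implemented (that is a separate native library). It assumes the file uses OpenVPN's status format version 1, with a ROUTING TABLE section, and that each client's common name is the device UUID. The function name is hypothetical.

```typescript
// Hypothetical helper for illustration; assumes OpenVPN status format version 1.
function lookupVirtualAddress(
  status: string,
  uuid: string,
): string | undefined {
  let inRoutingTable = false;
  for (const line of status.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === 'ROUTING TABLE') {
      inRoutingTable = true;
      continue;
    }
    if (trimmed === 'GLOBAL STATS') {
      inRoutingTable = false;
      continue;
    }
    if (!inRoutingTable) {
      continue;
    }
    // Routing table rows: Virtual Address,Common Name,Real Address,Last Ref
    const [virtualAddress, commonName] = trimmed.split(',');
    if (commonName === uuid) {
      return virtualAddress;
    }
  }
  return undefined;
}
```

The function takes the file contents as a string so that reading /var/run/openvpn/server-${id}.status is left to the caller.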

Client Authentication / State

VPN client authentication is initiated via an event from the vpn management console which proxies the credentials to the balena api which ultimately decides the fate of the client.

Accessing Clients

Connections to devices can be established via open-balena-connect-proxy, which exposes an HTTP CONNECT proxy server allowing access to devices via a hostname in the format {deviceUUID}.balena:{port}. The destination port is limited based on the requesting user and device configuration. The listening port is configured by the VPN_CONNECT_PROXY_PORT variable.
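
The target format can be illustrated with a small parser. This is a sketch, not the proxy's own code: the function name is hypothetical, and it assumes device UUIDs are hexadecimal strings, as they appear in the logs further down.

```typescript
// Hypothetical helper for illustration; not part of open-balena-connect-proxy.
interface ProxyTarget {
  uuid: string;
  port: number;
}

// Parse a CONNECT target of the form {deviceUUID}.balena:{port}.
function parseProxyTarget(target: string): ProxyTarget | undefined {
  const match = /^([0-9a-f]+)\.balena:(\d{1,5})$/i.exec(target);
  if (match === null) {
    return undefined;
  }
  const port = Number(match[2]);
  if (port < 1 || port > 65535) {
    return undefined;
  }
  return { uuid: match[1].toLowerCase(), port };
}
```

A client would send this target on the request line of a CONNECT request to the proxy's listening port. Whether the connection is then allowed depends on the port restrictions described above, which this sketch does not model.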

open-balena-vpn's Issues

WebUI reporting incorrect device connectivity status.

Multiple users have reported discrepancies between the connectivity status shown for their devices in the UI and the actual reachability of the devices.
The following cases have been reported so far:

  • Device is seen 'Online' on the UI but is in fact unreachable via webterminal and ssh. JF ticket here
  • Device is not seen 'Online' on the UI, but the openvpn client on the device in fact has an active connection with the VPN backend. JF ticket here

Thus there seems to be a disconnect between the real connectivity status of the device and what the API thinks it is.

To Reproduce
Steps to reproduce the behavior:
Only a small number of devices (relative to the total number of devices online) have encountered this problem. Recycling of the VPN backend k8s pods is the most likely cause. There is, however, no reliable way to reproduce the problem at will.

Devices with flaky connections appear "Online" but are unreachable

As mentioned in https://www.flowdock.com/app/rulemotion/resin-tech/threads/wdSU6cV3l2S2RgB9VL1ukrAb-V-,

I've noticed there are a number of devices that appear online, but upon trying to connect I get the following error:

โฏ balena ssh {{UUID}}
error: host error:
Connection to ssh.balena-devices.com closed.

According to https://github.com/balena-io/open-balena-vpn/blob/master/config/confd_env_backend/templates/server.conf.tmpl#L14, the openvpn keepalive behavior should be configured to 10/60 by default, so presumably these devices are pinging back just fine. Occasionally there will be out-of-order messages given the "Online" state:

21 Jan 2020 15:03:59 vpn[95]: debug: [vpn-124120] successfully updated state for device: uuid={{uuid}} worker_id=2 connected=true virtual_address={{vip}}
21 Jan 2020 15:43:00 vpn[98]: debug: [vpn-124118] successfully updated state for device: uuid={{uuid}} worker_id=3 connected=true virtual_address={{vip}}
21 Jan 2020 15:43:02 vpn[95]: debug: [vpn-124120] successfully updated state for device: uuid={{uuid}} worker_id=2 connected=false

Even pinging from the VPN instance directly shows slow networking:

--- {{uuid}}.vpn ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 64ms
rtt min/avg/max/mdev = 2365.627/2991.429/3526.436/482.054 ms, pipe 4

The devices affected are running balenaOS 2.2.0, fwiw.

Cloud erroneously reports VPN connection down

Describe the bug
A device may remain in HeartbeatOnly status even though it really has a VPN connection with open-balena-vpn [1]. Look for this sequence of messages in the server VPN logs for the device:

  1. Connected to worker X.1
  2. Connected to pod Y
  3. Quickly disconnected from pod Y
  4. Connected to worker X.1 again

There is a silent disconnect from worker X.1 between item 1 and item 4. There is likely a race condition in which the disconnect and connect events for that worker arrived in the wrong order.

This pattern is distinct from a related sequence where the server logs showed a "dropping oos disconnect" event. See #287 .

[1] At time of writing we have not confirmed that the device really has a VPN connection, but it seems likely.

To Reproduce
Verify that the device is connected to the VPN if possible.

Logs / Screenshots
A number in parentheses at the beginning of a line corresponds to the position in the sequence listed above.

17 Aug 2022 12:16:38.893 open-balena-vpn-5cf5d597b-djxcq vpn[551]: debug: [vpn-127199] successfully updated state for device: uuid=3d48dc2ce7eab29b063e38667417afce connected=true
17 Aug 2022 12:16:39.038 open-balena-vpn-5cf5d597b-djxcq vpn[716]: info: [vpn-127199.1] connection established with client_id=1701769 uuid=3d48dc2ce7eab29b063e38667417afce
17 Aug 2022 12:16:40.942 open-balena-vpn-5cf5d597b-djxcq vpn[551]: debug: [vpn-127199] successfully updated state for device: uuid=3d48dc2ce7eab29b063e38667417afce connected=false
(1) 17 Aug 2022 12:16:56.937 open-balena-vpn-5cf5d597b-74wz5 vpn[714]: info: [vpn-127211.1] connection established with client_id=1555092 uuid=3d48dc2ce7eab29b063e38667417afce
17 Aug 2022 12:16:57.182 open-balena-vpn-5cf5d597b-74wz5 vpn[549]: debug: [vpn-127211] successfully updated state for device: uuid=3d48dc2ce7eab29b063e38667417afce connected=true
17 Aug 2022 12:38:49.398 open-balena-vpn-5cf5d597b-djxcq vpn[716]: info: [vpn-127199.1] connection established with client_id=1724529 uuid=3d48dc2ce7eab29b063e38667417afce
(2) 17 Aug 2022 12:38:50.022 open-balena-vpn-5cf5d597b-djxcq vpn[551]: debug: [vpn-127199] successfully updated state for device: uuid=3d48dc2ce7eab29b063e38667417afce connected=true
(3) 17 Aug 2022 12:38:54.141 open-balena-vpn-5cf5d597b-djxcq vpn[551]: debug: [vpn-127199] successfully updated state for device: uuid=3d48dc2ce7eab29b063e38667417afce connected=false
(4) 17 Aug 2022 12:39:08.755 open-balena-vpn-5cf5d597b-74wz5 vpn[714]: info: [vpn-127211.1] connection established with client_id=1577933 uuid=3d48dc2ce7eab29b063e38667417afce
17 Aug 2022 13:01:36.929 open-balena-vpn-5cf5d597b-74wz5 vpn[714]: info: [vpn-127211.1] connection established with client_id=1600626 uuid=3d48dc2ce7eab29b063e38667417afce

merge systemd services

The vpn-api and connect-proxy services should be merged into a single systemd service so that state can be more easily shared between workers.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • Update balena/open-balena-base Docker tag to v14.9.3
  • Update dependency typescript to v5
  • 🔐 Create all rate-limited PRs at once 🔐

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

cargo
auth/Cargo.toml
  • ureq 2.4.0
dockerfile
Dockerfile
  • balena/open-balena-base v14.9.2
  • rust 1-bullseye
github-actions
.github/workflows/flowzone.yml
npm
package.json
  • @balena/env-parsing ^1.1.0
  • @balena/es-version ^1.0.1
  • @balena/node-metrics-gatherer ^6.0.3
  • @sentry/node ^7.12.0
  • bluebird ^3.7.2
  • compression ^1.7.4
  • event-stream ^4.0.1
  • eventemitter3 ^5.0.0
  • express ^4.18.1
  • lodash ^4.17.21
  • memoizee ^0.4.15
  • morgan ^1.10.0
  • netmask ^2.0.2
  • node-tunnel ^4.0.1
  • pinejs-client-request ^7.3.5
  • request ^2.88.2
  • request-promise ^4.2.6
  • telnet-client ^1.4.11
  • typed-error ^3.2.1
  • winston ^3.8.1
  • @balena/lint ^6.2.0
  • @types/bluebird ^3.5.36
  • @types/chai ^4.3.3
  • @types/chai-as-promised ^7.1.5
  • @types/compression ^1.7.2
  • @types/event-stream ^4.0.0
  • @types/express ^4.17.13
  • @types/lodash ^4.14.184
  • @types/memoizee ^0.4.8
  • @types/mocha ^10.0.0
  • @types/morgan ^1.9.3
  • @types/netmask ^1.0.30
  • @types/node ^18.11.7
  • @types/request-promise ^4.1.48
  • chai ^4.3.6
  • chai-as-promised ^7.1.1
  • husky ^8.0.1
  • lint-staged ^13.0.3
  • mocha ^10.0.0
  • nock ^13.2.9
  • openvpn-client 0.0.2
  • ts-node ^10.9.1
  • typescript ^4.8.2
  • node ^18.14.0
  • npm ^9.4.1
regex
Dockerfile
  • ncabatoff/process-exporter 0.7.10
Dockerfile
  • rust 1-bullseye as rust-builder

Upgrade base image version to patch vulnerabilities

Summary:

The currently used version of the open-balena-base image has the following vulnerabilities. Upgrading the version to the latest will solve them.

Issue Is:

+--------------+------------------+----------+------------------------------+------------------------------+---------------------------------------+
|   LIBRARY    | VULNERABILITY ID | SEVERITY |      INSTALLED VERSION       |        FIXED VERSION         |                 TITLE                 |
+--------------+------------------+----------+------------------------------+------------------------------+---------------------------------------+
| bind9-host   | CVE-2021-25215   | HIGH     | 1:9.11.5.P4+dfsg-5.1+deb10u3 | 1:9.11.5.P4+dfsg-5.1+deb10u5 | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+          +                              +                              +---------------------------------------+
| libbind9-161 | CVE-2021-25215   |          |                              |                              | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+          +                              +                              +---------------------------------------+
| libdns1104   | CVE-2021-25215   |          |                              |                              | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+          +                              +                              +---------------------------------------+
| libisc1100   | CVE-2021-25215   |          |                              |                              | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+          +                              +                              +---------------------------------------+
| libisccc161  | CVE-2021-25215   |          |                              |                              | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+          +                              +                              +---------------------------------------+
| libisccfg163 | CVE-2021-25215   |          |                              |                              | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+          +                              +                              +---------------------------------------+
| liblwres161  | CVE-2021-25215   |          |                              |                              | bind: An assertion check              |
|              |                  |          |                              |                              | can fail while answering              |
|              |                  |          |                              |                              | queries for DNAME records...          |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25215 |
+              +------------------+          +                              +                              +---------------------------------------+
|              | CVE-2021-25216   |          |                              |                              | bind: Vulnerability in                |
|              |                  |          |                              |                              | BIND's GSSAPI security policy         |
|              |                  |          |                              |                              | negotiation can be targeted by...     |
|              |                  |          |                              |                              | -->avd.aquasec.com/nvd/cve-2021-25216 |
+--------------+------------------+----------+------------------------------+------------------------------+---------------------------------------+

register as single service instance

As opposed to each worker registering as an individual service_instance, the master process should handle service registration and then make the service ID available to workers.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • Update balena/open-balena-base Docker tag to v17.0.9
  • Lock file maintenance
  • 🔐 Create all rate-limited PRs at once 🔐

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

cargo
auth/Cargo.toml
  • ureq 2.9.1
dockerfile
Dockerfile
  • balena/open-balena-base v17.0.4
  • rust 1-bookworm
github-actions
.github/workflows/flowzone.yml
npm
package.json
  • @balena/env-parsing ^1.1.10
  • @balena/es-version ^1.0.3
  • @balena/node-metrics-gatherer ^6.0.3
  • @sentry/node ^7.99.0
  • bluebird ^3.7.2
  • compression ^1.7.4
  • event-stream ^4.0.1
  • eventemitter3 ^5.0.1
  • express ^4.18.2
  • lodash ^4.17.21
  • memoizee ^0.4.15
  • morgan ^1.10.0
  • netmask ^2.0.2
  • node-tunnel ^4.0.1
  • pinejs-client-request ^7.4.0
  • request ^2.88.2
  • request-promise ^4.2.6
  • telnet-client ^1.4.11
  • typed-error ^3.2.2
  • winston ^3.11.0
  • @balena/lint ^7.3.0
  • @types/bluebird ^3.5.42
  • @types/chai ^4.3.11
  • @types/chai-as-promised ^7.1.8
  • @types/compression ^1.7.5
  • @types/event-stream ^4.0.5
  • @types/express ^4.17.21
  • @types/lodash ^4.14.202
  • @types/memoizee ^0.4.11
  • @types/mocha ^10.0.6
  • @types/morgan ^1.9.9
  • @types/netmask ^2.0.5
  • @types/node ^20.11.16
  • @types/request-promise ^4.1.51
  • chai ^4.4.1
  • chai-as-promised ^7.1.1
  • husky ^8.0.3
  • lint-staged ^15.2.1
  • mocha ^10.2.0
  • nock ^13.5.1
  • openvpn-client 0.0.2
  • ts-node ^10.9.2
  • typescript ^5.3.3
  • node ^21.6.1
  • npm ^10.4.0
regex
Dockerfile
  • debian_12/haproxy 2.6.12-1+deb12u1

Cloud erroneously reports VPN connection down after dropping "oos disconnect" event

Describe the bug
balenaCloud may report that a device is in Heartbeat Only status even though there is a connection to the device. This status may remain for hours or days while the device is connected to the VPN.

In the case reported here, the server logs show a series of events in short order:

  1. A device is connected to a pod A, worker B (A.B)
  2. The device loses the connection to (A.B) but the server does not recognize this loss yet
  3. The device connects to, and quickly disconnects from, another pod C, placing the device in Heartbeat Only status
  4. The device connects to pod A, worker D (A.D). However, this event does not put the device in Online state because pod A has not received the disconnect event, and thinks the device state already is Online.
  5. The server receives the disconnect event for pod/worker A.B, and disregards it, dropping the oos disconnect event
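
The drop rule in step 5 can be sketched as a small per-pod state tracker. This is a simplified illustration, not the project's implementation: the class and method names are hypothetical, and it only models one pod remembering which of its workers currently holds each device.

```typescript
// Hypothetical sketch for illustration; not the open-balena-vpn implementation.
type WorkerId = number;

class DeviceStateTracker {
  // Maps device UUID to the worker believed to hold its connection.
  private readonly workers: Record<string, WorkerId | undefined> = {};

  connect(uuid: string, worker: WorkerId): void {
    this.workers[uuid] = worker;
  }

  // Returns true if the disconnect was applied, or false if it was dropped
  // as out of sequence because another worker now holds the device.
  disconnect(uuid: string, worker: WorkerId): boolean {
    if (this.workers[uuid] !== worker) {
      return false;
    }
    delete this.workers[uuid];
    return true;
  }

  isConnected(uuid: string): boolean {
    return this.workers[uuid] !== undefined;
  }
}
```

In this model, a device that connects on worker 4 and later on worker 2 of the same pod stays marked as connected when the late disconnect for worker 4 arrives, because that event is dropped. What the sketch leaves out is the cross-pod case in step 4, where a reconnect to a pod that already believes the device is online does not produce a fresh Online report.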

To Reproduce
Verify that the device is online if possible.

Logs / Screenshots
The server VPN logs below provide an example.

11 Aug 2022 17:45:54.968 cloudlink-fc9c8bf9d-q4tnv vpn[549]: debug: [vpn-127188] successfully updated state for device: uuid=d012 connected=false
11 Aug 2022 17:46:03.701 cloudlink-fc9c8bf9d-q4tnv vpn[722]: info: [vpn-127188.4] connection established with client_id=458 uuid=d012
11 Aug 2022 17:48:28.417 cloudlink-fc9c8bf9d-php92 vpn[549]: debug: [vpn-127176] successfully updated state for device: uuid=d012 connected=true
11 Aug 2022 17:48:35.045 cloudlink-fc9c8bf9d-php92 vpn[733]: info: [vpn-127176.4] connection established with client_id=8349 uuid=d012
11 Aug 2022 19:09:19.739 cloudlink-fc9c8bf9d-sf5gt vpn[549]: debug: [vpn-127177] successfully updated state for device: uuid=d012 connected=true
11 Aug 2022 19:09:21.913 cloudlink-fc9c8bf9d-sf5gt vpn[549]: debug: [vpn-127177] successfully updated state for device: uuid=d012 connected=false
11 Aug 2022 19:09:30.991 cloudlink-fc9c8bf9d-sf5gt vpn[736]: info: [vpn-127177.1] connection established with client_id=19232 uuid=d012
11 Aug 2022 19:10:02.162 cloudlink-fc9c8bf9d-php92 vpn[731]: info: [vpn-127176.2] connection established with client_id=17537 uuid=d012
11 Aug 2022 19:10:08.162 cloudlink-fc9c8bf9d-php92 vpn[549]: warning: [vpn-127176] dropping oos disconnect event for uuid=d012 worker=4 (expected=2)
11 Aug 2022 19:40:30.565 cloudlink-fc9c8bf9d-q4tnv vpn[722]: info: [vpn-127188.4] disconnecting d012
11 Aug 2022 19:40:30.629 cloudlink-fc9c8bf9d-q4tnv vpn[722]: debug: [vpn-127188.4] successfully updated state for device: uuid=d012 connected=false

Should limit batching connectivity events to avoid too large POST bodies leading to 413 errors

See: https://www.flowdock.com/app/rulemotion/resin-tech/threads/OhMy8GplWPRfFES56zRO6-0WNh9
See:

const response: IncomingMessage = await request
  .post({
    url: `https://${BALENA_API_HOST}/services/vpn/client-${eventType}`,
    timeout: REQUEST_TIMEOUT,
    json: true,
    body: {
      serviceId,
      uuids,
      connected,
    },
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  .promise()
  .timeout(REQUEST_TIMEOUT);
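
One way to bound the request body is to split the uuids array into fixed-size batches and send one POST per batch. The helper below is a sketch of that idea only: the function name is hypothetical, and a suitable batch size would depend on the API's body size limit, which is not stated here.

```typescript
// Hypothetical helper for illustration; not part of open-balena-vpn.
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch returned by chunk(uuids, batchSize) would then be sent as the uuids field of its own request, keeping every body below a predictable size.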

Implement exponential backoff on VPN link failure to limit data usage

A user reported a device on a cellular connection that was using about "60MB per 3-4 minutes" (1GB per hour). We were unable to open a command terminal to either the host OS or the single app container, or to reboot the device via the web dashboard. Logentries for resin-vpn indicated roughly 4 VPN connection attempts per second:

04 Jun 2019 20:32:59.939 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:00.122 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:00.122 ab30ad763b7d open-balena[75]: info: [proxy] [worker-1] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:00.206 ab30ad763b7d open-balena[75]: info: [proxy] [worker-1] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:00.340 ab30ad763b7d open-balena[75]: info: [proxy] [worker-1] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:00.340 ab30ad763b7d open-balena[75]: info: [proxy] [worker-2] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:00.360 ab30ad763b7d open-balena[75]: info: [proxy] [worker-2] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:00.940 ab30ad763b7d open-balena[75]: info: [proxy] [worker-2] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:00.940 ab30ad763b7d open-balena[75]: info: [proxy] [worker-2] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:01.011 ab30ad763b7d open-balena[75]: info: [proxy] [worker-2] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:01.181 ab30ad763b7d open-balena[75]: info: [proxy] [worker-2] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:01.181 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:01.181 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:01.344 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:01.344 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:01.410 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:01.497 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:01.544 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:01.544 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:01.744 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:01.744 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] tunnel requested to device [device-uuid] on port 22222
04 Jun 2019 20:33:01.744 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] connecting to [device-uuid].vpn:22222
04 Jun 2019 20:33:01.872 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] forwarding tunnel request for [device-uuid]:22222 via 52.207.237.69
04 Jun 2019 20:33:01.872 ab30ad763b7d open-balena[75]: info: [proxy] [worker-3] tunnel requested to device [device-uuid] on port 22222

It is not clear whether these reconnection attempts explain the whole data usage, but there seems to be a case for some kind of capped exponential backoff of VPN connection failures, to limit data usage in such conditions.
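
A capped exponential backoff of the kind suggested here could compute its retry delay roughly as follows. This is a sketch under assumed parameters: the function name, the 1 second base, and the 60 second cap are hypothetical, and where such a delay would be applied (device client or proxy) is left open.

```typescript
// Hypothetical helper for illustration; base and cap values are assumptions.
// Delay before retry number `attempt` (0-indexed): the base delay doubles on
// each attempt and never exceeds the cap.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 60000,
): number {
  if (attempt < 0) {
    throw new RangeError('attempt must not be negative');
  }
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

With these values, retries would wait 1, 2, 4, 8, 16 and 32 seconds, then 60 seconds for every later attempt, instead of retrying several times per second. Adding random jitter to each delay is a common refinement to keep many devices from retrying in lockstep.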

CONNECT proxy should accept TLS / HTTPS connections

The balena CLI tunnel command (balena tunnel) uses the open-balena-vpn CONNECT proxy at vpn.balena-cloud.com:3128 with 'basic' HTTP authentication, over a plain unencrypted TCP connection. This is not secure enough: it is at risk of local network sniffing. The feature request is for TLS / SSL / HTTPS to be supported by the CONNECT proxy.

Customer report: https://forums.balena.io/t/balena-tunnel-connections-are-not-encrypted/191826/
Internal discussion: https://www.flowdock.com/app/rulemotion/resin-tech/threads/nOheKZaBwyMwC2PmzfPw0NxYEMl

Related balena-cli issue: balena-io/balena-cli#2042

Device in Heartbeat Only mode but VPN connected

Like the title states, a user is able to confirm that a VPN connection exists, but the balena dashboard reports the device is in Heartbeat Only mode. So the cloud thinks there is not a VPN connection.

In the log below, we see what looks like a race condition between VPN instances. The numbers in parentheses refer to the log lines below.

  • (1) the device is connected on 2022-12-04
  • (3,4) the device connects to VPN instance 127577
  • (2,5) the device connects to and disconnects from VPN instance 127568
  • (6) VPN instance 127568 recognizes the device as connected

Logs / Screenshots

(1) 2022-12-04 16:09:43 [854936.832432] vpn[718]: info: [vpn-127564.7] connection established with client_id=1506309 uuid=6ea096ec955ef0fff8eeedbe64316564
(2) 2022-12-05 05:18:17 [902248.477595] vpn[548]: debug: [vpn-127568] successfully updated state for device: uuid=6ea096ec955ef0fff8eeedbe64316564 connected=true
(3) 2022-12-05 05:18:32 [ 1150.154516] vpn[714]: info: [vpn-127577.2] connection established with client_id=3134 uuid=6ea096ec955ef0fff8eeedbe64316564
(4) 2022-12-05 05:18:33 [ 1150.471219] vpn[548]: debug: [vpn-127577] successfully updated state for device: uuid=6ea096ec955ef0fff8eeedbe64316564 connected=true
(5) 2022-12-05 05:18:33 [902264.764050] vpn[548]: debug: [vpn-127568] successfully updated state for device: uuid=6ea096ec955ef0fff8eeedbe64316564 connected=false
(6) 2022-12-05 05:18:47 [902278.515212] vpn[716]: info: [vpn-127568.4] connection established with client_id=1613412 uuid=6ea096ec955ef0fff8eeedbe64316564
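One possible reading of the suspected race can be sketched as follows. The model is an assumption, not the actual API implementation: it supposes a single per-device `connected` flag that is overwritten by whichever VPN instance reports last. `StateUpdate`, `applyUpdates`, and `finalState` are hypothetical names; the three updates replayed are the state updates from log lines (2), (4) and (5).

```typescript
// Hypothetical model: one shared "connected" flag per device, overwritten by
// whichever instance reports last. This is an assumption for illustration.
type StateUpdate = { instance: string; uuid: string; connected: boolean };

function applyUpdates(updates: StateUpdate[]): Record<string, boolean> {
	const state: Record<string, boolean> = {};
	for (const u of updates) {
		state[u.uuid] = u.connected; // last writer wins, regardless of instance
	}
	return state;
}

const uuid = '6ea096ec955ef0fff8eeedbe64316564';
const finalState = applyUpdates([
	{ instance: 'vpn-127568', uuid, connected: true }, // log line (2)
	{ instance: 'vpn-127577', uuid, connected: true }, // log line (4)
	{ instance: 'vpn-127568', uuid, connected: false }, // log line (5)
]);
console.log(finalState[uuid]); // false
```

Under this assumed model the last recorded state is `connected=false`, even though log line (6) shows a connection being established afterwards without a matching state update in the excerpt, which would be consistent with the dashboard showing Heartbeat Only.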

returned a non-zero code: 1

Hello, please help. I am getting this error when running balena push fleet-name. Can anyone help me with how to install the VPN on balenaOS?

Step 1/36 : FROM balena/open-balena-base:v13.3.1 as base
[Info] Still Working...
[main] ---> eabf902914f1
[main] Step 2/36 : FROM base as builder
[main] ---> eabf902914f1
[main] Step 3/36 : COPY package.json package-lock.json /usr/src/app/
[main] ---> 4dd7120289a5
[main] Step 4/36 : RUN npm ci && npm cache clean --force 2>/dev/null
[main] ---> Running in 241d2d772b8f
[main] standard_init_linux.go:211: exec user process caused "exec format error"
[main]
[main] Removing intermediate container 241d2d772b8f
[main] The command '/bin/sh -c npm ci && npm cache clean --force 2>/dev/null' returned a non-zero code: 1
[Info] Uploading images
[Success] Successfully uploaded images
[Error] Some services failed to build:
[Error] Service: main
[Error] Error: The command '/bin/sh -c npm ci && npm cache clean --force 2>/dev/null' returned a non-zero code: 1
[Info] Built on arm06
[Error] Not deploying release.
Remote build failed

VPN link to device down for hours or days following "dropping oos disconnect event" warning in backend logs

Update 2022-08-16
This issue likely refers to the same underlying cause as #287.

Device logs shared by a user (see linked thread):

 2022-03-18 21:46:05.399237+00 | openvpn                                    | Fri Mar 18 21:46:05 2022 OpenVPN 2.4.7 x86_64-poky-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Mar 15 2021
...
 2022-03-18 21:46:12.228184+00 | openvpn                                    | Fri Mar 18 21:46:12 2022 Initialization Sequence Completed

 2022-03-19 10:05:07.108315+00 | systemd-tmpfiles                           | [/etc/tmpfiles.d/openvpn.conf:1] Line references path below legacy directory /var/run/, updating /var/run/openvpn → /run/openvpn; please update the tmpfiles.d/ drop-in file accordingly.
 2022-03-20 10:05:07.11587+00  | systemd-tmpfiles                           | [/etc/tmpfiles.d/openvpn.conf:1] Line references path below legacy directory /var/run/, updating /var/run/openvpn → /run/openvpn; please update the tmpfiles.d/ drop-in file accordingly.

 2022-03-20 13:50:36.516331+00 | openvpn                                    | Sun Mar 20 13:50:36 2022 [vpn.balena-cloud.com] Inactivity timeout (--ping-restart), restarting

In the logs above, the VPN link was down for around 40 hours (as reported by the user in the linked support thread) between timestamps 2022-03-18 21:46 (Initialization Sequence Completed) and 2022-03-20 13:50 (Inactivity timeout (--ping-restart), restarting).

Correlating those device logs with the 'resin-vpn' backend logs from logentries.com, I see that the outage begins with the following logged warning message (in backend logs):

18 Mar 2022 21:46:56.625open-balena-vpn-5f67b5dd87-pns9p vpn[517]: warning: [vpn-125520] dropping oos disconnect event for uuid=dadc266427f2de52562a5fde78ee0359 worker=7 (expected=3)

Logentries link: https://logentries.com/app/5915e005#/search/log/5b01a592?log_q=where(dadc266427f2de52562a5fde78ee0359)&f=1647637200000&t=1647810000000

The 'dropping oos disconnect event' warning message comes from the following lines of code (VPN backend service):
https://github.com/balena-io/open-balena-vpn/blob/v11.2.4/src/api.ts#L121-L132

		if (workerMap[uuid] !== workerId) {
			logger.warning(
				`dropping oos disconnect event for uuid=${uuid} worker=${workerId} (expected=${workerMap[uuid]})`,
			);
			captureException(
				new Error('Out of Sync OpenVPN Client Event Received'),
				'openvpn-oos-event',
				{ tags: { uuid }, req },
			);
			return res.sendStatus(400);
		}

So 'oos' stands for 'out of sync'. Note also "worker=7 (expected=3)" in the warning message. Could it be that, for some reason, the backend worker assigned to a device changed (if there is such a thing; I am not sure that workers are assigned to devices), and that this somehow caused the openvpn service on the device to become idle? The backend sends code 400 (Bad Request) to the device. This is a long-shot theory, and it is particularly puzzling that openvpn would tolerate 40 hours of radio silence following error code 400. What's going on?
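To make the "worker=7 (expected=3)" condition concrete, here is a minimal in-memory sketch of the check quoted above. Only the comparison `workerMap[uuid] !== workerId` and the 400 status come from the quoted snippet. The `onConnect` and `onDisconnect` helpers, the string worker IDs, the 200 status, and the deletion of the map entry on a matching disconnect are assumptions made for the example.

```typescript
// In-memory stand-in for the `workerMap` referenced in the quoted snippet.
const workerMap: Record<string, string> = {};

// Assumption: a connect event records which worker currently holds the device.
function onConnect(uuid: string, workerId: string): void {
	workerMap[uuid] = workerId;
}

// Mirrors the quoted check: a disconnect reported by a worker other than the
// recorded one is treated as out of sync and answered with status 400.
// The 200 status and the map cleanup are assumptions for this sketch.
function onDisconnect(uuid: string, workerId: string): number {
	if (workerMap[uuid] !== workerId) {
		return 400; // "dropping oos disconnect event"
	}
	delete workerMap[uuid];
	return 200;
}

const deviceUuid = 'dadc266427f2de52562a5fde78ee0359';
onConnect(deviceUuid, '3');
console.log(onDisconnect(deviceUuid, '7')); // 400: worker=7 (expected=3)
console.log(workerMap[deviceUuid]); // '3': the recorded state is left unchanged
console.log(onDisconnect(deviceUuid, '3')); // 200: matching worker
```

In this sketch a dropped disconnect leaves the recorded worker untouched, so the backend's view of the device does not change when the warning is logged.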

Device running balenaOS v2.95.1, supervisor v12.11.36.
