exoip's Introduction

exoip: heartbeat monitor for Exoscale Elastic IP Addresses

exoip is a small tool meant to make the process of watching Exoscale Elastic IP Addresses and performing state transitions much easier.

$ go install github.com/exoscale/exoip/cmd/exoip

exoip can run in one of three modes:

  • Association Mode (-A): associates an EIP with an instance and exits.

  • Dissociation Mode (-D): dissociates an EIP from an instance and exits.

  • Watchdog Mode (-W): watches for peer liveness and handles the necessary state transitions.

Watchdog protocol

The goal of exoip is to assert liveness of peers participating in the ownership of an Exoscale Elastic IP. The assumption is that at least two peers will participate in the election process.

exoip uses a protocol very similar to CARP and to some extent VRRP.

The idea is quite simple: exoip sends a 24-byte payload over UDP to each of its configured peers. The payload consists of a protocol version, a priority (repeated, for error checking) to help elect masters, the Elastic IP that must be shared across all peers, and the peer's Nic ID.

The layout of the payload is as follows:

  2bytes  2bytes  4 bytes         16 bytes
+-------+-------+---------------+-------------------------------+
| PROTO | PRIO  |    EIP        |   NicID (128bit UUID)         |
+-------+-------+---------------+-------------------------------+
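The layout above can be sketched in Go as a fixed 24-byte buffer. This is only an illustration, not exoip's actual code: the `Payload` type, the `Marshal` and `Unmarshal` names, and the big-endian encoding of the PROTO field are assumptions. The two PRIO bytes carry the same priority, following the "repeated, for error checking" description.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// payloadSize is the total size of the layout: 2 + 2 + 4 + 16 bytes.
const payloadSize = 24

// Payload is a hypothetical in-memory form of one advertisement.
type Payload struct {
	Version  uint16   // PROTO (big-endian is an assumption of this sketch)
	Priority uint8    // PRIO, written twice on the wire
	EIP      net.IP   // IPv4 Elastic IP
	NicID    [16]byte // raw 128-bit UUID
}

// Marshal writes the payload into a fixed 24-byte array.
func (p Payload) Marshal() ([payloadSize]byte, error) {
	var buf [payloadSize]byte
	ip4 := p.EIP.To4()
	if ip4 == nil {
		return buf, errors.New("EIP is not an IPv4 address")
	}
	binary.BigEndian.PutUint16(buf[0:2], p.Version)
	buf[2], buf[3] = p.Priority, p.Priority
	copy(buf[4:8], ip4)
	copy(buf[8:24], p.NicID[:])
	return buf, nil
}

// Unmarshal parses a datagram and rejects it when the size is wrong
// or the two priority bytes disagree.
func Unmarshal(b []byte) (Payload, error) {
	var p Payload
	if len(b) != payloadSize {
		return p, fmt.Errorf("expected %d bytes, got %d", payloadSize, len(b))
	}
	if b[2] != b[3] {
		return p, errors.New("priority bytes differ, payload is corrupt")
	}
	p.Version = binary.BigEndian.Uint16(b[0:2])
	p.Priority = b[2]
	p.EIP = net.IPv4(b[4], b[5], b[6], b[7])
	copy(p.NicID[:], b[8:24])
	return p, nil
}

func main() {
	in := Payload{Version: 1, Priority: 10, EIP: net.ParseIP("198.51.100.50")}
	raw, err := in.Marshal()
	if err != nil {
		panic(err)
	}
	out, err := Unmarshal(raw[:])
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw), out.Priority, out.EIP)
}
```

Comparing the two priority bytes on receipt is a cheap way to drop a datagram that was damaged or that came from an unrelated sender.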

When a peer fails to advertise for a configurable period of time, it is considered dead and action is taken to reclaim its ownership of the configured Elastic IP Address.
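A minimal sketch of that liveness check follows. It assumes the "configurable period" is the advertisement interval multiplied by the dead ratio (the -t and -r options); the README does not spell this out, so treat the formula and the `isDead` name as assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// isDead reports whether a peer should be considered dead, given the time
// elapsed since its last advertisement.
// Assumption: the timeout is interval multiplied by deadRatio.
func isDead(elapsed, interval time.Duration, deadRatio int) bool {
	return elapsed > interval*time.Duration(deadRatio)
}

func main() {
	interval := 1 * time.Second // default advertisement interval
	deadRatio := 3              // default dead ratio

	fmt.Println(isDead(2*time.Second, interval, deadRatio))         // within the window
	fmt.Println(isDead(3097*time.Millisecond, interval, deadRatio)) // past the window
}
```

With the defaults, a peer that has been silent for a little over three seconds would be treated as dead under this assumption.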

Configuration

exoip is configured through command line arguments or their equivalent environment variables:

-A
    Association mode (exclusive with -D and -W)
-D
    Dissociation mode (exclusive with -A and -W)
-W
    Watchdog mode (exclusive with -A and -D)
-P int (or IF_HOST_PRIORITY)
    Host priority (lowest wins) (default 10, maximum 255)
-l string (or IF_BIND_TO)
    Address to bind to (default ":12345")
-i string (or IF_EXOSCALE_INSTANCE_ID)
    Instance ID of the current instance (useful when running from a container)
-p string (or IF_EXOSCALE_PEERS)
    peers to communicate with (may be repeated and/or comma-separated)
-G string (or IF_EXOSCALE_PEER_GROUP)
    Security-Group to use to create/maintain the list of peers
-r int (or IF_DEAD_RATIO)
    Dead ratio (default 3)
-t int (or IF_ADVERTISEMENT_INTERVAL)
    Advertisement interval in seconds (default 1)
-xi string (or IF_ADDRESS)
    Exoscale Elastic IP to watch over
-xk string (or IF_EXOSCALE_API_KEY)
    Exoscale API Key
-xs string (or IF_EXOSCALE_API_SECRET)
    Exoscale API Secret
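One common way to pair a flag with an environment variable in Go is to use the variable as the flag's default value. The sketch below shows that pattern for two of the options listed above. The helper names `envInt` and `envString` are hypothetical, and the precedence (an explicit flag overrides the environment) is an assumption rather than documented exoip behaviour.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// envInt returns the integer value of the named environment variable,
// or fallback when the variable is unset or not a valid integer.
func envInt(name string, fallback int) int {
	if v, ok := os.LookupEnv(name); ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return fallback
}

// envString returns the named environment variable, or fallback when unset.
func envString(name, fallback string) string {
	if v, ok := os.LookupEnv(name); ok {
		return v
	}
	return fallback
}

func main() {
	// The environment supplies the defaults; flags given on the command line win.
	prio := flag.Int("P", envInt("IF_HOST_PRIORITY", 10), "host priority (lowest wins)")
	bind := flag.String("l", envString("IF_BIND_TO", ":12345"), "address to bind to")
	flag.Parse()

	fmt.Println(*prio, *bind)
}
```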

Signals

When running as a Docker container, signals are the best way to interact with the running container.

exoip listens for SIGUSR1 and SIGUSR2, which adjust the current priority value by -1 and +1 respectively. Because the lowest priority wins, SIGUSR1 promotes the node to a higher rank while SIGUSR2 lowers its rank. This is a simple way to put a node into backup mode without restarting exoip.

SIGTERM or SIGINT will attempt to disassociate the Elastic IP before quitting.
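The priority handling described above could look roughly like the following Unix-only sketch. The `promote`, `demote` and `handle` names are hypothetical, and clamping the value to the 0 to 255 range is an assumption based on the documented maximum. For demonstration the program sends SIGUSR2 to itself once; a long-running watchdog would loop over the channel instead.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// maxPriority is the documented upper bound for the priority value.
const maxPriority = 255

// promote lowers the priority value by one (lowest wins), stopping at 0.
func promote(prio int) int {
	if prio > 0 {
		prio--
	}
	return prio
}

// demote raises the priority value by one, stopping at maxPriority.
func demote(prio int) int {
	if prio < maxPriority {
		prio++
	}
	return prio
}

// handle maps a received signal to the matching priority change.
func handle(prio int, sig os.Signal) int {
	switch sig {
	case syscall.SIGUSR1:
		return promote(prio)
	case syscall.SIGUSR2:
		return demote(prio)
	}
	return prio
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1, syscall.SIGUSR2)
	defer signal.Stop(sigs)

	// Demonstration only: deliver one SIGUSR2 to this process.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR2); err != nil {
		panic(err)
	}
	prio := handle(10, <-sigs)
	fmt.Println("priority is now", prio)
}
```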

Information

$ echo -n "info" | nc -4u -w1 0.0.0.0 12345
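The same query can be issued from Go. The sketch below assumes the daemon answers the `info` datagram with a single UDP reply, which is what the `nc` invocation suggests but the README does not state. The `queryInfo` name, the loopback address and the 2048-byte buffer are illustrative choices.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// queryInfo sends the literal string "info" to addr over UDP and returns
// whatever datagram comes back before the timeout expires.
func queryInfo(addr string, timeout time.Duration) (string, error) {
	conn, err := net.Dial("udp4", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return "", err
	}
	if _, err := conn.Write([]byte("info")); err != nil {
		return "", err
	}

	buf := make([]byte, 2048)
	n, err := conn.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	// 12345 is the default port from the -l option.
	reply, err := queryInfo("127.0.0.1:12345", time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, "no reply:", err)
		return
	}
	fmt.Print(reply)
}
```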

Building

If you wish to inspect exoip and build it yourself, you can fetch the source with go get and then build it from the cmd/exoip directory:

cd cmd/exoip
go build

Setup using Cloud Init

As shown in the HAProxy Elastic IP Automatic failover article, exoip can be set up using a dummy network interface. Below is the configuration from that article, expressed using Cloud Init (supported by Ubuntu, Debian, RHEL, CentOS, etc.).

#cloud-config

package_update: true
package_upgrade: true

packages:
- ifupdown

write_files:
- path: /etc/network/interfaces
  content: |
    source /etc/network/interfaces.d/*.cfg
- path: /etc/network/interfaces.d/51-exoip.cfg
  content: |
    auto lo:1
    iface lo:1 inet static
      address 198.51.100.50              # change me
      netmask 255.255.255.255
      exoscale-peer-group load-balancer  # change me
      exoscale-api-key EXO....           # change me
      exoscale-api-secret LZ...          # change me
      up exoip -W &
      down killall exoip

runcmd:
- wget https://github.com/exoscale/exoip/releases/download/v0.4.3/exoip_0.4.3_linux_amd64.deb
- wget https://github.com/exoscale/exoip/releases/download/v0.4.3/exoip_0.4.3_linux_amd64.deb.sig
- gpg --keyserver hkps://keys.openpgp.org --recv-keys B2DB6B250321137D9DB7210281426F034A3D05F7
- gpg --verify --trust-model always exoip_0.4.3_linux_amd64.deb.sig
- sudo dpkg -i exoip_0.4.3_linux_amd64.deb
- sudo ifup lo:1

exoip's People

Contributors

brutasse, exo-cedric, greut, pst, pyr

Forkers

fakod

exoip's Issues

be resistant to HTML

Sometimes HTTP fails...

May 31 17:12:15 lb01 exoip[951]: fatal: invalid character '<' looking for beginning of value
May 31 17:12:15 lb01 ifup[760]: fatal error: invalid character '<' looking for beginning of value
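The message `invalid character '<' looking for beginning of value` is what Go's `encoding/json` reports when it is handed an HTML page instead of JSON. One possible mitigation, sketched below, is to detect that case and return a recognisable error so the caller can retry instead of exiting. This is not exoip's actual fix; `decodeJSON` and `errNotJSON` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// errNotJSON marks a response body that is clearly not JSON,
// for example an HTML error page returned by a proxy.
var errNotJSON = errors.New("response body is not JSON")

// decodeJSON unmarshals body into v, but first rejects bodies that start
// with '<' so the caller can treat them as a transient failure.
func decodeJSON(body []byte, v interface{}) error {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) > 0 && trimmed[0] == '<' {
		return errNotJSON
	}
	return json.Unmarshal(trimmed, v)
}

func main() {
	var out map[string]interface{}

	err := decodeJSON([]byte("<html><body>502 Bad Gateway</body></html>"), &out)
	if errors.Is(err, errNotJSON) {
		fmt.Println("transient API error, retrying on the next tick")
	}

	err = decodeJSON([]byte(`{"ok": true}`), &out)
	fmt.Println(err, out["ok"])
}
```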

Cannot start container with newest version

The container does not start in the newest version when started in watchdog mode. It fails with the following error:

could not execute: exec: "ip": executable file not found in $PATH
exoip [CRIT   ] fatal: could not find metadata server
fatal error: could not find metadata server

My best guess is that the change to the linuxkit-base-image caused the problem: the image does not provide any environment, so the ip command is not found.

no resilience on network issues

Use case:
Running an exoip pod in a Kubernetes cluster. All exoip containers use the default priority.

exoip removes a peer's Nics if it fails to connect to that node. If the node that could not be reached (because of a network issue or an ACL misconfiguration) is still alive, no one re-adds the Nic.

exec user process caused "no such file or directory"

Running exoip in a Kubernetes environment. Version latest (since there are no other tags)

kubectl -n kube-system logs exoip-xxx-master-1
panic: standard_init_linux.go:178: exec user process caused "no such file or directory" [recovered]
	panic: standard_init_linux.go:178: exec user process caused "no such file or directory"

goroutine 1 [running, locked to thread]:
panic(0x7eb2e0, 0xc820136f50)
	/usr/lib/go1.6/src/runtime/panic.go:481 +0x3e6
github.com/urfave/cli.HandleAction.func1(0xc8200c52f8)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/Godeps/_workspace/src/github.com/urfave/cli/app.go:478 +0x38e
panic(0x7eb2e0, 0xc820136f50)
	/usr/lib/go1.6/src/runtime/panic.go:443 +0x4e9
github.com/opencontainers/runc/libcontainer.(*LinuxFactory).StartInitialization.func1(0xc8200c4c08, 0xc82001a070, 0xc8200c4d18)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/factory_linux.go:259 +0x136
github.com/opencontainers/runc/libcontainer.(*LinuxFactory).StartInitialization(0xc8200586e0, 0x7fe1b2243320, 0xc820136f50)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/factory_linux.go:277 +0x5b1
main.glob.func8(0xc820076780, 0x0, 0x0)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/main_unix.go:26 +0x68
reflect.Value.call(0x74fac0, 0x9012a0, 0x13, 0x847808, 0x4, 0xc8200c5278, 0x1, 0x1, 0x0, 0x0, ...)
	/usr/lib/go1.6/src/reflect/value.go:435 +0x120d
reflect.Value.Call(0x74fac0, 0x9012a0, 0x13, 0xc8200c5278, 0x1, 0x1, 0x0, 0x0, 0x0)
	/usr/lib/go1.6/src/reflect/value.go:303 +0xb1
github.com/urfave/cli.HandleAction(0x74fac0, 0x9012a0, 0xc820076780, 0x0, 0x0)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/Godeps/_workspace/src/github.com/urfave/cli/app.go:487 +0x2ee
github.com/urfave/cli.Command.Run(0x84a6b8, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8e05e0, 0x51, 0x0, ...)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/Godeps/_workspace/src/github.com/urfave/cli/command.go:191 +0xfec
github.com/urfave/cli.(*App).Run(0xc820001980, 0xc82000a100, 0x2, 0x2, 0x0, 0x0)
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/Godeps/_workspace/src/github.com/urfave/cli/app.go:240 +0xaa4
main.main()
	/build/amd64-usr/var/tmp/portage/app-emulation/runc-1.0.0_rc2_p9/work/runc-1.0.0_rc2_p9/main.go:137 +0xe24

Possible nil pointer dereference on syslog.New failure

I was curious about how the logger was implemented, and while going through the code I stumbled upon this minor issue.

At

AssertSuccess(err)

AssertSuccess will call Logger.Crit on a nil Logger in case of non-nil error.

This simply means that the error message written to stderr is a little more cryptic than wanted if the connection to the system log daemon fails :-).

standard_init_linux.go

With latest
standard_init_linux.go:185: exec user process caused "no such file or directory"

Support netplan

We have received a report that ExoIP does not work properly with Ubuntu 18.04 due to the switch to netplan. Can we implement support for that?

list of peers remains until exoip restart

Use case:
Running an exoip pod in a Kubernetes cluster

The exoip pod is started at cluster startup. Every exoip instance stores its peers, derived from the given security group. So far, so good. But if I create an additional cluster node afterwards, the peers are not updated on the initial nodes, so no one will notice if the new node dies.
This is an issue for dynamically created nodes.

So peers should be updated more frequently than only at startup.

exoip v0.3.15 - failed to update peers when service is stopped

Aug 28 15:16:59 kube-master-pp001 docker-exoip[17131]: exoip [INFO ] claimed ip xxxx on nic xxxx
Aug 28 15:16:59 kube-master-pp001 docker-exoip[17131]: exoip [WARNING] CheckState took longer than allowed interval (1000ms): 2304ms
Aug 28 15:16:59 kube-master-pp001 docker-exoip[17131]: exoip [WARNING] PingPeers took longer than allowed interval (1000ms): 1304ms
Aug 28 15:17:34 kube-master-pp001 docker-exoip[17131]: exoip [CRIT ] failure sending to peer xxx: write udp xxxx:52139->xxxx:12345: write: connection refused
Aug 28 15:17:35 kube-master-pp001 docker-exoip[17131]: exoip [INFO ] peer xxxx last seen 2018-08-28T13:17:32Z (3097ms ago), considering dead.
Aug 28 15:17:36 kube-master-pp001 docker-exoip[17131]: exoip [CRIT ] failure sending to peer xxx: write udp xxxx:52139->xxxx:12345: write: connection refused
Aug 28 15:17:37 kube-master-pp001 docker-exoip[17131]: exoip [INFO ] released ip 159.100.244.186 from nic 95dd6d1c-b442-4830-b074-4b51fc239d6e
Aug 28 15:17:38 kube-master-pp001 docker-exoip[17131]: exoip [CRIT ] default nic ID doesn't match
Aug 28 15:17:38 kube-master-pp001 docker-exoip[17131]: exoip [WARNING] CheckState took longer than allowed interval (1000ms): 3111ms
Aug 28 15:17:38 kube-master-pp001 docker-exoip[17131]: exoip [WARNING] PingPeers took longer than allowed interval (1000ms): 1111ms
Aug 28 15:17:39 kube-master-pp001 docker-exoip[17131]: exoip [CRIT ] failure sending to peer xxx: write udp xxxx:52139->xxxx:12345: write: connection refused

Dockerfile?

Is it possible to have the Dockerfile of exoscale/exoip here?

Versioning of exoip image

Hi guys, the image on Docker Hub was last updated 4 days ago.
However, I cannot see a code change. Additionally, there are tags on the Git repo but none on Docker Hub.
Can we have some versioning here (tags on GitHub)? Or do you expect us to build the images on our own?

best Christopher

map SIGUSR1 and SIGUSR2 to two priorities [RFC]

Allow controlling exoip priorities using SIGUSR1 and SIGUSR2.

Two options:

  • configure two priorities and map them to the signals
  • use USR1 to lower the priority value, and USR2 to increase it

Less configuration

Hey!

To lessen the amount of configuration needed, the only mandatory parameter could be the EIP to use. exoip would add the current VM to the specified EIP and retrieve the peers from the API (those are the VMs that share the EIP). The cluster ID would be the EIP. The list of peers needs to be refreshed at regular intervals (or maybe only when the current master goes down, for non-masters, or when the master receives messages from unknown peers). The metadata server could be used to retrieve the current VM.
