
More tests with ALB + Consul for scenarios with total reconstruction of a node from scratch (ap-application-load-balancer, open)

fititnt commented on September 23, 2024

from ap-application-load-balancer.

Comments (2)

fititnt commented on September 23, 2024

Comments from https://github.com/fititnt/ap-alb-cluster-demo (a private repository at the moment).

OK, one approach that seems to work: make sure the MASTER and SERVER Consul nodes have the same /var/consul/serf/local.keyring (some people say it can simply be deleted from all nodes), and then restart (not just reload) the MASTER. Reloading a SERVER that was not the master made it go down.
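Because this approach depends on every server having an identical keyring file, a quick consistency check before restarting may save a round trip. The sketch below assumes you have already collected the raw contents of /var/consul/serf/local.keyring from each node (for example with `ansible all -a "cat /var/consul/serf/local.keyring"`), and that each file is a JSON array of base64 gossip keys, as it appears in this issue's log. The helpers `keyrings_match` and `nodes_differing_from` are hypothetical; they only compare file contents and never talk to Consul.

```python
import json


def keyrings_match(keyring_files: dict[str, str]) -> bool:
    """Return True if every node's local.keyring holds the same key list.

    `keyring_files` maps a node name to the raw text of its
    /var/consul/serf/local.keyring (a JSON array of base64 keys).
    Key order is compared too, since the files are compared as lists.
    """
    distinct = {tuple(json.loads(text)) for text in keyring_files.values()}
    return len(distinct) <= 1


def nodes_differing_from(donor: str, keyring_files: dict[str, str]) -> list[str]:
    """List the nodes whose keyring differs from the donor node's keyring."""
    reference = json.loads(keyring_files[donor])
    return sorted(
        node
        for node, text in keyring_files.items()
        if json.loads(text) != reference
    )


if __name__ == "__main__":
    key = "0Ywo8MG0Cdu2MHKbH0DTLO1iOa1MRhGemXf246NPVD0="
    files = {node: json.dumps([key]) for node in ("delta", "echo", "foxtrot")}
    print(keyrings_match(files))  # True: all three files hold the same key
```

The nodes returned by `nodes_differing_from` are the ones that would need the donor's file copied over before the restart.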

There may be a reason why this only works with a restart and not with a reload. Unlike what I would expect from software like Galera Cluster (where, in the worst case, the whole cluster has to be bootstrapped again with a donor specified), here a plain restart was enough. It is important to take into account, though, that apart from the server I recreated from scratch, the other 2 nodes were healthy, had not crashed, and had the same content. Maybe, given the way Consul is implemented, all nodes with this

    consul_node_role: server
    consul_bootstrap_expect: true

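after a full restart, with 2 of the 3 nodes holding the same content, decided among themselves on the new leader on the fly (2 of 3 servers is a majority, which is what Consul's Raft consensus needs to elect a leader).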
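The recovery fits how Consul servers elect a leader: they use the Raft consensus protocol, which requires a majority of the voting servers. The arithmetic below only illustrates that majority rule (it is not Consul code); `quorum_size` and `failure_tolerance` are hypothetical names.

```python
def quorum_size(servers: int) -> int:
    """Votes needed for a Raft majority among `servers` voting servers."""
    if servers < 1:
        raise ValueError("need at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Servers that can be lost while the rest can still elect a leader."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

With 3 servers the quorum is 2, so losing one node (delta, in this test) still leaves enough healthy servers to agree on a leader, while a 2-server cluster tolerates no failures at all.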

    ## Initial state:
    # - aguia-pescadora-echo: MASTER
    # - aguia-pescadora-foxtrot: SERVER
    # - aguia-pescadora-delta: FAILED

    #$ ansible all -a "cat /var/consul/serf/local.keyring"
    #aguia-pescadora-delta.etica.ai | CHANGED | rc=0 >>
    #["0Ywo8MG0Cdu2MHKbH0DTLO1iOa1MRhGemXf246NPVD0="]
    #
    #aguia-pescadora-echo.etica.ai | CHANGED | rc=0 >>
    #["0Ywo8MG0Cdu2MHKbH0DTLO1iOa1MRhGemXf246NPVD0="]
    #
    #aguia-pescadora-foxtrot.etica.ai | CHANGED | rc=0 >>
    #["0Ywo8MG0Cdu2MHKbH0DTLO1iOa1MRhGemXf246NPVD0="]

    # $ ansible all -a "consul keyring -list"
    # aguia-pescadora-delta.etica.ai | CHANGED | rc=0 >>
    # ==> Gathering installed encryption keys...
    # WAN:
    #   0Ywo8MG0Cdu2MHKbH0DTLO1iOa1MRhGemXf246NPVD0= [1/1]
    # dc-germany (LAN):
    #   0Ywo8MG0Cdu2MHKbH0DTLO1iOa1MRhGemXf246NPVD0= [1/1]
    # aguia-pescadora-echo.etica.ai | CHANGED | rc=0 >>
    # ==> Gathering installed encryption keys...
    # WAN:
    #   XjDjtEywI3QOwIfDjGksG21f0fbhXRL+hT/daevcba0= [2/2]
    # dc-germany (LAN):
    #   XjDjtEywI3QOwIfDjGksG21f0fbhXRL+hT/daevcba0= [2/2]
    # aguia-pescadora-foxtrot.etica.ai | CHANGED | rc=0 >>
    # ==> Gathering installed encryption keys...
    # WAN:
    #   XjDjtEywI3QOwIfDjGksG21f0fbhXRL+hT/daevcba0= [2/2]
    #
    # dc-germany (LAN):
    #
    #
    # root@aguia-pescadora-delta:/var/consul/serf# consul join 167.86.127.220
    # Error joining address '167.86.127.220': Unexpected response code: 500 (1 error occurred:
    #	* Failed to join 167.86.127.220: No installed keys could decrypt the message
    #
    #)
    #Failed to join any nodes.
    #
    ## ACTION 1
    # Copied /var/consul/serf/local.keyring from Echo (MASTER) to the other nodes
    #    ansible-playbook roles/ap-application-load-balancer/ad-hoc-alb/distribute-file.yml -i apd.etica.ai,ape.etica.ai -e "donor=ape.etica.ai path=/var/consul/serf/local.keyring"
    ## ACTION 2
    # Reloaded consul on all nodes
    #    systemctl reload consul
    # Result:
    # NO CHANGE:
    # - aguia-pescadora-echo: MASTER
    # - aguia-pescadora-foxtrot: SERVER
    # - aguia-pescadora-delta: FAILED
    ## ACTION 3
    # On root@echo (the master), issued one RESTART (not a reload)
    # Result: failed state; only ECHO is active, and it is MASTER
    # - aguia-pescadora-echo: MASTER
    # - aguia-pescadora-foxtrot: FAILED
    # - aguia-pescadora-delta: FAILED
    ## ACTION 4
    # Restarted the last operational node, ECHO (Note: all servers have the same /var/consul/serf/local.keyring)
    # Result:
    # Fully operational cluster again
    # - aguia-pescadora-delta: SERVER
    # - aguia-pescadora-echo: MASTER
    # - aguia-pescadora-foxtrot: SERVER
    #
    # So, a reload is not sufficient; a RESTART is needed (and no, a full bootstrap was not necessary, as it would be with MariaDB/Galera)
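The `consul keyring -list` output in the log points at the failure cause: delta's installed gossip key differs from the one listed for echo and foxtrot, which matches the `No installed keys could decrypt the message` error on `consul join`. A minimal sketch of a checker for that output, assuming the exact layout seen in this issue (an ansible host header, `WAN:` / `<dc> (LAN):` pool headers, and indented `key [n/m]` lines); the regexes and helper names are hypothetical, and other Consul or ansible versions may format the output differently. It only tests a necessary condition: sharing an installed key does not by itself guarantee that gossip works, since each node encrypts with its primary key.

```python
import re
from itertools import combinations

# Assumed output layout; these patterns are illustrative, not a stable format.
HOST_RE = re.compile(r"^(\S+) \| \w+ \| rc=\d+ >>$")  # ansible host header
POOL_RE = re.compile(r"^(WAN|.+ \(LAN\)):$")           # "WAN:" or "<dc> (LAN):"
KEY_RE = re.compile(r"^\s+(\S+) \[\d+/\d+\]$")         # "  <key> [n/m]"

Keyrings = dict[str, dict[str, set[str]]]


def parse_keyring_list(output: str) -> Keyrings:
    """Map host -> pool name -> installed gossip keys."""
    result: Keyrings = {}
    host = pool = None
    for line in output.splitlines():
        if m := HOST_RE.match(line):
            host, pool = m.group(1), None
            result[host] = {}
        elif host and (m := POOL_RE.match(line.strip())):
            pool = m.group(1)
            result[host][pool] = set()
        elif host and pool and (m := KEY_RE.match(line)):
            result[host][pool].add(m.group(1))
    return result


def lan_keys(keyrings: Keyrings, host: str) -> set[str]:
    """All keys a host lists for its LAN pool(s)."""
    return set().union(*(keys for pool, keys in keyrings[host].items()
                         if pool.endswith("(LAN)")))


def unreachable_pairs(keyrings: Keyrings) -> list[tuple[str, str]]:
    """Host pairs whose LAN pools share no installed key.

    Hosts that list no LAN keys at all are reported as well.
    """
    return [(a, b) for a, b in combinations(sorted(keyrings), 2)
            if not lan_keys(keyrings, a) & lan_keys(keyrings, b)]
```

Fed the delta and echo sections of the log, `unreachable_pairs` returns the single pair `("aguia-pescadora-delta.etica.ai", "aguia-pescadora-echo.etica.ai")`; once both list the same LAN key it returns an empty list.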


fititnt commented on September 23, 2024

Hmm... Actually, even if this behavior is not intuitive compared with Galera Cluster (which, in my experience, needs human intervention after a total cluster failure), Consul may actually be more resilient when a node goes down completely.

Nice, nice.

