Comments (16)

jasonodonnell commented on July 24, 2024 5

Hi @iniinikoski, my apologies for the delay.

I ended up needing to change more than I originally thought to make this work. I'll be finalizing this work in the coming weeks but wanted to get you something in the meantime.

You can find the code changes in the raft branch: https://github.com/hashicorp/vault-helm/tree/raft.

Here's a working example:

Vault Helm Raft

This is a tech preview of Raft and, as such, should not be used in production.

Clone the repo:

mkdir ~/vault-raft && cd ~/vault-raft
git clone git@github.com:hashicorp/vault-helm.git && cd ~/vault-raft/vault-helm
git checkout raft

Next, create a custom values-raft.yaml file so we can just inject our custom values:

cat >~/vault-raft/vault-helm/values-raft.yaml <<EOL
server:
  ha:
    enabled: true

    raft:
      enabled: true

    config: |
      ui = true
      cluster_addr = "https://POD_IP:8201"

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "raft" {
        path = "/vault/data"
      }
EOL
helm install --name=vault -f values-raft.yaml .

Once deployed you can initialize vault-0 and unseal:

Note: vault-0 is going to be our leader initially.

kubectl exec -ti vault-0 -- vault operator init
kubectl exec -ti vault-0 -- vault operator unseal

Next, for each other vault pod, join the raft cluster and unseal:

kubectl exec -ti <NAME OF POD> -- vault operator raft join http://vault-0.vault-headless:8200
kubectl exec -ti <NAME OF POD> -- vault operator unseal

After logging into Vault using a token, you can check the configuration of Raft:

kubectl exec -ti vault-0 -- vault login
kubectl exec -ti vault-0 -- vault operator raft configuration -format=json

Or using status:

kubectl exec -ti vault-0 -- vault status

from vault-helm.

jasonodonnell commented on July 24, 2024 4

Hi @iniinikoski, thanks for bringing this to my attention!

I agree this should be documented (and supported). Right now there's a small limitation: HA mode doesn't create data volumes. Data volumes only get created in standalone mode.

I will adjust some things here to support HA mode with data volumes (raft) and document how to do this.

Leaving this issue open to track progress on this feature.

jasonodonnell commented on July 24, 2024 4

@Josua-SR minimum cluster size is 3.

Good catch on the API_ADDR env, PR to fix that here: #237.

The old Raft branch and instructions are no longer relevant and I would advise everyone to stop using them.

With the new feature here's how you bootstrap the Raft cluster:

$ helm install vault \
    --set='server.ha.enabled=true' \
    --set='server.ha.raft.enabled=true' .

$ kubectl exec -ti vault-0 -- vault operator init
$ kubectl exec -ti vault-0 -- vault operator unseal

$ kubectl exec -ti vault-1 -- vault operator raft join http://vault-0.vault-internal:8200
$ kubectl exec -ti vault-1 -- vault operator unseal

$ kubectl exec -ti vault-2 -- vault operator raft join http://vault-0.vault-internal:8200
$ kubectl exec -ti vault-2 -- vault operator unseal

$ kubectl exec -ti vault-0 -- vault status
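
The join and unseal steps above repeat for every follower, so they can be generated rather than typed. A minimal sketch, assuming the pods are named vault-1, vault-2, and so on, and that the leader URL is the one used above. raft_join_commands is a hypothetical helper that only prints the commands, because vault operator unseal prompts for the key and is easier to run by hand.

```shell
# Print the join and unseal commands for each follower pod.
# Nothing is executed against the cluster; the output is meant to be
# copied and run one command at a time.
raft_join_commands() {
  leader_url="$1"   # API address of the pod that was initialized first
  shift
  for pod in "$@"; do
    printf 'kubectl exec -ti %s -- vault operator raft join %s\n' "$pod" "$leader_url"
    printf 'kubectl exec -ti %s -- vault operator unseal\n' "$pod"
  done
}
```

Calling raft_join_commands http://vault-0.vault-internal:8200 vault-1 vault-2 prints the four follower commands listed above, in the same order.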

Note: Helm does not delete volumes and it's possible you have old PVCs hanging around from a failed attempt. Make sure to clean them up.
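
A minimal sketch of that cleanup, assuming the chart names its claims data-<release>-<ordinal> (for example data-vault-0). That naming is an assumption, so check the real names with kubectl get pvc before deleting anything. pvc_cleanup_commands is a hypothetical helper and only prints the delete commands.

```shell
# Print one "kubectl delete pvc" command per replica.
# Assumption: claims are named data-<release>-<ordinal>; verify with
# "kubectl get pvc" first, since deleting a claim discards its data.
pvc_cleanup_commands() {
  release="$1"    # Helm release name, e.g. vault
  replicas="$2"   # number of server pods that were deployed
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf 'kubectl delete pvc data-%s-%s\n' "$release" "$i"
    i=$(( i + 1 ))
  done
}
```

For a release named vault with three replicas, pvc_cleanup_commands vault 3 prints one delete command each for data-vault-0, data-vault-1 and data-vault-2.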

sdeoras commented on July 24, 2024 2

I am trying out Raft storage (v1.4.0) via the Helm chart (v0.5.0) with TLS enabled and am having some issues with cert validation. Is there a way to skip cert validation when joining the leader node? I am already setting VAULT_SKIP_VERIFY=true, but that does not seem to affect the vault operator raft join call to the leader node.

2020-04-13T01:28:23.075Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault-0.vault-internal:8200
2020-04-13T01:28:23.080Z [INFO]  core: join attempt failed: error="error during raft bootstrap init call: Put https://vault-0.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge: x509: certificate is valid for <redacted>, not vault-0.vault-internal"
2020-04-13T01:28:23.080Z [ERROR] core: failed to join raft cluster: error="failed to join any raft leader node"

lawliet89 commented on July 24, 2024 1

Might be more useful to implement this after Vault supports retry join. cf. hashicorp/vault#7856
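
For readers on a Vault release that includes the retry join feature tracked in that issue, the manual join step can be replaced by a retry_join stanza in the Raft storage config. A minimal sketch, assuming such a Vault version, that vault-0 is the node to join, and that the config lives under server.ha.raft.config as in the chart's values.yaml. write_retry_join_values and the output file name are hypothetical, and the rest of the config is copied from the example earlier in the thread.

```shell
# Write a Helm values file whose Raft config contains a retry_join stanza.
# Assumptions: the Vault version supports retry_join, and vault-0 is
# reachable at the internal service address used elsewhere in this thread.
write_retry_join_values() {
  out_file="$1"
  cat >"$out_file" <<'EOL'
server:
  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        ui = true
        cluster_addr = "https://POD_IP:8201"

        listener "tcp" {
          tls_disable = 1
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }

        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
          }
        }
EOL
}
```

Running write_retry_join_values values-raft-retry.yaml and passing the file to helm install with -f would apply it. Each pod still has to be unsealed after it joins.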

ngarafol commented on July 24, 2024 1

Hi, I am using this example, but I run into problems when deleting a pod. The vault service picks up the new pod, but Raft still uses the old pod IP, and the new pod doesn't join the cluster again.

I can't rejoin the new pod manually, since Raft is already initialized. Using Vault 1.3.2.

Opened issue here with more info: hashicorp/vault#8489

jasonodonnell commented on July 24, 2024

Tracking this feature here: #58

Josua-SR commented on July 24, 2024

It appears that support was merged into the master branch a few days ago. So I tried using the same instructions @jasonodonnell posted above, but I never get to a functioning Vault cluster.

Errors are as follows:

After raft join, unseal fails with Error unsealing: context deadline exceeded.

At this point, vault status reports that vault-0 is not a leader anymore:

Cluster Name           vault-cluster-e4db2105
Cluster ID             41716ef5-fb2b-9b46-6466-51998a3bbf70
HA Enabled             true
HA Cluster             https://vault-0.vault-internal:8201
HA Mode                standby
Active Node Address    http-internal://10.233.123.248:8200

While vault status on vault-1 shows that it stayed sealed, and exits with error before printing the active node:

kubectl exec -ti vault-1 -- vault status
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       1
Threshold          1
Unseal Progress    0/1
Unseal Nonce       n/a
Version            1.3.3
HA Enabled         true
command terminated with exit code 2

These are the steps:

1. enable raft (and some deployment-specific bits) in values.yaml:

diff --git a/values.yaml b/values.yaml
index 1616394..e8d3c40 100644
--- a/values.yaml
+++ b/values.yaml
@@ -267,10 +268,10 @@ server:
   dataStorage:
     enabled: true
     # Size of the PVC created
-    size: 10Gi
+    size: 5Gi
     # Name of the storage class to use.  If null it will use the
     # configured default Storage Class.
-    storageClass: null
+    storageClass: openebs-hostpath
     # Access Mode of the storage device being used for the PVC
     accessMode: ReadWriteOnce
 
@@ -336,8 +337,8 @@ server:
   # Helm project by default.  It is possible to manually configure Vault to use a
   # different HA backend.
   ha:
-    enabled: false
-    replicas: 3
+    enabled: true
+    replicas: 2
     
     # Enables Vault's integrated Raft storage.  Unlike the typical HA modes where 
     # Vault's persistence is external (such as Consul), enabling Raft mode will create 
@@ -346,7 +347,7 @@ server:
     raft:
       
       # Enables Raft integrated storage
-      enabled: false
+      enabled: true
       config: |
         ui = true
         cluster_addr = "https://POD_IP:8201"

2. deploy to the cluster

helm install vault ./
NAME: vault
LAST DEPLOYED: Sat Mar 21 17:16:09 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!
...

3. initialize the initial leader vault-0

kubectl exec -ti vault-0 -- vault operator init -key-shares=1 -key-threshold=1
Unseal Key 1: secret
Initial Root Token: secret
...
kubectl exec -ti vault-0 -- vault operator unseal
Unseal Key (will be hidden): 
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           1
Threshold              1
Version                1.3.3
Cluster Name           vault-cluster-e4db2105
Cluster ID             41716ef5-fb2b-9b46-6466-51998a3bbf70
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

4. join second instance to cluster

kubectl exec -ti vault-1 vault operator raft join http://vault-0.vault-internal:8200
Key       Value
---       -----
Joined    true

kubectl exec -ti vault-1 -- vault operator unseal
Unseal Key (will be hidden): 
Error unsealing: context deadline exceeded
command terminated with exit code 2

It turns out that vault-0 has a noisy log-file after this failed unseal attempt:

Big log
kubectl logs vault-0
==> Vault server configuration:

             Api Address: http-internal://10.233.123.248:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.3.3

2020-03-21T16:16:30.502Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
==> Vault server started! Log data will stream in below:

2020-03-21T16:16:37.617Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:40.624Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:43.613Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:46.627Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:49.639Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:52.650Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:55.625Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:16:58.623Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:01.621Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:04.646Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:07.618Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:10.615Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:13.608Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:16.620Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:19.620Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:22.643Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:25.624Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:28.631Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:31.616Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:34.635Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:37.627Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:40.617Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:43.633Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:46.624Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:49.625Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:52.657Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:55.619Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:17:58.625Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:18:01.616Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:18:04.615Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:18:07.622Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:18:10.614Z [INFO]  core: seal configuration missing, not initialized
2020-03-21T16:18:11.935Z [ERROR] core: no seal config found, can't determine if legacy or new-style shamir
2020-03-21T16:18:11.935Z [INFO]  core: security barrier not initialized
2020-03-21T16:18:12.005Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:166abab1-7d7a-5d3c-37a8-6d600d44c0d7 Address:vault-0.vault-internal:8201}]"
2020-03-21T16:18:12.005Z [INFO]  storage.raft: entering leader state: leader="Node at 166abab1-7d7a-5d3c-37a8-6d600d44c0d7 [Leader]"
2020-03-21T16:18:12.219Z [INFO]  core: security barrier initialized: stored=1 shares=1 threshold=1
2020-03-21T16:18:12.486Z [INFO]  core: post-unseal setup starting
2020-03-21T16:18:12.597Z [INFO]  core: loaded wrapping token key
2020-03-21T16:18:12.597Z [INFO]  core: successfully setup plugin catalog: plugin-directory=
2020-03-21T16:18:12.597Z [INFO]  core: no mounts; adding default mount table
2020-03-21T16:18:12.775Z [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2020-03-21T16:18:12.775Z [INFO]  core: successfully mounted backend: type=system path=sys/
2020-03-21T16:18:12.776Z [INFO]  core: successfully mounted backend: type=identity path=identity/
2020-03-21T16:18:13.253Z [INFO]  core: successfully enabled credential backend: type=token path=token/
2020-03-21T16:18:13.253Z [INFO]  core: restoring leases
2020-03-21T16:18:13.253Z [INFO]  rollback: starting rollback manager
2020-03-21T16:18:13.253Z [INFO]  expiration: lease restore complete
2020-03-21T16:18:13.408Z [INFO]  identity: entities restored
2020-03-21T16:18:13.409Z [INFO]  identity: groups restored
2020-03-21T16:18:13.486Z [INFO]  core: post-unseal setup complete
2020-03-21T16:18:13.741Z [INFO]  core: root token generated
2020-03-21T16:18:13.886Z [INFO]  core: pre-seal teardown starting
2020-03-21T16:18:13.886Z [INFO]  rollback: stopping rollback manager
2020-03-21T16:18:13.886Z [INFO]  core: pre-seal teardown complete
2020-03-21T16:19:31.910Z [INFO]  core.cluster-listener: starting listener: listener_address=[::]:8201
2020-03-21T16:19:31.910Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2020-03-21T16:19:31.936Z [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:166abab1-7d7a-5d3c-37a8-6d600d44c0d7 Address:vault-0.vault-internal:8201}]"
2020-03-21T16:19:31.936Z [INFO]  core: vault is unsealed
2020-03-21T16:19:31.936Z [INFO]  storage.raft: entering follower state: follower="Node at [::]:8201 [Follower]" leader=
2020-03-21T16:19:31.936Z [INFO]  core: entering standby mode
2020-03-21T16:19:39.237Z [WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-21T16:19:39.237Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=2
2020-03-21T16:19:39.303Z [INFO]  storage.raft: election won: tally=1
2020-03-21T16:19:39.303Z [INFO]  storage.raft: entering leader state: leader="Node at [::]:8201 [Leader]"
2020-03-21T16:19:39.396Z [INFO]  core: acquired lock, enabling active operation
2020-03-21T16:19:39.529Z [INFO]  core: post-unseal setup starting
2020-03-21T16:19:39.529Z [INFO]  core: loaded wrapping token key
2020-03-21T16:19:39.529Z [INFO]  core: successfully setup plugin catalog: plugin-directory=
2020-03-21T16:19:39.530Z [INFO]  core: successfully mounted backend: type=system path=sys/
2020-03-21T16:19:39.530Z [INFO]  core: successfully mounted backend: type=identity path=identity/
2020-03-21T16:19:39.531Z [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2020-03-21T16:19:39.533Z [INFO]  core: successfully enabled credential backend: type=token path=token/
2020-03-21T16:19:39.533Z [INFO]  core: restoring leases
2020-03-21T16:19:39.533Z [INFO]  rollback: starting rollback manager
2020-03-21T16:19:39.533Z [INFO]  identity: entities restored
2020-03-21T16:19:39.533Z [INFO]  identity: groups restored
2020-03-21T16:19:39.533Z [INFO]  expiration: lease restore complete
2020-03-21T16:19:39.596Z [INFO]  core: post-unseal setup complete
2020-03-21T16:21:43.476Z [INFO]  storage.raft: updating configuration: command=AddStaging server-id=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f server-addr=vault-1.vault-internal:8201 servers="[{Suffrage:Voter ID:166abab1-7d7a-5d3c-37a8-6d600d44c0d7 Address:vault-0.vault-internal:8201} {Suffrage:Voter ID:d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f Address:vault-1.vault-internal:8201}]"
2020-03-21T16:21:43.507Z [INFO]  storage.raft: added peer, starting replication: peer=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f
2020-03-21T16:21:43.543Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:43.664Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:43.784Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:43.875Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:43.989Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:44.104Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-1.vault-internal:8201 error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:44.184Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:44.436Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:44.789Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-1.vault-internal:8201 error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:44.849Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:45.579Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:45.605Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-1.vault-internal:8201 error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:46.007Z [WARN]  storage.raft: failed to contact: server-id=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f time=2.500184019s
2020-03-21T16:21:46.007Z [WARN]  storage.raft: failed to contact quorum of nodes, stepping down
2020-03-21T16:21:46.007Z [INFO]  storage.raft: entering follower state: follower="Node at [::]:8201 [Follower]" leader=
2020-03-21T16:21:46.007Z [WARN]  core: leadership lost, stopping active operation
2020-03-21T16:21:46.007Z [INFO]  core: pre-seal teardown starting
2020-03-21T16:21:46.260Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-1.vault-internal:8201 error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:46.507Z [INFO]  rollback: stopping rollback manager
2020-03-21T16:21:46.508Z [INFO]  core: pre-seal teardown complete
2020-03-21T16:21:46.508Z [ERROR] core: clearing leader advertisement failed: error="node is not the leader"
2020-03-21T16:21:46.508Z [ERROR] core: unlocking HA lock failed: error="node is not the leader"
2020-03-21T16:21:46.643Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:46.939Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:47.645Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:49.305Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:51.421Z [WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-21T16:21:51.421Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=3
2020-03-21T16:21:51.520Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:52.233Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:56.227Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:57.138Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:21:57.138Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=4
2020-03-21T16:21:57.244Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:03.173Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:03.173Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=5
2020-03-21T16:22:03.266Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:03.789Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:22:09.130Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:09.130Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=6
2020-03-21T16:22:09.234Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:13.171Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:22:17.829Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:17.829Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=7
2020-03-21T16:22:17.921Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:23.625Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:23.625Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=8
2020-03-21T16:22:23.777Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:32.918Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:32.918Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=9
2020-03-21T16:22:33.013Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:33.239Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:22:38.920Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:38.920Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=10
2020-03-21T16:22:39.014Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:48.083Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:48.083Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=11
2020-03-21T16:22:48.180Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:22:55.038Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:22:55.038Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=12
2020-03-21T16:22:55.136Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:02.573Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:23:04.562Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:04.562Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=13
2020-03-21T16:23:04.660Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:11.567Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:11.567Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=14
2020-03-21T16:23:11.661Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:20.125Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:20.125Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=15
2020-03-21T16:23:20.223Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:28.193Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:28.193Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=16
2020-03-21T16:23:28.297Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:34.001Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:34.001Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=17
2020-03-21T16:23:34.098Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:39.682Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:39.682Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=18
2020-03-21T16:23:39.776Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:47.414Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:23:49.430Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:49.430Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=19
2020-03-21T16:23:49.522Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:23:56.184Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:23:56.184Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=20
2020-03-21T16:23:56.290Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:02.367Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:02.367Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=21
2020-03-21T16:24:02.469Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:07.906Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:07.906Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=22
2020-03-21T16:24:08.002Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:15.258Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:15.258Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=23
2020-03-21T16:24:15.358Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:23.617Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:23.617Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=24
2020-03-21T16:24:23.713Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:31.524Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:31.524Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=25
2020-03-21T16:24:31.629Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:38.581Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:38.581Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=26
2020-03-21T16:24:38.671Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:46.740Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:46.740Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=27
2020-03-21T16:24:46.840Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:24:54.393Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:24:55.563Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:24:55.563Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=28
2020-03-21T16:24:55.663Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:25:00.933Z [WARN]  storage.raft: Election timeout reached, restarting election
2020-03-21T16:25:00.933Z [INFO]  storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=29
2020-03-21T16:25:01.041Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"

I am quite lost as to what this means.
From what I can tell, the most fishy line in the log of vault-0 is:
2020-03-21T16:21:43.543Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
where it tries to contact the newly joined instance on port 8201. Why 8201? And why is the connection refused?

The second most fishy line is the stepping down as leader:

2020-03-21T16:21:46.007Z [WARN]  storage.raft: failed to contact: server-id=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f time=2.500184019s
2020-03-21T16:21:46.007Z [WARN]  storage.raft: failed to contact quorum of nodes, stepping down
2020-03-21T16:21:46.007Z [INFO]  storage.raft: entering follower state: follower="Node at [::]:8201 [Follower]" leader=
2020-03-21T16:21:46.007Z [WARN]  core: leadership lost, stopping active operation

Any ideas on how to proceed here? Are these in fact the right steps?

from vault-helm.

ngarafol avatar ngarafol commented on July 24, 2024

@Josua-SR did you check my comment above? You have to use DNS names for resolving; I think you are still using IP addresses. See hashicorp/vault#8489.
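A rough sketch of what joining by DNS name could look like is below. It only prints the join commands rather than running them. The pod names (`vault-N`), the headless service name `vault-internal` (taken from the logs earlier in this thread; the raft branch used `vault-headless`), the plain-HTTP scheme, and the `print_join_cmds` helper are all assumptions for illustration.

```shell
#!/bin/sh
# Assumptions: pods are named vault-N, vault-0 is the leader, the headless
# service is called vault-internal, and the listener serves plain HTTP on 8200.
LEADER_ADDR="http://vault-0.vault-internal:8200"

# Hypothetical helper: print (do not run) the raft join command
# for each follower pod index passed as an argument.
print_join_cmds() {
  for i in "$@"; do
    echo "kubectl exec -ti vault-$i -- vault operator raft join $LEADER_ADDR"
  done
}

print_join_cmds 1 2
```

Printing first makes it easy to check that every follower targets a DNS name and not a pod IP before anything is executed.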

from vault-helm.

Josua-SR avatar Josua-SR commented on July 24, 2024

@ngarafol no, not really. I don't understand most of the comments in that thread, and they seem to refer to implementation details only present in that PR, such as the vault-headless service name.

But I am on the master branch of the helm chart, so I have no idea which comments still apply and which don't.

If you were referring to the change of simply declaring the VAULT_CLUSTER_ADDR as https://$(HOSTNAME):8201 in templates/server-statefulset.yaml - well, I quickly did a run-through, and it behaves exactly as I described above: no change.
It also seems superfluous, since the cluster address is set in the vault config file derived from the config | sections in values.yaml.

Also, I'd like to note that the IPs that appear in my logs are actually valid and refer to the vault-0 and vault-1 pods.

from vault-helm.

ngarafol avatar ngarafol commented on July 24, 2024

@Josua-SR Oh, ok. Yes, I use the raft branch, and with DNS mode I managed to raft join nodes to the leader. I use transit unseal.

One thing that is odd is that your active node address begins with http-internal:// ...

Mine is like: Active Node Address http://10.10.117.75:8200

Another thing: try Vault v1.3.4; it could be that raft is fixed there. It fixes a CVE but could fix this issue too. You won't know unless you try.

What does your raft configuration look like?

You can login with initial root token, and then run: vault operator raft configuration -format json.

Also, why do you use 2 replicas? Isn't 3 the minimum (n/2+1)?
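To make the n/2+1 rule concrete, here is a small shell sketch. It assumes the usual Raft majority quorum of floor(n/2)+1 voters; the `quorum` helper name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: majority quorum for a Raft cluster of N voters,
# assuming quorum = floor(N / 2) + 1 (shell integer division floors).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Show the quorum size and how many voters can fail without losing quorum.
for n in 1 2 3 5; do
  q=$(quorum "$n")
  echo "voters=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

With 2 voters the quorum is also 2, so a single unreachable peer is enough for the leader to lose quorum, which is consistent with the "failed to contact quorum of nodes, stepping down" warning in the logs above.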

from vault-helm.

Josua-SR avatar Josua-SR commented on July 24, 2024

Hi @ngarafol
Yep, I thought so too. http-internal comes from server-statefulset.yaml:

- name: VAULT_API_ADDR
  value: "{{ include "vault.scheme" . }}-internal://$(POD_IP):8200"

I removed the -internal suffix here, but again it changed nothing in the unseal timeout and the refused connections to vault-1.vault-internal:8201 :(
vault status now shows Active Node Address http://10.233.123.54:8200 - so the change worked, but didn't help with the problem at hand.

As to raft status, first, I can't vault login: Error authenticating: empty response from lookup-self
And vault operator raft configuration -format json returns null

I use 2 replicas because I am tight on Kubernetes nodes at the moment ...

EDIT: v1.3.4 behaves the same

from vault-helm.

chrw avatar chrw commented on July 24, 2024

It would be awesome to have the raft setup working with trusted TLS certificates! Maybe I can come up with some notes by the end of the week, since I'm currently working on our new production Vault setup. 😊

from vault-helm.

jasonodonnell avatar jasonodonnell commented on July 24, 2024

Hi @sdeoras,

Have you tried adding the internal SAN hostname to your certificate? Raft uses the headless service to communicate directly with other pods so it's a valid SAN.
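As a rough illustration, the per-pod SAN entries could be assembled like this. The statefulset name `vault`, the headless service name `vault-internal` (as seen in the logs earlier in this thread), the replica count, and the `san_list` helper are assumptions; adjust them to the actual release.

```shell
#!/bin/sh
# Assumed values, not chart defaults: adjust to your release.
STATEFULSET=vault            # pod name prefix, giving vault-0, vault-1, ...
HEADLESS_SVC=vault-internal  # headless service name seen in the logs above
REPLICAS=3

# Hypothetical helper: build a comma-separated subjectAltName list
# with one DNS entry per pod behind the headless service.
san_list() {
  i=0
  sans=""
  while [ "$i" -lt "$REPLICAS" ]; do
    # Prefix a comma only when the list is already non-empty.
    sans="${sans:+$sans,}DNS:${STATEFULSET}-${i}.${HEADLESS_SVC}"
    i=$(( i + 1 ))
  done
  echo "$sans"
}

san_list
```

The resulting string can be handed to whatever tool issues the certificate as the value of its subjectAltName extension. Depending on how the peers address each other, namespace-qualified names may be needed as well.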

from vault-helm.

sdeoras avatar sdeoras commented on July 24, 2024

@jasonodonnell thanks. I'll give it a try.

from vault-helm.

jasonodonnell avatar jasonodonnell commented on July 24, 2024

Closing this issue now that Raft support and the documentation have been released. Please open a new issue if you have Raft-specific problems!

from vault-helm.
