Hi @iniinikoski, my apologies for the delay.
I ended up needing to change more than I originally thought to make this work. I'll be finalizing this work in the coming weeks but wanted to get you something in the meantime.
You can find the code changes in the raft branch: https://github.com/hashicorp/vault-helm/tree/raft.
Here's a working example:
Vault Helm Raft
This is a tech-preview of Raft and as such, should not be used in production.
Clone the repo:
mkdir ~/vault-raft && cd ~/vault-raft
git clone git@github.com:hashicorp/vault-helm.git && cd ~/vault-raft/vault-helm
git checkout raft
Next, create a custom values-raft.yaml file so we can inject our custom values:
cat >~/vault-raft/vault-helm/values-raft.yaml <<EOL
server:
  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        ui = true
        cluster_addr = "https://POD_IP:8201"
        listener "tcp" {
          tls_disable = 1
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }
        storage "raft" {
          path = "/vault/data"
        }
EOL
helm install --name=vault -f values-raft.yaml .
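Before initializing, it can help to wait for the pods to come up; a sketch, assuming the app.kubernetes.io/name=vault label and three replicas (sealed pods won't become Ready, so this checks pod phase, not readiness):

```shell
# Sketch: wait until all three vault pods report phase Running. Sealed pods
# never pass the readiness probe, so readiness is the wrong thing to wait on.
# The label selector and replica count are assumptions; adjust as needed.
wait_for_pods() {
  kc="${1:-kubectl}"   # pass a stand-in command as $1 to dry-run the logic
  until [ "$("$kc" get pods -l app.kubernetes.io/name=vault \
      -o 'jsonpath={.items[*].status.phase}')" = "Running Running Running" ]; do
    sleep 2
  done
}
```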
Once deployed you can initialize vault-0 and unseal it.
Note: vault-0 is going to be our leader initially.
kubectl exec -ti vault-0 -- vault operator init
kubectl exec -ti vault-0 -- vault operator unseal
Next, for each other vault pod, join the raft cluster and unseal:
kubectl exec -ti <NAME OF POD> -- vault operator raft join http://vault-0.vault-headless:8200
kubectl exec -ti <NAME OF POD> -- vault operator unseal
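The per-pod join/unseal steps above can be wrapped in a small loop; a sketch, assuming three replicas (vault-1 and vault-2 as standbys) and the vault-headless service from this branch:

```shell
# Sketch: join each standby to the leader, then prompt for its unseal key.
# Pod names and the vault-headless address are assumptions taken from the
# instructions above; pass a stand-in command as $1 to preview the calls.
join_and_unseal() {
  kc="${1:-kubectl}"
  for pod in vault-1 vault-2; do
    "$kc" exec -ti "$pod" -- vault operator raft join http://vault-0.vault-headless:8200
    "$kc" exec -ti "$pod" -- vault operator unseal
  done
}
```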
After logging into Vault using a token, you can check the configuration of Raft:
kubectl exec -ti vault-0 -- vault login
kubectl exec -ti vault-0 -- vault operator raft configuration -format=json
Or using status:
kubectl exec -ti vault-0 -- vault status
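The -format=json output also lends itself to quick scripted checks; a rough sketch that counts raft voters (grep-based, to avoid assuming jq is on the image):

```shell
# Sketch: count voters in the raft configuration reported by the leader.
# Relies on vault's pretty-printed JSON putting each "voter" field on its
# own line; pass a stand-in command as $1 to test the parsing offline.
count_voters() {
  kc="${1:-kubectl}"
  "$kc" exec -ti vault-0 -- vault operator raft configuration -format=json \
    | grep -o '"voter": *true' | wc -l
}
```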
from vault-helm.
Hi @iniinikoski, thanks for bringing this to my attention!
I agree this should be documented (and supported). Currently there's a small limitation with HA mode: it doesn't create data volumes. Data volumes only get created in standalone mode.
I will adjust some things here to support HA mode with data volumes (raft) and document how to do this.
Leaving this issue open to track progress on this feature.
@Josua-SR minimum cluster size is 3.
Good catch on the API_ADDR env, PR to fix that here: #237.
The old Raft branch and instructions are no longer relevant and I would advise everyone to stop using them.
With the new feature here's how you bootstrap the Raft cluster:
$ helm install vault \
--set='server.ha.enabled=true' \
--set='server.ha.raft.enabled=true' .
$ kubectl exec -ti vault-0 -- vault operator init
$ kubectl exec -ti vault-0 -- vault operator unseal
$ kubectl exec -ti vault-1 -- vault operator raft join http://vault-0.vault-internal:8200
$ kubectl exec -ti vault-1 -- vault operator unseal
$ kubectl exec -ti vault-2 -- vault operator raft join http://vault-0.vault-internal:8200
$ kubectl exec -ti vault-2 -- vault operator unseal
$ kubectl exec -ti vault-0 -- vault status
Note: Helm does not delete volumes and it's possible you have old PVCs hanging around from a failed attempt. Make sure to clean them up.
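To check for those leftover PVCs before reinstalling, something like this works; the label selector is an assumption based on common chart labels, so verify the listing before deleting anything:

```shell
# Sketch: list PVCs left behind by a previous vault release. The selector
# is an assumption; confirm the listing looks right, then uncomment the
# delete. Pass a stand-in command as $1 to preview the calls.
stale_vault_pvcs() {
  kc="${1:-kubectl}"
  "$kc" get pvc -l app.kubernetes.io/name=vault
  # "$kc" delete pvc -l app.kubernetes.io/name=vault
}
```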
I am trying out raft storage (v1.4.0) via the helm chart (v0.5.0) with TLS enabled and am having some issues with cert validation. Is there a way to skip cert validation when joining the leader node? I am already using VAULT_SKIP_VERIFY=true, but that does not seem to affect the vault operator raft join call to the leader node.
2020-04-13T01:28:23.075Z [INFO] core: attempting to join possible raft leader node: leader_addr=https://vault-0.vault-internal:8200
2020-04-13T01:28:23.080Z [INFO] core: join attempt failed: error="error during raft bootstrap init call: Put https://vault-0.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge: x509: certificate is valid for <redacted>, not vault-0.vault-internal"
2020-04-13T01:28:23.080Z [ERROR] core: failed to join raft cluster: error="failed to join any raft leader node"
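For what it's worth, VAULT_SKIP_VERIFY applies to the CLI client, while the join request to the leader takes its own TLS flags (e.g. -leader-ca-cert). Note the error above is a SAN mismatch, so the certificate would also need to be valid for the hostname being joined. A sketch, with a hypothetical in-pod CA path:

```shell
# Sketch: supply the CA that signed the leader's cert to the join call.
# /vault/userconfig/vault-tls/ca.crt is a hypothetical mount path; the
# cert must also list the joined hostname as a SAN for this to help.
join_with_ca() {
  kc="${1:-kubectl}"
  "$kc" exec -ti vault-1 -- sh -c \
    'vault operator raft join -leader-ca-cert="$(cat /vault/userconfig/vault-tls/ca.crt)" https://vault-0.vault-internal:8200'
}
```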
Might be more useful to implement this after Vault supports retry join. cf. hashicorp/vault#7856
Hi, I am using this example, but I seem to be having problems when deleting a pod. The vault service picks up the new pod, but raft is still using the old pod IP, and the new pod doesn't join back to the cluster.
Can't rejoin the new pod manually since raft is already initialized. Using Vault 1.3.2.
Opened issue here with more info: hashicorp/vault#8489
Tracking this feature here: #58
It appears that a few days ago support was merged into the master branch. So I tried using the same instructions @jasonodonnell posted above, but I never get to a functioning vault cluster.
Errors are as follows:
After raft join, unseal fails with Error unsealing: context deadline exceeded.
At this point, vault status reports that vault-0 is not the leader anymore:
Cluster Name vault-cluster-e4db2105
Cluster ID 41716ef5-fb2b-9b46-6466-51998a3bbf70
HA Enabled true
HA Cluster https://vault-0.vault-internal:8201
HA Mode standby
Active Node Address http-internal://10.233.123.248:8200
While vault status on vault-1 shows that it stayed sealed, and exits with an error before printing the active node:
kubectl exec -ti vault-1 -- vault status
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed true
Total Shares 1
Threshold 1
Unseal Progress 0/1
Unseal Nonce n/a
Version 1.3.3
HA Enabled true
command terminated with exit code 2
These are the steps:
1. enable raft (and some deployment-specific bits) in values.yaml:
diff --git a/values.yaml b/values.yaml
index 1616394..e8d3c40 100644
--- a/values.yaml
+++ b/values.yaml
@@ -267,10 +268,10 @@ server:
   dataStorage:
     enabled: true
     # Size of the PVC created
-    size: 10Gi
+    size: 5Gi
     # Name of the storage class to use. If null it will use the
     # configured default Storage Class.
-    storageClass: null
+    storageClass: openebs-hostpath
     # Access Mode of the storage device being used for the PVC
     accessMode: ReadWriteOnce
@@ -336,8 +337,8 @@ server:
   # Helm project by default. It is possible to manually configure Vault to use a
   # different HA backend.
   ha:
-    enabled: false
-    replicas: 3
+    enabled: true
+    replicas: 2
   # Enables Vault's integrated Raft storage. Unlike the typical HA modes where
   # Vault's persistence is external (such as Consul), enabling Raft mode will create
@@ -346,7 +347,7 @@ server:
     raft:
       # Enables Raft integrated storage
-      enabled: false
+      enabled: true
       config: |
         ui = true
         cluster_addr = "https://POD_IP:8201"
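For readability, the diff above amounts to roughly this values.yaml fragment (a reconstruction; note the earlier comment in this thread that the minimum Raft cluster size is 3, while this run sets replicas to 2):

```yaml
server:
  dataStorage:
    enabled: true
    size: 5Gi
    storageClass: openebs-hostpath
    accessMode: ReadWriteOnce
  ha:
    enabled: true
    replicas: 2   # per the earlier comment, minimum Raft cluster size is 3
    raft:
      enabled: true
      config: |
        ui = true
        cluster_addr = "https://POD_IP:8201"
```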
2. deploy to the cluster
helm install vault ./
NAME: vault
LAST DEPLOYED: Sat Mar 21 17:16:09 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!
...
3. initialize initial leader vault-0
kubectl exec -ti vault-0 -- vault operator init -key-shares=1 -key-threshold=1
Unseal Key 1: secret
Initial Root Token: secret
...
kubectl exec -ti vault-0 -- vault operator unseal
Unseal Key (will be hidden):
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed false
Total Shares 1
Threshold 1
Version 1.3.3
Cluster Name vault-cluster-e4db2105
Cluster ID 41716ef5-fb2b-9b46-6466-51998a3bbf70
HA Enabled true
HA Cluster n/a
HA Mode standby
Active Node Address <none>
4. join second instance to cluster
kubectl exec -ti vault-1 -- vault operator raft join http://vault-0.vault-internal:8200
Key Value
--- -----
Joined true
kubectl exec -ti vault-1 -- vault operator unseal
Unseal Key (will be hidden):
Error unsealing: context deadline exceeded
command terminated with exit code 2
It turns out that vault-0 has a noisy log after this failed unseal attempt:
Big log
kubectl logs vault-0
==> Vault server configuration:
Api Address: http-internal://10.233.123.248:8200
Cgo: disabled
Cluster Address: https://vault-0.vault-internal:8201
Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level: info
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.3.3
2020-03-21T16:16:30.502Z [INFO] proxy environment: http_proxy= https_proxy= no_proxy=
==> Vault server started! Log data will stream in below:
2020-03-21T16:16:37.617Z [INFO] core: seal configuration missing, not initialized
[... the same "seal configuration missing, not initialized" message repeats every ~3s; intermediate lines omitted ...]
2020-03-21T16:18:10.614Z [INFO] core: seal configuration missing, not initialized
2020-03-21T16:18:11.935Z [ERROR] core: no seal config found, can't determine if legacy or new-style shamir
2020-03-21T16:18:11.935Z [INFO] core: security barrier not initialized
2020-03-21T16:18:12.005Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:166abab1-7d7a-5d3c-37a8-6d600d44c0d7 Address:vault-0.vault-internal:8201}]"
2020-03-21T16:18:12.005Z [INFO] storage.raft: entering leader state: leader="Node at 166abab1-7d7a-5d3c-37a8-6d600d44c0d7 [Leader]"
2020-03-21T16:18:12.219Z [INFO] core: security barrier initialized: stored=1 shares=1 threshold=1
2020-03-21T16:18:12.486Z [INFO] core: post-unseal setup starting
2020-03-21T16:18:12.597Z [INFO] core: loaded wrapping token key
2020-03-21T16:18:12.597Z [INFO] core: successfully setup plugin catalog: plugin-directory=
2020-03-21T16:18:12.597Z [INFO] core: no mounts; adding default mount table
2020-03-21T16:18:12.775Z [INFO] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2020-03-21T16:18:12.775Z [INFO] core: successfully mounted backend: type=system path=sys/
2020-03-21T16:18:12.776Z [INFO] core: successfully mounted backend: type=identity path=identity/
2020-03-21T16:18:13.253Z [INFO] core: successfully enabled credential backend: type=token path=token/
2020-03-21T16:18:13.253Z [INFO] core: restoring leases
2020-03-21T16:18:13.253Z [INFO] rollback: starting rollback manager
2020-03-21T16:18:13.253Z [INFO] expiration: lease restore complete
2020-03-21T16:18:13.408Z [INFO] identity: entities restored
2020-03-21T16:18:13.409Z [INFO] identity: groups restored
2020-03-21T16:18:13.486Z [INFO] core: post-unseal setup complete
2020-03-21T16:18:13.741Z [INFO] core: root token generated
2020-03-21T16:18:13.886Z [INFO] core: pre-seal teardown starting
2020-03-21T16:18:13.886Z [INFO] rollback: stopping rollback manager
2020-03-21T16:18:13.886Z [INFO] core: pre-seal teardown complete
2020-03-21T16:19:31.910Z [INFO] core.cluster-listener: starting listener: listener_address=[::]:8201
2020-03-21T16:19:31.910Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2020-03-21T16:19:31.936Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:166abab1-7d7a-5d3c-37a8-6d600d44c0d7 Address:vault-0.vault-internal:8201}]"
2020-03-21T16:19:31.936Z [INFO] core: vault is unsealed
2020-03-21T16:19:31.936Z [INFO] storage.raft: entering follower state: follower="Node at [::]:8201 [Follower]" leader=
2020-03-21T16:19:31.936Z [INFO] core: entering standby mode
2020-03-21T16:19:39.237Z [WARN] storage.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-21T16:19:39.237Z [INFO] storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=2
2020-03-21T16:19:39.303Z [INFO] storage.raft: election won: tally=1
2020-03-21T16:19:39.303Z [INFO] storage.raft: entering leader state: leader="Node at [::]:8201 [Leader]"
2020-03-21T16:19:39.396Z [INFO] core: acquired lock, enabling active operation
2020-03-21T16:19:39.529Z [INFO] core: post-unseal setup starting
2020-03-21T16:19:39.529Z [INFO] core: loaded wrapping token key
2020-03-21T16:19:39.529Z [INFO] core: successfully setup plugin catalog: plugin-directory=
2020-03-21T16:19:39.530Z [INFO] core: successfully mounted backend: type=system path=sys/
2020-03-21T16:19:39.530Z [INFO] core: successfully mounted backend: type=identity path=identity/
2020-03-21T16:19:39.531Z [INFO] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2020-03-21T16:19:39.533Z [INFO] core: successfully enabled credential backend: type=token path=token/
2020-03-21T16:19:39.533Z [INFO] core: restoring leases
2020-03-21T16:19:39.533Z [INFO] rollback: starting rollback manager
2020-03-21T16:19:39.533Z [INFO] identity: entities restored
2020-03-21T16:19:39.533Z [INFO] identity: groups restored
2020-03-21T16:19:39.533Z [INFO] expiration: lease restore complete
2020-03-21T16:19:39.596Z [INFO] core: post-unseal setup complete
2020-03-21T16:21:43.476Z [INFO] storage.raft: updating configuration: command=AddStaging server-id=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f server-addr=vault-1.vault-internal:8201 servers="[{Suffrage:Voter ID:166abab1-7d7a-5d3c-37a8-6d600d44c0d7 Address:vault-0.vault-internal:8201} {Suffrage:Voter ID:d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f Address:vault-1.vault-internal:8201}]"
2020-03-21T16:21:43.507Z [INFO] storage.raft: added peer, starting replication: peer=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f
2020-03-21T16:21:43.543Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
[... repeated appendEntries/heartbeat failures to vault-1.vault-internal:8201 with the same "connection refused" error omitted ...]
2020-03-21T16:21:46.007Z [WARN] storage.raft: failed to contact: server-id=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f time=2.500184019s
2020-03-21T16:21:46.007Z [WARN] storage.raft: failed to contact quorum of nodes, stepping down
2020-03-21T16:21:46.007Z [INFO] storage.raft: entering follower state: follower="Node at [::]:8201 [Follower]" leader=
2020-03-21T16:21:46.007Z [WARN] core: leadership lost, stopping active operation
2020-03-21T16:21:46.007Z [INFO] core: pre-seal teardown starting
2020-03-21T16:21:46.260Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-1.vault-internal:8201 error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:46.507Z [INFO] rollback: stopping rollback manager
2020-03-21T16:21:46.508Z [INFO] core: pre-seal teardown complete
2020-03-21T16:21:46.508Z [ERROR] core: clearing leader advertisement failed: error="node is not the leader"
2020-03-21T16:21:46.508Z [ERROR] core: unlocking HA lock failed: error="node is not the leader"
2020-03-21T16:21:46.643Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:46.939Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
2020-03-21T16:21:47.645Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:49.305Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
2020-03-21T16:21:51.421Z [WARN] storage.raft: heartbeat timeout reached, starting election: last-leader=
2020-03-21T16:21:51.421Z [INFO] storage.raft: entering candidate state: node="Node at [::]:8201 [Candidate]" term=3
2020-03-21T16:21:51.520Z [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
[... the same election/requestVote "connection refused" cycle repeats through term 29 (16:25:01), interleaved with "no TLS config found for ALPN" warnings; intermediate lines omitted ...]
I am quite lost as to what this means.
From what I can tell, the most fishy line in the log of vault-0 is:
2020-03-21T16:21:43.543Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f vault-1.vault-internal:8201}" error="dial tcp 10.233.76.32:8201: connect: connection refused"
where it tries to contact the newly joined instance on port 8201. Why 8201? And why is the connection refused?
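As an aside, 8201 is Vault's cluster port: it comes from the cluster_address in the listener stanza (and VAULT_CLUSTER_ADDR), and a peer likely only listens on it once it is unsealed and participating in the cluster, so a sealed or crashed node answers "connection refused" there. A minimal illustration of what "connection refused" means, using plain bash against a local port nothing is listening on:

```shell
# Dial TCP 8201 on localhost with bash's /dev/tcp; since nothing local
# listens there, the connect attempt fails -- the same condition the
# raft log reports when it cannot reach a peer's cluster port.
if (exec 3<>/dev/tcp/127.0.0.1/8201) 2>/dev/null; then
  echo "open"
else
  echo "refused"   # expected here: no listener on 8201 locally
fi
```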
The second most fishy part is the leader stepping down:
2020-03-21T16:21:46.007Z [WARN] storage.raft: failed to contact: server-id=d9ca1f2d-cb9c-685b-d24b-b5c393bd7d0f time=2.500184019s
2020-03-21T16:21:46.007Z [WARN] storage.raft: failed to contact quorum of nodes, stepping down
2020-03-21T16:21:46.007Z [INFO] storage.raft: entering follower state: follower="Node at [::]:8201 [Follower]" leader=
2020-03-21T16:21:46.007Z [WARN] core: leadership lost, stopping active operation
Any ideas how to proceed here? Are these in fact the right steps?
from vault-helm.
@Josua-SR did you check my comment above? You have to use DNS for resolving; I think you are still using IP addresses. Check hashicorp/vault#8489
@ngarafol no, not really. I don't understand most of the comments in that thread, and they seemed to refer to implementation details only present in that PR, such as the vault-headless service name.
But I am on the master branch of the Helm chart, so I have no idea which comments still apply and which don't.
If you were referring to the change of simply declaring VAULT_CLUSTER_ADDR as https://$(HOSTNAME):8201
in templates/server-statefulset.yaml: well, I quickly did the run-through, and it behaves exactly as I describe above, no change.
It also seems superfluous, since the cluster address is already set in the Vault config file derived from the config | sections in values.yaml.
Also, I'd like to note that the IPs that appear in my logs are actually valid and refer to the vault-0 and vault-1 pods.
@Josua-SR Oh, ok. Yes, I use the raft branch, and with DNS mode I managed to raft join nodes to the leader. I use transit auto-unseal.
One thing that is odd is your active node address in the beginning with http-internal:// ...
Mine looks like: Active Node Address http://10.10.117.75:8200
Another thing: try Vault v1.3.4; raft could be fixed there. It fixes a CVE, but it could fix this issue too; you don't know unless you try.
What does your raft configuration look like?
You can log in with the initial root token and then run: vault operator raft configuration -format json.
Also, why do you use 2 replicas? Isn't 3 the minimum (n/2+1)?
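For reference, the transit auto-unseal mentioned above is configured with a seal stanza roughly like this (the address, token, key name, and mount path here are placeholders; point them at your own transit Vault):

```hcl
seal "transit" {
  address    = "https://transit-vault.example.com:8200"
  token      = "s.REPLACE_ME"
  key_name   = "autounseal"
  mount_path = "transit/"
}
```

With this in place, pods unseal themselves against the transit Vault instead of requiring manual `vault operator unseal` after every restart.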
Hi @ngarafol
Yep, I thought so too. http-internal comes from server-statefulset.yaml:
- name: VAULT_API_ADDR
value: "{{ include "vault.scheme" . }}-internal://$(POD_IP):8200"
I removed the -internal suffix here, but again it changed nothing in the unseal timeout and the refused connections to vault-1.vault-internal:8201 :(
vault status now shows Active Node Address http://10.233.123.54:8200, so the change worked, but didn't help with the problem at hand.
As to raft status: first, I can't vault login: Error authenticating: empty response from lookup-self
And vault operator raft configuration -format json returns null
I use 2 replicas because I am tight on Kubernetes nodes at the moment...
EDIT: v1.3.4 behaves the same
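On the replica count: raft needs floor(n/2)+1 reachable voters, so with 2 replicas losing either node loses quorum, and 3 is the practical minimum. A small sketch of the arithmetic, plus a hypothetical healthy `vault operator raft configuration -format=json` payload for comparison with the null above (the JSON shape is assumed from Vault 1.3-era output, and the node IDs and addresses are made up):

```shell
# Raft quorum: floor(n/2) + 1 voters must stay reachable.
quorum() { echo $(( $1 / 2 + 1 )); }
quorum 2   # prints 2 -> no tolerance: either node failing loses quorum
quorum 3   # prints 2 -> one node may fail

# Hypothetical healthy output of `vault operator raft configuration
# -format=json` (shape assumed; IDs/addresses made up). A null result
# usually means the node never successfully joined a quorum.
cat <<'EOF'
{"data":{"config":{"servers":[
  {"node_id":"vault-0","address":"vault-0.vault-internal:8201","leader":true,"voter":true},
  {"node_id":"vault-1","address":"vault-1.vault-internal:8201","leader":false,"voter":true}
]}}}
EOF
```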
It would be awesome to have the raft setup working with trusted TLS certificates! Maybe I can come up with some notes by the end of the week, since I'm currently working on our new production Vault setup. 😊
Hi @sdeoras,
Have you tried adding the internal SAN hostname to your certificate? Raft uses the headless service to communicate directly with other pods so it's a valid SAN.
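To sketch what that looks like, here is a hedged example of issuing a throwaway self-signed certificate with those SANs and verifying they made it in. The service names assume a release called `vault` with the chart's `vault-internal` headless service; adjust them to your deployment, and note that `-addext`/`-ext` require OpenSSL 1.1.1+:

```shell
# Generate a one-day self-signed cert whose SANs cover the headless-service
# names raft peers dial (names assumed from a release named "vault").
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/vault.key -out /tmp/vault.crt \
  -subj "/CN=vault" \
  -addext "subjectAltName=DNS:vault.default.svc,DNS:*.vault-internal,DNS:vault-0.vault-internal"

# Verify the SANs are actually present in the certificate:
openssl x509 -in /tmp/vault.crt -noout -ext subjectAltName
```

In a real setup you would put the same SAN list in the CSR you send to your trusted CA rather than self-signing.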
@jasonodonnell thanks. i'll give it a try.
Closing this issue now that Raft support and the documentation have been released. Please open a new issue if you have raft specific problems!