obolnetwork / charon-k8s-distributed-validator-cluster
A set of Kubernetes manifests for deploying Distributed Validator Clusters.
Problem to be solved
In release v0.13.0 we introduced the charon relay as a replacement for the bootnodes. We need to update the test and canary clusters to use it.
Proposed solution
Update the charonxk8s templates and configmap with the new charon relays flag, and bump the deployment versions from (0.11.0, 0.12.0, latest) to (0.12.0, 0.13.0, latest).
Extract from - https://discord.com/channels/930779837218562078/1109760461186015273/1110102612411416576
Users have asked us in the past for size 3 clusters.
These are usually institutional stakers that want to run single-operator clusters of size 3.
They want this because they want CFT (crash fault tolerance) to de-risk their infra.
They do not care about BFT (Byzantine fault tolerance), so they want to reduce infra costs (3 nodes are cheaper than 4).
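The trade-off between sizes 3 and 4 can be checked with a little arithmetic. The sketch below assumes a signing threshold of ceil(2n/3) and the usual BFT bound n >= 3f + 1; the function name fault_tolerance is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Sketch: fault tolerance for a cluster of n nodes.
# Assumes threshold = ceil(2n/3) and the BFT bound n >= 3f + 1.
fault_tolerance() {
  n=$1
  threshold=$(( (2 * n + 2) / 3 ))   # integer form of ceil(2n/3)
  cft=$(( n - threshold ))           # nodes that may be offline
  bft=$(( (n - 1) / 3 ))             # nodes that may be byzantine
  echo "n=$n threshold=$threshold crash_faults=$cft byzantine_faults=$bft"
}

fault_tolerance 3
fault_tolerance 4
```

Under these assumptions a 3-node cluster tolerates one offline node, the same as a 4-node cluster, but tolerates no byzantine node, which matches the "CFT but not BFT" reasoning above.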
Create an internal cluster of 3 nodes and monitor it with one node down. This cluster can be deployed on GCP like all canary clusters.
We can set up two k8s canary clusters: one with a mix of VCs and one with Lodestar VCs.
Run 2/3 nodes and test for functional completeness and performance.
No missed proposals
No missed attestation
Exits gracefully
Aggregation?
Monitor logs and metrics for a period of 2-3 weeks.
Update the cluster deployment README doc with the updated instructions to use a GCS bucket backend.
Problem to be solved
Add support for lodestar VC. This needs to be done in order to deploy lodestar VC in canary clusters.
Proposed solution
Add a new template file templates/lodestar-vc.yaml. Also configure the deploy-cluster.sh script to include lodestar VC in the canary cluster VC types. Test lodestar VC in canary cluster deployments.
Parameterize the p2p bootnode so that clusters can switch config between the private and public bootnode endpoints.
Create k8s service monitors to expose the charon nodes' prom metrics
Run the canary charon clusters using a mix of VCs (teku and lighthouse).
Update the charonxk8s deployment pipeline to remove charon/kiln and add charon/ropsten.
We need to override bootnodes config with the new relay endpoint
Update the charon k8s template with the bootnodes flag and env var.
We need to run the canary clusters on multiple charon versions:
1- Latest release (i.e. 0.11.0)
2- Oldest supported release (i.e. 0.10.0)
3- Latest tagged image
We need our charon-k8s templates to support VC voluntary exits for Lighthouse and Lodestar.
Automated DKG load tests built here - PR
Suggestion: add a repository trigger with a relay URL arg? Then we could potentially automate tests after a dev relay deploy.
Use a GitHub workflow to run the tests.
Send CHARON_P2P_RELAYS as the run parameter.
The workflow should be able to trigger load tests by taking the relay as a user-defined input.
Generating definition files with SDK
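A repository trigger that carries the relay URL could be sketched as follows. This builds the JSON body for GitHub's repository_dispatch endpoint; the event type "dkg-load-test", the function name dispatch_payload, and the relay URL are hypothetical placeholders, and the URL is not JSON-escaped, so this is only suitable for plain URLs.

```shell
#!/bin/sh
# Sketch: build a repository_dispatch payload that carries the relay URL.
# "dkg-load-test" and the relay URL below are hypothetical placeholders.
dispatch_payload() {
  relay_url=$1
  printf '{"event_type":"dkg-load-test","client_payload":{"CHARON_P2P_RELAYS":"%s"}}\n' "$relay_url"
}

dispatch_payload "https://relay.example.com"

# To send it (OWNER, REPO and GITHUB_TOKEN are placeholders):
# curl -X POST \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: Bearer $GITHUB_TOKEN" \
#   -d "$(dispatch_payload "https://relay.example.com")" \
#   "https://api.github.com/repos/OWNER/REPO/dispatches"
```

Inside the triggered workflow, the value would then be readable from github.event.client_payload and could be exported as CHARON_P2P_RELAYS for the test run.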
Add a step to the canary cluster CD workflow to notify the charon discord channel
data-dir flag is deprecated and we should remove it.
Remove the data-dir flag from the k8s template and add the private-key-file flag
When we deploy charon to GCP, we need to do a canary rollout to fewer nodes than the threshold to ensure the cluster will not operate if a buggy charon version is shipped.
The cluster restart workflow gets triggered whenever a charon package is published, a behavior that we no longer need.
Remove the repository-dispatch event from the workflow
The InitContainer step needs to run a command -
init_container:
  - name: init-nimbus
    image: statusim/nimbus-eth2:$NIMBUS_VERSION
    command:
      - sh
      - -ac
      - "tmp=/data/nimbus/nimbus-keystores \n mkdir -p $${tmp}\n for f in /validator_keys/keystore-*.json; do\n cp $${f} $${tmp}\n cat $${f%.*}.txt | \\\n /home/user/nimbus_beacon_node deposits import \\\n --log-level=debug \\\n --data-dir=/data/nimbus \\\n /data/nimbus/nimbus-keystores \n rm $${tmp}/$(basename $${f}) \n done\n"
    volume_mount:
      - name: data
        mount_path: /data/nimbus
      - name: validators
        mount_path: /validator_keys
    security_context:
      run_as_user: 0
However, this does not succeed; the InitContainer crashes.
canary-goerli-exit-3 .env file: VC_TYPES=0,1,2,3 (3=nimbus)
The Nimbus VC pod crashes.
Use kubectl describe pod to check for the error inside the pod.
Notice that the variables inside the initContainer sh command are not getting resolved.
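One possible explanation, offered as an assumption rather than a confirmed diagnosis: the $${...} form is an escape used by tools such as docker compose, and if the string reaches sh unchanged, sh reads "$$" as its own process ID. The sketch below reproduces that behaviour locally; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: why "$${tmp}" does not resolve when sh receives it unchanged.
# In a POSIX shell "$$" is the shell's process ID, so "$${tmp}" expands
# to "<pid>{tmp}" instead of the value of tmp.
escaped_form() {
  sh -c 'tmp=/data/nimbus/nimbus-keystores; echo "$${tmp}"'
}
plain_form() {
  sh -c 'tmp=/data/nimbus/nimbus-keystores; echo "${tmp}"'
}

echo "escaped: $(escaped_form)"   # a PID followed by the literal text {tmp}
echo "plain:   $(plain_form)"     # the keystore path
```

If this is the cause, writing the variables as ${tmp} and ${f} in the template would let sh resolve them, provided no other templating layer consumes the ${...} syntax first.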
We run a charon-sepolia cluster and we need to ensure its deployment is automated and managed with the canary workflows
Add the charon-sepolia-1780 cluster to the canary clusters CD workflow.
We need to deploy a charon gnosis cluster with 7/5 nodes and 1 validator.
Canary clusters should send logs to Loki.
Add config keys for Loki to charon.yaml.
Verify logs are being sent, check Grafana
Canary clusters were failing as BNs were syncing.
Problem to be solved
We recently added support for nimbus VC to the cdvc repo. Now we want to test it by running the new VC in our canary clusters.
Proposed solution
Add a new template file templates/nimbus-vc.yaml. You can take inspiration from [https://github.com/ObolNetwork/charon-distributed-validator-cluster/tree/main/nimbus] and also from templates/lighthouse-vc.yaml.
Out of scope
None.
We need a versatile but secure way to persist clusters' config (validators keys, cluster-lock, and cluster.env) to be used by CICD tools and team members.
There are a few reasonable options, such as HashiCorp Vault, GCP Secrets Manager, GCS, and GitHub secrets. We decided to use GCS because it introduces the least operational complexity while remaining secure and inheriting GCP RBAC rules. In the mid-term, we will reconsider this choice in favor of Vault.
Problem to be solved
Enable the cluster to flexibly run multiple VCs with mixed charon versions.
Proposed solution
Add a flag to the DKG load tests that lets the user choose whether or not to publish after the DKG.
Update the GitHub workflow to run the tests.
The workflow should be able to trigger load tests by taking the relay as a user-defined input.
Generating definition files with SDK
We need to include all clusters in the charon CD workflow
The key utility is a Go utility
which -
At the moment we call it manually; however, there is scope to add it to create-web3signer-secrets.sh to avoid the manual step, along with a flag that lets the operator opt out of the upload if the vault already has the keys.
Update the shell script
The script should take the cluster name as an input and prepare all needed secrets inside the namespace
Ensure cluster is up and running
Create a template and script for teku VC voluntary exit
Remove unnecessary volume mounts from the VC templates
Update the VCs yaml templates
We prepare/refresh/document a public k8s repo to run a charon node in Kubernetes.