
medusa-operator's Introduction

[DEPRECATED]

This project is deprecated and replaced by k8ssandra-operator

Read this blog post to see what differences exist between K8ssandra and k8ssandra-operator, and why we decided to build an operator.
Follow our migration guide to migrate from K8ssandra (and Apache Cassandra®) to k8ssandra-operator.

K8ssandra

K8ssandra is a simple-to-manage, production-ready distribution of Apache Cassandra and Stargate that is ready for Kubernetes. It is built on a foundation of rock-solid open-source projects covering both the transactional and operational aspects of Cassandra deployments. This project is distributed as a collection of Helm charts. Feel free to fork the repo and contribute. If you're looking to install K8ssandra, head over to the Quickstarts.

Components

K8ssandra is composed of a number of sub-charts, each representing a component in the K8ssandra stack. The default installation is focused on developer deployments, with all of the features enabled and configured to run with a minimal set of resources. Many of these components may be deployed independently in a centralized fashion. Below is a list of the components in the K8ssandra stack with links to the appropriate projects.

Apache Cassandra

K8ssandra packages and deploys Apache Cassandra via the cass-operator project. Each Cassandra container has the Management API for Apache Cassandra (MAAC) and the Metrics Collector for Apache Cassandra (MCAC) pre-installed and configured to come up automatically.

Stargate

Stargate provides a collection of horizontally scalable API endpoints for interacting with Cassandra databases. Developers may leverage REST and GraphQL alongside the traditional CQL interfaces. With Stargate, operations teams gain the ability to independently scale the coordination (Stargate) and data (Cassandra) layers. In some use cases, this has resulted in a lower TCO and a smaller infrastructure footprint.

Monitoring

Monitoring includes the collection, storage, and visualization of metrics. Along with the previously mentioned MCAC, K8ssandra utilizes Prometheus and Grafana for the storage and visualization of metrics. Installation and management of these pieces is handled by the Kube Prometheus Stack Helm chart.

Repairs

The Last Pickle Reaper is used to schedule and manage repairs in Cassandra. It provides a web interface to visualize repair progress and manage activity.

Backup & Restore

Another project from The Last Pickle, Medusa, manages the backup and restore of K8ssandra clusters.

Next Steps

If you are looking to run K8ssandra in your Kubernetes environment check out the Getting Started guide, with follow-up details for developers and site reliability engineers.

We are always looking for contributions to the docs, Helm charts, and underlying components. Check out the code contribution guide and docs contribution guide.

If you are a developer interested in working with the K8ssandra code, here is a guide that will give you an introduction to:

  • Important technologies and learning resources
  • Project components
  • Project processes and resources
  • Getting up and running with a basic IDE environment
  • Deploying to a local docker-based cluster environment (kind)
  • Understanding the K8ssandra project structure
  • Running unit tests
  • Troubleshooting tips

Dependencies

For information on the packaged dependencies of K8ssandra and their licenses, check out our open source report.

medusa-operator's People

Contributors

adejanovski, burmanm, emerkle826, jdonenine, jsanda, mattfellows, natestax


medusa-operator's Issues

Test

Testing new setup.

Allow specification of backup type (full vs. differential) when creating a new backup

Is your feature request related to a problem? Please describe.
It would be useful to request a full backup on occasion. Currently, the only backup type is differential.

Describe the solution you'd like
Optionally be able to specify the backup type when doing a helm install
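
A minimal sketch of what this could look like on the resource itself, assuming a hypothetical backupType field on the CassandraBackup spec and the cassandra.k8ssandra.io/v1alpha1 API version (whether set directly or surfaced through a Helm value):

apiVersion: cassandra.k8ssandra.io/v1alpha1
kind: CassandraBackup
metadata:
  name: full-backup-example
spec:
  name: full-backup-example
  cassandraDatacenter: dc1
  backupType: full   # hypothetical field; omitting it would keep today's differential behavior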

Describe alternatives you've considered
None

Additional context
None

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-800
┆Priority: Medium

Remove the CassandraDatacenter spec from CassandraBackup objects

CassandraBackup resources shouldn’t store the CassandraDatacenter spec anymore but rather the topology object generated by Medusa.
This object will contain the list of live nodes at the time of the backup along with their token assignments. Sample topology object:

{"ip-179-31-16-38.us-west-2.compute.internal": {"tokens": [-1051417974279770378, -112095142485141927, 9217379683620055011, 939597040143840721], "is_up": true}, 
 "ip-179-31-23-54.us-west-2.compute.internal": {"tokens": [-1080182033668782299, -1156652990105327691, 9134535641516355167, 9216775765860996507], "is_up": true}, 
 "ip-179-31-30-139.us-west-2.compute.internal": {"tokens": [-1018393144648419592, -1342532530495729638, 8846603596658721956, 8849342512812637165], "is_up": true}}

The CRD should also contain the storage bucket prefix, which is controlled by the namespace name.
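
A rough sketch of the resulting resource (field and value names are illustrative, not the final CRD shape):

apiVersion: cassandra.k8ssandra.io/v1alpha1
kind: CassandraBackup
metadata:
  name: backup1
  namespace: my-cluster
spec:
  name: backup1
  cassandraDatacenter: dc1
status:
  storagePrefix: my-cluster          # derived from the namespace name
  topology:                          # replaces the stored CassandraDatacenter spec
    ip-179-31-16-38.us-west-2.compute.internal:
      isUp: true
      tokens: ["-1051417974279770378", "-112095142485141927"]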

┆Issue is synchronized with this Jiraserver Feature by Unito
┆Epic: Remote Cluster Restore
┆Issue Number: K8SSAND-522
┆Priority: Medium

Handle x-kubernetes-list-map-keys with newer versions of k8s

Describe the bug
The property x-kubernetes-list-map-keys causes problems with CRDs on newer versions of Kubernetes (v1.18+).

To Reproduce
When running make test with Kubernetes v1.18+, the following error occurs:

CustomResourceDefinition.apiextensions.k8s.io "cassandrabackups.cassandra.k8ssandra.io" is invalid:
[spec.validation.openAPIV3Schema.properties[status].properties[cassdcTemplateSpec].properties[spec].properties[podTemplateSpec].properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default:
Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property,
spec.validation.openAPIV3Schema.properties[status].properties[cassdcTemplateSpec].properties[spec].properties[podTemplateSpec].properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default:
Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
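
For reference, the check passes when the listed key carries a default or is marked required; the container ports fragment of a CRD generated with a newer controller-gen looks roughly like this (a sketch, not the exact generated output):

ports:
  items:
    properties:
      containerPort:
        format: int32
        type: integer
      protocol:
        default: TCP
        type: string
    required:
    - containerPort
    type: object
  type: array
  x-kubernetes-list-map-keys:
  - containerPort
  - protocol
  x-kubernetes-list-type: map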

Expected behavior
Generated CRDs should not have errors.

Screenshots
N/A

Environment (please complete the following information):

  • medusa-operator version:
    b376cf6

  • Kubernetes version information:
    Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"archive", BuildDate:"2020-11-25T13:19:56Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:
    Kind

  • Manifests:
    N/A

  • Operator logs:
    N/A

Additional context
N/A

Error reading the medusa_s3_credentials

Describe the bug
Python raises an error during the initialization of the medusa container.

Environment:

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  medusa_s3_credentials: |-
    [default]
    aws_access_key_id = xxxxxx
    aws_secret_access_key = xxxxxxxx
  • medusa-operator version:
    0.12.2
  • Helm charts version info
apiVersion: v2
name: k8ssandra
type: application
version: 1.6.0-SNAPSHOT
dependencies:
  - name: cass-operator
    version: 0.35.2
  - name: reaper-operator
    version: 0.32.3
  - name: medusa-operator
    version: 0.32.0
  - name: k8ssandra-common
    version: 0.28.4
  • Kubernetes version information:
    v1.23.1
  • Kubernetes cluster kind:
    EKS
  • Operator logs:
MEDUSA_MODE = GRPC
sleeping for 0 sec
Starting Medusa gRPC service
INFO:root:Init service
[2022-05-10 12:56:28,368] INFO: Init service
DEBUG:root:Loading storage_provider: s3
[2022-05-10 12:56:28,368] DEBUG: Loading storage_provider: s3
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 169.254.169.254:80
[2022-05-10 12:56:28,371] DEBUG: Starting new HTTP connection (1): 169.254.169.254:80
DEBUG:urllib3.connectionpool:http://169.254.169.254:80 "PUT /latest/api/token HTTP/1.1" 200 56
[2022-05-10 12:56:28,373] DEBUG: http://169.254.169.254:80 "PUT /latest/api/token HTTP/1.1" 200 56
DEBUG:root:Reading AWS credentials from /etc/medusa-secrets/medusa_s3_credentials
[2022-05-10 12:56:28,373] DEBUG: Reading AWS credentials from /etc/medusa-secrets/medusa_s3_credentials
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cassandra/medusa/service/grpc/server.py", line 297, in <module>
    server.serve()
  File "/home/cassandra/medusa/service/grpc/server.py", line 60, in serve
    medusa_pb2_grpc.add_MedusaServicer_to_server(MedusaService(config), self.grpc_server)
  File "/home/cassandra/medusa/service/grpc/server.py", line 99, in __init__
    self.storage = Storage(config=self.config.storage)
  File "/home/cassandra/medusa/storage/__init__.py", line 72, in __init__
    self.storage_driver = self._connect_storage()
  File "/home/cassandra/medusa/storage/__init__.py", line 92, in _connect_storage
    s3_storage = S3Storage(self._config)
  File "/home/cassandra/medusa/storage/s3_storage.py", line 40, in __init__
    super().__init__(config)
  File "/home/cassandra/medusa/storage/abstract_storage.py", line 39, in __init__
    self.driver = self.connect_storage()
  File "/home/cassandra/medusa/storage/s3_storage.py", line 78, in connect_storage
    profile = aws_config[aws_profile]
  File "/usr/lib/python3.6/configparser.py", line 959, in __getitem__
    raise KeyError(key)
KeyError: 'default'

What could be the problem?

thanks
Cristian

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1499
┆priority: Medium

Medusa-operator deployments on k8s v1.22 fail

Describe the bug
A number of APIs have been changed or removed in k8s v1.22, and the operator can no longer be deployed in those environments.

To Reproduce

  1. Deploy a kind cluster with node version v1.22.0:
kind create cluster --image "kindest/node:v1.22.0"
  2. Build and deploy the medusa-operator (note that this example won't fully work, as it lacks the required bucket configuration, but it demonstrates the core issue):
kustomize build test/config/dev/s3 | kubectl apply -f -
  3. Observe the following errors:
customresourcedefinition.apiextensions.k8s.io/cassandrabackups.cassandra.k8ssandra.io created
customresourcedefinition.apiextensions.k8s.io/cassandrarestores.cassandra.k8ssandra.io created
clusterrole.rbac.authorization.k8s.io/cass-operator-cr created
clusterrolebinding.rbac.authorization.k8s.io/cass-operator-crb created
configmap/medusa-config created
[unable to recognize "STDIN": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1", unable to recognize "STDIN": no matches for kind "CassandraDatacenter" in version "cassandra.datastax.com/v1beta1"]
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "medusa-dev" not found

This error in particular:

[unable to recognize "STDIN": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1", unable to recognize "STDIN": no matches for kind "CassandraDatacenter" in version "cassandra.datastax.com/v1beta1"]

Expected behavior
Deployments should work on k8s v1.22 (as well as previous versions - as reasonable)

  • Kubernetes version information:
% kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T20:01:24Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:
    Kind

Additional context
This is the partial root cause of k8ssandra/k8ssandra#1127

┆Issue is synchronized with this Jira Task by Unito
┆Reviewer: Michael Burman
┆epic: Deployments on k8s v1.22 fail
┆fixVersions: k8ssandra-1.4.0
┆friendlyId: K8SSAND-960
┆priority: Medium

Test Bug

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • medusa-operator version:
  • Helm charts version info

$ helm ls -A

  • Helm charts user-supplied values

$ helm get values RELEASE_NAME

  • Kubernetes version information:

kubectl version

  • Kubernetes cluster kind:
  • Manifests:
  • Operator logs:

Additional context

┆Issue is synchronized with this Jira Feature by Unito
┆Issue Number: K8SSAND-143
┆Priority: Medium

upgrade to cass-operator 1.7.0

We need to bump the cass-operator dependency which also includes updating the Go module dependency to point to the new location in the k8ssandra org.

┆Issue is synchronized with this Jiraserver Task by Unito
┆Epic: K8s API Upgrades
┆Issue Number: K8SSAND-465
┆Priority: Medium

Upgrade Go dependency to 1.15

The Go module is configured to use Go 1.13. The image uses golang:1.13 as its base image. We should upgrade both to Go 1.15.

Remove RESTORE_KEY and BACKUP_NAME env vars after restore is complete

Is your feature request related to a problem? Please describe.

We use Terraform to create the datacenter objects in Kubernetes. When a backup is applied, the medusa-operator adds the RESTORE_KEY and BACKUP_NAME env variables to the medusa container. The next time we apply Terraform, this appears as a change, and Terraform even fails to apply it due to field_manager conflicts.

Describe the solution you'd like

After a restore is complete, medusa-operator should remove the environment variables it has added.

Describe alternatives you've considered

None.

Additional context

The diff from terraform:

- {
    - name      = "BACKUP_NAME"
    - value     = "backup0"
    - valueFrom = {
        - configMapKeyRef  = {
            - key      = null
            - name     = null
            - optional = null
          }
        - fieldRef         = {
            - apiVersion = null
            - fieldPath  = null
          }
        - resourceFieldRef = {
            - containerName = null
            - divisor       = null
            - resource      = null
          }
        - secretKeyRef     = {
            - key      = null
            - name     = null
            - optional = null
          }
      }
  },
- {
    - name      = "RESTORE_KEY"
    - value     = "ec6b2264-9644-4ee7-b84c-bb7baf536bb7"
    - valueFrom = {
        - configMapKeyRef  = {
            - key      = null
            - name     = null
            - optional = null
          }
        - fieldRef         = {
            - apiVersion = null
            - fieldPath  = null
          }
        - resourceFieldRef = {
            - containerName = null
            - divisor       = null
            - resource      = null
          }
        - secretKeyRef     = {
            - key      = null
            - name     = null
            - optional = null
          }

Error from terraform on apply failure:

╷
│ Error: There was a field manager conflict when trying to apply the manifest for "databases/cassandra-1"
│
│   with module.cassandra_1.kubernetes_manifest.cassandra_datacenter,
│   on ../../../../tf-modules/eks-cassandra-datacenter/cassandra_datacenter.tf line 15, in resource "kubernetes_manifest" "cassandra_datacenter":
│   15: resource "kubernetes_manifest" "cassandra_datacenter" {
│
│ The API returned the following conflict: "Apply failed with 1 conflict: conflict with \"manager\" using cassandra.datastax.com/v1beta1: .spec.podTemplateSpec.spec.initContainers"
│
│ You can override this conflict by setting "force_conflicts" to true in the "field_manager" block.
╵

Using force_conflicts seems a bit dangerous, given that it will overwrite anything else that could be important.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1183
┆priority: Medium

MedusaTask BackupSync has limit on number of backups

Type of question

  • Best practices
  • How to perform a particular operation
  • Cassandra-related question
  • Monitoring-related question
  • Repair-related question
  • Backup/restore-related question
  • Open question

What did you do?
We have a MedusaBackupSchedule set for every 30 minutes with a 365-day max backup age, using S3 as the storage backend.
In a dev env, we have daily restores of these backups to test viability.
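
For context, the schedule is along these lines (a sketch; the field names assume the k8ssandra-operator MedusaBackupSchedule CRD and may not match our exact manifest):

apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaBackupSchedule
metadata:
  name: medusa-backup-schedule-cluster2
spec:
  cronSchedule: "*/30 * * * *"       # every 30 minutes
  backupSpec:
    cassandraDatacenter: main
    backupType: differential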

After about a week, with the number of backups exceeding ~400 (I don't have the exact number that tripped the error), the BackupSync task started failing in the medusa-operator with the following error:

"MedusaTask": "k8ssandra-failover/medusa-backup-sync-cluster2-main-1698093327", "cassd
│ c": "main", "CassandraPod": "cluster2-main-az-1c-sts-1", "error": "failed to get backups: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (10903823 vs. 4194304)"

Did you expect to see something different?

Environment (please complete the following information):

  • medusa-operator version:
    0.15.0
  • Helm charts version info

k8ssandra-operator-1.8.1

  • Helm charts user-supplied values
# Chart version: 1.8.1
global:
  clusterScoped: true
image:
  registry: 'docker.io'
  repository: 'k8ssandra/k8ssandra-operator'
  tag: 'v1.8.1'
cass-operator:
  image:
    tag: 'v1.17.2'
  • Kubernetes version information:

EKS v 1.25

  • Operator logs:
    NA

Backup can succeed without being marked as finished

Some backups never get marked as finished for unclear reasons.
Looking at the code, it appears that the doBackup() gRPC call is a blocking one running in a goroutine.
Some backups can last for many hours, making it unreliable to rely on blocking http calls.
Even running the backup and checking the status of the backup in the storage bucket (using medusa status for example), would not be reliable as it would detect successful backups but not failed ones (which look like running ones to Medusa).
Instead, we'd need to make the doBackup() call a short operation which starts a thread running the actual backup. Another gRPC operation should be created to check the state of that thread, allowing the backup operation to be monitored asynchronously.
The Medusa parts of this are captured in this issue.

┆Issue is synchronized with this Jira Bug by Unito
┆Affected Versions: k8ssandra-1.2.0,k8ssandra-1.3.0
┆Epic: Remote Cluster Restore
┆Issue Number: K8SSAND-624
┆Priority: Medium

Deprecation notice

With k8ssandra-operator now available and capable of replacing the standalone functionality of this operator, provide a deprecation notice in our basic in-tree doc content stating that users should migrate to k8ssandra-operator.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1272
┆priority: Medium

Add bucket prefix and in-place/remote env variables for restores

The Medusa restore init containers need the following new env variables to be set:

  • Bucket prefix: path prefix to point to another cluster's backups in the storage bucket
  • In Place/Remote: flag indicating whether the performed restore is in-place or remote.

These values will then be used to invoke medusa.restore_node in the medusa-restore init container. The init container needs to handle the mapping between the source and destination topologies to determine on its own which backup it needs to restore. Pre-checks in the operator will guarantee that a 1:1 mapping is possible and that the topologies are compatible.
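
Illustratively, the medusa-restore init container would end up with env entries along these lines (the two new variable names below are placeholders, not a final contract):

initContainers:
- name: medusa-restore
  env:
  - name: BACKUP_NAME
    value: backup1
  - name: RESTORE_KEY
    value: 1a2b3c4d-...              # set by the operator for each restore
  - name: BACKUP_PREFIX              # placeholder name: prefix pointing at the source cluster's backups
    value: source-cluster-namespace
  - name: RESTORE_MODE               # placeholder name: "in-place" or "remote"
    value: remote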

┆Issue is synchronized with this Jiraserver Feature by Unito
┆Epic: Remote Cluster Restore
┆Issue Number: K8SSAND-525
┆Priority: Medium

Check restore mapping is valid before triggering restore

When using NetworkTopologyStrategy, the placement of replicas depends on the DC and rack layout. In this context, the replica for a token range will be placed on the node that owns the next token range in the next rack. We can't restore any node to any other node, as we may end up putting several replicas in the same rack, and the nodes we'd restore to could end up not containing the data they're supposed to be replicas for. We can either be rigid about the mapping and expect exactly the same DC and rack names, or do a 1:1 mapping between DCs/racks despite different names (but expect the same number of both).

This check should be performed before attempting to restore a backup.

The backup definition will be used to map the backup hosts with the restore hosts and perform checks on whether or not the restore is possible:

  • Same number of DCs
  • Same number of nodes per DC
  • Same rack distribution per DC

Nodes will be mapped 1:1 between the source and destination topologies (sorted alphabetically). If the mapping fails to associate nodes 1:1, the restore operation must fail.

┆Issue is synchronized with this Jiraserver Feature by Unito
┆Epic: Remote Cluster Restore
┆Issue Number: K8SSAND-524
┆Priority: Medium

CRDs are broken in v0.3.1

This is a follow-up to #42. The CRDs are still broken. The JSON patches in crd/patches/cassdc_config_patch.json need to be updated, since the new version of the CRDs has different paths for the properties that are patched.

┆Issue is synchronized with this Jira Feature by Unito
┆Fix Versions: k8ssandra-1.2.0
┆Issue Number: K8SSAND-472
┆Priority: Medium

update Go module and docker image to latest (1.17)

Is your feature request related to a problem? Please describe.
Our scanners are picking up some high and critical vulnerabilities in Go 1.15.

Critical:
  • CVE-2022-23806
  • CVE-2021-38297

High:
  • CVE-2021-39293
  • CVE-2022-23772
  • CVE-2022-23773
  • CVE-2021-29923

Describe the solution you'd like
update Go module and docker image to latest (1.17)

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1268
┆priority: Medium

CRDs are broken in v0.3.0

In v0.3.0 I attempted to upgrade the apiVersion of the CRDs from apiextensions.k8s.io/v1beta1 to apiextensions.k8s.io/v1. The CRDs are invalid though and installing them will fail with an error like this:

"Error: failed to install CRD crds/backup.yaml: CustomResourceDefinition.apiextensions.k8s.io \"cassandrabackups.cassandra.k8ssandra.io\" is invalid: spec.versions[0].schema.openAPIV3Schema: Required value: schemas are required",

┆Issue is synchronized with this Jira Feature by Unito
┆Epic: K8s API Upgrades
┆Issue Number: K8SSAND-188
┆Priority: Medium

How to restore a single keyspace using Medusa in a K8ssandra cluster?

Type of question

  • Best practices
  • How to perform a particular operation
  • Cassandra-related question
  • Monitoring-related question
  • Repair-related question
  • [x] Backup/restore-related question
    Hey, I want to restore only a single keyspace from my S3 backup instead of the whole cluster. Can you please help me with how I can back up and restore a particular keyspace using the medusa-operator?
  • Open question

What did you do?

Did you expect to see something different?

Environment (please complete the following information):

  • medusa-operator version:

v0.1.0

  • Helm charts version info

$ helm ls -A

  • Helm charts user-supplied values

$ helm get values RELEASE_NAME

  • Kubernetes version information:

kubectl version

  • Kubernetes cluster kind:
  • Manifests:
  • Operator logs:

kubectl -n <namespace> logs -l name=<releaseName>-medusa-operator-k8ssandra

Additional context

Wrong grpc-health-probe binary for ARM architecture

Describe the bug
Container health checks fail due to an "exec format error".
The k8ssandra/medusa image for the ARM architecture uses the wrong grpc-health-probe binary, which causes the health checks to fail.

To Reproduce

  1. Deploy a K8ssandra cluster with medusa enabled on an ARM architecture.
  2. The medusa container will give the following error: 'Liveness probe failed: exec /bin/grpc_health_probe: exec format error'

Expected behavior
Health checks should work and give a successful response.


Environment (please complete the following information):

  • medusa-operator version:
    imageID: docker.io/k8ssandra/medusa@sha256:cb5dd774606878148dedb3f015f3edd5f0000be2509ba85e97de1086ec8abee9

  • Helm charts version info
    k8ssandra-operator k8ssandra-operator 7 2024-05-02 08:56:51.021382029 -0500 -05 deployed k8ssandra-operator-1.15.0 1.15.0

What could be the problem?
The image for ARM architectures lists the following command in its layers:
'RUN /bin/sh -c GRPC_HEALTH_PROBE_VERSION=v0.4.25 && wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64 && chmod +x /bin/grpc_health_probe # buildkit'

It is using the amd64 binary version of grpc_health_probe instead of the arm64.
It should be:
'RUN /bin/sh -c GRPC_HEALTH_PROBE_VERSION=v0.4.25 && wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-arm64 && chmod +x /bin/grpc_health_probe # buildkit'


Add control loop to sync CassandraBackup objects with storage bucket

In order to allow remote restores, we need to be able to sync the local CassandraBackup resources with the configured storage backend.

The sync should allow:

  • Creating local missing resources from existing backups
  • Deleting local resources for missing backups

A single control loop should be able to handle these operations. The sync should be requested explicitly, meaning that a new CRD should be created to allow requesting synchronization from the medusa-operator, and that CRD should contain a prefix so that a cluster can list backups stored for another cluster in a multi-tenant storage bucket.

The sync operation would:

  • List the backups in the storage bucket, through a gRPC call
  • Compare that list to the existing CassandraBackup resources
  • Create a CassandraBackup object for each missing backup
  • Delete local resources that don’t exist in the list of backups
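
A sketch of what requesting such a sync could look like (the kind and field names are illustrative; the issue only calls for a new CRD that carries a storage prefix):

apiVersion: cassandra.k8ssandra.io/v1alpha1
kind: CassandraBackupSync            # hypothetical kind
metadata:
  name: sync-from-other-cluster
spec:
  cassandraDatacenter: dc1
  prefix: other-cluster-namespace    # prefix under which the other cluster's backups live in the shared bucket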

┆Issue is synchronized with this Jiraserver Feature by Unito
┆Epic: Remote Cluster Restore
┆Issue Number: K8SSAND-521
┆Priority: Medium

Alter system_auth and repair it before remote restores

Remote restores will require keeping the existing system_auth keyspace so that existing credentials can be retained. As token ownership will change, auth data could become unavailable after the restore. To avoid this, and as a preliminary step to a remote restore, the system_auth keyspace should be altered to be replicated on all nodes in the cluster and then repaired.

┆Issue is synchronized with this Jiraserver Feature by Unito
┆Epic: Remote Cluster Restore
┆Issue Number: K8SSAND-527
┆Priority: Medium
