
reaper-operator's Introduction

Cassandra Reaper Operator

A Kubernetes operator for Cassandra Reaper

Project status: alpha

Features

  • Support for Cassandra storage backend
  • Configure Reaper instance through Reaper custom resource
  • Support for specifying resource requirements, e.g., CPU and memory
  • Support for specifying affinity and anti-affinity

Requirements

  • Go >= 1.13.0
  • Docker client >= 17
  • kubectl >= 1.13
  • Kubernetes >= 1.15.0
  • Operator SDK = 0.14.0

Note: The operator will work with earlier versions of Kubernetes, but the configuration update functionality requires >= 1.15.0.

Dependencies

For information on the packaged dependencies of Reaper Operator and their licenses, check out our open source report.

reaper-operator's People

Contributors

adejanovski, burmanm, jdonenine, jeffbanks, jsanda, miles-garnsey


reaper-operator's Issues

Deprecation notice

With k8ssandra-operator now available and capable of replacing the standalone functionality of this operator, provide a deprecation notice in our basic in-tree doc content stating that users should migrate to k8ssandra-operator.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1273
┆priority: Medium

Generate images with fixed tags on commits to master

Is your feature request related to a problem? Please describe.

Cass-operator and management API CI pipelines both emit tagged images when commits are pushed to master.

reaper-operator should be brought into line with this practice to make testing possible in downstream repos (e.g. k8ssandra) without referring to a latest tag.

Describe the solution you'd like

Switch out the latest tag currently used in the image push for a tag based on the commit SHA.
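
As a sketch of what this could look like in a GitHub Actions workflow (the action version and image name here are assumptions, not the repo's actual pipeline):

- name: Build and push image tagged with the commit SHA
  uses: docker/build-push-action@v2
  with:
    push: true
    tags: |
      k8ssandra/reaper-operator:${{ github.sha }}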

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-1037
┆Priority: Medium

Upgrade reaper-operator operator-sdk to 1.6.1/controller runtime to 0.9.2

Reaper's operator SDK is old; all related dependencies need to be upgraded to ensure that they support the latest features (e.g., strategic merge patch within the k8s client is required for K8SSAND-816 and is not available in the current controller-runtime version).

The operator-sdk version may also have a bearing on the versions of the k8s libraries that can be used, which probably plays into K8SSAND-961.

┆Issue is synchronized with this Jira Task by Unito
┆Epic: Reaper on k8s 1.22
┆Issue Number: K8SSAND-1001
┆Priority: Medium

Add support for Cassandra authentication

Is your feature request related to a problem? Please describe.
We want to enable configuring C* authentication. For Reaper, that requires specifying credentials for Reaper to use.

Describe the solution you'd like
The following changes are needed to support C* auth:

  • Add a property to the CRD for specifying the name of the secret containing credentials
  • Update the controller code to mount the secret and set the necessary env vars (see the sketch after this list)
  • Add/update unit tests
  • Add/update e2e tests
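
A minimal sketch of the pod spec changes the controller might generate, assuming the Reaper image honors the REAPER_CASS_AUTH_* environment variables and that the secret has username and password keys (the secret name is illustrative):

containers:
- name: reaper
  env:
  - name: REAPER_CASS_AUTH_ENABLED
    value: "true"
  - name: REAPER_CASS_AUTH_USERNAME
    valueFrom:
      secretKeyRef:
        name: reaper-cass-credentials   # name taken from the new CRD property
        key: username
  - name: REAPER_CASS_AUTH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: reaper-cass-credentials
        key: password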


Need support to enable user authentication for Reaper webui

Is your feature request related to a problem? Please describe.
We need reaper-operator to provide an interface for overriding the REAPER_AUTH_ENABLED env var, so that exposing the Reaper web UI is more secure.

Describe the solution you'd like
Let the CRD support overriding REAPER_AUTH_ENABLED. For now, it is hard-coded to "false".
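
One possible shape for this, with a hypothetical field that the operator would translate into the env var (not part of the current CRD):

apiVersion: reaper.cassandra-reaper.io/v1alpha1
kind: Reaper
metadata:
  name: cass-reaper
spec:
  serverConfig:
    authEnabled: true   # hypothetical field; the operator would set REAPER_AUTH_ENABLED="true"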

┆Issue is synchronized with this Jiraserver Bug by Unito
┆Issue Number: K8SSAND-720
┆Priority: Medium

CassandraDatacenter CRD needs to be installed for integration tests

The integration tests in #6 are failing with this error:

2020-11-19T13:30:18.023-0500	ERROR	controllers.Reaper	failed to check for CassandraDatacenter readiness	{"reaper": "reaper-test-0/test-reaper", "error": "CassandraDatacenter.cassandra.datastax.com \"test-dc\" not found"}

After some investigation, we discovered that this is due to the CassandraDatacenter CRD not being installed, which is breaking the integration tests.

I think we need to properly kustomize cass-operator, since this will also be needed for medusa-operator. That should be addressed separately, but it is worth pointing out here.

Both Reaper pod containers aren't using the same secrets

Describe the bug
When using a predefined secret for JMX and Cassandra, the init container and the main container can drift apart and use different secret references.
The init container gets the default secret name that is used when the Helm templates autogenerate it, while the main container uses the custom secret name.
This cluster first used autogenerated secrets and was later upgraded to use custom secrets.

Expected behavior
The secrets used to connect to Cassandra should be the same for both the init and the main container


Environment (please complete the following information):

  • reaper-operator version:
    v0.3.3
  • Kubernetes version information:
    v1.21
  • Kubernetes cluster kind:
    GKE
  • Manifests:
  reaper:
    autoschedule: true
    enabled: true
    ingress:
      enabled: false
    image:
      registry: docker.io
      repository: thelastpickle/cassandra-reaper
      tag: 3.0.0
      pullPolicy: IfNotPresent
    cassandraUser:
      username: "reaper"
      secret: "dogfood-reaper-secret"
    jmx:
      username: "reaper"
      secret: "dogfood-reaper-jmx-secret"


┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1059
┆priority: Medium

Monitor C* clusters for out of band changes

The Reaper CRD allows you to specify clusters, as in the following example:

apiVersion: reaper.cassandra-reaper.io/v1alpha1
kind: Reaper
metadata:
  name: cass-reaper
spec:
  image: jsanda/reaper-k8s:cfad3f9f4cac-e026be8176d4
  serverConfig:
    storageType: cassandra
    cassandraBackend:
      clusterName: cluster-1
      contactPoints:
      - cluster-1
      keyspace: reaper
      replication:
        networkTopologyStrategy:
          dc1: 3
  clusters:
  - name: cluster-2
    namespace: default
    service:
      name: cluster-2
      namespace: default

When this Reaper object is created, the operator will do the following:

  • Create the reaper keyspace in the cluster-1 cluster
  • Create the Deployment for the Reaper application, which is named cass-reaper
  • Make a REST API call to add cluster-2 to cass-reaper

A user can go into the Reaper UI and remove cluster-2. This would be an out-of-band change with respect to the operator. It would make the actual state of cass-reaper differ from the desired state, and the operator would not be aware of the change.

There needs to be a background job that checks for out-of-band changes. When the job detects one, it should queue the Reaper object for reconciliation. A subsequent reconciliation of cass-reaper, for example, would result in cluster-2 being added back to the Reaper application.

Note that the background monitoring job should only inspect clusters that are defined in .spec.clusters. Other clusters can be added to/removed from the Reaper application. The monitoring job will ignore them.

It is also important to note that clusters added to the Reaper application through .spec.clusters cannot be removed directly through the application. They must be removed from .spec.clusters to trigger removal from the application.

Deployments on k8s v1.22 fail

Describe the bug
A number of APIs have been changed/removed in k8s v1.22 and the operator can no longer deploy in those environments.

To Reproduce

  1. Deploy a kind cluster with node version v1.22.0
kind create cluster --image "kindest/node:v1.22.0"
  2. Build and deploy the reaper operator (note that this example won't work fully, as it lacks required configuration, but it demonstrates the core issue)
kustomize build test/config/dev | kubectl apply -f -
  3. Observe the following errors:
customresourcedefinition.apiextensions.k8s.io/reapers.reaper.cassandra-reaper.io created
clusterrole.rbac.authorization.k8s.io/cass-operator-cr unchanged
clusterrolebinding.rbac.authorization.k8s.io/cass-operator-crb configured
[unable to recognize "STDIN": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1", unable to recognize "STDIN": no matches for kind "CassandraDatacenter" in version "cassandra.datastax.com/v1beta1"]
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found
Error from server (NotFound): error when creating "STDIN": namespaces "reaper-dev" not found

This error in particular:

[unable to recognize "STDIN": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1", unable to recognize "STDIN": no matches for kind "CassandraDatacenter" in version "cassandra.datastax.com/v1beta1"]
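
The fix on the manifest side is to regenerate the CRDs against the v1 API, which has been available since k8s 1.16; roughly:

apiVersion: apiextensions.k8s.io/v1   # was apiextensions.k8s.io/v1beta1, removed in k8s 1.22
kind: CustomResourceDefinition
metadata:
  name: reapers.reaper.cassandra-reaper.io
spec:
  group: reaper.cassandra-reaper.io
  scope: Namespaced
  names:
    kind: Reaper
    plural: reapers
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true   # v1 additionally requires a structural schema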

Expected behavior
Deployments should work on k8s v1.22 (as well as previous versions, where reasonable).

  • Kubernetes version information:
% kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T20:01:24Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:
    Kind

Additional context
This is the partial root cause of k8ssandra/k8ssandra#1127

┆Issue is synchronized with this Jira Task by Unito
┆Fix Versions: k8ssandra-1.4.0
┆Issue Number: K8SSAND-961
┆Priority: Medium

Add e2e test for registering CassandraDatacenters

We need an e2e test to exercise cassandradatacenter_controller.go. The test should minimally do the following:

  • Deploy a Reaper instance with a CassandraDatacenter backend
  • Register the CassandraDatacenter

It would be nice to also do the following (but not absolutely necessary):

  • Deploy and register a second CassandraDatacenter
  • Remove the CassandraDatacenter from Reaper via REST API and verify that the CassandraDatacenter is added back

Clean up the config directory

Is your feature request related to a problem? Please describe.

The ./config directory in this project needs to be reviewed and so does the one under ./test/config.

It is filled with what appears to be cruft created by operator-sdk.

Running kustomize build config/default results in an error, but it should probably produce a Reaper deployment.

Under test/config there are a few problems as well. For example, deploy_reaper_test should probably deploy only Reaper, but it also deploys CassDCs. This is good, but it would be more modular (and reusable) if the kustomization.yaml at the top level called one kustomization.yaml for Reaper and another for the supporting components.

Describe the solution you'd like

It would be nice to have two kustomize bases - one for Reaper and one for whatever other bits are required for testing (e.g., a Cassandra backend).
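
For instance, the top-level kustomization.yaml could simply compose the two bases (the paths below are illustrative, not the repo's current layout):

# test/config/kustomization.yaml (illustrative layout)
resources:
- bases/reaper
- bases/cassandra   # cass-operator, CassandraDatacenter, and other supporting pieces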

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-837
┆Priority: Medium

Upgrade Go dependency to 1.15

The Go module is configured to use Go 1.13. The image uses golang:1.13 as its base image. We should upgrade both to Go 1.15.

Upgrade operator SDK

Reaper is currently built against the 3-alpha version of kubebuilder and 0.6.2 of controller-runtime. The latter was released over a year ago and doesn't have all the features we need to implement fixes for #63.

We should upgrade controller-runtime to a more recent version - I'm targeting at least 0.8.3, which corresponds to 1.7.1 of the operator SDK. We should run through the migrations described here.

PR in draft under #71.

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-911
┆Priority: Medium

Document reaper-operator

This project has no standalone documentation, and it could probably use some.

Documentation which is focused on k8ssandra is mainly about the helm charts. This repo should probably have a configuration reference for the CRs at least.

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-906
┆Priority: Medium

Do not use the manager client in tests

We are using the manager client in tests, which is the same client used by controllers. It reads from caches, which means it is not strongly consistent. In tests we usually want strong consistency, where the client reads directly from the API server.

burmanm pointed out kubernetes-sigs/kubebuilder#2066 to me which discusses this.

┆Issue is synchronized with this Jira Task by Unito
┆Reviewer: Michael Burman
┆friendlyId: K8SSAND-144
┆priority: Medium

CVE upgrade requirement k8s.io >=1.18.19

Issue

Per CVE-2021-25737, upgrading to at least 1.18.19 is recommended.

Moderate severity issue

A security issue was discovered in Kubernetes where a user may be able to redirect pod traffic to private networks on a Node. Kubernetes already prevents creation of Endpoint IPs in the localhost or link-local range, but the same validation was not performed on EndpointSlice IPs.

Fix: upgrade k8s.io dependencies to at least 1.18.19.

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-941
┆Priority: Medium

CI/CD integration for automated e2e/integration testing

Is your feature request related to a problem? Please describe.
The project is currently lacking automated execution of e2e/integration tests.

Describe the solution you'd like
Provide an automated approach to the execution of the existing e2e/integration tests in a publicly visible and available fashion, preferably via GitHub Actions.

Describe alternatives you've considered
Other CI/CD tools might be possible, such as CircleCI.

Additional context
This issue might need to be clarified or split up into multiple issues to target specific types of tests or integration approaches.

┆Issue is synchronized with this Jiraserver Task by Unito
┆Issue Number: K8SSAND-149
┆Priority: Medium

Testing


┆Issue is synchronized with this Jiraserver Task by Unito
┆Issue Number: K8SSAND-142
┆Priority: Medium

Schema should be fully created prior to starting the Reaper pod

Schema migrations on startup of Reaper can take more time than the liveness probe allows. This causes the pods to be killed and restarted, which can happen in the middle of a migration. Some DDL statements are not idempotent (such as adding a column) and can put the pod in a crash loop, as subsequent migration attempts will fail consistently.

Creating the whole schema upfront (including the schema migration table content) would ensure the Reaper pod starts quickly on the first run and avoid that behavior.

This could be done through a job or an init container.
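
A rough sketch of the init-container variant; the image and arguments are placeholders for whatever tool ends up applying the full schema, since no specific tool is named here:

initContainers:
- name: reaper-schema-init
  image: <schema-init-image>   # placeholder: applies Reaper's full schema, including migration table content
  args:
  - --keyspace=reaper
  - --contact-points=cluster-1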

┆Issue is synchronized with this Jira Bug by Unito
┆Reviewer: 5fa4593062584c006b1e04d0
┆Fix Versions: k8ssandra-1.3.0
┆Issue Number: K8SSAND-625
┆Priority: Medium

Add support for JMX authentication

Reaper has the jmxAuth and jmxCredentials properties. This ticket aims to add support only for the latter. It offers more flexibility than jmxAuth, since you can specify credentials for multiple clusters, e.g.,

jmxCredentials:
  clusterProduction1:
    username: user1
    password: password1
  clusterProduction2:
    username: user2
    password: password2

Reaper upgrade can cause schema migration exception

Describe the bug
I observed the issue in a k8ssandra integration test for upgrades. It upgrades from k8ssandra 1.0 to the latest version, which right now is 1.3.0. This involves a Reaper upgrade. While investigating some test failures, I saw this exception in the Reaper log:

ERROR  [2021-08-17 20:48:34,020] [main] i.c.ReaperApplication - Storage is not ready yet, trying again to connect shortly...
org.cognitor.cassandra.migration.MigrationException: Error during migration of script 022_cluster_states.cql while executing 'ALTER TABLE cluster ADD state text;'
        at org.cognitor.cassandra.migration.Database.execute(Database.java:269)
        at java.util.Collections$SingletonList.forEach(Collections.java:4824)
        at org.cognitor.cassandra.migration.MigrationTask.migrate(MigrationTask.java:68)
        at io.cassandrareaper.storage.CassandraStorage.migrate(CassandraStorage.java:366)
        at io.cassandrareaper.storage.CassandraStorage.initializeCassandraSchema(CassandraStorage.java:297)
        at io.cassandrareaper.storage.CassandraStorage.initializeAndUpgradeSchema(CassandraStorage.java:255)
        at io.cassandrareaper.storage.CassandraStorage.<init>(CassandraStorage.java:243)
        at io.cassandrareaper.storage.InitializeStorage.initializeStorageBackend(InitializeStorage.java:69)
        at io.cassandrareaper.ReaperApplication.tryInitializeStorage(ReaperApplication.java:486)
        at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:173)
        at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:90)
        at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
        at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
        at io.dropwizard.cli.Cli.run(Cli.java:78)
        at io.dropwizard.Application.run(Application.java:93)
        at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:109)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid column name state because it conflicts with an existing column
        at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)

I also observed that there were two Reaper pods. The reason is that the default Deployment update strategy is used. It looks like this:

  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Here is what the Deployment docs say about updates:

Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% max unavailable).

We should be able to resolve this by changing the DeploymentStrategyType to Recreate. This will kill existing pods before creating any new ones.
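
With that change, the relevant part of the Deployment spec would simply be:

  strategy:
    type: Recreate

Recreate takes no rollingUpdate parameters; Kubernetes scales the old pods down to zero before bringing up the new one, so two Reaper instances never run the migration concurrently.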

To Reproduce
Steps to reproduce the behavior:

  1. Deploy k8ssandra 1.0
  2. Upgrade to k8ssandra 1.3
  3. Monitor the Reaper deployment
  4. Check the Reaper logs for the above error

I am seeing this error frequently in GitHub Actions but not so much locally.

Expected behavior
There should only be a single Reaper pod running at any given time.

Environment (please complete the following information):

  • reaper-operator version:
    0.3.4
  • Helm charts version info
    1.3.0

┆Issue is synchronized with this Jira Bug by Unito
┆Affected Versions: k8ssandra-1.3.0
┆Fix Versions: k8ssandra-1.4.0
┆Issue Number: K8SSAND-816
┆Priority: Medium

k8s version test matrix via kuttl

Is your feature request related to a problem? Please describe.

We are seeing test failures on k8s 1.22 (e.g., #76), which makes me think we should probably be testing reaper-operator against more versions of k8s. We should at least be testing 1.20 - 1.22, as this is where a lot of API removals occurred.

kuttl will make this easier, as it allows for a declarative definition of what resources should be created and what their status should be in a series of test steps.

Declarative definition of resource state

As our cass-operator deployments have become more complex, there is a risk of timing issues arising (e.g. from cert-manager's webhook not being ready before creating certificate resources).

Defining expected system state via partial manifests makes it clear what criteria each test step must satisfy before moving onto the next step. This helps eliminate timing issues.
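
As a sketch, a kuttl test step pairs an apply manifest with an assert file containing the partial state kuttl should wait for, e.g.:

# 00-assert.yaml - kuttl blocks until a resource matching this partial manifest exists
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cass-reaper
status:
  readyReplicas: 1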

In-built management of kind clusters and images

Our current test approaches rely on starting kind manually and then loading the requisite locally built Docker images. Kuttl handles these tasks itself, and also gathers all logs from the cluster, saving us the effort of maintaining a framework for this.

This also makes the process of standing up kind clusters easier (both locally and on GHA).

┆Issue is synchronized with this Jira Task by Unito
┆Epic: Reaper on k8s 1.22
┆Issue Number: K8SSAND-1013
┆Priority: Medium

Add a startupProbe to Reaper deployments

Is your feature request related to a problem? Please describe.

At present, when Reaper starts, it is frequently restarted while running the migration process. It appears that it does not respond to liveness probes during this process.

We could use a startupProbe in addition to the liveness probe to avoid unnecessary container churn and improve startup times.

Describe the solution you'd like
Add a startup probe to the Reaper deployment.
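
A sketch of what that could look like; the port and path assume Reaper's Dropwizard admin interface is on 8081 with the standard healthcheck endpoint:

startupProbe:
  httpGet:
    path: /healthcheck
    port: 8081   # assumed Dropwizard admin port
  periodSeconds: 10
  failureThreshold: 30   # tolerates up to ~5 minutes of schema migration before liveness takes over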

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-830
┆Priority: Medium

Rework test kustomizations so that they point at cass-operator 1.8.0 resources.

Is your feature request related to a problem? Please describe.

Reaper-operator's integration tests deploy a single node Cassandra cluster via cass-operator. The manifests are directly embedded in this repo.

The cass-operator manifests are based around a 1.7.0 deployment, which still relies on CRD API version v1beta1. When the integration tests are upgraded to run on k8s 1.22 they will begin to fail, because v1beta1-versioned CRDs are removed in that version, having been deprecated in 1.16.

Describe the solution you'd like

It would be ideal to draw cass-operator manifests, CRDs and perhaps even the cassDC itself from the cass-operator Github repo remotely so that we can update them from a single place.

Note that a fix for reaper-operator #76 is dependent on this ticket.
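
kustomize can pull bases straight from a remote repository, so the test kustomization could reference the published manifests instead of embedded copies; the exact path and ref below are assumptions:

resources:
- github.com/k8ssandra/cass-operator/config/default?ref=v1.8.0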

┆Issue is synchronized with this Jira Task by Unito
┆Epic: Reaper on k8s 1.22
┆Issue Number: K8SSAND-1008
┆Priority: Medium

Add liveness and readiness checks to reaper-operator

reaper-operator's deployment doesn't appear to include liveness or readiness checks at present. We should include these in the deployment and set up an HTTP server in the manager to respond to them appropriately.
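
controller-runtime managers can expose standard healthz/readyz endpoints for this. On the Deployment side the probes would look roughly like the following; the port and paths follow common kubebuilder scaffolding, which is an assumption here:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8081
readinessProbe:
  httpGet:
    path: /readyz
    port: 8081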

┆Issue is synchronized with this Jira Task by Unito
┆Issue Number: K8SSAND-1005
┆Priority: Medium

Add or update an e2e test to verify auto scheduling

#17 added initial support for enabling auto scheduling. It would be good to have test coverage to verify that auto scheduling is actually set up. This can be done via Reaper's REST API. We will need to extend the API in reaper-client-go for this.

Having this test coverage will be important as we build out more functionality around auto scheduling.

┆Issue is synchronized with this Jiraserver Task by Unito
┆Issue Number: K8SSAND-147
┆Priority: Medium

reaper-operator - update Go to 1.17

Is your feature request related to a problem? Please describe.
Similar to k8ssandra/k8ssandra-operator#310 we should update reaper-operator to use Go v1.17.

Why do we need it?
Keeping up with Go versions is required if we want to update to the newest Kubernetes libraries (for example, controller-gen creates v1.17 versions).

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1343
┆priority: High
