
schemahero / schemahero

Stars: 956 · Watchers: 16 · Forks: 75 · Size: 37.7 MB

A Kubernetes operator for declarative database schema management (gitops for database schemas)

Home Page: https://schemahero.io

License: Apache License 2.0

Languages: Go 90.96%, Makefile 6.63%, Dockerfile 1.86%, Shell 0.54%
Topics: database-schema, database-management, kubernetes, kubernetes-operator, gitops, database-schemas, manage-databases, hacktoberfest

schemahero's Introduction

SchemaHero



What is SchemaHero?

SchemaHero is a Kubernetes operator for declarative schema management for various databases. SchemaHero has the following goals:

  1. Database table schemas can be expressed as Kubernetes resources that can be deployed to a cluster.
  2. Database schemas can be edited and deployed to the cluster. SchemaHero will calculate the required change (ALTER TABLE statement) and apply it.
  3. SchemaHero can manage databases that are deployed to the cluster or external to it (Amazon RDS, Google Cloud SQL, etc.).

Getting Started

The recommended way to deploy SchemaHero is to use the kubectl plugin. Full instructions and other installation methods are listed in the documentation.
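
The plugin is distributed through krew, so a typical flow looks something like this (a sketch; consult the documentation for the current commands):

kubectl krew install schemahero
kubectl schemahero install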

To get started, read the tutorial and the full documentation.

Community

For questions about using SchemaHero, there's a Replicated Community forum and a #schemahero channel in Kubernetes Slack. If you are interested in contributing to SchemaHero, you can find more information on how to get started on our Community page.

If you're a user of SchemaHero, add yourself to the ADOPTERS file!

schemahero's Issues

Add support for dropping/altering unique constraints

Indexes that support a UNIQUE or PRIMARY KEY constraint cannot be dropped this way.

https://www.postgresql.org/docs/10/sql-dropindex.html

Executing query "drop index \"pullrequest_history_pkey\""
Error: pq: cannot drop index pullrequest_history_pkey because constraint pullrequest_history_pkey on table pullrequest_history requires it
pq: cannot drop index pullrequest_history_pkey because constraint pullrequest_history_pkey on table pullrequest_history requires it
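
Postgres requires dropping the constraint rather than its backing index, which removes the index along with it; a sketch of the statement SchemaHero would need to generate instead:

alter table "pullrequest_history" drop constraint "pullrequest_history_pkey"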

Prevent race condition in a new environment

When creating a new environment, it's not uncommon to deploy everything with a single kubectl apply command. This includes the database, the schema, the schemahero operator and the application.

When this happens today, schemahero creates the database-watch deployment, which creates a pod that attempts to connect to the database. The database is often not initialized this quickly, so SchemaHero exits and ends up in a CrashLoopBackOff state. This eventually resolves as the components become initialized.

How can SchemaHero have a cleaner startup to prevent this?

pg: integer[] tries to update to integer

When I have a column of type integer[], it deploys properly. But on subsequent syncs and runs, schemahero tries to change this column to type integer (dropping the array). This fails, but leaves the migration pod in a crash loop backoff.
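
A minimal column definition that reproduces this (the table and database names are illustrative):

apiVersion: schemas.schemahero.io/v1alpha2
kind: Table
metadata:
  name: mytable
spec:
  database: mydb
  name: mytable
  schema:
    postgres:
      columns:
      - name: myints
        type: integer[]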

Cannot delete and then re-add a column

If you attempt to delete a column from a table, it works. But adding the exact same column back fails to apply. The SHA already exists, and there are leftover completed tables in Kubernetes which cause the execution to never be scheduled.

Design Proposal: Migrations

One area that makes SchemaHero difficult to adopt as a replacement for other tools is that SchemaHero doesn't handle custom, complex data migration scenarios. Tools such as goose and db-migrate support language-specific (Go and Node, respectively) migrations.

There are scenarios in which non-sql code is needed to handle a more complex data migration.

This issue is a proposal to add this.

To start, we already have the Table and Database kinds. I'm proposing using the Migration kind that exists, unused, in the project now.

The CRD will contain a spec:

image: schemahero/golang:1.12
pullPolicy: Always
database: my-db
requires:
  table:
    name: projects
    columns: []
    pks: []
    fks: []
up:
  url: ""
  inline: |
down:
  url: ""
  inline: |

To start, the image and pullPolicy fields reference the container image that the migration will run in. SchemaHero should publish images for at least Node, Go, and Python.

The database field references which database to run the migration against.

The requires field defines the required state of the database before this migration is started. This is here to enable new columns and migrations to be deployed simultaneously. For now, we can support migrations that depend on the presence of columns, or on the definition of a FK or PK. We should also consider how to support requiring a specific column type.

Finally, there is an up and a down attribute. These contain either a URL that references where to download the migration code, or the code can be inline.

The code should be runnable as-is. For compiled languages (Go), the code will be compiled and then executed.
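
For illustration only, a filled-in spec might look like this (the inline body and field values are hypothetical, not a published API):

image: schemahero/golang:1.12
pullPolicy: Always
database: my-db
requires:
  table:
    name: projects
    columns:
    - owner_id
up:
  inline: |
    package main

    func main() {
      // hypothetical: read rows from projects and backfill owner_id
    }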

Show additional columns in migrations output

Currently we have:

kubectl get migrations
NAME      AGE
556a1ca   17h
a5b8347   17h

We should also show the database and table name here, and probably the plannedAt, approvedAt, etc. timestamps.
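
For reference, kubectl derives extra columns from additionalPrinterColumns on the CRD; an abbreviated sketch, where the version name and JSON paths are assumptions about the Migration layout:

spec:
  versions:
  - name: v1alpha3
    additionalPrinterColumns:
    - name: Database
      type: string
      jsonPath: .spec.database      # assumed field path
    - name: Table
      type: string
      jsonPath: .spec.tableName     # assumed field path
    - name: PlannedAt
      type: string
      jsonPath: .status.plannedAt   # assumed field path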

v1alpha3: failed to create configmap, already exists

2020-04-04T12:00:16.904-0700 ERROR migration/migration_controller.go:112 failed to create config map: configmaps "556a1ca" already exists

Steps to reproduce:

  • Install SchemaHero
  • Create a database with immediateDeploy: true
  • Deploy a table, wait for it to be deployed
  • Edit the table yaml (delete a column)
  • Apply the table yaml again
  • The migration controller logs this error continuously.

This brings up a couple of questions:

  1. Why is the hashed / generated name of the migration the same when the table yaml is different?
  2. The config map name is pretty terrible :). Maybe we should change it to include the database/table names too.

"not null" constraint cannot be applied to columns that already have null values

Create a table with a nullable column and add some rows with NULL values:

apiVersion: schemas.schemahero.io/v1alpha2
kind: Table
metadata:
  name: mytable
spec:
  database: mydb
  name: mytable
  schema:
    postgres:
      columns:
      - name: mycolumn
        type: text

When the column definition changes:

      columns:
      - name: mycolumn
        type: text
        default: 'some default value'
        constraints:
          notNull: true

The apply will fail with the following error:

Executing query "alter table \"mytable\" alter column \"mycolumn\" set default 'some default value', alter column \"mycolumn\" set not null"
Error: pq: column "mycolumn" contains null values

The provided default value should be used to populate the missing data. This was reproduced in Postgres, but it can probably be reproduced in MySQL as well.
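
A sketch of the statement sequence that would make this work, backfilling NULLs before adding the constraint:

alter table "mytable" alter column "mycolumn" set default 'some default value';
update "mytable" set "mycolumn" = 'some default value' where "mycolumn" is null;
alter table "mytable" alter column "mycolumn" set not null;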

A misconfigured connection string in a Database custom resource is hard to debug

Given the following database resource:

apiVersion: databases.schemahero.io/v1alpha1
kind: Database
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: my-database
connection:
  postgres:
    uri:
      value: postgres://user:password@host:5432/database?sslmode=disable

If there is a problem connecting to "host", the created container prints its usage text and exits. It should instead clearly indicate the connection problem with a better message.

Make debugging and troubleshooting schema migrations easier

This is a working proposal to increase transparency and make debugging a broken migration easier.

Problem 1: When a new table is deployed using kubectl apply, it's not currently possible to use kubectl to determine if the table exists in the running schema, if it's still pending, or if it's failing to be applied due to an error.

Solution: The Status field of a Table object can contain the last successfully applied schema. Using kubectl describe <table>, we can expose both the desired schema and the running schema. This could be either a SHA of the schema or a Sequence number. The idea would be that if the cluster manager runs kubectl describe table and sees DesiredSchemaSequence: 14 and AppliedSchemaSequence: 13, they would know that there is a migration missing.

Additionally, should we show the last applied SQL and its results somewhere?
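
As a sketch, the status could look something like this (field names and values are illustrative, not an existing API):

status:
  desiredSchemaSequence: 14
  appliedSchemaSequence: 13
  lastAppliedDDL: alter table "users" add column "email" text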

Problem 2: There should be a way to dry-run an apply to determine what change will be made to the database. What if there were an optional two-phase deployment process where kubectl apply -f of a Table object creates a pending schema migration that contains the SQL statement? Approving it (somehow?) would apply it to the schema and update the AppliedSchemaSequence on the table object.

SchemaHero Manager should be able to enforce an alternate schemahero image

Currently, databases default to the schemahero/schemahero image. The Database CRD spec includes a spec.schemahero.image field that can be set to change the image used for plans and applies.

The manager pod should also support an environment variable that automatically applies a different image name to all databases. This would be useful in an airgapped or secure environment that can't pull from the internet.
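
For example, the manager Deployment could honor something like the following (the variable name and registry are hypothetical):

containers:
- name: manager
  image: registry.internal/schemahero/schemahero-manager
  env:
  - name: SCHEMAHERO_IMAGE_NAME   # hypothetical: override applied to all databases
    value: registry.internal/schemahero/schemahero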

Show additional columns in database object

Currently when running kubectl get databases, the output is:

NAME     AGE
testdb   93m

It would be great to show some additional columns:

- [ ] pending migrations
- [ ] whether immediateDeploy is set
- [ ] whether the database is in an error state

DB Driver should detect version of database

There are some differences between (say) Postgres 9 and 10 (new types, small syntax changes). This should be transparent to the user creating the table/migration, and instead just an implementation detail of schemahero.
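
In Postgres, for example, the driver could detect the server version with a simple query (a sketch):

show server_version;
-- or, for easier numeric comparison:
select current_setting('server_version_num');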

Pod Sandbox changed, it will be killed and restarted

When deploying a plan, the plan pod often pauses; running kubectl describe pod shows:

Events:
  Type    Reason          Age        From               Message
  ----    ------          ----       ----               -------
  Normal  Scheduled       <unknown>  default-scheduler  Successfully assigned default/users-plan to marc
  Normal  Pulling         82s        kubelet, marc      Pulling image "localhost:32000/schemahero/schemahero"
  Normal  Pulled          81s        kubelet, marc      Successfully pulled image "localhost:32000/schemahero/schemahero"
  Normal  Created         81s        kubelet, marc      Created container users
  Normal  Started         81s        kubelet, marc      Started container users
  Normal  SandboxChanged  80s        kubelet, marc      Pod sandbox changed, it will be killed and re-created.

This resumes after a couple of minutes and completes, but what's causing this?

Add support for charset in mysql

When creating or editing a table, setting the charset of the table should be exposed in the schemahero schema.

Perhaps the following yaml could be a model:

apiVersion: schemas.schemahero.io/v1alpha3
kind: Table
metadata:
  name: a
spec:
  database: d
  name: a
  schema:
    mysql:
      defaultCharset: utf8
      primaryKey:
      - id
      columns:
      - name: id
        type: varchar(255)
      - name: name
        type: varchar(255)
The corresponding generated statement would be:

CREATE TABLE `a` (
  `id` varchar(255) NOT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Invalid alter column statements created

On a new deploy with the :alpha tag:

Executing query "alter table \"table_name\" alter column \"column_name\""

{"level":"error","ts":1558435102.2899368,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"table-controller","request":"default/user-feature","error":"pq: syntax error at end of input","stacktrace":"github.com/schemahero/schemahero/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/travis/gopath/src/github.com/schemahero/schemahero/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/schemahero/schemahero/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/travis/gopath/src/github.com/schemahero/schemahero/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\ngithub.com/schemahero/schemahero/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/travis/gopath/src/github.com/schemahero/schemahero/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/schemahero/schemahero/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/travis/gopath/src/github.com/schemahero/schemahero/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/schemahero/schemahero/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/travis/gopath/src/github.com/schemahero/schemahero/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/schemahero/schemahero/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/travis/gopath/src/github.com/schemahero/schemahero/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Add Vault support

It would be nice to have SchemaHero use Vault to manage credentials, rather than a static URI in a k8s secret.

Vault has some native functionality to inject credentials into a k8s Deployment (https://www.hashicorp.com/blog/dynamic-database-credentials-with-vault-and-kubernetes/#injecting-secrets-into-kubernetes-deployments), but this would also create a dependency on a pretty recent version of Vault (1.3.1+: https://github.com/hashicorp/vault-k8s#features).

Going down the Vault route also means surfacing some annotations to pass through to the Deployment that's created when a new Database object is created.

Perhaps these annotations could be added to the existing Database CRD, and setting them would trigger the conditional logic to use Vault for auth?

I was also wondering whether this should attempt to be a more flexible "alternative auth method" solution, or avoid premature optimisation and just keep it Vault-specific for now.
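
For reference, the vault-k8s injector is driven by pod annotations along these lines (the role and secret path are illustrative):

annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "schemahero"
  vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/schemahero"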

Log output contains lots of "not a valid metric name" errors

E0625 01:24:35.043057       1 client_go_adapter.go:318] descriptor Desc{fqName: "database-controller_depth", help: "Current depth of workqueue: database-controller", constLabels: {}, variableLabels: []} is invalid: "database-controller_depth" is not a valid metric name
E0625 01:24:35.044463       1 client_go_adapter.go:328] descriptor Desc{fqName: "database-controller_adds", help: "Total number of adds handled by workqueue: database-controller", constLabels: {}, variableLabels: []} is invalid: "database-controller_adds" is not a valid metric name
E0625 01:24:35.129624       1 client_go_adapter.go:339] descriptor Desc{fqName: "database-controller_queue_latency", help: "How long an item stays in workqueuedatabase-controller before being requested.", constLabels: {}, variableLabels: []} is invalid: "database-controller_queue_latency" is not a valid metric name
E0625 01:24:35.140468       1 client_go_adapter.go:350] descriptor Desc{fqName: "database-controller_work_duration", help: "How long processing an item from workqueuedatabase-controller takes.", constLabels: {}, variableLabels: []} is invalid: "database-controller_work_duration" is not a valid metric name
E0625 01:24:35.141685       1 client_go_adapter.go:363] descriptor Desc{fqName: "database-controller_unfinished_work_seconds", help: "How many seconds of work database-controller has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.", constLabels: {}, variableLabels: []} is invalid: "database-controller_unfinished_work_seconds" is not a valid metric name
E0625 01:24:35.142748       1 client_go_adapter.go:374] descriptor Desc{fqName: "database-controller_longest_running_processor_microseconds", help: "How many microseconds has the longest running processor for database-controller been running.", constLabels: {}, variableLabels: []} is invalid: "database-controller_longest_running_processor_microseconds" is not a valid metric name
E0625 01:24:35.143851       1 client_go_adapter.go:384] descriptor Desc{fqName: "database-controller_retries", help: "Total number of retries handled by workqueue: database-controller", constLabels: {}, variableLabels: []} is invalid: "database-controller_retries" is not a valid metric name
{"level":"info","ts":1561425875.1449182,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"database-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1561425875.1451004,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"database-controller","source":"kind source: /, Kind="}
E0625 01:24:35.145210       1 client_go_adapter.go:318] descriptor Desc{fqName: "migration-controller_depth", help: "Current depth of workqueue: migration-controller", constLabels: {}, variableLabels: []} is invalid: "migration-controller_depth" is not a valid metric name
E0625 01:24:35.145982       1 client_go_adapter.go:328] descriptor Desc{fqName: "migration-controller_adds", help: "Total number of adds handled by workqueue: migration-controller", constLabels: {}, variableLabels: []} is invalid: "migration-controller_adds" is not a valid metric name
E0625 01:24:35.147120       1 client_go_adapter.go:339] descriptor Desc{fqName: "migration-controller_queue_latency", help: "How long an item stays in workqueuemigration-controller before being requested.", constLabels: {}, variableLabels: []} is invalid: "migration-controller_queue_latency" is not a valid metric name
E0625 01:24:35.148151       1 client_go_adapter.go:350] descriptor Desc{fqName: "migration-controller_work_duration", help: "How long processing an item from workqueuemigration-controller takes.", constLabels: {}, variableLabels: []} is invalid: "migration-controller_work_duration" is not a valid metric name
E0625 01:24:35.149242       1 client_go_adapter.go:363] descriptor Desc{fqName: "migration-controller_unfinished_work_seconds", help: "How many seconds of work migration-controller has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.", constLabels: {}, variableLabels: []} is invalid: "migration-controller_unfinished_work_seconds" is not a valid metric name
E0625 01:24:35.150332       1 client_go_adapter.go:374] descriptor Desc{fqName: "migration-controller_longest_running_processor_microseconds", help: "How many microseconds has the longest running processor for migration-controller been running.", constLabels: {}, variableLabels: []} is invalid: "migration-controller_longest_running_processor_microseconds" is not a valid metric name
E0625 01:24:35.151400       1 client_go_adapter.go:384] descriptor Desc{fqName: "migration-controller_retries", help: "Total number of retries handled by workqueue: migration-controller", constLabels: {}, variableLabels: []} is invalid: "migration-controller_retries" is not a valid metric name
{"level":"info","ts":1561425875.1524668,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"migration-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1561425875.1526852,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"migration-controller","source":"kind source: /, Kind="}
E0625 01:24:35.152815       1 client_go_adapter.go:318] descriptor Desc{fqName: "table-controller_depth", help: "Current depth of workqueue: table-controller", constLabels: {}, variableLabels: []} is invalid: "table-controller_depth" is not a valid metric name
E0625 01:24:35.153545       1 client_go_adapter.go:328] descriptor Desc{fqName: "table-controller_adds", help: "Total number of adds handled by workqueue: table-controller", constLabels: {}, variableLabels: []} is invalid: "table-controller_adds" is not a valid metric name
E0625 01:24:35.154679       1 client_go_adapter.go:339] descriptor Desc{fqName: "table-controller_queue_latency", help: "How long an item stays in workqueuetable-controller before being requested.", constLabels: {}, variableLabels: []} is invalid: "table-controller_queue_latency" is not a valid metric name
E0625 01:24:35.155757       1 client_go_adapter.go:350] descriptor Desc{fqName: "table-controller_work_duration", help: "How long processing an item from workqueuetable-controller takes.", constLabels: {}, variableLabels: []} is invalid: "table-controller_work_duration" is not a valid metric name
E0625 01:24:35.156806       1 client_go_adapter.go:363] descriptor Desc{fqName: "table-controller_unfinished_work_seconds", help: "How many seconds of work table-controller has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.", constLabels: {}, variableLabels: []} is invalid: "table-controller_unfinished_work_seconds" is not a valid metric name
E0625 01:24:35.157873       1 client_go_adapter.go:374] descriptor Desc{fqName: "table-controller_longest_running_processor_microseconds", help: "How many microseconds has the longest running processor for table-controller been running.", constLabels: {}, variableLabels: []} is invalid: "table-controller_longest_running_processor_microseconds" is not a valid metric name
E0625 01:24:35.158963       1 client_go_adapter.go:384] descriptor Desc{fqName: "table-controller_retries", help: "Total number of retries handled by workqueue: table-controller", constLabels: {}, variableLabels: []} is invalid: "table-controller_retries" is not a valid metric name

(These names are rejected because Prometheus metric names cannot contain hyphens; the hyphenated controller names are being used verbatim as metric prefixes.)

Create integration test to validate initial installation experience

The initial installation experience is often deploying a database resource and many table resources. Because the order of these is non-deterministic, schemahero should requeue tables until the database is ready. An integration test is needed to validate that this happens without producing unexpected log errors.

Add `shell` to the kubectl plugin

Would it be useful to add kubectl schemahero shell --database=name to the kubectl plugin, as the functional equivalent of running a client pod and exec'ing into it?
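
The manual equivalent today is something like this (the image and connection string are illustrative):

kubectl run db-shell --rm -it --image=postgres:12 -- \
  psql "postgres://user:password@host:5432/database?sslmode=disable"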

Removing notNull constraint is not respected

When removing or changing a notNull constraint on a column, the change is picked up by the cluster and appears to be applied; however, it is not actually updated in the database.

Schema Validation

What options could SchemaHero offer for schema validation prior to applying changes? How can we ensure that the desired schema is compatible with the existing data in the database?
