deis / etcd

Etcd cluster for Deis v2

License: MIT License

Makefile 16.26% Go 81.18% Shell 2.56%

etcd's Issues

Backup and Restore for etcd

In the event of a catastrophic etcd cluster failure, etcd should be able to restart itself and initialize into a previous known-good state.

Cluster failure happens when all of the nodes on an etcd cluster are terminated.

Currently, when a cluster fails, the first node to recover will re-initialize the discovery process with the etcd-discovery service, but it will not recover the data.

What we want is for one or more nodes in a cluster to ship WAL logs to a known location (and maybe full backups as well) at periodic intervals. Then, when a cluster fails, it should grab the last successful backup and import the data from that file.

I believe that the best way to accomplish this will be to use etcd's snapshot backup/restore system. https://github.com/coreos/etcd/blob/master/Documentation/04_to_2_snapshot_migration.md

make setup fails with glide v0.7.0

><> make setup
glide up --import --delete-flatten
Incorrect Usage.

Running glide up works fine, but just posting this to see if there's something in those two flags that should be reflected in a PR.

make build fails :(

><> make build
go build -o rootfs/bin/boot -a -installsuffix cgo -ldflags "-s -X main.version=0.0.1-20151105131812"  boot.go
vendor/github.com/deis/pkg/etcd/members.go:10:2: cannot find package "github.com/deis/deis/pkg/k8s" in any of:
        /home/bacongobbler/go/src/github.com/deis/etcd/vendor/github.com/deis/deis/pkg/k8s (vendor tree)
        /usr/local/go/src/github.com/deis/deis/pkg/k8s (from $GOROOT)
        /home/bacongobbler/go/src/github.com/deis/deis/pkg/k8s (from $GOPATH)
Makefile:33: recipe for target 'build' failed
make: *** [build] Error 1

Looks like we're missing a dep in glide.yaml

Proposal: containerize the development environment

This is to lower the barrier to entry for hacking on Deis while also standardizing the development environment for all contributors.

Single node etcd cluster possible?

Should it be possible to run one? I tried editing the deis/deis-dev chart to use just one replica and a cluster size of 1, but anything that depended on etcd still went into a CrashLoopBackOff waiting for the etcd cluster to come up.

etcd Alpha Requirements

To prove out the deis v2 integration we need to cut a release of deis/etcd that:

Deployable to K8S: Service def, RC def, repository follows standard Deis pattern
Installable by Helm (opt)
Is useable by deis/workflow components
Installation instructions documented in README
Installation linked to alpha setup in deis/workflow

The implementation for alpha does not need:

HA
Backups
Recovery

make kube-rc should use sed instead of perl

We should change this so it works like the other Makefiles. Instead of using perl to replace the image in the original file, we should generate temporary files and use those to create the RCs.

Might not be as HA as we think

The RC doesn't do anything to ensure the peers are scheduled to different k8s nodes. I'm not sure k8s even has semantics that permit that. This could be a _big_ problem because if a majority of the etcd pods get assigned to a single k8s node and that node dies (or even just becomes unreachable due to network issues) then quorum is lost. This scenario is not only possible, but it's also not at all unlikely.
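To make the quorum argument concrete, here is a small sketch that checks whether a given pod placement survives the loss of any single k8s node. The function names and the `placement` map are assumptions made for illustration; the only fact relied on is that etcd needs a majority of members (size/2 + 1) to keep quorum.

```go
package main

// quorum returns the number of members that must be reachable for an
// etcd cluster of the given size to keep accepting writes.
func quorum(size int) int {
	return size/2 + 1
}

// SurvivesAnyNodeLoss reports whether the cluster keeps quorum no matter
// which single Kubernetes node disappears. placement maps a node name
// (hypothetical input) to the number of etcd pods scheduled on it.
func SurvivesAnyNodeLoss(placement map[string]int) bool {
	total := 0
	for _, pods := range placement {
		total += pods
	}
	for _, pods := range placement {
		// Losing this node removes all of its pods at once.
		if total-pods < quorum(total) {
			return false
		}
	}
	return total > 0
}
```

With three pods, a placement of two on one node and one on another fails the check, while one pod per node passes.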

etcd fails to contact API server on certain clusters.

@rimusz and @jchauncey have both seen errors like this:

$ kubectl --namespace=deis logs deis-etcd-1-i3ywx
[error] 2015/12/18 20:44:34 Failed aboutme.FromEnv: Get https://10.100.0.1:443/api/v1/namespaces/deis/pods/deis-etcd-1-i3ywx: EOF
[warning] 2015/12/18 20:44:34 Attempting to recover.
[warning] 2015/12/18 20:44:34 No IP found by API query.

What is happening here is that the pod is contacting the Kube API server to get information about itself. The server appears to be returning an empty response. That surprises me, though a failed SSL negotiation would not have.

Really, what we need to do is get into that pod and fire off a request to the API server to find out what is going on.
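One way to reproduce the request from inside the pod is a small Go program along these lines. It is a debugging sketch, not the code in aboutme.FromEnv: `PodURL` and `FetchPod` are hypothetical helpers, and it assumes the pod has the default service account token and CA certificate mounted and the `KUBERNETES_SERVICE_HOST` / `KUBERNETES_SERVICE_PORT` environment variables set.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"strings"
	"time"
)

// saDir is where Kubernetes mounts the service account credentials.
const saDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// PodURL builds the API path for a single pod (hypothetical helper).
func PodURL(host, port, namespace, pod string) string {
	return fmt.Sprintf("https://%s/api/v1/namespaces/%s/pods/%s",
		net.JoinHostPort(host, port), namespace, pod)
}

// FetchPod queries the API server for the pod and returns the HTTP
// status code and raw body so an empty response is easy to spot.
func FetchPod(namespace, pod string) (int, string, error) {
	token, err := os.ReadFile(saDir + "/token")
	if err != nil {
		return 0, "", err
	}
	ca, err := os.ReadFile(saDir + "/ca.crt")
	if err != nil {
		return 0, "", err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(ca) {
		return 0, "", errors.New("no usable certificate in ca.crt")
	}
	client := &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}},
	}
	url := PodURL(os.Getenv("KUBERNETES_SERVICE_HOST"),
		os.Getenv("KUBERNETES_SERVICE_PORT"), namespace, pod)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err // an EOF here would match the error in the log above
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}
```

Calling `FetchPod("deis", "deis-etcd-1-i3ywx")` from the affected pod and printing all three return values should show whether the failure happens during the TLS handshake, at the HTTP layer, or in the response body.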

Metrics

The Beta requirements specify that etcd should provide:

"Metrics: expose operational metrics to platform monitor component"

Deploy failures on Travis CI

Now that #11 was merged, some problems are apparent that were nearly impossible to test before (since this Worked On My Machine™ and the .travis.yml deploy logic won't run on PRs). Basically there's no docker-machine on Travis, so that assumption needs to be fixed:

make: Entering directory `/home/travis/gopath/src/github.com/deis/etcd'
go build -o rootfs/usr/local/bin/boot -a -installsuffix cgo -ldflags "-s -X main.version=v2-alpha" boot.go
make: docker-machine: Command not found
vendor/github.com/prometheus/client_golang/prometheus/registry.go:36:2: cannot find package "bitbucket.org/ww/goautoneg" in any of:
    /home/travis/gopath/src/github.com/deis/etcd/vendor/bitbucket.org/ww/goautoneg (vendor tree)
    /home/travis/.gimme/versions/go1.5.1.linux.amd64/src/bitbucket.org/ww/goautoneg (from $GOROOT)
    /home/travis/gopath/src/bitbucket.org/ww/goautoneg (from $GOPATH)
make: *** [build] Error 1
make: Leaving directory `/home/travis/gopath/src/github.com/deis/etcd'

Question: What will happen if the discovery pod is broken?

Without any persistent storage, will the whole etcd cluster fail if the discovery pod is destroyed? Or will the cluster return to a normal state once another pod is re-created (automatically by the discovery-rc)? Is the state stored in the discovery pod critical, or will it be recovered automatically when the pod is re-created?

Should etcd be required for Deis Workflow

As we all well know from v1.x, etcd is a pain point. This is a somewhat radical idea, but do we really need it in v2?

Yes... of course we need the etcd that's under k8s, but my question pertains pretty specifically to this deis/etcd that we intend to run within k8s as a foundational component of v2.

Fundamentally, we use etcd for two things in v1.x-- service discovery and configuration. If k8s has service discovery licked (especially with the dns add-on in play) and Deis already has a database where we could store more platform configuration (than we do today) and already has an API which can be extended to handle those things (and can scale!) then maybe we don't need etcd after all...

Obviously, I haven't thought through every implication of this idea yet, but I think it may not be entirely crazy to entertain.

Hypothetically, it could work something like this:

Create secret for db password in k8s namespace "deis"
Create rc for HA deis database (postgres); will use secret above
Create a service for the database
Create a secret for Deis API token
Create rc for deis API; can discover database via DNS; can mount and use secrets containing db credentials and API token
Create a service for the API
Bring up any/all other Deis components; can discover the API via DNS; can mount and use secret containing API token
Bonus: Would be awesome to create go bindings for the Deis API that can be reused by all the other Deis components

The biggest problem I can anticipate is that I think (but I'm not positive) that deis/postgresql needs etcd to facilitate leader election and failover. Could there be another way to accomplish that? Or alternatively, maybe we could use etcd for that purpose and only that-- just to get the DB going and then let the DB and API take it from there?

It's late, so excuse me if any of this is totally nuts... it stems pretty directly from how much more I enjoy interacting with k8s API than I do futzing with etcd-- and that's left me wishing our API and DB would handle more of the burden of maintaining and storing platform state... and it could mean one less point of failure and one less thing to have to back up.

CrashLoopBackOff seen after deploy via helm

I don't see anything particularly useful when I run kubectl describe pod <pod>:

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath               Reason      Message
  ─────────   ────────    ───── ────            ─────────────             ──────      ───────
  29s       29s     1   {kubelet 172.17.8.100}  implicitly required container POD   Pulled      Container image "gcr.io/google_containers/pause:0.8.0" already present on machine
  29s       29s     1   {kubelet 172.17.8.100}  implicitly required container POD   Created     Created with docker id abc8d779455d
  29s       29s     1   {kubelet 172.17.8.100}  implicitly required container POD   Started     Started with docker id abc8d779455d
  29s       29s     1   {scheduler }                            Scheduled   Successfully assigned deis-etcd-1-d8wj6 to 172.17.8.100
  27s       27s     1   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Created     Created with docker id e6a70c4341c1
  27s       27s     1   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Started     Started with docker id e6a70c4341c1
  27s       18s     2   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Pulled      Successfully pulled image "quay.io/deisci/etcd:v2-alpha"
  18s       18s     1   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Created     Created with docker id 09911910ffc9
  18s       18s     1   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Started     Started with docker id 09911910ffc9
  10s       10s     1   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Backoff     Back-off restarting failed docker container
  29s       0s      3   {kubelet 172.17.8.100}  spec.containers{deis-etcd-1}        Pulling     Pulling image "quay.io/deisci/etcd:v2-alpha"
