deis / etcd
Etcd cluster for Deis v2
Home Page: http://deis.io
License: MIT License
In the event of a catastrophic etcd cluster failure, etcd should be able to restart itself and initialize into a previous known-good state.
Cluster failure happens when all of the nodes on an etcd cluster are terminated.
Currently, when a cluster fails, the first node to recover will re-initialize the discovery process with the etcd-discover service, but it will not recover the data.
What we want is for one or more nodes in a cluster to ship WAL logs to a known location (and maybe full backups as well) at periodic intervals. Then, when a cluster fails, it should grab the last successful backup and import the data from that file.
I believe that the best way to accomplish this will be to use etcd's snapshot backup/restore system. https://github.com/coreos/etcd/blob/master/Documentation/04_to_2_snapshot_migration.md
><> make setup
glide up --import --delete-flatten
Incorrect Usage.
Running glide up works fine, but I'm just posting this to see if there's something in those two flags that should be reflected in a PR.
><> make build
go build -o rootfs/bin/boot -a -installsuffix cgo -ldflags "-s -X main.version=0.0.1-20151105131812" boot.go
vendor/github.com/deis/pkg/etcd/members.go:10:2: cannot find package "github.com/deis/deis/pkg/k8s" in any of:
/home/bacongobbler/go/src/github.com/deis/etcd/vendor/github.com/deis/deis/pkg/k8s (vendor tree)
/usr/local/go/src/github.com/deis/deis/pkg/k8s (from $GOROOT)
/home/bacongobbler/go/src/github.com/deis/deis/pkg/k8s (from $GOPATH)
Makefile:33: recipe for target 'build' failed
make: *** [build] Error 1
Looks like we're missing a dep in glide.yaml
This is to lower the barrier to entry for hacking on Deis while also standardizing the development environment for all contributors.
Should it be possible to run one? I tried editing the deis/deis-dev chart to use just one replica and a cluster size of 1, but anything that depended on etcd still went into a CrashLoopBackOff waiting for the etcd cluster to come up.
To prove out the deis v2 integration we need to cut a release of deis/etcd that:
The implementation for alpha does not need:
We should change this so it works like the other Makefiles. Instead of using perl to replace the image in the original file, we should make tmp files and use those to create the RCs.
The RC doesn't do anything to ensure the peers are scheduled to different k8s nodes. I'm not sure k8s even has semantics that permit that. This could be a big problem because if a majority of the etcd pods get assigned to a single k8s node and that node dies (or even just becomes unreachable due to network issues) then quorum is lost. This scenario is not only possible, but it's also not at all unlikely.
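The quorum arithmetic behind that concern can be sketched as follows. This is only an illustration: the pod names and the second node IP are made up, and `nodesThatBreakQuorum` is a hypothetical helper, not something in this repo.

```go
package main

import (
	"fmt"
	"sort"
)

// quorum returns how many members must stay reachable for an etcd (Raft)
// cluster of the given size to keep accepting writes: a strict majority.
func quorum(clusterSize int) int {
	return clusterSize/2 + 1
}

// nodesThatBreakQuorum takes a pod-name -> node-name placement and returns
// the nodes whose loss alone would leave fewer than quorum members running.
func nodesThatBreakQuorum(podToNode map[string]string) []string {
	perNode := map[string]int{}
	for _, node := range podToNode {
		perNode[node]++
	}
	need := quorum(len(podToNode))
	var risky []string
	for node, pods := range perNode {
		if len(podToNode)-pods < need {
			risky = append(risky, node)
		}
	}
	sort.Strings(risky)
	return risky
}

func main() {
	// Hypothetical placement: two of three etcd pods share one k8s node.
	placement := map[string]string{
		"deis-etcd-1-aaaaa": "172.17.8.100",
		"deis-etcd-1-bbbbb": "172.17.8.100",
		"deis-etcd-1-ccccc": "172.17.8.101",
	}
	fmt.Println(nodesThatBreakQuorum(placement)) // prints [172.17.8.100]
}
```

With three members the quorum is two, so losing the node that hosts two pods leaves one member and the cluster stops accepting writes. Losing the other node leaves two members and the cluster keeps working.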
@rimusz and @jchauncey have both seen errors like this:
$ kubectl --namespace=deis logs deis-etcd-1-i3ywx
[error] 2015/12/18 20:44:34 Failed aboutme.FromEnv: Get https://10.100.0.1:443/api/v1/namespaces/deis/pods/deis-etcd-1-i3ywx: EOF
[warning] 2015/12/18 20:44:34 Attempting to recover.
[warning] 2015/12/18 20:44:34 No IP found by API query.
What is happening here is that the pod is contacting the Kube API server to get information about itself. The server appears to be returning an empty response. That surprises me, though a failed SSL negotiation would not have.
Really, what we need to do is get into that pod and fire off a request to the API to find out what is going on.
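A rough sketch of such a request, to be run from inside the pod, could look like this. It assumes the standard `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables and the default service account token path. Using `POD_NAMESPACE` for the namespace and `HOSTNAME` for the pod name are assumptions about how the pod is configured, and TLS verification is skipped purely for debugging.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"strings"
	"time"
)

// Default location of the service account token mounted into a pod.
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// podURL builds the same kind of self-lookup URL that appears in the error log.
func podURL(host, port, namespace, pod string) string {
	return fmt.Sprintf("https://%s/api/v1/namespaces/%s/pods/%s",
		net.JoinHostPort(host, port), namespace, pod)
}

func main() {
	host := os.Getenv("KUBERNETES_SERVICE_HOST")
	port := os.Getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		fmt.Println("KUBERNETES_SERVICE_HOST/PORT not set; run this inside the pod")
		return
	}
	// Assumption: POD_NAMESPACE is injected and HOSTNAME equals the pod name.
	url := podURL(host, port, os.Getenv("POD_NAMESPACE"), os.Getenv("HOSTNAME"))
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	if token, err := os.ReadFile(tokenPath); err == nil {
		req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))
	}
	client := &http.Client{
		Timeout: 10 * time.Second,
		// Debugging only: skip certificate verification to separate TLS
		// problems from empty responses.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("%s\n%s\n", resp.Status, body)
}
```

If this also fails with EOF, the problem is more likely on the API server or network side than in the boot code.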
The Beta requirements specify that etcd should provide:
Now that #11 was merged, some problems are apparent that were nearly impossible to test before (since this Worked On My Machine™ and the .travis.yml deploy logic won't run on PRs). Basically there's no docker-machine on Travis, so that assumption needs to be fixed:
make: Entering directory `/home/travis/gopath/src/github.com/deis/etcd'
go build -o rootfs/usr/local/bin/boot -a -installsuffix cgo -ldflags "-s -X main.version=v2-alpha" boot.go
make: docker-machine: Command not found
vendor/github.com/prometheus/client_golang/prometheus/registry.go:36:2: cannot find package "bitbucket.org/ww/goautoneg" in any of:
/home/travis/gopath/src/github.com/deis/etcd/vendor/bitbucket.org/ww/goautoneg (vendor tree)
/home/travis/.gimme/versions/go1.5.1.linux.amd64/src/bitbucket.org/ww/goautoneg (from $GOROOT)
/home/travis/gopath/src/bitbucket.org/ww/goautoneg (from $GOPATH)
make: *** [build] Error 1
make: Leaving directory `/home/travis/gopath/src/github.com/deis/etcd'
Without any persistent storage, will the whole etcd cluster fail if the discovery pod is destroyed? Or will the cluster return to a normal state once another pod is re-created (automatically, by the discovery-rc)? Is the state stored on the discovery pod critical, or will it be recovered automatically when the pod is re-created?
As we all well know from v1.x, etcd is a pain point. This is a somewhat radical idea, but do we really need it in v2?
Yes... of course we need the etcd that's under k8s, but my question pertains pretty specifically to this deis/etcd that we intend to run within k8s as a foundational component of v2.
Fundamentally, we use etcd for two things in v1.x-- service discovery and configuration. If k8s has service discovery licked (especially with the dns add-on in play) and Deis already has a database where we could store more platform configuration (than we do today) and already has an API which can be extended to handle those things (and can scale!) then maybe we don't need etcd after all...
Obviously, I haven't thought through every implication of this idea yet, but I think it may not be entirely crazy to entertain.
Hypothetically, it could work something like this:
The biggest problem I can anticipate is that I think (but I'm not positive) that deis/postgresql needs etcd to facilitate leader election and failover. Could there be another way to accomplish that? Or alternatively, maybe we could use etcd for that purpose and only that-- just to get the DB going and then let the DB and API take it from there?
It's late, so excuse me if any of this is totally nuts... it stems pretty directly from how much more I enjoy interacting with k8s API than I do futzing with etcd-- and that's left me wishing our API and DB would handle more of the burden of maintaining and storing platform state... and it could mean one less point of failure and one less thing to have to back up.
I don't see anything particularly useful when I k describe pod <pod>:
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
29s 29s 1 {kubelet 172.17.8.100} implicitly required container POD Pulled Container image "gcr.io/google_containers/pause:0.8.0" already present on machine
29s 29s 1 {kubelet 172.17.8.100} implicitly required container POD Created Created with docker id abc8d779455d
29s 29s 1 {kubelet 172.17.8.100} implicitly required container POD Started Started with docker id abc8d779455d
29s 29s 1 {scheduler } Scheduled Successfully assigned deis-etcd-1-d8wj6 to 172.17.8.100
27s 27s 1 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Created Created with docker id e6a70c4341c1
27s 27s 1 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Started Started with docker id e6a70c4341c1
27s 18s 2 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Pulled Successfully pulled image "quay.io/deisci/etcd:v2-alpha"
18s 18s 1 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Created Created with docker id 09911910ffc9
18s 18s 1 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Started Started with docker id 09911910ffc9
10s 10s 1 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Backoff Back-off restarting failed docker container
29s 0s 3 {kubelet 172.17.8.100} spec.containers{deis-etcd-1} Pulling Pulling image "quay.io/deisci/etcd:v2-alpha"