katosys / kato

The magic is underneath.

Home Page: http://kato.one

License: Apache License 2.0

Go 100.00%
alertmanager aws cadvisor cluster confd containers coreos dvdcli etcd go kingpin marathon marathon-lb mesos mesos-dns pritunl prometheus rex-ray rkt zookeeper

kato's People

Contributors

bvis, h0tbird

kato's Issues

Pre-populate trusted quay.io/kato key on disk

core@kato-1 ~ $ sudo rkt trust --prefix=quay.io/kato
pubkey: prefix: "quay.io/kato"
key: "https://quay.io/aci-signing-key"
gpg key fingerprint is: BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC
	Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Are you sure you want to trust this key (yes/no)?
yes
Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/kato" after fingerprint review.
Added key for prefix "quay.io/kato" at "/etc/rkt/trustedkeys/prefix.d/quay.io/kato/bff313cdaa560b16a8987b8f72abf5f6799d33bc"
core@kato-1 ~ $ find /etc/rkt/
/etc/rkt/
/etc/rkt/trustedkeys
/etc/rkt/trustedkeys/prefix.d
/etc/rkt/trustedkeys/prefix.d/quay.io
/etc/rkt/trustedkeys/prefix.d/quay.io/kato
/etc/rkt/trustedkeys/prefix.d/quay.io/kato/bff313cdaa560b16a8987b8f72abf5f6799d33bc
core@kato-1 ~ $ ls -la /etc/rkt/trustedkeys/prefix.d/quay.io/kato/bff313cdaa560b16a8987b8f72abf5f6799d33bc
-rw-r--r--. 1 root rkt-admin 991 Nov  2 09:15 /etc/rkt/trustedkeys/prefix.d/quay.io/kato/bff313cdaa560b16a8987b8f72abf5f6799d33bc
core@kato-1 ~ $ cat /etc/rkt/trustedkeys/prefix.d/quay.io/kato/bff313cdaa560b16a8987b8f72abf5f6799d33bc
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v2

mQENBFTT6doBCACkVncI+t4HASQdnByRlXCYkwjsPqGOlgTCgenop5I6vgTqFWhQ
PMNhtSaFdFECMt2WKQT4QGVbfVOmIH9CLV+Muqvk4iJIAn3Nh3qp/kfMhwjGaS6m
fWN2ARFCq4RIs9tboCNQOouaD5C26/FsQtIsoqyYcdX+YFaU1a+R1kp0fc2CABDI
k6Iq8oEJO+FOYvqQYIJNfd3c0NHICilMu2jO3yIsw80qzWoFAAblyb0zVq/hudWB
4vdVzPmJe1f4Ymk8l1R413bN65LcbCiOax3hmFWovJoxlkL7WoGTTMfaeb2QmaPL
qcu4Q94v1KG87gyxbkIo5uZdvMLdswQI7yQ7ABEBAAG0RFF1YXkuaW8gQUNJIENv
bnZlcnRlciAoQUNJIGNvbnZlcnNpb24gc2lnbmluZyBrZXkpIDxzdXBwb3J0QHF1
YXkuaW8+iQE5BBMBAgAjBQJU0+naAhsDBwsJCAcDAgEGFQgCCQoLBBYCAwECHgEC
F4AACgkQcqv19nmdM7zKzggAjGFqy7Hcx6TCFXn53/inl5iyKrTu8cuF4K547XuZ
12Dt8b6PgJ+b3z6UnMMTd0wXKGcfOmNeQ2R71xmVnviuo7xB5ZkZIBxHI4M/5uhK
I6GZKr84WJS2ec7ssH2ofFQ5u1l+es9jUwW0KbAoNmES0IcdDy28xfmJpkfOn3oI
P2Bzz4rGlIqJXEjq28Wk+qQu64kJRKYuPNXqiHncPDm+i5jMXUUN1D+pkDukp26x
oLbpol42/jIcM3fe2AFZnflittBCHYLIHjJ51NlpSHJZmf2pQZbdyeKElN2SCNe7
nDcol24zYIC+SX0K23w/LrLzlff4mzbO99ePt1bB9zAiVA==
=SBoV
-----END PGP PUBLIC KEY BLOCK-----
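
The transcript above suggests that trusting a key amounts to writing the armored public key to a file named after the lower-cased fingerprint, under a directory named after the prefix. A minimal Go sketch of pre-populating that file, assuming this layout is all rkt needs; the helper names `trustedKeyPath` and `writeTrustedKey` are hypothetical, and the `rkt-admin` group ownership seen in the listing is not handled here:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// trustedKeyPath builds <root>/trustedkeys/prefix.d/<prefix>/<fingerprint>,
// with the fingerprint lower-cased and stripped of whitespace, matching the
// path printed by `rkt trust` in the transcript.
func trustedKeyPath(root, prefix, fingerprint string) string {
	fp := strings.ToLower(strings.Join(strings.Fields(fingerprint), ""))
	return filepath.Join(root, "trustedkeys", "prefix.d", filepath.FromSlash(prefix), fp)
}

// writeTrustedKey creates the prefix directory and stores the armored key
// with mode 0644 (the mode shown by `ls -la` above). Hypothetical helper.
func writeTrustedKey(root, prefix, fingerprint string, armoredKey []byte) (string, error) {
	path := trustedKeyPath(root, prefix, fingerprint)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return "", err
	}
	return path, os.WriteFile(path, armoredKey, 0o644)
}

func main() {
	// Only the path is printed here; call writeTrustedKey to create the file.
	fmt.Println(trustedKeyPath("/etc/rkt", "quay.io/kato",
		"BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC"))
}
```

In kato the same result could probably be reached with a write_files entry in the user data, which avoids running any code on the node.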

Update to ns1-go.v2

The current build is broken:

github.com/bobtfish/go-nsone-api
github.com/katosys/kato/providers/dns/ns1
# github.com/katosys/kato/providers/dns/ns1
providers/dns/ns1/ns1.go:14: imported and not used: "github.com/bobtfish/go-nsone-api" as ns1
providers/dns/ns1/ns1.go:42: undefined: nsone in nsone.New
providers/dns/ns1/ns1.go:55: undefined: nsone in nsone.NewZone
providers/dns/ns1/ns1.go:93: undefined: nsone in nsone.New
providers/dns/ns1/ns1.go:101: undefined: nsone in nsone.NewRecord
providers/dns/ns1/ns1.go:102: undefined: nsone in nsone.Answer
providers/dns/ns1/ns1.go:103: undefined: nsone in nsone.NewAnswer

New --cluster-state flag

The --cluster-state flag takes one argument with two possible values: new and existing (the default). It will be used to shape the configuration of new quorum nodes.
For instance, if the cluster already exists, new quorum nodes should be started with zookeeper and etcd2 stopped but properly templated.
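
A rough sketch of how such a two-valued flag could be validated. It uses the standard library `flag` package so that it runs on its own; kato itself uses kingpin, so the wiring would differ there. The type name `clusterState` and the function `parseClusterState` are hypothetical:

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// clusterState is a flag value restricted to "new" or "existing".
type clusterState string

func (c *clusterState) String() string { return string(*c) }

// Set rejects anything other than the two values named in the issue.
func (c *clusterState) Set(v string) error {
	switch v {
	case "new", "existing":
		*c = clusterState(v)
		return nil
	}
	return fmt.Errorf("invalid cluster state %q (want new or existing)", v)
}

// parseClusterState parses args and returns the selected state.
func parseClusterState(args []string) (string, error) {
	state := clusterState("existing") // default taken from the issue text
	fs := flag.NewFlagSet("kato", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage noise out of the example
	fs.Var(&state, "cluster-state", "initial cluster state: new or existing")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return string(state), nil
}

func main() {
	for _, args := range [][]string{nil, {"--cluster-state", "new"}, {"--cluster-state=bogus"}} {
		state, err := parseClusterState(args)
		fmt.Printf("args=%v state=%q err=%v\n", args, state, err)
	}
}
```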

Early tagging leads to fatal error

INFO[0094] New EC2 elb security group                    cmd=ec2:setup id=sg-e5ad8182
INFO[0094] New EC2 quorum security group                 cmd=ec2:setup id=sg-e3ad8184
INFO[0094] New EC2 master security group                 cmd=ec2:setup id=sg-e2ad8185
INFO[0094] New EC2 worker security group                 cmd=ec2:setup id=sg-efad8188
ERRO[0095] InvalidGroup.NotFound: The security group 'sg-efad8188' does not exist
    status code: 400, request id: 796e7a2e-a323-4c2d-a796-1cf1c7119e80  cmd=ec2:setup file=ec2.go func=ec2.(*Data).tag line=1996
FATA[0095] InvalidGroup.NotFound: The security group 'sg-efad8188' does not exist
    status code: 400, request id: 796e7a2e-a323-4c2d-a796-1cf1c7119e80  cmd=ec2:setup file=ec2.go func=ec2.(*Data).setupEC2Firewall line=801
FATA[0095] exit status 1
[0] ~ >> ./kato-ec2 
INFO[0000] Setup the EC2 environment                     cmd=ec2:deploy id=cell-1.dub.xnood.com
INFO[0000] Connecting to region eu-west-1                cmd=ec2:setup
INFO[0000] Latest CoreOS stable AMI located              cmd=ec2:deploy id=ami-b7cba3c4
INFO[0000] New EC2 VPC created                           cmd=ec2:setup id=vpc-c0833aa4
ERRO[0000] InvalidVpcID.NotFound: The vpc ID 'vpc-c0833aa4' does not exist
    status code: 400, request id: 78023e94-814c-4ba2-9482-bf8617c7ab23  cmd=ec2:setup file=ec2.go func=ec2.(*Data).tag line=1996
FATA[0000] InvalidVpcID.NotFound: The vpc ID 'vpc-c0833aa4' does not exist
    status code: 400, request id: 78023e94-814c-4ba2-9482-bf8617c7ab23  cmd=ec2:setup file=ec2.go func=ec2.(*Data).Setup line=330
FATA[0000] exit status 1                                 cmd=ec2:deploy file=ec2.go func=ec2.(*Data).setupEC2 line=421
[0] ~ >> ./kato-ec2
INFO[0000] Setup the EC2 environment                     cmd=ec2:deploy id=cell-1.dub.xnood.com
INFO[0000] Connecting to region eu-west-1                cmd=ec2:setup
INFO[0000] Latest CoreOS stable AMI located              cmd=ec2:deploy id=ami-b7cba3c4
INFO[0000] New EC2 VPC created                           cmd=ec2:setup id=vpc-81833ae5
INFO[0000] Using existing DNS zone                       cmd=ns1:zone:add id=int.cell-1.dub.xnood.com
INFO[0000] Using existing DNS zone                       cmd=ns1:zone:add id=ext.cell-1.dub.xnood.com
INFO[0000] New main route table added                    cmd=ec2:setup id=rtb-a6bfc1c2
INFO[0001] New etcd bootstrap token requested            cmd=ec2:deploy id=5813b54775cb6091e71532cd06d4bc79
INFO[0001] New external subnet                           cmd=ec2:setup id=subnet-a1b82cd7
INFO[0000] Using existing DNS zone                       cmd=ns1:zone:add id=cell-1.dub.xnood.com
INFO[0001] New internal subnet                           cmd=ec2:setup id=subnet-aeb82cd8
ERRO[0001] InvalidSubnetID.NotFound: The subnet ID 'subnet-aeb82cd8' does not exist
    status code: 400, request id: 685a700e-2415-4ad9-aa36-5617763b7eec  cmd=ec2:setup file=ec2.go func=ec2.(*Data).tag line=1996
FATA[0001] InvalidSubnetID.NotFound: The subnet ID 'subnet-aeb82cd8' does not exist
    status code: 400, request id: 685a700e-2415-4ad9-aa36-5617763b7eec  cmd=ec2:setup file=ec2.go func=ec2.(*Data).setupVPCNetwork line=696
FATA[0001] exit status 1                                 cmd=ec2:deploy file=ec2.go func=ec2.(*Data).setupEC2 line=421
[0] ~ >> ./kato-ec2
INFO[0000] Setup the EC2 environment                     cmd=ec2:deploy id=cell-1.dub.xnood.com
INFO[0000] Connecting to region eu-west-1                cmd=ec2:setup
INFO[0000] New EC2 VPC created                           cmd=ec2:setup id=vpc-a78916c3
ERRO[0000] InvalidVpcID.NotFound: The vpc ID 'vpc-a78916c3' does not exist
    status code: 400, request id: b5cc9e97-7217-49a9-bab2-9e7aaf5dac43  cmd=ec2:setup file=ec2.go func=ec2.(*Data).tag line=1978
FATA[0000] InvalidVpcID.NotFound: The vpc ID 'vpc-a78916c3' does not exist
    status code: 400, request id: b5cc9e97-7217-49a9-bab2-9e7aaf5dac43  cmd=ec2:setup file=ec2.go func=ec2.(*Data).Setup line=363
FATA[0000] exit status 1                                 cmd=ec2:deploy file=ec2.go func=ec2.(*Data).setupEC2 line=454
[1] ~ >> INFO[0000] New DNS zone created                          cmd=ns1:zone:add id=int.cell-1.dub.xnood.com
INFO[0000] New DNS zone created                          cmd=ns1:zone:add id=ext.cell-1.dub.xnood.com

Update to REX-Ray 0.6.x

This task needs thecodeteam/libstorage#325 to be released. The required changes are shown below:

diff --git a/udata/fragments.go b/udata/fragments.go
index aa033a0..a62b0cf 100644
--- a/udata/fragments.go
+++ b/udata/fragments.go
@@ -104,8 +104,14 @@ write_files:`,
 {{- if .RexrayStorageDriver }}
    content: |
     rexray:
-     storageDrivers:
-     - {{.RexrayStorageDriver}}
+     logLevel: warn
+    libstorage:
+     embedded: true
+     service: {{.RexrayStorageDriver}}
+     server:
+      services:
+       virtualbox:
+        driver: virtualbox
     virtualbox:
      endpoint: http://` + d.RexrayEndpointIP + `:18083
      volumePath: ` + os.Getenv("HOME") + `/VirtualBox Volumes
@@ -1025,14 +1031,14 @@ coreos:
      Restart=always
      RestartSec=10
      TimeoutStartSec=0
+     KillMode=process
      EnvironmentFile=/etc/rexray/rexray.env
-     ExecStartPre=-/bin/bash -c '\
-       REXRAY_URL=https://emccode.bintray.com/rexray/stable/0.3.3/rexray-Linux-i386-0.3.3.tar.gz; \
-       [ -f /opt/bin/rexray ] || { curl -sL $${REXRAY_URL} | tar -xz -C /opt/bin; }; \
-       [ -x /opt/bin/rexray ] || { chmod +x /opt/bin/rexray; }'
+     Environment=URL=https://dl.bintray.com/emccode/rexray/stable/0.6.0/rexray-Linux-x86_64-0.6.0.tar.gz
+     ExecStartPre=-/bin/bash -c " \
+       [ -f /opt/bin/rexray ] || { curl -sL ${URL} | tar -xz -C /opt/bin; }; \
+       [ -x /opt/bin/rexray ] || { chmod +x /opt/bin/rexray; }"
      ExecStart=/opt/bin/rexray start -f
      ExecReload=/bin/kill -HUP $MAINPID
-     KillMode=process

      [Install]
      WantedBy=kato.target`,

Add multi-az VPC support.

Experiment with multiple availability zones in EC2:

  • 3 quorum nodes, one per zone.
  • 3 master nodes, one per zone.
  • n workers, distributed evenly across the zones.

With 3 AZs the system should be able to tolerate one failure.
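
The even distribution of workers can be sketched as a round-robin assignment. The function name `spread` and the zone names are hypothetical; only the region eu-west-1 appears in the logs of another issue:

```go
package main

import "fmt"

// spread assigns n nodes to zones round-robin and returns the number of
// nodes per zone. Zone counts never differ by more than one.
func spread(n int, zones []string) map[string]int {
	counts := make(map[string]int, len(zones))
	for _, z := range zones {
		counts[z] = 0 // make empty zones visible in the result
	}
	if len(zones) == 0 {
		return counts
	}
	for i := 0; i < n; i++ {
		counts[zones[i%len(zones)]]++
	}
	return counts
}

func main() {
	zones := []string{"eu-west-1a", "eu-west-1b", "eu-west-1c"} // assumed names
	fmt.Println("quorum: ", spread(3, zones))
	fmt.Println("master: ", spread(3, zones))
	fmt.Println("workers:", spread(7, zones))
}
```

With three quorum nodes placed this way, losing one zone leaves two of three members, which is still a majority.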

Zookeeper quorum health check

Script to be used as ExecStartPre that checks whether a healthy quorum of zookeeper servers is up and running. It retries a few times before giving up.

zdd: waiting for X pids on HAProxy

Sometimes more than one HAProxy process is running inside the same marathon-lb container. The older process should die after being drained, but for some reason it doesn't, which prevents zdd from progressing. A healthy system looks like the one below:

core@worker-1 ~ $ loopssh worker "docker exec -i marathon-lb ps auxf | grep 'haproxy -p'"
--[ worker-1.cell-1.dub.xnood.com ]--
root      2367  0.0  0.1  40012  7756 ?        Ss   10:18   0:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 2278
--[ worker-2.cell-1.dub.xnood.com ]--
root      2531  0.0  0.1  40012  7708 ?        Ss   10:18   0:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 2441
--[ worker-3.cell-1.dub.xnood.com ]--
root      4775  0.0  0.1  40012  7772 ?        Ss   10:18   0:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 4676

This task is to update marathon-lb to a patched version once mesosphere/marathon-lb#267 is fixed and a new release is cut.
Also check mesosphere/marathon-lb#318.

Quadruplet custom parser

The quadruplet custom parser is defined but has no parsing implementation.
The aim of this task is to implement the parsing logic.
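
The issue does not spell out the quadruplet format, so the sketch below is an assumption throughout: four colon-separated fields (instance count, instance type, host name, comma-separated roles). The type and field names are hypothetical. A value with `Set(string) error` and `String() string` methods matches the shape that both the standard `flag` package and kingpin expect for custom values:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quadruplet holds the four parsed fields. The layout
// <count>:<instance-type>:<host-name>:<role,role,...> is assumed.
type quadruplet struct {
	count        int
	instanceType string
	hostName     string
	roles        []string
}

// Set parses and validates v; q is only modified when v is valid.
func (q *quadruplet) Set(v string) error {
	f := strings.Split(v, ":")
	if len(f) != 4 {
		return fmt.Errorf("expected 4 colon-separated fields, got %d in %q", len(f), v)
	}
	n, err := strconv.Atoi(f[0])
	if err != nil || n < 1 {
		return fmt.Errorf("invalid instance count %q", f[0])
	}
	if f[1] == "" || f[2] == "" {
		return fmt.Errorf("instance type and host name must not be empty in %q", v)
	}
	roles := strings.Split(f[3], ",")
	for _, r := range roles {
		if r == "" {
			return fmt.Errorf("empty role in %q", v)
		}
	}
	*q = quadruplet{count: n, instanceType: f[1], hostName: f[2], roles: roles}
	return nil
}

// String renders the value back into its flag form.
func (q *quadruplet) String() string {
	return fmt.Sprintf("%d:%s:%s:%s", q.count, q.instanceType, q.hostName, strings.Join(q.roles, ","))
}

func main() {
	var q quadruplet
	if err := q.Set("3:m3.medium:master:quorum,master"); err != nil { // example input, assumed
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", q)
}
```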

Install prometheus alertmanager in master nodes.

  • Setup and configure the Alertmanager.
  • Configure Prometheus to talk to the Alertmanager with the -alertmanager.url flag.
  • Create alerting rules in Prometheus.

The -alertmanager.url flag accepts a comma-separated list of URLs and can also be set multiple times.
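
A small sketch of building the comma-separated form of that flag from a list of master hosts. The host names and the helper `alertmanagerFlag` are hypothetical; 9093 is the Alertmanager default port and may differ in a given setup:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// alertmanagerFlag returns a single -alertmanager.url argument listing one
// URL per host, separated by commas.
func alertmanagerFlag(hosts []string, port int) string {
	urls := make([]string, 0, len(hosts))
	for _, h := range hosts {
		urls = append(urls, "http://"+net.JoinHostPort(h, strconv.Itoa(port)))
	}
	return "-alertmanager.url=" + strings.Join(urls, ",")
}

func main() {
	hosts := []string{"master-1", "master-2", "master-3"} // assumed names
	fmt.Println(alertmanagerFlag(hosts, 9093))
}
```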

Add awscli to worker nodes

I am not sure about masters and edge nodes.
The awscli will be containerised and wrapped into a shell script.

getcerts: temporary failure in name resolution

CoreOS stable (1122.2.0)
Failed Units: 1
  getcerts.service
core@worker-1 ~ $ systemctl status getcerts.service
● getcerts.service - Get certificates from private S3 bucket
   Loaded: loaded (/etc/systemd/system/getcerts.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-09-20 09:20:54 UTC; 13min ago
  Process: 3976 ExecStart=/opt/bin/getcerts (code=exited, status=125)
 Main PID: 3976 (code=exited, status=125)

Sep 20 09:20:36 worker-1.cell-1.dub.xnood.com getcerts[3976]: e1c8150b89d0: Retrying in 2 seconds
Sep 20 09:20:37 worker-1.cell-1.dub.xnood.com getcerts[3976]: e1c8150b89d0: Retrying in 1 seconds
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com getcerts[3976]: e1c8150b89d0: Downloading
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com getcerts[3976]: e1c8150b89d0: Downloading
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com getcerts[3976]: docker: dial tcp: lookup auth.docker.io: Temporary failure in name resoluti
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com getcerts[3976]: See 'docker run --help'.
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com systemd[1]: getcerts.service: Main process exited, code=exited, status=125/n/a
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com systemd[1]: Failed to start Get certificates from private S3 bucket.
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com systemd[1]: getcerts.service: Unit entered failed state.
Sep 20 09:20:54 worker-1.cell-1.dub.xnood.com systemd[1]: getcerts.service: Failed with result 'exit-code'.
