linode / terraform-linode-dcos

Stars: 3, Watchers: 2, Forks: 4, Size: 18 KB

[WORK-IN-PROGRESS] DC/OS Provisioning Terraform module for Linode

Home Page: https://registry.terraform.io/modules/linode/dcos/linode/

Languages: Shell 7.47%, HCL 92.53%
Topics: mesosphere, dcos, linode, terraform-module

terraform-linode-dcos's Introduction

DC/OS Linode Installer

This Terraform module installs DC/OS on Linode Cloud infrastructure using the Universal Installer.

This module is a work in progress and may not be very usable yet.

See the examples/ directory for instructions on running the module, along with a demonstration workload.
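For orientation, consuming the module from the Terraform Registry might look roughly like the sketch below. The source address linode/dcos/linode is derived from the registry home page listed above (registry addresses take the form namespace/name/provider). The local name "dcos" is arbitrary. The module's input variables are not documented in this introduction, so none are shown. Treat this as a skeleton to be filled in from the examples/ directory, not a working configuration.

```hcl
# Minimal sketch: reference the module from the public Terraform Registry.
# Address taken from https://registry.terraform.io/modules/linode/dcos/linode/
module "dcos" {
  source = "linode/dcos/linode"

  # Input variables are intentionally omitted: this README does not list them.
  # Copy the required ones from the examples/ directory before running apply.
}
```

Running terraform init downloads the module. Run terraform plan and terraform apply only after the inputs from examples/ have been supplied.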

terraform-linode-dcos's People

Contributors

displague

terraform-linode-dcos's Issues

journalctl shows failures on nearly every node during installer.sh

Most of the errors reference the private IP of the bootstrap node. The bootstrap node itself seems free of warnings :-/

Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Clearing proxy environment variables
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] No zk.pid last mtime found at /var/lib/dcos/bootstrap/exhibitor_pid_stat
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Shortcut failed, waiting for exhibitor to bring up zookeeper and stabilize
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Expected cluster size: 3
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Waiting for ZooKeeper cluster to stabilize
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Serving hosts: `192.168.175.169`, leader: `192.168.175.169`
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: Traceback (most recent call last):
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]:   File "/opt/mesosphere/bin/bootstrap", line 11, in <module>
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]:     load_entry_point('dcos-internal-utils==0.0.1', 'console_scripts', 'bootstrap')()
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 106, in main
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]:     exhibitor.wait(opts.master_count)
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/exhibitor.py", line 113, in wait
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]:     raise Exception(msg_fmt.format(cluster_size, len(serving), len(leaders)))
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: Exception: Expected 3 servers and 1 leader, got 1 servers and 1 leaders
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-net.service: Control process exited, code=exited status=1
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-net.service: Failed with result 'exit-code'.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Failed to start DC/OS Net: A distributed systems & network overlay orchestration engine.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped DC/OS Authentication (OAuth): authenticates DC/OS users using OpenID Connect and Auth0.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting DC/OS Authentication (OAuth): authenticates DC/OS users using OpenID Connect and Auth0...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting Generate resolv.conf: configures network name resolution...
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Clearing proxy environment variables
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] No zk.pid last mtime found at /var/lib/dcos/bootstrap/exhibitor_pid_stat
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Shortcut failed, waiting for exhibitor to bring up zookeeper and stabilize
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Expected cluster size: 3
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Waiting for ZooKeeper cluster to stabilize
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Serving hosts: `192.168.175.169`, leader: `192.168.175.169`
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: Traceback (most recent call last):
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]:   File "/opt/mesosphere/bin/bootstrap", line 11, in <module>
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]:     load_entry_point('dcos-internal-utils==0.0.1', 'console_scripts', 'bootstrap')()
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 106, in main
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]:     exhibitor.wait(opts.master_count)
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/exhibitor.py", line 113, in wait
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]:     raise Exception(msg_fmt.format(cluster_size, len(serving), len(leaders)))
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: Exception: Expected 3 servers and 1 leader, got 1 servers and 1 leaders
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-diagnostics.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-diagnostics.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-metrics-master.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-metrics-master.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-telegraf.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-telegraf.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped Telegraf: collects and reports metrics.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting Telegraf: collects and reports metrics...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped DC/OS Metrics Master: exposes node metrics.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting DC/OS Metrics Master: exposes node metrics...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped DC/OS Diagnostics Master: aggregates and exposes component health.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting DC/OS Diagnostics Master: aggregates and exposes component health...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Control process exited, code=exited status=1
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Failed with result 'exit-code'.
Jan 09 21:53:33 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Main process exited, code=exited, status=1/FAILURE
Jan 09 21:53:33 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Failed with result 'exit-code'.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Scheduled restart job, restart counter is at 295.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Stopped Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Starting Mesos Agent Public: distributed systems kernel public agent...
Jan 09 21:53:35 linode-dcos-public-agent-00 mesos-agent[11227]: ping: unknown host ready.spartan
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Control process exited, code=exited status=2
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Failed with result 'exit-code'.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Failed to start Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Started OpenSSH per-connection server daemon (172.104.2.4:56881).
Jan 09 21:53:35 linode-dcos-public-agent-00 sshd[11229]: Accepted publickey for core from 172.104.2.4 port 56881 ssh2: RSA SHA256:EJVZYwd79ydCK/ezDALjkz4co1ofNsj9+wEPJrWNqgY
Jan 09 21:53:35 linode-dcos-public-agent-00 sshd[11229]: pam_unix(sshd:session): session opened for user core by (uid=0)
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd-logind[796]: New session 7 of user core.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Started Session 7 of user core.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Scheduled restart job, restart counter is at 294.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Scheduled restart job, restart counter is at 185.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped DC/OS Net: A distributed systems & network overlay orchestration engine.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting DC/OS Net: A distributed systems & network overlay orchestration engine...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped Admin Router Agent: exposes a unified control plane proxy for components and services using NGINX.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting Admin Router Agent: exposes a unified control plane proxy for components and services using NGINX...
Jan 09 21:53:38 linode-dcos-public-agent-00 check-time[11243]: Checking whether time is synchronized using the kernel adjtimex API.
Jan 09 21:53:38 linode-dcos-public-agent-00 check-time[11243]: Time can be synchronized via most popular mechanisms (ntpd, chrony, systemd-timesyncd, etc.)
Jan 09 21:53:38 linode-dcos-public-agent-00 check-time[11243]: Time is in sync!
Jan 09 21:53:38 linode-dcos-public-agent-00 ping[11244]: ping: unknown host ready.spartan
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Control process exited, code=exited status=2
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Failed with result 'exit-code'.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Failed to start Admin Router Agent: exposes a unified control plane proxy for components and services using NGINX.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped Wait for Network to be Configured.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopping Wait for Network to be Configured...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopping Network Service...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped Network Service.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting Network Service...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: spartan: Gained IPv6LL
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: minuteman: Gained IPv6LL
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: eth0: Gained IPv6LL
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: Enumeration completed
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Started Network Service.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting Wait for Network to be Configured...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd-wait-online[11272]: ignoring: lo
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: spartan: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: lo: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: docker0: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: minuteman: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Started Wait for Network to be Configured.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: lo: Configured
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: eth0: DHCPv4 address 45.79.184.248/24 via 45.79.184.1
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: eth0: Configured
Jan 09 21:53:38 linode-dcos-public-agent-00 dcos-net-setup.py[11275]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11282]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11290]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11293]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: [INFO] Unlocked fd 4
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: [INFO] Closing /var/lib/dcos with fd 4
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: Traceback (most recent call last):
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/bin/bootstrap", line 11, in <module>
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     load_entry_point('dcos-internal-utils==0.0.1', 'console_scripts', 'bootstrap')()
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 116, in main
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     bootstrappers[service](b, opts)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 23, in wrapper
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     fun(b, opts)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 54, in dcos_telegraf_agent
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     b.cluster_id('/var/lib/dcos/cluster-id', readonly=True)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/bootstrap.py", line 66, in cluster_id
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     zkid = self._consensus('/cluster-id', zkid, ANYONE_READ)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/bootstrap.py", line 105, in _consensus
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     self.zk.sync(path)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/bootstrap.py", line 41, in zk
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     self._zk.start()
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:   File "/opt/mesosphere/lib/python3.6/site-packages/kazoo/client.py", line 567, in start
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]:     raise self.handler.timeout_exception("Connection time-out")
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: kazoo.handlers.threading.KazooTimeoutError: Connection time-out
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [INFO] Locked fd 4
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-1.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-2.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-3.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-4.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-5.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11297]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 systemd[1]: dcos-telegraf.service: Control process exited, code=exited status=1
Jan 09 21:53:39 linode-dcos-public-agent-00 systemd[1]: dcos-telegraf.service: Failed with result 'exit-code'.
Jan 09 21:53:39 linode-dcos-public-agent-00 systemd[1]: Failed to start Telegraf: collects and reports metrics.
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11303]: net.ipv6.conf.spartan.disable_ipv6 = 0
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11307]: RTNETLINK answers: File exists
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Scheduled restart job, restart counter is at 296.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: Stopped Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: Starting Mesos Agent Public: distributed systems kernel public agent...
Jan 09 21:53:40 linode-dcos-public-agent-00 mesos-agent[11315]: ping: unknown host ready.spartan
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Control process exited, code=exited status=2
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Failed with result 'exit-code'.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: Failed to start Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:40 linode-dcos-public-agent-00 bootstrap[11311]: [INFO] Clearing proxy environment variables
Jan 09 21:53:40 linode-dcos-public-agent-00 bootstrap[11311]: [DEBUG] bootstrapping dcos-net

and so on...

use a count variable for the bootstrap node so it can be removed/added on demand

The bootstrap node is not needed once the masters and agents are provisioned. The instructions should tell users to set the bootstrap count to 0 at that point.

When increasing the node count, the bootstrap count should be increased again.

Is there important state that resides only on the bootstrap node? If so, removing the bootstrap node may not make sense.
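The count-variable proposal could be sketched roughly as follows. This is a hypothetical illustration, not the module's current code. The variable name bootstrap_count, the resource name "bootstrap", and the label, region, type, and image values are all assumptions. The sketch uses the Linode provider's linode_instance resource together with Terraform's count meta-argument, written in Terraform 0.12+ syntax. Setting the count to 0 destroys the bootstrap instance, and setting it back to 1 recreates it.

```hcl
# Hypothetical sketch: make the bootstrap node optional through a count variable.
variable "bootstrap_count" {
  description = "Bootstrap nodes to keep: 1 while provisioning or adding nodes, 0 afterwards"
  type        = number
  default     = 1
}

resource "linode_instance" "bootstrap" {
  # 0 removes the bootstrap node; 1 (re)creates it.
  count = var.bootstrap_count

  # Placeholder values; the module's real label, region, type and image may differ.
  label  = "linode-dcos-bootstrap-${count.index}"
  region = "us-east"
  type   = "g6-standard-2"
  image  = "linode/containerlinux"
}
```

After provisioning, a user would run terraform apply -var bootstrap_count=0. Before growing the cluster, they would run terraform apply -var bootstrap_count=1. One design consequence: anything that reads the bootstrap node's address, such as a master or agent install provisioner, must tolerate an empty list. This is because linode_instance.bootstrap[0] does not exist while the count is 0. This ties directly to the open question above about whether state lives only on that node.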
