This Terraform module installs DC/OS on Linode Cloud infrastructure using the Universal Installer.
This is a Work-In-Progress and may not be very usable.
See the examples/ directory for instructions on running this and a demonstrative workload.
[WORK-IN-PROGRESS] DC/OS Provisioning Terraform module for Linode
Home Page: https://registry.terraform.io/modules/linode/dcos/linode/
There are references to the private IP of the bootstrap node in most of the errors. The bootstrap node itself seems clean of warnings :-/
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Clearing proxy environment variables
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] No zk.pid last mtime found at /var/lib/dcos/bootstrap/exhibitor_pid_stat
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Shortcut failed, waiting for exhibitor to bring up zookeeper and stabilize
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Expected cluster size: 3
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Waiting for ZooKeeper cluster to stabilize
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: [INFO] Serving hosts: `192.168.175.169`, leader: `192.168.175.169`
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: Traceback (most recent call last):
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: File "/opt/mesosphere/bin/bootstrap", line 11, in <module>
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: load_entry_point('dcos-internal-utils==0.0.1', 'console_scripts', 'bootstrap')()
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 106, in main
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: exhibitor.wait(opts.master_count)
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/exhibitor.py", line 113, in wait
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: raise Exception(msg_fmt.format(cluster_size, len(serving), len(leaders)))
Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: Exception: Expected 3 servers and 1 leader, got 1 servers and 1 leaders
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-net.service: Control process exited, code=exited status=1
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-net.service: Failed with result 'exit-code'.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Failed to start DC/OS Net: A distributed systems & network overlay orchestration engine.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped DC/OS Authentication (OAuth): authenticates DC/OS users using OpenID Connect and Auth0.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting DC/OS Authentication (OAuth): authenticates DC/OS users using OpenID Connect and Auth0...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting Generate resolv.conf: configures network name resolution...
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Clearing proxy environment variables
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] No zk.pid last mtime found at /var/lib/dcos/bootstrap/exhibitor_pid_stat
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Shortcut failed, waiting for exhibitor to bring up zookeeper and stabilize
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Expected cluster size: 3
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Waiting for ZooKeeper cluster to stabilize
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: [INFO] Serving hosts: `192.168.175.169`, leader: `192.168.175.169`
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: Traceback (most recent call last):
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: File "/opt/mesosphere/bin/bootstrap", line 11, in <module>
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: load_entry_point('dcos-internal-utils==0.0.1', 'console_scripts', 'bootstrap')()
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 106, in main
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: exhibitor.wait(opts.master_count)
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/exhibitor.py", line 113, in wait
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: raise Exception(msg_fmt.format(cluster_size, len(serving), len(leaders)))
Jan 09 21:49:45 linode-dcos-master-00 bootstrap[25586]: Exception: Expected 3 servers and 1 leader, got 1 servers and 1 leaders
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-diagnostics.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-diagnostics.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-metrics-master.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-metrics-master.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-telegraf.service: Service hold-off time over, scheduling restart.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-telegraf.service: Scheduled restart job, restart counter is at 233.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped Telegraf: collects and reports metrics.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting Telegraf: collects and reports metrics...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped DC/OS Metrics Master: exposes node metrics.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting DC/OS Metrics Master: exposes node metrics...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Stopped DC/OS Diagnostics Master: aggregates and exposes component health.
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: Starting DC/OS Diagnostics Master: aggregates and exposes component health...
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Control process exited, code=exited status=1
Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-oauth.service: Failed with result 'exit-code'.
Jan 09 21:53:33 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Main process exited, code=exited, status=1/FAILURE
Jan 09 21:53:33 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Failed with result 'exit-code'.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Scheduled restart job, restart counter is at 295.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Stopped Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Starting Mesos Agent Public: distributed systems kernel public agent...
Jan 09 21:53:35 linode-dcos-public-agent-00 mesos-agent[11227]: ping: unknown host ready.spartan
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Control process exited, code=exited status=2
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Failed with result 'exit-code'.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Failed to start Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Started OpenSSH per-connection server daemon (172.104.2.4:56881).
Jan 09 21:53:35 linode-dcos-public-agent-00 sshd[11229]: Accepted publickey for core from 172.104.2.4 port 56881 ssh2: RSA SHA256:EJVZYwd79ydCK/ezDALjkz4co1ofNsj9+wEPJrWNqgY
Jan 09 21:53:35 linode-dcos-public-agent-00 sshd[11229]: pam_unix(sshd:session): session opened for user core by (uid=0)
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd-logind[796]: New session 7 of user core.
Jan 09 21:53:35 linode-dcos-public-agent-00 systemd[1]: Started Session 7 of user core.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Scheduled restart job, restart counter is at 294.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Scheduled restart job, restart counter is at 185.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped DC/OS Net: A distributed systems & network overlay orchestration engine.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting DC/OS Net: A distributed systems & network overlay orchestration engine...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped Admin Router Agent: exposes a unified control plane proxy for components and services using NGINX.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting Admin Router Agent: exposes a unified control plane proxy for components and services using NGINX...
Jan 09 21:53:38 linode-dcos-public-agent-00 check-time[11243]: Checking whether time is synchronized using the kernel adjtimex API.
Jan 09 21:53:38 linode-dcos-public-agent-00 check-time[11243]: Time can be synchronized via most popular mechanisms (ntpd, chrony, systemd-timesyncd, etc.)
Jan 09 21:53:38 linode-dcos-public-agent-00 check-time[11243]: Time is in sync!
Jan 09 21:53:38 linode-dcos-public-agent-00 ping[11244]: ping: unknown host ready.spartan
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Control process exited, code=exited status=2
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: dcos-adminrouter-agent.service: Failed with result 'exit-code'.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Failed to start Admin Router Agent: exposes a unified control plane proxy for components and services using NGINX.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped Wait for Network to be Configured.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopping Wait for Network to be Configured...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopping Network Service...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Stopped Network Service.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting Network Service...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: spartan: Gained IPv6LL
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: minuteman: Gained IPv6LL
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: eth0: Gained IPv6LL
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: Enumeration completed
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Started Network Service.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Starting Wait for Network to be Configured...
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd-wait-online[11272]: ignoring: lo
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: spartan: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: lo: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: docker0: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: minuteman: Link is not managed by us
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd[1]: Started Wait for Network to be Configured.
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: lo: Configured
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: eth0: DHCPv4 address 45.79.184.248/24 via 45.79.184.1
Jan 09 21:53:38 linode-dcos-public-agent-00 systemd-networkd[11271]: eth0: Configured
Jan 09 21:53:38 linode-dcos-public-agent-00 dcos-net-setup.py[11275]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11282]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11290]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11293]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: [INFO] Unlocked fd 4
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: [INFO] Closing /var/lib/dcos with fd 4
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: Traceback (most recent call last):
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/bin/bootstrap", line 11, in <module>
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: load_entry_point('dcos-internal-utils==0.0.1', 'console_scripts', 'bootstrap')()
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 116, in main
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: bootstrappers[service](b, opts)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 23, in wrapper
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: fun(b, opts)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/cli.py", line 54, in dcos_telegraf_agent
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: b.cluster_id('/var/lib/dcos/cluster-id', readonly=True)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/bootstrap.py", line 66, in cluster_id
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: zkid = self._consensus('/cluster-id', zkid, ANYONE_READ)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/bootstrap.py", line 105, in _consensus
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: self.zk.sync(path)
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/dcos_internal_utils/bootstrap.py", line 41, in zk
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: self._zk.start()
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: File "/opt/mesosphere/lib/python3.6/site-packages/kazoo/client.py", line 567, in start
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: raise self.handler.timeout_exception("Connection time-out")
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[10603]: kazoo.handlers.threading.KazooTimeoutError: Connection time-out
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [INFO] Locked fd 4
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-1.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-2.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-3.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-4.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 bootstrap[11019]: [WARNING] Cannot resolve zk-5.zk: [Errno -2] Name or service not known
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11297]: RTNETLINK answers: File exists
Jan 09 21:53:39 linode-dcos-public-agent-00 systemd[1]: dcos-telegraf.service: Control process exited, code=exited status=1
Jan 09 21:53:39 linode-dcos-public-agent-00 systemd[1]: dcos-telegraf.service: Failed with result 'exit-code'.
Jan 09 21:53:39 linode-dcos-public-agent-00 systemd[1]: Failed to start Telegraf: collects and reports metrics.
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11303]: net.ipv6.conf.spartan.disable_ipv6 = 0
Jan 09 21:53:39 linode-dcos-public-agent-00 dcos-net-setup.py[11307]: RTNETLINK answers: File exists
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Service hold-off time over, scheduling restart.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Scheduled restart job, restart counter is at 296.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: Stopped Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: Starting Mesos Agent Public: distributed systems kernel public agent...
Jan 09 21:53:40 linode-dcos-public-agent-00 mesos-agent[11315]: ping: unknown host ready.spartan
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Control process exited, code=exited status=2
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: dcos-mesos-slave-public.service: Failed with result 'exit-code'.
Jan 09 21:53:40 linode-dcos-public-agent-00 systemd[1]: Failed to start Mesos Agent Public: distributed systems kernel public agent.
Jan 09 21:53:40 linode-dcos-public-agent-00 bootstrap[11311]: [INFO] Clearing proxy environment variables
Jan 09 21:53:40 linode-dcos-public-agent-00 bootstrap[11311]: [DEBUG] bootstrapping dcos-net
The log continues in the same pattern.
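Because the same failures repeat hundreds of times (the restart counters above are already in the 200s), it can help to condense the journal before reading it. The following is a minimal sketch, not part of this module: it assumes the journal text has been saved as plain lines in the format shown above, and the names `summarize`, `FAILED_RE`, `QUORUM_RE`, and `SAMPLE` are hypothetical. It counts `Failed with result` messages per host and unit, and collects the expected versus actual server counts from the exhibitor exception.

```python
import re
from collections import Counter

# Matches systemd failure lines in the journal format shown above, e.g.
# "Jan 09 21:49:45 host systemd[1]: dcos-net.service: Failed with result 'exit-code'."
FAILED_RE = re.compile(
    r"^\w{3} \d{2} [\d:]{8} (\S+) systemd\[1\]: (\S+\.service): Failed with result"
)

# Matches the exhibitor exception text raised by the bootstrap script.
QUORUM_RE = re.compile(
    r"Expected (\d+) servers and 1 leader, got (\d+) servers and (\d+) leaders"
)


def summarize(lines):
    """Return (failure counts keyed by (host, unit), set of (expected, got) server counts)."""
    failures = Counter()
    quorum = set()
    for line in lines:
        failed = FAILED_RE.match(line)
        if failed:
            failures[(failed.group(1), failed.group(2))] += 1
        mismatch = QUORUM_RE.search(line)
        if mismatch:
            quorum.add((int(mismatch.group(1)), int(mismatch.group(2))))
    return failures, quorum


# A few lines copied from the log above, used only to exercise the sketch.
SAMPLE = [
    "Jan 09 21:49:44 linode-dcos-master-00 bootstrap[25577]: Exception: Expected 3 servers and 1 leader, got 1 servers and 1 leaders",
    "Jan 09 21:49:45 linode-dcos-master-00 systemd[1]: dcos-net.service: Failed with result 'exit-code'.",
    "Jan 09 21:53:33 linode-dcos-public-agent-00 systemd[1]: dcos-net.service: Failed with result 'exit-code'.",
    "Jan 09 21:53:38 linode-dcos-public-agent-00 ping[11244]: ping: unknown host ready.spartan",
]

if __name__ == "__main__":
    failures, quorum = summarize(SAMPLE)
    for (host, unit), count in sorted(failures.items()):
        print(f"{host} {unit}: {count} failure(s)")
    for expected, got in sorted(quorum):
        print(f"ZooKeeper: expected {expected} servers, got {got}")
```

To run it against a full capture, pass the lines of the saved file to `summarize` instead of `SAMPLE`.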
The bootstrap node is not needed once the masters and agents are provisioned. The instructions should tell users to set the bootstrap count to 0 at that point.
Before increasing the node count, the bootstrap count should be increased again.
Is there important state that resides only on the bootstrap node? If so, removing the bootstrap node may not make sense.
The system only seems to be using 300M of RAM.
The universal-installer branch attempts to use the dcos/core/template Terraform module.
This branch currently fails to provision. Log:
https://gist.github.com/displague/6c46dcdfa2cf982cb0c2dba8ab1bf1fb