aws / amazon-ecs-init

Amazon Elastic Container Service RPM

Home Page: http://aws.amazon.com/ecs

License: Apache License 2.0

Go 84.40% Shell 11.12% Makefile 2.42% Roff 1.10% Dockerfile 0.48% Python 0.50%

amazon-ecs-init's Introduction

Amazon Elastic Container Service RPM

The Amazon Elastic Container Service RPM is software developed to support the Amazon ECS Container Agent. The Amazon ECS RPM is packaged for RPM-based systems that utilize Upstart as the init system.

Behavior

The upstart script installed by the Amazon ECS RPM runs at the completion of runlevel 3, 4, or 5 as the system starts. The script will clean up any previous copies of the Amazon ECS Container Agent, and then start a new copy. Logs from the RPM are available at /var/log/ecs/ecs-init.log, while logs from the Amazon ECS Container Agent are available at /var/log/ecs/ecs-agent.log. The Amazon ECS RPM makes the Amazon ECS Container Agent introspection endpoint available at http://127.0.0.1:51678/v1. Configuration for the Amazon ECS Container Agent is read from /etc/ecs/ecs.config. All of the configurations in this file are used as environment variables of the ECS Agent container. Additionally, some configurations can be used to configure other properties of the ECS Agent container, as described below.

Configuration Key: ECS_AGENT_LABELS
  Example Value(s): {"test.label.1":"value1","test.label.2":"value2"}
  Description: The labels to add to the ECS Agent container.

Additionally, the following environment variable(s) can be used to configure the behavior of the RPM:

Environment Variable Name: ECS_SKIP_LOCALHOST_TRAFFIC_FILTER
  Example Value(s): <true | false>
  Description: By default, the ecs-init service adds an iptables rule to drop non-local packets to localhost if they're not part of an existing forwarded connection or DNAT, and removes the rule upon stop. If ECS_SKIP_LOCALHOST_TRAFFIC_FILTER is set to true, this rule will not be added/removed.
  Default value: false

Environment Variable Name: ECS_ALLOW_OFFHOST_INTROSPECTION_ACCESS
  Example Value(s): <true | false>
  Description: By default, the ecs-init service adds an iptables rule to block access to the ECS Agent's introspection port from off-host (or from containers in awsvpc network mode), and removes the rule upon stop. If ECS_ALLOW_OFFHOST_INTROSPECTION_ACCESS is set to true, this rule will not be added/removed.
  Default value: false

Environment Variable Name: ECS_OFFHOST_INTROSPECTION_INTERFACE_NAME
  Example Value(s): eth0
  Description: Primary network interface name to be used for blocking off-host access to the agent introspection port. By default, this value is the interface that handles the default route (0.0.0.0/0) in the kernel routing table (/proc/net/route). If none can be found, we fall back to eth0.
  Default value: - (Resolved at runtime)
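
The default-route lookup described for ECS_OFFHOST_INTROSPECTION_INTERFACE_NAME can be approximated in shell. The sketch below is not the project's code: the function name default_route_interface is made up for illustration, and it assumes the usual /proc/net/route column layout (interface in column 1, destination in column 2, mask in column 8, the latter two in hex).

```shell
# default_route_interface [ROUTE_FILE]
# Hypothetical helper: prints the interface that carries the default route
# (destination 00000000, mask 00000000), or "eth0" if none is found.
default_route_interface() {
    dri_file=${1:-/proc/net/route}
    # Skip the header line, then take the first row whose destination and
    # mask are both all zeros.
    dri_iface=$(awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1; exit }' \
        "$dri_file" 2>/dev/null) || dri_iface=""
    # Fall back to eth0, as the description above says.
    printf '%s\n' "${dri_iface:-eth0}"
}
```

Calling default_route_interface with no argument reads /proc/net/route; passing a file path makes the lookup easy to try against a saved copy of the routing table.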

The above environment variable(s) can be used in the following way:

  • On Amazon Linux 1, the flag ECS_SKIP_LOCALHOST_TRAFFIC_FILTER can be turned on by adding env ECS_SKIP_LOCALHOST_TRAFFIC_FILTER=true to /etc/init/ecs.conf.
  • On Amazon Linux 2, the flag ECS_SKIP_LOCALHOST_TRAFFIC_FILTER can be turned on by adding ECS_SKIP_LOCALHOST_TRAFFIC_FILTER=true to /etc/ecs/ecs.config.
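
For the Amazon Linux 2 case, where the flag lives in /etc/ecs/ecs.config as a plain KEY=value line, a small helper can set the key without duplicating it on repeated runs. This is only a sketch; set_config_key is a hypothetical name, and it assumes the file holds one KEY=value pair per line. It does not cover the Amazon Linux 1 form, which uses an env line in /etc/init/ecs.conf.

```shell
# set_config_key FILE KEY VALUE
# Hypothetical helper: sets KEY=VALUE in an env-style file, replacing an
# existing KEY= line if present and appending the line otherwise.
set_config_key() {
    cfg_file=$1
    cfg_key=$2
    cfg_value=$3
    cfg_tmp=$(mktemp) || return 1
    # Copy every line that does not already assign this key.
    if [ -f "$cfg_file" ]; then
        grep -v "^${cfg_key}=" "$cfg_file" > "$cfg_tmp" || true
    fi
    printf '%s=%s\n' "$cfg_key" "$cfg_value" >> "$cfg_tmp"
    # Write back through cat so the original file's owner and mode are kept.
    cat "$cfg_tmp" > "$cfg_file"
    cfg_status=$?
    rm -f "$cfg_tmp"
    return "$cfg_status"
}

# Example (run as root on Amazon Linux 2):
#   set_config_key /etc/ecs/ecs.config ECS_SKIP_LOCALHOST_TRAFFIC_FILTER true
```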

Usage

The upstart script installed by the Amazon Elastic Container Service RPM can be started or stopped with the following commands respectively:

  • sudo start ecs
  • sudo stop ecs

Updates

Updates to the Amazon ECS Container Agent should be performed through the Amazon ECS Container Agent. In the case where an update failed and the Amazon ECS Container Agent is no longer functional, a rollback can be initiated as follows:

  1. sudo stop ecs
  2. sudo /usr/libexec/amazon-ecs-init reload-cache
  3. sudo start ecs
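
The three rollback steps can be wrapped in one function so that a failed cache reload does not lead to a restart attempt. This is a sketch, not a script shipped with the RPM: the name rollback_agent is hypothetical, and treating a failed stop as non-fatal is an assumption (the service may already be stopped when the agent is broken).

```shell
# rollback_agent
# Hypothetical wrapper around the documented rollback steps.
rollback_agent() {
    # Step 1: stop the service. Assumption: a failure here means the
    # service was already stopped, so it is reported but not fatal.
    sudo stop ecs || echo "ecs was not running; continuing" >&2
    # Step 2: reload the cached agent image; abort if this fails.
    sudo /usr/libexec/amazon-ecs-init reload-cache || return 1
    # Step 3: start the service again; its status is the function's status.
    sudo start ecs
}
```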

Security disclosures

If you think you’ve found a potential security issue, please do not post it in the Issues. Instead, please follow the instructions here or email AWS security directly.

Development

Building the RPM for test

On your local machine, you can use the docker target to generate an rpm:

make rpm-in-docker

This rpm can then be installed in an amazon linux ami:

# send rpm either through s3 or scp
rpm -i rpm-that-you-built.rpm
sudo systemctl enable ecs
sudo systemctl start ecs

Dev dependencies

Run make get-deps to get dependencies for running tests and generating mocks.

Generating mocks

Mocks can be generated using the make generate Makefile target. NOTE that this must be run on a linux machine.

License

The Amazon Elastic Container Service RPM is licensed under the Apache 2.0 License.

amazon-ecs-init's People

Contributors

aaithal, adnxn, amazon-ecs-bot, angelcar, chienhanlin, cyastella, fenxiong, fierlion, haikuoliu, jahkeup, jtoberon, juanrhenals, mssrivas, mythri-garaga, nightkhaos, nmeyerhans, petderek, realmonia, rickard-von-essen, rjschwei, samuelkarp, sharanyad, shubham2892, sparrc, suneyz, ubhattacharjya, vsiddharth, yhlee-aws, yinyic, yumex93

amazon-ecs-init's Issues

testing with suse, ubuntu tags

Today we aren't exercising test paths for non-Amazon Linux operating systems.
Running go test -short -v -cover -tags suse ./... and go test -short -v -cover -tags ubuntu ./... results in test failures for TestStartAgentNoEnvFile and TestStartAgentEnvFile.

We should fix these and also add the tagged test runs to our CI.
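
A CI step for the tagged runs could look roughly like the following. The go test invocation is the one quoted above; the function name run_tagged_tests and the choice to keep going after a failing tag are assumptions made for illustration.

```shell
# run_tagged_tests TAG...
# Hypothetical CI helper: runs the short test suite once per build tag and
# returns non-zero if any tagged run failed.
run_tagged_tests() {
    rtt_failed=0
    for rtt_tag in "$@"; do
        echo "== go test -tags ${rtt_tag} =="
        # Record the failure but continue, so every tag gets reported.
        go test -short -v -cover -tags "$rtt_tag" ./... || rtt_failed=1
    done
    return "$rtt_failed"
}

# Example:
#   run_tagged_tests suse ubuntu
```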

RHEL 7 no package ecs-init found

Not sure if I am missing a yum repository in my configs but I am getting "no package ecs-init available". I am using RHEL 7.2. Or should I use ecs agent from docker instead?

Docker awslog driver

Hi, I was wondering when ECS might support the Docker awslog driver, as I've had to log into ECS cluster servers a few times now to pull up Docker logs. After digging around in the ECS Optimised OS, I came across amazon-ecs-init as the script that starts Docker and thought it would be the best place to start adding the awslog driver.

I've looked at a few solutions and would ideally like to use the awslog driver rather than launching CloudWatch containers to handle container logging.

Stopping ECS agent takes ~3-4 minutes

Hello,
I'm running latest MultiContainer Elastic Beanstalk environment.

I'm trying to increase "ECS_CONTAINER_STOP_TIMEOUT".
I have a pre-deployment hook, which modifies the "ecs.config" file and restarts the ECS agent.

But I ran into the issue that the command "initctl stop ecs" takes ~3-4 minutes to execute, which makes my deployment process 2x longer.

Is it possible to reread the configuration without restarting the ECS agent, or is there a faster and safe way to do that?

Proxy settings are lost on yum update

Hello. After performing a yum update, the init script at /etc/init/ecs.conf is replaced and the proxy settings in it are lost. Even though ECS itself is properly configured in /etc/ecs/ecs.config, the agent ends up misconfigured.

Here's what my init looks like:

[root@ip-10-204-232-117 ~]# cat /etc/init/ecs.conf
# Copyright 2014-2016 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the
# "License"). You may not use this file except in compliance
# with the License. A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and
# limitations under the License.

description "Amazon EC2 Container Service init"
author "Amazon Web Services"
start on stopped rc RUNLEVEL=[345]

pre-start exec /usr/libexec/amazon-ecs-init pre-start
env http_proxy="http://internal-DIT-FIT-Proxy-1762651060.us-east-1.elb.amazonaws.com:3128"
env https_proxy="http://internal-DIT-FIT-Proxy-1762651060.us-east-1.elb.amazonaws.com:3128"
env no_proxy="169.254.169.254,/var/run/docker.sock,127.0.0.1,localhost,10.204.232.0/24"
exec /usr/libexec/amazon-ecs-init start
pre-stop exec /usr/libexec/amazon-ecs-init pre-stop
post-stop exec /usr/libexec/amazon-ecs-init post-stop

ecs-agent's json log file can get really big

Summary

While the log files in /var/log/ecs are rotated, the json file that lives at /var/lib/docker/containers/<sha>/ does not rotate.

Description

Starting the ecs-agent container differently can mitigate this issue via docker run flags.
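
Docker's json-file logging driver accepts max-size and max-file options, which is one way docker run flags could cap the file. The helper below only assembles those flags; its name json_log_opts and the example sizes are illustrative, and since ecs-init normally starts the agent container itself, this is a sketch for a manually started container rather than a supported configuration.

```shell
# json_log_opts MAX_SIZE MAX_FILES
# Hypothetical helper: prints docker run flags that make the json-file
# driver rotate its log at MAX_SIZE and keep at most MAX_FILES files.
json_log_opts() {
    printf '%s\n' "--log-driver json-file --log-opt max-size=$1 --log-opt max-file=$2"
}

# Example (the agent's other required options are omitted here):
#   docker run -d --name ecs-agent $(json_log_opts 10m 3) amazon/amazon-ecs-agent:latest
```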

sudo start and stop ecs hangs

I have ssh'd into a docker optimized linux distribution and tried to run both sudo start ecs and sudo stop ecs.

Neither command ever finishes. I was curious whether there are any other steps needed to be able to run these commands.

Configuration file /usr/lib/systemd/system/ecs.service is marked executable

Summary

There is a warning in /var/log/messages about ecs.service being marked executable.

Description

Looking through /var/log/messages I see the following warning:

systemd: Configuration file /usr/lib/systemd/system/ecs.service is marked executable. Please remove executable permission bits. Proceeding anyway.

Expected Behavior

ecs.service shouldn't be executable (according to the warning at least)

Observed Behavior

ecs.service is executable

Environment Details

Amazon Linux 2 ECS Optimized AMI
ecs-init 1.21.0

ECS Agent Multi-Region Resiliency

In our corporate AWS configuration, we have restricted west instances to talk only to west buckets and east instances to talk only to east buckets. Since the tarballs are hosted only on the east bucket, as seen at
https://github.com/aws/amazon-ecs-init/blob/dev/ecs-init/config/release.go#L20
we cannot use ecs-init to bootstrap our Amazon Linux instances.

In addition, wouldn't it be better for resiliency in case us-east-1 goes down to have the bucket in multiple regions?

Init should retry failed agent downloads

Currently if ecs-init is unable to download the ecs-agent md5 checksum file or the agent image itself, it exits. In order to make ecs-init robust against transient network or S3 issues, it should retry failed download attempts.
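
The requested behavior is a bounded retry around the download step. ecs-init itself is written in Go, so the shell function below only illustrates the idea; the name retry, the fixed delay, and the attempt limit are all assumptions.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, sleeping DELAY_SECONDS between attempts.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$retry_n" -ge "$retry_max" ]; then
            echo "giving up after ${retry_n} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${retry_n} failed, retrying in ${retry_delay}s: $*" >&2
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}

# Example (AGENT_URL is a placeholder, not a real location):
#   retry 5 3 curl -sf -o /tmp/ecs-agent.tar "$AGENT_URL"
```

A fixed delay keeps the sketch short; a transient S3 or network failure usually clears within a few attempts, while the attempt limit keeps a persistent failure from blocking startup indefinitely.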

Make supervisor more error tolerant

amazon-ecs-init start doesn't tolerate errors in the client or in docker since it simply exits whenever an error occurs instead of retrying:

return engineError("could not start Agent", err)

I have run into docker bugs such as moby/moby#14738 which have resulted in ecs-agent not starting at all in my instances.

I would propose to either:

ECS Agent 1.14.4 not found in Amazon Linux repositories

I can't seem to find the latest ECS agent 1.14.4 in the Amazon Linux repositories:

$ sudo yum check-update
Loaded plugins: priorities, update-motd, upgrade-helper
amzn-main                                                                                        | 2.1 kB  00:00:00     
amzn-updates                                                                                     | 2.3 kB  00:00:00     

$ sudo yum --showduplicates list ecs-init
Loaded plugins: priorities, update-motd, upgrade-helper
Installed Packages
ecs-init.x86_64                                       1.14.3-1.amzn1                                       @amzn-updates
Available Packages
ecs-init.x86_64                                       1.14.1-1.amzn1                                       amzn-main    
ecs-init.x86_64                                       1.14.2-2.amzn1                                       amzn-updates 
ecs-init.x86_64                                       1.14.3-1.amzn1                                       amzn-updates 

Is there something I'm missing here?

Bridge network inaccessible occasionally after ecs-optimized instance launch

I'm using autoscaling to launch ECS instances, and occasionally the containers that are launched on the new host will not have any access to the external network. I'm able to resolve the issue by restarting docker and starting the ECS agent again.

Looking in the docker logs I see the message 'WARNING: IPv4 forwarding is disabled.', which pointed me to this docker issue and PR which appears to be fixed. moby/moby#490
moby/moby#1673

However I'm still seeing this occurring on the amzn-ami-2016.09.f-amazon-ecs-optimized AMI.

I'm not sure if this is the right place to post this. Please let me know if there's a better place.

ecs-agent has stopped logging to /var/log/ecs/ecs-agent.log

Not sure what happened, but the ecs agent has stopped logging to /var/log/ecs/ecs-agent.log.*. The init log still appears as expected.

$ ls -al /var/log/ecs/
total 16
drwxr-xr-x 2 root root 4096 Sep 20 01:15 .
drwxr-xr-x 6 root root 4096 Sep 20 01:15 ..
-rw-r--r-- 1 root root 5165 Sep 20 01:34 ecs-init.log.2016-09-20-01

My config is as follows.

ECS_CLUSTER=kitchen-ecs-dev-cluster
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file", "awslogs"]
ECS_CHECKPOINT=true
ECS_DATADIR=/etc/ecs/data
ECS_LOGLEVEL=debug
ECS_LOGFILE=/var/log/ecs/ecs-agent.log
ECS_UPDATES_ENABLED=false
ECS_UPDATE_DOWNLOAD_DIR=/etc/ecs/cache
ECS_RESERVED_MEMORY=256
ECS_SELINUX_CAPABLE=true
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=10m
ECS_CONTAINER_STOP_TIMEOUT=10s
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
HTTP_PROXY=http://my-proxy:8099
HTTPS_PROXY=http://my-proxy:8099
NO_PROXY=169.254.169.254,/var/run/docker.sock

I can still view the logs fine with the docker logs command.

$ docker logs ecs-agent
2016-09-20T01:34:58Z [INFO] Starting Agent: Amazon ECS Agent - v1.12.2 (ecda8a6)
2016-09-20T01:34:58Z [INFO] Loading configuration
2016-09-20T01:34:58Z [DEBUG] Loaded config: Cluster: kitchen-ecs-dev-cluster, Region: us-east-1, DataDir: /etc/ecs/data, Checkpoint: true, AuthType: , UpdatesEnabled: false, DisableMetrics: false, ReservedMem: 256, TaskCleanupWaitDuration: 1m0s, DockerStopTimeout: 10s
2016-09-20T01:34:58Z [INFO] Checkpointing is enabled. Attempting to load state
2016-09-20T01:34:58Z [INFO] Loading state! module="statemanager"
2016-09-20T01:34:58Z [INFO] Event stream ContainerChange start listening...
2016-09-20T01:34:59Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20 1.21 1.22]
2016-09-20T01:34:59Z [INFO] Registering Instance with ECS
2016-09-20T01:34:59Z [INFO] Registered! module="api client"
2016-09-20T01:34:59Z [INFO] Registration completed successfully. I am running as 'arn:aws:ecs:us-east-1:XXXXXXXXXXXX:container-instance/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX' in cluster 'kitchen-ecs-dev-cluster'
2016-09-20T01:34:59Z [INFO] Saving state! module="statemanager"
2016-09-20T01:34:59Z [INFO] Beginning Polling for updates
2016-09-20T01:34:59Z [INFO] Initializing stats engine
2016-09-20T01:34:59Z [INFO] Event stream DeregisterContainerInstance start listening...
2016-09-20T01:34:59Z [DEBUG] Could not map container to task, ignoring, err: Could not map docker id to task: 85fbf1cd004bcc1af46562c184cd4a2b2b460ad50d7dc34bee32d8f2a9a id: 85fbf1cd004bcc1af46562c184cd4a2b2b460ad50d7dc34bee32d8f2a9a
2016-09-20T01:34:59Z [INFO] Creating poll dialer, proxy: my-proxy:8099
2016-09-20T01:34:59Z [DEBUG] Connecting to TCS endpoint https://ecs-t-2.us-east-1.amazonaws.com/
2016-09-20T01:34:59Z [INFO] Creating poll dialer, proxy: my-proxy:8099
2016-09-20T01:34:59Z [DEBUG] Starting websocket poll loop module="acs client"
2016-09-20T01:34:59Z [DEBUG] TCS client starting websocket poll loop
2016-09-20T01:34:59Z [DEBUG] Instance is idle. No task metrics to report
2016-09-20T01:34:59Z [DEBUG] TCS client sending payload: {"type":"PublishMetricsRequest","message":{"metadata":{"cluster":"kitchen-ecs-dev-cluster","containerInstance":"arn:aws:ecs:us-east-1:XXXXXXXXXXXX:container-instance/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX","fin":true,"idle":true,"messageId":"XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX"},"timestamp":1474335299}}

`start ecs` doesn't seem to start ecs-agent

Here is everything in user-data (largely copied from http://docs.datadoghq.com/integrations/ecs/#create-an-ecs-task):

#!/bin/bash
yum install -y aws-cli jq
aws s3 cp s3://xxx/ecs.config /etc/ecs/ecs.config

ECS_CLUSTER=${aws_ecs_cluster.main.name}
DD_TASK_DEFINITON="dd-agent"

echo "before"
docker ps -a

echo ECS_CLUSTER=$ECS_CLUSTER >> /etc/ecs/ecs.config
start ecs

echo "after"
docker ps -a

# @see: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-introspection.html
EC2_AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
EC2_REGION="`echo \"$EC2_AZ\" | sed -e 's:\([0-9][0-9]*\)[a-z]*\$:\\1:'`"
CONTAINER_INSTANCE_ARN=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )

echo "before curl"
curl -v http://localhost:51678/v1/metadata
echo "after curl"

# On EC2 boot, start an ECS task using the dd-agent task definition.
echo "
cluster=$ECS_CLUSTER
az=$EC2_AZ
region=$EC2_REGION
aws ecs start-task \
  --cluster $ECS_CLUSTER \
  --task-definition $DD_TASK_DEFINITON \
  --container-instances $CONTAINER_INSTANCE_ARN \
  --region $EC2_REGION
  " >> /etc/rc.local
EOF

This is the tail of cloud-init-output:

...
Dependency Installed:
  freetype.x86_64 0:2.3.11-15.14.amzn1
  jq-libs.x86_64 0:1.5-1.2.amzn1
  libjpeg-turbo.x86_64 0:1.2.90-5.14.amzn1
  mailcap.noarch 0:2.1.31-2.7.amzn1
  oniguruma.x86_64 0:5.9.1-3.1.2.amzn1
  python27-botocore.noarch 0:1.4.86-1.62.amzn1
  python27-colorama.noarch 0:0.2.5-1.7.amzn1
  python27-dateutil.noarch 0:2.1-1.3.amzn1
  python27-docutils.noarch 0:0.11-1.15.amzn1
  python27-futures.noarch 0:3.0.3-1.3.amzn1
  python27-imaging.x86_64 0:1.1.6-19.9.amzn1
  python27-jmespath.noarch 0:0.9.0-1.11.amzn1
  python27-ply.noarch 0:3.4-3.12.amzn1
  python27-pyasn1.noarch 0:0.1.7-2.9.amzn1
  python27-rsa.noarch 0:3.4.1-1.8.amzn1

Complete!
download: s3://xxx/ecs.config to etc/ecs/ecs.config
before
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ecs start/running, process 3036
after
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
before curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1...
* connect to 127.0.0.1 port 51678 failed: Connection refused
* Failed to connect to localhost port 51678: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 51678: Connection refused
after curl

Additional information:

  1. AMI: ami-6df8fe7a
  2. I'm using terraform which is why you see ECS_CLUSTER=${aws_ecs_cluster.main.name} but assume that ECS_CLUSTER=main
  3. When I try to run curl I can't seem to connect
  4. I've tried sudo start ecs but I get the same results
  5. I can curl and get a response after ssh into the EC2 instance
  6. ... and the ecs-agent is up too
[ec2-user@ip-172-20-2-81 ~]$ docker ps -a
CONTAINER ID        IMAGE                            COMMAND             CREATED             STATUS              PORTS               NAMES
7a63f5ba0a8b        amazon/amazon-ecs-agent:latest   "/agent"            3 minutes ago       Up 3 minutes                            ecs-agent

... and the contents of my rc.local file look like this:

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

cluster=main
az=us-east-1b
region=us-east-1
aws ecs start-task   --cluster main   --task-definition dd-agent   --container-instances    --region us-east-1

There isn't a value next to --container-instances because I couldn't fetch any metadata from the ecs-agent 😢

Any ideas?
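
One possible explanation, not confirmed in this report, is that the script queries the introspection endpoint before the agent container is listening: start ecs returns immediately, and the agent was reachable a few minutes later over ssh. Under that assumption, polling the endpoint before reading the metadata would look roughly like this. The function name wait_for_agent and the default attempt count and delay are illustrative.

```shell
# wait_for_agent [URL] [MAX_ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls the agent introspection endpoint until it
# answers, or gives up after MAX_ATTEMPTS tries.
wait_for_agent() {
    wfa_url=${1:-http://localhost:51678/v1/metadata}
    wfa_max=${2:-30}
    wfa_delay=${3:-2}
    wfa_try=1
    while :; do
        # -s hides progress output; -f makes curl fail on HTTP errors.
        if curl -sf "$wfa_url" > /dev/null 2>&1; then
            return 0
        fi
        if [ "$wfa_try" -ge "$wfa_max" ]; then
            return 1
        fi
        wfa_try=$((wfa_try + 1))
        sleep "$wfa_delay"
    done
}

# Example, placed after "start ecs" in the user-data script:
#   wait_for_agent || echo "ecs-agent did not come up in time" >&2
```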

ECS_UPDATE_DOWNLOAD_DIR appears to be hardcoded to /var/cache/ecs.

When I install ecs-init and start the ecs service, the agent.tar appears to always be downloaded to /var/cache/ecs. My /etc/ecs/ecs.config looks like the following.

ECS_CLUSTER=kitchen-ecs-dev-cluster
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file", "awslogs"]
ECS_CHECKPOINT=true
ECS_DATADIR=/etc/ecs/data
ECS_LOGLEVEL=debug
ECS_LOGFILE=/var/log/ecs/ecs-agent.log
ECS_UPDATES_ENABLED=false
ECS_UPDATE_DOWNLOAD_DIR=/etc/ecs/cache
ECS_RESERVED_MEMORY=256
ECS_SELINUX_CAPABLE=true
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=10m
ECS_CONTAINER_STOP_TIMEOUT=10s
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
HTTP_PROXY=http://my-proxy:8099
HTTPS_PROXY=http://my-proxy:8099
NO_PROXY=169.254.169.254,/var/run/docker.sock

So I expect updates to be downloaded to /etc/ecs/cache. Any thoughts?

docker version

Hi,

More of a question than an issue. The spec (packaging/amazon-linux-ami/ecs-init.spec) file contains:

Requires: docker >= 1.6.0, docker <= 1.6.2

Is there a reason newer versions of docker will not work? Given that docker has not had any backward-incompatible changes, it appears that

Requires: docker >= 1.6.0

should be sufficient.

Thoughts?

scripts/update-version.sh modifies git-managed files

update-version.sh is used to update various references to the ecs-init version in source code. One thing it does is store the short sha of the current HEAD in ecs-init/version/version.go. However, because this file is owned by git, any changes to the file are immediately invalid because the sha referenced by HEAD changes.

We could either stop tracking version.go and ensure that it gets generated at build time, or come up with some other mechanism to inject the current sha that doesn't involve modifying files (e.g. reading from some environment variable or injecting the output of some command).
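
One mechanism of the second kind is the Go linker's -X flag, which sets a package-level string variable at build time so no tracked file has to change. The sketch below only builds the flag; the helper name version_ldflags and the variable path in the example are hypothetical, and the target must be a string variable for -X to take effect.

```shell
# version_ldflags VARIABLE_PATH
# Hypothetical helper: prints a linker flag that sets VARIABLE_PATH (an
# import path plus variable name) to the short sha of the current HEAD.
version_ldflags() {
    vl_sha=$(git rev-parse --short HEAD) || return 1
    printf '%s %s=%s\n' -X "$1" "$vl_sha"
}

# Example (the variable path is a placeholder, not the project's real one):
#   go build -ldflags "$(version_ldflags example.com/project/version.GitShortHash)" ./...
```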

No ecs-init package available in Amazon Linux 2

Summary

No ecs-init package available in yum repository on Amazon Linux 2.

Expected Behavior

Should install ecs-init package.

Observed Behavior

bash-4.2# yum install ecs-init
Loaded plugins: ovl, priorities
amzn2-core                                               | 2.0 kB     00:00
No package ecs-init available.
Error: Nothing to do

Environment Details

OS: Amazon Linux 2 (amazonlinux:2017.12)
Kernel: 4.9.60-linuxkit-aufs

Docker Daemon Logs Not Being Rotated?

Hi there,

Noticing this on the ECS Optimised AMI 2017.03.f:

[root@ip-10-14-23-58 log]# du -ha | grep docker
851M	./docker

It just looks to contain Docker daemon logs:

[root@ip-10-14-23-58 log]# tail -n 5 docker
time="2017-10-09T22:11:54.979491178Z" level=error msg="Handler for GET /v1.17/containers/2aa368aae18e2c066f5ac375af2779351c9c802a951f205a08536265d61f8813/json returned error: No such container: 2aa368aae18e2c066f5ac375af2779351c9c802a951f205a08536265d61f8813"
time="2017-10-09T22:42:16.244102384Z" level=info msg="Container de644dfc59e4ec25d63d9339b2531d3fef562107384a8e61d38bd3b253ac1774 failed to exit within 30 seconds of signal 15 - using the force"
time="2017-10-09T23:12:16.520693171Z" level=error msg="Handler for GET /v1.17/containers/204ae4cab609c90b9e13c889f6d767082d0404f7d27c5c184b6f4233eafa1c7b/json returned error: No such container: 204ae4cab609c90b9e13c889f6d767082d0404f7d27c5c184b6f4233eafa1c7b"
time="2017-10-09T23:12:16.553052378Z" level=error msg="Handler for GET /v1.17/containers/de644dfc59e4ec25d63d9339b2531d3fef562107384a8e61d38bd3b253ac1774/json returned error: No such container: de644dfc59e4ec25d63d9339b2531d3fef562107384a8e61d38bd3b253ac1774"
time="2017-10-09T23:47:23.114060944Z" level=error msg="Handler for GET /v1.17/containers/b61c41d7ab9e5f3604b7084f5e382d7e5ec278d1b1fba822fac09807dc7ff95e/json returned error: No such container: b61c41d7ab9e5f3604b7084f5e382d7e5ec278d1b1fba822fac09807dc7ff95e"

Where can I find the log rotation settings for this file? I've experienced it filling up my root disk in the past, when high volumes of traffic were running through my containers and an error with a logging driver occurred (i.e. the FluentD agent dies or starts generating buffer overflow errors under load, so the Docker daemon starts writing log lines out to disk saying that it couldn't forward the logs):
moby/moby#34974

Any assistance that you can offer on this front would be much appreciated.

ECS agent update process causes downtime

Summary

The ECS docs clearly state

Updating the Amazon ECS container agent does not interrupt running tasks or services on the container instance.

But this does not appear to be the case. All containers on the instance are killed when amazon-ecs-init updates the ECS agent, causing a few minutes of downtime for my services.

Description

I am running the Amazon ECS-Optimized AMI 2017.09.k.

After a new version of the agent is released, at midnight GMT, amazon-ecs-init stops the ECS agent, downloads the new version, unzips it and starts it.

When this happens, every single task for all of my services across both of my 2 ECS clusters is suddenly killed. In other words, all hell breaks loose! Downtime, unhappy users, PagerDuty alerts, unhappy devs, etc.

This has happened a few times over the last few months. The most recent was last night (2018-04-07T00:00:05Z), when the instances picked up version 1.17.3 of the agent. (I understand that 1.17.3 has been out for a few days, but the instances were only started yesterday, so this was the first time for them to run the update process).

Expected Behavior

No downtime

Observed Behavior

All tasks on the instance are killed

Environment Details

Amazon ECS-Optimized AMI 2017.09.k

Supporting Log Snippets

/var/log/ecs/ecs-init.log:

2018-04-07T00:00:05Z [INFO] Agent exited with code 0
2018-04-07T00:00:05Z [DEBUG] Current region (eu-west-1) does not contain the agent's s3 bucket, using the default region (us-east-1) for downloader.
2018-04-07T00:00:05Z [DEBUG] Setting region for s3 client to: us-east-1
2018-04-07T00:00:05Z [DEBUG] Current region (eu-west-1) does not contain the agent's s3 bucket, using the default region (us-east-1) for downloader.
2018-04-07T00:00:05Z [INFO] post-stop
2018-04-07T00:00:05Z [INFO] Cleaning up the credentials endpoint setup for Amazon Elastic Container Service Agent
2018-04-07T00:00:09Z [DEBUG] Current region (eu-west-1) does not contain the agent's s3 bucket, using the default region (us-east-1) for downloader.
2018-04-07T00:00:09Z [DEBUG] Setting region for s3 client to: us-east-1
2018-04-07T00:00:09Z [DEBUG] Current region (eu-west-1) does not contain the agent's s3 bucket, using the default region (us-east-1) for downloader.
2018-04-07T00:00:09Z [INFO] pre-start
2018-04-07T00:00:09Z [INFO] Downloading Amazon Elastic Container Service Agent
2018-04-07T00:00:09Z [DEBUG] Created temporary file for md5sum: /var/cache/ecs/ecs-agent.tar.md5655012714
2018-04-07T00:00:09Z [DEBUG] Removing temp file /var/cache/ecs/ecs-agent.tar.md5655012714
2018-04-07T00:00:09Z [DEBUG] Created temporary file for agent tarball: /var/cache/ecs/ecs-agent.tar013650881
2018-04-07T00:00:14Z [DEBUG] Expected 315e069e3b196807da9d821b24134c6d
2018-04-07T00:00:14Z [DEBUG] Calculated 315e069e3b196807da9d821b24134c6d
2018-04-07T00:00:14Z [DEBUG] Attempting to rename /var/cache/ecs/ecs-agent.tar013650881 to /var/cache/ecs/ecs-agent.tar
2018-04-07T00:00:14Z [INFO] Loading Amazon Elastic Container Service Agent into Docker
2018-04-07T00:00:16Z [DEBUG] Current region (eu-west-1) does not contain the agent's s3 bucket, using the default region (us-east-1) for downloader.
2018-04-07T00:00:16Z [DEBUG] Setting region for s3 client to: us-east-1
2018-04-07T00:00:16Z [DEBUG] Current region (eu-west-1) does not contain the agent's s3 bucket, using the default region (us-east-1) for downloader.
2018-04-07T00:00:16Z [INFO] start
2018-04-07T00:00:16Z [INFO] Container name: /ecs-identity-admin-46-identity-admin-e8b48d9ac986bd9b7e00
2018-04-07T00:00:16Z [INFO] Container name: /ecs-agent
2018-04-07T00:00:16Z [INFO] Removing existing agent container ID: dea0fe026bb21c06a9cc843f8c34cd0c3071dfdd37829e71231d4f012162ee58
2018-04-07T00:00:16Z [INFO] Starting Amazon Elastic Container Service Agent

failing load() calls aren't retried: "could not load .. write: broken pipe"

Summary

too-fast startup(?) is causing ecs-init to never succeed:

2018-10-22T23:57:08Z [INFO] pre-start
2018-10-22T23:57:09Z [ERROR] could not load Amazon Elastic Container Service Agent into Docker: write unix @->/var/run/docker.sock: write: broken pipe

Description

Using Amazon Linux 2, ECS optimized, and seeing this error on startup. If I simply restart ecs-agent, everything is fine. I suspect that docker is starting in systemctl and sending a notify before the sock is open. For some reason the ecs-init just loops on that.

Apologies I don't have a beautiful test case for this. I'm hoping the observation is enough to detect the problem. I'm guessing the LoadImage call needs to be retried when this error occurs similar to how #142 and #78 were done.

Expected Behavior

as above.

Observed Behavior

as above.

Environment Details

ami-0a6be20ed8ce1f055

Kernel Version: 4.14.72-73.55.amzn2.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64

Supporting Log Snippets

as above.

ecs agent unable to update via cli

Summary
Unable to update ecs agent to latest version via cli

Description
When updating via the console, it successfully updates ecs-init to latest version, via yum:
$ yum install ecs-init
Package ecs-init-1.19.1-1.amzn1.x86_64 already installed and latest version.

$ curl -s 127.0.0.1:51678/v1/metadata | python -mjson.tool
"Version": "Amazon ECS Agent - v1.19.1 (13a0fabe)"

In the console:
Outdated ECS Agent
One or more container instances are not running the latest version of the Amazon ECS container agent

Upstart service doesn't respawn

I'm trying to use this package as an upstart service but it doesn't seem to be resilient to docker daemon restarts.

$ uname -a
Linux some-host-name 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty

ytong@some-host-name:~$ initctl status ecs
ecs start/running, process 48412
ytong@some-host-name:~$ sudo service docker restart
docker stop/waiting
docker start/running, process 50427
ytong@some-host-name:~$ initctl status ecs
ecs stop/waiting

Restarting the docker daemon puts the ecs-init daemon into a stop/waiting state. The actual ecs-agent container remains running, but I'd like the ecs-init service to also be running. I've tried tweaking the ecs.conf file but I don't seem to be able to get the right combination of respawn and expect. Do you guys have any ideas?

Thank you.

Missing latest in Amazon Linux repositories

I'm using the latest Amazon Linux AMI, but when I try to install ecs-init it only finds version 1.14.0 instead of 1.14.1.

When running yum --showduplicates list ecs-init this is what I get:

Loaded plugins: priorities, update-motd, upgrade-helper
Installed Packages
ecs-init.x86_64                                      1.14.0-2.amzn1                                      @amzn-updates
Available Packages
ecs-init.x86_64                                      1.12.2-1.amzn1                                      amzn-main
ecs-init.x86_64                                      1.13.0-1.amzn1                                      amzn-updates
ecs-init.x86_64                                      1.13.1-1.amzn1                                      amzn-updates
ecs-init.x86_64                                      1.13.1-2.amzn1                                      amzn-updates
ecs-init.x86_64                                      1.14.0-1.amzn1                                      amzn-updates
ecs-init.x86_64                                      1.14.0-2.amzn1                                      amzn-updates

I tried updating the repos but it didn't help. Am I missing something obvious, or was 1.14.1 not pushed to the repository?

Unconditionally restarting ECS agent results in throttling errors

If there's a client error with the registration APIs in ECS agent (Example: limits are exceeded during RegisterContainerInstance API), the ECS agent container is restarted immediately. If this happens over and over again, it'll result in consumption of API capacity by the customer's account, resulting in throttling errors for various other APIs.

ECS init and agent should be modified to deal with this more gracefully. A solution would be for the ECS service to return specific error codes for such scenarios so that the ECS agent can handle these error codes to indicate that a delayed restart needs to be performed by ECS init.

IMPORTANT: Broken ecs-init-1.17.3 on @amzn-updates.

Greetings,

Today my ECS instances stopped connecting to my ECS cluster after scheduled updates. The culprit seems to be an ecs-init-1.17.3 package pushed to the @amzn-updates repository.

It seems to have been caused by 77120e3, and I am not sure how it landed there to begin with! It is detached from master and no release has been tagged, so how could it land on @amzn-updates?

Disabling ecs

With the version of Upstart installed on Amazon Linux, you can't disable ecs using an override file. Would it be possible to move it to chkconfig? Or is there something I'm missing with disabling the service?

http proxy settings ignored in /etc/init/ecs.override

ecs.override:
echo "HTTP_PROXY=$PROXY_HOST:$PROXY_PORT" >> /etc/init/ecs.override
echo "HTTPS_PROXY=$PROXY_HOST:$PROXY_PORT" >> /etc/init/ecs.override
echo "NO_PROXY=169.254.169.254,169.254.170.2,/var/run/docker.sock" >> /etc/init/ecs.override

ecs-init fails to download the ecs-agent:
2017-08-08T19:49:57Z [ERROR] could not download Amazon EC2 Container Serivce Agent: Get https://s3.amazonaws.com/amazon-ecs-agent/ecs-agent-v1.14.3.tar.md5: dial tcp 52.216.16.219:443: i/o timeout
(proxy access log does not show this request)

When I set the http_proxy env variables, ecs-init does download the ecs-agent properly. But since this is problematic during OS boot, I am wondering if there is an alternate way of doing this. The documentation isn't very clear on how this is supposed to work. I tried
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html
with no luck.

Missing dependency on ec2-net-utils

Summary

According to @jahkeup, ec2-net-utils' udev rules are crucial for the operation of the awsvpc network mode. However, the package is not included as a dependency of ecs-init and is therefore not automatically installed.

Description

I followed the documented instructions for installing ecs-init on Amazon Linux 2.

Expected Behavior

ec2-net-utils should be installed automatically or it should be noted in the documentation as a dependency.

Observed Behavior

ec2-net-utils was not automatically installed.

Environment Details

AMI ID: amzn2-ami-hvm-2.0.20181008-x86_64-gp2

Please don't use dashes in version numbers

The current upstream version of amazon-ecs-init contains a dash in its version numbers which can cause problems when packaging the software downstream.

Trying to build 1.18.0-1 on openSUSE bails out with the following error:

[    3s] + exec rpmbuild -ba --define '_srcdefattr (-,root,root)' --nosignature /home/abuild/rpmbuild/SOURCES/amazon-ecs-init.spec
[    3s] error: line 21: Illegal char '-' (0x2d) in: Version:        1.18.0-1
[    3s] 
[    3s] suse-laptop failed "build amazon-ecs-init.spec" at Tue Jul 17 11:25:49 UTC 2018.
[    3s]

Most distribution package managers use the number after the dash to indicate the package revision in the distribution, so upstream vendors should avoid using version numbers that include dashes.

While downstream distributions can just remove the dash from the version string and set the version to 1.18.0, at least on openSUSE this means the upstream tarball will have to be repacked such that the source folder is called amazon-ecs-init-1.18.0, which will destroy the checksum of the upstream tarball.
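To illustrate the split that package managers expect, here is a small sketch that separates a combined string such as 1.18.0-1 into an upstream version and a package release. The helper splitVersionRelease is a hypothetical name for illustration and is not part of the amazon-ecs-init build tooling:

```go
package main

import (
	"fmt"
	"strings"
)

// splitVersionRelease separates "version-release" at the last dash.
// A string without a dash is returned as the version, with an empty release.
func splitVersionRelease(s string) (version, release string) {
	i := strings.LastIndex(s, "-")
	if i < 0 {
		return s, ""
	}
	return s[:i], s[i+1:]
}

func main() {
	version, release := splitVersionRelease("1.18.0-1")
	// These two values map to the separate Version and Release spec fields.
	fmt.Printf("Version: %s\nRelease: %s\n", version, release)
}
```

If upstream published only the dash-free part as its version, downstream packagers could keep the original tarball and its checksum and set the release number themselves.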

ecs-init fails when it can't connect to docker

I believe this is a timing issue. I am using the ElasticBeanstalk-blessed image aws-elasticbeanstalk-amzn-2016.09.1.x86_64-ecs-pv-201704030647 (ami-059ea063).

Whenever a new instance gets booted up the process fails. I find that the ecs daemon isn't running with the following ecs-init logs:

2017-05-19T09:29:55Z [INFO] pre-start
2017-05-19T09:29:56Z [INFO] start
2017-05-19T09:29:56Z [INFO] No existing agent container to remove.
2017-05-19T09:29:56Z [INFO] Starting Amazon EC2 Container Service Agent
2017-05-19T09:30:37Z [INFO] Agent exited with code 0
2017-05-19T09:30:37Z [INFO] Network error connecting to docker, backing off for '1.14777941s', error: dial unix /var/run/docker.sock: connect: no such file or directory
2017-05-19T09:30:40Z [INFO] post-stop
2017-05-19T09:30:40Z [INFO] Cleaning up the credentials endpoint setup for Amazon EC2 Container Service Agent

Here is what I find in the ecs-agent log:

2017-05-19T09:29:57Z [INFO] Starting Agent: Amazon ECS Agent - v1.14.1 (467c3d7)
2017-05-19T09:29:57Z [INFO] Loading configuration
2017-05-19T09:29:57Z [INFO] Checkpointing is enabled. Attempting to load state
2017-05-19T09:29:57Z [INFO] Loading state! module="statemanager"
2017-05-19T09:29:57Z [INFO] Event stream ContainerChange start listening...
2017-05-19T09:29:57Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20 1.21 1.22 1.23]
2017-05-19T09:29:57Z [INFO] Registering Instance with ECS
2017-05-19T09:29:58Z [INFO] Registered! module="api client"
2017-05-19T09:29:58Z [INFO] Registration completed successfully. I am running as 'arn:aws:ecs:eu-west-1:xxx:container-instance/79e9988c-8c80-4559-9d65-05807e5e63e0' in cluster 'awseb-xxx-b243eyymgn'
2017-05-19T09:29:58Z [INFO] Saving state! module="statemanager"
2017-05-19T09:29:58Z [INFO] Beginning Polling for updates
2017-05-19T09:29:58Z [INFO] Event stream DeregisterContainerInstance start listening...
2017-05-19T09:29:58Z [INFO] Initializing stats engine
2017-05-19T09:29:58Z [INFO] NO_PROXY set:169.254.169.254,169.254.170.2,/var/run/docker.sock
2017-05-19T09:29:59Z [INFO] Handling http request module="Handlers" method="GET" from="127.0.0.1:58406" uri="/v1/metadata"
2017-05-19T09:30:08Z [INFO] Saving state! module="statemanager"

If I then sudo start ecs everything seems to run correctly. I believe this was supposed to be fixed in #78, and it does log out that it backs off, but it doesn't look like it actually retries.
