cloudstax / firecamp
Serverless Platform for the stateful services
Home Page: https://www.cloudstax.io
License: Apache License 2.0
Hi @JuniusLuo. Could you please publish a Firecamp upgrade guide to the wiki? I'm running 0.9.6 and would like to upgrade to 1.0, but I'm worried about the correct procedure.
I don't know how to make a pull request for wiki pages, so reporting here: there's a small typo in Delete the Stateful Service. It uses the create-service command instead of delete-service. Please fix it.
Hello. Can you please update the Kafka service to use the latest Kafka version, 2.0.0?
...
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '9.6.6-1.pgdg80+1' for 'postgresql-9.6' was not found
E: Version '9.6.6-1.pgdg80+1' for 'postgresql-contrib-9.6' was not found
The command '/bin/sh -c apt-get update && apt-get install -y postgresql-common && sed -ri 's/#(create_main_cluster) .*$/\1 = false/' /etc/postgresql-common/createcluster.conf && apt-get install -y dnsutils postgresql-$PG_MAJOR=$PG_VERSION postgresql-contrib-$PG_MAJOR=$PG_VERSION && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
make: *** [docker] Error 100
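A likely cause (an assumption on my part, not confirmed): the PGDG apt repository tends to keep only the most recent point release of each major version, so a pinned version like 9.6.6-1.pgdg80+1 disappears from the index once a newer one is published. From inside the Debian-based build container, the versions the configured repos actually offer can be listed with:

```
# Refresh the package index, then show every candidate version of
# postgresql-9.6 that apt can currently install (run inside the container).
apt-get update
apt-cache madison postgresql-9.6
```

Updating PG_VERSION in the Dockerfile to whatever madison reports should let the build proceed.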
In the test cluster I'm running, there's only one node in the ASG, and it was in AZ us-east-1a. Before going home yesterday I scaled the ASG down to 0 nodes. Today I reverted the ASG settings back to 1 node, but this time AWS started the EC2 instance in AZ us-east-1b.
ECS started firecamp-manageserver w/o issues, but it wasn't able to start C* because a task placement constraint appeared to pin it to AZ us-east-1a:
I've checked the task definition for C*: it doesn't have any placement constraints set.
Do you know what's going on and how to get it fixed? I tried updating the C* service with the "Force deployment" checkbox set, but that didn't help.
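Not an authoritative answer, but one plausible explanation: the service data lives on EBS volumes, and an EBS volume is tied to a single availability zone, so a replica whose volume was created in us-east-1a cannot be scheduled onto a node in us-east-1b regardless of what the task definition says. A quick way to check which AZ the service's volumes are in (VOLUME_ID is a placeholder for the real volume id):

```
# Show the AZ and state of the EBS volumes backing the C* replicas.
aws ec2 describe-volumes --volume-ids VOLUME_ID \
    --query 'Volumes[].{ID:VolumeId,AZ:AvailabilityZone,State:State}' \
    --output table
```

If the volume is indeed in us-east-1a, the node has to come back in that AZ for the replica to start.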
Please integrate kafka-manager (https://github.com/yahoo/kafka-manager) into Kafka service creation.
A ready-to-use Docker image (https://hub.docker.com/r/sheepkiller/kafka-manager/) can be used. Probably the best way is to add command-line flags like -enable-kafka-manager and -kafka-manager-port=9000 to run it alongside the Kafka service.
+ target=firecamp-cassandra-init
+ image=mydocker/firecamp-cassandra-init:3.11
+ path=/home/user/go/src/github.com/cloudstax/firecamp/catalog/cassandra/3.11/init-task-dockerfile/
+ cp /home/user/go/src/github.com/cloudstax/firecamp/catalog/waitdns.sh /home/user/go/src/github.com/cloudstax/firecamp/catalog/cassandra/3.11/init-task-dockerfile/
+ docker build -q -t mydocker/firecamp-cassandra-init:3.11 /home/user/go/src/github.com/cloudstax/firecamp/catalog/cassandra/3.11/init-task-dockerfile/
Sending build context to Docker daemon 6.656kB
Step 1/11 : FROM debian:jessie-backports
---> f48e88a3ad1f
Step 2/11 : RUN { echo 'Package: openjdk-* ca-certificates-java'; echo 'Pin: release n=*-backports'; echo 'Pin-Priority: 990'; } > /etc/apt/preferences.d/java-backports
---> Running in 3d6f40fb4105
Removing intermediate container 3d6f40fb4105
---> 1b6e064f6aa5
Step 3/11 : ENV GPG_KEYS 514A2AD631A57A16DD0047EC749D6EEC0353B12C A26E528B271F19B9E5D8E19EA278B781FE4B2BDA
---> Running in 1853bc38b07a
Removing intermediate container 1853bc38b07a
---> 4c2f20d37ef8
Step 4/11 : RUN set -ex; export GNUPGHOME="$(mktemp -d)"; for key in $GPG_KEYS; do gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$key"; done; gpg --export $GPG_KEYS > /etc/apt/trusted.gpg.d/cassandra.gpg; rm -r "$GNUPGHOME"; apt-key list
---> Running in 569cb039131a
+ mktemp -d
+ export GNUPGHOME=/tmp/tmp.cufASzNBoI
+ gpg --keyserver ha.pool.sks-keyservers.net --recv-keys 514A2AD631A57A16DD0047EC749D6EEC0353B12C
gpg: keyring `/tmp/tmp.cufASzNBoI/secring.gpg' created
gpg: keyring `/tmp/tmp.cufASzNBoI/pubring.gpg' created
gpg: requesting key 0353B12C from hkp server ha.pool.sks-keyservers.net
gpg: /tmp/tmp.cufASzNBoI/trustdb.gpg: trustdb created
gpg: key 0353B12C: public key "T Jake Luciani <[email protected]>" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
+ gpg --keyserver ha.pool.sks-keyservers.net --recv-keys A26E528B271F19B9E5D8E19EA278B781FE4B2BDA
gpg: requesting key FE4B2BDA from hkp server ha.pool.sks-keyservers.net
gpg: key FE4B2BDA: public key "Michael Shuler <[email protected]>" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
+ gpg --export 514A2AD631A57A16DD0047EC749D6EEC0353B12C A26E528B271F19B9E5D8E19EA278B781FE4B2BDA
+ rm -r /tmp/tmp.cufASzNBoI
+ apt-key list
/etc/apt/trusted.gpg.d/cassandra.gpg
------------------------------------
pub 4096R/0353B12C 2014-09-05
uid T Jake Luciani <[email protected]>
sub 4096R/D35F8215 2014-09-05
pub 4096R/FE4B2BDA 2009-07-15
uid Michael Shuler <[email protected]>
uid Michael Shuler <[email protected]>
sub 4096R/25A883ED 2009-07-15
/etc/apt/trusted.gpg.d/debian-archive-jessie-automatic.gpg
----------------------------------------------------------
pub 4096R/2B90D010 2014-11-21 [expires: 2022-11-19]
uid Debian Archive Automatic Signing Key (8/jessie) <[email protected]>
/etc/apt/trusted.gpg.d/debian-archive-jessie-security-automatic.gpg
-------------------------------------------------------------------
pub 4096R/C857C906 2014-11-21 [expires: 2022-11-19]
uid Debian Security Archive Automatic Signing Key (8/jessie) <[email protected]>
/etc/apt/trusted.gpg.d/debian-archive-jessie-stable.gpg
-------------------------------------------------------
pub 4096R/518E17E1 2013-08-17 [expires: 2021-08-15]
uid Jessie Stable Release Key <[email protected]>
/etc/apt/trusted.gpg.d/debian-archive-stretch-automatic.gpg
-----------------------------------------------------------
pub 4096R/F66AEC98 2017-05-22 [expires: 2025-05-20]
uid Debian Archive Automatic Signing Key (9/stretch) <[email protected]>
sub 4096R/B7D453EC 2017-05-22 [expires: 2025-05-20]
/etc/apt/trusted.gpg.d/debian-archive-stretch-security-automatic.gpg
--------------------------------------------------------------------
pub 4096R/8AE22BA9 2017-05-22 [expires: 2025-05-20]
uid Debian Security Archive Automatic Signing Key (9/stretch) <[email protected]>
sub 4096R/331F7F50 2017-05-22 [expires: 2025-05-20]
/etc/apt/trusted.gpg.d/debian-archive-stretch-stable.gpg
--------------------------------------------------------
pub 4096R/1A7B6500 2017-05-20 [expires: 2025-05-18]
uid Debian Stable Release Key (9/stretch) <[email protected]>
/etc/apt/trusted.gpg.d/debian-archive-wheezy-automatic.gpg
----------------------------------------------------------
pub 4096R/46925553 2012-04-27 [expires: 2020-04-25]
uid Debian Archive Automatic Signing Key (7.0/wheezy) <[email protected]>
/etc/apt/trusted.gpg.d/debian-archive-wheezy-stable.gpg
-------------------------------------------------------
pub 4096R/65FFB764 2012-05-08 [expires: 2019-05-07]
uid Wheezy Stable Release Key <[email protected]>
Removing intermediate container 569cb039131a
---> 83e8962b3a75
Step 5/11 : RUN echo 'deb http://www.apache.org/dist/cassandra/debian 311x main' >> /etc/apt/sources.list.d/cassandra.list
---> Running in cac39e19f098
Removing intermediate container cac39e19f098
---> 9fa7a6eac994
Step 6/11 : ENV CASSANDRA_VERSION 3.11.0
---> Running in 353390195efa
Removing intermediate container 353390195efa
---> 7494779b0b6a
Step 7/11 : RUN apt-get update && apt-get install -y curl dnsutils cassandra="$CASSANDRA_VERSION" cassandra-tools="$CASSANDRA_VERSION" && rm -rf /var/lib/apt/lists/*
---> Running in b9d7183129d4
Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
Ign http://deb.debian.org jessie InRelease
Get:2 http://security.debian.org jessie/updates/main amd64 Packages [608 kB]
Get:3 http://deb.debian.org jessie-updates InRelease [145 kB]
Get:4 http://www.apache.org 311x InRelease [3169 B]
Get:5 http://deb.debian.org jessie-backports InRelease [166 kB]
Get:6 http://deb.debian.org jessie Release.gpg [2434 B]
Get:7 http://deb.debian.org jessie Release [148 kB]
Get:8 http://www.apache.org 311x/main amd64 Packages [686 B]
Get:9 http://deb.debian.org jessie-updates/main amd64 Packages [23.1 kB]
Get:10 http://deb.debian.org jessie-backports/main amd64 Packages [1172 kB]
Get:11 http://deb.debian.org jessie/main amd64 Packages [9064 kB]
Fetched 11.4 MB in 7s (1462 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '3.11.0' for 'cassandra' was not found
E: Version '3.11.0' for 'cassandra-tools' was not found
The command '/bin/sh -c apt-get update && apt-get install -y curl dnsutils cassandra="$CASSANDRA_VERSION" cassandra-tools="$CASSANDRA_VERSION" && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
make: *** [docker] Error 100
No issues with firecamp-cassandra.
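This looks like the same class of failure as the PostgreSQL build above: the Apache Cassandra debian repo appears to keep only the latest 3.11.x point release, so the pinned 3.11.0 vanishes once 3.11.1 or later ships (an assumption, not verified against the repo). One way to confirm from inside the build container:

```
# Show which cassandra versions the 311x repo currently publishes.
apt-get update
apt-cache policy cassandra
```

If so, bumping ENV CASSANDRA_VERSION in the init-task Dockerfile to the listed version should fix the build.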
$ cat cassandra.yaml
...
incremental_backups: false
Is it set to false intentionally?
Hi. Found a lot of such messages in the logs (CloudWatch):
[2017-11-29 22:43:58,934] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
Is there an option to stop this log flooding?
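One common way to quiet this particular logger, sketched under the assumption that the firecamp Kafka image reads Kafka's standard log4j.properties, is to raise the log level for the class named in the message:

```
# In Kafka's config/log4j.properties (exact path inside the firecamp
# image is an assumption): only log WARN and above for this class.
log4j.logger.kafka.coordinator.group.GroupMetadataManager=WARN
```

The periodic "Removed 0 expired offsets" line is emitted at INFO, so this keeps it out of CloudWatch without silencing real warnings.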
It would be great to have an update-service operation in the CLI tool to manage the number of replicas, volume size, heap size, etc.
Nodes in the ECSClusterStack-ServiceAutoScalingGroup don't seem to be able to fetch the init.sh file from S3. Here's the content of my cloud-init-output.log:
Loaded plugins: priorities, update-motd, upgrade-helper
Package aws-cfn-bootstrap-1.4-26.17.amzn1.noarch already installed and latest version
Nothing to do
+ version=0.9
+ aws s3 cp s3://cloudstax/firecamp/releases/0.9/scripts/init.sh /tmp/init.sh
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
Nov 21 03:36:52 cloud-init[2813]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [1]
Nov 21 03:36:52 cloud-init[2813]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Nov 21 03:36:52 cloud-init[2813]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 0.7.6 finished at Tue, 21 Nov 2017 03:36:52 +0000. Datasource DataSourceEc2. Up 53.08 seconds
When I tried running the S3 command manually, I got the same exception:
[root@ip-10-0-35-248 tmp]# aws s3 cp s3://cloudstax/firecamp/releases/0.9/scripts/init.sh /tmp/init.sh
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
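For what it's worth, S3 returns 403 Forbidden instead of 404 Not Found on HeadObject when the caller lacks s3:ListBucket on the bucket, so this error can mean either "no access" or "no such object". One way to narrow it down, assuming the release script is meant to be publicly readable:

```
# Retry the copy as an unauthenticated request; if this also fails with
# 403, the object is not public (or does not exist at that key), and the
# instance role's permissions are not the problem.
aws s3 cp s3://cloudstax/firecamp/releases/0.9/scripts/init.sh /tmp/init.sh --no-sign-request
```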
I used the CloudFormation template firecamp-existingvpc to roll out the environment, then created a ZooKeeper service according to the instructions (as a prerequisite for Kafka). Here is what is in the CloudWatch (firecamp-qa-zoo-qa) logs:
2017-12-01 08:20:31,031 [myid:3] - INFO [zoo-qa-2.firecamp-qa-firecamp.com/172.22.5.201:3888:QuorumPeer$QuorumServer@167] - Resolved hostname: zoo-qa-1.firecamp-qa-firecamp.com to address: zoo-qa-1.firecamp-qa-firecamp.com/172.22.2.62
2017-12-01 08:21:31,029 [myid:3] - INFO [zoo-qa-2.firecamp-qa-firecamp.com/172.22.5.201:3888:QuorumCnxManager$Listener@746] - Received connection request /172.22.2.62:59352
2017-12-01 08:21:31,030 [myid:3] - WARN [zoo-qa-2.firecamp-qa-firecamp.com/172.22.5.201:3888:QuorumCnxManager@588] - Cannot open channel to 2 at election address zoo-qa-1.firecamp-qa-firecamp.com/172.22.2.62:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:479)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:379)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:757)
2017-12-01 08:21:31,033 [myid:3] - INFO [zoo-qa-2.firecamp-qa-firecamp.com/172.22.5.201:3888:QuorumPeer$QuorumServer@167] - Resolved hostname: zoo-qa-1.firecamp-qa-firecamp.com to address: zoo-qa-1.firecamp-qa-firecamp.com/172.22.2.62
2017-12-01 08:22:31,031 [myid:3] - INFO [zoo-qa-2.firecamp-qa-firecamp.com/172.22.5.201:3888:QuorumCnxManager$Listener@746] - Received connection request /172.22.2.62:59356
2017-12-01 08:22:31,032 [myid:3] - WARN [zoo-qa-2.firecamp-qa-firecamp.com/172.22.5.201:3888:QuorumCnxManager@588] - Cannot open channel to 2 at election address zoo-qa-1.firecamp-qa-firecamp.com/172.22.2.62:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:479)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:379)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:757)
Is that OK?
I see that 172.22.2.62 has an established connection with the 3rd ZooKeeper instance (172.22.1.119):
[root@ip-172-22-2-62 ec2-user]# netstat -anp|grep 3888
tcp 0 0 ::ffff:172.22.2.62:59908 ::ffff:172.22.5.201:3888 TIME_WAIT -
tcp 0 0 ::ffff:172.22.2.62:52050 ::ffff:172.22.1.119:3888 ESTABLISHED 3637/java
There is a warning in the log:
WARN [main] 2018-02-13 10:16:44,908 StartupChecks.java:203 - OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
Do you think we need to switch to Oracle Java?
I forgot to change the service name and it still worked:
$ $PREFIX/firecamp-service-cli -op=stop-service -service-type=kafka -region=us-east-1 -cluster=firecamp-$ENV -service-name=zoo-$ENV
Service stopped
(Notice type=kafka and name=zoo)
Steps to reproduce:
running the firecamp.template
logging in to the bastion host
running firecamp-service-cli with -reserve-memory=768
Still, the service task definition has: "memoryReservation": 1024
And when testing on a t2.micro there isn't 1024 MB left.
A couple of issues:
# ./firecamp-service-cli -region=us-east-1 -cluster=firecamp-prod -op=create-service -service-type=telegraf -service-name=telegraf-cass -tel-monitor-service-name=cass-prod -max-memory=100
2018-03-19 16:48:11.758071617 +0000 UTC create service error InternalError: ClientException: Invalid setting for container 'firecamp-prod-telegraf-cass-container'. 'memory' must be greater than or equal to 'memoryReservation'.
status code: 400, request id: 4f6dd6d9-2b95-11e8-b3a3-eb68ba90531c
# ./firecamp-service-cli -region=us-east-1 -cluster=firecamp-prod -op=create-service -service-type=telegraf -service-name=telegraf-cass -tel-monitor-service-name=cass-prod
2018-03-19 16:49:31.649930624 +0000 UTC create service error ServiceExist: Service exists
# ./firecamp-service-cli -region=us-east-1 -cluster=firecamp-prod -op=create-service -help
Usage: firecamp-service-cli -op=create-service
-region string
The AWS region
-cluster string
The cluster name. Can only contain letters, numbers, or hyphens. default: mycluster
-service-type string
The catalog service type: mongodb|postgresql|cassandra|zookeeper|kafka|kafkamanager|redis|couchdb|consul|elasticsearch|kibana|logstash|telegraf
-service-name string
The service name. Can only contain letters, numbers, or hyphens. The max length is 58
-max-cpuunits int
The max number of cpu units for the container
-reserve-cpuunits int
The number of cpu units to reserve for the container. default: 256
-max-memory int
The max memory for the container, unit: MB
-reserve-memory int
The memory reserved for the container, unit: MB. default: 256
-volume-type string
The EBS volume type: gp2|io1|st1. default: gp2
-volume-size int
The size of each EBS volume, unit: GB
-volume-iops int
The EBS volume Iops when io1 type is chosen, otherwise ignored. default: 100
-volume-encrypted
whether to create encrypted volume. default: false
It doesn't display the required option -tel-monitor-service-name, while it shows volume options which are definitely out of scope for telegraf.
Hello. Not sure this is a good place to ask questions, but you might move this Q/A to the wiki section if you find it relevant.
What is the best way to connect to services like Cassandra/Kafka from other applications (Java code, for example)? I need to give my developers the IP addresses of the service endpoints, but since EC2 instances might be terminated and new ones started at any time, those IP addresses will change and the application will lose access to the service.
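Not an authoritative answer, but the member DNS names that appear elsewhere in these logs (e.g. zoo-qa-1.firecamp-qa-firecamp.com, kafka-uat-0.firecamp-uat-firecamp.com) suggest each replica gets a stable DNS name that survives instance replacement. If that's right, clients should be given those names rather than IPs. A sketch of a Kafka client config, with hypothetical cluster/service names:

```
# Hypothetical: point Kafka clients at the stable per-member DNS names
# instead of instance IPs, which change when EC2 nodes are replaced.
bootstrap.servers=kafka-qa-0.firecamp-qa-firecamp.com:9092,kafka-qa-1.firecamp-qa-firecamp.com:9092,kafka-qa-2.firecamp-qa-firecamp.com:9092
```

The same pattern would apply to Cassandra contact points in a Java driver config.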
Hello. This might be a dumb question, I apologize. What is the correct way to restart a service (in terms of the cluster)? For example, how to restart Kafka containers safely?
Please, fix
Hello,
Some time ago an alert came from our monitoring system showing that the Kafka service was not available. I looked at the EC2 console and found that one of the 3 firecamp brokers had an alarm for Instance Status Checks. Wondering why that led to a completely inaccessible Kafka service.
Here is how Kafka is checked from the monitoring host:
# /bin/docker run --rm harisekhon/cassandra-dev check_kafka.pl -B kafka-uat-0.firecamp-uat-firecamp.com:9092,kafka-uat-1.firecamp-uat-firecamp.com:9092,kafka-uat-2.firecamp-uat-firecamp.com:9092 -T testtopic -vvv
verbose mode on
check_kafka.pl version 0.3 => Hari Sekhon Utils version 1.18.9
broker host: kafka-uat-0.firecamp-uat-firecamp.com
broker port: 9092
broker host: kafka-uat-1.firecamp-uat-firecamp.com
broker port: 9092
broker host: kafka-uat-2.firecamp-uat-firecamp.com
broker port: 9092
host: kafka-uat-0.firecamp-uat-firecamp.com
port: 9092
topic: testtopic
required acks: 1
send-max-attempts: 1
receive-max-attempts: 1
retry-backoff: 200
sleep: 0.5
setting timeout to 10 secs
connecting to Kafka brokers kafka-uat-0.firecamp-uat-firecamp.com:9092,kafka-uat-1.firecamp-uat-firecamp.com:9092,kafka-uat-2.firecamp-uat-firecamp.com:9092
CRITICAL: Error: Cannot get metadata: topic='<undef>'
Trace begun at /usr/local/share/perl5/site_perl/Kafka/Connection.pm line 1592
Kafka::Connection::_error('Kafka::Connection=HASH(0x55caf194e5a0)', -1007, 'topic=\'<undef>\'') called at /usr/local/share/perl5/site_perl/Kafka/Connection.pm line 693
Kafka::Connection::get_metadata('Kafka::Connection=HASH(0x55caf194e5a0)') called at /github/nagios-plugins/check_kafka.pl line 257
main::__ANON__ at /github/nagios-plugins/lib/HariSekhonUtils.pm line 565
eval {...} at /github/nagios-plugins/lib/HariSekhonUtils.pm line 565
HariSekhonUtils::try('CODE(0x55caf19559d8)') called at /github/nagios-plugins/check_kafka.pl line 383
kafka-uat.log.gz
zoo-uat.log.gz
Zookeeper and Kafka logs attached.
After some time everything got back to a working state, but the Kafka service was down for ~15 minutes.
Please, take a look and let me know if you need anything else.
There was an issue with one of the EC2 instances, so I terminated it and the ASG started a new one. For some reason, the containers are not starting up on the new instance. They fail with (from /var/log/docker):
time="2018-03-02T10:07:55Z" level=info msg="2018/03/02 10:07:55 http: panic serving @: runtime error: invalid memory address or nil pointer dereference" plugin=3c95129f659d2d162550065c4200980c15d8d2ce25c002c9f01f96c84f3ea636
time="2018-03-02T10:07:55.565099089Z" level=warning msg="Unable to connect to plugin: /run/docker/plugins/3c95129f659d2d162550065c4200980c15d8d2ce25c002c9f01f96c84f3ea636/firecampvol.sock/VolumeDriver.Mount: Post http://%2Frun%2Fdocker%2Fplugins%2F3c95129f659d2d162550065c4200980c15d8d2ce25c002c9f01f96c84f3ea636%2Ffirecampvol.sock/VolumeDriver.Mount: EOF, retrying in 1s"
So, no volumes are mounted.
[root@ip-172-22-2-212 log]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
634ef2c0ea2f cloudstax/firecamp-amazon-ecs-agent:latest "/agent" 5 hours ago Up 5 hours ecs-agent
[root@ip-172-22-2-212 log]# docker plugin ls
ID NAME DESCRIPTION ENABLED
3c95129f659d cloudstax/firecamp-volume:latest firecamp volume plugin for docker true
656134559eb0 cloudstax/firecamp-log:latest firecamp log plugin for docker: consume lo... true
The only thing I did to this cluster recently was killing the firecamp-manageserver task to get it updated to the latest version.
Other two cluster nodes work w/o issues.
The only difference I see is the agent version:
grep Agent /var/log/firecamp/firecamp-dockervolume.INFO
"Amazon ECS Agent - v1.16.2 (*55b7b5f)" - at the new (non-working) instance
"Amazon ECS Agent - v1.16.0 (*e24ae08)" - at working instances
Please help me figure this out!
I used the template firecamp-existingvpc.template. At some point, CloudFormation stack creation fails at the ASG initialization. I was able to find what's going on:
[root@ip-172-31-38-80 tmp]# docker plugin install --grant-all-permissions cloudstax/firecamp-log:0.9.2 CLUSTER="firecamp-prod"
0.9.2: Pulling from cloudstax/firecamp-log
481765f73fa2: Download complete
Digest: sha256:fb14fb10d55f7e78d65b1e874ee036844a0f5552074c9c94b1715695286e723a
Status: Downloaded newer image for cloudstax/firecamp-log:0.9.2
Error response from daemon: setting "CLUSTER" not found in the plugin configuration
The latest release installs w/o issues:
[root@ip-172-31-38-80 tmp]# docker plugin install --grant-all-permissions cloudstax/firecamp-log:latest CLUSTER="firecamp-prod"
latest: Pulling from cloudstax/firecamp-log
1c827c905aed: Download complete
Digest: sha256:809700ced49e4b477f5ff42f9a6ec4bdb3e649982bf91913f3f865afad932a2d
Status: Downloaded newer image for cloudstax/firecamp-log:latest
Installed plugin cloudstax/firecamp-log:latest
Please, fix
You have set the allowed pattern for the VPC and subnet parameters to
subnet-[0-9a-z]{8}
which is not correct, since the IDs may be longer than 8 characters. Change it to subnet-[0-9a-z]{8,}
or anything else that would work.
As it stands, the template can't be run.
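A quick demonstration of the problem (the subnet IDs below are made up): the old 8-character pattern rejects the newer 17-character IDs, while the {8,} fix accepts both formats.

```shell
# Hypothetical subnet IDs: classic 8-char and newer 17-char formats
old='subnet-0a1b2c3d'
new='subnet-0abc1def2345abcd6'

# Mimic CloudFormation's implicitly anchored AllowedPattern check
check() {
  echo "$2" | grep -Eq "^$1$" && echo match || echo reject
}

check 'subnet-[0-9a-z]{8}'  "$new"   # prints "reject": old pattern fails on long ids
check 'subnet-[0-9a-z]{8,}' "$old"   # prints "match"
check 'subnet-[0-9a-z]{8,}' "$new"   # prints "match"
```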
The S3 CLI link stated in master's README.md uses CLI version 0.8.0; however, following the tutorial on setting up a Cassandra cluster, there's a flag that is only supported in the 0.9.0 release: -journal-volume-size
firecamp-service-cli -op=create-service -service-type=cassandra -region=us-east-1 -cluster=t1 -service-name=mycas -replicas=3 -volume-size=100 -journal-volume-size=10
I've been trying to build the CLI from source on my local machine, yet have been seeing issues with the aws/session package:
root@8d9d88564e5c:/usr/src/firecamp-service-cli# go build -v
_/usr/src/firecamp-service-cli
# _/usr/src/firecamp-service-cli
./main.go:1182:36: cannot use sess (type *"github.com/aws/aws-sdk-go/aws/session".Session) as type *"github.com/cloudstax/firecamp/vendor/github.com/aws/aws-sdk-go/aws/session".Session in argument to awsroute53.NewAWSRoute53
@JuniusLuo can you help by providing the latest cli executable, or let me know how I'd fix the issue above? I came across Firecamp last night and this is a major roadblock. Thanks.
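The error itself is the classic vendored-package mismatch: the build at _/usr/src/firecamp-service-cli sits outside GOPATH, so one copy of aws-sdk-go comes from your workspace while firecamp's own packages were compiled against the copy under its vendor/ directory, and Go treats the two Session types as distinct. With pre-modules Go, vendoring only applies when the package is built under its canonical import path, so something like this should work (a sketch; paths assume a standard GOPATH layout):

```
# Clone into the canonical import path so the vendored aws-sdk-go is used
# consistently across all packages.
mkdir -p "$GOPATH/src/github.com/cloudstax"
cd "$GOPATH/src/github.com/cloudstax"
git clone https://github.com/cloudstax/firecamp.git
cd firecamp
go build ./...
```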
After some manipulations it appears volumes of a Cassandra replica were accidentally deleted. At least, I see the following in the firecamp logs (/var/log/firecamp/firecamp-dockervolume.ERROR) on one of EC2 instances:
E1212 12:36:32.039874 13 volume.go:851] detach journal volume from last owner error NotFound requuid 172.22.5.224-bda1319c0a71481456f7689bb2b61571-1513082191 {vol-00fd036927e65754a /dev/xvdj vol-0d1b18609d7b32e1c /dev/xvdk} &{bda1319c0a71481456f7689bb2b61571 2 cass-qa-2 us-east-1c arn:aws:ecs:us-east-1:ID:task/e36d526c-1007-4cf4-a3ca-ff962674c632 arn:aws:ecs:us-east-1:ID:container-instance/f822e87a-47c1-4a68-a8e8-9ccbe23e9009 i-0bd3125d1e463d369 1513070833991537870 {vol-00fd036927e65754a /dev/xvdj vol-0d1b18609d7b32e1c /dev/xvdk} 127.0.0.1 [0xc4202eaa20 0xc4202eaa80 0xc4202eaab0 0xc4202eab10]}
E1212 12:36:32.039896 13 volume.go:729] Mount failed, get service member error NotFound, serviceUUID bda1319c0a71481456f7689bb2b61571, requuid 172.22.5.224-bda1319c0a71481456f7689bb2b61571-1513082191
E1212 12:36:43.873859 13 ec2.go:222] failed to DescribeVolumes vol-0d1b18609d7b32e1c error InvalidVolume.NotFound: The volume 'vol-0d1b18609d7b32e1c' does not exist.
status code: 400, request id: a3acc2b9-f47a-4ec2-8364-74b627cc89c0 requuid 172.22.5.224-bda1319c0a71481456f7689bb2b61571-1513082203
E1212 12:36:43.873876 13 ec2.go:177] GetVolumeInfo vol-0d1b18609d7b32e1c error InvalidVolume.NotFound: The volume 'vol-0d1b18609d7b32e1c' does not exist.
status code: 400, request id: a3acc2b9-f47a-4ec2-8364-74b627cc89c0 requuid 172.22.5.224-bda1319c0a71481456f7689bb2b61571-1513082203
E1212 12:36:43.873884 13 ec2.go:162] GetVolumeState vol-0d1b18609d7b32e1c error InvalidVolume.NotFound: The volume 'vol-0d1b18609d7b32e1c' does not exist.
status code: 400, request id: a3acc2b9-f47a-4ec2-8364-74b627cc89c0 requuid 172.22.5.224-bda1319c0a71481456f7689bb2b61571-1513082203
E1212 12:36:43.873893 13 volume.go:1227] GetVolumeState error NotFound volume vol-0d1b18609d7b32e1c ServerInstanceID i-0bd3125d1e463d369 device /dev/xvdk requuid 172.22.5.224-bda1319c0a71481456f7689bb2b61571-1513082203
This leads to the following task event:
Status reason | CannotStartContainerError: API error (500): error while mounting volume '/var/lib/docker/plugins/4f11459ccd04e2f94009d96f631266758d8c3bc4fb120e1f9376a9bd568c1792/rootfs': VolumeDriver.Mount: Mount failed, get service member error NotFound, serviceUUID bda
Is there a way to re-create the failed replica without re-launching the whole Cassandra service from scratch?
WARN [main] 2018-02-13 10:16:44,919 StartupChecks.java:271 - Maximum number of memory map areas per process (vm.max_map_count) 262144 is too low, recommended value: 1048575, you can change it with sysctl.
Is it OK to change the default to 1048575?
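For reference, this is the standard kernel tunable the Cassandra startup check refers to, and it has to be applied on the host (the EC2 instance), not inside the container. A sketch, run as root:

```
# Apply immediately on the running host
sysctl -w vm.max_map_count=1048575

# Persist across reboots
echo 'vm.max_map_count = 1048575' >> /etc/sysctl.conf
```

For cluster-wide use it would presumably belong in the instance user-data or init script rather than being set by hand.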
+ target=firecamp-zookeeper
+ image=mydocker/firecamp-zookeeper:3.4
+ path=/home/user/go/src/github.com/cloudstax/firecamp/catalog/zookeeper/3.4/dockerfile/
+ docker build -q -t mydocker/firecamp-zookeeper:3.4 /home/user/go/src/github.com/cloudstax/firecamp/catalog/zookeeper/3.4/dockerfile/
Sending build context to Docker daemon 12.29kB
Step 1/12 : FROM openjdk:8-jre-alpine
8-jre-alpine: Pulling from library/openjdk
ff3a5c916c92: Pull complete
5de5f69f42d7: Pull complete
fa7536dd895a: Pull complete
Digest: sha256:d3468b0fab294db03b4a67cabdaccf9c47a635ad14429ad43a0cce522e1ca8b3
Status: Downloaded newer image for openjdk:8-jre-alpine
---> b1bd879ca9b3
Step 2/12 : RUN apk add --no-cache bash su-exec
---> Running in ecf20e16270f
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/7) Installing pkgconf (1.3.10-r0)
(2/7) Installing ncurses-terminfo-base (6.0_p20171125-r0)
(3/7) Installing ncurses-terminfo (6.0_p20171125-r0)
(4/7) Installing ncurses-libs (6.0_p20171125-r0)
(5/7) Installing readline (7.0.003-r0)
(6/7) Installing bash (4.4.12-r2)
Executing bash-4.4.12-r2.post-install
(7/7) Installing su-exec (0.2-r0)
Executing busybox-1.27.2-r7.trigger
OK: 90 MiB in 57 packages
Removing intermediate container ecf20e16270f
---> 65d417967c5c
Step 3/12 : ENV ZOO_USER=zookeeper
---> Running in 00c0ddf63307
Removing intermediate container 00c0ddf63307
---> d9897087278f
Step 4/12 : RUN set -x && adduser -D "$ZOO_USER"
---> Running in 3fccfad7a6e0
+ adduser -D zookeeper
Removing intermediate container 3fccfad7a6e0
---> 56fb9f7d254e
Step 5/12 : ARG GPG_KEY=C823E3E5B12AF29C67F81976F5CECB3CB5E9BD2D
---> Running in 1f4fcb41f88f
Removing intermediate container 1f4fcb41f88f
---> 0bd92d063a3b
Step 6/12 : ARG DISTRO_NAME=zookeeper-3.4.10
---> Running in 33b3fe94ede7
Removing intermediate container 33b3fe94ede7
---> 325071bad640
Step 7/12 : RUN set -x && apk add --no-cache --virtual .build-deps gnupg && wget -q "http://www.apache.org/dist/zookeeper/$DISTRO_NAME/$DISTRO_NAME.tar.gz" && wget -q "http://www.apache.org/dist/zookeeper/$DISTRO_NAME/$DISTRO_NAME.tar.gz.asc" && export GNUPGHOME="$(mktemp -d)" && gpg --keyserver ha.pool.sks-keyservers.net --recv-key "$GPG_KEY" && gpg --batch --verify "$DISTRO_NAME.tar.gz.asc" "$DISTRO_NAME.tar.gz" && tar -xzf "$DISTRO_NAME.tar.gz" && rm -r "$GNUPGHOME" "$DISTRO_NAME.tar.gz" "$DISTRO_NAME.tar.gz.asc" && apk del .build-deps
---> Running in cc737c771109
+ apk add --no-cache --virtual .build-deps gnupg
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/16) Installing libgpg-error (1.27-r1)
(2/16) Installing libassuan (2.4.4-r0)
(3/16) Installing libcap (2.25-r1)
(4/16) Installing pinentry (1.0.0-r0)
Executing pinentry-1.0.0-r0.post-install
(5/16) Installing libgcrypt (1.8.1-r0)
(6/16) Installing gmp (6.1.2-r1)
(7/16) Installing nettle (3.3-r0)
(8/16) Installing libunistring (0.9.7-r0)
(9/16) Installing gnutls (3.6.1-r0)
(10/16) Installing libksba (1.3.5-r0)
(11/16) Installing db (5.3.28-r0)
(12/16) Installing libsasl (2.1.26-r11)
(13/16) Installing libldap (2.4.45-r3)
(14/16) Installing npth (1.5-r1)
(15/16) Installing gnupg (2.2.3-r0)
(16/16) Installing .build-deps (0)
Executing busybox-1.27.2-r7.trigger
OK: 102 MiB in 73 packages
+ wget -q http://www.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
+ wget -q http://www.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz.asc
+ mktemp -d
+ export GNUPGHOME=/tmp/tmp.JaNImj
+ gpg --keyserver ha.pool.sks-keyservers.net --recv-key C823E3E5B12AF29C67F81976F5CECB3CB5E9BD2D
gpg: keybox '/tmp/tmp.JaNImj/pubring.kbx' created
gpg: /tmp/tmp.JaNImj/trustdb.gpg: trustdb created
gpg: key F5CECB3CB5E9BD2D: public key "Rakesh Radhakrishnan (CODE SIGNING KEY) <[email protected]>" imported
gpg: Total number processed: 1
gpg: imported: 1
+ gpg --batch --verify zookeeper-3.4.10.tar.gz.asc zookeeper-3.4.10.tar.gz
gpg: Signature made Thu Mar 23 11:45:03 2017 UTC
gpg: using RSA key F5CECB3CB5E9BD2D
gpg: BAD signature from "Rakesh Radhakrishnan (CODE SIGNING KEY) <[email protected]>" [unknown]
The command '/bin/sh -c set -x && apk add --no-cache --virtual .build-deps gnupg && wget -q "http://www.apache.org/dist/zookeeper/$DISTRO_NAME/$DISTRO_NAME.tar.gz" && wget -q "http://www.apache.org/dist/zookeeper/$DISTRO_NAME/$DISTRO_NAME.tar.gz.asc" && export GNUPGHOME="$(mktemp -d)" && gpg --keyserver ha.pool.sks-keyservers.net --recv-key "$GPG_KEY" && gpg --batch --verify "$DISTRO_NAME.tar.gz.asc" "$DISTRO_NAME.tar.gz" && tar -xzf "$DISTRO_NAME.tar.gz" && rm -r "$GNUPGHOME" "$DISTRO_NAME.tar.gz" "$DISTRO_NAME.tar.gz.asc" && apk del .build-deps' returned a non-zero code: 1
make: *** [docker] Error 1
Hi There,
I'm trying to spin up the ZooKeeper service with three replicas, and the service is only deploying two, with the third throwing errors about not being able to mount the volume. I've confirmed the EBS volume was created and is available. I deleted the service and terminated the bad node, tried again once the ASG spun up a new one, and redeployed the ZooKeeper service... the same error happened.
Please let me know if there's any more info I can provide to help identify where the issue is happening and whether it's something I need to change on my end. I'm using the normal CloudFormation template in AWS with three nodes, one in each of my defined three availability zones.
Thank you
E0315 18:24:34.022496 6 volume.go:592] findIdleMember error InternalError requuid 10.0.43.217-931a5f81f9ce40ae5bc0ccde07a8747c-1521138273 service &{931a5f81f9ce40ae5bc0ccde07a8747c ACTIVE 1521136846996011291 3 firecamp-stage firecamp-stage-zookeeper {/dev/xvdg {gp2 10 100 false} { 0 0 false}} true firecamp-stage-firecamp.com /hostedzone/Z1826MR4G8CQU6 false 0xc4202b5800 {0 256 0 4096} }
E0315 18:24:34.022513 6 volume.go:546] Mount failed, get service member error InternalError, serviceUUID 931a5f81f9ce40ae5bc0ccde07a8747c, requuid 10.0.43.217-931a5f81f9ce40ae5bc0ccde07a8747c-1521138273
2018-03-15T18:29:55Z [INFO] TaskHandler: batching container event: arn:aws:ecs:us-east-1:050772179124:task/4e24694b-02f9-46fe-9714-e4cce7f7a900 firecamp-stage-firecamp-stage-zookeeper-container -> STOPPED, Reason CannotStartContainerError: API error (500): error while mounting volume '/var/lib/docker/plugins/0bb436c154f10d5a0318180d992dfaf0f66dec1cbd8e1d83a8fb1888e8e3ccf1/rootfs': VolumeDriver.Mount: Mount failed, get service member error InternalError, serviceUUID 931a5f81f9ce40ae5bc0ccde07a8747c, requuid 10.0.43.217-931a5f81f9ce40ae5bc0ccde07a8747c-1521138595
, Known Sent: NONE
2018-03-15T18:29:55Z [INFO] TaskHandler: Adding event: TaskChange: [arn:aws:ecs:us-east-1:050772179124:task/4e24694b-02f9-46fe-9714-e4cce7f7a900 -> STOPPED, Known Sent: NONE, PullStartedAt: 2018-03-15 18:29:55.284603066 +0000 UTC, PullStoppedAt: 2018-03-15 18:29:55.39755933 +0000 UTC, ExecutionStoppedAt: 2018-03-15 18:29:55.604614019 +0000 UTC, arn:aws:ecs:us-east-1:050772179124:task/4e24694b-02f9-46fe-9714-e4cce7f7a900 firecamp-stage-firecamp-stage-zookeeper-container -> STOPPED, Reason CannotStartContainerError: API error (500): error while mounting volume '/var/lib/docker/plugins/0bb436c154f10d5a0318180d992dfaf0f66dec1cbd8e1d83a8fb1888e8e3ccf1/rootfs': VolumeDriver.Mount: Mount failed, get service member error InternalError, serviceUUID 931a5f81f9ce40ae5bc0ccde07a8747c, requuid 10.0.43.217-931a5f81f9ce40ae5bc0ccde07a8747c-1521138595
# ./firecamp-service-cli -op=create-service -service-type=cassandra -region=us-east-1 -cluster=test-fc -service-name=cass-test-fc -replicas=1 -volume-size=10 -journal-volume-size=1
create cassandra service error EOF
The Cassandra service itself starts without issues, though. There are no such errors with ZooKeeper.
The CLI and manageserver are the latest versions.
Please fix this.
An EC2 instance failed (an AWS issue), so FireCamp's ASG terminated it and fired up another one. After the new instance came up, no tasks were able to start. The error in the AWS ECS console is:
Status reason | CannotStartContainerError: API error (500): error while mounting volume '/var/lib/docker/plugins/2ec1ac405b2314e7a06c414ab0323a74187b49f1b9e9d7dcefb670bff13f599d/rootfs': VolumeDriver.Mount: Mount failed, get service member error Timeout, serviceUUID d5cc
Fortunately, after some time (~37 minutes) the tasks started without any intervention on my side.
I'm not sure, but the cause might be related to how long the failed instance stayed in the shutting-down state.
Building the kafka-manager docker image fails with:
$ pwd
/home/user/go/src/github.com/cloudstax/firecamp/catalog/kafkamanager/1.3.3/dockerfile
$ docker build -t 111111111111.dkr.ecr.us-east-1.amazonaws.com/firecamp-kafka-manager:1.3.3-1.0 .
Sending build context to Docker daemon 11.57MB
Step 1/10 : FROM debian:jessie-backports
---> 3c66f9166174
...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: FAILED DOWNLOADS ::
[warn] :: ^ see resolution messages for details ^ ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.xerial.snappy#snappy-java;1.1.7.1!snappy-java.jar
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[info] Wrote /tmp/kafka-manager/target/scala-2.11/kafka-manager_2.11-1.3.3.18.pom
sbt.ResolveException: download failed: org.xerial.snappy#snappy-java;1.1.7.1!snappy-java.jar'
...
Each time it runs, different files fail to download.
$ PREFIX=0.9.6
$ $PREFIX/firecamp-service-cli -op=create-service -service-type=kafka -region=us-east-1 -cluster=firecamp-$ENV -replicas=3 -volume-size=$VOLSZ -service-name=kafka-$ENV -kafka-zk-service=zoo-$ENV -kafka-heap-size=512
The Kafka heap size is less than 6144. Please increase it for production system
2018-05-18 11:00:12.399656608 +0000 UTC The kafka service is created, jmx user password
...
I was trying to build firecamp from source and got:
$ make
./scripts/install.sh
+ protoc -I db/controldb/protocols/ db/controldb/protocols/controldb.proto --go_out=plugins=grpc:db/controldb/protocols
+ cd syssvc/firecamp-controldb
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/firecamp-dockervolume
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/firecamp-dockerlog
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/firecamp-manageserver
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/firecamp-service-cli
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/firecamp-swarminit
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd /home/user/go/bin
+ tar -zcf firecamp-service-cli.tgz firecamp-service-cli
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd /home/user/go/bin
+ tar -zcf firecamp-swarminit.tgz firecamp-swarminit
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd containersvc/k8s/firecamp-initcontainer/
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd containersvc/k8s/firecamp-stopcontainer/
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/tools/firecamp-volume-replace
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd /home/user/go/bin
+ tar -zcf firecamp-volume-replace.tgz firecamp-volume-replace
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/examples/firecamp-cleanup
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
+ cd syssvc/examples/firecamp-service-creation-example
+ go install
+ cd -
/home/user/go/src/github.com/cloudstax/firecamp
./scripts/builddocker.sh latest all
+ set -e
++ pwd
+ export TOPWD=/home/user/go/src/github.com/cloudstax/firecamp
+ TOPWD=/home/user/go/src/github.com/cloudstax/firecamp
+ version=latest
+ buildtarget=all
+ org=cloudstax/
+ system=firecamp
+ '[' all = all ']'
+ BuildPlugin
+ path=/home/user/go/src/github.com/cloudstax/firecamp/scripts/plugin-dockerfile
+ target=firecamp-pluginbuild
+ image=cloudstax/firecamp-pluginbuild
+ echo '### docker build: builder image'
### docker build: builder image
+ docker build -q -t cloudstax/firecamp-pluginbuild /home/user/go/src/github.com/cloudstax/firecamp/scripts/plugin-dockerfile
sha256:3012744b0ef7ec940803657b6e665b201f2c01395bf3d76af7248ca8cd25aca2
+ echo '### docker run: builder image with source code dir mounted'
### docker run: builder image with source code dir mounted
+ containername=firecamp-buildtest
+ docker rm firecamp-buildtest
Error: No such container: firecamp-buildtest
+ true
+ docker run --name firecamp-buildtest -v /home/user/go/src/github.com/cloudstax/firecamp:/go/src/github.com/cloudstax/firecamp cloudstax/firecamp-pluginbuild
total 4
drwxr-xr-x 1 root root 22 Jan 30 10:22 .
drwxr-xr-x 1 root root 23 Jan 30 10:21 ..
drwxr-xr-x 19 556 500 4096 Jan 30 10:19 firecamp
build firecamp-dockervolume
build firecamp-dockerlog
firecamp-dockerlog
firecamp-dockervolume
+ volumePluginPath=/home/user/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-dockervolume/dockerfile
+ volumePluginImage=cloudstax/firecamp-volume
+ echo '### docker build: rootfs image with firecamp-dockervolume'
### docker build: rootfs image with firecamp-dockervolume
+ docker cp firecamp-buildtest:/go/bin/firecamp-dockervolume /home/user/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-dockervolume/dockerfile
+ docker build -q -t cloudstax/firecamp-volume:rootfs /home/user/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-dockervolume/dockerfile
sha256:36cd3e91de833750a4c2c7174e32adee196625134ae16c1a73b390b99a036be0
+ rm -f /home/user/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-dockervolume/dockerfile/firecamp-dockervolume
+ echo '### create the plugin rootfs directory'
### create the plugin rootfs directory
+ volumePluginBuildPath=/home/user/go/src/github.com/cloudstax/firecamp/build/volumeplugin
+ mkdir -p /home/user/go/src/github.com/cloudstax/firecamp/build/volumeplugin/rootfs
+ docker rm -vf tmp
Error: No such container: tmp
+ true
+ docker create --name tmp cloudstax/firecamp-volume:rootfs
a9f683af85aa3d35c5fdd703c8b4fb463ce693a596c367a205f762943ee5752d
+ tar -x -C /home/user/go/src/github.com/cloudstax/firecamp/build/volumeplugin/rootfs
+ docker export tmp
+ cp /home/user/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-dockervolume/config.json /home/user/go/src/github.com/cloudstax/firecamp/build/volumeplugin
+ docker rm -vf tmp
tmp
+ echo '### create new plugin cloudstax/firecamp-volume:latest'
### create new plugin cloudstax/firecamp-volume:latest
+ docker plugin rm -f cloudstax/firecamp-volume:latest
Error: No such plugin: cloudstax/firecamp-volume:latest
+ true
+ docker plugin create cloudstax/firecamp-volume:latest /home/user/go/src/github.com/cloudstax/firecamp/build/volumeplugin
cloudstax/firecamp-volume:latest
+ docker plugin push cloudstax/firecamp-volume:latest
The push refers to repository [docker.io/cloudstax/firecamp-volume]
01ca22324601: Preparing
denied: requested access to the resource is denied
make: *** [docker] Error 1
Let's add these flags to the JVM settings for all services:
-XX:+PrintCommandLineFlags -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
That way we wouldn't need to set the Java heap size explicitly; the JVM would size it automatically from the container's memory limits.
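As a sketch, the three flags could be bundled into one variable that each service's launch script appends to its JVM options (the variable name below is just an assumption for illustration):

```shell
# The three flags from above, bundled into one variable that a service's
# launch script could append to its JVM options (variable name is an assumption).
JVM_CGROUP_FLAGS="-XX:+PrintCommandLineFlags -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
echo "$JVM_CGROUP_FLAGS"
```

Note that -XX:+UseCGroupMemoryLimitForHeap requires JDK 8u131 or later and is experimental, which is why -XX:+UnlockExperimentalVMOptions is needed alongside it.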
1. Build your own ecs agent docker image.
Checkout cloudstax amazon-ecs-agent branch, git clone https://github.com/cloudstax/amazon-ecs-agent.git, and change the "org" in Makefile and agent/engine/firecamp_task_engine.go to "mydockeraccount/". Then simply 'make' to build and upload the docker image.
There's no "org" defined at all:
$ pwd
/home/user/go/src/github.com/cloudstax/amazon-ecs-agent
$ grep org Makefile agent/engine/firecamp_task_engine.go
Makefile: go get golang.org/x/tools/cmd/cover
Makefile: go get golang.org/x/tools/cmd/goimports
$ grep cloudstax -r *
agent/engine/firecamp_task_engine.go:// Define here again to avoid the dependency on githut.com/cloudstax/firecamp
agent/engine/firecamp_task_engine.go: volumeDriver = "cloudstax/firecamp-volume"
agent/engine/firecamp_task_engine.go: logDriver = "cloudstax/firecamp-log"
Makefile: @docker build -f scripts/dockerfiles/Dockerfile.release -t "cloudstax/firecamp-amazon-ecs-agent:latest" .
Makefile: @echo "Built Docker image \"cloudstax/firecamp-amazon-ecs-agent:latest\""
$ wget https://s3.amazonaws.com/cloudstax/firecamp/releases/0.9.6/packages/firecamp-volume-replace.tgz
--2018-05-17 16:36:13-- https://s3.amazonaws.com/cloudstax/firecamp/releases/0.9.6/packages/firecamp-volume-replace.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.134.53
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.134.53|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-05-17 16:36:13 ERROR 403: Forbidden.
Please upload it!
I am likely missing something, but the stack build fails with these errors:
Embedded stack arn:aws:cloudformation:xxxx/CloudStax-FireCamp-VPCStack-xxxx/xxxxxx was not successfully created: The following resource(s) failed to create: [NATGateway3, NATGateway2, NATGateway1].
I am evaluating this for potential use, but I am not able to get past the CloudFormation stack setup. If this isn't a quick fix, is there a better way to install your software on Linux?
Thank you
I'd just like to discuss ideas on the best way to implement this. I thought about integrating Netflix's Priam, but it doesn't seem usable as a backup/restore-only solution.
Another cool tool is https://github.com/pearsontechnology/cassandra_snap. However, it needs SSH access to each instance and requires listing all the nodes to back up, rather than discovering them automatically.
What are your thoughts?
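For comparison, the core of a per-node backup is small. Here is a minimal sketch (the keyspace, data path, and S3 bucket are all placeholders, and the commands are guarded so the script is a no-op on a host without the tools):

```shell
# Minimal per-node Cassandra backup sketch. KEYSPACE, the data path, and the
# S3 bucket are placeholders; each step is skipped if its tool is absent.
KEYSPACE="mykeyspace"
TAG="backup-$(date +%F)"

# Take a local snapshot (hard links, so it is cheap) if nodetool is available.
if command -v nodetool >/dev/null 2>&1; then
  nodetool snapshot -t "$TAG" "$KEYSPACE"
fi

# Ship only the snapshot directories to S3 if the AWS CLI is available.
if command -v aws >/dev/null 2>&1; then
  aws s3 sync /var/lib/cassandra/data "s3://my-backup-bucket/$(hostname)/" \
    --exclude '*' --include "*/snapshots/$TAG/*"
fi

echo "snapshot tag: $TAG"
```

The node-discovery problem mentioned above could be solved by deriving the member list from the service's own membership data (or nodetool status) rather than a static inventory.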
firecamp-service-cli version is 0.9.2
# ./firecamp-service-cli -cluster=firecamp-qa -region=us-east-1 -service-name=kafka-qa -op=get-service
{ServiceUUID:ae9a07f638c145866458232d81edbead ServiceStatus:ACTIVE LastModified:1513076088004634004 Replicas:3 ClusterName:firecamp-qa ServiceName:kafka-qa Volumes:{PrimaryDeviceName:/dev/xvdm PrimaryVolume:{VolumeType:gp2 VolumeSizeGB:100 Iops:100} JournalDeviceName: JournalVolume:{VolumeType: VolumeSizeGB:0 Iops:0}} RegisterDNS:true DomainName:firecamp-qa-firecamp.com HostedZoneID:/hostedzone/Z1OA04B9KUSH29 RequireStaticIP:false UserAttr:<nil> Resource:{MaxCPUUnits:0 ReserveCPUUnits:0 MaxMemMB:0 ReserveMemMB:0}}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6a8efe]
goroutine 1 [running]:
main.getService(0x7f36078f80e0, 0xc420014410, 0xc4200fbda0)
/home/junius/work/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-service-cli/main.go:1472 +0x28e
main.main()
/home/junius/work/go/src/github.com/cloudstax/firecamp/syssvc/firecamp-service-cli/main.go:526 +0xc04
Hello,
Is it possible to deploy a FireCamp cluster into an existing VPC?
Thanks
Cal
I created a new firecamp cluster from scratch (using firecamp.template), then tried to start the kafkamanager service:
./firecamp-service-cli -cluster=firecamp-qa -region=us-east-1 -op=create-service -service-type=kafkamanager -service-name=kafkamanager-qa -km-heap-size=512 -km-zk-service=zoo-qa -km-user=user -km-passwd=pass
The Kafka Manager heap size is less than 4096. Please increase it for production system
2018-03-05 15:18:48.889010183 +0000 UTC The kafka manager service is created, wait for all containers running
2018-03-05 15:18:48.929875163 +0000 UTC wait the service containers running, RunningCount 0
...
2018-03-05 15:23:50.807640377 +0000 UTC not all service containers are running after 5m0s
firecamp-manageserver log:
I0305 15:18:48.631132 1 route53.go:146] find hosted zone /hostedzone/ZL36HKC3OHITW for domain firecamp-qa-firecamp.com vpc vpc-d44e1eb1 us-east-1 requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.631143 1 route53.go:58] get hostedZoneID /hostedzone/ZL36HKC3OHITW for domain firecamp-qa-firecamp.com vpc vpc-d44e1eb1 us-east-1 requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.631154 1 service.go:135] get hostedZoneID /hostedzone/ZL36HKC3OHITW for domain firecamp-qa-firecamp.com vpc vpc-d44e1eb1 requuid req-649f1bce27b440564fceda5f5d983ae6 &{us-east-1 firecamp-qa kafkamanager-qa stateless}
I0305 15:18:48.638979 1 dynamodb_service.go:44] created service &{firecamp-qa kafkamanager-qa 4615cc0394144d224729850cfe4db686} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.639002 1 service.go:695] created service &{firecamp-qa kafkamanager-qa 4615cc0394144d224729850cfe4db686} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.644111 1 dynamodb_serviceattr.go:107] created service attr &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.644147 1 service.go:798] created service attr in db &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.644197 1 service.go:166] created service attr, requuid req-649f1bce27b440564fceda5f5d983ae6 &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless}
I0305 15:18:48.644241 1 dynamodb_serviceattr.go:144] update service status from CREATING to INITIALIZING requuid req-649f1bce27b440564fceda5f5d983ae6 &{4615cc0394144d224729850cfe4db686 INITIALIZING 1520263128644215963 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless}
I0305 15:18:48.649431 1 dynamodb_serviceattr.go:216] updated service attr &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} to &{4615cc0394144d224729850cfe4db686 INITIALIZING 1520263128644215963 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.649471 1 service.go:185] successfully created service, requuid req-649f1bce27b440564fceda5f5d983ae6 &{4615cc0394144d224729850cfe4db686 INITIALIZING 1520263128644215963 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless}
I0305 15:18:48.701299 1 cloudwatch.go:152] created log group firecamp-qa-kafkamanager-qa-4615cc0394144d224729850cfe4db686 requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.734275 1 ecs.go:294] service is inactive kafkamanager-qa cluster firecamp-qa
I0305 15:18:48.760506 1 ecs.go:341] ListTaskDefinitionFamilies prefix firecamp-qa-kafkamanager-qa token <nil> resp {
Families: ["firecamp-qa-kafkamanager-qa"]
}
ECS console displays this error:
Status reason | CannotStartContainerError: API error (500): failed to initialize logging driver: ResourceNotFoundException: The specified log group does not exist. status code: 400, request id: ed451ab0-2088-11e8-a5e4-6f3c66053865
Looks like the service is still trying to create a log group in the outdated format:
"requestParameters": { "logGroupName": "firecamp-firecamp-qa-kafkamanager-qa-b5291cb97e624299744ef6d9b9ce5ad9", "logStreamName": "kafkamanager-qa/firecamp-qa-kafkamanager-qa-container/544c5f99-e1e9-46b1-b50d-fe05a91aaaf7" },
Hi. Please implement volume encryption at rest for the AWS environment. I'm not sure whether journal volumes should be encrypted as well; probably not, unless they contain sensitive data.
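On the EBS side the change looks small. A sketch of what the volume-creation call would gain (printed rather than executed, since it needs AWS credentials and would create a real volume):

```shell
# What the EBS creation call would look like with encryption at rest enabled.
# Printed instead of executed: it requires AWS credentials and creates a real
# volume. Without an explicit --kms-key-id, the account's default EBS KMS key
# is used for --encrypted.
CREATE_VOLUME="aws ec2 create-volume --availability-zone us-east-1a \
  --size 10 --volume-type gp2 --encrypted"
echo "$CREATE_VOLUME"
```

Adding an optional --kms-key-id would let users bring their own KMS key instead of the account default.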
I'm sorry to bother you, but this 0.9.2 release is a headache for me. Can you please check whether you can start ZooKeeper in ECS with the following command:
# ./firecamp-service-cli -op=create-service -service-type=zookeeper -region=us-east-1 -cluster=firecamp-prod -service-name=zoo-prod -replicas=3 -volume-size=20 -zk-heap-size=512
I'm getting:
The ZooKeeper heap size is less than 4096. Please increase it for production system
The zookeeper service is created, wait for all containers running
wait the service containers running, RunningCount 0
...
wait the service containers running, RunningCount 1
not all service containers are running after 120
And I end up with only one ZooKeeper container running.
Service events show:
85171af8-094f-48c8-95c1-8ddc1406cfd3
2018-01-25 20:51:55 +0300
service zoo-prod was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 986e672b-838a-4215-94c0-1ae8d8cf783b encountered error "memberOf constraint unsatisfied". For more information, see the Troubleshooting section.
cec4fbb1-b192-48d1-8747-a34c160a8481
2018-01-25 20:51:42 +0300
service zoo-prod has started 1 tasks: task d4e9c873-4629-4688-8b5b-9f8b1fcda874.
Firecamp log ends up with:
...
I0125 17:54:39.218207 1 server.go:688] get service status &{1 3} requuid req-722d99f1ef0c470c463dee0fe2e1dfea &{us-east-1 firecamp-prod zoo-prod}
I0125 17:54:44.219742 1 server.go:105] request Method GET URL /?Get-Service-Status ?Get-Service-Status Host firecamp-manageserver.firecamp-prod-firecamp.com:27040 requuid req-9a1353ee054041c96c770a55a24813c3 headers map[Accept-Encoding:[gzip] User-Agent:[Go-http-client/1.1] Content-Length:[73]]
I0125 17:54:44.236612 1 ecs.go:759] service zoo-prod has 1 running containers, desired 3
I0125 17:54:44.236634 1 server.go:688] get service status &{1 3} requuid req-9a1353ee054041c96c770a55a24813c3 &{us-east-1 firecamp-prod zoo-prod}
I0125 17:54:49.238279 1 server.go:105] request Method GET URL /?Get-Service-Status ?Get-Service-Status Host firecamp-manageserver.firecamp-prod-firecamp.com:27040 requuid req-97454a904f2b4c8161b8cf499e72d06a headers map[User-Agent:[Go-http-client/1.1] Content-Length:[73] Accept-Encoding:[gzip]]
I0125 17:54:49.256414 1 ecs.go:759] service zoo-prod has 1 running containers, desired 3
I0125 17:54:49.256441 1 server.go:688] get service status &{1 3} requuid req-97454a904f2b4c8161b8cf499e72d06a &{us-east-1 firecamp-prod zoo-prod}
Any ideas what's going on?
I'd like to open a discussion on this topic. Some suggestions:
http://jolokia.org/agent/jvm.html
https://www.datadoghq.com/blog/how-to-monitor-cassandra-performance-metrics/
http://cassandra.apache.org/doc/latest/operating/metrics.html
Please share your thoughts.
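For the Jolokia route, the integration is essentially one JVM option. A sketch (the agent jar path and port are assumptions):

```shell
# Sketch: expose Cassandra's JMX metrics over HTTP/JSON by attaching the
# Jolokia JVM agent. The agent jar path and the port are assumptions.
JOLOKIA_OPTS="-javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0"
JVM_OPTS="$JVM_OPTS $JOLOKIA_OPTS"
echo "$JVM_OPTS"
```

Once the agent is attached, a collector can scrape the metrics from the node's /jolokia/ HTTP endpoint instead of speaking raw JMX.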
I1124 15:39:09.383539 1 route53.go:133] zone is not for domain FireCamp-UAT-firecamp.com zone {
CallerReference: "FireCamp-Route53H-1TC3D958X5KBG",
Config: {
PrivateZone: true
},
Id: "/hostedzone/ABCD",
Name: "firecamp-uat-firecamp.com.",
ResourceRecordSetCount: 2
} requuid
E1124 15:39:09.383573 1 route53.go:52] CreateHostedZone error DomainNotFound domain FireCamp-UAT-firecamp.com vpc vpc-0000000 us-east-1 requuid
E1124 15:39:09.383587 1 server_start.go:49] GetOrCreateHostedZoneIDByName error DomainNotFound domain FireCamp-UAT-firecamp.com vpcID vpc-000000
F1124 15:39:09.383596 1 main.go:171] StartServer error DomainNotFound