
mesosphere-backup / cassandra-mesos-deprecated


[DEPRECATED] This project is deprecated. It will be archived on December 1, 2017.

Home Page: https://github.com/mesosphere/mesosphere/dcos-commons/frameworks/cassandra.

License: Apache License 2.0

Java 96.50% Shell 1.89% FreeMarker 0.12% Ruby 0.28% HTML 0.57% CSS 0.64%

cassandra-mesos-deprecated's Introduction

DEPRECATED: This project has been replaced by the DC/OS Cassandra Service: https://github.com/mesosphere/mesosphere/dcos-commons/frameworks/cassandra.

Cassandra Mesos Framework

------------

DISCLAIMER: This is a very early version of the Cassandra-Mesos framework. This document, the code's behavior, and anything else may change without notice and may break older installations.


Documentation

Cassandra-Mesos documentation is available on the Cassandra-Mesos GitHub pages site.

Contributing

We heartily welcome external contributions to Cassandra-Mesos's documentation. Documentation should be committed to the master branch and published to our GitHub pages site using the instructions in docs/README.md.

Design

The design document outlining the features and characteristics being targeted by the Cassandra Mesos Framework can be found at the index of the docs.

Current Status

Implemented

  • The framework can register with Mesos, providing a failover timeout so that tasks continue to run if the framework disconnects from Mesos.
  • The number of nodes and the amount of resources (CPU, RAM, disk, and ports) are all configurable and are evaluated when resource offers from Mesos are considered.
  • cassandra.yaml and variables for cassandra-env.sh are provided by the scheduler as part of the task definition.
  • Health checks are performed by the executor, and the results are sent back to the scheduler using the messaging mechanisms provided by Mesos.
  • The framework can restart and reregister with Mesos without killing tasks.
  • The scheduler can send tasks to nodes to perform 'nodetool repair'
  • The scheduler can send tasks to nodes to perform 'nodetool cleanup'
  • The framework can be launched by Marathon, allowing for easy installation
  • Repair Job coordination
  • Cleanup Job coordination
  • Replace Node
  • Rolling restart
  • Improved heap calculation to allow for memory mapped files

Near Term Tasks

  • Integration tests
  • Create stress tests to try and simulate real world workloads and to identify bugs in fault tolerance handling

Running the Framework

Currently the recommended way to run the Cassandra-Mesos Framework is via Marathon. A marathon.json from the latest build can be found here.

Once you've downloaded the marathon.json, update the MESOS_ZK URL and any other parameters you would like to change. Then POST the marathon.json to your Marathon instance and the framework will bootstrap itself.
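If you prefer to script the POST, here is a minimal Java 11+ sketch using java.net.http. It assumes marathon.json has already been edited and that Marathon creates apps via POST /v2/apps on a base URL you supply; MarathonDeploy is a hypothetical name, not part of this project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

final class MarathonDeploy {
    // Builds (but does not send) the request that submits marathon.json as a new Marathon app.
    // marathonBaseUrl is an assumption, e.g. "http://marathon.example:8080".
    static HttpRequest buildCreateAppRequest(String marathonBaseUrl, String marathonJson) {
        String base = marathonBaseUrl.endsWith("/")
                ? marathonBaseUrl.substring(0, marathonBaseUrl.length() - 1)
                : marathonBaseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/v2/apps"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(marathonJson))
                .build();
    }

    // Usage: java MarathonDeploy.java http://marathon.example:8080 marathon.json
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: MarathonDeploy <marathon-base-url> <path-to-marathon.json>");
            return;
        }
        String json = Files.readString(Path.of(args[1]));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildCreateAppRequest(args[0], json), HttpResponse.BodyHandlers.ofString());
        // Print Marathon's status code and response body so failures are visible.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Keeping request construction separate from sending makes it easy to inspect the request before anything reaches your Marathon instance.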

Mesos Node Configuration

You will need to expand the port range managed by Mesos on each node so that it includes the standard Cassandra ports.

This can be done by passing the following flag to the mesos-slave process:

--resources='ports:[31000-32000,7000-7001,7199-7199,9042-9042,9160-9160]'
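To check a node's --resources value before starting the mesos-slave, the Java sketch below parses a bare ports:[...] value and reports which of the standard Cassandra ports listed in the flag above (7000, 7001, 7199, 9042, 9160) are not covered. It deliberately ignores other resources (for example cpus or mem entries separated by ;), and MesosPortCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class MesosPortCheck {
    // Standard Cassandra ports taken from the --resources flag above.
    static final int[] CASSANDRA_PORTS = {7000, 7001, 7199, 9042, 9160};

    // Parses a value such as "ports:[31000-32000,7000-7001]" into {begin, end} pairs.
    static List<int[]> parsePortRanges(String resources) {
        int open = resources.indexOf('[');
        int close = resources.indexOf(']', open);
        if (!resources.startsWith("ports:") || open < 0 || close < 0) {
            throw new IllegalArgumentException("expected ports:[a-b,...] but got " + resources);
        }
        List<int[]> ranges = new ArrayList<>();
        for (String part : resources.substring(open + 1, close).split(",")) {
            String[] bounds = part.trim().split("-");
            int begin = Integer.parseInt(bounds[0].trim());
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : begin;
            ranges.add(new int[] {begin, end});
        }
        return ranges;
    }

    // Returns the Cassandra ports that no configured range covers.
    static List<Integer> missingCassandraPorts(String resources) {
        List<int[]> ranges = parsePortRanges(resources);
        List<Integer> missing = new ArrayList<>();
        for (int port : CASSANDRA_PORTS) {
            boolean covered = false;
            for (int[] range : ranges) {
                if (port >= range[0] && port <= range[1]) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                missing.add(port);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // prints []
        System.out.println(missingCassandraPorts("ports:[31000-32000,7000-7001,7199-7199,9042-9042,9160-9160]"));
        // prints [7000, 7001, 7199, 9042, 9160]
        System.out.println(missingCassandraPorts("ports:[31000-32000]"));
    }
}
```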

Configuration

All configuration is handled through environment variables (this makes it easy to configure Marathon to run the framework).

Framework Runtime Configuration

The following environment variables can be used to bootstrap the configuration of the framework. After the first run, configuration is read from the framework state in ZooKeeper so that it stays consistent across restarts.

# name of the cassandra cluster, this will be part of the framework name in Mesos
CASSANDRA_CLUSTER_NAME=dev-cluster

# Mesos ZooKeeper URL to locate leading master
MESOS_ZK=zk://localhost:2181/mesos

# ZooKeeper URL to be used to store framework state
CASSANDRA_ZK=zk://localhost:2181/cassandra-mesos

# The number of nodes in the cluster (default 3)
CASSANDRA_NODE_COUNT=3

# The number of seed nodes in the cluster (default 2)
# set this to 1, if you only want to spawn one node
CASSANDRA_SEED_COUNT=2

# The number of CPU Cores for each Cassandra Node (default 2.0)
CASSANDRA_RESOURCE_CPU_CORES=2.0

# The number of Megabytes of RAM for each Cassandra Node (default 2048)
CASSANDRA_RESOURCE_MEM_MB=2048

# The number of Megabytes of Disk for each Cassandra Node (default 2048)
CASSANDRA_RESOURCE_DISK_MB=2048

# The number of seconds between each health check of the cassandra node (default 60)
CASSANDRA_HEALTH_CHECK_INTERVAL_SECONDS=60

# The default bootstrap grace time - the minimum interval between two node starts
# You may set this to a lower value in pure local development environments.
CASSANDRA_BOOTSTRAP_GRACE_TIME_SECONDS=120

# The number of seconds that should be used as the mesos framework timeout (default 604800 seconds / 7 days)
CASSANDRA_FAILOVER_TIMEOUT_SECONDS=604800

# The Mesos role used to reserve resources (default *). If this is set, the framework accepts offers that have resources for that role or for the default role *
CASSANDRA_FRAMEWORK_MESOS_ROLE=*

# A pre-defined data directory specifying where Cassandra should write its data.
# Ensure that this directory can be created by the user the framework is running as (default . [mesos sandbox]).
# NOTE:
# This field will be removed once MESOS-1554 is released and the framework will
# be able to allocate the data volume itself.
CASSANDRA_DATA_DIRECTORY=.
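The variables above map naturally onto a small configuration loader. The Java sketch below is hypothetical (BootstrapConfig is not the framework's actual class): it applies the documented defaults, treats the cluster name and both ZooKeeper URLs as required because no default is given for them, and, as an assumption, rejects a seed count outside 1..CASSANDRA_NODE_COUNT.

```java
import java.util.Map;

// Hypothetical holder for the bootstrap settings listed above.
final class BootstrapConfig {
    final String clusterName;
    final String mesosZk;
    final String cassandraZk;
    final int nodeCount;
    final int seedCount;
    final double cpuCores;
    final long memMb;
    final long diskMb;
    final long healthCheckIntervalSeconds;
    final long bootstrapGraceTimeSeconds;
    final long failoverTimeoutSeconds;
    final String mesosRole;
    final String dataDirectory;

    private BootstrapConfig(Map<String, String> env) {
        clusterName = required(env, "CASSANDRA_CLUSTER_NAME");
        mesosZk = required(env, "MESOS_ZK");
        cassandraZk = required(env, "CASSANDRA_ZK");
        // Defaults below are the ones documented in the comments above.
        nodeCount = Integer.parseInt(env.getOrDefault("CASSANDRA_NODE_COUNT", "3"));
        seedCount = Integer.parseInt(env.getOrDefault("CASSANDRA_SEED_COUNT", "2"));
        cpuCores = Double.parseDouble(env.getOrDefault("CASSANDRA_RESOURCE_CPU_CORES", "2.0"));
        memMb = Long.parseLong(env.getOrDefault("CASSANDRA_RESOURCE_MEM_MB", "2048"));
        diskMb = Long.parseLong(env.getOrDefault("CASSANDRA_RESOURCE_DISK_MB", "2048"));
        healthCheckIntervalSeconds = Long.parseLong(env.getOrDefault("CASSANDRA_HEALTH_CHECK_INTERVAL_SECONDS", "60"));
        bootstrapGraceTimeSeconds = Long.parseLong(env.getOrDefault("CASSANDRA_BOOTSTRAP_GRACE_TIME_SECONDS", "120"));
        failoverTimeoutSeconds = Long.parseLong(env.getOrDefault("CASSANDRA_FAILOVER_TIMEOUT_SECONDS", "604800"));
        mesosRole = env.getOrDefault("CASSANDRA_FRAMEWORK_MESOS_ROLE", "*");
        dataDirectory = env.getOrDefault("CASSANDRA_DATA_DIRECTORY", ".");
        // Assumption: at least one seed, and never more seeds than nodes.
        if (seedCount < 1 || seedCount > nodeCount) {
            throw new IllegalArgumentException("CASSANDRA_SEED_COUNT must be between 1 and CASSANDRA_NODE_COUNT");
        }
    }

    static BootstrapConfig fromEnv(Map<String, String> env) {
        return new BootstrapConfig(env);
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " must be set");
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            BootstrapConfig config = fromEnv(System.getenv());
            System.out.println(config.clusterName + ": " + config.nodeCount + " nodes, " + config.seedCount + " seeds");
        } catch (IllegalArgumentException e) {
            System.err.println("invalid configuration: " + e.getMessage());
        }
    }
}
```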

System configuration

Cassandra requires some operating system settings. The recommended production settings are described on the page [Cassandra 2.1 recommended production settings]; please follow these guidelines carefully for the operating system user running Cassandra.

Cassandra memory usage

Memory used by Cassandra can be roughly categorized into:

  • Java heap memory. The amount of memory used by the Java VM for heap memory.
  • Off heap memory. Off heap is used for several reasons by Cassandra:
    • index-summary (default: 5% of the heap size) configured in cassandra.yaml - see index_summary_capacity_in_mb, which defaults to 5% of the heap size (may be exceeded)
    • key-cache (default: min(5% of heap (in MB), 100MB)) configured in cassandra.yaml - see key_cache_size_in_mb; 0 disables the key cache
    • row-cache (default: off) configured in cassandra.yaml - see row_cache_size_in_mb, which defaults to 0 (must be explicitly enabled in taskEnv)
    • counter-cache (default: min(2.5% of heap (in MB), 50MB)) configured in cassandra.yaml - see counter_cache_size_in_mb; 0 means no cache
    • memtables (default: on-heap) configured in cassandra.yaml - see memtable_allocation_type, memtable_heap_space_in_mb and memtable_offheap_space_in_mb; the space limits default to 1/4 of the heap
    • file-cache (default: min(25% of heap (in MB), 512MB)) configured in cassandra.yaml - see file_cache_size_in_mb, which defaults to the smaller of 1/4 of the heap or 512MB
    • overhead during flushes/compactions/cleanup implicitly defined by workload
  • OS buffer cache. The amount of (provisioned) memory reserved for the operating system for disk block buffers.

The default configuration simply assumes that you need as much off-heap memory as Java heap memory: it divides the provisioned amount of memory by 2 and assigns one half to the Java heap.
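To make the default concrete, this Java sketch applies the rule above (heap = memMb / 2) and the off-heap defaults listed earlier for index-summary, counter-cache, and file-cache. It is only an estimate under those stated defaults; key-cache, memtables, row-cache, and flush/compaction overhead are left out, and MemoryEstimate is a hypothetical name.

```java
// Rough estimates following the defaults described above; Cassandra's actual usage depends on workload.
final class MemoryEstimate {
    // Default rule: half of the provisioned memory goes to the Java heap.
    static long defaultHeapMb(long memMb) {
        return memMb / 2;
    }

    // index-summary: 5% of the heap (integer MB, rounded down).
    static long indexSummaryMb(long heapMb) {
        return heapMb * 5 / 100;
    }

    // counter-cache: min(2.5% of heap, 50 MB).
    static long counterCacheMb(long heapMb) {
        return Math.min(heapMb * 25 / 1000, 50);
    }

    // file-cache: min(25% of heap, 512 MB).
    static long fileCacheMb(long heapMb) {
        return Math.min(heapMb / 4, 512);
    }

    public static void main(String[] args) {
        long memMb = args.length > 0 ? Long.parseLong(args[0]) : 16384;
        long heapMb = defaultHeapMb(memMb);
        System.out.printf("memMb=%d heap=%d indexSummary=%d counterCache=%d fileCache=%d%n",
                memMb, heapMb, indexSummaryMb(heapMb), counterCacheMb(heapMb), fileCacheMb(heapMb));
    }
}
```

With the 2048 MB default from the configuration section, this gives a 1024 MB heap, so the file cache alone is capped at 256 MB; that is one reason the next paragraphs recommend far more memory.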

A well-planned production system is sized to meet its workload requirements. That means choosing proper values for the Cassandra process environment, cassandra.yaml, and memory sizing.

You should not run Cassandra (even in test environments) with less than 4 GB configured in memMb. The recommended minimum value for memMb is 16 GB. As RAM keeps getting cheaper, provision as much as you can afford, with 8 to 16 GB for memJavaHeapMb. Remember to determine the numbers you actually need through load and stress tests with your application.
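The sizing guidance above can be encoded as a quick pre-deployment check. The thresholds come from this paragraph; SizingCheck and its messages are hypothetical and do not replace load and stress testing.

```java
import java.util.ArrayList;
import java.util.List;

final class SizingCheck {
    // Returns human-readable findings; an empty list means the values match the guidance above.
    static List<String> check(long memMb, long memJavaHeapMb) {
        List<String> findings = new ArrayList<>();
        if (memMb < 4 * 1024) {
            findings.add("memMb below 4 GB: do not run Cassandra like this, even for tests");
        } else if (memMb < 16 * 1024) {
            findings.add("memMb below the recommended minimum of 16 GB");
        }
        if (memJavaHeapMb < 8 * 1024 || memJavaHeapMb > 16 * 1024) {
            findings.add("memJavaHeapMb outside the suggested 8-16 GB range");
        }
        return findings;
    }

    public static void main(String[] args) {
        System.out.println(check(32768, 12288));
        System.out.println(check(2048, 1024));
    }
}
```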

Rest API

See the Rest API Doc

Build

The Cassandra Mesos Framework is a maven project with modules for the Framework, Scheduler, Executor and Model. Standard maven convention applies. The Framework and Executor are both built as jar-with-dependencies in addition to their standalone jar, so that they are easy to run and distribute.

Install Maven

The Cassandra Mesos Framework requires an install of Maven 3.2.x.

Setup Maven toolchain for protoc

  1. Download version 2.5.0 of protobuf here

  2. Install: on Linux, make sure the g++ compiler is installed.

  3. Run the following commands to build protobuf:

    ```
    tar xzf protobuf-2.5.0.tar.gz
    cd protobuf-2.5.0
    ./configure
    make
    ```
    
  4. Create ~/.m2/toolchains.xml with the following contents. Update PROTOBUF_HOME to match the directory you ran make in.

```
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  <toolchain>
    <type>protobuf</type>
    <provides>
      <version>2.5.0</version>
    </provides>
    <configuration>
      <protocExecutable>$PROTOBUF_HOME/src/protoc</protocExecutable>
    </configuration>
  </toolchain>
</toolchains>
```
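To confirm that the protoc referenced from toolchains.xml really is version 2.5.0, a small Java sketch can run protoc --version, which prints a line such as libprotoc 2.5.0. ProtocVersionCheck is a hypothetical helper; pass it the same path you used for protocExecutable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class ProtocVersionCheck {
    // protoc --version prints a single line such as "libprotoc 2.5.0".
    static boolean isExpectedVersion(String versionOutput, String expectedVersion) {
        return versionOutput.trim().equals("libprotoc " + expectedVersion);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Pass the path from toolchains.xml, e.g. $PROTOBUF_HOME/src/protoc; defaults to protoc on the PATH.
        String protoc = args.length > 0 ? args[0] : "protoc";
        Process process = new ProcessBuilder(protoc, "--version").redirectErrorStream(true).start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode == 0 && isExpectedVersion(output, "2.5.0")) {
            System.out.println("protoc 2.5.0 found: " + protoc);
        } else {
            System.out.println("unexpected protoc (exit " + exitCode + "): " + output.trim());
        }
    }
}
```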

Resources

Running unit tests

mvn clean test

Packaging artifacts

mvn clean package

If you want to skip running tests when developing locally and rebuilding the packages, run the following:

mvn -Dmaven.test.skip=true package

Framework Package

There is a packaging script, package.bash, that can be used to package the framework and create a marathon.json to run the framework on Marathon:

./package.bash package

Generating the marathon.json depends on the JSON command line tool jq, which allows for accurate JSON document manipulation using its pipelining functionality. See package.bash for an example.

Development

For development of the Cassandra Framework you will need access to a Mesos Cluster (for help setting up a cluster see Setting up a Mesosphere Cluster).

The main class of the framework, io.mesosphere.mesos.frameworks.cassandra.framework.Main, can safely be run from your IDE if that is your preferred development environment.

Run dev-run.bash to start up the framework. You should then be able to see tasks being launched in your Mesos UI.

Configuration

The following environment variables (with example values) should be specified for local development:

# The port the http server used for serving assets to tasks should use.
# In normal operations this dynamic port will be provided by Marathon as part of the task that
# will run the framework
## Any port will do, just so long as it can be bound on your dev machine and is accessible from
## the mesos slaves.
PORT0=18080

# The file path to where the cassandra-mesos-executor jar-with-dependencies is on the local file system
# This file will be served by the built-in http server so that tasks will be able to easily access
# the jar.
EXECUTOR_FILE_PATH=${PROJECT_DIR}/cassandra-mesos-executor/target/cassandra-mesos-executor-0.2.1-SNAPSHOT-jar-with-dependencies.jar

# The file path to where a tar of the Oracle JRE version 7 update 75 is on the local file system.
# This file will be served by the built-in http server so that tasks will be able to easily access
# the jre, and it doesn't have to be provided by the slave host.
JRE_FILE_PATH=${PROJECT_DIR}/target/framework-package/jdk.tar.gz

# The file path to where a tar of Apache Cassandra 2.1.4 is on the local file system.
# This file will be served by the built-in http server so that tasks will be able to easily access
# the cassandra server, and it doesn't have to be provided by the slave host.
CASSANDRA_FILE_PATH=${PROJECT_DIR}/target/framework-package/cassandra.tar.gz
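Before running dev-run.bash, a quick sanity check of these variables can save a failed task launch. This Java sketch is hypothetical (DevEnvCheck is not part of the project): it only checks that PORT0 is a valid port number and that each *_FILE_PATH variable points to an existing regular file. It does not check that the port can actually be bound or is reachable from the Mesos slaves.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DevEnvCheck {
    static final List<String> FILE_VARS = List.of("EXECUTOR_FILE_PATH", "JRE_FILE_PATH", "CASSANDRA_FILE_PATH");

    // Returns a description of every problem found; an empty list means the variables look usable.
    static List<String> problems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        String port = env.get("PORT0");
        try {
            int value = Integer.parseInt(port == null ? "" : port.trim());
            if (value < 1 || value > 65535) {
                problems.add("PORT0 out of range: " + value);
            }
        } catch (NumberFormatException e) {
            problems.add("PORT0 is not a port number: " + port);
        }
        for (String name : FILE_VARS) {
            String value = env.get(name);
            if (value == null || !Files.isRegularFile(Path.of(value))) {
                problems.add(name + " does not point to a file: " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> found = problems(System.getenv());
        found.forEach(System.out::println);
        System.out.println(found.isEmpty() ? "environment looks OK" : found.size() + " problem(s)");
    }
}
```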

Using Cassandra tools

Support for running the standard command line tools delivered with Apache Cassandra against clusters running on Apache Mesos is provided by shell scripts whose names start with com-. These tools use the live nodes API discussed below.

These are:

  • com-cqlsh to invoke cqlsh without bothering about actual endpoints. It connects to any (random) live Cassandra node.
  • com-nodetool to invoke nodetool without bothering about actual endpoints. It connects to any (random) live Cassandra node.
  • com-stress to invoke cassandra-stress without bothering about actual endpoints. It connects to any (random) live Cassandra node.

All these tools are configured using environment variables and special command line options. These command line options must be specified directly after the command name.

Environment variables:

  • CASSANDRA_HOME path to where your local unpacked Apache Cassandra distribution lives. Defaults to .
  • API_HOST host name where the Cassandra-Mesos scheduler is running. Defaults to 127.0.0.1
  • API_PORT port on which the Cassandra-Mesos scheduler is listening. Defaults to 18080

Command line options:

  • --limit N the number of live nodes to use. Has no effect for cqlsh or nodetool.
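The com- wrappers are shell scripts. Purely to illustrate the behavior described above (the documented API_HOST/API_PORT defaults, choosing random live nodes, and --limit N, which only matters for stress), here is a Java sketch. It assumes the list of live nodes has already been fetched from the live nodes API, whose endpoint path is not shown here; LiveNodePicker is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class LiveNodePicker {
    // Scheduler API base URL built from the documented defaults.
    static String apiBaseUrl(Map<String, String> env) {
        String host = env.getOrDefault("API_HOST", "127.0.0.1");
        String port = env.getOrDefault("API_PORT", "18080");
        return "http://" + host + ":" + port;
    }

    // Picks up to `limit` distinct live nodes in random order.
    static List<String> pick(List<String> liveNodes, int limit, Random random) {
        List<String> shuffled = new ArrayList<>(liveNodes);
        Collections.shuffle(shuffled, random);
        int count = Math.max(0, Math.min(limit, shuffled.size()));
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        System.out.println(apiBaseUrl(System.getenv()));
        // Example node addresses are made up for illustration.
        System.out.println(pick(List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"), 1, new Random()));
    }
}
```

Shuffling a copy and taking a prefix guarantees distinct nodes and never returns more nodes than are live.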

Important security notice

CVE-2015-0225 describes a security vulnerability in Cassandra, which allows an attacker to execute arbitrary code via JMX/RMI.

Some non-critical tools of the Cassandra-Mesos framework rely on JMX functionality being available remotely.

  1. com-nodetool (as nodetool itself) requires the JMX port to be open from outside.
  2. com-qa-report uses com-nodetool for some functionality, which means it also requires the JMX port to be open from outside.

Do not open the JMX port without authentication, SSL, and proper firewall rules unless you know exactly what you are doing! Opening the JMX port makes your Cassandra nodes vulnerable to this security risk. If you are sure that you explicitly want to expose the JMX port, you can pass the environment variables CASSANDRA_JMX_LOCAL=false and CASSANDRA_JMX_NO_AUTHENTICATION=true to the framework upon initial invocation (i.e. when the framework first registers).
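As a hedged illustration, a pre-flight check could flag the risky combination before the framework first registers. The sketch assumes that when the variables are unset JMX stays local and authenticated, which the text implies but does not state; JmxExposure is a hypothetical name.

```java
import java.util.Map;

final class JmxExposure {
    // True only when both variables are set to expose JMX remotely without authentication.
    // Assumption: unset CASSANDRA_JMX_LOCAL means "true", unset CASSANDRA_JMX_NO_AUTHENTICATION means "false".
    static boolean exposesUnauthenticatedJmx(Map<String, String> env) {
        boolean local = !"false".equalsIgnoreCase(env.getOrDefault("CASSANDRA_JMX_LOCAL", "true"));
        boolean noAuthentication = "true".equalsIgnoreCase(env.getOrDefault("CASSANDRA_JMX_NO_AUTHENTICATION", "false"));
        return !local && noAuthentication;
    }

    public static void main(String[] args) {
        if (exposesUnauthenticatedJmx(System.getenv())) {
            System.err.println("WARNING: JMX will be reachable remotely without authentication (see CVE-2015-0225)");
        }
    }
}
```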

Note that CASSANDRA-9089 is meant to let JMX listen on a specific IP address, but it is not included in Cassandra 2.1.4.

References:

Resources

Cassandra 2.1 recommended production settings

cassandra-mesos-deprecated's People

Contributors

benwhitehead, dmitrypekar, elingg, flosell, gabrielhartmann, joel-hamill, keithchambers, mohitsoni, pitluga, sascala, snazy, ssk2


cassandra-mesos-deprecated's Issues

Error in startup of framework

Hi,

I tried the latest version of the cassandra-mesos framework. When I start the job with Marathon, I get the following message in cassandra-mesos.log: io.mesosphere.mesos.util.ProtoUtils$RuntimeInvalidProtocolBufferException: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: healthCheckIntervalSeconds, defaultConfigRole

I tried adding the "CASSANDRA_HEALTH_CHECK_INTERVAL_SECONDS" : "60" and "CASSANDRA_FRAMEWORK_MESOS_ROLE": "*" environment variables to the marathon json file, but that didn't change the behaviour.

Any clue what the issue could be ?

Regards,

Ben

Node instance is failing to come up after the reboot and scheduler restart

Here's the scenario that's not working for me:

  1. scaled the number of scheduler instances to 0 in Marathon
  2. scaled the number of scheduler instances to 1 in Marathon. Scheduler is now running on different mesos slave/port.
  3. restarted cassandra node host. Node is failing to start because it's trying to pull java from the location where the original scheduler used to reside.

Has anyone else seen this behavior? Are there any ways to fix this?

Thank you!

Trouble with starting Cassandra cluster

Hello,

I am trying to start Cassandra-Mesos (0.2.0-1) on Mesos (0.26.0) with Marathon (0.13.0), but there is a problem retrieving the executor metadata, and the scheduler constantly declines the offers:
https://gist.github.com/rvesselinov/31a2bae2e52f7d940682
The same also happens with cassandra-mesos 0.2.1-SNAPSHOT-608-master.

The Marathon task for Cassandra has the following configuration:
https://gist.github.com/rvesselinov/8c44ede4727652ed1041

These are cassandra-mesos logs:
cassandra-mesos.log.zip
And system output:
stdout.txt
stderr.txt

And this is the slaves state on which Mesos is running:
state.zip

Regards, Rado

Feature: Allow to configure minimal no of seed servers that will trigger start of ConfigServer

Currently noOfHWNodes is used as the minimum number of servers that triggers the start of the ConfigServer. However, if we want to run a large number of instances, the ConfigServer may never start: the MesosTasks get killed (because the ConfigServer is not running), and the ConfigServer won't start because there are not enough servers.

What do you think about that?

Having something like

cassandra.noOfHwNodes: 100
cassandra.noOfSeedNodes: 10

Would that make sense?

I can do PR.

Mesos roles

Hi guys,

I'm just checking if Mesos roles will be implemented, so the scheduler and executors can accept different roles.

TASK_LOST message: "Task uses invalid resources: disk(*):0" - Resolved by a temp fix

I downloaded the marathon.json from the link in the README and updated MESOS_ZK and CASSANDRA_ZK with my ZooKeeper location.

I have also changed my Mesos slave nodes' resources to: ports:[31000-32000,7000-7001,7199-7199,9042-9042,9160-9160]

After I submit the marathon.json, I got the TASK_LOST message below:
2015-03-25 16:25:09,559 TRACE [Thread-467 ] i.m.m.f.cassandra.CassandraScheduler - {taskId:cassandra.node.1.executor} < statusUpdate(driver : org.apache.mesos.MesosSchedulerDriver@30268da6, status : task_id { value: "cassandra.node.1.executor" } state: TASK_LOST message: "Task uses invalid resources: disk(*):0" slave_id { value: "20150305-041301-1120865222-5050-1076-S9" } timestamp: 1.427318772853009E9 source: SOURCE_MASTER reason: REASON_TASK_INVALID)

The Mesos version is: 0.21.1.

Any suggestions on what went wrong? And how to resolve it?

Really appreciate your help.

Yang Lei

GC overhead limit exceeded

Hi,
We're hitting "GC overhead limit exceeded" every couple of days. The scheduler dies and is not restarted by Marathon.

So:
Mesos framework log:

    2015-09-11 09:53:33,589:6395(0x7f197eff5700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=10000 watcher=0x7f19bd4d8860 sessionId=0 sessionPasswd=<null> context=0x7f19440015b0 flags=0
    2015-09-11 09:53:33,591:6395(0x7f1969ddd700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.0.137:2181]
    2015-09-11 09:53:33,594:6395(0x7f1969ddd700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.0.137:2181], sessionId=0xd84f3f334beb0051, negotiated timeout=10000
    I0911 09:53:33.594823  6480 group.cpp:313] Group process (group(1)@10.0.0.188:57079) connected to ZooKeeper
    I0911 09:53:33.594868  6480 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
    I0911 09:53:33.594898  6480 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
    I0911 09:53:33.596679  6483 detector.cpp:138] Detected a new leader: (id='126')
    I0911 09:53:33.596853  6470 group.cpp:659] Trying to get '/mesos/info_0000000126' in ZooKeeper
    I0911 09:53:33.597592  6477 detector.cpp:452] A new leading master ([email protected]:5050) is detected
    I0911 09:53:33.597689  6479 sched.cpp:254] New master detected at [email protected]:5050
    I0911 09:53:33.597853  6479 sched.cpp:264] No credentials provided. Attempting to register without authentication
    I0911 09:53:33.600744  6470 sched.cpp:448] Framework registered with 20150703-081340-2281701386-5050-48159-0000
    2015-09-11 17:18:48,022:6395(0x7f197dff3700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 12ms
    Exception in thread "Thread-162360" java.lang.OutOfMemoryError: GC overhead limit exceeded
            at java.lang.String.substring(Unknown Source)
            at java.lang.String.subSequence(Unknown Source)
            at com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526)
            at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:334)
            at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
            at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
            at com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248)
            at com.google.protobuf.TextFormat.print(TextFormat.java:71)
            at com.google.protobuf.TextFormat.printToString(TextFormat.java:118)
            at com.google.protobuf.AbstractMessage.toString(AbstractMessage.java:106)
            at java.lang.String.valueOf(Unknown Source)
            at java.lang.StringBuilder.append(Unknown Source)
            at java.util.AbstractCollection.toString(Unknown Source)
            at io.mesosphere.mesos.util.ProtoUtils.protoToString(ProtoUtils.java:50)
            at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraScheduler.resourceOffers(CassandraScheduler.java:112)
    I0914 05:07:29.821909  6475 sched.cpp:1623] Asked to abort the driver
    I0914 05:07:29.825697  6475 sched.cpp:856] Aborting framework '20150703-081340-2281701386-5050-48159-0000'
    I0914 05:07:30.358397  6396 sched.cpp:1589] Asked to stop the driver
    I0914 05:07:30.361979  6473 sched.cpp:831] Stopping framework '20150703-081340-2281701386-5050-48159-0000'

Then in Marathon we get the following (but it never gets scaled to 1):

[2015-09-14 05:07:30,415] INFO Need to scale /cassandra/cluster from 0 up to 1 instances (mesosphere.marathon.SchedulerActions:529)

The Marathon JSON is as follows (we only copied cassandra.tar.gz and Java from downloads.mesosphere.io to a local web server):

    {
      "id": "/cassandra/cluster",
      "instances": 1,
      "cpus": 0.5,
      "mem": 512,
      "ports": [10050],
      "uris": [
        "http://repo.marathon.mesos:9022/cassandra-mesos-0.2.0-1.tar.gz",
        "http://repo.marathon.mesos:9022/jre-7u76-linux-x64.tar.gz"
      ],
      "env": {
        "MESOS_ZK": "zk://master.mesos:2181/mesos",
        "JAVA_OPTS": "-Xms256m -Xmx256m",
        "CASSANDRA_CLUSTER_NAME": "kong-cluster",
        "CASSANDRA_ZK": "zk://master.mesos:2181/cassandra-mesos",
        "CASSANDRA_NODE_COUNT": "3",
        "CASSANDRA_RESOURCE_CPU_CORES": "2.0",
        "CASSANDRA_RESOURCE_MEM_MB": "2048",
        "CASSANDRA_RESOURCE_DISK_MB": "2048",
        "CASSANDRA_HEALTH_CHECK_INTERVAL_SECONDS": "60",
        "CASSANDRA_ZK_TIMEOUT_MS": "10000"
      },
      "cmd": "$(pwd)/jre*/bin/java $JAVA_OPTS -classpath cassandra-mesos-framework.jar io.mesosphere.mesos.frameworks.cassandra.framework.Main",
      "constraints": [
            ["net_video", "CLUSTER", "false"]
      ],
      "healthChecks": [
        {
          "gracePeriodSeconds": 120,
          "intervalSeconds": 30,
          "maxConsecutiveFailures": 0,
          "path": "/health/cluster",
          "portIndex": 0,
          "protocol": "HTTP",
          "timeoutSeconds": 5
        },
        {
          "gracePeriodSeconds": 120,
          "intervalSeconds": 30,
          "maxConsecutiveFailures": 3,
          "path": "/health/process",
          "portIndex": 0,
          "protocol": "HTTP",
          "timeoutSeconds": 5
        }
      ]
    }

To get it up and running again we had to manually restart the cassandra framework app in marathon.

Is this a Marathon problem or a framework problem?
Thanks.

Scheduler error java.util.concurrent.ExecutionException: Failed to create '/cassandraMesos' in ZooKeeper: no node

I cloned the master branch of cassandra-mesos today and built the tarball, but the scheduler gives the error below. Also, do you have a tag, release, or branch that corresponds to the working tarball http://downloads.mesosphere.io/cassandra/cassandra-mesos-2.0.5-1.tgz?
++++++++++
I0218 05:17:23.007205 31 group.cpp:659] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0218 05:17:23.008177 31 detector.cpp:433] A new leading master ([email protected]:5050) is detected
I0218 05:17:23.008255 31 sched.cpp:234] New master detected at [email protected]:5050
I0218 05:17:23.008893 31 sched.cpp:242] No credentials provided. Attempting to register without authentication
I0218 05:17:23.013358 31 sched.cpp:408] Framework registered with 20150218-050543-20775178-5050-1-0002
212 [Thread-1] INFO mesosphere.cassandra.CassandraScheduler - Framework registered as 20150218-050543-20775178-5050-1-0002
Exception in thread "Thread-1" java.util.concurrent.ExecutionException: Failed to create '/cassandraMesos' in ZooKeeper: no node
at org.apache.mesos.state.AbstractState.__store_get(Native Method)
at org.apache.mesos.state.AbstractState.access$900(AbstractState.java:34)
at org.apache.mesos.state.AbstractState$2.get(AbstractState.java:103)
at org.apache.mesos.state.AbstractState$2.get(AbstractState.java:82)
at mesosphere.utils.StateStore.store(StateStore.scala:31)
at mesosphere.cassandra.CassandraScheduler.registered(CassandraScheduler.scala:238)
I0218 05:17:23.275414 31 sched.cpp:1320] Asked to abort the driver
I0218 05:17:23.275565 31 sched.cpp:777] Aborting framework '20150218-050543-20775178-5050-1-0002'
++++++++

Cassandra log file missing on cassandra-mesos-2.1.2-1

The Cassandra-mesos-2.1.2 build from master works great, except that there is no log file from Cassandra. Cassandra emits the warning below.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/mesos/slaves/20141216-201216-1694607552-5050-3917-S4/frameworks/20141216-215902-1694607552-5050-6890-0000/executors/cassandra.58031e09-460a-4df6-8ff2-fbcf7a1da6bd/runs/17479497-99fb-432a-a947-77702c8200e8/cassandra-mesos-2.1.2-1/lib/cassandra-mesos-2.1.2-1-deps.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/mesos/slaves/20141216-201216-1694607552-5050-3917-S4/frameworks/20141216-215902-1694607552-5050-6890-0000/executors/cassandra.58031e09-460a-4df6-8ff2-fbcf7a1da6bd/runs/17479497-99fb-432a-a947-77702c8200e8/cassandra-mesos-2.1.2-1/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger     (org.apache.cassandra.service.CassandraDaemon).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

After deleting cassandra-mesos-2.1.2-1-deps.jar, the Cassandra binary works well and does emit logs. However, this causes problems for cassandra-mesos.
Cassandra-mesos-2.0.5 does not have the same problem.

Thank you,
and Merry Christmas, everyone!

Cassandra tasks getting LOST status in mesos

I have deployed a DCOS cluster and installed Cassandra and Spark on it.
I'm running a Spark job on one of the masters with dcos spark run --submit-args='--class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.4.0-SNAPSHOT.jar 10', and after it finishes execution, a few Cassandra executors fail. In Mesos it looks like this:

ID                                      Name                                            State     Started                   Stopped                   Host
driver-20150714085321-0003              Driver for org.apache.spark.examples.SparkPi    FINISHED  2015-07-14T11:53:25+0300  2015-07-14T11:53:35+0300  dcos-slave-03.novalocal
cassandra.dcos.node.0.executor.server   cassandra.dcos.node                             LOST      2015-07-14T11:30:45+0300  2015-07-14T11:53:40+0300  dcos-slave-01.novalocal
cassandra.dcos.node.0.executor          cassandra.dcos.node.0.executor                  LOST      2015-07-14T11:30:43+0300  2015-07-14T11:53:40+0300  dcos-slave-01.novalocal

The Spark job ran successfully (stdout):

Registered executor on dcos-slave-02.novalocal
Starting task driver-20150714092001-0005
/bin/sh -c exit `docker wait mesos-80c81d68-6b06-47aa-92b2-70f908b09201` 
Forked command at 26770
Pi is roughly 3.141704
Command exited with status 0 (pid: 26770)

Some Cassandra executors can't even start after this and keep getting the LOST status every few seconds, with the following stderr:

/opt/mesosphere/packages/mesos--5018921cbb873aea2a0db00a407d77a8de419f63/libexec/mesos/mesos-fetcher: /lib64/libcurl.so.4: no version information available (required by /opt/mesosphere/packages/mesos--5018921cbb873aea2a0db00a407d77a8de419f63/libexec/mesos/mesos-fetcher)
I0714 09:22:28.348204 27718 logging.cpp:172] INFO level logging started!
I0714 09:22:28.349845 27718 fetcher.cpp:214] Fetching URI 'http://<dcos-slave-01 IP address>:10000/jre-7-linux.tar.gz'
I0714 09:22:28.349892 27718 fetcher.cpp:125] Fetching URI 'http://<dcos-slave-01 IP address>:10000/jre-7-linux.tar.gz' with os::net
I0714 09:22:28.349912 27718 fetcher.cpp:135] Downloading 'http://<dcos-slave-01 IP address>:10000/jre-7-linux.tar.gz' to '/var/lib/mesos/slave/slaves/20150713-143901-787031981-5050-14056-S6/frameworks/20150713-143901-787031981-5050-14056-0008/executors/cassandra.dcos.node.2.executor/runs/ee26d1d9-9bac-4f1d-9cb2-cc566f5faa99/jre-7-linux.tar.gz'
E0714 09:22:28.350571 27718 fetcher.cpp:138] Error downloading resource: Couldn't connect to server
Failed to fetch: http://<dcos-slave-01 IP address>:10000/jre-7-linux.tar.gz
Failed to synchronize with slave (it's probably exited)

Can anyone help me with this?

ZOO_ERROR@handle_socket_error_msg Conexión rehusada (connection refused)

MesosStoreTest:
2014-05-28 13:00:00,023:24157(0x613fdb40):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:2181] zk retcode=-4, errno=111(Conexión rehusada): server refused to accept the client

Hi,
I tried to install cassandra-mesos and got this error. Could anyone help me?

Can't get Cassandra-Mesos working with dev-run.bash

I'm following the instructions for running locally inside a VM that already has ZooKeeper/Mesos/Marathon/mesos-dns.

I have installed cassandra and git cloned the cassandra-mesos project. When I run dev-run.bash maven runs for a bit and then throws an error complaining about a missing toolchain:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-toolchains-plugin:1.1:toolchain (default) on project cassandra-mesos-model: Cannot find matching toolchain definitions for the following toolchain types:
[ERROR] protobuf [ version='2.5.0' ]

Any suggestions?

Support configurable logging level

Add support for (re)configuring the Cassandra log level at runtime so that operators can troubleshoot Cassandra, e.g., changing the log level from INFO to DEBUG.

Because logging can impact Cassandra performance, as part of this feature we need to consider whether the logging level will be a global setting (all nodes) or a per-node setting.
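One way to frame the global-versus-per-node question is a global level with optional per-node overrides. The sketch below is hypothetical (LogLevelSettings and the node IDs are invented for illustration) and only models the setting, not how it would be pushed to the Cassandra nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one global level plus optional per-node overrides.
final class LogLevelSettings {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    private Level globalLevel = Level.INFO;
    private final Map<String, Level> perNode = new HashMap<>();

    void setGlobal(Level level) {
        globalLevel = level;
    }

    void setForNode(String nodeId, Level level) {
        perNode.put(nodeId, level);
    }

    // Removing an override makes the node follow the global level again.
    void clearNode(String nodeId) {
        perNode.remove(nodeId);
    }

    // A node's override wins; otherwise the global level applies.
    Level effectiveLevel(String nodeId) {
        return perNode.getOrDefault(nodeId, globalLevel);
    }

    public static void main(String[] args) {
        LogLevelSettings settings = new LogLevelSettings();
        settings.setForNode("node-1", Level.DEBUG);
        System.out.println(settings.effectiveLevel("node-0") + " " + settings.effectiveLevel("node-1"));
    }
}
```

Per-node overrides let an operator raise verbosity on one suspect node without paying the logging cost cluster-wide.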

More details on Cassandra logging and configuration are available here: http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configLoggingLevels_t.html
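To make the global-versus-per-node question concrete, here is a hypothetical sketch in which a cluster-wide default level can be overridden for individual nodes. Nothing here reflects an existing cassandra-mesos API; the class name, method names, and node IDs are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: one cluster-wide log level plus optional per-node overrides. */
final class LogLevelPolicy {

    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    private Level clusterDefault;
    private final Map<String, Level> perNode = new HashMap<>();

    LogLevelPolicy(Level clusterDefault) {
        this.clusterDefault = clusterDefault;
    }

    /** Global setting: applies to every node that has no override. */
    void setClusterLevel(Level level) {
        clusterDefault = level;
    }

    /** Per-node setting, e.g. raise a single suspect node to DEBUG. */
    void setNodeLevel(String nodeId, Level level) {
        perNode.put(nodeId, level);
    }

    /** Removes a node's override so it follows the cluster level again. */
    void clearNodeLevel(String nodeId) {
        perNode.remove(nodeId);
    }

    /** The level a given node should run with. */
    Level effectiveLevel(String nodeId) {
        return perNode.getOrDefault(nodeId, clusterDefault);
    }

    public static void main(String[] args) {
        LogLevelPolicy policy = new LogLevelPolicy(Level.INFO);
        policy.setNodeLevel("node-2", Level.DEBUG); // troubleshoot one node only
        for (String node : new String[] {"node-1", "node-2", "node-3"}) {
            System.out.println(node + " -> " + policy.effectiveLevel(node));
        }
    }
}
```

Supporting both keeps the performance cost of verbose logging confined to the node being investigated, while still allowing a one-step cluster-wide change.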

mvn package fails on Linux

Failed to load native Mesos library from /usr/local/lib/libmesos.dylib
*** RUN ABORTED ***
  java.lang.UnsatisfiedLinkError: Can't load library: /usr/local/lib/libmesos.dylib
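The path in the error ends in `.dylib`, the macOS shared-library suffix; on Linux the library is normally named `libmesos.so`. Java can compute the platform-specific file name with `System.mapLibraryName`. The sketch below assumes the library was installed under `/usr/local/lib` (as in the error) and that the path can be supplied through the `MESOS_NATIVE_JAVA_LIBRARY` environment variable read by the Mesos Java bindings; treat that variable name as an assumption to check against your Mesos version. `NativeMesosPath` is a hypothetical helper.

```java
import java.io.File;

/** Hypothetical helper: builds the expected path of the native Mesos library for this OS. */
final class NativeMesosPath {

    private NativeMesosPath() {}

    /** Joins the install directory with the platform-specific file name for "mesos". */
    static String expectedPath(String libDir) {
        // On current JDKs, mapLibraryName("mesos") yields "libmesos.so" on Linux
        // and "libmesos.dylib" on macOS.
        return new File(libDir, System.mapLibraryName("mesos")).getPath();
    }

    public static void main(String[] args) {
        // Prefer an explicitly configured path if one is set (assumed variable name).
        String configured = System.getenv("MESOS_NATIVE_JAVA_LIBRARY");
        String path = configured != null ? configured : expectedPath("/usr/local/lib");
        System.out.println("Native Mesos library: " + path
                + (new File(path).isFile() ? " (found)" : " (missing)"));
    }
}
```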

Build Fails with mesosphere.mesos.util.* not found

Running build.sh from HEAD on Mac OS X 10.9.3, I get the following failure:

[ERROR] /Users/xxx/Developer/cassandra/cassandra-mesos/src/main/scala/mesosphere/cassandra/CassandraScheduler.scala:4: error: object ScalarResource is not a member of package mesosphere.mesos.util
[ERROR] import mesosphere.mesos.util.ScalarResource
[ERROR] ^
[ERROR] /Users/xxx/Developer/cassandra/cassandra-mesos/src/main/scala/mesosphere/cassandra/Logger.scala:3: error: object log4j is not a member of package org.apache
[ERROR] import org.apache.log4j.{Priority, Level, Logger}
[ERROR] ^
[ERROR] /Users/xxx/Developer/cassandra/cassandra-mesos/src/main/scala/mesosphere/cassandra/Logger.scala:12: error: not found: value Logger
[ERROR] lazy val logger = Logger.getLogger(getClass)
[ERROR] ^
[ERROR] /Users/xxx/Developer/cassandra/cassandra-mesos/src/main/scala/mesosphere/cassandra/CassandraScheduler.scala:138: error: not found: value ScalarResource
[ERROR] case (k, v) => ScalarResource(k, v).toProto
[ERROR] ^

Adjust resources (memory, CPU, etc.) allocated to each Cassandra node after cassandra-mesos is deployed

There currently seems to be no way to adjust the resources (memory, CPU, etc.) allocated to each Cassandra node after cassandra-mesos is deployed. To increase the memory for each Cassandra node, I have to tear down the whole cassandra-mesos framework and redeploy it, which loses all the Cassandra data.

A good example to follow is the kafka-mesos framework (https://github.com/mesos/kafka#updating-broker-configurations).
Its REST API allows us to stop a node, update its configuration (including memory, CPU, etc.), and restart it.

Can't stand up a Cassandra ring with firewalld enabled on the node that runs the Cassandra cluster

I'm trying to get the cassandra-mesos project to run alongside a configured firewalld service, but I can't find a way to get the scheduler to work while firewalld is running. Here's what I'm seeing:

  • The entire Mesos port range 31000-32000 is open in firewalld.
  • When I start the Cassandra cluster on the node, I see the following log output repeating every minute in the mesos-cassandra.log file:

2015-10-30 07:52:56,671 INFO [main] o.g.g.http.server.NetworkListener - {} Started listener bound to [bodega-cnt7-247.auto4.labs.aspect.com:31252]
2015-10-30 07:52:56,680 INFO [main] o.g.grizzly.http.server.HttpServer - {} [HttpServer] Started.
2015-10-30 07:52:56,682 DEBUG [main] i.m.m.f.c.scheduler.SeedManager - {} Scheduling background syncing task to run every 60 seconds
2015-10-30 07:52:56,686 INFO [pool-1-thread-1] i.m.m.f.c.scheduler.SeedManager - {} Syncing seeds ...
2015-10-30 07:53:11,937 DEBUG [Grizzly-worker(2)] i.m.m.f.c.s.h.HealthReportService - {} > generateClusterHealthReport()
2015-10-30 07:53:12,035 TRACE [Grizzly-worker(2)] i.m.m.f.c.s.h.HealthReportService - {} < generateClusterHealthReport() = ClusterHealthReport{healthy=false, results=[ClusterHealthEvaluationResult{name='nodeCount', ok=false, expected=3, actual=0}, ClusterHealthEvaluationResult{name='seedCount', ok=false, expected=2, actual=0}, ClusterHealthEvaluationResult{name='allHealthy', ok=false, expected=[true, true, true], actual=[]}, ClusterHealthEvaluationResult{name='operatingModeNormal', ok=false, expected=[Optional.of(NORMAL), Optional.of(NORMAL), Optional.of(NORMAL)], actual=[]}, ClusterHealthEvaluationResult{name='lastHealthCheckNewerThan', ok=false, expected=[1446209291974, 1446209291974, 1446209291974], actual=[]}, ClusterHealthEvaluationResult{name='nodesHaveServerTask', ok=false, expected=[true, true, true], actual=[]}]}
2015-10-30 07:53:12,035 INFO [Grizzly-worker(2)] i.m.m.f.c.s.a.HealthCheckController - {} Cluster Health Report Generated. result: healthy = false
2015-10-30 07:53:41,144 DEBUG [Grizzly-worker(3)] i.m.m.f.c.s.h.HealthReportService - {} > generateClusterHealthReport()
2015-10-30 07:53:41,144 TRACE [Grizzly-worker(3)] i.m.m.f.c.s.h.HealthReportService - {} < generateClusterHealthReport() = ClusterHealthReport{healthy=false, results=[ClusterHealthEvaluationResult{name='nodeCount', ok=false, expected=3, actual=0}, ClusterHealthEvaluationResult{name='seedCount', ok=false, expected=2, actual=0}, ClusterHealthEvaluationResult{name='allHealthy', ok=false, expected=[true, true, true], actual=[]}, ClusterHealthEvaluationResult{name='operatingModeNormal', ok=false, expected=[Optional.of(NORMAL), Optional.of(NORMAL), Optional.of(NORMAL)], actual=[]}, ClusterHealthEvaluationResult{name='lastHealthCheckNewerThan', ok=false, expected=[1446209321144, 1446209321144, 1446209321144], actual=[]}, ClusterHealthEvaluationResult{name='nodesHaveServerTask', ok=false, expected=[true, true, true], actual=[]}]}
2015-10-30 07:53:41,144 INFO [Grizzly-worker(3)] i.m.m.f.c.s.a.HealthCheckController - {} Cluster Health Report Generated. result: healthy = false
2015-10-30 07:53:56,686 INFO [pool-1-thread-1] i.m.m.f.c.scheduler.SeedManager - {} Syncing seeds ...
2015-10-30 07:54:11,169 DEBUG [Grizzly-worker(1)] i.m.m.f.c.s.h.HealthReportService - {} > generateClusterHealthReport()
2015-10-30 07:54:11,170 TRACE [Grizzly-worker(1)] i.m.m.f.c.s.h.HealthReportService - {} < generateClusterHealthReport() = ClusterHealthReport{healthy=false, results=[ClusterHealthEvaluationResult{name='nodeCount', ok=false, expected=3, actual=0}, ClusterHealthEvaluationResult{name='seedCount', ok=false, expected=2, actual=0}, ClusterHealthEvaluationResult{name='allHealthy', ok=false, expected=[true, true, true], actual=[]}, ClusterHealthEvaluationResult{name='operatingModeNormal', ok=false, expected=[Optional.of(NORMAL), Optional.of(NORMAL), Optional.of(NORMAL)], actual=[]}, ClusterHealthEvaluationResult{name='lastHealthCheckNewerThan', ok=false, expected=[1446209351170, 1446209351170, 1446209351170], actual=[]}, ClusterHealthEvaluationResult{name='nodesHaveServerTask', ok=false, expected=[true, true, true], actual=[]}]}
2015-10-30 07:54:11,170 INFO [Grizzly-worker(1)] i.m.m.f.c.s.a.HealthCheckController - {} Cluster Health Report Generated. result: healthy = false

  • If I stop firewalld on the node running the scheduler, nothing changes.
  • If I check the ports in use by the Cassandra scheduler, I see this:

[root@bodega-cnt7-246 ~]# ps -ef | grep cassan
root 4266 4246 31 07:07 ? 00:00:09 /tmp/mesos/slaves/58d76858-1d00-4ce7-8803-dfcb726f3859-S8/frameworks/58d76858-1d00-4ce7-8803-dfcb726f3859-0000/executors/monk_cassandra-scheduler.ba82b85d-7efe-11e5-8468-00155dc80d0d/runs/4ca0733d-005f-4fc1-9e62-6a2da45d296f/jre1.7.0_76/bin/java -Xms256m -Xmx256m -classpath cassandra-mesos-framework.jar io.mesosphere.mesos.frameworks.cassandra.framework.Main
root 4419 4294 0 07:07 pts/0 00:00:00 grep --color=auto cassan

[root@bodega-cnt7-246 ~]# lsof -i | grep 4266
java 4266 root 10u IPv4 27275 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:56453->bodega-cnt7-243.auto4.labs.aspect.com:eforward (ESTABLISHED)
java 4266 root 12u IPv4 28911 0t0 TCP *:39220 (LISTEN)
java 4266 root 15u IPv4 28241 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:36634->bodega-cnt7-245.auto4.labs.aspect.com:eforward (ESTABLISHED)
java 4266 root 16u IPv6 28970 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:31483 (LISTEN)
java 4266 root 29u IPv4 28361 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:36635->bodega-cnt7-245.auto4.labs.aspect.com:eforward (ESTABLISHED)
java 4266 root 30u IPv4 28362 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:39459->bodega-cnt7-243.auto4.labs.aspect.com:mmcc (ESTABLISHED)
java 4266 root 31u IPv4 29104 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:39220->bodega-cnt7-243.auto4.labs.aspect.com:37815 (ESTABLISHED)
java 4266 root 32u IPv6 29942 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:31483->bodega-cnt7-245.auto4.labs.aspect.com:39495 (ESTABLISHED)
java 4266 root 33u IPv6 29954 0t0 TCP bodega-cnt7-246.auto4.labs.aspect.com:31483->bodega-cnt7-245.auto4.labs.aspect.com:39508 (ESTABLISHED)
[root@bodega-cnt7-246 ~]#

The port the scheduler listens on, 39220, seems to be random, and it is not in the Mesos port range:

java 4266 root 12u IPv4 28911 0t0 TCP *:39220 (LISTEN)

I've tried adding an exception for this port after the scheduler is up and running, but that still doesn't fix the problem.

  • The only way I've been able to stand up a working Cassandra ring is to make sure the firewalld service is turned off on the node that runs the Cassandra scheduler PRIOR TO IT STARTING. NOTE: the firewall is configured and enabled on the nodes that run the Cassandra nodes and executors, and the firewall exceptions for those applications seem to work fine.

Any tips on what it takes to get the scheduler to run with an enabled firewall?

I'm using the following versions:
marathon-0.11.1-1.0.432.el7.x86_64
mesos-0.25.0-0.2.70.centos701406.x86_64
mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_64
cassandra-mesos-0.2.0-1.tar.gz
jre-7u76-linux-x64.tar.gz
apache-cassandra-2.1.4

Thanks!
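One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the listener on 39220 is the libprocess endpoint of the Mesos scheduler driver, which the Mesos master connects back to in order to deliver offers and status updates. Unless the `LIBPROCESS_PORT` environment variable is set, libprocess binds an arbitrary free port, so a firewall rule cannot anticipate it. Pinning `LIBPROCESS_PORT` in the scheduler's environment (for example in its Marathon app definition) and opening that port in firewalld would make the rule predictable. The hypothetical helper below only checks whether a port falls inside a firewalld-style range string such as `31000-32000`.

```java
/** Hypothetical helper: tests whether a port lies in a "low-high" range such as "31000-32000". */
final class PortRange {
    private final int low;
    private final int high;

    private PortRange(int low, int high) {
        if (low < 0 || high > 65535 || low > high) {
            throw new IllegalArgumentException("invalid range: " + low + "-" + high);
        }
        this.low = low;
        this.high = high;
    }

    /** Parses "low-high" or a single port number. */
    static PortRange parse(String spec) {
        String[] parts = spec.trim().split("-", 2);
        int low = Integer.parseInt(parts[0].trim());
        int high = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : low;
        return new PortRange(low, high);
    }

    boolean contains(int port) {
        return port >= low && port <= high;
    }

    public static void main(String[] args) {
        PortRange open = PortRange.parse("31000-32000");
        // Listening ports taken from the lsof output above.
        for (int port : new int[] {31483, 39220}) {
            System.out.println(port + " open in firewalld range: " + open.contains(port));
        }
        // If LIBPROCESS_PORT is set (assumed variable, see above), report whether it is covered.
        String pinned = System.getenv("LIBPROCESS_PORT");
        if (pinned != null) {
            System.out.println("LIBPROCESS_PORT " + pinned + " covered: "
                    + open.contains(Integer.parseInt(pinned.trim())));
        }
    }
}
```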

Allow specifying the hostname/IP to use for the HTTP server base URI

I am working on a setup where the machine's hostname does not resolve to the IP address at which the Mesos slaves and masters can reach it. As a result, Cassandra-Mesos binds its HTTP server to an IP that is wrong for my setup and fails to receive resource offers. Even if it did receive resources, the slaves would not be able to download their requirements, such as Java. I would therefore suggest an API_HOST variable for configuring the IP to bind to. If it is unset, the framework should fall back to the existing behavior of using InetAddress.getLocalHost().
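A minimal sketch of the proposed lookup: use `API_HOST` when it is set and non-blank, otherwise fall back to `InetAddress.getLocalHost()`. The environment is passed in as a `Map` so the logic can be tested without touching real environment variables. The class name `ApiHostResolver` is hypothetical, and this is not the framework's actual code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

/** Hypothetical sketch of the proposed API_HOST override with a getLocalHost() fallback. */
final class ApiHostResolver {

    private ApiHostResolver() {}

    /** Returns API_HOST from env if set and non-blank, else the local host's address. */
    static String resolve(Map<String, String> env) throws UnknownHostException {
        String configured = env.get("API_HOST");
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        // Existing behavior: whatever the machine's hostname resolves to.
        return InetAddress.getLocalHost().getHostAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("Binding HTTP server to " + resolve(System.getenv()));
    }
}
```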

Cassandra tasks are not restarted when the Mesos cluster is shut down

The cassandra-mesos tasks run when the framework is first started with a Cassandra cluster name. However, whenever I restart the Mesos cluster, the cassandra-mesos tasks are not relaunched, even though the cassandra-mesos scheduler receives sufficient resource offers from the Mesos slave nodes.

An example of the output from the Mesos nodes:

445421 [Thread-85] INFO mesosphere.cassandra.CassandraScheduler - Got new resource offers ArrayBuffer(ip-10-153-171-197.ec2.internal, ip-10-144-185-186.ec2.internal, ip-10-187-28-30.ec2.internal, ip-10-146-199-114.ec2.internal)
445438 [Thread-85] INFO mesosphere.cassandra.CassandraScheduler - resources offered: List((cpus,2.0), (mem,6493.0), (disk,95545.0), (ports,0.0))
445438 [Thread-85] INFO mesosphere.cassandra.CassandraScheduler - resources required: List((cpus,0.3), (mem,2048.0), (disk,40000.0))
445441 [Thread-85] INFO mesosphere.cassandra.CassandraScheduler - resources offered: List((cpus,2.0), (mem,6493.0), (disk,95545.0), (ports,0.0))

Note to self: I need to examine the scheduler's acceptance criteria to understand why it is not accepting the offers.
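Working through the numbers in the log: every offered scalar (2.0 CPUs, 6493 MB memory, 95545 MB disk) exceeds the requirement (0.3, 2048, 40000), so plain resource sufficiency should not be what blocks the launch. The hypothetical helper below performs that comparison over name-to-amount maps; the real scheduler may apply further acceptance criteria that this check does not model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks that an offer covers every required scalar resource. */
final class OfferCheck {

    private OfferCheck() {}

    /** True if, for each required resource, the offer contains at least that amount. */
    static boolean satisfies(Map<String, Double> offered, Map<String, Double> required) {
        for (Map.Entry<String, Double> need : required.entrySet()) {
            Double have = offered.get(need.getKey());
            if (have == null || have < need.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values copied from the scheduler log above.
        Map<String, Double> offered = new LinkedHashMap<>();
        offered.put("cpus", 2.0);
        offered.put("mem", 6493.0);
        offered.put("disk", 95545.0);
        offered.put("ports", 0.0);

        Map<String, Double> required = new LinkedHashMap<>();
        required.put("cpus", 0.3);
        required.put("mem", 2048.0);
        required.put("disk", 40000.0);

        System.out.println("offer satisfies requirements: " + satisfies(offered, required));
    }
}
```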

Tutorial on using Vagrant and/or VirtualBox to set up a dev cluster

More and more Mesos frameworks seem to provide an initial Vagrantfile for setting up a local dev instance in a VM (with VirtualBox as the first choice). It's a convenient way for newcomers to get a local dev environment running and become familiar with the framework.

An example from Myriad: https://github.com/apache/incubator-myriad/blob/master/docs/vagrant.md

Has this ever been brought up for cassandra-mesos? If you think it could be useful for the community, I can try to pursue it.

Thanks!

Upgrading the framework on a running cluster

Hi,

We're currently running 0.1.1-SNAPSHOT-514-master-a6a94fbb (through Marathon).

How should we approach the upgrade to 0.2.1? Should we just update the app in Marathon?
I would like to keep things running as they are right now.

Cassandra reported as unhealthy in DC/OS even though the cluster is fine

Steps to reproduce from a clean state in a standard AWS cluster (5 slaves):

  1. dcos install cassandra
  2. Wait for the healthy state.
  3. Terminate 1 EC2 instance that has a Cassandra executor running.
  4. Wait for auto-scaling to spin up another instance to replace the terminated one.
  5. Call POST /node/{terminated node}/terminate/
  6. Wait for Cassandra to spin up another executor to get back to 3.

After this, we have an essentially healthy cluster with an unreachable node. No follow-up REST call removes the terminated node from the cluster or brings the cluster back to a healthy state in DC/OS. Calls tried: node/{}/make-non-seed, node/{}/replace, and /repair/start (the repair hangs).
Removing the node with nodetool removenode doesn't change the unhealthy status in DC/OS either, even though the underlying Cassandra cluster is in perfect health.

cluster health report: https://gist.github.com/kevinschmidt/352e2a3a9c056699fa44
output of nodetool: https://gist.github.com/kevinschmidt/17b50760963199a5e005

zookeeper_init fails for multi-master Mesos setup

mesos.yaml:


...
mesos.master.url: 'zk://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/mesos'
state.zk: 'zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/cassandra-mesos'
...


Executing bin/cassandra-mesos gives:
...
0 [main] INFO mesosphere.cassandra.Main$ - Starting Cassandra on Mesos.
2014-06-05 02:30:05,017:2627(0x7f40167fc700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2014-06-05 02:30:05,017:2627(0x7f40167fc700):ZOO_INFO@log_env@716: Client environment:host.name=zk1.example.com
2014-06-05 02:30:05,017:2627(0x7f40167fc700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2014-06-05 02:30:05,018:2627(0x7f40167fc700):ZOO_INFO@log_env@724: Client environment:os.arch=3.2.0-63-virtual
2014-06-05 02:30:05,018:2627(0x7f40167fc700):ZOO_INFO@log_env@725: Client environment:os.version=#95-Ubuntu SMP Thu May 15 23:24:31 UTC 2014
2014-06-05 02:30:05,018:2627(0x7f40167fc700):ZOO_INFO@log_env@733: Client environment:user.name=ubuntu
2014-06-05 02:30:05,018:2627(0x7f40167fc700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2014-06-05 02:30:05,018:2627(0x7f40167fc700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ubuntu/cassandra-mesos-2.0.5-1
2014-06-05 02:30:05,018:2627(0x7f40167fc700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=null:2181 sessionTimeout=20000 watcher=0x7f4022f7f530 sessionId=0 sessionPasswd= context=0x7f4018002360 flags=0
2014-06-05 02:30:05,020:2627(0x7f40167fc700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such file or directory

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0605 02:30:05.020211 2643 zookeeper.cpp:74] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]
*** Check failure stack trace: ***
bin/cassandra-mesos: line 14: 2627 Aborted (core dumped) java -classpath $CLASSPATH mesosphere.cassandra.Main "$@"
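`host=null:2181` in the log suggests the ZooKeeper client ended up with no usable host name at all. Without verifying what the framework's own parsing does, it can help to check what a `zk://` URL like the one in mesos.yaml should split into: a comma-separated list of `host:port` pairs plus a chroot path. The hypothetical helper below does that split with plain string handling; because `state.zk` above has no `zk://` prefix, it accepts both forms.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: splits "zk://h1:p1,h2:p2/path" into its host list and chroot path. */
final class ZkUrl {
    final List<String> hosts;
    final String path;

    private ZkUrl(List<String> hosts, String path) {
        this.hosts = hosts;
        this.path = path;
    }

    /** Accepts URLs with or without the "zk://" scheme prefix. */
    static ZkUrl parse(String url) {
        String rest = url.startsWith("zk://") ? url.substring("zk://".length()) : url;
        int slash = rest.indexOf('/');
        String hostPart = slash >= 0 ? rest.substring(0, slash) : rest;
        String path = slash >= 0 ? rest.substring(slash) : "/";
        if (hostPart.isEmpty()) {
            throw new IllegalArgumentException("no ZooKeeper hosts in: " + url);
        }
        return new ZkUrl(Collections.unmodifiableList(Arrays.asList(hostPart.split(","))), path);
    }

    /** Hosts joined in the "host:port,host:port" form ZooKeeper clients accept. */
    String connectString() {
        return String.join(",", hosts);
    }

    public static void main(String[] args) {
        ZkUrl master = parse("zk://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/mesos");
        System.out.println(master.connectString() + " path=" + master.path);
    }
}
```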

Debugging Cassandra startup

I'm bringing up Cassandra-Mesos, and while I see the framework in the Mesos UI and can access the REST APIs, I don't see any indication in the logs that Cassandra has started or attempted to start. The logs I'm checking are cassandra-mesos.log, stderr, and stdout in the sandbox. I'm trying to figure out how best to debug Cassandra not starting.

Cassandra-Mesos and SSL?

Is it possible to use cassandra-mesos in such a way that Cassandra is exposed securely? Since Cassandra is a database, our implementation requires SSL both between clients and Cassandra and for inter-node communication.

Any tips on how to make Cassandra secure when using the cassandra-mesos plugin?

If it's not supported yet, do you have an estimate of when it might be?

Thanks!

Trouble connecting to Cassandra cluster

Hi,

I built the latest cassandra-mesos from the rewrite branch and started a Cassandra cluster with Marathon. I see the Cassandra 'master' come up and register the framework 'cassandra.dev-test' in the Mesos cluster. Afterwards, 3 executor nodes are created. Everything seems fine, and the Cassandra cluster is up. But when I try to connect to the cluster with cqlsh, nothing happens.
What does seem weird is that netstat -anp doesn't show any of the standard Cassandra ports. I did add these ports to the mesos-agent config.
Is my Marathon JSON wrong?

{
  "id": "/cassandra-mesos",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "ports": [0],
  "uris": [
    "http://some-host/cassandra-0.1.0-SNAPSHOT-jar-with-dependencies.jar",
    "http://some-host/cassandra-executor.jar",
    "http://some-host/cassandra-framework.jar",
    "http://some-host/cassandra.tar.gz",
    "http://some-host/jre-7u76-linux-x64.tar.gz",
    "https://downloads.mesosphere.io/cassandra-mesos/jdk/jdk-7u75-linux-x64.tar.gz"
  ],
  "env": {
    "MESOS_ZK": "zk://zk1:2181,zk2:2181,zk3:2181/mesos",
    "JAVA_OPTS": "-Xms256m -Xmx256m ",
    "CASSANDRA_CLUSTER_NAME": "dev-test",
    "CASSANDRA_ZK": "zk://zk1:2181,zk2:2181,zk3:2181/cassandra-mesos",
    "CASSANDRA_NODE_COUNT": "3",
    "CASSANDRA_RESOURCE_CPU_CORES": "2.0",
    "CASSANDRA_RESOURCE_MEM_MB": "2048",
    "CASSANDRA_RESOURCE_DISK_MB": "2048",
    "CASSANDRA_HEALTH_CHECK_INTERVAL_SECONDS": "60"
  },
  "cmd": "$(pwd)/jdk*/bin/java $JAVA_OPTS -classpath cassandra-framework.jar    io.mesosphere.mesos.frameworks.cassandra.Main"
}

Do I need to specify the standard Cassandra ports (9042, 9160, ...) in the Marathon JSON?
Or is the framework not yet ready to accept connections?

Regards,

Benoît

How to stop/restart cassandra-mesos

We can use multiple SSH sessions (http://grokbase.com/t/cassandra/user/117ncdcha7/how-to-stop-the-whole-cluster-start-the-whole-cluster-like-in-hadoop-hbase) or direct container manipulation (https://github.com/jimenezrick/docker-cassandra-cluster/blob/master/scripts/run-cassandra-cluster) to issue stop/restart commands to Cassandra instances on several nodes in the cluster.

It seems that killing the cassandra-mesos processes doesn't update the corresponding entries in the ZooKeeper tree, so the tree has to be updated manually (with the zk command line or the Exhibitor UI editor) to completely clean up the state.

Some basic commands for deleting the latest cassandra-mesos data from the ZooKeeper tree (when cassandra-mesos is stopped) should be provided.

Not able to deploy framework to Mesos

Mesos version: 0.20.1
ZooKeeper version: 3.4.6-1569965
cassandra-mesos: latest master as of this date

I tried to set up cassandra-mesos on my local Mesos + ZooKeeper setup on OS X and see the following in the logs. The scheduler appears to get stuck there and eventually times out on the ZooKeeper connection. The Mesos UI does not show this framework as registered.

2015-01-16 13:32:48,482:94019(0x11a85d000):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-01-16 13:32:48,482:94019(0x11a85d000):ZOO_INFO@log_env@716: Client environment:host.name=vishr-mbpro.local
2015-01-16 13:32:48,482:94019(0x11a85d000):ZOO_INFO@log_env@723: Client environment:os.name=Darwin
2015-01-16 13:32:48,482:94019(0x11a85d000):ZOO_INFO@log_env@724: Client environment:os.arch=13.4.0
2015-01-16 13:32:48,483:94019(0x11a85d000):ZOO_INFO@log_env@725: Client environment:os.version=Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64
2015-01-16 13:32:48,483:94019(0x11a85d000):ZOO_INFO@log_env@733: Client environment:user.name=vishr
2015-01-16 13:32:48,483:94019(0x11a85d000):ZOO_INFO@log_env@741: Client environment:user.home=/Users/vishr
2015-01-16 13:32:48,483:94019(0x11a85d000):ZOO_INFO@log_env@753: Client environment:user.dir=/Users/vishr/xxxx/tools/cassandra-mesos/cassandra-mesos-2.0.5-1
I0116 13:32:48.483031 447647744 sched.cpp:139] Version: 0.20.1
2015-01-16 13:32:48,483:94019(0x11a85d000):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x1190d31fc sessionId=0 sessionPasswd= context=0x7fcae9a02590 flags=0
2015-01-16 13:32:48,492:94019(0x11abb7000):ZOO_INFO@check_events@1703: initiated connection to server [::1:2181]
2015-01-16 13:32:48,493:94019(0x11abb7000):ZOO_INFO@check_events@1750: session establishment complete on server [::1:2181], sessionId=0x14aef610055000d, negotiated timeout=10000
I0116 13:32:48.494086 442830848 group.cpp:313] Group process (group(1)@192.168.59.3:59332) connected to ZooKeeper
I0116 13:32:48.494113 442830848 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0116 13:32:48.494127 442830848 group.cpp:385] Trying to create path '/mesos' in ZooKeeper

The ZooKeeper logs show this warning when cassandra-mesos connects; I'm not sure whether it is relevant:

2015-01-16 13:32:48 ZooKeeperServer [WARN] Connection request from old client /fe80:0:0:0:0:0:0:1%1:59333; will be dropped if server is in r-o mode
2015-01-16 13:32:48 ZooKeeperServer [WARN] Connection request from old client /0:0:0:0:0:0:0:1:59334; will be dropped if server is in r-o mode

Configuring Multi-Datacenter Support

From the documentation (http://mesosphere.github.io/cassandra-mesos/docs/multi-dc.html), I need to set CASSANDRA_EXTERNAL_DC_=. Per the example:

CASSANDRA_EXTERNAL_DC_dc1=http://multi-dc.cassandra.marathon.mesos1:10001

What port should I use? I assume it's the port Marathon dynamically assigns for the REST API/command executor, which makes it extremely hard to configure because I can't predict what it will be. I've tried using the PORT0 variable, but then the command executor becomes unstable and is frequently restarted; Marathon reports it as continuously deploying.

ZooKeeper: 3.4.6
Mesos: 0.24.1
Marathon: 0.10.0
cassandra-mesos: 0.2.1
java: 7u76
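Going by the example in the documentation, each external datacenter is declared as an environment variable whose name is `CASSANDRA_EXTERNAL_DC_` followed by the datacenter name (`dc1` in the example) and whose value is that datacenter's URL. The hypothetical helper below collects such variables into a name-to-URL map, which can help check what the scheduler process actually sees; it is not the framework's own parser.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: collects CASSANDRA_EXTERNAL_DC_* variables into a dc-name -> URL map. */
final class ExternalDcs {

    static final String PREFIX = "CASSANDRA_EXTERNAL_DC_";

    private ExternalDcs() {}

    /** Maps each datacenter name (the suffix after PREFIX) to its configured URL. */
    static Map<String, String> fromEnv(Map<String, String> env) {
        Map<String, String> dcs = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            String key = e.getKey();
            // Skip unrelated variables and a bare prefix with no datacenter name.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                dcs.put(key.substring(PREFIX.length()), e.getValue().trim());
            }
        }
        return dcs;
    }

    public static void main(String[] args) {
        fromEnv(System.getenv()).forEach((dc, url) -> System.out.println(dc + " -> " + url));
    }
}
```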

Configuration in ZooKeeper seems to override framework runtime configuration in environment variables

When initially starting the framework, we did not set the CASSANDRA_FRAMEWORK_MESOS_ROLE environment variable. When we later tried to change this, we noticed the framework was still using the default * role to reserve resources.
After we deleted the framework state in ZooKeeper, the role configured in the environment variable was used.

Is this behaviour intentional? If so, it should be documented more clearly. We had assumed that configuration in environment variables would always be applied.

No seed nodes launched

Hello,

Using the provided Marathon JSON file, we are able to launch 3 executors. They run perfectly, but we find the following error in the log:

2015-11-11 16:29:25,198 DEBUG [Thread-28] i.m.m.f.c.scheduler.CassandraCluster - {offerId:20150831-091805-1562757130-5050-31315-O1390810,hostname:c01-n10.surfsara.nl} Attempting to launch server task for node.
2015-11-11 16:29:25,198 INFO  [Thread-28] i.m.m.f.c.scheduler.CassandraCluster - {offerId:20150831-091805-1562757130-5050-31315-O1390810,hostname:c01-n10.surfsara.nl} Cannot start server task because no seed node is running.

This causes all our server tasks to fail. The server logs are clean and show no reason for the failures, but the framework log describes the failure above.

We looked through the cassandra.yaml configuration (we even unpacked all the tarballs and searched every file for "seed"), and there are no seed nodes configured. After packaging the tarballs ourselves, we also cannot find the default seed node count described in the documentation.

Side note: we tried to run dev-run.bash, but Maven reports the error: cassandra-mesos-dist FAILURE
Failed to execute goal com.googlecode.maven-download-plugin:download-maven-plugin:1.2.1:wget

Downloading: https://downloads.mesosphere.io/cassandra-mesos/cassandra/apache-cassandra-2.1.4-bin.tar.gz
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated

Has anyone else seen this? A normal wget on the command line works perfectly.

We would really appreciate any suggestions on how to proceed.

Invalid cluster name specified in the cassandra.yaml

The executor adds a "cassandra." prefix to the cluster name specified in marathon.json:


marathon.json:
"env": {
....
"CASSANDRA_CLUSTER_NAME": "ClusterName1",


cassandra.yaml ( on the worker node ):
....
cluster_name: cassandra.ClusterName1

...

The scheduler can crash with "GC overhead limit exceeded"

Exception in thread "Thread-96988" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.AbstractStringBuilder.<init>(Unknown Source)
    at java.lang.StringBuilder.<init>(Unknown Source)
    at ch.qos.logback.classic.pattern.TargetLengthBasedClassNameAbbreviator.abbreviate(TargetLengthBasedClassNameAbbreviator.java:28)
    at ch.qos.logback.classic.pattern.NamedConverter.convert(NamedConverter.java:53)
    at ch.qos.logback.classic.pattern.NamedConverter.convert(NamedConverter.java:18)
    at ch.qos.logback.core.pattern.FormattingConverter.write(FormattingConverter.java:37)
    at ch.qos.logback.core.pattern.PatternLayoutBase.writeLoopOnConverters(PatternLayoutBase.java:119)
    at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:149)
    at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:39)
    at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:134)
    at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:194)
    at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:209)
    at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:219)
    at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:182)
    at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
    at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
    at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:273)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:260)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:442)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:396)
    at ch.qos.logback.classic.Logger.info(Logger.java:620)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraCluster._getTasksForOffer(CassandraCluster.java:1245)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraCluster.getTasksForOffer(CassandraCluster.java:342)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraScheduler.evaluateOffer(CassandraScheduler.java:270)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraScheduler.resourceOffers(CassandraScheduler.java:93)

Framework always fails, containers won't start (Mesos fetcher)

After following the GitHub page and submitting the Marathon job (https://teamcity.mesosphere.io/guestAuth/repository/download/Oss_Mesos_Cassandra_CassandraFramework/.lastSuccessful/marathon.json), it always fails with the error below.

I also tried setting --executor_registration_timeout=30mins, with no luck. It always fails after the fetcher step (Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher').

I0708 04:34:50.181612  1280 containerizer.cpp:484] Starting container '5e4df988-3296-4602-a9b6-eaf79a2db535' for executor 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' of framework '20150707-093403-16842879-5050-1190-0000'
I0708 04:34:50.182454  1280 launcher.cpp:130] Forked child with pid '1734' for container '5e4df988-3296-4602-a9b6-eaf79a2db535'
I0708 04:34:50.182653  1280 containerizer.cpp:694] Checkpointing executor's forked pid 1734 to '/tmp/mesos/meta/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799/runs/5e4df988-3296-4602-a9b6-eaf79a2db535/pids/forked.pid'
I0708 04:34:50.181267  1279 slave.cpp:1401] Queuing task 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' for executor cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework '20150707-093403-16842879-5050-1190-0000
I0708 04:34:50.192180  1276 fetcher.cpp:238] Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher'
I0708 04:35:11.595506  1278 slave.cpp:3648] Current disk usage 4.70%. Max allowed age: 5.971056596412430days
I0708 04:36:05.575178  1279 slave.cpp:3165] Monitoring executor 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' of framework '20150707-093403-16842879-5050-1190-0000' in container '5e4df988-3296-4602-a9b6-eaf79a2db535'
I0708 04:36:05.595361  1279 slave.cpp:2164] Got registration for executor 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' of framework 20150707-093403-16842879-5050-1190-0000 from executor(1)@127.0.1.1:36629
I0708 04:36:05.598816  1280 slave.cpp:1555] Sending queued task 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' to executor 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:05.606936  1278 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 from executor(1)@127.0.1.1:36629
I0708 04:36:05.607069  1278 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:05.607205  1278 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:05.612787  1278 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 to [email protected]:5050
I0708 04:36:05.613082  1278 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 to executor(1)@127.0.1.1:36629
I0708 04:36:05.645581  1277 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:05.645910  1277 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: 2eb703c8-98e0-4692-bd66-ceaa380b3eda) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:06.109319  1279 slave.cpp:2531] Handling status update TASK_FAILED (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 from executor(1)@127.0.1.1:36629
I0708 04:36:06.110167  1279 status_update_manager.cpp:317] Received status update TASK_FAILED (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:06.110316  1279 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_FAILED (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:06.113353  1279 slave.cpp:2776] Forwarding the update TASK_FAILED (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 to [email protected]:5050
I0708 04:36:06.113605  1279 slave.cpp:2709] Sending acknowledgement for status update TASK_FAILED (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 to executor(1)@127.0.1.1:36629
I0708 04:36:06.129639  1275 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:06.129724  1275 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_FAILED (UUID: 10cfcf43-198a-46a6-9653-533373e22bff) for task cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:07.201041  1278 containerizer.cpp:1123] Executor for container '5e4df988-3296-4602-a9b6-eaf79a2db535' has exited
I0708 04:36:07.201120  1278 containerizer.cpp:918] Destroying container '5e4df988-3296-4602-a9b6-eaf79a2db535'
I0708 04:36:07.206856  1278 slave.cpp:3223] Executor 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' of framework 20150707-093403-16842879-5050-1190-0000 exited with status 0
I0708 04:36:07.207015  1278 slave.cpp:3332] Cleaning up executor 'cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' of framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:07.207200  1275 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799/runs/5e4df988-3296-4602-a9b6-eaf79a2db535' for gc 6.99999760242074days in the future
I0708 04:36:07.207290  1275 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' for gc 6.99999760197926days in the future
I0708 04:36:07.207317  1275 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799/runs/5e4df988-3296-4602-a9b6-eaf79a2db535' for gc 6.99999760180444days in the future
I0708 04:36:07.207337  1275 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.ab87dcd9-252a-11e5-8927-56847afe9799' for gc 6.99999760167704days in the future
I0708 04:36:07.207350  1278 slave.cpp:3411] Cleaning up framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:07.207702  1275 status_update_manager.cpp:279] Closing status update streams for framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:07.207707  1278 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000' for gc 6.9999975962163days in the future
I0708 04:36:07.208115  1278 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000' for gc 6.999
I0708 04:36:11.596693  1275 slave.cpp:3648] Current disk usage 5.42%. Max allowed age: 5.920668743363287days
I0708 04:36:12.220798  1273 slave.cpp:1144] Got assigned task cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799 for framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:12.221675  1273 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000' from gc
I0708 04:36:12.221750  1273 gc.cpp:84] Unscheduling '/tmp/mesos/meta/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000' from gc
I0708 04:36:12.221801  1273 slave.cpp:1254] Launching task cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799 for framework 20150707-093403-16842879-5050-1190-0000
I0708 04:36:12.225669  1273 slave.cpp:4208] Launching executor cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799 of framework 20150707-093403-16842879-5050-1190-0000 in work directory '/tmp/mesos/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799/runs/2b4cb86b-c96b-481a-926c-01dfdc4a591f'
I0708 04:36:12.226181  1273 slave.cpp:1401] Queuing task 'cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799' for executor cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799 of framework '20150707-093403-16842879-5050-1190-0000
I0708 04:36:12.226363  1278 docker.cpp:598] No container info found, skipping launch
I0708 04:36:12.227066  1277 containerizer.cpp:484] Starting container '2b4cb86b-c96b-481a-926c-01dfdc4a591f' for executor 'cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799' of framework '20150707-093403-16842879-5050-1190-0000'
I0708 04:36:12.228831  1277 launcher.cpp:130] Forked child with pid '1793' for container '2b4cb86b-c96b-481a-926c-01dfdc4a591f'
I0708 04:36:12.231519  1277 containerizer.cpp:694] Checkpointing executor's forked pid 1793 to '/tmp/mesos/meta/slaves/20150708-042711-16842879-5050-1225-S0/frameworks/20150707-093403-16842879-5050-1190-0000/executors/cassandra_dev-test.dc74e31a-252a-11e5-8927-56847afe9799/runs/2b4cb86b-c96b-481a-926c-01dfdc4a591f/pids/forked.pid'
I0708 04:36:12.234097  1277 fetcher.cpp:238] Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher'

problem deploying cassandra-mesos with marathon

I'm trying cassandra-mesos on my private Mesos cluster. I'm using the README instructions to deploy with Marathon, but an error occurs when the task starts:

Stderr output:

I0925 15:55:06.185039  8383 fetcher.cpp:214] Fetching URI 'https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.1-SNAPSHOT-589-master-4c6502b0a6/cassandra-mesos-0.2.1-SNAPSHOT-589-master-4c6502b0a6.tar.gz'
I0925 15:55:06.185165  8383 fetcher.cpp:125] Fetching URI 'https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.1-SNAPSHOT-589-master-4c6502b0a6/cassandra-mesos-0.2.1-SNAPSHOT-589-master-4c6502b0a6.tar.gz' with os::net
I0925 15:55:06.185180  8383 fetcher.cpp:135] Downloading 'https://downloads.mesosphere.io/cassandra-mesos/artifacts/0.2.1-SNAPSHOT-589-master-4c6502b0a6/cassandra-mesos-0.2.1-SNAPSHOT-589-master-4c6502b0a6.tar.gz' to '/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4/cassandra-mesos-0.2.0-1.tar.gz'
I0925 15:56:07.968350  8383 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4/cassandra-mesos-0.2.0-1.tar.gz' into '/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4'
I0925 15:56:07.971684  8383 fetcher.cpp:214] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0925 15:56:07.971709  8383 fetcher.cpp:125] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' with os::net
I0925 15:56:07.971725  8383 fetcher.cpp:135] Downloading 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4/jre-7u76-linux-x64.tar.gz'
I0925 15:56:51.630692  8383 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4/jre-7u76-linux-x64.tar.gz' into '/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4'
I0925 15:56:51.720883  8381 exec.cpp:132] Version: 0.22.1
I0925 15:56:51.723655  8426 exec.cpp:206] Executor registered on slave 20150925-135709-503717292-5050-2136-S0
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@716: Client environment:host.name=vcmms.domain.com
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@724: Client environment:os.arch=3.19.3-1.el6.x86_64
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Mon Mar 30 13:50:16 EDT 2015
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/mesos/slaves/20150925-135709-503717292-5050-2136-S0/frameworks/20150925-144548-503717292-5050-5066-0001/executors/cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799/runs/e0aaadd9-9c86-47e1-8df8-d2c437f960f4
2015-09-25 15:56:52,376:8432(0x7f3cb2ffd700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=172.29.6.30:2181 sessionTimeout=10000 watcher=0x7f3cee4ceeb0 sessionId=0 sessionPasswd=<null> context=0x7f3c9c000930 flags=0
2015-09-25 15:56:52,377:8432(0x7f3cb17fa700):ZOO_INFO@check_events@1703: initiated connection to server [172.29.6.30:2181]
2015-09-25 15:56:52,382:8432(0x7f3cb17fa700):ZOO_INFO@check_events@1750: session establishment complete on server [172.29.6.30:2181], sessionId=0x15005d328430014, negotiated timeout=10000

And the stdout output:

Registered executor on vcmms.domain.com
Starting task cassandra-dev-test.51620664-63bf-11e5-b350-56847afe9799
Forked command at 8432
sh -c '$(pwd)/jre*/bin/java $JAVA_OPTS -classpath cassandra-mesos-framework.jar io.mesosphere.mesos.frameworks.cassandra.framework.Main'
Command exited with status 10 (pid: 8432)

My cluster has:
mesos masters: 1
mesos slaves: 3
marathon
docker

how to stop / restart cassandra-mesos

We could use multi-host SSH (http://grokbase.com/t/cassandra/user/117ncdcha7/how-to-stop-the-whole-cluster-start-the-whole-cluster-like-in-hadoop-hbase) or direct container manipulation (https://github.com/jimenezrick/docker-cassandra-cluster/blob/master/scripts/run-cassandra-cluster) to issue stop/restart commands to the Cassandra instances on several nodes in the cluster.

It seems that killing the cassandra-mesos process does not update the corresponding state in the ZooKeeper tree, so the tree has to be edited manually (with the zk command line or the Exhibitor UI editor) to clean up the state completely.

Perhaps some basic commands could be introduced to delete the latest cassandra-mesos data from the ZooKeeper tree when cassandra-mesos is stopped.
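A minimal sketch of the cleanup this report asks for, written in Java (the project's language). ZooKeeper refuses to delete a znode that still has children (the delete fails with a NotEmptyException), so a manual cleanup has to remove descendants before their parent. The sketch only computes that post-order deletion order from a map of paths to child names; the map stands in for getChildren calls on a live client, and the class name ZnodeCleanupPlan and the /cassandra-mesos paths in the example are hypothetical, not the framework's confirmed ZooKeeper layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: lists the znodes under a root in an order that is safe to
 * delete. Children always come before their parent (post-order traversal), because
 * ZooKeeper will not delete a znode that still has children.
 */
final class ZnodeCleanupPlan {
    private ZnodeCleanupPlan() {}

    /**
     * @param children map from an absolute znode path to the names of its direct children
     *                 (a stand-in for getChildren calls on a real ZooKeeper client)
     * @param root     the znode to remove together with everything below it
     * @return absolute paths, deepest first, ending with the root itself
     */
    static List<String> deletionOrder(Map<String, List<String>> children, String root) {
        List<String> order = new ArrayList<>();
        collect(children, root, order);
        return order;
    }

    private static void collect(Map<String, List<String>> children, String path, List<String> order) {
        for (String child : children.getOrDefault(path, Collections.emptyList())) {
            // Join paths without producing "//" when the parent is the root "/".
            String childPath = path.equals("/") ? "/" + child : path + "/" + child;
            collect(children, childPath, order);
        }
        order.add(path); // a node is listed only after all of its descendants
    }
}
```

With a real org.apache.zookeeper.ZooKeeper client, each returned path would be passed to delete(path, -1), where a version of -1 matches any version. Which znode the framework actually keeps its state under is not confirmed here and should be checked in the zk CLI or Exhibitor first.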

Cassandra ignores set-up parameters

Hi guys,

The Cassandra DCOS package ignores all configuration parameters and doesn't allow settings to be changed.

Any options for the data directory, RAM, CPU, and so on are ignored; there is currently no way to customize the DCOS Cassandra package installation.

UninitializedMessageException when repairing cluster

I sent a POST request to /cluster/repair/start and my cassandra.dcos is continuously failing to start.

Here is the relevant portion of the stderr:

1927 sched.cpp:448] Framework registered with 20150625-151318-3212775596-5050-2156-0001
Exception in thread "Thread-2" com.google.protobuf.UninitializedMessageException: Message missing required fields: jobType, startedTimestamp
    at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
    at io.mesosphere.mesos.frameworks.cassandra.CassandraFrameworkProtos$ClusterJobStatus$Builder.build(CassandraFrameworkProtos.java:11456)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.NodeTaskClusterJobHandler.rejectNode(NodeTaskClusterJobHandler.java:112)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.NodeTaskClusterJobHandler.handleTaskOffer(NodeTaskClusterJobHandler.java:78)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraCluster.handleClusterTask(CassandraCluster.java:689)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraCluster._getTasksForOffer(CassandraCluster.java:1154)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraCluster.getTasksForOffer(CassandraCluster.java:342)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraScheduler.evaluateOffer(CassandraScheduler.java:270)
    at io.mesosphere.mesos.frameworks.cassandra.scheduler.CassandraScheduler.resourceOffers(CassandraScheduler.java:93)

I haven't grokked the whole project, but if I had to guess, I think the problem might be here

I think the builder might need a jobType and startedTimestamp set the same way it was set here

I hope this is somewhat helpful. I'm still trying to figure out how to fix my cassandra.dcos service without uninstalling it altogether. Please advise.
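To illustrate the reporter's guess, here is a plain-Java stand-in for a proto2 message with two required fields. It is not the generated CassandraFrameworkProtos.ClusterJobStatus class: the class JobStatusSketch, its optional rejectedNode field and the "REPAIR" value are hypothetical. It only mimics how build() refuses a message whose required fields are unset, and why starting from a copy of the existing status (what toBuilder() does for generated protobuf messages) avoids that failure.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a proto2 message with required fields. NOT the generated
 * CassandraFrameworkProtos.ClusterJobStatus; it only mimics the required-field check.
 */
final class JobStatusSketch {
    final String jobType;          // required in this sketch
    final Long startedTimestamp;   // required in this sketch
    final String rejectedNode;     // optional, hypothetical field

    private JobStatusSketch(Builder b) {
        this.jobType = b.jobType;
        this.startedTimestamp = b.startedTimestamp;
        this.rejectedNode = b.rejectedNode;
    }

    /** Like toBuilder() on generated messages: a builder pre-filled with every field already set. */
    Builder toBuilder() {
        Builder b = new Builder();
        b.jobType = jobType;
        b.startedTimestamp = startedTimestamp;
        b.rejectedNode = rejectedNode;
        return b;
    }

    static final class Builder {
        private String jobType;
        private Long startedTimestamp;
        private String rejectedNode;

        Builder setJobType(String value) { jobType = value; return this; }
        Builder setStartedTimestamp(long value) { startedTimestamp = value; return this; }
        Builder setRejectedNode(String value) { rejectedNode = value; return this; }

        /**
         * Rejects a message with unset required fields. Protobuf throws
         * UninitializedMessageException here; this sketch uses IllegalStateException.
         */
        JobStatusSketch build() {
            List<String> missing = new ArrayList<>();
            if (jobType == null) missing.add("jobType");
            if (startedTimestamp == null) missing.add("startedTimestamp");
            if (!missing.isEmpty()) {
                throw new IllegalStateException(
                        "Message missing required fields: " + String.join(", ", missing));
            }
            return new JobStatusSketch(this);
        }
    }
}
```

Building from an empty builder (new JobStatusSketch.Builder().setRejectedNode("node-1").build()) fails with the same field list as the stack trace, while current.toBuilder().setRejectedNode("node-1").build() keeps jobType and startedTimestamp and succeeds. Whether the real fix belongs in NodeTaskClusterJobHandler.rejectNode is the reporter's hypothesis, not a confirmed diagnosis.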

could not connect to the scheduler host and port using cqlsh

cqlsh can connect to the live node's host and port, but not to the scheduler's host and port.

cqlsh
Connection error: ('Unable to connect to any servers', {'': OperationTimedOut('errors=Timed out creating connection, last_host=None',)})

Thanks. Yang
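cqlsh speaks CQL, which is served by the Cassandra nodes (by default on native transport port 9042), while the scheduler, judging by the /cluster/repair/start endpoint mentioned in another report, exposes an HTTP API. Assuming the scheduler does not also serve CQL, a timeout against its host and port would be expected. The hypothetical helper below (class name and timeout are assumptions) is a quick way to see which host:port pairs accept TCP connections at all; it checks reachability only and does not speak CQL.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic: does anything accept TCP connections at host:port? */
final class TcpReachability {
    private TcpReachability() {}

    static boolean acceptsConnections(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() blocks for at most timeoutMillis before giving up.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused connections, timeouts and unresolvable host names all land here.
            return false;
        }
    }
}
```

For example, acceptsConnections(nodeHost, 9042, 2000) against a live Cassandra node should return true, where nodeHost is a placeholder for the node's address.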

Launch on Mesos/Marathon created with docker compose

I have Mesos/ZooKeeper/Marathon/slave nodes created in Docker containers and am trying to add cassandra-mesos with Marathon. The application is stuck in Deploying in Marathon. In Mesos, I can see the task while it's staging and can see through the sandbox that it downloads the necessary files. However, the task gets killed shortly after with no indication of what went wrong in stderr/stdout. Tailing the logs of the Docker slave node, I can see the following:

I1109 19:25:57.967133     8 slave.cpp:1270] Got assigned task cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961 for framework f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000
I1109 19:25:57.967419     8 slave.cpp:1386] Launching task cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961 for framework f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000
I1109 19:25:58.021013     8 slave.cpp:4852] Launching executor cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961 of framework f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/f5aed497-2f9e-4b5f-862e-e6a4ad94e306-S0/frameworks/f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000/executors/cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961/runs/95424c8c-77f9-4ff8-becb-4e4cf1e1cfaa'
I1109 19:25:58.022163    10 docker.cpp:734] No container info found, skipping launch
I1109 19:25:58.023249    10 containerizer.cpp:640] Starting container '95424c8c-77f9-4ff8-becb-4e4cf1e1cfaa' for executor 'cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961' of framework 'f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000'
I1109 19:25:58.028839     8 slave.cpp:1604] Queuing task 'cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961' for executor cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961 of framework 'f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000
I1109 19:25:58.030902    10 linux_launcher.cpp:352] Cloning child process with flags =
I1109 19:25:58.036380    10 containerizer.cpp:873] Checkpointing executor's forked pid 861 to '/tmp/mesos/meta/slaves/f5aed497-2f9e-4b5f-862e-e6a4ad94e306-S0/frameworks/f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000/executors/cassandra_dev-test.b40c7218-8717-11e5-98a7-760869961961/runs/95424c8c-77f9-4ff8-becb-4e4cf1e1cfaa/pids/forked.pid'
E1109 19:26:08.176664    11 slave.cpp:3342] Container 'eb670e50-c40a-43ee-b577-9a76df5fab96' for executor 'cassandra_dev-test.ab14e9d7-8717-11e5-98a7-760869961961' of framework 'f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000' failed to start: Container destroyed during launch
E1109 19:26:08.182646    11 slave.cpp:3424] Termination of executor 'cassandra_dev-test.ab14e9d7-8717-11e5-98a7-760869961961' of framework 'f5aed497-2f9e-4b5f-862e-e6a4ad94e306-0000' failed:     Unknown container: eb670e50-c40a-43ee-b577-9a76df5fab96

Can cassandra-mesos be launched in this manner (dockerized)? Below is the docker-compose template I'm using (on OS X).

zk:
  image: bobrik/zookeeper
  net: host
  ports:
   - 2181:2181
  environment:
    ZK_CONFIG: tickTime=2000,initLimit=10,syncLimit=5,maxClientCnxns=128,forceSync=no,clientPort=2181
    ZK_ID: 1

master:
  image: mesosphere/mesos-master:0.25.0-0.2.70.ubuntu1404
  net: host
  environment:
    MESOS_ZK: zk://192.168.99.100:2181/mesos
    MESOS_QUORUM: 1
    MESOS_CLUSTER: docker-compose
    MESOS_WORK_DIR: /var/lib/mesos

slave:
  image: mesosphere/mesos-slave:0.25.0-0.2.70.ubuntu1404
  net: host
  privileged: true
  environment:
    MESOS_MASTER: zk://192.168.99.100:2181/mesos
    MESOS_CONTAINERIZERS: docker,mesos
    MESOS_PORT: 5051
    MESOS_RESOURCES: ports(*):[31000-32000,7000-7001,7199-7199,9042-9042,9160-9160];mem(*):5000;cpus(*):4
  volumes:
    - /sys/fs/cgroup:/sys/fs/cgroup
    - /usr/local/bin/docker:/usr/bin/docker
    - /var/run/docker.sock:/var/run/docker.sock

marathon:
  image: mesosphere/marathon:v0.11.0
  net: host
  environment:
    MARATHON_MASTER: zk://192.168.99.100:2181/mesos
    MARATHON_DECLINE_OFFER_DURATION: 3600000

Failed to allocate tasks on DigitalOcean, and more

Several issues:

  1. Upon spinning up a test cluster in a DigitalOcean environment and installing cassandra-mesos as described...
# Downloaded and unpacked archive
$ tar xzf cassandra-mesos-*.tgz

# Updated configuration
$ vi conf/mesos.yaml # updated mesos.master.url with mesos-master IP

# Register framework
$ bin/cassandra-mesos

... task allocation failed because Mesos attempted to access an unknown host; wget failed.

--2015-01-21 23:54:37--  http://development-3657-d15:8282/cassandra.yaml
Resolving development-3657-d15 (development-3657-d15)... failed: Name or service not known.
wget: unable to resolve host address `development-3657-d15'

The /etc/hosts file on the slave contains no entry for the mesos-master host development-3657-d15, so wget failed. I was able to work around this by explicitly providing the mesos-master IP address in the undocumented configuration variable cassandra.confServer.hostname in mesos.yaml:

mesos.master.url: 'zk://104.131.15.45:2181/mesos'
state.zk: '104.131.15.45:2181'
cassandra.confServer.hostname: 104.131.15.45

However, this seems like a brittle solution as I would expect it to fail if a new master is elected. Did I miss something here? Should my DO cluster slaves have an entry for the master in /etc/hosts?
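The failure above comes down to name resolution on the slave: the task fetches cassandra.yaml from http://development-3657-d15:8282, and that hostname has to resolve on every slave. A minimal sketch, with a hypothetical class name, that extracts the host from such a URL and checks whether the local machine can resolve it; running it on a slave with the URL from the wget output would show whether a DNS or /etc/hosts entry is missing.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

/** Hypothetical diagnostic for the configuration-server URL that wget could not resolve. */
final class ConfServerHostCheck {
    private ConfServerHostCheck() {}

    /** Returns the host part of a URL, e.g. "development-3657-d15" for the URL above. */
    static String hostOf(String url) throws URISyntaxException {
        return new URI(url).getHost();
    }

    /** True if this machine can resolve the host name (DNS or /etc/hosts) to an address. */
    static boolean resolves(String host) {
        if (host == null) {
            return false; // InetAddress.getByName(null) would silently return loopback
        }
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

An IP literal such as the one used for cassandra.confServer.hostname always "resolves" without DNS, which is why that workaround succeeds; it does not by itself address the failover concern raised above.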

Requested resources are not the same as the ones in the environment variables

I'm launching the framework using Marathon with the following resources for each node:

    "CASSANDRA_RESOURCE_CPU_CORES": "1",
    "CASSANDRA_RESOURCE_MEM_MB": "848",

But when I'm watching the logs from the framework I see this:

['Not enough cpu resources for role cassandra. Required 1.2 only 1.0 available','Not enough mem resources for role cassandra. Required 928 only 848 available']

Moreover, when I query the REST API, I get the following:

    "cpuCores" : 1.0,
    "memJavaHeapMb" : 256,
    "memAssumeOffHeapMb" : 256,
    "memMb" : 512,

Is this a bug or is this expected? How can I solve this problem?
Thanks
