
Storm on Mesos

Join the chat at https://gitter.im/mesos/storm

Overview

Storm integration with the Mesos cluster resource manager.

To use a release, first unpack the distribution, fill in the configurations listed below in the conf/storm.yaml file, and start Nimbus using storm-mesos nimbus.

Note: It is not necessary to repack the distribution - the configuration is automatically pushed out to the slaves from Nimbus.

Known Deficiencies Versus non-Mesos Storm

  • Storm's topology 'rebalance' action is not supported and is explicitly disabled in the custom Storm version used to build this project.
  • Supervisor logs cannot be loaded via the Storm UI's link to each worker's logviewer, because the Supervisor logs are specialized per-topology when running under Mesos.

Building

Run bin/build-release.sh to download a Storm distribution and bundle it with this framework into a single tarball release.

STORM_RELEASE=X.X.X MESOS_RELEASE=Y.Y.Y bin/build-release.sh

Where X.X.X and Y.Y.Y are the respective versions of Storm and Mesos you wish to build against. This will build a Mesos executor package. You'll need to edit storm.yaml and supply the Mesos master configuration as well as the executor package URI (produced by the step above).

Sub-commands

Sub-commands can be invoked similarly to git sub-commands.

For example, the following command will download the Storm release tarball into the current working directory.

bin/build-release.sh downloadStormRelease

  • main

    Build a Storm package with the Mesos scheduler. The output of this command can be used as the package for mesos.executor.uri.

  • clean

    Attempts to clean working files and directories created when building.

  • downloadStormRelease

    A utility function to download the Storm release tarball for the targeted storm release.

Set the MIRROR environment variable to configure the download mirror.

  • mvnPackage

    Runs the maven targets necessary to build the Storm Mesos framework.

  • prePackage

    Prepares the working directories to be able to package the Storm Mesos framework.

    • Optional argument specifying the Storm release tarball to package against.
  • package

    Packages the Storm Mesos Framework.

  • dockerImage

    Builds a Docker image from the current code.

  • help

    Prints out usage information about the build-release.sh script.

Building Docker images

In order to build the storm-mesos docker image (a docker image ready to be used as mesos.container.docker.image in your storm configuration), run the following:

make help
make images STORM_RELEASE=X.X.X MESOS_RELEASE=Y.Y.Y DOCKER_REPO=mesos/storm

Where X.X.X and Y.Y.Y are the respective versions of Storm and Mesos you wish to build against. This will build a docker image containing a Mesos executor package. The resulting docker images are the following:

± docker images
REPOSITORY                TAG                                    IMAGE ID            CREATED 
mesos/storm               0.1.0-X.X.X-Y.Y.Y-jdk7                 11989e7bfa17        44 minutes ago
mesos/storm               0.1.0-X.X.X-Y.Y.Y-jdk7-onbuild         e7eb52b3eb9f        44 minutes ago

In order to use JDK 8 while building the docker image, run the following:

make images STORM_RELEASE=X.X.X MESOS_RELEASE=Y.Y.Y DOCKER_REPO=mesos/storm JAVA_PRODUCT_VERSION=8

A custom image can be built from the onbuild-tagged docker image, which is based on the dockerfile onbuild/Dockerfile.

Images are also published to Docker Hub under the image mesos/storm at https://hub.docker.com/r/mesos/storm/.

Releasing New Version

Select the Branch

Note that normally your local repo should be synced to the HEAD of github.com:mesos/storm's master branch. However, it is possible that you're working from a different branch and doing releases for an earlier numbered version, per the branching regime we created for handling backwards-incompatible changes in Storm (such as the package path change from backtype.storm.* to org.apache.storm.* in Storm 1.0, and the LocalState implementation change in Storm 0.10).

Storm v1.x

Just use the master branch.

Storm v0.x (e.g., v0.9.6)

Check out the storm-0.x branch, ensuring you are up to date with the latest changes in the remote base repo's storm-0.x branch.

Generate Release

If you are a committer for this repo, then you merely need to run the following command to generate a new release:

mvn release:clean release:prepare

This will automatically update the version fields and push tags that in turn kick off a travis-ci build. This travis-ci build automatically uploads the resultant artifacts to both GitHub and DockerHub.

Running Storm on Mesos

Along with the Mesos master and Mesos cluster, you'll need to run the Storm master as well. Launch Nimbus with this command:

bin/storm-mesos nimbus

It's recommended that you also run the UI on the same machine as Nimbus via the following command:

bin/storm ui

There's a minor bug in the UI regarding how it displays the number of slots in the cluster – you don't need to worry about this, it's an artifact of there being no pre-existing slots when Storm runs on Mesos. Slots are created from available cluster resources when a topology needs its Storm worker processes to be launched.

Topologies are submitted to a Storm/Mesos cluster the exact same way they are submitted to a regular Storm cluster.

Storm/Mesos provides resource isolation between topologies. So you don't need to worry about topologies interfering with one another.

Vagrant setup

For local development and familiarizing yourself with Storm/Mesos, please see the Vagrant setup docs.

Mandatory configuration

  1. One of the following (if both are specified, Docker is preferred):

    • mesos.executor.uri: Once you fill in the configs and repack the distribution, you need to place the distribution somewhere that Mesos executors can find it. Typically this is on HDFS, and this config is the location where you put the distribution.
    • mesos.container.docker.image: You may use a Docker image in place of the executor URI. Take a look at the Dockerfile in the top-level of this repository for an example of how to use it.
  2. mesos.master.url: URL for the Mesos master.

  3. storm.zookeeper.servers: The location of the ZooKeeper servers to be used by the Storm master.
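
Putting these together, a minimal conf/storm.yaml covering the mandatory settings might look like the sketch below. The master URL, ZooKeeper host, and executor URI are illustrative placeholders, not values shipped with this project:

# Minimal sketch of the mandatory settings; all hosts and paths are placeholders
mesos.master.url: "zk://zk1.example.com:2181/mesos"

storm.zookeeper.servers:
  - "zk1.example.com"

# Either an executor package URI ...
mesos.executor.uri: "hdfs://namenode.example.com:8020/storm/storm-mesos.tgz"
# ... or a Docker image (preferred if both are specified):
# mesos.container.docker.image: "mesos/storm"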

Optional configuration

  • mesos.supervisor.suicide.inactive.timeout.secs: Seconds the supervisor waits before exiting (committing suicide) when it has no tasks to run. Defaults to "120".
  • mesos.master.failover.timeout.secs: Framework failover timeout in seconds. Defaults to "2473600".
  • mesos.allowed.hosts: Hosts allowed to run topologies, given as a whitelist of hostnames.
  • mesos.disallowed.hosts: Hosts not allowed to run topologies, given as a blacklist of hostnames.
  • mesos.framework.role: Framework role to use. Defaults to "*".
  • mesos.framework.checkpoint: Whether to enable framework checkpointing. Defaults to false.
  • mesos.local.file.server.port: Port for the local file server to bind to. Defaults to a random port.
  • mesos.framework.name: Framework name. Defaults to "Storm!!!".
  • mesos.framework.user: User that the framework runs as in Mesos. Defaults to the user running Storm on Mesos.
  • mesos.framework.principal: Framework principal to use to register with Mesos.
  • mesos.framework.secret.file: Location of a file that contains the principal's secret. The secret cannot end with a newline.
  • mesos.prefer.reserved.resources: Prefer reserved resources over unreserved (i.e., the "*" role). Defaults to "true".
  • mesos.logviewer.sidecar.enabled: Defaults to "true". If you disable this setting, you will want to launch a logviewer process under supervision on each worker and nimbus host if you want to view logs in the Storm UI.
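
As an illustrative sketch, a framework registering under a dedicated role with authentication might combine several of these optional settings as follows; the role, principal, and secret path are hypothetical placeholders:

# Illustrative combination of optional settings; values are placeholders
mesos.framework.name: "Storm"
mesos.framework.role: "storm"
mesos.framework.checkpoint: true
mesos.framework.principal: "storm-principal"
mesos.framework.secret.file: "/etc/storm/mesos.secret"  # must not end with a newline
mesos.master.failover.timeout.secs: 2473600  # explicit, matching the documented default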

Resource configuration

  • topology.mesos.worker.cpu: CPUs per worker. Defaults to "1".
  • topology.mesos.worker.mem.mb: Memory (in MiB) per worker. Defaults to "1000".
    • worker.childopts: Use this for JVM opts. You should have about 20-25% memory overhead for each task. For example, with -Xmx1000m, you should set topology.mesos.worker.mem.mb: 1200. By default this is platform dependent.
  • topology.mesos.executor.cpu: CPUs per executor. Defaults to "0.1".
  • topology.mesos.executor.mem.mb: Memory (in MiB) per executor. Defaults to "500".
    • supervisor.childopts: Use this for executor (aka supervisor) JVM opts. You should have about 20-25% memory overhead for each task. For example, with -Xmx500m, you should set topology.mesos.executor.mem.mb: 620. By default this is platform dependent.
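
To make the overhead rule concrete, here is a sketch applying the example numbers above (a 1000m worker heap and a 500m supervisor heap):

# Worker: -Xmx1000m heap plus ~20% overhead, so request 1200 MiB from Mesos
topology.mesos.worker.cpu: 1.0
topology.mesos.worker.mem.mb: 1200
worker.childopts: "-Xmx1000m"

# Executor (aka supervisor): -Xmx500m heap plus ~24% overhead, so request 620 MiB
topology.mesos.executor.cpu: 0.1
topology.mesos.executor.mem.mb: 620
supervisor.childopts: "-Xmx500m"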

Automatic Launching of Logviewer

Storm-on-mesos supports automatically launching the logviewer process on each mesos worker host.

The logviewer is launched as a Mesos executor that acts as a "sidecar container" -- one logviewer is launched on each host that holds a Storm Worker for a particular framework.

Caveats:

  • The logviewer TCP port should not be one of those managed by Mesos and offered to the frameworks in the cluster. e.g., you can use port 8000, which is the default in Storm.
  • The logviewer TCP port must be unique for each Storm framework that runs in a given Mesos cluster. e.g., 8000 for one framework, and 8001 for another framework.

Configurations:

  • mesos.logviewer.sidecar.enabled: Set to "true" (which is the default).
  • logviewer.port: Set to a chosen port number, such as 8000 (which is the default).
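
For example, two Storm frameworks sharing one Mesos cluster would each pick a distinct port outside the Mesos-managed port range, per the caveats above. A sketch, with 8000 and 8001 as the illustrative ports:

# storm.yaml for the first framework
mesos.logviewer.sidecar.enabled: true
logviewer.port: 8000

# storm.yaml for a second framework on the same cluster; its port must differ:
# logviewer.port: 8001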

Note that the storm-on-mesos framework attempts to discover missing logviewers and launch them, recording the logviewer processes into ZooKeeper in a subdirectory of storm.zookeeper.root (as configured in your storm.yaml configuration file). Specifically, it is recorded in {storm.zookeeper.root}/storm-mesos/logviewers/.

Running Storm on Marathon

To get started quickly, you can run Storm on Mesos with Marathon and Docker, provided you have Mesos-DNS configured in your cluster. If you're not using Mesos-DNS, set the MESOS_MASTER_ZK environment variable to point to your ZooKeeper cluster. Included is a script (bin/run-with-marathon.sh) which sets the necessary config parameters, and starts the UI and Nimbus. Since Storm writes stateful data to disk, you may want to consider mounting an external volume for the storm.local.dir config param, and pinning Nimbus to a particular host.

It is also possible to pass command-line parameters to both the UI and Nimbus through STORM_UI_OPTS and STORM_NIMBUS_OPTS respectively:

STORM_NIMBUS_OPTS="-c storm.local.dir=/my/mounted/volume -c topology.mesos.worker.cpu=1.5"

You can run this from Marathon, using the example app JSON below:

{
  "id": "storm-nimbus",
  "cmd": "./bin/run-with-marathon.sh",
  "cpus": 1.0,
  "mem": 1024,
  "ports": [0, 1],
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesos/storm",
      "network": "HOST",
      "forcePullImage":true
    }
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 120,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}

Running an example topology

Once Nimbus is running, you can launch one of the storm-starter topologies present in the examples/ dir of the storm release tarballs. So you'd download the appropriate version of storm to a machine with access to your storm cluster, then expand the tarball and cd into the resultant directory, from which you will run a command like the one below.

However, first you'll need to know the Thrift host and API port. In the Marathon example above, the port will be the second one assigned by Marathon. For example, if the host is 10.0.0.1 and second port is 32001, run:

$ ./bin/storm jar -c nimbus.host=10.0.0.1 -c nimbus.thrift.port=32001 examples/storm-starter/storm-starter-topologies-1.0.2.jar org.apache.storm.starter.WordCountTopology word-count
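
If you submit topologies often, the same settings can instead live in ~/.storm/storm.yaml on the submitting machine, so the -c flags can be dropped. A sketch reusing the example host and port above:

# ~/.storm/storm.yaml on the machine submitting topologies
nimbus.host: "10.0.0.1"
nimbus.thrift.port: 32001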

Running without Marathon

If you'd like to run the example above without Marathon, you can do so by specifying the 2 required ports and the MESOS_SANDBOX path, then running the container. For example:

$ docker run -i --net=host -e PORT0=10000 -e PORT1=10001 -e MESOS_SANDBOX=/var/log/storm -t mesos/storm ./bin/run-with-marathon.sh

storm's People

Contributors

benwhitehead, brndnmtthws, changreytang, chengweiv5, debugger87, drewrobb, dskarthick, erikdw, franklin-stripe, fuji-151a, gitter-badger, hobofan, jeckhart, jessicalhartog, michaelmoss, mrbar42, pawelchcki, payneio, pdread100, salimane, strat0sphere, tnachen, tysonnorris, viktortnk, yuusaku-t

storm's Issues

add unit test for more than 1 storm workers being launched in 1 mesos executor

We currently lack unit tests for the situation with "more than 1 storm workers being launched in 1 mesos executor". As a result we've made an ugly chain of changes related to supporting this behavior:

PR #65 created a subtle regression where we could never launch more than 1 worker in a given executor (so at most 1 worker process per topology per host).
PR #78 attempted to fix this issue, but also added a new bug, where we subtract executor resources from offers more than necessary.
PR #79 attempted to fix the bug introduced in PR #78, but it introduced another bug (a NullPointerException crash: after the first pass through the computeResourcesForSlot() main loop, the executorCpuResources and executorMemResources variables are set to null because subtractedExecutorResources was set to true on the 1st pass).
PR #82 then is attempting to fix the NPE introduced in PR #79.

So we should add a unit test that prevents these kinds of regressions.

The missing link

I've been migrating my storm infrastructure over to be run by mesos and now I'm ready for the final test -- run the starter topology. Problem is there's absolutely no information anywhere on the web about how to actually do this. All links are either dead or point to something irrelevant.

So from your page:
" ./bin/storm jar -c nimbus.host=10.0.0.1 -c nimbus.thrift.port=32001 examples/storm-starter/storm-starter-topologies-0.9.6.jar storm.starter.WordCountTopology word-count"

given that the nimbus is off in a remote docker container and there is no local storm jar to run "storm jar" against, how does one actually execute this command?

Dockerize supervisor

There needs to be a docker image to launch a supervisor using container info, for isolation purposes.

docker build inefficiency - maven artifacts not cached across runs

As of PR #65 , the standard way of building docker images for this project has changed. Now the bin/build-release.sh script is invoked as part of building the docker image. This means that every time the code is changed as you iterate, when you build the docker image all of the maven artifacts have to be downloaded again. We should figure out some way to cache these artifacts across build runs.

Here are some potentially helpful links:

Support for highly available nimbus upcoming feature in storm 1.0

Jumping the gun here, but storm 1.0 (originally numbered 0.11) supports HA nimbus when released: apache/storm#354. It would be extremely nice if the mesos storm framework supported this directly. I see two options:

A) Just allow the user to run multiple storm-mesos nimbus processes directly, and configure the HA options separately if desired. Make sure that the framework leader and associated state follows the elected nimbus master.

B) Split the framework scheduler and the actual nimbus process, so that the scheduler will launch N nimbus instances as mesos tasks.

Seems like choice A) would be a lot simpler to implement, but I'm not really sure. A) would certainly not be any more difficult to deploy assuming that the user is already using a framework like marathon that could provision multiple instances of nimbus (and would not re-implement this responsibility).

Configurable worker CPU/mem per topology.

I know this makes it harder to determine available slots, but it would be very useful for executing a topology with the resources it requires.
Probably that means that by default the number of available slots is 0 and a new slot is created when a topology is submitted to the storm cluster. Then determine the required resources and allocate them if possible, or allocate them when they become available.

build-release.sh fails

I need a recent version of storm-mesos (packaging at least storm 0.9.1) but failed to find it anywhere, so I'm trying to build a release myself.

Launching build-release.sh fails with the following output:

++ rm -rf _release
+ echo
++ mkdir -p _release
+ echo
./bin/build-release.sh: line 18: $1: unbound variable
+ echo
+ cd _release
++ unzip storm.zip
unzip: cannot find or open storm.zip, storm.zip.zip or storm.zip.ZIP.
+ echo
++ rm storm.zip
rm: storm.zip: No such file or directory
+ echo
++ mv apache-storm* storm
mv: rename apache-storm* to storm: No such file or directory
+ echo
+ cd ..
++ rm _release/storm/*.jar
rm: _release/storm/*.jar: No such file or directory
+ echo
++ cp target/original-storm-0.9.2-incubating.jar target/storm-0.9.2-incubating.jar _release/storm/lib/
usage: cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file target_file
       cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file ... target_directory
+ echo
++ cp bin/storm-mesos _release/storm/bin/
cp: directory _release/storm/bin does not exist
+ echo
++ mkdir -p _release/storm/native
+ echo
++ cp storm.yaml _release/storm/conf/storm.yaml
cp: _release/storm/conf/storm.yaml: No such file or directory
+ echo
+ cd _release
++ mv storm storm-mesos-0.9.2-incubating
+ echo
++ tar czf storm-mesos-0.9.2-incubating.tgz storm-mesos-0.9.2-incubating
+ echo
++ cp storm-mesos-0.9.2-incubating.tgz ../
+ echo
+ cd ..

The _release folder is populated but missing a lot of files.

Am I missing something important here?

Thank you.

support using WebHDFS to serve storm-mesos tarball

PR #57 was an attempt to support use of WebHDFS to fetch the storm-mesos tarball. However, the implementation in #57 was not ideal for non-WebHDFS use cases, since it avoids using the Mesos Fetcher, and thus prevents any benefits of caching, etc. that are added to the Fetcher.

As to why we need any special handling for using WebHDFS, it's a bit complicated as you can read here and here. Basically there are some deficiencies in Mesos and WebHDFS which prevent using the Mesos Fetcher to download a tarball from WebHDFS. The Mesos deficiencies have some tickets for them already, but I don't think a ticket exists yet for WebHDFS.

storm-mesos versioning

@brndnmtthws @tnachen @DarinJ @dsKarthick @JessicaLHartog

I'm opening this issue to create a place for us to brainstorm and decide on a new versioning scheme for this project.

Current version scheme

The version is currently just directly set to the version of storm being "used". Where used means:

  • included as build dependency of the java bits in this project
  • bundling of the storm binary bits into this project's resultant package

This unfortunately doesn't allow us to vary the version of this project as we make changes to the code that are independent of the storm version.

Simplistic storm-independent versioning

The first idea is to simply set the version at say 0.1.0 to start, and then we increment the version as we make changes to the code. However, this limits us to having only ever 1 version of storm and mesos within a given "version" of the code.

Supporting multiple storm versions

Per this discussion in #68, we are planning to support multiple versions of storm from a given version of the code in this project. This implies that we should be somehow tagging / naming / versioning the resultant package with the storm version used.

Supporting multiple mesos versions

We could also potentially build against multiple mesos versions, in case there are backwards incompatible changes.

Docker image versioning

I'm admittedly ignorant about Docker image versioning, but when I look at the builds on Dockerhub, I cannot even figure out what commit SHA the image was built from. I feel like we can do better with the versioning there, ideally taking advantage of the conclusion of this discussion.

Proposal

Use maven classifiers and follow the maven artifact description scheme of suffixing the artifact description with the classifier string. Whoa that's a mouthful, let me give an example:

Basically, an example classifier could be: storm0.9.6. Then the artifact & resultant package could be called:

  • maven artifact: storm-mesos-0.1.0-storm0.9.6.jar
  • built package: storm-mesos-0.1.0-storm0.9.6.tar.gz

Then if we decided to start tweaking the mesos version as well, we could make this even longer:

  • built package: storm-mesos-0.1.0-storm0.9.6-mesos0.25.0.tar.gz

improve handling of mesos version in the build

As noted in PR #91, we have a couple of manual modifications that are necessary when changing the mesos version:

  1. pom.xml's mesos.version property needs to be updated so that the package's included mesos dependency is correct.
  2. (only applicable when using Docker image) we need to update the version of mesos in the Dockerfile.

Update pom.xml's mesos.version

We should consider some mechanism for updating the mesos.version property without resorting to something like sed. e.g., maybe one of these Maven plugins:

  • properties
  • versions
  • replacer

(DONE) Update Dockerfile's mesos version:

Ideally we would just use a vanilla ubuntu Docker image as the FROM, and then install the mesos package for the chosen mesos version. Unfortunately, the mesos packages published by Mesosphere have an unpredictable superfluous identifier in the names, which prevents simply doing something like this:

FROM ubuntu:14.04
...
RUN apt-get install -y mesos=0.26.0-0.2.145.ubuntu1404

So instead of this we can resort to using sed to update the base image in the Dockerfile, using a lookup-table from bin/build-release.sh to choose the appropriate base image for each supported mesos version.

Connection Refused: storm-mesos

Trying to follow this - https://mesosphere.com/docs/tutorials/run-storm-on-mesos/#step-1

./bin/storm-mesos nimbus

2014-11-17 16:33:52 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=localhost:2181 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@5b4aa17
2014-11-17 16:33:52 o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181
2014-11-17 16:33:52 o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_67]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) ~[na:1.7.0_67]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) ~[zookeeper-3.3.3.jar:3.3.3-1073969]
2014-11-17 16:33:53 o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181
2014-11-17 16:33:53 o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
bash-3.2$ cat conf/storm.yaml 
# Please change these for your cluster 
# to reflect your cluster settings
# -----------------------------------------------------------
mesos.master.url: "localhost:5050"
storm.zookeeper.servers:
    - "localhost"
nimbus.host: "localhost"
# -----------------------------------------------------------

# You should not need to change anything below this line
#--------------------------------------------------------

# Use the public Mesosphere Storm build
# Please note that it won't work with other distributions
mesos.executor.uri: "http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz"

# Use Netty to avoid ZMQ dependencies
storm.messaging.transport: "backtype.storm.messaging.netty.Context"

storm.local.dir: "storm-local"

mesos.framework.role: "*"
mesos.framework.checkpoint: false

#mesos.allowed.hosts:
#  - host1
#mesos.disallowed.hosts:
#  - host1
bash-3.2$ 

Note that the mesos master is running at port 5050 on the localhost.

I tried a couple of things to get this to work, but no success...

export JAVA_OPTS="-Djava.net.preferIPv4Stack=true"

Storm should detect correct nimbus.host on multi-master mesos cluster

My setup:
3 masters (mesos, marathon, chronos, storm) - ubuntu trusty
ips: 192.168.56.{10,11,12} - MASTER(1,2,3)

Using latest build (storm-mesos 0.9.3) + mesos 0.21.1
Running storm UI on all masters
Nimbus gets started on MASTER1

After startup, mesos elects MASTER2 to be the leader.

storm.yaml

mesos.master.url: "zk://192.168.56.10:2181,192.168.56.11:2181,192.168.56.12:2181/mesos"

storm.zookeeper.servers:
  - 192.168.56.10
  - 192.168.56.11
  - 192.168.56.12

nimbus.host: 192.168.56.11  
mesos.executor.uri: "//opt/storm-mesos-0.9.3.tgz"

storm.messaging.transport: "backtype.storm.messaging.netty.Context"

storm.local.dir: "/opt/storm_local"
storm.log.dir: "/var/log/storm"

ui.port: 9998

mesos.framework.role: "*"
mesos.framework.checkpoint: false
mesos.framework.name: "Storm 0.9.3"

When I correctly specify MASTER2 (the current leader) as nimbus.host, the UI works, but when I use the MASTER1 or MASTER3 IP I get connection refused. My problem is that the setup is automated (Ansible) and I don't know which master gets elected.

Dockerized workers having issues starting container

I have been trying to get the dockerized workers working, but it seems like the workers are not starting properly on mesos.

Here[1] is my storm.yaml. I'm getting this[2] error. Seems like MESOS_NATIVE_JAVA_LIBRARY is not getting passed into the place that's creating the docker jobs. Can someone please help?

[1]

mesos.master.url: "zk://my-zk-1.example.com:2181/mesos"                                                
storm.zookeeper.servers:                                                                                               
  - my-zk-1.example.com                                                                              

nimbus.host: "my-nimbus.example.com"                                                                 
# -----------------------------------------------------------                                                          

# Worker resources                                                                                       
topology.mesos.worker.cpu: 1.0                                                                           
# Worker heap with 20% overhead                                                                          
topology.mesos.worker.mem.mb: 1200                                                                       
worker.childopts: "-Xmx1000m"                                                                            

# Supervisor resources                                                                                   
topology.mesos.executor.cpu: 0.1                                                                         
topology.mesos.executor.mem.mb: 500 # Supervisor memory, with 20% overhead                               
supervisor.childopts: "-Xmx256m"                                                                         

# The default behavior is to launch the logviewer unless autostart is false.                             
# If you enable the logviewer, you'll need to add memory overhead to the                                 
# executor for the logviewer.                                                                            
logviewer.port: 8000                                                                                     
logviewer.childopts: "-Xmx128m"                                                                          
logviewer.cleanup.age.mins: 10080                                                                        
logviewer.appender.name: "A1"                                                                            
supervisor.autostart.logviewer: true                                                                     

mesos.container.docker.image: mesosphere/storm                           

storm.messaging.transport: "backtype.storm.messaging.netty.Context"                                      

storm.local.dir: "storm-local"                                                                           

# role must be one of the mesos-master's roles defined in the --roles flag                               
#                                                                                                        
mesos.framework.role: "*"                                                                                
mesos.framework.checkpoint: true                                                                         
mesos.framework.name: "Storm"                                                                            

[2]

java.lang.UnsatisfiedLinkError: Expecting an absolute path of the library: null
    at java.lang.Runtime.load0(Runtime.java:792) ~[na:1.7.0_85]
    at java.lang.System.load(System.java:1062) ~[na:1.7.0_85]
    at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:159) ~[storm-0.9.6.jar:na]
    at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:188) ~[storm-0.9.6.jar:na]
    at org.apache.mesos.MesosExecutorDriver.<clinit>(MesosExecutorDriver.java:52) ~[storm-0.9.6.jar:na]
    at storm.mesos.MesosSupervisor.prepare(MesosSupervisor.java:67) [storm-0.9.6.jar:na]
    at backtype.storm.daemon.supervisor$fn__5114$exec_fn__1104__auto____5115.invoke(supervisor.clj:407) ~[storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.AFn.applyToHelper(AFn.java:167) ~[clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$fn__5114$mk_supervisor__5140.doInvoke(supervisor.clj:405) [storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:630) [storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.daemon.supervisor.launch(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
    at storm.mesos.MesosSupervisor.main(MesosSupervisor.java:50) [storm-0.9.6.jar:na]
2015-11-24T07:10:05.758+0000 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$fn__5114$mk_supervisor__5140.doInvoke(supervisor.clj:405) [storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:630) [storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.daemon.supervisor.launch(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
    at storm.mesos.MesosSupervisor.main(MesosSupervisor.java:50) [storm-0.9.6.jar:na]

storm-mesos 0.9.3 download location

Seems like http://downloads.mesosphere.io/storm/storm-mesos-0.9.3.tgz is still not available. Do we know when it's going to be available? In the meantime I created my own release and uploaded it to a local location, but I'd like to get it from the mesosphere location.

Only one worker node is started

Not sure if this is the right place to ask this question, but I didn't find a mailing list specific to this project.

I have the mesos-storm framework running via marathon as per the instructions in the readme. I am able to run topologies successfully. However, it seems that only one storm worker (aka mesos task) is launched. I was expecting this to be controlled by the config when submitting the topology, e.g. I would have expected the following to launch 4 workers.

val topology = ...
val conf = new Config()
conf.setDebug(true)
conf.setNumWorkers(4)
StormSubmitter.submitTopologyWithProgressBar("word-counter-drpc", conf, topology.build())

Is there some other configuration needed to launch multiple workers?

add conf directory

I would like to add a conf directory to the repo to put the storm.yaml in and a log4j.xml so the supervisor gets deployed with logging turned on. This will also help bring the configuration in line with storm's.

improve docker image tag name

@DarinJ had the idea to embed the storm/mesos versions into the tag (possibly in the version).
Benefit: allow it to be trivial to release a few different images on dockerhub.
Related to #18, #83, #92.

Worker launch command contains references to non-existent & invalid directories

Commit I am working on:
a49e8ce

Actions:

  • vagrant up
  • start nimbus
  • submit a topology

Problem:
Following is the ps output of a running worker. The directories like /mnt/mesos/sandbox/logs, /opt/storm/lib/, etc. do not exist.

root     12914  5.8  7.0 2963108 285176 ?      Sl   17:36  15:19 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Xmx384m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump -Djava.library.path=storm-local/supervisor/stormdist/SampleTopology-1-1458074947/resources/Linux-amd64:storm-local/supervisor/stormdist/SampleTopology-1-1458074947/resources:/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=worker-31000.log -Dstorm.home=/opt/storm -Dstorm.conf.file= -Dstorm.options=storm.log.dir%3D%2Fmnt%2Fmesos%2Fsandbox%2Flogs -Dstorm.log.dir=/mnt/mesos/sandbox/logs -Dlogback.configurationFile=/opt/storm/logback/cluster.xml -Dstorm.id=SampleTopology-1-1458074947 -Dworker.id=869a17ff-9121-432b-bd43-ae4dfec81ef0 -Dworker.port=31000 -cp /opt/storm/lib/commons-io-2.4.jar:/opt/storm/lib/commons-codec-1.6.jar:/opt/storm/lib/commons-exec-1.1.jar:/opt/storm/lib/reflectasm-1.07-shaded.jar:/opt/storm/lib/storm-core-0.9.6.jar:/opt/storm/lib/disruptor-2.10.4.jar:/opt/storm/lib/commons-fileupload-1.2.1.jar:/opt/storm/lib/ring-core-1.1.5.jar:/opt/storm/lib/json-simple-1.1.jar:/opt/storm/lib/clj-stacktrace-0.2.2.jar:/opt/storm/lib/tools.macro-0.1.0.jar:/opt/storm/lib/math.numeric-tower-0.0.1.jar:/opt/storm/lib/jetty-6.1.26.jar:/opt/storm/lib/tools.logging-0.2.3.jar:/opt/storm/lib/asm-4.0.jar:/opt/storm/lib/commons-lang-2.5.jar:/opt/storm/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/lib/clout-1.0.1.jar:/opt/storm/lib/core.incubator-0.1.0.jar:/opt/storm/lib/carbonite-1.4.0.jar:/opt/storm/lib/logback-classic-1.0.13.jar:/opt/storm/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/lib/jgrapht-core-0.9.0.jar:/opt/storm/lib/tools.cli-0.2.4.jar:/opt/storm/lib/ring-servlet-0.3.11.jar:/opt/storm/lib/chill-java-0.3.5.jar:/opt/storm/lib/hiccup-0.3.6.jar:/opt/storm/lib/objenesis-1.2.jar:/opt/storm/lib/clojure-1.5.1.jar:/opt/storm/lib/snakeyaml-1.11.jar:/opt/storm/lib/kryo-2.21.jar:/opt/storm/lib/storm-mesos-0.1.0-SNAPSHOT-storm0.9.6-mesos0.27.0.jar:/opt/storm/lib/joda-time-2.0.jar:/opt/storm/lib/ring-devel-0.3.11.jar:/opt/storm/lib/commons-logging-1.1.3.jar:/opt/storm/lib/minlog-1.2.jar:/opt/storm/lib/servlet-api-2.5.jar:/opt/storm/lib/compojure-1.1.3.jar:/opt/storm/lib/slf4j-api-1.7.5.jar:/opt/storm/lib/logback-core-1.0.13.jar:/opt/storm/lib/jetty-util-6.1.26.jar:/opt/storm/lib/clj-time-0.4.1.jar:/opt/storm/lib/jline-2.11.jar:/opt/storm/conf:storm-local/supervisor/stormdist/SampleTopology-1-1458074947/stormjar.jar backtype.storm.daemon.worker SampleTopology-1-1458074947 master 31000 869a17ff-9121-432b-bd43-ae4dfec81ef0

And making that readable instead of 1 fat line:

% echo '/usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Xmx384m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump -Djava.library.path=storm-local/supervisor/stormdist/SampleTopology-1-1458074947/resources/Linux-amd64:storm-local/supervisor/stormdist/SampleTopology-1-1458074947/resources:/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=worker-31000.log -Dstorm.home=/opt/storm -Dstorm.conf.file= -Dstorm.options=storm.log.dir%3D%2Fmnt%2Fmesos%2Fsandbox%2Flogs -Dstorm.log.dir=/mnt/mesos/sandbox/logs -Dlogback.configurationFile=/opt/storm/logback/cluster.xml -Dstorm.id=SampleTopology-1-1458074947 -Dworker.id=869a17ff-9121-432b-bd43-ae4dfec81ef0 -Dworker.port=31000 -cp /opt/storm/lib/commons-io-2.4.jar:/opt/storm/lib/commons-codec-1.6.jar:/opt/storm/lib/commons-exec-1.1.jar:/opt/storm/lib/reflectasm-1.07-shaded.jar:/opt/storm/lib/storm-core-0.9.6.jar:/opt/storm/lib/disruptor-2.10.4.jar:/opt/storm/lib/commons-fileupload-1.2.1.jar:/opt/storm/lib/ring-core-1.1.5.jar:/opt/storm/lib/json-simple-1.1.jar:/opt/storm/lib/clj-stacktrace-0.2.2.jar:/opt/storm/lib/tools.macro-0.1.0.jar:/opt/storm/lib/math.numeric-tower-0.0.1.jar:/opt/storm/lib/jetty-6.1.26.jar:/opt/storm/lib/tools.logging-0.2.3.jar:/opt/storm/lib/asm-4.0.jar:/opt/storm/lib/commons-lang-2.5.jar:/opt/storm/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/lib/clout-1.0.1.jar:/opt/storm/lib/core.incubator-0.1.0.jar:/opt/storm/lib/carbonite-1.4.0.jar:/opt/storm/lib/logback-classic-1.0.13.jar:/opt/storm/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/lib/jgrapht-core-0.9.0.jar:/opt/storm/lib/tools.cli-0.2.4.jar:/opt/storm/lib/ring-servlet-0.3.11.jar:/opt/storm/lib/chill-java-0.3.5.jar:/opt/storm/lib/hiccup-0.3.6.jar:/opt/storm/lib/objenesis-1.2.jar:/opt/storm/lib/clojure-1.5.1.jar:/opt/storm/lib/snakeyaml-1.11.jar:/opt/storm/lib/kryo-2.21.jar:/opt/storm/lib/storm-mesos-0.1.0-SNAPSHOT-storm0.9.6-mesos0.27.0.jar:/opt/storm/lib/joda-time-2.0.jar:/opt/storm/lib/ring-devel-0.3.11.jar:/opt/storm/lib/commons-logging-1.1.3.jar:/opt/storm/lib/minlog-1.2.jar:/opt/storm/lib/servlet-api-2.5.jar:/opt/storm/lib/compojure-1.1.3.jar:/opt/storm/lib/slf4j-api-1.7.5.jar:/opt/storm/lib/logback-core-1.0.13.jar:/opt/storm/lib/jetty-util-6.1.26.jar:/opt/storm/lib/clj-time-0.4.1.jar:/opt/storm/lib/jline-2.11.jar:/opt/storm/conf:storm-local/supervisor/stormdist/SampleTopology-1-1458074947/stormjar.jar backtype.storm.daemon.worker SampleTopology-1-1458074947 master 31000 869a17ff-9121-432b-bd43-ae4dfec81ef0' | tr ' ' '\n'
/usr/lib/jvm/java-7-openjdk-amd64/bin/java
-server
-Xmx384m
-XX:+PrintGCDetails
-Xloggc:artifacts/gc.log
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=1M
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=artifacts/heapdump
-Djava.library.path=storm-local/supervisor/stormdist/SampleTopology-1-1458074947/resources/Linux-amd64:storm-local/supervisor/stormdist/SampleTopology-1-1458074947/resources:/usr/local/lib:/opt/local/lib:/usr/lib
-Dlogfile.name=worker-31000.log
-Dstorm.home=/opt/storm
-Dstorm.conf.file=
-Dstorm.options=storm.log.dir%3D%2Fmnt%2Fmesos%2Fsandbox%2Flogs
-Dstorm.log.dir=/mnt/mesos/sandbox/logs
-Dlogback.configurationFile=/opt/storm/logback/cluster.xml
-Dstorm.id=SampleTopology-1-1458074947
-Dworker.id=869a17ff-9121-432b-bd43-ae4dfec81ef0
-Dworker.port=31000
-cp
/opt/storm/lib/commons-io-2.4.jar:/opt/storm/lib/commons-codec-1.6.jar:/opt/storm/lib/commons-exec-1.1.jar:/opt/storm/lib/reflectasm-1.07-shaded.jar:/opt/storm/lib/storm-core-0.9.6.jar:/opt/storm/lib/disruptor-2.10.4.jar:/opt/storm/lib/commons-fileupload-1.2.1.jar:/opt/storm/lib/ring-core-1.1.5.jar:/opt/storm/lib/json-simple-1.1.jar:/opt/storm/lib/clj-stacktrace-0.2.2.jar:/opt/storm/lib/tools.macro-0.1.0.jar:/opt/storm/lib/math.numeric-tower-0.0.1.jar:/opt/storm/lib/jetty-6.1.26.jar:/opt/storm/lib/tools.logging-0.2.3.jar:/opt/storm/lib/asm-4.0.jar:/opt/storm/lib/commons-lang-2.5.jar:/opt/storm/lib/ring-jetty-adapter-0.3.11.jar:/opt/storm/lib/clout-1.0.1.jar:/opt/storm/lib/core.incubator-0.1.0.jar:/opt/storm/lib/carbonite-1.4.0.jar:/opt/storm/lib/logback-classic-1.0.13.jar:/opt/storm/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/lib/jgrapht-core-0.9.0.jar:/opt/storm/lib/tools.cli-0.2.4.jar:/opt/storm/lib/ring-servlet-0.3.11.jar:/opt/storm/lib/chill-java-0.3.5.jar:/opt/storm/lib/hiccup-0.3.6.jar:/opt/storm/lib/objenesis-1.2.jar:/opt/storm/lib/clojure-1.5.1.jar:/opt/storm/lib/snakeyaml-1.11.jar:/opt/storm/lib/kryo-2.21.jar:/opt/storm/lib/storm-mesos-0.1.0-SNAPSHOT-storm0.9.6-mesos0.27.0.jar:/opt/storm/lib/joda-time-2.0.jar:/opt/storm/lib/ring-devel-0.3.11.jar:/opt/storm/lib/commons-logging-1.1.3.jar:/opt/storm/lib/minlog-1.2.jar:/opt/storm/lib/servlet-api-2.5.jar:/opt/storm/lib/compojure-1.1.3.jar:/opt/storm/lib/slf4j-api-1.7.5.jar:/opt/storm/lib/logback-core-1.0.13.jar:/opt/storm/lib/jetty-util-6.1.26.jar:/opt/storm/lib/clj-time-0.4.1.jar:/opt/storm/lib/jline-2.11.jar:/opt/storm/conf:storm-local/supervisor/stormdist/SampleTopology-1-1458074947/stormjar.jar
backtype.storm.daemon.worker
SampleTopology-1-1458074947
master
31000
869a17ff-9121-432b-bd43-ae4dfec81ef0

Worker jobs getting lost since they cannot download storm conf file

I launched my first topology and it seems the worker processes cannot come up since they cannot download the conf file. Here is what I see in the logs.

I0901 00:16:35.558715 19563 fetcher.cpp:214] Fetching URI 'http://hdfs-nn.myhost.com:50070/webhdfs/v1/binaries/storm-mesos/storm-mesos-0.9.3.tgz?op=OPEN'
I0901 00:16:35.558866 19563 fetcher.cpp:125] Fetching URI 'http://hdfs-nn.myhost.com:50070/webhdfs/v1/binaries/storm-mesos/storm-mesos-0.9.3.tgz?op=OPEN' with os::net
I0901 00:16:35.558888 19563 fetcher.cpp:135] Downloading 'http://hdfs-nn.myhost.com:50070/webhdfs/v1/binaries/storm-mesos/storm-mesos-0.9.3.tgz?op=OPEN' to '/mesos/workLogs/slaves/20150730-232738-4076896266-5050-228496-S3/frameworks/20150825-011215-4076896266-5050-231386-0000/executors/SchemaChangeNotificationTopology-1-1441066588/runs/331c3f07-464a-4767-bdb2-4360fe9ce4d6/storm-mesos-0.9.3.tgz?op=OPEN'
I0901 00:16:36.660876 19563 fetcher.cpp:214] Fetching URI 'http://nimbus.myhost.com:39163/conf/storm.yaml'
I0901 00:16:36.660948 19563 fetcher.cpp:125] Fetching URI 'http://nimbus.myhost.com:39163/conf/storm.yaml' with os::net
I0901 00:16:36.660975 19563 fetcher.cpp:135] Downloading 'http://nimbus.myhost.com:39163/conf/storm.yaml' to '/mesos/workLogs/slaves/20150730-232738-4076896266-5050-228496-S3/frameworks/20150825-011215-4076896266-5050-231386-0000/executors/SchemaChangeNotificationTopology-1-1441066588/runs/331c3f07-464a-4767-bdb2-4360fe9ce4d6/storm.yaml'
E0901 00:16:36.726596 19563 fetcher.cpp:141] Error downloading resource, received HTTP/FTP return code 404
Failed to fetch: http://nimbus.myhost.com:39163/conf/storm.yaml
Failed to synchronize with slave (it's probably exited)

The command I use to deploy: ./storm jar ~/Desktop/mytopology.jar com.chinthaka.org.MyAwesomeTopology

~/.storm/storm.yaml file

storm.zookeeper.servers:
- zk.myhost.com
storm.zookeeper.port: 2181
nimbus.host: nimbus.myhost.com

Maybe this is minor, but the worker process is marked as LOST instead of KILLED or FAILED in the mesos UI.

Storm framework consumes all mesos resources

Whenever I start the storm-mesos nimbus framework in mesos, it grabs all the available resources, without a single topology submitted. In my little 3-node mesos-slave cluster I have 3.3 CPUs and 13.5GB Mem free before I start the storm-mesos nimbus, and 0 CPU's and 0 Mem immediately after. This is as reported on the mesos console.

Switching to the Frameworks tab on the mesos console shows Storm with 0 Active Tasks, 3.1 CPUs and 11.5GB Memory.

Before Storm

(screenshots of the mesos console showing the free resources before starting the framework)

After Storm

(screenshots of the mesos console showing 0 CPUs and 0 Mem free after starting the framework)

The nimbus log does report that it is declining offers every 10 seconds or so ...

2015-06-24T17:59:36.377+0000 s.m.MesosNimbus [INFO] Declining offers because no topologies need assignments
2015-06-24T17:59:46.388+0000 s.m.MesosNimbus [INFO] Declining offers because no topologies need assignments

Is it normal to expect this? I expected it to consume no resources until a topology is submitted, at which point it would consume whatever the topology required.

automate release tagging and building

We will soon have an actual versioning scheme implemented in this project (see #18, #76, #77).
When #77 is done, we can follow the process documented in #18 to perform releases.

However, it would be way snazzier if we could leverage travis-ci.org to:

  1. automatically build the framework artifacts/packages/tarballs
  2. tag the code with the version
  3. upload the resultant framework packages/tarballs into the github repo

Logviewer ?

Hi

I would like to access the storm logviewer on one of the supervisors.
How do I do that?

(AFAIK the supervisors are started by mesos)

Thanks

Allow topologies to be executed as specific users

Some users want the option of running their topologies as a specific user. On the surface this would be easy to do by setting FrameworkInfo.setUser() to that user. However, that would force the Supervisor and LogViewer (PR) to run as that user as well. There may be scenarios where this is not advisable.

It appears that if CommandInfo.setUser() is set then the supervisor/topologies will be run as that user. I know mesos/myriad has gone through extensive coding to get this right, including chown of the installation directories.

Clarifications from documentation

Do I need to repackage storm-mesos if I change the configs?

Do I need to fill in these settings?

mesos.allowed.hosts:

- host1

mesos.disallowed.hosts:

- host1

Not sure, because the storm.yaml file says I should not need to change anything below a certain point.

Are there any default values for all the resource configs or do we need to set them?

Also, when I start up, the storm-ui tells me I have zero Supervisors, Used slots, Free slots, and Total slots. Not sure if I need to add any configs, because currently my topology uploads but then fails to run.

Thank you

LocalFileServer does not resolve to fully qualified host

The LocalFileServer.java file starts a web server for serving the storm.yaml file. It resolves the host to its local hostname. For our use-case we require the fully qualified hostname, including its domain name:

private String getHost() throws Exception {
  final String envHost = System.getenv("MESOS_NIMBUS_HOST");
  if (envHost == null) {
    return InetAddress.getLocalHost().getHostName();
  } else {
    return envHost;
  }
}

Can getHostName() be changed to getCanonicalHostName()?

Concurrent access to state on local FS by multiple supervisors

Hi,
we are running a storm-mesos cluster and occasionally workers die or are "lost" in mesos. When this happens it often coincides with errors in the logs related to the supervisors' local state.

By looking at the storm code, it seems this might be caused by the way multiple supervisor processes access the local state in the same directory via VersionedStore.
For example: https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L434

I've filed a bug report on Storm JIRA, but it seems this issue is more with the way storm-mesos launches supervisors, as Storm states that multiple supervisors should not run on the same storm-local dir (https://issues.apache.org/jira/browse/STORM-1043).

Some examples of exceptions:


java.lang.RuntimeException: Version already exists or data already exists
at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:85) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:79) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.LocalState.persist(LocalState.java:101) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.LocalState.put(LocalState.java:82) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.LocalState.put(LocalState.java:76) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this7400.invoke(supervisor.clj:382) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]


java.io.FileNotFoundException: File '/var/lib/storm/supervisor/localstate/1441034838231' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[commons-io-2.4.jar:2.4]
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[commons-io-2.4.jar:2.4]
at backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:61) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.utils.LocalState.get(LocalState.java:72) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
at clojure.core$partial$fn4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

java.lang.RuntimeException: Couldn't greate generated-conf dir

I'm getting the following on startup of the nimbus. Anything I should have set?

2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:java.io.tmpdir=/tmp
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:java.compiler=<NA>
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:os.name=Linux
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:os.arch=amd64
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:os.version=3.10.0-123.el7.x86_64
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:user.name=storm
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:user.home=/home/storm
2015-11-29 18:00:07 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/opt/storm
2015-11-29 18:00:07 o.m.log [INFO] Logging to Logger[org.mortbay.log] via org.mortbay.log.Slf4jLog
2015-11-29 18:00:07 o.m.log [INFO] jetty-6.1.26
2015-11-29 18:00:07 o.m.log [INFO] Started [email protected]:46649
2015-11-29 18:00:07 s.m.MesosNimbus [INFO] Started HTTP server from which config for the MesosSupervisor's may be fetched. URL: http://idsp-mm-101.local:46649/generated-conf
2015-11-29 18:00:07 s.m.MesosNimbus [ERROR] Failed to prepare scheduler
java.lang.RuntimeException: Couldn't greate generated-conf dir
    at storm.mesos.MesosNimbus.initialize(MesosNimbus.java:162) [original-storm-0.9.5.jar:na]
    at storm.mesos.MesosNimbus.prepare(MesosNimbus.java:123) [original-storm-0.9.5.jar:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_91]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_91]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_91]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_91]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) [clojure-1.5.1.jar:na]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$fn__3724$exec_fn__1103__auto____3725.invoke(nimbus.clj:896) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:163) [clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:617) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$fn__3724$service_handler__3814.doInvoke(nimbus.clj:895) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:421) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1152) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1184) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.nimbus.launch(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
    at storm.mesos.MesosNimbus.main(MesosNimbus.java:98) [original-storm-0.9.5.jar:na]
2015-11-29 18:00:07 b.s.d.nimbus [ERROR] Error on initialization of server service-handler
java.lang.RuntimeException: java.lang.RuntimeException: Couldn't greate generated-conf dir
    at storm.mesos.MesosNimbus.prepare(MesosNimbus.java:137) [original-storm-0.9.5.jar:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_91]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_91]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_91]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_91]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$fn__3724$exec_fn__1103__auto____3725.invoke(nimbus.clj:896) ~[storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.AFn.applyToHelper(AFn.java:163) ~[clojure-1.5.1.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.5.1.jar:na]
    at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$fn__3724$service_handler__3814.doInvoke(nimbus.clj:895) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:421) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1152) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1184) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.nimbus.launch(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
    at storm.mesos.MesosNimbus.main(MesosNimbus.java:98) [original-storm-0.9.5.jar:na]
Caused by: java.lang.RuntimeException: Couldn't greate generated-conf dir
    at storm.mesos.MesosNimbus.initialize(MesosNimbus.java:162) [original-storm-0.9.5.jar:na]
    at storm.mesos.MesosNimbus.prepare(MesosNimbus.java:123) [original-storm-0.9.5.jar:na]
    ... 16 common frames omitted
2015-11-29 18:00:07 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$fn__3724$service_handler__3814.doInvoke(nimbus.clj:895) [storm-core-0.9.5.jar:0.9.5]
    at clojure.lang.RestFn.invoke(RestFn.java:421) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1152) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1184) [storm-core-0.9.5.jar:0.9.5]
    at backtype.storm.daemon.nimbus.launch(Unknown Source) [storm-core-0.9.5.jar:0.9.5]
    at storm.mesos.MesosNimbus.main(MesosNimbus.java:98) [original-storm-0.9.5.jar:na]

Thanks

Is there a way to rebalance workers?

In Storm, I could rebalance the number of workers as well as executors to increase parallelism without any downtime. How can I do the same in Storm on Mesos?

For example, I start a topology in Storm on Mesos with a given number of CPUs, amount of memory, and number of workers (num.worker). How do I increase the CPUs, memory, or num.worker without bringing down the topology?

I could do the following in Storm:

http://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10

nimbus.host value in the case of multiple mesos masters

When you have multiple Mesos masters, which value should be set for the nimbus.host configuration?
For example, I have mesos.master.url: zk://master1:2181,master2:2181,master3:2181/mesos.

When I set nimbus.host to the local hostname on each Mesos master running storm-nimbus, I end up with three Storm frameworks registered in the Mesos UI.
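
For reference, the relevant configuration would look roughly like this (a minimal storm.yaml sketch; host names are placeholders). One reading of the observation above is that only a single storm-mesos nimbus instance should be running, with nimbus.host pointing at it, since starting Nimbus on every master is what registers multiple frameworks:

    mesos.master.url: "zk://master1:2181,master2:2181,master3:2181/mesos"
    nimbus.host: "master1"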

Thanks

Vagrantfile problems

Hi,

There are a few errors in the Vagrantfile. I created a pull request for the first error; here is the second error:

vm:
* Ports to forward must be 1 to 65535
* Ports to forward must be 1 to 65535
* Ports to forward must be 1 to 65535

The ports for the slave are bound in the 80000 range in the Vagrantfile, but Linux port numbers do not go past 65535. Maybe it's better to run the slave, Storm, and master on the same Vagrant box to prevent port overlap between the two machines.
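
For illustration, a valid forwarded-port entry must stay within the 16-bit range; a corrected Vagrantfile line would look something like this (a sketch with placeholder port numbers, not the actual fix):

    # Host ports must be between 1 and 65535
    config.vm.network "forwarded_port", guest: 5050, host: 5050
    # Invalid: a host port in the 80000 range exceeds 65535
    # config.vm.network "forwarded_port", guest: 5051, host: 80051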

Right after that, I got the next error:

==> slave: Waiting for machine to boot. This may take a few minutes...
    slave: SSH address: 127.0.0.1:2200
    slave: SSH username: vagrant
    slave: SSH auth method: private key
    slave: Warning: Connection timeout. Retrying...

The fourth error:

==> master: stdin: is not a tty
==> master: grep: /vagrant/pom.xml: No such file or directory
==> master: cp: cannot stat ‘/vagrant/storm-mesos-.tgz’: No such file or directory
==> master: /tmp/vagrant-shell: line 10: cd: /vagrant/_release/storm-mesos-: No such file or directory
==> master: /tmp/vagrant-shell: line 11: bin/storm-mesos: No such file or directory

Also a tip: it may be better to run mesos-slave, mesos-master, and Storm on the same host to keep resource usage reasonable. The Vagrantfile currently starts two VirtualBox VMs with 4 CPUs and 4 GB of memory each; my MacBook came to a crawl instantly when these machines started up.

Logs from mesos/storm repo are lost

While playing around with Vagrant, I noticed that we are losing the logs generated by the mesos/storm code. That is, the mesos/storm application logs do not appear in nimbus.log, while logs from storm-core still do.
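
A hedged guess at the cause: the bundled logging configuration may simply not define a logger for the storm.mesos package, so its output never reaches nimbus.log. In the logback-based configuration that Storm 0.9.x ships (cluster.xml), the fix would look something like this (a sketch, not a verified patch):

    <!-- route storm.mesos logs through the root appenders -->
    <logger name="storm.mesos" level="INFO"/>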

Support for storm-0.10.0

Storm 0.10.0 changed a few things that break Storm on Mesos. In particular:

  • LocalState.get now returns a TBase object (Thrift-related) instead of a String.
  • bin/storm is now a bash script instead of a Python script.
  • snakeyaml is no longer on the classpath.
  • commons-lang was upgraded to commons-lang3, which affects imports (see the sketch below).
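
For example, the commons-lang upgrade changes import paths like this (a representative sketch, not the actual diff):

    // Storm 0.9.x classpath (commons-lang)
    import org.apache.commons.lang.StringUtils;

    // Storm 0.10.0 classpath (commons-lang3)
    import org.apache.commons.lang3.StringUtils;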

I spent a little time correcting these and have a really rough working version here. I'll probably clean it up next week as I need it; if there's interest, I'll submit a PR. This does break Storm 0.9.x though, so it may be good to write a shim or put the changes on a separate branch/tag.

'stormconf.ser' does not exist

I am getting the error below:

2016-01-29 19:47:50.893 b.s.d.worker [ERROR] Error on initialization of server mk-worker
java.io.FileNotFoundException: File '/data/mesos/work/slaves/58476bf6-6b41-4907-be4e-adf496595597-S10/frameworks/58476bf6-6b41-4907-be4e-adf496595597-0015/executors/word-count-2-1454096852/runs/2e3e1455-4960-44ca-97da-937259c25364/storm-mesos/storm-local/supervisor/stormdist/word-count-2-1454096852/stormconf.ser' does not exist
    at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[storm-core-0.10.0.jar:0.10.0]
    at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:222) ~[storm-core-0.10.0.jar:0.10.0]
    at backtype.storm.daemon.worker$fn__7098$exec_fn__1236__auto____7099.invoke(worker.clj:418) ~[storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.AFn.applyToHelper(AFn.java:178) ~[clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.6.0.jar:?]
    at clojure.core$apply.invoke(core.clj:624) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.worker$fn__7098$mk_worker__7175.doInvoke(worker.clj:409) [storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:542) [storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.10.0.jar:0.10.0]
2016-01-29 19:47:50.900 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:336) [storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.worker$fn__7098$mk_worker__7175.doInvoke(worker.clj:409) [storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.worker$_main.invoke(worker.clj:542) [storm-core-0.10.0.jar:0.10.0]
    at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.6.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
    at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.10.0.jar:0.10.0]

Any idea why this is happening? This is an old Storm bug, but it was fixed in version 0.9.3.

HA support for nimbus?

Hi, it seems that Nimbus does not have any sort of HA support at the moment? Are there any plans to add one?

I currently have a setup where I'm running Nimbus with Marathon on slave servers. I'm also trying to understand what happens to existing topologies if Nimbus dies and restarts on another server. Do you have any idea how people run storm-mesos in production settings, if anyone is doing this at all?
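
For reference, a minimal Marathon app definition for running Nimbus this way might look like the following (a sketch; the id, command, and resource sizes are placeholders rather than a recommended configuration):

    {
      "id": "/storm-nimbus",
      "cmd": "cd storm-mesos-* && bin/storm-mesos nimbus",
      "cpus": 1.0,
      "mem": 2048,
      "instances": 1
    }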

How to run this entirely under mesos

The instructions say to edit the storm.yaml file and add storm.zookeeper.servers and nimbus.host.

So in a perfect world here is how this would work:

  1. The storm-mesos project would read /etc/mesos/zk for the mesos.master.url attr if that isn't specified.
  2. The framework would ensure nimbus runs somewhere, and you could run multiple instances to keep it up
  3. The supervisors would look up Nimbus from ZooKeeper, so it doesn't have to be hardcoded
  4. Magic

Or is it preferred to run this under something like Marathon?

TASK_LOST: Task launched with invalid offers: Aggregated offers must belong to one single slave.

I think this may be an issue with Mesos itself rather than with this framework. The call in MesosNimbus to launchTasks launches tasks for several offers at once. The tasks are correctly built (in terms of the offer->slave mapping), but Mesos seems to get confused and thinks that the scheduler is running tasks for the same offer on different slaves.

I was able to resolve the issue by issuing one launchTasks call per offer instead of a single call with multiple offers at once. Happy to make a PR for this change, but I don't think my solution is exactly right.
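
A sketch of that workaround against the Mesos Java bindings (the method and the tasksByOfferId map are illustrative names, not the actual patch):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import org.apache.mesos.Protos;
    import org.apache.mesos.SchedulerDriver;

    // Launch each offer's tasks with its own call, instead of aggregating
    // offers from different slaves into a single launchTasks invocation.
    void launchPerOffer(SchedulerDriver driver,
                        List<Protos.Offer> offers,
                        Map<Protos.OfferID, List<Protos.TaskInfo>> tasksByOfferId) {
        for (Protos.Offer offer : offers) {
            List<Protos.TaskInfo> tasks = tasksByOfferId.get(offer.getId());
            if (tasks != null && !tasks.isEmpty()) {
                driver.launchTasks(Collections.singletonList(offer.getId()), tasks);
            }
        }
    }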

Issues with getting storm-mesos 0.9.3 to work

I ran bin/build-release.sh to build the latest release, deployed it, and tried to run it. When I ran bin/storm-mesos nimbus, it failed silently. Hence, I ran sh bin/storm nimbus storm.mesos.MesosNimbus, and it failed with the following error. I'd really appreciate it if someone could help me fix this.

$ sh bin/storm nimbus storm.mesos.MesosNimbus
bin/storm: 19: bin/storm: import: not found
bin/storm: 20: bin/storm: import: not found
bin/storm: 21: bin/storm: import: not found
bin/storm: 22: bin/storm: import: not found
bin/storm: 23: bin/storm: import: not found
bin/storm: 24: bin/storm: import: not found
bin/storm: 25: bin/storm: try:: not found
bin/storm: 27: bin/storm: from: not found
bin/storm: 28: bin/storm: except: not found
bin/storm: 30: bin/storm: from: not found
bin/storm: 31: bin/storm: try:: not found
bin/storm: 33: bin/storm: import: not found
bin/storm: 34: bin/storm: except: not found
bin/storm: 36: bin/storm: import: not found
bin/storm: 38: bin/storm: Syntax error: "(" unexpected
$
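
The import: not found errors suggest that sh is interpreting a Python script: in Storm 0.9.x, bin/storm is written in Python (see the 0.10.0 notes above), so it has to be run with a Python interpreter or via its shebang rather than with sh. A hedged guess at the intended invocation:

    python bin/storm nimbus storm.mesos.MesosNimbus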
