Simple proof-of-concept for scaling applications running on Marathon based on utilization.


marathon-autoscale's Introduction

marathon-autoscale

Dockerized auto-scaler application that can be run under Marathon management to dynamically scale a service running on DC/OS.

Prerequisites

  1. A running DC/OS cluster
  2. DC/OS CLI installed on your local machine

If running on a DC/OS cluster in Permissive or Strict mode, you will also need a user or service account with the appropriate permissions to modify Marathon jobs. An example script for setting up a service account can be found in create-service-account.sh.

Installation/Configuration

Building the Docker container

How to build the container:

image_name="mesosphere/marathon-autoscale" make
image_name="mesosphere/marathon-autoscale" make push

A pre-built image is also available on Docker Hub as mesosphere/marathon-autoscaler.

(Optional) Creating a service account

The create-service-account.sh script takes two parameters:

Service-Account-Name # the name of the service account you want to create
Namespace-Path # the path under which to launch this service in Marathon, e.g. / or /dev

$ ./create-service-account.sh [service-account-name] [namespace-path]

Install from the DC/OS Catalog

The marathon-autoscaler can be installed as a service from the DC/OS Catalog with dcos package install marathon-autoscaler. There is no default installation for this service; the autoscaler needs to know a number of things in order to scale an application, so the DC/OS package install process requires configuration options during installation. Assuming a simple /sleepy application running in Marathon and the following config.json file:

{
  "autoscaler": {
    "marathon-app" : "/sleepy",
    "userid": "agent-99",
    "password": "secret"
  }
}

All of the configuration settings listed below can be changed via the config.json file. The default name of this service is ${marathon-app}-autoscaler; in this case, the service is /sleepy-autoscaler.

Marathon examples

Autoscale examples

Update one of the definitions in the Marathon definitions folder to match your specific configuration. Marathon application names must include the leading forward slash; this was done in order to handle applications within service groups (e.g. /group/hello-dcos).

Core environment variables available to the application:

AS_DCOS_MASTER # hostname of the DC/OS master
AS_MARATHON_APP # application to autoscale

AS_TRIGGER_MODE # scaling mode (cpu | mem | sqs | and | or)

AS_AUTOSCALE_MULTIPLIER # the number by which the current instance count is multiplied (scale-out) or divided (scale-in); this determines how many instances are added or removed per scaling event
AS_MIN_INSTANCES # minimum number of instances; do not set this lower than 2
AS_MAX_INSTANCES # maximum number of instances; must be greater than AS_MIN_INSTANCES

AS_COOL_DOWN_FACTOR # number of polls out of range before scaling down
AS_SCALE_UP_FACTOR # number of polls out of range before scaling up
AS_INTERVAL # polling interval in seconds
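To illustrate how these settings interact, here is a hypothetical sketch of the multiplier arithmetic; the exact rounding and clamping behavior shown here is an assumption for illustration, not the autoscaler's actual code:

```python
import math

def next_instance_count(current, multiplier, min_instances, max_instances, scale_out):
    """Apply AS_AUTOSCALE_MULTIPLIER and clamp to AS_MIN/AS_MAX_INSTANCES.

    Hypothetical helper: multiply on scale-out, divide on scale-in,
    as described above. Rounding direction is an assumption.
    """
    if scale_out:
        target = math.ceil(current * multiplier)
    else:
        target = math.floor(current / multiplier)
    # Never scale outside the configured bounds
    return max(min_instances, min(max_instances, target))
```

For example, with 2 running instances and a multiplier of 1.5, a scale-out event targets 3 instances; a scale-in event from 4 instances targets 2.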

Notes

If you are using authentication:

AS_USERID # username of the user or service account with access to scale the service
--and either--
AS_PASSWORD # password of the user above, ideally from the secret store
AS_SECRET # private key of the service account above, ideally from the secret store

If you are using CPU as your scaling mode:

AS_MAX_RANGE # max average cpu time as float, e.g. 80 or 80.5
AS_MIN_RANGE # min average cpu time as float, e.g. 55 or 55.5

If you are using Memory as your scaling mode:

AS_MAX_RANGE # max avg mem utilization percent as float, e.g. 75 or 75.0
AS_MIN_RANGE # min avg mem utilization percent as float, e.g. 55 or 55.0

If you are using AND (CPU and Memory) as your scaling mode:

AS_MAX_RANGE # [max average cpu time, max avg mem utilization percent], e.g. 75.0,80.0
AS_MIN_RANGE # [min average cpu time, min avg mem utilization percent], e.g. 55.0,55.0

If you are using OR (CPU or Memory) as your scaling mode:

AS_MAX_RANGE # [max average cpu time, max avg mem utilization percent], e.g. 75.0,80.0
AS_MIN_RANGE # [min average cpu time, min avg mem utilization percent], e.g. 55.0,55.0

If you are using SQS as your scaling mode:

AS_QUEUE_URL # full URL of the SQS queue
AWS_ACCESS_KEY_ID # aws access key
AWS_SECRET_ACCESS_KEY # aws secret key
AWS_DEFAULT_REGION # aws region
AS_MIN_RANGE # min number of available messages in the queue
AS_MAX_RANGE # max number of available messages in the queue

Target application examples

In order to create artificial stress for an application, use one of the examples located in the Marathon Target Application folder.

Program Execution / Usage

Add your application to Marathon using the DC/OS Marathon CLI.

$ dcos marathon app add marathon_defs/marathon.json

Where the marathon.json has been built from one of the samples:

autoscale-cpu-noauth-marathon.json # security disabled or OSS DC/OS
autoscale-mem-noauth-marathon.json # security disabled or OSS DC/OS
autoscale-sqs-noauth-marathon.json # security disabled or OSS DC/OS
autoscale-cpu-svcacct-marathon.json # security permissive or strict on Enterprise DC/OS, using a service account and private key (private key stored as a secret)

Verify the app is added with the command $ dcos marathon app list

Scaling Modes

CPU

In this mode, the system scales the service up or down when CPU usage has been out of range for the number of polling cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For AS_MIN_RANGE and AS_MAX_RANGE on multicore containers, the value is calculated as: number of CPUs × desired CPU utilization percentage = CPU time (e.g. 2 CPUs × 80 = 160 CPU time).
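The multicore formula above is simple arithmetic; as a minimal sketch (hypothetical helper name):

```python
def cpu_range_value(num_cpus, target_utilization_pct):
    """CPU-time threshold for AS_MIN_RANGE/AS_MAX_RANGE on multicore
    containers: number of CPUs times desired utilization percentage."""
    return num_cpus * target_utilization_pct
```

For a 2-CPU container targeting 80% utilization, this yields a range value of 160.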

MEM

In this mode, the system scales the service up or down when memory usage has been out of range for the number of polling cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For AS_MIN_RANGE and AS_MAX_RANGE on very small containers, remember that Mesos adds 32 MB to the container spec for container overhead (namespace and cgroup), so your target percentages should take that into account. Alternatively, consider using the CPU-only scaling mode for containers with very small memory footprints.
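To see why the overhead matters for small containers, here is a hypothetical calculation; it assumes the reported usage includes the 32 MB overhead:

```python
MESOS_OVERHEAD_MB = 32  # added by Mesos for namespace/cgroup overhead

def observed_mem_pct(app_usage_mb, container_mem_mb):
    """Utilization percentage the autoscaler would observe: application
    usage plus Mesos overhead, relative to the container's memory spec.
    Hypothetical helper for illustration only."""
    return 100.0 * (app_usage_mb + MESOS_OVERHEAD_MB) / container_mem_mb
```

For a 128 MB container whose application uses 40 MB, the observed utilization is 56.25%, of which the overhead alone contributes 25 percentage points.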

SQS

In this mode, the system scales the service up or down when the number of available messages in the queue has been out of range for the number of polling cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For the Amazon Web Services (AWS) Simple Queue Service (SQS) scaling mode, queue length is determined by the ApproximateNumberOfMessages attribute, which returns the approximate number of visible messages in the queue.
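A minimal sketch of the range check in this mode; the real implementation fetches the message count from SQS (via ApproximateNumberOfMessages), while this hypothetical helper takes the count as an argument:

```python
def sqs_scale_direction(visible_messages, min_range, max_range):
    """Return 1 (above AS_MAX_RANGE), -1 (below AS_MIN_RANGE) or 0
    (within range), mirroring the scale_direction contract."""
    if visible_messages > max_range:
        return 1
    if visible_messages < min_range:
        return -1
    return 0
```

With AS_MIN_RANGE=2 and AS_MAX_RANGE=10, a backlog of 15 visible messages triggers scale-up and a backlog of 1 triggers scale-down.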

AND

In this mode, the system will only scale the service up or down when both CPU and Memory have been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For the MIN_RANGE and MAX_RANGE arguments/env vars, you must pass in a comma-delimited list of values. Values at index[0] will be used for CPU range and values at index[1] will be used for Memory range.

OR

In this mode, the system will only scale the service up or down when either CPU or Memory have been out of range for the number of cycles defined in AS_SCALE_UP_FACTOR (for up) or AS_COOL_DOWN_FACTOR (for down). For the MIN_RANGE and MAX_RANGE arguments/env vars, you must pass in a comma-delimited list of values. Values at index[0] will be used for CPU range and values at index[1] will be used for Memory range.
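Based on the descriptions of the AND and OR modes above, the combination logic can be sketched as follows (hypothetical helpers; the preference for scale-up when OR sees conflicting signals is an assumption):

```python
def combine_and(cpu_dir, mem_dir):
    """AND mode: scale only when both dimensions agree on a direction.
    Each input is 1 (above range), 0 (in range) or -1 (below range)."""
    if cpu_dir == mem_dir == 1:
        return 1
    if cpu_dir == mem_dir == -1:
        return -1
    return 0

def combine_or(cpu_dir, mem_dir):
    """OR mode: scale when either dimension is out of range.
    Assumption: scale-up wins if the two dimensions disagree."""
    if 1 in (cpu_dir, mem_dir):
        return 1
    if -1 in (cpu_dir, mem_dir):
        return -1
    return 0
```

For example, CPU above range and memory in range produces no action in AND mode but a scale-up in OR mode.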

Extending the autoscaler (adding a new scaling mode)

In order to create a new scaling mode, you must create a new subclass in the modes directory/module and implement all abstract methods (e.g. scale_direction) of the abstract class AbstractMode.

Please note: the scale_direction function MUST return one of three values:

  • Scaling mode above thresholds MUST return 1
  • Scaling mode within thresholds MUST return 0
  • Scaling mode below thresholds MUST return -1

An example skeleton is below:

class ScaleByExample(AbstractMode):

    def __init__(self, api_client=None, app=None, dimension=None):
        super().__init__(api_client, app, dimension)

    def scale_direction(self):
        try:
            value = self.get_value()
            return super().scale_direction(value)
        except ValueError:
            raise
Once the new subclass is created, add the new mode to the MODES dictionary in marathon_autoscaler.py.

# Dict defines the different scaling modes available to autoscaler
MODES = {
    'sqs': ScaleBySQS,
    'cpu': ScaleByCPU,
    'mem': ScaleByMemory,
    'and': ScaleByCPUAndMemory,
    'or': ScaleByCPUOrMemory,
    'exp': ScaleByExample
}

Examples

The following examples execute the python application from the command line.

(Optional) Only if using username/password or a service account

export AS_USERID=some-user-id
export AS_PASSWORD=some-password
-or-
export AS_SECRET=dc-os-secret-formatted-json

SQS message queue length as autoscale trigger

export AS_QUEUE_URL=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=us-east-1

python marathon_autoscaler.py --dcos-master https://leader.mesos \
  --trigger_mode sqs --autoscale_multiplier 1.5 --max_instances 5 \
  --marathon-app /test/stress-sqs --min_instances 1 \
  --cool_down_factor 4 --scale_up_factor 3 --interval 10 \
  --min_range 2.0 --max_range 10.0

CPU as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos \
  --trigger_mode cpu --autoscale_multiplier 1.5 --max_instances 5 \
  --marathon-app /test/stress-cpu --min_instances 1 \
  --cool_down_factor 4 --scale_up_factor 3 --interval 10 \
  --min_range 55.0 --max_range 80.0

Memory as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos \
  --trigger_mode mem --autoscale_multiplier 1.5 --max_instances 5 \
  --marathon-app /test/stress-memory --min_instances 1 \
  --cool_down_factor 4 --scale_up_factor 3 --interval 10 \
  --min_range 55.0 --max_range 75.0

AND (CPU and Memory) as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos \
  --trigger_mode and --autoscale_multiplier 1.5 --max_instances 5 \
  --marathon-app /test/stress-cpu --min_instances 1 \
  --cool_down_factor 4 --scale_up_factor 3 --interval 10 \
  --min_range 55.0,2.0 --max_range 75.0,8.0

OR (CPU or Memory) as autoscale trigger

python marathon_autoscaler.py --dcos-master https://leader.mesos \
  --trigger_mode or --autoscale_multiplier 1.5 --max_instances 5 \
  --marathon-app /test/stress-cpu --min_instances 1 \
  --cool_down_factor 4 --scale_up_factor 3 --interval 10 \
  --min_range 55.0,10.0 --max_range 75.0,20.0

marathon-autoscale's People

Contributors

ajiezzi, cs94njw, dobriak, gpetrousov, ichernetsky, jgarcia-mesosphere, justinrlee, keith-mcclellan, kensipe, tkrausjr


marathon-autoscale's Issues

Do you really calculate CPU utilization?

Hey, I've just read the code and I am a bit confused by the fact that you write:

max_cpu_time = int(input("Enter the Max percent of CPU Usage averaged across all Application Instances to trigger Autoscale (ie. 80) : "))

... but you actually just sum up the system/user CPU seconds and don't really calculate any percentage of CPU utilization?

cpus_time =(task_stats['cpus_system_time_secs']+task_stats['cpus_user_time_secs'])

So right now setting the max_cpu_time is not a percentage but a total value, which just gets averaged over all tasks?

app_avg_cpu = (sum(app_cpu_values) / len(app_cpu_values))

Or do I understand something wrong?

MODE (and, or) doesn't work

Problem

There are some problems with the latest code. According to the manual, the marathon-autoscale service must support five modes: cpu, mem, sqs, and, or. However, the and and or modes are currently not working.

error log

AttributeError: 'NoneType' object has no attribute 'get_app_details'
I0116 11:18:30.314077 10007 executor.cpp:401] Received killTask for task auto_auto-test.instance-f140cda0-3801-11ea-8aa8-e6b71f12091b._app.1
2020-01-16 02:18:45,728 - autoscale - ERROR - 'NoneType' object has no attribute 'get_app_details'
Traceback (most recent call last):
  File "marathon_autoscaler.py", line 269, in run
    direction = self.scaling_mode.scale_direction()
  File "/marathon-autoscale/autoscaler/modes/scalebycpuormem.py", line 42, in scale_direction
    results.append(self.mode_map[mode].scale_direction())
  File "/marathon-autoscale/autoscaler/modes/scalecpu.py", line 46, in scale_direction
    value = self.get_value()
  File "/marathon-autoscale/autoscaler/modes/scalecpu.py", line 18, in get_value
    app_task_dict = self.app.get_app_details()
AttributeError: 'NoneType' object has no attribute 'get_app_details'

autoscale request failed

Hello,
I have a DC/OS cluster running on Azure, and I have exposed the DC/OS UI by mapping it to my Windows machine, so now I can access it on localhost. I have a sample application running in Marathon and I want to try the autoscaling feature on it before using it on the actual application. I have provided the parameters, but there seems to be an error fetching the dcos-ca.crt. In the parameters I have given the public IP of the master node, and I also tried the private IP, but nothing worked. Any suggestions as to where I am going wrong?

Buggy handling of Marathon HTTP responses

This line throws an exception because the response bytes are not decoded prior to deserialization. This basically prevents the autoscaler from launching.

This line is also problematic for a related reason: json.loads can't deserialize a non-string object

How to scale down?

We need to scale the app down when CPU or memory usage drops. How can that be done here?

Caught exception: undefined method `[]' for nil:NilClass

I deployed marathon-autoscale as a container. Here is my JSON file:

{
  "id": "marathon-lb-internal-autoscale",
  "args":[
    "--marathon", "http://leader.mesos:8080",
    "--haproxy", "http://marathon-lb.marathon.mesos:9090",
    "--target-rps", "100",
    "--interval", "3",
    "--apps", "nginx-internal"
  ],
  "cpus": 0.1,
  "mem": 16.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/marathon-lb-autoscale",
      "network": "BRIDGE",
      "forcePullImage": true
    }
  }
}

However, when I issued the docker logs command against the container, I got the following:

[2016-02-19T16:21:57] Starting autoscale controller
[2016-02-19T16:21:57] Options: #<OpenStruct marathon=#<URI::HTTP http://leader.mesos:8080>, haproxy=[#<URI::HTTP http://marathon-lb-internal2.marathon.mesos:9090>], interval=3.0, samples=10, cooldown=5, target_rps=100, apps=#<Set: {"nginx-internal"}>, threshold_percent=0.5, threshold_instances=3, intervals_past_threshold=3>
[2016-02-19T16:21:57] E: Caught exception: undefined method `[]' for nil:NilClass
[2016-02-19T16:21:57] E: ["/marathon-lb-autoscale/autoscale.rb:252:in `block in update_current_marathon_instances'", "/marathon-lb-autoscale/autoscale.rb:251:in `each'", "/marathon-lb-autoscale/autoscale.rb:251:in `update_current_marathon_instances'", "/marathon-lb-autoscale/autoscale.rb:149:in `run'", "/marathon-lb-autoscale/autoscale.rb:322:in `<main>'"]
[2016-02-19T16:22:00] E: Caught exception: undefined method `[]' for nil:NilClass
[2016-02-19T16:22:00] E: ["/marathon-lb-autoscale/autoscale.rb:252:in `block in update_current_marathon_instances'", "/marathon-lb-autoscale/autoscale.rb:251:in `each'", "/marathon-lb-autoscale/autoscale.rb:251:in `update_current_marathon_instances'", "/marathon-lb-autoscale/autoscale.rb:149:in `run'", "/marathon-lb-autoscale/autoscale.rb:322:in `<main>'"]
[2016-02-19T16:22:03] E: Caught exception: undefined method `[]' for nil:NilClass

create_service_account.sh asks for dcos-enterprise-cli

Hi,

I am trying out the autoscale app, but whenever I try to run the above-mentioned script, it errors out on the dcos-enterprise-cli requirement, since I am using the free version.
Can this service run on the community version of DC/OS? If so, is there a secondary script for that, or will this require some manual hacking?

Auto scale memory inside container.

Hello,

How about more fine-grained autoscaling: scaling the memory inside the containers up or down?
For example, if I have one container with 70% free memory, I could scale the container down to use less memory.
Or with one container at 10% free memory, it could restart and allocate more memory, up to a certain limit (for example 2 GB), and only instantiate a new instance after that.

This can prevent situations where you have 200 lightweight tasks with 1 GB of preallocated memory each. If each only consumes a flat 200 MB, there is a huge improvement to be had here.

Cheers,
Florin
