netflix / fenzo

Extensible Scheduler for Mesos Frameworks

Java 99.23% Groovy 0.77%

fenzo's Issues

Support dynamic fitness threshold

I'd like the "good enough" calculation to vary with the urgency of the task. For example, imagine that a task should be scheduled within 30 seconds. At first, the fitness bar should be held high, then be gradually lowered as the task request gets older; at the 30 second mark, "anything that meets the hard constraints will do".

I suggest that the FitnessGoodEnoughFunction accept the TaskAssignmentResult to convey the TaskRequest along with the fitness measurement.
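
To make the request concrete, here is a minimal sketch of a threshold that decays linearly with the age of a task request. The class name DecayingFitnessThreshold and its methods are hypothetical and not part of Fenzo; it assumes the caller can supply the request's age, which is exactly the information the proposal above asks FitnessGoodEnoughFunction to receive.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Fenzo class): a "good enough" bar that decays linearly with task age. */
final class DecayingFitnessThreshold {
    private final double initialThreshold; // bar applied to a brand-new request, in [0, 1]
    private final long deadlineMillis;     // age at which any feasible placement is accepted

    DecayingFitnessThreshold(double initialThreshold, long deadline, TimeUnit unit) {
        if (initialThreshold < 0.0 || initialThreshold > 1.0 || deadline <= 0) {
            throw new IllegalArgumentException("threshold must be in [0,1] and deadline must be > 0");
        }
        this.initialThreshold = initialThreshold;
        this.deadlineMillis = unit.toMillis(deadline);
    }

    /** Threshold for a request that has waited ageMillis; it reaches 0.0 at the deadline. */
    double thresholdAt(long ageMillis) {
        long clamped = Math.max(0L, Math.min(ageMillis, deadlineMillis));
        double remaining = 1.0 - (double) clamped / deadlineMillis;
        return initialThreshold * remaining;
    }

    /** True when the measured fitness clears the age-adjusted bar. */
    boolean isGoodEnough(double fitness, long ageMillis) {
        return fitness >= thresholdAt(ageMillis);
    }
}
```

With an initial bar of 0.9 and a 30 second deadline, a request that has waited 15 seconds would be held to roughly 0.45, and from the 30 second mark onward any placement that passes the hard constraints is accepted.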

Contributor agreement

Is the individual contributor agreement different from the corporate contributor agreement?

Also, is it as easy as signing off the commits in your PR via git commit -s?

Is it just the normal Apache 2 one, i.e. https://www.apache.org/licenses/cla-corporate.txt?

[question] set resources evaluation in Fenzo

Assume two task requests and a VM lease come in:

VMLease1: { fooset: {foo-a, foo-b} }

TaskRequest1: requires foo-a
TaskRequest2: requires foo-a

Both task requests are entered simultaneously:
taskScheduler.scheduleOnce( [TaskRequest1, TaskRequest2], [VMLease1])

For both tasks, I have overridden getHardConstraints() to include a ConstraintEvaluator that checks whether a VMLease has the set resource "foo-a"

If TaskRequest1 is evaluated first and succeeds, how does Fenzo tell TaskRequest2 that "foo-a" is not available anymore when it runs TaskRequest2's constraints evaluator? Or is it the case that each task request's foo-a ConstraintEvaluator sees a different VirtualMachineCurrentState when ConstraintEvaluator.evaluate() is called (so that only one of the task requests' foo-a evaluators will see foo-a)?

What I'm trying to ask is, if VMLease1 satisfies both TaskRequest1 and TaskRequest2, how does Fenzo know not to return from taskScheduler.scheduleOnce() with a success to both TaskRequests since they both ask for the same resource?

Update Mesos dependency

The library is currently linking against Mesos 0.24. Consider updating the Mesos system requirements to a more recent version, e.g. 1.0+. This would help with #83 (which requires a more recent protocol), and with reducing the test scope.

Update Javadocs

The Javadocs don't match the latest code. It'd be awesome to update them!

Advice on how to loop/tick Fenzo

Generally there are a couple of options:

(1) new Thread().run()
(2) With every Mesos offer, enqueue and dequeue.
(3) With timers, i.e. actors (like the Flink scheduler) or a hierarchical timer wheel for frequent timers.

When developing a large framework (i.e. relatively involved, where Mesos is a small part), based on what you built at Netflix, is there anything that works particularly well?

The downside of (1) is using Thread.sleep(), which is blocking.

The downside of (2) is that, given the async nature of fulfilling offers, it will probably not work for tasks with short expirations.

The downside of (3) is that you commit to their threading model, i.e. actors, or callbacks for the timer wheel approach, which seem OK.

Also, this is more of a mailing list discussion, but I couldn't find one, so I thought of issues as a way to record some Fenzo wisdom. :) Maybe it can turn into a docs PR later!
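
One way to picture a timer-based variant without committing to an actor framework is a plain JDK ScheduledExecutorService that drains buffered offers on each tick. This is only a sketch: SchedulingTicker is a hypothetical class, and the schedulingPass callback stands in for whatever wraps the real scheduling call in a given framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical driver: buffers offers and hands them to a scheduling pass on a fixed tick. */
final class SchedulingTicker<O> {
    private final ConcurrentLinkedQueue<O> pendingOffers = new ConcurrentLinkedQueue<>();
    private final Consumer<List<O>> schedulingPass; // assumed wrapper around the real scheduling call
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    SchedulingTicker(Consumer<List<O>> schedulingPass) {
        this.schedulingPass = schedulingPass;
    }

    /** Called from the offer callback thread; never blocks. */
    void onOffer(O offer) {
        pendingOffers.add(offer);
    }

    /** One scheduling iteration: drain whatever arrived since the last tick. */
    void tick() {
        List<O> batch = new ArrayList<>();
        O offer;
        while ((offer = pendingOffers.poll()) != null) {
            batch.add(offer);
        }
        // Runs even with an empty batch so that pending tasks are re-evaluated.
        schedulingPass.accept(batch);
    }

    /** Starts ticking; the worker thread sleeps between ticks instead of the caller. */
    void start(long periodMillis) {
        timer.scheduleWithFixedDelay(this::tick, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

Note that scheduleWithFixedDelay stops rescheduling a task that throws, so a real scheduling pass would want to catch and log its own exceptions.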

Incorrect handling of reserved resources

Problem Description
Fenzo misinterprets offers containing a mix of reserved and unreserved resources, causing it to fail to consider all offered resources. For example, given an offer of 2 reserved CPUs and 3 unreserved CPUs, Fenzo behaves as though the offer contains 2 (or 3) CPUs, not 5 CPUs as it should.

This situation arises when the operator (or another framework in the same role) reserves a subset of a host for the framework's role. This is an increasingly common phenomenon due to:

the dynamic reservation feature, which makes it easy for an operator to make fine-grained reservations.
the growing popularity of the dcos-commons library, which makes extensive use of dynamic reservations. A framework based on that library may use the same role as a Fenzo-based framework, leading to unintended side-effects.

Here's an example depicting the resources within such an offer (2 cpus for myrole, 3 unreserved):

cpus(myrole):2.0; mem(myrole):4096.0; ports(myrole):[1025-2180];
disk(*):28829.0; cpus(*):3.0; mem(*):10766.0; ports(*):[2182-3887,8082-8180,8182-32000]

Problem Location
The root cause is within com.netflix.fenzo.plugins.VMLeaseObject. The VMLeaseObject assumes that a given resource name (e.g. cpus) will appear at most once in the offer.

Suggested fix
VMLeaseObject should aggregate all resources with the same name (subject to a set of roles to filter on).

A suggested workaround is for the framework to use an alternate implementation of com.netflix.fenzo.VirtualMachineLease. See example here.
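
The aggregation described in the suggested fix can be sketched as follows. ScalarResource and ResourceAggregator are hypothetical stand-ins written for this illustration; they are not the Mesos protobuf types and not the linked workaround.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a scalar resource: a name, a role ("*" means unreserved) and a value. */
final class ScalarResource {
    final String name;
    final String role;
    final double value;

    ScalarResource(String name, String role, double value) {
        this.name = name;
        this.role = role;
        this.value = value;
    }
}

final class ResourceAggregator {
    /** Sums every entry named resourceName whose role is in acceptedRoles. */
    static double total(List<ScalarResource> offered, String resourceName, Set<String> acceptedRoles) {
        double sum = 0.0;
        for (ScalarResource r : offered) {
            if (r.name.equals(resourceName) && acceptedRoles.contains(r.role)) {
                sum += r.value;
            }
        }
        return sum;
    }
}
```

Applied to the example offer above with the roles myrole and *, this yields 5.0 cpus and 14862.0 mem rather than only one of the two entries.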

After reading Fenzo, I don't understand how to get Framework information on Mesos?

Complete text of README.md

Fill this in with links to more complete documentation.

Execution failed for task ':fenzo-core:compileJava'. > invalid source release: 1.8

11:30:45.916 [ERROR] [org.gradle.BuildExceptionReporter]
11:30:45.917 [ERROR] [org.gradle.BuildExceptionReporter] FAILURE: Build failed with an exception.
11:30:45.918 [ERROR] [org.gradle.BuildExceptionReporter]
11:30:45.918 [ERROR] [org.gradle.BuildExceptionReporter] * What went wrong:
11:30:45.918 [ERROR] [org.gradle.BuildExceptionReporter] Execution failed for task ':fenzo-core:compileJava'.
11:30:45.918 [ERROR] [org.gradle.BuildExceptionReporter] > invalid source release: 1.8
11:30:45.919 [ERROR] [org.gradle.BuildExceptionReporter]
11:30:45.919 [ERROR] [org.gradle.BuildExceptionReporter] * Exception is:
11:30:45.920 [ERROR] [org.gradle.BuildExceptionReporter] org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':fenzo-core:compileJava'.
11:30:45.921 [ERROR] [org.gradle.BuildExceptionReporter] at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:69)
11:30:45.921 [ERROR] [org.gradle.BuildExceptionReporter] at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:46)
11:30:45.921 [ERROR] [org.gradle.BuildExceptionReporter] at org.gradle.api.internal.tasks.execution.PostExecutionAnalysisTaskExecuter.execute(PostExecutionAnalysisTaskExecuter.java:35)
11:30:45.921 [ERROR] [org.gradle.BuildExceptionReporter] at org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter.execute(SkipUpToDateTaskExecuter.java:68)
11:30:45.921 [ERROR] [org.gradle.BuildExceptionReporter] at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
11:30:45.921 [ERROR] [org.gradle.BuildExceptionReporter] at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:52)

Interaction between frameworks using Fenzo

We have multiple Mesos frameworks in a Mesos cluster with three hosts (agents). Some of the frameworks, developed by ourselves, use Fenzo, and some do not (e.g. Marathon). We have set leaseOfferExpirySecs to 2 and have found that the frameworks that use Fenzo starve the frameworks that do not.
We would like to ask the following questions.

Can we have more than one Fenzo TaskScheduler in a Mesos cluster?
Can we have a Mesos framework that uses Fenzo and a Mesos framework that does not use Fenzo in the same Mesos cluster?
Can we use Fenzo in a small cluster with three hosts (agents)?

Auto scaling

I read about Fenzo on the Netflix blog, and the auto scaling concept sounds interesting. I have yet to try my hand at Fenzo. Can you please tell me what auto scaling means in this context? Does Fenzo shut down or bring up VMs based on demand?

Thanks in advance,
Mani

High availability support

If one builds a framework on top of Fenzo, what are the guidelines for making the framework highly available? Specifically, assuming ZooKeeper is used for leader election between framework instances, how can Fenzo's local state (for example task queues, running state, etc.) be synchronized to the other instances of the framework?

Thanks for your help.

How does Fenzo scale?

Apologies if this isn't the place to ask questions - I couldn't find a mailing list.

My understanding is that a TaskScheduler is quite stateful and should be a singleton within a cluster of JVM instances. Is this view incorrect? If so then what are the considerations around scaling?

Thanks!

[question] Recover the task assignment after system restart

Hi,
I am using the UniqueHostAttrConstraint to ensure that tasks are assigned to different hosts.
However, after a system restart the constraint no longer works. I believe this is because the assignment history is lost on restart.

So what is the correct way to persist the assignment history and recover it after a system restart?

DisableVM does not remember hosts

The explanation for TaskScheduler:disableVM indicates that it will remember the "disabled" state if the VM is not yet known by Fenzo.
http://netflix.github.io/Fenzo/fenzo-core/com/netflix/fenzo/TaskScheduler.html#disableVM-java.lang.String-long-

In my testing, I was not observing this to be working. If I disable a VM before offers are received from it, it still comes up in an enabled state.

Add missing javadocs

Completely describe the Fenzo API in javadoc comments, including classes, interfaces, methods, attributes, and enums.

Use the active voice in order to make the documentation unambiguous -- see http://go/pv

Add info to documentation about multiple-framework issues

See #51 for discussion

Fenzo does not account for the resources of a custom executor

I am running tasks with a custom dockerized executor that needs 0.5 cpus.
If (let's say) all my tasks need 0.1 cpus, and Fenzo gets a lease for 1.0 cpu, what normally happens is that "scheduleOnce" tries to pair up ten tasks against that lease... so I schedule those ten.

Task 1 launches fine, but because it's the first time the executor runs on that agent, there are actually 0.6 CPUs used in the process (0.5 + 0.1)
Task 2 launches fine (now we're at 0.7 cpus)
Task 3 launches fine (now we're at 0.8 cpus)
Task 4 launches fine (now we're at 0.9 cpus)
Task 5 launches fine (now we're at 1.0 cpus)
Tasks 6-10 fail with TASK_ERROR, effectively saying the task resources are more than what is left in the offer.

I can work around it by checking if my custom executor is not part of the offer and then manually summing resources and not scheduling tasks that would cause a TASK_ERROR, but of course this is somewhat duplicating what I hope Fenzo would do for me, and I am bound to screw it up.
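
The manual accounting mentioned in the workaround boils down to a small calculation. The sketch below is hypothetical (ExecutorAwareCapacity is not a Fenzo class) and assumes the framework can tell whether its executor is already running on the agent. It uses milli-CPUs so that the arithmetic stays exact.

```java
/** Hypothetical helper; all CPU amounts are in milli-CPUs to avoid floating-point drift. */
final class ExecutorAwareCapacity {
    /**
     * Number of tasks that fit in an offer once the executor's own share is set aside.
     * Pass executorRunning = true when the executor is already active on the agent,
     * in which case its resources are not charged a second time.
     */
    static int tasksThatFit(int offeredMilliCpus, int taskMilliCpus,
                            int executorMilliCpus, boolean executorRunning) {
        if (taskMilliCpus <= 0) {
            throw new IllegalArgumentException("taskMilliCpus must be positive");
        }
        int overhead = executorRunning ? 0 : executorMilliCpus;
        int available = offeredMilliCpus - overhead;
        return available <= 0 ? 0 : available / taskMilliCpus;
    }
}
```

For the numbers in this report (a 1.0 cpu offer, 0.1 cpu tasks, a 0.5 cpu executor not yet running), the result is 5, which matches the five tasks that launched successfully.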

If a VMLease returns null for its attributes, fenzo silently fails to assign tasks

It'd be great if the (I assume) NPE was logged or bubbled up somewhere.

Have javadocs built automagically; published in a consistent public location

We're not sure how best to do this, so it may take some investigation. Do we have to host these elsewhere, or can we get them here in github with the rest of our docs?

NamedResourceSetRequest

Didn't see any docs around this:

TaskRequest.java {
     public NamedResourceSetRequest(String resName, String resValue, int numSets,
                                   int numSubResources) {
      this.resName = resName;
      this.resValue = resValue;
      this.numSets = numSets;
      this.numSubResources = numSubResources;
    }

....
}

Is it for the Mesos agent's --attributes flag?

Fenzo holds all resource offers of slaves that have tasks assigned

Sorry that I couldn't find a place to ask questions so had to open an issue here.

I found that after calling TaskScheduler.getTaskAssigner().call(...), Fenzo will hold all subsequent resource offers of the corresponding slave forever without even looking at leaseOfferExpirySecs. I'd like to know why Fenzo needs to do that. It makes other frameworks unable to use the remaining resources of that slave. Or is it expected that there should NOT be other frameworks in the cluster? In other words, is it expected that the framework that uses Fenzo should be the only framework in the cluster?

Thanks a lot!

Give use cases to understand the features better

After reading the features I'm struggling to understand what actual use cases are where Fenzo comes in. For example in my case, I have a Spark framework (on Mesos) which runs interactive queries with a measured average response time. Now if I increase the cluster resources by a factor 2 or 4 for example, and increase the amount of Spark frameworks by that same factor, the variance increases greatly. So far, I can see that this stems from the fact that a Spark framework is very greedy; if a resource offer is made, the framework accepts all, and in the meantime other frameworks stall. The average response time is pretty much the same only because the added resources make up for the longer waiting periods for the frameworks. But the higher variance makes this unfavorable to interactive Spark sessions.

Are there mechanisms/features in Fenzo which can help with this above problem? (Basically QoS problems, which Mesos doesn't seem to care about)
Is anybody working on Spark/Fenzo integration? If not, does anyone have an idea about caveats/hurdles to implement it?

TaskRequest should support wait queue expiration

When a TaskRequest is submitted to a Fenzo scheduler (scheduler.scheduleOnce()), it stays in the scheduler's internal queue until a resource offer comes in that fits its requirements. However, it may be the case that no such offered resource will ever satisfy it, and so we would like for the task to be auto-removed from the internal queue after sitting there for a specified amount of time, instead of bad TaskRequests filling up the queue forever. To support this, the following things would be needed:

specifying the queue wait timeout in the SchedulerBuilder, e.g. SchedulerBuilder.withQueueWaitTimeout(1000), where 1000 is in seconds,
the ability to add a callback in either the Scheduler or SchedulerBuilder for TaskRequest removal-from-queue events, i.e. SchedulerBuilder.withRemovedFromQueueCallback(Action1<TaskRequest>) or scheduler.setRemovedFromQueueCallback(Action1<TaskRequest>),
the TaskRequest interface should specify an "int waitTimeout()" method that the scheduler can use to figure out whether the task request has been waiting too long to be assigned.
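
The three pieces listed above could fit together roughly as in this sketch. ExpiringTaskQueue is a hypothetical class and not Fenzo's internal queue. It uses a single queue-wide timeout rather than a per-request waitTimeout(), and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical pending queue that evicts requests older than a fixed wait timeout. */
final class ExpiringTaskQueue<T> {
    private static final class Entry<T> {
        final T task;
        final long enqueuedAtMillis;

        Entry(T task, long enqueuedAtMillis) {
            this.task = task;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final List<Entry<T>> entries = new ArrayList<>();
    private final long waitTimeoutMillis;
    private final Consumer<T> onRemoved; // plays the role of the proposed removal callback

    ExpiringTaskQueue(long waitTimeoutMillis, Consumer<T> onRemoved) {
        this.waitTimeoutMillis = waitTimeoutMillis;
        this.onRemoved = onRemoved;
    }

    void add(T task, long nowMillis) {
        entries.add(new Entry<>(task, nowMillis));
    }

    /** Drops every request that has waited at least the timeout and notifies the callback. */
    int expire(long nowMillis) {
        int removed = 0;
        for (Iterator<Entry<T>> it = entries.iterator(); it.hasNext(); ) {
            Entry<T> e = it.next();
            if (nowMillis - e.enqueuedAtMillis >= waitTimeoutMillis) {
                it.remove();
                onRemoved.accept(e.task);
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

A scheduler would call expire(...) at the start of each scheduling iteration, before trying to match the remaining requests against offers.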

[Question] Recommendation for long running service-style task

For a framework based on Fenzo, what are the guidelines for scheduling service style tasks?
I am looking at a use case that schedules a mix of service and batch jobs. The queueable task input to Fenzo makes no distinction between service and batch jobs. This implies that the framework should restart the service job when the job finishes or fails. One way is to push the failed or finished service job back into the pending queue and wait for Fenzo to schedule it. However, this may interrupt the service until the jobs already pending in the queue have been scheduled.

Is there any recommendation to handle restart for the service style tasks and also minimize the interruption of the service?

Add lots of comments to the sample code to explain what it's doing and why

If we're going to encourage people to use the sample projects to help them make their first Fenzo-aware frameworks, these should be full of good code comments to explain what's going on and why and what options are available.

Hard time limit on holding offers for Fenzo

Currently, Fenzo supports releasing offers at a fixed rate. In order to have a Fenzo-scheduled framework function in a multiframework environment, it's important for every offer to be declined in a timely manner if it isn't used. For example, another framework may have specific constraints or properties it's looking for in an agent--currently, it's possible for Fenzo to hold onto an offer for many minutes (if you're unlucky on a very large cluster). This can negatively impact other frameworks that are looking for very specific hosts.

The solution I'd like to see is the ability to configure a maximum time to hold an offer before declining it; this way, a Fenzo-based framework could choose to hold no offers for longer than 30 seconds; this would greatly benefit multi-framework Mesos clusters.
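
A hard limit of this kind amounts to tracking when each offer arrived and draining the overdue ones on every scheduling iteration. The following is a sketch only: OfferHoldLimiter and its methods are hypothetical, and declining the returned offers through the Mesos driver is left to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker enforcing a hard upper bound on how long any offer is held. */
final class OfferHoldLimiter {
    private final Map<String, Long> receivedAtMillis = new LinkedHashMap<>();
    private final long maxHoldMillis;

    OfferHoldLimiter(long maxHoldMillis) {
        this.maxHoldMillis = maxHoldMillis;
    }

    void offerReceived(String offerId, long nowMillis) {
        receivedAtMillis.put(offerId, nowMillis);
    }

    /** Call when an offer was consumed by a launch, so that it is no longer tracked. */
    void offerUsed(String offerId) {
        receivedAtMillis.remove(offerId);
    }

    /** Returns and forgets the IDs of all offers held for maxHoldMillis or longer. */
    List<String> drainOverdue(long nowMillis) {
        List<String> overdue = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = receivedAtMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= maxHoldMillis) {
                overdue.add(e.getKey());
                it.remove();
            }
        }
        return overdue;
    }
}
```

With maxHoldMillis set to 30000, no unused offer would stay with the framework for more than 30 seconds plus one scheduling interval.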

Pluggable ENI fitness evaluator

Feature request

Fenzo supports a concept of preferential named consumable resource, which models a collection
of two-level resources. The top level resource is tagged with a name during task placement process,
which defines some sort of its runtime profile. Multiple tasks matching the same profile can be
associated with the same consumable resource, and be allocated portion of its subresources.

For example, in AWS an ENI and its security group can be modeled as two level resource. The ENI
interface models the resource, the subresource is a number of IPs that can be associated with an ENI
interface, and the runtime profile is defined by security group(s) associated with an ENI.
Tasks with identical security groups placed on the same agent, may thus share single ENI interface
until pool of available IPs (sub-resources) is exhausted. When the last task associated with an ENI
interface is terminated, its runtime profile becomes undefined again.

As calling AWS API is expensive, it makes sense to reduce the amount of network stack configuration related calls by reusing already provisioned resources. This means Fenzo should promote task placement on an agent/ENI slot which already holds required resources. As Fenzo has limited insight into it (unless a task is already associated with an ENI), we need a pluggable API to externalize this evaluation process.

Implementation proposal

To achieve this goal, two new callback interfaces are proposed. PreferentialNamedConsumableResourceEvaluator computes a fitness score for each valid task/ENI assignment. SchedulingEventListener provides notifications from within the scheduling loop, so newly placed tasks can be accounted for during the fitness calculation process.

/**
 * Evaluator for {@link PreferentialNamedConsumableResource} selection process. Given an agent with matching
 * ENI slot (either empty or with a matching name), this evaluator computes the fitness score.
 * A custom implementation can provide fitness calculators augmented with additional information not available to
 * Fenzo for making best placement decision.
 *
 * <h1>Example</h1>
 * {@link PreferentialNamedConsumableResource} can be used to model AWS ENI interfaces together with IP and security
 * group assignments. To minimize number of AWS API calls and to improve efficiency, it is beneficial to place a task
 * on an agent which has ENI profile with matching security group profile so the ENI can be reused. Or if a task
 * is terminated, but agent releases its resources lazily, they can be reused by another task with a matching profile.
 */
public interface PreferentialNamedConsumableResourceEvaluator {

    /**
     * Provide fitness score for an idle consumable resource.
     *
     * @param hostname hostname of an agent
     * @param resourceName name to be associated with a resource with the given index
     * @param index a consumable resource index
     * @param subResourcesNeeded an amount of sub-resources required by a scheduled task
     * @param subResourcesLimit a total amount of sub-resources available
     * @return fitness score
     */
    double evaluateIdle(String hostname, String resourceName, int index, double subResourcesNeeded, double subResourcesLimit);

    /**
     * Provide fitness score for a consumable resource that is already associated with some tasks. These tasks and
     * the current one have matching profiles, so they can share the resource.
     *
     * @param hostname hostname of an agent
     * @param resourceName name associated with a resource with the given index
     * @param index a consumable resource index
     * @param subResourcesNeeded an amount of sub-resources required by a scheduled task
     * @param subResourcesUsed an amount of sub-resources already used by other tasks
     * @param subResourcesLimit a total amount of sub-resources available
     * @return fitness score
     */
    double evaluate(String hostname, String resourceName, int index, double subResourcesNeeded, double subResourcesUsed, double subResourcesLimit);
}

/**
 * A callback API providing notification about Fenzo task placement decisions during the scheduling process.
 */
public interface SchedulingEventListener {

    /**
     * Called before a new scheduling iteration is started.
     */
    void onScheduleStart();

    /**
     * Called when a new task placement decision is made (a task gets resources allocated on a server).
     *
     * @param taskAssignmentResult task assignment result
     */
    void onAssignment(TaskAssignmentResult taskAssignmentResult);

    /**
     * Called when the scheduling iteration completes.
     */
    void onScheduleFinish();
}

Make TaskScheduler.Builder be non-final

For mocking purposes (w/ Mockito) it would be handy if the builder weren't final.

Increase debug-level logging in Fenzo's scheduler

Currently, I am trying to debug an issue where I provide one task and one lease to schedule, and Fenzo says that it has zero successful or failed assignments. Since there's no debug logging available, it's tricky to trace what's going on.

How to remove offers from TaskSchedulingService in response to an offerRescinded event from the Mesos master

com.netflix.fenzo.TaskScheduler does not clean up the thread pool created by ExecutorService

Hi,
I find that my application fails to shut down gracefully after using TaskScheduler. After tracing the code, it seems the ExecutorService is never shut down (and there is no way to do so). Can we have a shutdown call in TaskScheduler to clean up the thread pool?
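
For reference, a shutdown call of the kind requested would typically follow the standard two-phase ExecutorService pattern from the JDK. The ExecutorShutdown class below is a hypothetical illustration and says nothing about how TaskScheduler is structured internally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical utility showing the usual two-phase shutdown of a JDK executor. */
final class ExecutorShutdown {
    /** Stops accepting work, waits up to the timeout, then interrupts whatever is still running. */
    static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // previously submitted tasks still run, new ones are rejected
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            executor.shutdownNow(); // interrupt tasks that did not finish in time
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Until the pool's non-daemon worker threads exit, the JVM cannot terminate on its own, which is consistent with the hang described above.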

Support for persistent volumes

Are there any plans to support multi-disk resources and persistent volume allocation / creation?

fitness calculators | oring properties

hi @spodila just testing out fenzo

Looking at BinPackingFitnessCalculators.*, it's effectively


sum ( fitness_i( request ).... + fitness_n( request) ) /  total_fitness_fns 

# or | fitness |

Wondering if it'd be worth it to make it OR objects... so that one can do


builder.
           .....
           .withFitnessCalculator ( Fitness1 | Fitness2 | Fitness3 )
           .build()

Interested in a patch?

I guess two things:

You can use constraints to do what I want - effectively further filtering of machines to offers
Since there are fairly small set of resources on mesos (cpu, mem, gpu, bandwidth)... it's not really too bad to have 4 or even 10 (in the future) that are hard coded since for most workloads you'll need CPU + mem so the permutations are not that many ... w/ or w/out network, w/ or w/out gpu.. etc.

Add a time interval based trigger type to fenzo-triggers

@amit-git Feel free to jump on this if you have time or else I will add it
@spodila Just FYI ^^

.travis.yml: The 'sudo' tag is now deprecated in Travis CI

Travis are now recommending removing the sudo tag.

"If you currently specify sudo: false in your .travis.yml, we recommend removing that configuration"

Make AssignableVMs key on slaveId not hostname

Hostname appears to be informational to Mesos: hostname does not appear in TaskStatus messages or TaskInfo messages. SlaveID does, and for Mesos tasks are assigned by SlaveID, not by hostname. This means that a framework that uses Fenzo must map SlaveIDs to hostnames just so that it can call Fenzo when a task state changes.

Better yet, make the taskUnassigner take a TaskStatus and then just pull out whatever is required.

BinPacking with weights

From a quick glance in this line

Fenzo/fenzo-core/src/main/java/com/netflix/fenzo/plugins/BinPackingFitnessCalculators.java

Line 146 in 5de0e08

return (cpuFitness + memFitness + networkFitness)/3.0;

every type of resource is considered equal. For example, (0.9 + 0.1 + 0.1)/3 ~= 0.36 and fitness being 0.64. For me, such a machine with high CPU utilization may be useless even if I only need one more CPU for subsequent tasks, which leads to wasted resources. This can be true especially for machines with many GBs of RAM but few CPUs. Instead of that formula, I think we should try to normalize the different types of items in the bins or use some kind of weights. Another question: bin packing optimizes the number of bins, but what about load balancing? I might be missing something here.
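
The weighting idea can be sketched as a weighted mean that replaces the fixed division by 3.0. WeightedFitness is a hypothetical helper, not part of BinPackingFitnessCalculators, and the weights in the note below are arbitrary illustrations.

```java
/** Hypothetical helper: weighted mean of per-resource fitness values. */
final class WeightedFitness {
    /** Weights must be non-negative and not all zero; equal weights give the plain average. */
    static double combine(double[] fitness, double[] weights) {
        if (fitness.length != weights.length) {
            throw new IllegalArgumentException("fitness and weights must have the same length");
        }
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < fitness.length; i++) {
            if (weights[i] < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            weighted += fitness[i] * weights[i];
            total += weights[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return weighted / total;
    }
}
```

With cpu, mem and network fitness of 0.9, 0.1 and 0.1, equal weights reproduce the plain average of about 0.367, while weights of 3, 1 and 1 raise the combined value to 0.58 because the CPU term dominates.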

Complete wiki documentation

The wiki docs should:

Describe how Fenzo fits in to a Framework=>Mesos=>Cloud Cluster environment, from the point of view of the Mesos community.
Describe how to use the various features of the Fenzo API, from the point of view of Java developers.

Some questions to answer:

How do I use Fenzo to control autoscaling, and how can I automate this with framework resource allocations?
How do I establish custom slave attributes that I can then use as constraints?
How do I get jobs that use similar resources allocated to the same instance so as to reduce resource use?
How do I adjust task urgency? How can I create & deploy fitness calculator plugins that do this in an automated way?
What are hard and soft constraints, and how do I distinguish them?

A possible outline:

Intro
Getting started
Fenzo design and concepts
How to use Fenzo
Constraint evaluator plugins
Fitness calculator plugins
Autoscaling the cluster
Appendix: JavaDocs

Incorrect handling of multi-disk offers

Fenzo misunderstands offers that contain numerous disk resources, as can occur when an agent is configured with multiple disks (as described here). Fenzo consequently miscalculates the available disk resources.

The VMLeaseObject constructor iterates over the offered resources to identify the cpus, mem, and disk. The last encountered disk resource wins, becoming the basis for the diskMB quantity. The logic should only consider the 'root' disk resource, i.e. the disk without a source component. This approach makes sense for frameworks that don't explicitly support non-root disks.

How to fully support multi-disk scheduling with Fenzo is considered a separate issue.

Fenzo on Openstack

@spodila Does Fenzo support OpenStack environments? Also, how do I set it up and get it running?

Test issue

Testing.

Unsafe concurrent access to unknown lease collection in AssignableVMs class

unknownLeaseIdsToExpire array is modified directly when invoking TaskScheduler#expireLease method. This array is not concurrent, and most likely the intent was to handle it like all the other mutations by the internal thread.

Here is an example stack trace:
fenzo.TaskScheduler:? - Error with scheduling run: null
java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at com.netflix.fenzo.AssignableVMs.expireAnyUnknownLeaseIds(AssignableVMs.java:257)
    at com.netflix.fenzo.AssignableVMs.prepareAndGetOrderedVMs(AssignableVMs.java:270)
    at com.netflix.fenzo.TaskScheduler.doSchedule(TaskScheduler.java:754)
    at com.netflix.fenzo.TaskScheduler.doScheduling(TaskScheduler.java:736)
    at com.netflix.fenzo.TaskScheduler.scheduleOnce(TaskScheduler.java:711)
    at com.netflix.fenzo.TaskSchedulingService.scheduleOnce(TaskSchedulingService.java:275)
    at com.netflix.fenzo.TaskSchedulingService.access$700(TaskSchedulingService.java:73)
    at com.netflix.fenzo.TaskSchedulingService$1.run(TaskSchedulingService.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
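
One common way to avoid this kind of ConcurrentModificationException is to have external callers write into a concurrent inbox that only the scheduling thread drains. The sketch below is a hypothetical illustration (LeaseExpiryInbox is not a Fenzo class) and not necessarily how the project chose to fix it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical holder showing a thread-safe hand-off of lease IDs to a single scheduling thread. */
final class LeaseExpiryInbox {
    private final ConcurrentLinkedQueue<String> inbox = new ConcurrentLinkedQueue<>();

    /** Safe to call from any thread, for example from an expireLease-style entry point. */
    void requestExpiry(String leaseId) {
        inbox.add(leaseId);
    }

    /** Called only by the scheduling thread; returns a private snapshot it can iterate freely. */
    List<String> drain() {
        List<String> snapshot = new ArrayList<>();
        String leaseId;
        while ((leaseId = inbox.poll()) != null) {
            snapshot.add(leaseId);
        }
        return snapshot;
    }
}
```

Because the scheduling thread iterates only over its own snapshot, a concurrent requestExpiry call can no longer invalidate the iterator.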

Resource ranges and "greedy" scheduling

UPFRONT DISCLAIMER:
Nothing is broken - I am only asking if the Fenzo developer community would have any interest in pursuing or accepting a new feature.

Problem

I am working on a development effort where our team is adding new capabilities to a Mesos Framework that uses Fenzo. One of our requirements is "greedy" scheduling. Effectively, we work with a lot of scientific algorithms that can run sequentially or can run in parallel modes. The parallel modes run faster but need more resources, making them harder to schedule sometimes. Currently, these algorithms must be configured with high resource requirements, but sometimes they take forever to schedule. Alternatively, we can configure them with low resource requirements, and they run, but much more slowly.

Design space:

We are looking at introducing a data model on top of TaskRequest that takes "resource ranges" (min, max). We are then looking at two different design approaches:

Approach 1:

Initialize task requests to their maximum requested resources (e.g. 16 cpus)
After some configurable number of scheduling iterations have passed, if a task is still unassigned, back off or reduce the requested resources (e.g. 8 cpus).
Continue until the task is scheduled or you reach minimum resource levels (e.g. 1.0 cpu)
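
The steps of Approach 1 can be sketched as a pure function of the iteration count. ResourceBackoff is a hypothetical helper, and halving per step is an assumption taken from the 16 to 8 cpus example; any other reduction schedule would fit the same shape.

```java
/** Hypothetical helper computing the CPU request for a task that is still unassigned. */
final class ResourceBackoff {
    /**
     * Start at maxCpus and halve the request after every iterationsPerStep
     * unsuccessful scheduling iterations, never dropping below minCpus.
     */
    static double cpusFor(int iteration, int iterationsPerStep, double minCpus, double maxCpus) {
        if (iteration < 0 || iterationsPerStep <= 0 || minCpus <= 0.0 || minCpus > maxCpus) {
            throw new IllegalArgumentException("invalid back-off parameters");
        }
        int steps = iteration / iterationsPerStep;
        double cpus = maxCpus;
        for (int i = 0; i < steps && cpus > minCpus; i++) {
            cpus = Math.max(minCpus, cpus / 2.0);
        }
        return cpus;
    }
}
```

With a range of 1 to 16 cpus and a step every 5 iterations, the request goes 16, 8, 4, 2, 1 and then stays at 1.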

Approach 2:

Generate permutations on the TaskRequest ranges
Let Fenzo generate SchedulingResults for the various permutations of TaskRequest resource ranges
Run the SchedulingResults through some fitness evaluation that takes into account % of resources utilized, number of tasks scheduled, etc.
Use the best SchedulingResult

We recognize that approach 2, although more optimal and faster to schedule, is not readily possible using TaskQueues and TaskSchedulingService. There would be a lot of changes to Fenzo needed to allow concurrent calls to scheduleOnce() on the same sets of resourceOffers and taskRequests to generate the permutations... (honestly not even sure if you'd use 1 taskQueue per permutation or try to handle it all in one).

Our proposed solution is to choose approach 2 ONLY IF the Fenzo team is interested in helping to support the feature, or if the Fenzo team would be willing to accept the additional complexity of a pull request with resource ranges and permutations.

If the Fenzo team is not interested in this "greedy" scheduling feature or doesn't believe it adds general-purpose value to the community, then our team is going to select approach 1.

Any insight you have on this matter is greatly appreciated! Whether it be a "yes" or "no" on contributing support...... a "yes" or "no" on accepting a pull request.... or even general guidance on something we've missed, whereby Fenzo is already capable of solving this problem elegantly.

Thank you!

Reject Leases on single hosts to avoid fragmentation

In com.netflix.fenzo.AssignableVMs#removeLimitedLeases there is this comment:
// randomize the list so we don't always reject leases of the same VM before hitting the reject limit
Don't we want to free up large chunks of contiguous space on a single VM, rather than fragmenting space across lots of machines?
See @spodila comment on #50

Lease expiration vs. scale-up trigger

Thanks to Fenzo for the separation of concerns that abstracts the scheduling logic. I was able to create a simple framework (using Fenzo) to scale tasks (Docker) up and down, and also to scale the platform (Softlayer) up and down as needed.

One issue I faced was balancing a shorter offer expiration (to increase platform sharing) against the need for a less aggressive scale-up trigger (one that does not fire merely because an offer just expired).

In the end, besides setting the offer expiration and scale-up delay configuration, I added another configuration, "wait seconds since last lease expiration before scale up", to skip the scale-up if the last offer expiration happened within that duration.

Please let me know if there are other alternatives than introducing this new configuration. You can find out details of my framework here: https://github.com/yanglei99/Mesos_Auto_Scale

Thanks.

Allow for custom shortfall evaluators

Hey there,

Great to see the new OptimizingShortfallEvaluator. Any chance this class/interface hierarchy could be made public so that we can extend it to implement our own shortfall evaluation strategies?

The use case I have is where we are scheduling only short-lived tasks on a dedicated auto-scaling group (or possibly groups in the future). If there are tasks that have a lifetime in the order of seconds, then the current shortfall evaluation ends up grossly overestimating resource needs. Ideally we want to pseudo-schedule some pseudo tasks that represent what we think our resource requirements will be for the next n minutes (where n is probably derived from the auto-scaling cooldown period) based on currently running tasks and pending tasks (and maybe some task history that we record as well).

We might even just start with something fairly naive that doesn't even use pseudo-scheduling, so it would be cool just to be able to implement ShortfallEvaluator ourselves.

Support v1 Protos (or provide utility to convert)

I started porting our Fenzo-based scheduler over to the Mesos HTTP API using mesos-rxjava.
One issue I ran into is documented in the mesos-rxjava project (mesosphere/mesos-rxjava#74). To summarize, the HTTP API now hands me an org.apache.mesos.v1.Protos.Offer, but Fenzo only accepts org.apache.mesos.Protos.Offer.
I have a few ideas for how to convert to/from these Offers, but they feel risky. I was wondering if (a) Fenzo plans to support v1.Protos in the future? ... and (b) if you had any idea how to cleanly convert from v1.Protos to legacy Protos in the meantime?