practo / k8s-worker-pod-autoscaler

Kubernetes autoscaler for worker pods. The resource is called WPA. Supported queues: SQS, Beanstalkd.

Home Page: https://medium.com/practo-engineering/launching-worker-pod-autoscaler-3f6079728e8b

License: Apache License 2.0

Makefile 7.22% Go 86.82% Shell 5.91% Dockerfile 0.04%
kubernetes autoscaling queue sqs aws

k8s-worker-pod-autoscaler's Introduction

Worker Pod Autoscaler



Scales Kubernetes pods based on a combination of queue metrics, querying the queues intelligently and only when needed.

Currently the supported message queuing services are:

  • AWS SQS
  • Beanstalkd

Pull requests are welcome to add new message queuing services.


Install the WorkerPodAutoscaler

Install

Running the below script will create the WPA CRD and install the worker pod autoscaler deployment.

export AWS_REGIONS='ap-south-1,ap-southeast-1'
export AWS_ACCESS_KEY_ID='sample-aws-access-key-id'
export AWS_SECRET_ACCESS_KEY='sample-aws-secret-access-key'
./hack/install.sh

Note: The AWS_ variables need to be exported only when using SQS and the node role on which the WPA pod runs does not have the required IAM policy.

Verify Installation

Check that the wpa resource is accessible using kubectl:

kubectl get wpa

Upgrade

Please follow this document for upgrading Worker Pod Autoscaler.

Example

Install the WPA CRD and the WPA deployment before trying the example (see the Install section above).

  • Create a Deployment that needs to scale based on queue length:
kubectl create -f artifacts/examples/example-deployment.yaml
  • Create a WPA object (example-wpa) that will start scaling example-deployment based on SQS queue length:
kubectl create -f artifacts/examples/example-wpa.yaml

This will start scaling example-deployment based on the SQS queue length.

Configuration

WPA Resource

apiVersion: k8s.practo.dev/v1
kind: WorkerPodAutoScaler
metadata:
  name: example-wpa
spec:
  minReplicas: 0
  maxReplicas: 10
  deploymentName: example-deployment
  queueURI: https://sqs.ap-south-1.amazonaws.com/{{aws_account_id}}/{{queue_prefix-queue_name-queue_suffix}}
  targetMessagesPerWorker: 2
  secondsToProcessOneJob: 0.03
  maxDisruption: "100%"

Beanstalk's queueURI would look like: beanstalk://beanstalkDNSName:11300/test-tube

WPA Spec Documentation:

| Spec | Description | Mandatory |
| --- | --- | --- |
| minReplicas | Minimum number of workers you want to run. | Yes |
| maxReplicas | Maximum number of workers you want to run. | Yes |
| deploymentName | Name of the Kubernetes Deployment in the same namespace as the WPA object. | No* |
| replicaSetName | Name of the Kubernetes ReplicaSet in the same namespace as the WPA object. | No* |
| queueURI | Full URL of the queue. | Yes |
| targetMessagesPerWorker | Target ratio between the number of queued jobs (both available and reserved) and the number of workers required to process them. For long-running workers with a visible backlog, this value may be set to 1 so that each job spawns a new worker (up to maxReplicas). | Yes |
| secondsToProcessOneJob | For fast-running workers with high RPM, the backlog stays very close to zero, so scale up cannot happen based on the backlog alone; this setting keeps a minimum number of workers running based on the queue RPM. (Highly recommended; default=0.0, i.e. disabled.) | No |
| maxDisruption | Amount of disruption that can be tolerated in a single scale-down activity, i.e. the number of pods or percentage of pods that can be scaled down in one step. Use it to control how fast a scale down can happen; it can be expressed as an absolute value or as a percentage. (Default is the WPA flag --wpa-default-max-disruption.) | No |

* It is mandatory to set either deploymentName or replicaSetName.

The above specifications, explained with examples (see also the sketch after this list):

  • targetMessagesPerWorker:
availableMessages=90 (backlog), reservedMessages=110 (in process); if 10 workers are required to process 110+90=200 messages, then
targetMessagesPerWorker = (110+90)/10 = 20
  • secondsToProcessOneJob:
secondsToProcessOneJob=0.5
queueRPM=300
min=1
minWorkersBasedOnRPM=Ceil(0.5*300/60)=3, so there will be a minimum of 3 workers running based on the RPM.
  • maxDisruption:
min=2, max=1000, current=500, maxDisruption=50%: then the scale down cannot bring down more than 250 pods in a single scale down activity.
min=2, max=1000, current=500, maxDisruption=125: then the scale down cannot bring down more than 125 pods in a single scale down activity.
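
A minimal sketch of how these fields combine, following the simplified formulas above. The real controller (pkg/controller) applies more rules (idle-worker handling, tolerance, and the maxDisruption limit on scale down), so treat this as an illustration only:

package main

import (
	"fmt"
	"math"
)

// desiredWorkers is a simplified sketch of the WPA scaling math described
// above; it is not the controller's exact implementation.
func desiredWorkers(queueMessages int32, messagesSentPerMinute, secondsToProcessOneJob float64,
	targetMessagesPerWorker, minReplicas, maxReplicas int32) int32 {

	// Scale on backlog: queued jobs / target jobs per worker.
	desired := int32(math.Ceil(float64(queueMessages) / float64(targetMessagesPerWorker)))

	// Keep a minimum number of workers running based on queue RPM when
	// secondsToProcessOneJob is set (fast workers, near-zero backlog).
	if secondsToProcessOneJob > 0 {
		minFromRPM := int32(math.Ceil(secondsToProcessOneJob * messagesSentPerMinute / 60))
		if minFromRPM > desired {
			desired = minFromRPM
		}
	}

	// Clamp to [minReplicas, maxReplicas]. maxDisruption would additionally
	// limit how many pods a single scale-down step may remove.
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 90 available + 110 reserved messages, target 20 per worker => 10 workers.
	fmt.Println(desiredWorkers(200, 0, 0, 20, 0, 100))
	// Near-zero backlog, 300 messages/min, 0.5s per job => minimum 3 workers.
	fmt.Println(desiredWorkers(0, 300, 0.5, 2, 1, 100))
}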

WPA Controller

Run the workerpodautoscaler

Usage:
  workerpodautoscaler run [flags]

Examples:
  workerpodautoscaler run

Flags:
      --aws-regions string                               comma separated aws regions of SQS (default "ap-south-1,ap-southeast-1")
      --beanstalk-long-poll-interval int                 the duration (in seconds) for which the beanstalk receive message call waits for a message to arrive (default 20)
      --beanstalk-short-poll-interval int                the duration (in seconds) after which the next beanstalk api call is made to fetch the queue length (default 20)
  -h, --help                                             help for run
      --k8s-api-burst int                                maximum burst for throttle between requests from clients(wpa) to k8s api (default 10)
      --k8s-api-qps float                                qps indicates the maximum QPS to the k8s api from the clients(wpa). (default 5)
      --kube-config string                               path of the kube config file, if not specified in cluster config is used
      --metrics-port string                              specify where to serve the /metrics and /status endpoint. /metrics serve the prometheus metrics for WPA (default ":8787")
      --namespace string                                 specify the namespace to listen to
      --queue-services string                            comma separated queue services, the WPA will start with (default "sqs,beanstalkd")
      --resync-period int                                maximum sync period for the control loop but the control loop can execute sooner if the wpa status object gets updated. (default 20)
      --scale-down-delay-after-last-scale-activity int   scale down delay after last scale up or down in seconds (default 600)
      --sqs-long-poll-interval int                       the duration (in seconds) for which the sqs receive message call waits for a message to arrive (default 20)
      --sqs-short-poll-interval int                      the duration (in seconds) after which the next sqs api call is made to fetch the queue length (default 20)
      --wpa-default-max-disruption string                it is the default value for the maxDisruption in the WPA spec. This specifies how much percentage of pods can be disrupted in a single scale down acitivity. Can be expressed as integers or as a percentage. (default "100%")
      --wpa-threads int                                  wpa threadiness, number of threads to process wpa resources (default 10)

Global Flags:
  -v, --v Level   number for the log level verbosity

To enable support for multiple queue services, list them comma-separated in --queue-services. For example, if beanstalkd is enabled but no Beanstalk WPA resource is present, nothing happens until a Beanstalk WPA resource is created; the queue poller service only operates on the matching WPA objects.

--queue-services=sqs,beanstalkd

Troubleshoot (running WPA at scale)

Running WPA at scale requires changes to the --k8s-api-burst and --k8s-api-qps flags.

WPA makes calls to the Kubernetes API to update the WPA resource status. client-go is used as the Kubernetes client; by default it allows 5 QPS and a burst of 10 requests to the Kubernetes API. These defaults can be changed using the --k8s-api-qps and --k8s-api-burst flags.

You may need to increase --k8s-api-qps and --k8s-api-burst if wpa_controller_loop_duration_seconds is greater than 200ms (wpa_controller_loop_duration_seconds > 0.200).

For ~800 WPA resources, 100 QPS keeps wpa_controller_loop_duration_seconds < 0.200.
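
For example (illustrative starting values, using the ~800 WPA data point above as a reference; tune them for your own cluster):

--k8s-api-qps=100
--k8s-api-burst=200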

WPA Metrics

WPA emits the following prometheus metrics at :8787/metrics.

wpa_controller_loop_count_success{workerpodautoscaler="example-wpa", namespace="example-namespace"} 23140
wpa_controller_loop_duration_seconds{workerpodautoscaler="example-wpa", namespace="example-namespace"} 0.39

wpa_log_messages_total{severity="ERROR"} 0
wpa_log_messages_total{severity="WARNING"} 0

wpa_queue_messages{workerpodautoscaler="example-wpa", namespace="example-namespace", queueName="example-q"} 87
wpa_queue_messages_sent_per_minute{workerpodautoscaler="example-wpa", namespace="example-namespace", queueName="example-q"} 2007

wpa_worker_current{workerpodautoscaler="example-wpa", namespace="example-namespace", queueName="example-q"} 27
wpa_worker_desired{workerpodautoscaler="example-wpa", namespace="example-namespace", queueName="example-q"} 5
wpa_worker_idle{workerpodautoscaler="example-wpa", namespace="example-namespace", queueName="example-q"} 0

go_goroutines{endpoint="workerpodautoscaler-metrics"} 40

Using these metrics, scaling trends can be better analysed by comparing the number of replicas against the queue length.

If you have the ServiceMonitor CRD installed in your cluster, you can bring these metrics into Prometheus by running the following:

kubectl create -f artifacts/service.yaml
kubectl create -f artifacts/servicemonitor.yaml

Why make a separate autoscaler CRD?

Go through this medium post for details.

Kubernetes does support custom metric scaling using Horizontal Pod Autoscaler. Before making this we were using HPA to scale our worker pods. Below are the reasons for moving away from HPA and making a custom resource:

TL;DR: Don't want to write and maintain custom metric exporters? Use WPA to quickly start scaling your pods based on queue length with minimal effort (a few kubectl commands and you are done!).

  1. No need to write and maintain custom metric exporters: In case of HPA with custom metrics, the users need to write and maintain the custom metric exporters. This makes sense for HPA to support all kinds of use cases. WPA comes with queue metric exporters(pollers) integrated and the whole setup can start working with 2 kubectl commands.

  2. Different Metrics for Scaling Up and Down: The scale-up and scale-down metrics can differ based on the use case. For example, in our case we want to scale up based on SQS ApproximateNumberOfMessages and scale down based on NumberOfMessagesReceived. This is because if the workers watching the queue are consuming it very fast, ApproximateNumberOfMessages would always be zero, and you don't want to scale down to 0 in such cases.

  3. Fast Scaling: We wanted to achieve super fast, near real time scaling. As soon as a job arrives in the queue, the containers should scale if needed. The concurrency, speed and interval of sync have been made configurable to keep the API calls to a minimum.

  4. On-demand Workers: min=0 is supported. It's also supported in HPA.

Release

  • Decide on a tag, bump the tag here, then create and merge the pull request.

  • Get the latest master code.

git clone https://github.com/practo/k8s-worker-pod-autoscaler
cd k8s-worker-pod-autoscaler
git pull origin master

  • Build and push the image to public.ecr.aws/practo. Note: ECR push access is required; alternatively, use a custom registry by running REGISTRY=public.ecr.aws/exampleorg make push
git fetch --tags
git tag v1.6.0
make push

Note: For every tag, major and major.minor version tags are also available. For example: v1 and v1.6.

  • Create a release in GitHub. Refer to this and create a release. The release should contain the changelog of all issues and pull requests merged since the last release.

  • Publish the release in GitHub 🎉

  • For the first-time deployment, use this.

  • For subsequent deployments, edit the image in the deployment with the new tag:

kubectl edit deployment -n kube-system workerpodautoscaler

Contributing

It would be really helpful to add support for all the major message queuing service providers. To make that possible, this interface needs to be implemented for the new provider (see the sketch below).
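
A minimal sketch of the kind of interface a new queue provider has to satisfy; this is illustrative only, the method names below are assumptions, and the actual interface lives in pkg/queue and may differ:

package queue

// Illustrative only: the real interface is defined in pkg/queue and may have
// a different shape. A new provider (e.g. a GCP Pub/Sub poller) would need to
// report roughly this information back to the controller.
type QueueService interface {
	// GetMessages returns the current backlog (available + reserved jobs).
	GetMessages(queueURI string) (int32, error)

	// GetMessagesSentPerMinute returns the recent enqueue rate, used together
	// with secondsToProcessOneJob to keep a minimum number of workers running.
	GetMessagesSentPerMinute(queueURI string) (float64, error)

	// GetIdleWorkers returns how many workers appear idle, used to decide
	// whether a scale down to min is safe.
	GetIdleWorkers(queueURI string) (int32, error)
}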

  • After making code changes, run the below commands to build and run locally.
$ make build
making bin/darwin_amd64/workerpodautoscaler

$ bin/darwin_amd64/workerpodautoscaler run --kube-config /home/user/.kube/config

  • Generate the CRD client code at pkg/apis and pkg/generated using:
make generate

Thanks

Thanks to the Kubernetes team for CRDs and the sample-controller. Thanks also for go-build-template.

k8s-worker-pod-autoscaler's People

Contributors

akiros001, alok87, matkam, sushant-115


k8s-worker-pod-autoscaler's Issues

Scale up logic using targetMessagesPerWorker

I am using WPA in my service, where my pods by design take a long time to initialize (~15 mins).
As I need to start initializing new pods immediately once they are needed according to the queue size, I've used this configuration:
--resync-period=60
--wpa-threads=10
--aws-regions=us-east-2
--sqs-short-poll-interval=20
--sqs-long-poll-interval=20

So in a short time the autoscaler will recognize the need to scale, and the pods will start initializing.

For simplicity, let's say one worker can handle 10 jobs, so if I have 100 messages in the queue I will need 10 workers.
The behavior I get is:
in the first loop the algorithm decides to scale up to 10 workers, as expected;
in the second loop the queue still has ~100 messages because the workers have not started yet, so the algorithm decides to scale up to 100 this time.

Looking at the code logic in https://github.com/practo/k8s-worker-pod-autoscaler/blob/59d0d487fa22f8121a7a123584e33b09dba04896/pkg/controller/controller.go, I see this:
[screenshot of the scaling logic]

My understanding is that if I set targetMessagesPerWorker, I should always get 10 workers for 100 messages in my case, but it looks like the algorithm still takes the current worker count into account.

Is there a way to keep the behavior constant without depending on the currently running pods, like this (some flag that will "remove" the if, so the scaling is always constant and does not depend on the currently running workers):
[screenshot of the proposed change]

Also, the flag name suggests this meaning: that the scaling logic is simply driven by the queue size, where you configure the number of messages that can be handled by a single worker (see the sketch below).
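
For reference, the constant behaviour being asked for would compute desired replicas purely from the queue length, something like this sketch (illustrative only, not what the controller currently does; assumes the standard math package is imported):

// Sketch of the constant scaling being asked for: desired replicas depend only
// on the backlog and targetMessagesPerWorker, never on the running workers.
func constantDesired(queueMessages, targetMessagesPerWorker, minReplicas, maxReplicas int32) int32 {
	desired := int32(math.Ceil(float64(queueMessages) / float64(targetMessagesPerWorker)))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired // 100 messages with target 10 => always 10 workers
}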

Appreciate your help!

Scaling with spiky queues

I'm looking at using worker-pod-autoscaler as a means to scale our application based on an SQS queue. Overall it's been working well, indeed it behaves exactly as described in the docs.

However, its behaviour doesn't fit the nature of our workloads well, so I'm opening this issue to describe the problem and hopefully start some discussion about a possible solution.

Our queue backlog is "spiky", large numbers of messages arrive on the queue both unpredictably and suddenly. To demonstrate this I created the following WorkerPodAutoscaler spec:

apiVersion: k8s.practo.dev/v1
kind: WorkerPodAutoScaler
spec:
  deploymentName: cpapi-transport
  maxDisruption: 100%
  maxReplicas: 100
  minReplicas: 0
  queueURI: https://sqs.eu-west-1.amazonaws.com/xxxx/queue_name
  secondsToProcessOneJob: null
  targetMessagesPerWorker: 40

Then I sent 1000 messages to the empty queue (this took about 10 seconds).

A graph of the queue length over the coming minutes looks like this:
[graph of queue length omitted]

And a graph of desired replicas looks like this:
[graph of desired replicas omitted]

We can see that WPA behaves as expected. Replicas scale immediately to (1000/40=) 25, and queue length drops rapidly as a result. Then as the queue continues to fall replicas are removed and the message removal rate slows until some time later the queue is finally back to zero and replicas goes to zero too.

The problem for us is the way the number of workers reduces in proportion to the queue length, which means the removal rate is constantly falling and therefore items remain on the queue for longer than we would like. For us, the length of the queue backlog is irrelevant as an SLO; what matters is the amount of time items sit on the queue.

In the example above we can see that it's taken eight minutes for the queue to finally reach zero. For our use-case we do not want messages on the queue for any more than five minutes. I could try to mitigate this by reducing targetMessagesPerWorker, and this may be reasonably effective but will result in a lot of initial replicas and still suffer from an ever-decreasing removal rate, a very inefficient solution. Also behaviour would be different for larger/smaller spikes.

My suggestion would be a third alternative metric (in addition to targetMessagesPerWorker and secondsToProcessOneJob) called something like maxDesiredMessageAge.

To create an algorithm based on that metric we also need to know some more information about the queue:

  • How old is the oldest message?
  • How long until the queue reaches zero based on the current removal rate?

For SQS both those metrics can be found (or derived at least) from Cloudwatch:

  • ApproximateAgeOfOldestMessage
  • ApproximateNumberOfMessagesVisible/NumberOfMessagesDeleted

The algorithm would then be a loop that looks something like this (pseudo-code):

AGE_OF_OLDEST_MESSAGE=ApproximateAgeOfOldestMessage
ESTIMATED_TIME_TO_ZERO=ApproximateNumberOfMessagesVisible/NumberOfMessagesDeleted

if ESTIMATED_TIME_TO_ZERO > AGE_OF_OLDEST_MESSAGE
  SCALE_METRIC=ESTIMATED_TIME_TO_ZERO
else
  SCALE_METRIC=AGE_OF_OLDEST_MESSAGE
  
DESIRED_REPLICAS = ceil[CURRENT_REPLICAS * ( SCALE_METRIC / maxDesiredMessageAge )]
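
(A rough Go rendering of the same loop, purely illustrative; it assumes the math package is imported, and the CloudWatch-derived inputs and the maxDesiredMessageAge field are the proposed additions, not existing WPA features.)

// Illustrative translation of the pseudo-code above. All inputs are assumed
// to come from CloudWatch (ApproximateAgeOfOldestMessage, visible messages,
// deletion rate); maxDesiredMessageAge is the proposed new spec field.
func desiredForMessageAge(ageOfOldestMessage, visibleMessages, deletedPerSecond,
	maxDesiredMessageAge float64, currentReplicas int32) int32 {

	estimatedTimeToZero := 0.0
	if deletedPerSecond > 0 {
		estimatedTimeToZero = visibleMessages / deletedPerSecond
	}

	scaleMetric := ageOfOldestMessage
	if estimatedTimeToZero > ageOfOldestMessage {
		scaleMetric = estimatedTimeToZero
	}

	return int32(math.Ceil(float64(currentReplicas) * (scaleMetric / maxDesiredMessageAge)))
}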

This should result in replicas ramping up more steadily as a result of the spike but remaining scaled-up in order to clear the queue within the target time.

You'll probably notice that the final calculation is pretty much what HorizontalPodAutoscaler does. However, using HPA doesn't work for this, as it doesn't synchronise metrics with scaling events, resulting in massive over-scaling, i.e. it scales up and then scales again and again because the queue metrics don't update quickly enough.

My example is very simplistic, but I'm just trying to figure out if it's feasible in theory at least. Would love to hear some thoughts...

Install WPA on namespace scope

I'm trying to install WPA at namespace scope, so I changed the ClusterRole to a Role and the ClusterRoleBinding to a RoleBinding, but when WPA starts I get this error:

Error creating crd: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:NAMESPACE:workerpodautoscaler" cannot create resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "workerpodautoscaler" not found

Is it possible to run WPA at namespace scope?

Regards

Handle SQS NonExistentQueue error

The WPA pod goes into a crash loop when at least one of the queues it is monitoring is not present or has been deleted.

F0918 13:33:13.220319       1 sqs.go:273] Unable to get approximate messages in queue "sample-queue", AWS.SimpleQueueService.NonExistentQueue: The specified queue does not exist for this wsdl version.

I think the best option here is to skip scaling instead of crashing, so that when the queue gets created later on (when a producer actually tries to send a message), WPA can resume scaling.

[SQS] Inter Region SQS Support

The WPA object could be in ap-south-1 while the SQS queue is in ap-southeast-1.
The current IAM permissions won't allow WPA to access SQS in another region.
Please fix this.

Deployment gets stuck MinimumReplicasUnavailable

I have upgraded to the latest version and since then I am having issues with the autoscaler. What happens is that after a period of time the pod that does the scaling gets stuck in Pending. This is the error that I am getting:

Get "https://10.70.152.3:10250/containerLogs/kube-system/workerpodautoscaler-f898df7cb-wdsb7/wpa?tailLines=5000ร—tamps=true": dial tcp 10.70.152.3:10250: connect: no route to host

When I check the autoscaler deployment I get another message:

MinimumReplicasUnavailable
Deployment does not have minimum availability

This started happening after I upgraded to the latest version. It cannot be an availability problem, because I have 50+ nodes in the Kubernetes cluster. The only way I can get it to work is to delete the cluster and recreate it from scratch.

Any pointers on what might be causing this?

Info message is not updated with the current number of workers

  • The info message displayed shows workers as 1 even when the workers did scale to 20 as expected. The info message should display workers as 20, not 1.
kn logs -f workerpodautoscaler-c987dc49-tq95s -n kube-system --tail 1 | grep entity_log_change
I1127 08:52:16.491053       1 sqs.go:303] prod-consult-entity_log_change-latest: approxMessages=0
I1127 08:52:17.691175       1 sqs.go:355] prod-consult-entity_log_change-latest: emptyReceives=1.000000, workers=1, idleWorkers=1
I1127 08:52:36.691773       1 sqs.go:303] prod-consult-entity_log_change: approxMessages=46758
  • Also, there are instances of messages showing current as -1 for a longer duration. (Will open a separate issue once it is certain that it is not because of the same issue.)

No v1.1.0 Docker image?

Looks like the v1.1.0 release has gone out, and I'm excited to test it. But it seems there's no 1.1.0 Docker image on Docker Hub?

Probably I'm being too impatient since the release was only a few hours ago. But just thought I'd check in case it was an oversight...

Concurrent map write / read / iteration

Find and fix the race.

WPA controller is crashing sometimes because of concurrent map write. Please figure out this issue and fix it.

fatal error: I0812 07:42:08.440000       1 controller.go:299] queue: XXX-XXX-XX, messages: 0, idle: -1, desired: 0
concurrent map iteration and map write

goroutine 16 [running]:
runtime.throw(0x170f0aa, 0x26)
	/usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc00009ad58 sp=0xc00009ad28 pc=0x42c042
runtime.mapiternext(0xc00009af60)
	/usr/local/go/src/runtime/map.go:860 +0x597 fp=0xc00009ade0 sp=0xc00009ad58 pc=0x40ebe7
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).Run(0xc00043e270, 0xc000088660)
	/src/pkg/queue/poller.go:90 +0xfc fp=0xc00009afd0 sp=0xc00009ade0 pc=0x13ec58c
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00009afd8 sp=0xc00009afd0 pc=0x458671
created by main.(*runCmd).run
	/src/cmd/workerpodautoscaler/run.go:121 +0x718

goroutine 1 [chan receive, 116 minutes]:
github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).Run(0xc00043af00, 0xa, 0xc000088660, 0x0, 0x0)
	/src/pkg/controller/controller.go:154 +0x2eb
main.(*runCmd).run(0xc0001a56c0, 0xc000328500, 0xc000203cc0, 0x0, 0x5)
	/src/cmd/workerpodautoscaler/run.go:139 +0x974
github.com/spf13/cobra.(*Command).execute(0xc000328500, 0xc000203c70, 0x5, 0x5, 0xc000328500, 0xc000203c70)
	/src/vendor/github.com/spf13/cobra/command.go:830 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0x26e96a0, 0xc0002aff78, 0x2, 0x2)
	/src/vendor/github.com/spf13/cobra/command.go:914 +0x2fc
github.com/spf13/cobra.(*Command).Execute(...)
	/src/vendor/github.com/spf13/cobra/command.go:864
main.main()
	/src/cmd/workerpodautoscaler/main.go:42 +0xcd

goroutine 5 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x26fbb40)
	/src/vendor/k8s.io/klog/klog.go:990 +0x8b
created by k8s.io/klog.init.0
	/src/vendor/k8s.io/klog/klog.go:404 +0x6c

goroutine 6 [syscall, 116 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 8 [chan receive, 116 minutes]:
github.com/practo/k8s-worker-pod-autoscaler/pkg/signals.SetupSignalHandler.func1(0xc00020dbc0, 0xc000088660)
	/src/pkg/signals/signal.go:20 +0x34
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/signals.SetupSignalHandler
	/src/pkg/signals/signal.go:19 +0xd0

goroutine 48 [sleep]:
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:307
time.Sleep(0x7708828c)
	/usr/local/go/src/runtime/time.go:105 +0x159
k8s.io/client-go/util/flowcontrol.realClock.Sleep(...)
	/src/vendor/k8s.io/client-go/util/flowcontrol/throttle.go:70
k8s.io/client-go/util/flowcontrol.(*tokenBucketRateLimiter).Accept(0xc0003704c0)
	/src/vendor/k8s.io/client-go/util/flowcontrol/throttle.go:95 +0x132
k8s.io/client-go/rest.(*Request).tryThrottle(0xc001062300, 0xc00102d1e0, 0x2c7)
	/src/vendor/k8s.io/client-go/rest/request.go:534 +0x2a8
k8s.io/client-go/rest.(*Request).Do(0xc001062300, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/src/vendor/k8s.io/client-go/rest/request.go:823 +0x62
github.com/practo/k8s-worker-pod-autoscaler/pkg/generated/clientset/versioned/typed/workerpodautoscaler/v1alpha1.(*workerPodAutoScalers).Update(0xc001356360, 0xc00102d080, 0xc, 0x1a1bb40, 0xc001356360)
	/src/pkg/generated/clientset/versioned/typed/workerpodautoscaler/v1alpha1/workerpodautoscaler.go:131 +0x10a
github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).updateWorkerPodAutoScalerStatus(0xc00043af00, 0xc000000001, 0xc00126d340, 0x100000001, 0xc000dd3ce0, 0x4)
	/src/pkg/controller/controller.go:413 +0xb5
github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).syncHandler(0xc00043af00, 0xc000b81920, 0x2d, 0x16ecd35, 0x6, 0xc000dd3d98, 0x42adbf)
	/src/pkg/controller/controller.go:307 +0x9f1
github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).processNextWorkItem.func1(0xc00043af00, 0x15aff80, 0xc0013752e0, 0x0, 0x0)
	/src/pkg/controller/controller.go:204 +0x13f
github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).processNextWorkItem(0xc00043af00, 0xc000924801)
	/src/pkg/controller/controller.go:214 +0x4d
github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).runWorker(0xc00043af00)
	/src/pkg/controller/controller.go:164 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00087cd40)
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00087cd40, 0x3b9aca00, 0x0, 0xc0003b4701, 0xc000088660)
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00087cd40, 0x3b9aca00, 0xc000088660)
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/controller.(*Controller).Run
	/src/pkg/controller/controller.go:152 +0x262

goroutine 23 [chan receive, 116 minutes]:
k8s.io/client-go/tools/cache.(*sharedProcessor).run(0xc000327c00, 0xc0003e81e0)
	/src/vendor/k8s.io/client-go/tools/cache/shared_informer.go:478 +0x46
k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54 +0x2e
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0003b45c0, 0xc0000b69c0)
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

goroutine 19 [IO wait]:
internal/poll.runtime_pollWait(0x7f4f62f6fea8, 0x72, 0xffffffffffffffff)
	/usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc00039c218, 0x72, 0xf400, 0xf42d, 0xffffffffffffffff)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00039c200, 0xc0002a0000, 0xf42d, 0xf42d, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:169 +0x19b
net.(*netFD).Read(0xc00039c200, 0xc0002a0000, 0xf42d, 0xf42d, 0x203000, 0x0, 0xf3f6)
	/usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00000ed88, 0xc0002a0000, 0xf42d, 0xf42d, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/net.go:177 +0x69
crypto/tls.(*atLeastReader).Read(0xc000fddb40, 0xc0002a0000, 0xf42d, 0xf42d, 0xc001264a08, 0x441bd2, 0xc0012649e0)
	/usr/local/go/src/crypto/tls/conn.go:761 +0x60
bytes.(*Buffer).ReadFrom(0xc00037acd8, 0x19c6c20, 0xc000fddb40, 0x409995, 0x1561ee0, 0x16937c0)
	/usr/local/go/src/bytes/buffer.go:207 +0xbd
crypto/tls.(*Conn).readFromUntil(0xc00037aa80, 0x19c8500, 0xc00000ed88, 0x5, 0xc00000ed88, 0x2ae)
	/usr/local/go/src/crypto/tls/conn.go:783 +0xf8
crypto/tls.(*Conn).readRecordOrCCS(0xc00037aa80, 0x1787c00, 0xc00037abb8, 0xc001264d58)
	/usr/local/go/src/crypto/tls/conn.go:590 +0x125
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:558
crypto/tls.(*Conn).Read(0xc00037aa80, 0xc0003fd000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:1236 +0x137
bufio.(*Reader).Read(0xc0000b4420, 0xc0003cc1f8, 0x9, 0x9, 0x4063f4, 0xc000d286c0, 0xc001264d58)
	/usr/local/go/src/bufio/bufio.go:223 +0x23e
io.ReadAtLeast(0x19c6ae0, 0xc0000b4420, 0xc0003cc1f8, 0x9, 0x9, 0x9, 0x19c6dc0, 0xc0007c9d30, 0xc000048050)
	/usr/local/go/src/io/io.go:310 +0x88
io.ReadFull(...)
	/usr/local/go/src/io/io.go:329
golang.org/x/net/http2.readFrameHeader(0xc0003cc1f8, 0x9, 0x9, 0x19c6ae0, 0xc0000b4420, 0x0, 0x0, 0x0, 0x0)
	/src/vendor/golang.org/x/net/http2/frame.go:237 +0x88
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0003cc1c0, 0xc000667dd0, 0x0, 0x0, 0x0)
	/src/vendor/golang.org/x/net/http2/frame.go:492 +0xa1
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc001264fb8, 0x1786998, 0xc000058fb8)
	/src/vendor/golang.org/x/net/http2/transport.go:1679 +0x8d
golang.org/x/net/http2.(*ClientConn).readLoop(0xc0000bc480)
	/src/vendor/golang.org/x/net/http2/transport.go:1607 +0x76
created by golang.org/x/net/http2.(*Transport).newClientConn
	/src/vendor/golang.org/x/net/http2/transport.go:670 +0x637

goroutine 21 [select]:
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).sync(0xc00043e270, 0xc000088660)
	/src/pkg/queue/poller.go:61 +0x1ae
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).Run
	/src/pkg/queue/poller.go:80 +0x64

goroutine 14 [select]:
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Queues).Sync(0xc000433ef0, 0xc000088660)
	/src/pkg/queue/queue.go:68 +0x19d
created by main.(*runCmd).run
	/src/cmd/workerpodautoscaler/run.go:110 +0x3ec

Partial scale down is not happening in one scenario

Partial scale down is not happening in one scenario. Let's see the scenario.
โŒ When there is no backlog, but processing is happening by some workers. But the number of unprocessed jobs is less than targetMessagesPerWorker. Then scale down based on maxDisruption set is not taking effect.

Example Scenario (SQS queue service):

min=0
max=20
targetMessagesPerWorker=200
maxDisruption=10%
secondsToProcessOneJob=0.3
currentRunningWorkers(pods)=20
messagesSentPerMinute=10

AWS Metrics Current Values:

approxMessagesVisible=0
approxMessagesNotVisible=10

Expectation

Based on the number of messages sent being a low value and maxDisruption being set, the scale down should have triggered, slowly bringing the pods down from 20 to min. But this did not happen.

Suspected Bug

This is happening because when queueMessages < targetMessagesPerWorker, partial scale down is not considered and currentWorkers is returned as desired.

if (math.Abs(1.0-usageRatio) <= tolerance) || (queueMessages < targetMessagesPerWorker) {
	// desired is same as current in this scenario
	return convertDesiredReplicasWithRules(
		currentWorkers,
		currentWorkers,

Scale up logic doesn't work well for long-running tasks

Consider a scenario where TargetJobsPerPod is set to 1. When the first job comes, it scales up from 0 to 1 pod. When the first job is still being processed and a second job comes into the queue, it doesn't scale up to 2 pods. This is because only the jobs visible are used to calculate the usage ratio. I think we should use both jobs visible and jobs currently being processed to calculate the desired workers.
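
For example, with TargetJobsPerPod set to 1, one job already in flight and one newly visible job, counting visible plus in-flight messages would give desired = ceil((1 + 1) / 1) = 2 pods, whereas counting only the visible job gives 1.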

WPA concurrent write crash

Today with the latest master code c31dc85, WPA was seen to crash with this error once.

fatal error: concurrent map read and map write

goroutine 11672 [running]:
runtime.throw(0x16dfaec, 0x21)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc0055f9978 sp=0xc0055f9948 pc=0x433912
runtime.mapaccess2_faststr(0x14f5de0, 0xc0003c42d0, 0xc004df5740, 0x51, 0xc00317b730, 0x413201)
        /usr/local/go/src/runtime/map_faststr.go:116 +0x47c fp=0xc0055f99e8 sp=0xc0055f9978 pc=0x412e5c
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*SQS).getReceiveMessageCache(...)
        /src/pkg/queue/sqs.go:277

Poller metrics

The following poller metrics are required, to find issues early:

  1. wpa_poller_loop_count_success
  2. wpa_poller_loop_duration_seconds

This can help us figure out if the pollers are getting stuck and not getting executed.

Can one of `targetMessagesPerWorker` and `secondsToProcessOneJob` be deduced from the other

As per the current implementation, targetMessagesPerWorker is a mandatory option and secondsToProcessOneJob is an optional one with default=0.

As per my understanding, secondsToProcessOneJob is used to calculate the minimum number of workers, and targetMessagesPerWorker is used to calculate the usage ratio and hence to determine the desired number of workers.

Since both the values can be tuned separately, if these values are not in sync, it can result in undesired scaling behaviour.
Taking the example from one of the test cases,

// TestScaleUpWhenCalculatedMinIsGreaterThanMax
// when calculated min is greater than max
func TestScaleUpWhenCalculatedMinIsGreaterThanMax(t *testing.T) {
	queueName := "otpsms"
	queueMessages := int32(1)
	messagesSentPerMinute := float64(2136.6)
	secondsToProcessOneJob := float64(10)
	targetMessagesPerWorker := int32(2500)
	currentWorkers := int32(10)
	idleWorkers := int32(0)
	minWorkers := int32(2)
	maxWorkers := int32(20)
	maxDisruption := "0%"
	expectedDesired := int32(20)
	desiredWorkers := controller.GetDesiredWorkers(
		queueName,
		queueMessages,
		messagesSentPerMinute,
		secondsToProcessOneJob,
		targetMessagesPerWorker,
		currentWorkers,
		idleWorkers,
		minWorkers,
		maxWorkers,
		&maxDisruption,
	)
	if desiredWorkers != expectedDesired {
		t.Errorf("expected-desired=%v, got-desired=%v\n", expectedDesired,
			desiredWorkers)
	}
}

As per this snapshot, a worker would take 10s to process a single job, and approximately 2136.6 messages were sent in the last minute. This would make the minimum number of workers needed 21366, but the number of desired workers will be calculated as 1, since one worker can handle 2500 messages in a minute (targetMessagesPerWorker is 2500, and there is only one message in the queue).

Outcome:
Min: 21366
Max: 20
Desired calculated: 1
Desired: 20 (capped to max workers)

Yes, allowing both targetMessagesPerWorker and secondsToProcessOneJob to be configured separately might help in cases where we want to clear the backlog as fast as possible (@justjkk's comment); however, it is true only for the 50% of cases where

targetMessagesPerWorker < (60 / secondsToProcessOneJob)

My question is, can one of targetMessagesPerWorker and secondsToProcessOneJob be deduced from the other using the formula

secondsToProcessOneJob = 60 / targetMessagesPerWorker

to avoid scaling issues due to misconfiguration?
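
For example, under that formula the test's targetMessagesPerWorker=2500 would imply secondsToProcessOneJob = 60/2500 = 0.024, while the configured 10 seconds per job would correspond to targetMessagesPerWorker = 60/10 = 6.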

Errors in syncing the Deployment and updating WPA status

This could be environment related; keeping this issue open till it is investigated.

  • Syncing deployment
E0804 06:17:42.597049       1 controller.go:321] error syncing '{titan-dq-latest/realtime-analysis-8c8de add}': Deployment realtime-analysis-8c8de not found in namespace titan-dq-latest, requeuing
  • Updating WPA status.
E0804 06:22:12.437967       1 controller.go:778] Error updating wpa status, err: Operation cannot be fulfilled on workerpodautoscalers.k8s.practo.dev "fabricvnmarkettingsettings-27706": the object has been modified; please apply your changes to the latest version and try again

Partial scale down doesn't work in some scenarios

Assuming maxDisruption is set and secondsToProcessOneJob is not set,
✅ when there is some backlog, partial scale down is allowed to happen based on the usageRatio calculation.
✅ when there is no backlog and all workers are idle, massive scale down happens.
❌ when there is no backlog and some workers are not idle, partial scale down should happen but doesn't. However, if NumberOfMessagesSent > 0 and secondsToProcessOneJob is set, it scales down to the minimum allowed based on messagesSent.

Example Scenario

  • maxDisruption: 50% (enabled)
  • secondsToProcessOneJob: 0 (disabled)
  • CurrentMessages(ApproximateNumberOfMessagesVisible): 0 (no backlog)
  • CurrentReplicas: 17
  • messagesReceived (NumberOfMessagesReceived): 3 (at least some workers are not idle)

Expected behavior

  • DesiredReplicas: 8 (scale down by half)

Observed behavior

  • DesiredReplicas: 17 (No scale down)

Non existent queues leads to high cpu in WPA

When the queue specified in the WPA is not present, WPA CPU usage shoots up, since it continuously tries to find the non-existent queue. If there are many non-existent queues, the CPU can increase up to the 100% CPU limit, leading to CPU throttling of the WPA pod and resulting in scaling becoming very slow.

E0302 12:03:44.409670       1 sqs.go:292] Unable to find queue "prod-dx-doctor_appointment_ingest_queue", AWS.SimpleQueueService.NonExistentQueue: The specified queue does not exist for this wsdl version.

Proposed Solutions

  • Retry with a crashloop if the queue does not exist.

  • Put external validation on non-existent queues. Validation is not added in WPA itself because queue creation can happen via an external job; WPA validation should be done by the external service that creates the WPA for you.

Odd scaling behavior with low targetMessagesPerWorker

Set the following options just to test the autoscaler
minReplicas: 1 maxReplicas: 8 targetMessagesPerWorker: 1 --resync-period=1 --wpa-threads=100 --aws-region=us-west-2 --sqs-short-poll-interval=1 --sqs-long-poll-interval=1

and noticed some strange behavior in scaling that I can't explain. With no messages on the queue, one pod is spun up as expected. Then when I enqueued a second message, the autoscaler spun up max replicas (8), then terminated all but two, then slowly started spinning up new replicas one at a time back up to max replicas (8).

I played around with the configuration, thread count, and poll times but still kept getting this result. Not sure if my understanding of "targetMessagesPerWorker" is off or if this is a bug, but thought I'd mention it.

WPA suffers restart with all go routines crashing

WPA crashes due to race condition when sleeping for short poll interval.

goroutine 71311719 [sleep]:
time.Sleep(0x4a817c800)
	/usr/local/go/src/runtime/time.go:188 +0xba
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*SQS).waitForShortPollInterval(...)
	/src/pkg/queue/sqs.go:372
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*SQS).poll(0xc0001fbc80, 0xc004cddfc0, 0x32, 0xc0062220f6, 0x2a, 0xc00440a440, 0xf, 0xc0062220c0, 0x60, 0xc0062220c8, ...)
	/src/pkg/queue/sqs.go:486 +0x4bf
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).runPollThread(0xc00019c090, 0xc004cddfc0, 0x32)
	/src/pkg/queue/poller.go:46 +0x88
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).Run
	/src/pkg/queue/poller.go:94 +0x20d

V1.0.0 scale down is never happening because msgsReceived is not updated

I am testing autoscaler version 1.0.0, as I see it was recently released, and I am experiencing this behavior:
after a scale up, once the work is finished, scale down does not happen.
Looking at my queue in the AWS console, I see it has been empty, with no messages in flight, for more than 20 mins:
[screenshot of the empty queue]
Here is the monitoring status for the queue, where you can see that all the message handling finished before 22:00:
[screenshot of the queue monitoring graphs]

When looking at the autoscaler logs, I see that the replicas are not scaled down because there are received messages.
Two log snapshots, with a time difference of more than 20 mins (the cache, according to the documentation in the code, should be 1 min):
[log screenshot 1]

[log screenshot 2]

It looks like the msgsReceived cache is never refreshed.

Pod describe info:
Name: workerpodautoscaler-57fc6bf9d9-225db
Namespace: kube-system
Priority: 1000
Priority Class Name: infra-normal-priority
Node: ip-192-168-127-142.us-east-2.compute.internal/192.168.127.142
Start Time: Sun, 12 Jul 2020 00:41:20 +0300
Labels: app=workerpodautoscaler
pod-template-hash=57fc6bf9d9
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.126.38
Controlled By: ReplicaSet/workerpodautoscaler-57fc6bf9d9
Containers:
wpa:
Container ID: docker://4898ad92c38baed27d84a0f206ee60b85f0b149526142a2abfd956dccc676069
Image: practodev/workerpodautoscaler:v1.0.0
Image ID: docker-pullable://practodev/workerpodautoscaler@sha256:2bdcaa251e2a2654e73121721589ac5bb8536fbeebc2b7a356d24199ced84e73
Port:
Host Port:
Command:
/workerpodautoscaler
run
--resync-period=60
--wpa-threads=10
--aws-regions=us-east-2
--sqs-short-poll-interval=20
--sqs-long-poll-interval=20
--wpa-default-max-disruption=0
State: Running
Started: Sun, 12 Jul 2020 00:41:22 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 10m
memory: 20Mi
Environment Variables from:
workerpodautoscaler-secret-env Secret Optional: false
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from workerpodautoscaler-token-j8lvc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
workerpodautoscaler-token-j8lvc:
Type: Secret (a volume populated by a Secret)
SecretName: workerpodautoscaler-token-j8lvc
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: :NoExecute
:NoSchedule
Events:
Type Reason Age From Message


Normal Scheduled 45m default-scheduler Successfully assigned kube-system/workerpodautoscaler-57fc6bf9d9-225db to ip-192-168-127-142.us-east-2.compute.internal
Normal Pulling 45m kubelet, ip-192-168-127-142.us-east-2.compute.internal Pulling image "practodev/workerpodautoscaler:v1.0.0"
Normal Pulled 45m kubelet, ip-192-168-127-142.us-east-2.compute.internal Successfully pulled image "practodev/workerpodautoscaler:v1.0.0"
Normal Created 45m kubelet, ip-192-168-127-142.us-east-2.compute.internal Created container wpa
Normal Started 45m kubelet, ip-192-168-127-142.us-east-2.compute.internal Started container wpa

WPA deployment:
apiVersion: k8s.practo.dev/v1alpha1
kind: WorkerPodAutoScaler
metadata:
  creationTimestamp: "2020-01-28T14:59:16Z"
  generation: 5316
  name: processor-ip4m
  namespace: default
  resourceVersion: "52253623"
  selfLink: /apis/k8s.practo.dev/v1alpha1/namespaces/default/workerpodautoscalers/processor-ip4m
  uid: c111ba43-41de-11ea-b4d5-066ce59a32e8
spec:
  deploymentName: processor-ip4m
  maxDisruption: null
  maxReplicas: 80
  minReplicas: 1
  queueURI: **************
  secondsToProcessOneJob: 10
  targetMessagesPerWorker: 720
status:
  CurrentMessages: 0
  CurrentReplicas: 31
  DesiredReplicas: 31

Update AWS version to support IAM for Service Accounts

Any chance this could support an AWS SDK version that allows IAM for Service Accounts, instead of passing the secrets as environment variables?

Would this be as easy as updating the Go AWS SDK to a version past 1.23.13? Perhaps the latest, 1.28.12?

Related #51

Too frequent scale ups and down of pods

With the current autoscaling logic, scale up and scale down of pods happen very frequently, on every resync-period. Scale up goes to max even when it is not required.

The autoscaling logic requires tweaking to have some tolerance and not do unnecessary scale ups and downs.


Scale down can happen when worker deletes the job before processing

When a job is in SQS, a worker picks up that job, deletes it, and then starts processing it, taking, say, 5 minutes to process the job. On success it does nothing; on failure it re-inserts the job.

WPA would scale down this worker.
How do you propose to fix this issue with SQS?

Desired should be incremented only above target

Desired should not be incremented when the messages are below target.
In case of SQS:

I0823 07:11:07.026362       1 controller.go:300] prod-Q: messages: 1, idle: -1, desired: 1
I0823 07:11:08.183130       1 sqs.go:276] prod-Q: approxMessages=0
I0823 07:11:08.188899       1 sqs.go:296] pprod-Q: approxMessagesNotVisible > 0, not scaling down
I0823 07:11:16.020379       1 controller.go:300] prod-Q: messages: 0, idle: -1, desired: 2

When approx messages is zero and workers are processing, i.e. approxMessagesNotVisible > 0 but below the target (which is 10 in this case), we should not increment the desired value.

Workerpodautoscaler crashes when trying to modify Deployment

Workerpod autoscaler is crashing when trying to modify Deployment.

Here is the fatal error

I0808 17:26:59.314616       1 controller.go:300] queue: XXX, messages: 0, idle: 5, desired: 0
F0808 17:26:59.317674       1 controller.go:307] Failed to update deployment: Operation cannot be fulfilled on deployments.apps "XXX: the object has been modified; please apply your changes to the latest version and try again

GCP (Google Cloud Platform) support

I've used worker-pod-autoscaler in AWS, and it works pretty well!
Thank you for the great work.

Are there any plans to add support for Google Cloud Platform too?

Data Validation for scaling metrics

WPA specs are not validated for data types and the value ranges they can have. Please add WPA data validation and fail the creation of the WPA resource if it does not pass validation.

Retry wpa status update

Add retry for status update of wpa

F0828 06:37:05.140914       1 controller.go:322] Error updating status of wpa: Operation cannot be fulfilled on workerpodautoscalers.k8s.practo.dev "XXX": StorageError: invalid object, Code: 4, Key: /registry/k8s.practo.dev/workerpodautoscalers/XXX, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 0xc4481a3850, UID in object meta:

Latency metric

  • Desired replicas are set from the WPA control loops.
  • The pre-requisites for the desired calculation are computed from the queue provider loops.

When you cross 800 queues, the default of 20 threads is not enough and you need to increase this value, but this information is hidden from the user.

We should have a WPA metric that reports the latency incurred in bringing about the change, so the user can re-configure the threadiness when this metric increases.

Events for the users to understand what's going on with their WPA

We have often seen developers telling us about problems with WPA, and the first way to debug the issue is to go through the WPA Kubernetes logs for the queue to find out what is going on. We should log events for all the important things that happen to a WPA object.

Reasons:

  1. Easy to debug.
  2. A developer who does not have access to the WPA logs but has access to their WPA object can troubleshoot problems on their own.

Examples

  • Say the queue specified does not exist: WPA logs an error but does not log an event.
  • In cases where scale up is happening slowly, the developer should be able to see their WPA queue activity history to measure the scaling times, e.g. lastScaleupTime, idleSince.

Env variable `WPA_TAG` not exported in `hack/install.sh`

As the environment variable WPA_TAG is not exported in hack/install.sh, running hack/generate.sh fails to replace the placeholder {{ WPA_TAG }} properly in the deployment template.

       image: practodev/workerpodautoscaler:{{ WPA_TAG }}

The consequence is that the workerpodautoscaler pod will be stuck in status InvalidImageName:

$ k get pods -n kube-system
NAME                                        READY   STATUS             RESTARTS   AGE
workerpodautoscaler-7bbd6f667-z59b6         0/1     InvalidImageName   0          17s

WPA Beanstalk is crashing sometimes with nil pointer error.

WPA is crashing and restarting sometimes with the nil pointer error

I0702 11:51:51.353487       1 controller.go:484] dev-call_otp_updater-crstg1 minWorkers=0, maxDisruptableWorkers=0
I0702 11:51:51.353505       1 controller.go:356] dev-call_otp_updater-crstg1: messages: 0, idle: 0, desired: 0
I0702 11:51:51.353513       1 controller.go:593] crstg1/25-29871-otpcallupdater-praccomm-api: WPA status is already up to date
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1202436]

goroutine 45213 [running]:
github.com/beanstalkd/go-beanstalk.(*Conn).cmd(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x16c2531, 0xa, 0xc00d553a38, 0x1, ...)
        /src/vendor/github.com/beanstalkd/go-beanstalk/conn.go:76 +0x66
github.com/beanstalkd/go-beanstalk.(*Tube).Stats(0xc00d553a90, 0x34, 0xc00085c1de, 0x16)
        /src/vendor/github.com/beanstalkd/go-beanstalk/tube.go:93 +0xcf
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*beanstalkClient).executeGetStats(0xc00734c440, 0xc00d553b98, 0x139935a, 0x1541140, 0x15dc740)
        /src/pkg/queue/beanstalk.go:123 +0x90
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*beanstalkClient).getStats(0xc00734c440, 0xc000a2ec00, 0x34, 0x1a0bec0, 0xc00734c440)
        /src/pkg/queue/beanstalk.go:144 +0x2f
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Beanstalk).getMessages(0xc00017e090, 0xc000a2ec00, 0x34, 0x446235, 0xa1bc22, 0xc00e35a450)
        /src/pkg/queue/beanstalk.go:262 +0x8a
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Beanstalk).poll(0xc00017e090, 0xc0014c3b00, 0x26, 0xc000a2ec1e, 0x16, 0xc00b599a70, 0x6, 0xc000a2ec00, 0x34, 0xc000a2ec0c, ...)
        /src/pkg/queue/beanstalk.go:351 +0x75
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).runPollThread(0xc00017e0f0, 0xc0014c3b00, 0x26)
        /src/pkg/queue/poller.go:46 +0x88
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).Run
        /src/pkg/queue/poller.go:93 +0x209

Only massive scale down is possible with WPA at present (WPA only supports scale down to min)

Say, for example:
You have the following WPA configuration

min = 0
max = 20
target = 10
  • Say we got 100 messages in the queue; then WPA scales up the number of pods. Good 👍
  • But WPA scales down all the pods from the current number to min (0) only when there are no messages in the queue and none of the workers are doing any processing (emptyReceives).

At present WPA does not know which workers are idle and which are still processing; it only knows whether all of them are idle or not. Therefore it does not assume the workers to be idempotent, waits for all workers to finish processing, and then scales them all down together to the min.

Problem
Massive scale down is not cost efficient. We can discuss and find ways to solve this.

cc @anujith-singh @jotish
