Comments (5)

vroad commented on July 28, 2024
  • When the action starts a new EC2 instance, some special code can be run, which monitors the active processes on the EC2 instance. If the EC2 instance is idle longer than some specified time, it can terminate itself and deregister the self-hosted runner on GitHub. However, this is the most unclear thing in this solution and should be verified properly.

To reliably stop idle instances, the monitoring program should run outside of the instance.
Otherwise, if the instance becomes unresponsive for some reason, it won't terminate.

If the instance is in an ASG, unhealthy instances get terminated, and new instances come up as long as the desired capacity is greater than 0.

AWS doesn't always mark an unresponsive instance as unhealthy, though.
To stop such instances you'll need a custom health check. To save cost we probably can't keep an ALB running, so the only option left to us is a custom Lambda-based health check. Instances that do not report their status correctly should be terminated.
This could be done without an ASG, but then no replacement instances would come up.
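
As a rough sketch of the Lambda-based health check described above: assume each instance writes a heartbeat timestamp somewhere the Lambda function can read it (the storage is not specified in this thread). The function name `find_stale_instances`, the `max_silence` parameter, and the instance IDs are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_stale_instances(last_heartbeats, now, max_silence=timedelta(minutes=5)):
    """Return the IDs of instances whose last heartbeat is older than max_silence.

    last_heartbeats maps an instance ID to the time it last reported status.
    How heartbeats are stored and collected is an assumption left open here.
    """
    return sorted(
        instance_id
        for instance_id, seen_at in last_heartbeats.items()
        if now - seen_at > max_silence
    )


# Hypothetical data: one instance reported recently, one has gone silent.
now = datetime(2024, 7, 28, 12, 0, tzinfo=timezone.utc)
heartbeats = {
    "i-aaa": now - timedelta(minutes=1),
    "i-bbb": now - timedelta(minutes=12),
}
stale = find_stale_instances(heartbeats, now)
```

Inside the Lambda handler, the resulting IDs could then be passed to boto3's `terminate_instances` call on an EC2 client. Because the check runs outside the instance, it still works when the instance itself is unresponsive.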

from ec2-github-runner.

jpalomaki commented on July 28, 2024

Just a random thought: would it be possible to use a mix of scheduled and workflow-run event-triggered GitHub workflows to manage the pool of self-hosted runners (using the ec2-github-runner action to start/stop them)?

vroad commented on July 28, 2024

How could this be implemented? Something like Cluster Autoscaler?

https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html

To make this work the Cluster Autoscaler way, you need to set up Auto Scaling groups and a serverless app that terminates idle nodes.

Or you could create a CloudWatch alarm that scales out the ASG when an SQS queue has pending messages, and scales in when the queue has been empty for a while.

A non-FIFO (standard) queue could deliver the same message twice, so a FIFO queue would work better.
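
To make the queue-driven scaling idea more concrete, here is a minimal sketch of the decision rule only, not of the CloudWatch alarm setup. It assumes one runner per pending message, a cap on group size, and a grace period before scaling in. The function name `desired_capacity` and the parameters `scale_in_after` and `max_size` are hypothetical.

```python
def desired_capacity(current, pending_messages, empty_seconds,
                     scale_in_after=300, max_size=4):
    """Pick the ASG desired capacity from the queue state.

    current:          capacity the group has right now
    pending_messages: messages waiting in the queue (one job each, by assumption)
    empty_seconds:    how long the queue has been continuously empty
    """
    if pending_messages > 0:
        # Scale out, but never shrink while work is pending and never exceed the cap.
        return min(max_size, max(current, pending_messages))
    if empty_seconds >= scale_in_after:
        # The queue has been empty long enough: scale in to zero.
        return 0
    # Queue is empty but still within the grace period: keep the runners warm.
    return current
```

The grace period is what keeps a runner from being terminated between two jobs that arrive a few seconds apart.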

vroad commented on July 28, 2024

Is calling this action with stop mode no longer required if we use those methods?

If we create a Lambda function that periodically watches the runner, the stop action becomes unnecessary.
An SQS message's retention period can be as short as 60 seconds, so we could use that to empty the queue, but setting the value too short might terminate the instance too early. Or we could consume the message manually and use retention as a fallback in case stop fails?

machulav commented on July 28, 2024

@vroad thank you for your ideas!

I thought about a slightly different solution:

  • At the beginning of your workflow, you still run the ec2-github-runner action. But in addition to the current parameters, you specify the number of EC2 instances you need to handle the workflow and the label that will be assigned to each runner. Then you use the label in the following jobs of the workflow to run them on the newly created runners.
  • At the beginning of the execution, the action will check how many runners are already active with the specified label and create the rest if required.
  • When the action starts a new EC2 instance, some special code can be run, which monitors the active processes on the EC2 instance. If the EC2 instance is idle longer than some specified time, it can terminate itself and deregister the self-hosted runner on GitHub. However, this is the most unclear thing in this solution and should be verified properly.
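
The two checks in this proposal could look roughly like the sketch below. It is not part of the ec2-github-runner action: the helper names `runners_to_create` and `should_self_terminate`, the labels, and the numbers are all hypothetical, and how the active runners or the idle time are measured is left open.

```python
def runners_to_create(active_runner_labels, label, wanted):
    """Return how many new runners are needed to reach `wanted` for `label`.

    active_runner_labels holds one label per runner that is already active.
    """
    active = sum(1 for runner_label in active_runner_labels if runner_label == label)
    return max(0, wanted - active)


def should_self_terminate(idle_seconds, idle_limit_seconds):
    """Return True once the instance has been idle longer than the limit."""
    return idle_seconds > idle_limit_seconds


# Hypothetical pool: two runners labelled "ci" and one labelled "gpu".
missing = runners_to_create(["ci", "ci", "gpu"], "ci", wanted=3)
expired = should_self_terminate(idle_seconds=900, idle_limit_seconds=600)
```

Here `missing` is the number of instances the action would still have to start, and `expired` is the condition under which an instance would terminate itself and deregister its runner.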

With this approach, you should gain the following benefits:

  • re-use runners for a few workflow runs;
  • as a consequence, save time at the beginning of some workflow runs, as the runners will already be created;
  • the idle runners will be removed automatically;
  • for each workflow, you can specify different configurations and use different runners.

Does it make sense?
