To ensure that no more than one instance is running on several consequent pushes, the

Thank you for showering me with all the different ideas, <a class="user-mention notran

Looks like <a class="user-mention notranslate" data-hovercard-type="user" data-hoverca

stop not working when start was cancelled by concurrency cancelation rule about ec2-github-runner HOT 13 OPEN

machulav commented on July 28, 2024 1

stop not working when start was cancelled by concurrency cancelation rule

from ec2-github-runner.

Comments (13)

jpalomaki commented on July 28, 2024 2

Would it help if you used a combination of workflow-level concurrency control and job-level concurrency control? Then you could use cancel-in-progress for the test job, which I presume would be safe (since it would only cut the test job short), while still being able to limit concurrent workflow runs to just one as well?

from ec2-github-runner.

jpalomaki commented on July 28, 2024 1

You can also explicitly skip workflow runs on push, by using commit messages: https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/.

Also, if you used the pull_request event, you could perhaps leverage draft PRs to conditionally skip workflows runs if github.event.pull_request.draft is true, until the PR is made ready for review.

from ec2-github-runner.

jpalomaki commented on July 28, 2024 1

Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man's switch.

Nice idea 👍 Not sure how complex/brittle the implementation would get, though.

Does github action have a sort of {{ always }} step that gets to run even if the workflow is cancelled? some sort of finalize.

Yep, the stop job is in fact annotated with that (see the if condition), but the problem is that the input to that job is now lacking (because the start job was cut short).

All these suggestions are great for a handful of core devs, but we have contributors with a hugely varying level of expertise and it's already at times too much to even ask for a PR. So these won't work well.

Understood. One option could then be to leverage PR comments, e.g. verbatim /test to trigger tests at a suitable spot. You can use the issue_comment event (comparing the comment text) to trigger a workflow when a PR comment is added. Or perhaps there are some ready-made github apps/integrations that could help with this?

where have you found lock?

That was just an example, static concurrency group name.

Ideally we want to prevent concurrent runs within the same PR. But other PRs should be totally independent and allowed to run their CI when needed.

All right, looks like you are in fact using ${{ github.ref }} for the concurrency group already 👍

from ec2-github-runner.

mgaitan commented on July 28, 2024 1

what I'm doing is to setup a croned self-termination of the instance using aws cli

 pre-runner-script: |
       aws ec2 terminate-instances --instance-ids $(curl -s http://169.254.169.254/latest/meta-data/instance-id) | at now +${{ inputs.timeout-minutes }} minutes

from ec2-github-runner.

jpalomaki commented on July 28, 2024

If you are stopping (canceling) the runner start job before it gets a chance to finish, I don't think there's much the ec2-github-runner action could do to guard against that; it simply never gets to output the instance id and label that are needed for the runner stop job to work.

How about not using cancel-in-progress, but instead have the workflow runs queue up? Docs say: When a concurrent job or workflow is queued, if another job or workflow using the same concurrency group in the repository is in progress, the queued job or workflow will be pending. Any previously pending job or workflow in the concurrency group will be canceled

This means you'd only ever have one pending workflow run in the queue (I quickly tested this, and it seems to be the case).

Here's my test workflow (note how I don't use cancel-in-progress), try running it several times in succession:

name: Test

on:
  workflow_dispatch:

concurrency: lock

jobs:
  main:
    runs-on: ubuntu-20.04
    steps:
      - run: |
          echo "Running ..."
          sleep 30

from ec2-github-runner.

stas00 commented on July 28, 2024

If you are stopping (canceling) the runner start job before it gets a chance to finish, I don't think there's much the ec2-github-runner action could do to guard against that; it simply never gets to output the instance id and label that are needed for the runner stop job to work.

That's because we have a race condition and nothing to guard against it.

Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man's switch.

So for example, the atomic failover protocol could be this:

- start action:
set a timeout flag on the ec2 side (user-data script?)
attempt to start
if started:
    set the instance id (which stop can use to stop)
    turn timeout flag off
else:
   do nothing

some time later - timeout on the ec2 side kicks in if it wasn't turned off and the instance shuts itself down

I don't quite know how to exactly implement this, but I hope that my proposal of "atomic action" can lead to some working solution.

Does github action have a sort of {{ always }} step that gets to run even if the workflow is cancelled? some sort of finalize.

from ec2-github-runner.

stas00 commented on July 28, 2024

You can also explicitly skip workflow runs on push, by using commit messages: https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/.

Also, if you used the pull_request event, you could perhaps leverage draft PRs to conditionally skip workflows runs if github.event.pull_request.draft is true, until the PR is made ready for review.

All these suggestions are great for a handful of core devs, but we have contributors with a hugely varying level of expertise and it's already at times too much to even ask for a PR. So these won't work well.

from ec2-github-runner.

stas00 commented on July 28, 2024

Thank you for showering me with all the different ideas, @jpalomaki

When a concurrent job or workflow is queued, if another job or workflow using the same concurrency group in the repository is in progress, the queued job or workflow will be pending. Any previously pending job or workflow in the concurrency group will be canceled.

yes, this may be good enough, other than wasting resources of continuing running the current job even if know that its outcome is irrelevant.

concurrency: lock

where have you found lock? I haven't seen this documented. Unless you're just using a random consistent string, which simply indicates that any PR of this repo should be lined up for a single resource. i.e. only a single instance ever runs across all PRs.

Ideally we want to prevent concurrent runs within the same PR. But other PRs should be totally independent and allowed to run their CI when needed.

Though it might be OK as well.

Currently it takes a really long time to run even a basic test due to many compilations that have to happen before the tests can run. I need to think how to minimize that overhead.

from ec2-github-runner.

stas00 commented on July 28, 2024

Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man's switch.

Nice idea +1 Not sure how complex/brittle the implementation would get, though.

Won't actually this be as simple as doing:

  start-runner:
[...]
    steps:
      - run: perl -e 'sleep 600; qx[shutdown now]' & # start dead-man's switch
      - <normal start action here> 
      - run: killall -9 perl # cancel dead-man's switch

of course this is just a rough prototype to show the concept, and relies on not having any other perl programs running, but this is good enough to show the idea.

update: duh! there is no instance yet to run the switch on on the user side ;) so this would need to be called in user-data script - but perhaps this can be done by the action and hide the complexity from the user.

from ec2-github-runner.

jpalomaki commented on July 28, 2024

Looks like @machulav has already done some work on this topic: #7

from ec2-github-runner.

stas00 commented on July 28, 2024

The linked discussion alludes that some timeout work will be done, but I can't find any PRs that actually implement it. Not sure why that issue was closed then, since we can see the race condition is still there.

from ec2-github-runner.

stas00 commented on July 28, 2024

If I'm not mistaken the timeout issue is #4 and it's still unresolved.

from ec2-github-runner.

stas00 commented on July 28, 2024

That is a great idea, @jpalomaki! I didn't know it was possible to combine both in the same workflow.

from ec2-github-runner.

stop not working when start was cancelled by concurrency cancelation rule about ec2-github-runner HOT 13 OPEN

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent