Giter Site home page Giter Site logo

Comments (13)

jpalomaki avatar jpalomaki commented on July 28, 2024 2

Would it help if you used a combination of workflow-level concurrency control and job-level concurrency control? Then you could use cancel-in-progress for the test job, which I presume would be safe (since it would only cut the test job short), while still being able to limit concurrent workflow runs to just one as well?

from ec2-github-runner.

jpalomaki avatar jpalomaki commented on July 28, 2024 1

You can also explicitly skip workflow runs on push, by using commit messages: https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/.

Also, if you used the pull_request event, you could perhaps leverage draft PRs to conditionally skip workflows runs if github.event.pull_request.draft is true, until the PR is made ready for review.

from ec2-github-runner.

jpalomaki avatar jpalomaki commented on July 28, 2024 1

Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man's switch.

Nice idea 👍 Not sure how complex/brittle the implementation would get, though.

Does github action have a sort of {{ always }} step that gets to run even if the workflow is cancelled? some sort of finalize.

Yep, the stop job is in fact annotated with that (see the if condition), but the problem is that the input to that job is now lacking (because the start job was cut short).

All these suggestions are great for a handful of core devs, but we have contributors with a hugely varying level of expertise and it's already at times too much to even ask for a PR. So these won't work well.

Understood. One option could then be to leverage PR comments, e.g. verbatim /test to trigger tests at a suitable spot. You can use the issue_comment event (comparing the comment text) to trigger a workflow when a PR comment is added. Or perhaps there are some ready-made github apps/integrations that could help with this?

where have you found lock?

That was just an example, static concurrency group name.

Ideally we want to prevent concurrent runs within the same PR. But other PRs should be totally independent and allowed to run their CI when needed.

All right, looks like you are in fact using ${{ github.ref }} for the concurrency group already 👍

from ec2-github-runner.

mgaitan avatar mgaitan commented on July 28, 2024 1

what I'm doing is to setup a croned self-termination of the instance using aws cli

 pre-runner-script: |
       aws ec2 terminate-instances --instance-ids $(curl -s http://169.254.169.254/latest/meta-data/instance-id) | at now +${{ inputs.timeout-minutes }} minutes

from ec2-github-runner.

jpalomaki avatar jpalomaki commented on July 28, 2024

If you are stopping (canceling) the runner start job before it gets a chance to finish, I don't think there's much the ec2-github-runner action could do to guard against that; it simply never gets to output the instance id and label that are needed for the runner stop job to work.

How about not using cancel-in-progress, but instead have the workflow runs queue up? Docs say: When a concurrent job or workflow is queued, if another job or workflow using the same concurrency group in the repository is in progress, the queued job or workflow will be pending. Any previously pending job or workflow in the concurrency group will be canceled

This means you'd only ever have one pending workflow run in the queue (I quickly tested this, and it seems to be the case).

Here's my test workflow (note how I don't use cancel-in-progress), try running it several times in succession:

name: Test

on:
  workflow_dispatch:

concurrency: lock

jobs:
  main:
    runs-on: ubuntu-20.04
    steps:
      - run: |
          echo "Running ..."
          sleep 30

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

If you are stopping (canceling) the runner start job before it gets a chance to finish, I don't think there's much the ec2-github-runner action could do to guard against that; it simply never gets to output the instance id and label that are needed for the runner stop job to work.

That's because we have a race condition and nothing to guard against it.

Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man's switch.

So for example, the atomic failover protocol could be this:

- start action:
set a timeout flag on the ec2 side (user-data script?)
attempt to start
if started:
    set the instance id (which stop can use to stop)
    turn timeout flag off
else:
   do nothing

some time later - timeout on the ec2 side kicks in if it wasn't turned off and the instance shuts itself down

I don't quite know how to exactly implement this, but I hope that my proposal of "atomic action" can lead to some working solution.

Does github action have a sort of {{ always }} step that gets to run even if the workflow is cancelled? some sort of finalize.

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

You can also explicitly skip workflow runs on push, by using commit messages: https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/.

Also, if you used the pull_request event, you could perhaps leverage draft PRs to conditionally skip workflows runs if github.event.pull_request.draft is true, until the PR is made ready for review.

All these suggestions are great for a handful of core devs, but we have contributors with a hugely varying level of expertise and it's already at times too much to even ask for a PR. So these won't work well.

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

Thank you for showering me with all the different ideas, @jpalomaki

When a concurrent job or workflow is queued, if another job or workflow using the same concurrency group in the repository is in progress, the queued job or workflow will be pending. Any previously pending job or workflow in the concurrency group will be canceled.

yes, this may be good enough, other than wasting resources of continuing running the current job even if know that its outcome is irrelevant.

concurrency: lock

where have you found lock? I haven't seen this documented. Unless you're just using a random consistent string, which simply indicates that any PR of this repo should be lined up for a single resource. i.e. only a single instance ever runs across all PRs.

Ideally we want to prevent concurrent runs within the same PR. But other PRs should be totally independent and allowed to run their CI when needed.

Though it might be OK as well.

Currently it takes a really long time to run even a basic test due to many compilations that have to happen before the tests can run. I need to think how to minimize that overhead.

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

Instead, there should be a sort of handshake protocol / or a failover self-stopping action / dead-man's switch.

Nice idea +1 Not sure how complex/brittle the implementation would get, though.

Won't actually this be as simple as doing:

  start-runner:
[...]
    steps:
      - run: perl -e 'sleep 600; qx[shutdown now]' & # start dead-man's switch
      - <normal start action here> 
      - run: killall -9 perl # cancel dead-man's switch

of course this is just a rough prototype to show the concept, and relies on not having any other perl programs running, but this is good enough to show the idea.

update: duh! there is no instance yet to run the switch on on the user side ;) so this would need to be called in user-data script - but perhaps this can be done by the action and hide the complexity from the user.

from ec2-github-runner.

jpalomaki avatar jpalomaki commented on July 28, 2024

Looks like @machulav has already done some work on this topic: #7

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

The linked discussion alludes that some timeout work will be done, but I can't find any PRs that actually implement it. Not sure why that issue was closed then, since we can see the race condition is still there.

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

If I'm not mistaken the timeout issue is #4 and it's still unresolved.

from ec2-github-runner.

stas00 avatar stas00 commented on July 28, 2024

That is a great idea, @jpalomaki! I didn't know it was possible to combine both in the same workflow.

from ec2-github-runner.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.