catchpoint / workflow-telemetry-action

GitHub Action to collect metrics (CPU, memory, I/O, etc.) from your workflows to help you debug and optimize your CI/CD pipeline

License: Apache License 2.0

TypeScript 98.77% Shell 1.23%
actions ci-cd github monitoring observability telemetry workflow

workflow-telemetry-action's Introduction

workflow-telemetry-action

A GitHub Action to track and monitor the

  • workflow runs, jobs and steps
  • resource metrics
  • process activities

of your GitHub Actions workflow runs. If the run is triggered by a pull request, the action creates a comment on the connected PR with the results and/or publishes the results to the job summary.

The action traces the jobs' step executions and shows them in a trace chart.

It collects the following metrics:

  • CPU Load (user and system) in percentage
  • Memory usage (used and free) in MB
  • Network I/O (read and write) in MB
  • Disk I/O (read and write) in MB

It also traces process executions (only supported on Ubuntu) and shows them as a trace chart with the following information:

  • Name
  • Start time
  • Duration (in ms)
  • Finish time
  • Exit status as success or fail (failures are highlighted in red)

and as a trace table with the following information:

  • Name
  • Id
  • Parent id
  • User id
  • Start time
  • Duration (in ms)
  • Exit code
  • File name
  • Arguments

Example Output

An example output of a simple workflow run will look like this.

Step Trace Example

Metrics Example

Process Trace Example

Usage

To use the action, add the following step before the steps you want to track.

permissions:
  pull-requests: write
jobs:
  workflow-telemetry-action:
    runs-on: ubuntu-latest
    steps:
      - name: Collect Workflow Telemetry
        uses: catchpoint/workflow-telemetry-action@v2
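
To illustrate the step ordering described above, here is a minimal sketch of a complete workflow. The workflow name, the trigger, the job name, and the build step (including the `make build` command) are hypothetical placeholders; only the telemetry step and the `pull-requests: write` permission come from the usage snippet. The `contents: read` line is an assumption added because restricting `permissions` removes the defaults that `actions/checkout` relies on.

```yaml
# Hypothetical workflow; adapt names, trigger, and build step to your project.
name: build-with-telemetry
on: pull_request
permissions:
  contents: read        # assumption: needed by actions/checkout once permissions are restricted
  pull-requests: write  # from the README usage snippet
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Telemetry goes first so that the steps below are tracked.
      - name: Collect Workflow Telemetry
        uses: catchpoint/workflow-telemetry-action@v2
      - uses: actions/checkout@v4
      # Placeholder for whatever you actually want to measure.
      - name: Build
        run: make build
```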

Configuration

Option | Requirement | Description
github_token | Optional | An alternative GitHub token, other than the default provided by the GitHub Actions runner.
metric_frequency | Optional | Metric collection frequency in seconds. Must be a number. Defaults to 5.
proc_trace_min_duration | Optional | Minimum execution duration a process must have to be traced. Must be a number. Defaults to -1, which means no duration filtering is applied.
proc_trace_sys_enable | Optional | Enables tracing of default system processes (aws, cat, sed, ...). Defaults to false.
proc_trace_chart_show | Optional | Shows traced processes in the trace chart. Defaults to true.
proc_trace_chart_max_count | Optional | Maximum number of processes shown in the trace chart (applicable if proc_trace_chart_show is true). Must be a number. Defaults to 100.
proc_trace_table_show | Optional | Shows traced processes in the trace table. Defaults to true.
comment_on_pr | Optional | Set to true to publish the results as a comment on the PR (applicable if the workflow run is triggered by a PR). Defaults to true. Requires the pull-requests: write permission.
job_summary | Optional | Set to true to publish the results on the job summary page of the workflow run. Defaults to true.
theme | Optional | Set to dark to generate charts compatible with GitHub dark mode. Defaults to light.
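
As a sketch of how these options combine, the step below overrides a few defaults. The input names are taken from the table above; the values are arbitrary illustrations, not recommendations, and the unit of `proc_trace_min_duration` is not stated in the table.

```yaml
- name: Collect Workflow Telemetry
  uses: catchpoint/workflow-telemetry-action@v2
  with:
    metric_frequency: 10             # sample every 10 seconds instead of the default 5
    proc_trace_min_duration: 100     # illustrative threshold; unit is not given in the table
    proc_trace_chart_max_count: 50   # show fewer processes than the default 100
    proc_trace_table_show: false     # keep the chart, skip the table
    theme: dark                      # charts for GitHub dark mode
```

Any input left out keeps the default listed in the table.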

workflow-telemetry-action's People

Contributors

actions-user, dependabot[bot], gokhan721, jjosef, massongit, nkraetzschmar, rwxdash, serkan-ozal, suleymanbarman, tspascoal, whywaita

workflow-telemetry-action's Issues

[Feature Request] Pull-request decoration with telemetry data could be dark-mode friendly

Hi Thundra devs! 👋

First of all: awesome job on this Action, I'm already using it in a project and loving it ♥️

Unfortunately I'm one of those dark-mode addicted Devs (😂) and I realized that the charts plotted in the PR comment driven by this GHA don't look super-great outside the "white" theme in GitHub, e.g.

image

I was wondering how hard it would be to support a dark-mode friendly report, eventually by requiring opt-in from the user through the Action public API, ie providing another input like

      - name: Setup telemetry for build machine
        uses: thundra-io/[email protected]
        with:
          theme: 'dark' 

and using such an input to disambiguate the colors to inject in the payload to be sent to globadge under the hood 👀

Thanks in advance for reading/considering this feature request 🙂

Swapfile metrics

It would be nice to measure /swapfile usage on Ubuntu. Insufficient RAM is a huge performance bottleneck: when actions/runner starts using swap, build time increases, and processes are killed once RAM plus swap space is exhausted.

In my case, I ran into this issue: https://stackoverflow.com/q/71590851/4024146

It would be nice if you could measure the swap file too.

Reuse existing Telemetry PR comment when new telemetry is executed

Description of Issue

When comment_on_pr is true, the action posts the telemetry result as a PR comment. This is a great feature. However, it can clutter the PR, because the action posts a new comment for each new commit pushed to the PR.

Is there any plan to improve how PR comments are handled?

Expected behaviour

I think there are two approaches, both often chosen by other OSS tools that comment on PRs.

  1. Reuse the existing PR comment if one exists.
  2. Add a new PR comment, but hide the old comment if one exists.

Actual behaviour

Telemetry PR comment is posted for each new commit on PR.

image
image
image

Environment

  • OS: ubuntu-latest (=ubuntu-22.04)
  • Actions version: unforesight/workflow-telemetry-action@v1
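
A possible workaround that uses only the documented comment_on_pr and job_summary inputs is to turn PR comments off and read the results from the job summary instead. This is a sketch, not a fix for the behaviour reported here; the `@v2` tag follows the README usage rather than the v1 version named in this report.

```yaml
- name: Collect Workflow Telemetry
  uses: catchpoint/workflow-telemetry-action@v2
  with:
    comment_on_pr: false  # no new PR comment per commit
    job_summary: true     # results stay available on the run's summary page
```

The trade-off is that reviewers have to open the workflow run to see the charts.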

How to save the job summary generated?

Hey team,
I see that these metrics are generated and shown in the job summary when the run succeeds.
Is there a way to send these metrics to a monitoring tool like Prometheus or a collector like OpenTelemetry?

-> Or maybe we could send these metrics as a POST request to an application we developed ourselves?
Any suggestion on a workflow for this would help.
Thanks in advance.

Unable to render rich display when generated telemetry contains mermaid keywords

In Cilium we're using this action and hitting an issue with the rendered mermaid charts: cilium/cilium#32241 (comment)

After some investigation, I think we are hitting this bug mermaid-js/mermaid#2495.

Basically, if the telemetry action generates a chart with words like "call" in the node text, it breaks rendering due to the mermaid bug mentioned above.

Here's a minimal example that fails

gantt
	title Some title
	dateFormat x
	axisFormat %H:%M:%S

	Call blah blah blah : 1721689506000, 1721689507000

And it works if you remove call:

gantt
	title Some title
	dateFormat x
	axisFormat %H:%M:%S

	blah blah blah : 1721689506000, 1721689507000

I've actually never used mermaid charts myself yet, so I'm not sure how easy it is to fix, but according to mermaid-js/mermaid#2495 (comment) you can wrap the nodes in quotes to fix the issue?

So eg:

gantt
	title Some title
	dateFormat x
	axisFormat %H:%M:%S

	"Call blah blah blah" : 1721689506000, 1721689507000

Seems to work 🤷. Maybe the action can be updated to quote everything to fix the issue?

Source code for proc_tracer binary

Hello, thank you for providing the neat action.
I'm wondering where the source code of proc_tracer is, since it is committed to this repository only as binaries.

Thank you so much.

workflow-telemetry-action: `Error: EACCES: permission denied`

Hi there, thanks for the awesome action!
I'm currently using it as part of the rworkflows action and one of my users reported an error.

Any insights you might be able to provide would be greatly appreciated! @rwxdash @suleymanbarman

1. Bug description

An error arises during the telemetry step in the rworkflows action.

Console output

https://github.com/tempbioc/parody/actions/runs/7700556648/job/20984649706

Run runforesight/workflow-telemetry-action@v1
  with:
    github_token: ***
    comment_on_pr: false
    metric_frequency: 5
    proc_trace_min_duration: -1
    proc_trace_sys_enable: false
    proc_trace_chart_show: true
    proc_trace_chart_max_count: 100
    proc_trace_table_show: false
    job_summary: true
    theme: light
/usr/bin/docker exec  6c2e97cc4a283c72fe7bf139a67a7da38acadf3432c0c132475639d0cd27aae2 sh -c "cat /etc/*release | grep ^ID"
[Workflow Telemetry] Initializing ...
[Workflow Telemetry] Starting step tracer ...
[Workflow Telemetry] Started step tracer
[Workflow Telemetry] Starting stat collector ...
[Workflow Telemetry] Started stat collector
[Workflow Telemetry] Starting process tracer ...
[Workflow Telemetry] Using proc_tracer_ubuntu-22
Error: [Workflow Telemetry] Unable to start process tracer
Error: [Workflow Telemetry] Error
Error: Error: EACCES: permission denied, open '/__w/_temp/_runner_file_commands/save_state_ac4bd7c5-921e-48b3-b029-d0cb99c12220'
[Workflow Telemetry] Initialization completed
Run echo "RGL_USE_NULL=TRUE" >> $GITHUB_ENV
/__w/_temp/06874ccf-cdc5-4e06-aed4-f0e66a4aa052.sh: line 1: /__w/_temp/_runner_file_commands/set_env_154231dd-e084-4f24-9ded-f0a82f647c04: Permission denied
/__w/_temp/06874ccf-cdc5-4e06-aed4-f0e66a4aa052.sh: line 2: /__w/_temp/_runner_file_commands/set_env_154231dd-e084-4f24-9ded-f0a82f647c04: Permission denied
/__w/_temp/06874ccf-cdc5-4e06-aed4-f0e66a4aa052.sh: line 3: /__w/_temp/_runner_file_commands/set_env_154231dd-e084-4f24-9ded-f0a82f647c04: Permission denied
/__w/_temp/06874ccf-cdc5-4e06-aed4-f0e66a4aa052.sh: line 4: /__w/_temp/_runner_file_commands/set_env_154231dd-e084-4f24-9ded-f0a82f647c04: Permission denied
/__w/_temp/06874ccf-cdc5-4e06-aed4-f0e66a4aa052.sh: line 5: /__w/_temp/_runner_file_commands/set_env_154231dd-e084-4f24-9ded-f0a82f647c04: Permission denied
Error: Process completed with exit code 1.

Expected behaviour

The telemetry action runs all the way through and produces reports.

2. Reproducible example

https://github.com/tempbioc/parody/actions/runs/7700556648/workflow

Potentially related issues

Error: [Workflow Telemetry] Unable to finish stat collector

Here is the run where I see this issue. The step and process trace post fine (mostly) but nothing else is showing. Logs have:

Error: [Workflow Telemetry] Unable to finish stat collector
Error: [Workflow Telemetry] AggregateError
Error: AggregateError

This seems to happen because the action I'm trying to monitor fails with: Process completed with exit code 137, which seems to be how GitHub Actions reports that the job used too many resources (137 = 128 + 9, meaning the process was killed with SIGKILL), hence my wanting to integrate telemetry.

Is there a way we can ensure that these metrics are still collected, even partially?

Process Trace not showing any processes

With the default configuration (only difference: comment_on_pr: 'false') I get an empty Process Trace in the job summary:

image

See e.g. here but this is consistent for me across multiple projects and workflows. All runs are Ubuntu, so in my understanding, this should show the processes.

What can be the issue?

Ability to get the reports on monthly basis

Hallo Team,

We would like to get metrics for the list of jobs that ran on a monthly basis and publish them as a report (either a message to Slack or a mail to GitHub org admins). We looked at workflow-telemetry-action and would like to explore whether it is possible to get this report via workflow-telemetry-action. Looking at the options, it seems that it can provide metrics only for the currently running workflow. Is it possible to extend it to cover all the workflows that ran in a month?

Running the action in a container fails

Any chance this action could run inside of a container? I tried to run this inside of an Ubuntu 22.04 container and it fails. I love the information it returns and that it runs across all types of hosted runners.

At work we use the actions-runner-controller, which means all runners are hosted inside of containers with docker-in-docker capabilities. I suspect it will fail there as well, but have not tested yet. I would really like to suggest this action to our users to help them right-size the runners they use.

Post Workflow Telemetry fails with 'Error: Resource not accessible by integration'

Hi,

I've added the step to my workflow with:

- name: Workflow Telemetry
  uses: runforesight/workflow-telemetry-action@v1

And the post step is failing with the error Error: Resource not accessible by integration

Setup Log:

##[debug]Evaluating condition for step: 'Workflow Telemetry'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Workflow Telemetry
##[debug]Register post job cleanup for action: runforesight/workflow-telemetry-action@v1
##[debug]Loading inputs
##[debug]Evaluating: github.token
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'token'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Run runforesight/workflow-telemetry-action@v1
[Workflow Telemetry] Initializing ...
[Workflow Telemetry] Starting step tracer ...
[Workflow Telemetry] Started step tracer
[Workflow Telemetry] Starting stat collector ...
[Workflow Telemetry] Started stat collector
[Workflow Telemetry] Starting process tracer ...
::save-state name=PROC_TRACER_PID::1745588
##[debug]Save intra-action state PROC_TRACER_PID = 1745588
[Workflow Telemetry] Started process tracer
[Workflow Telemetry] Initialization completed
##[debug]Node Action run completed with exit code 0
##[debug]Finishing: Workflow Telemetry

Post log:

##[debug]Evaluating condition for step: 'Post Workflow Telemetry'
##[debug]Evaluating: always()
##[debug]Evaluating always:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Workflow Telemetry
##[debug]Loading inputs
##[debug]Evaluating: github.token
##[debug]Evaluating Index:
##[debug]..Evaluating github:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'token'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
[Workflow Telemetry] Finishing ...
(node:1747867) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
Error: Resource not accessible by integration
##[debug]Node Action run completed with exit code 1
##[debug]Finishing: Post Workflow Telemetry

I suspect this is related to permissions as we have restricted them in our workflows, but I can't work out which need to be enabled from a cursory look. We're also using a custom runner, but hopefully that won't cause any issues?

permissions:
  contents: read
  packages: write

Thanks for developing this, it looks really fantastic.

Document permission `actions: read`

Seems like actions: read can be required and it doesn't seem to be documented. Would it make sense to do so?

[Workflow Telemetry] Unable to get current workflow job info. Please sure that your workflow have "actions:read" permission!
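
A minimal sketch of a permissions block that covers both requirements mentioned in this document: `actions: read` from the log message above and `pull-requests: write` from the configuration table. Whether further permissions are needed depends on the rest of your workflow; the `contents: read` entry is an assumption for workflows that check out the repository.

```yaml
permissions:
  actions: read         # named in the "Unable to get current workflow job info" message
  pull-requests: write  # required by comment_on_pr according to the configuration table
  contents: read        # assumption: only needed if the job checks out the repository
```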

theme: dark option not working as expected

Hi,
The theme: dark option does not work as expected when generating reports for GitHub dark mode. All rendered images still use the light theme, so the X and Y axis titles of each graph in the telemetry report are hard to read.

cpu-metrics

Disk I/O metrics missing under Windows

We don't see Disk I/O metrics when running workflows under Windows. E.g. this workflow, run under Ubuntu, shows both Network and Disk I/O:

image

This other workflow, running for the same code but under Windows, only has Network I/O:

image

I'd expect Windows to show Disk I/O too.

I don't know if this is missing documentation about the lack of support under Windows, or a bug.

Gantt chart fails to display if step title has a :

image

This was the mermaid line that caused the issue

	Pull ghcr.io/enricomi/publish-unit-test-result-action:v1.40 : 1678371132000, 167837113600

Stripping : from step titles should fix the problem (this was not even a regular step, but a pre step installing a docker action)

This is quite common for actions that use docker. The logs for this particular step:

image

Post Collect Telemetry Failed

Hey There,

The post action failed with following error. Any idea what might be causing this?

[Workflow Telemetry] Getting process tracer result from file /home/runner/actions-runner/_work/_actions/catchpoint/workflow-telemetry-action/v1/dist/proc-tracer/proc-trace.out ...
Error: [Workflow Telemetry] Unable to report process tracer result
Error: [Workflow Telemetry] Error
Error: Error: ENOENT: no such file or directory, open '/home/runner/actions-runner/_work/_actions/catchpoint/workflow-telemetry-action/v1/dist/proc-tracer/proc-trace.out'
[Workflow Telemetry] Reporting all content ...
Error: [Workflow Telemetry] Resource not accessible by integration

@jmhbnz FYI.

Deprecation warning for Node.js 12 actions

https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

During a K6 load test on GitHub Actions, an error occurred that said "Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: runforesight/[email protected]." This error is due to the deprecation of Node.js 12 for GitHub Actions, as it has been out of support since April 2022. GitHub plans to migrate all actions to run on Node.js 16 by Summer 2023, and in the meantime, they have added a warning to workflows that contain actions running on Node.js 12.

To fix this error, you need to update the action runforesight/[email protected] to run on Node.js 16. If you are the maintainer of this action, you can update its configuration settings to run on Node.js 16. If you are a user of this action, you need to update your workflow to use the latest version of the action that runs on Node.js 16. You can do this by specifying the version of the action in your workflow using the syntax uses: <owner>/<repo>@<version>.

It is important to take action on this issue as Node.js 12 is no longer supported and using it can result in security vulnerabilities and other issues.

Upgrade node versions in .github actions

As node 16 is now deprecated, people will get warnings in their action runs while using this module. It is a simple change, would be glad to make a PR for it.

Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: catchpoint/[email protected]. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

Just requires updating the following files https://github.com/catchpoint/workflow-telemetry-action/tree/master/.github/workflows

and using actions/checkout@v4 and actions/setup-node@v4, and changing the Node version to 20.
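
The change described above could look roughly like this in each workflow file. The contents of the repository's workflow files are not shown here, so the step list is illustrative; only the action versions and the Node version come from the description.

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 20  # replaces the deprecated Node.js 16
```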

Getting raw stats

Hello,

This is an amazing action that I just came across and plan to use for a project. I was wondering if it is possible to also get a CSV or some other file format with the raw data points, either as a separate gist or added within the repository, whichever would be easier, rather than just the graphs in the summary? Is there any existing feature to accomplish something along those lines and/or is there any prospect of adding it?
(An ad-hoc way I thought about would be parsing the SVG file for the data points, but would be ideal to add a raw data file during the post action instead if possible)

Doesn't work on "ubuntu-latest" runner image

I wanted to use this action for debugging out of memory problems in my builds. I use runs-on: ubuntu-latest in my workflow, unfortunately, I receive an error:

[Workflow Telemetry] Initializing ...
[Workflow Telemetry] Starting step tracer ...
[Workflow Telemetry] Started step tracer
[Workflow Telemetry] Starting stat collector ...
[Workflow Telemetry] Started stat collector
[Workflow Telemetry] Starting process tracer ...
[Workflow Telemetry] Process tracing disabled because of unsupported OS: {"platform":"linux","distro":"Ubuntu","release":"22.04.2 LTS","codename":"Jammy Jellyfish","kernel":"5.15.0-1034-azure","arch":"x64","hostname":"fv-az338-794","fqdn":"fv-az338-794.5cf0qfxbd2nuflq50ezo1x5smd.gx.internal.cloudapp.net","codepage":"UTF-8","logofile":"ubuntu","serial":"83bd42737a0442acb30763087465ae8f","build":"","servicepack":"","uefi":false}
[Workflow Telemetry] Initialization completed

Could you please clarify which runner images this action supports?

Remove privilege escalation from system command execution

Hi Team:

src/processTrace.ts Line122.

There is a system command execution that uses escalated user permissions. Assuming that procTracePID is launched by a regular user, it should be handled by the same user account without privilege escalation. Running code at a higher permission level raises security concerns.

Option to disable specific metric

Hey!

I'm wondering if it would make sense to have an option to disable specific metrics.

In my case, I'm mainly interested in memory, so I would disable the I/O and maybe the CPU metrics. The reason is to make the metric reports (comments) on PRs a bit smaller and easier to read.

Thank you.
