Helix Operations

Tooling for automating operations of Project Helix services

Installation

$ npm install -D @adobe/helix-ops

Automated Monitoring

helix-ops provides the following command line tools intended to be run as part of your deployment pipeline to automate your monitoring:

Statuspage: Automated Update of Components

statuspage automatically creates (or reuses) a component in Statuspage and returns its automation email address.

Usage:

$ npx statuspage
statuspage <cmd>

Commands:
  statuspage setup  Create or reuse a Statuspage component

Options:
  --version  Show version number                                       [boolean]
  --help     Show help                                                 [boolean]
  --auth                                  Statuspage API Key (or env
                                          $STATUSPAGE_AUTH)  [string] [required]
  --page_id, --pageId                     Statuspage Page ID (or env
                                          $STATUSPAGE_PAGE_ID)
                                                             [string] [required]
  --name                                  The name(s) of the component(s)
                                                                         [array]
  --description                           The description of the component
                                                                        [string]
  --group                                 The name of an existing component
                                          group                         [string]
  --incubator                             Flag as incubator component
                                                      [boolean] [default: false]
  --incubator_page_id, --incubatorPageId  Statuspage Page ID for incubator
                                          components   [string] [default: false]
  --silent                                Reduce output to automation email only
                                                      [boolean] [default: false]

$ npx statuspage setup --group "Delivery"
Creating component @adobe/helix-example-service in group Delivery
Automation email: [email protected]
done.

Note: You can directly reuse the output of statuspage in your shell by adding the --silent parameter:

$ npx statuspage setup --group "Delivery" --silent
[email protected]

By default, statuspage uses the package name and description from your package.json and leaves the group empty.

statuspage requires a Statuspage API Key that should be passed using either the --auth parameter or the STATUSPAGE_AUTH environment variable, as well as a Statuspage Page ID that should be passed using either the --page_id parameter or the STATUSPAGE_PAGE_ID environment variable.
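The parameter-or-environment lookup described above could be sketched as follows. This is an illustration only, not the helix-ops implementation: resolveOption is a hypothetical helper, and the assumption that an explicit parameter takes precedence over the environment variable is mine.

```javascript
// Hypothetical helper: prefer an explicit CLI value, fall back to the environment.
// Assumption: a parameter given on the command line wins over the env variable.
function resolveOption(cliValue, envName, env = process.env) {
  if (cliValue !== undefined && cliValue !== '') {
    return cliValue;
  }
  const fromEnv = env[envName];
  if (fromEnv === undefined || fromEnv === '') {
    throw new Error(`Missing required value: pass it as a parameter or set $${envName}`);
  }
  return fromEnv;
}

// Example with a fake environment (values are placeholders).
const env = { STATUSPAGE_AUTH: 'key-from-env', STATUSPAGE_PAGE_ID: 'page-from-env' };
const auth = resolveOption(undefined, 'STATUSPAGE_AUTH', env);
const pageId = resolveOption('page-from-cli', 'STATUSPAGE_PAGE_ID', env);
console.log(auth, pageId);
```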

New Relic: Automated Update of Synthetics Checks, Alert Policies and Notification Channels

newrelic automates the following New Relic features:

  1. creation or update of monitors in New Relic Synthetics
  2. creation of notification channels in New Relic Alerts
  3. creation or update of alert policies and conditions in New Relic Alerts
  4. wiring alert policies to notification channels and conditions to monitors

Usage:

$ npx newrelic
newrelic <cmd>

Commands:
  newrelic setup  Create or update a New Relic setup

Options:
  --version        Show version number                                 [boolean]
  --help           Show help                                           [boolean]
  --auth           Admin API Key (or env var $NEWRELIC_AUTH) [string] [required]
  --url            The URL(s) to check                        [array] [required]
  --email          The email address(es) to send alerts to               [array]
  --name           The name(s) of the monitor, channel and policy        [array]
  --group_policy   The name of a common policy to add the monitor(s) to [string]
  --group_targets  The 0-based indices of monitors to add to the group policy
                                                          [array] [default: [0]]
  --incubator      Flag as incubator setup                             [boolean]
  --locations      The location(s) to use                                [array]
  --frequency      The frequency to trigger the monitor in minutes      [number]
  --type           The type of monitor (api or browser)                 [string]
  --script         The path to a custom monitor script                  [string]

$ npx newrelic setup \
  --url https://adobeioruntime.net/api/v1/web/namespace/package/action@v1/_status_check/healthcheck.json \
  --email [email protected] --group_policy "Delivery"
Creating monitor @adobe/helix-example-service
Updating locations and frequency for monitor @adobe/helix-example-service
Updating script for monitor @adobe/helix-example-service
Creating notification channel @adobe/helix-example-service
Creating alert policy @adobe/helix-example-service
Linking notification channel to alert policy @adobe/helix-example-service
Creating condition in alert policy
Verifying group alert policy Delivery
Updating alert policy condition
done.

By default, the check will use the name from your package.json, but you can override it using the --name parameter.

newrelic requires a New Relic Admin's API Key (read the docs, it's different from your API key, even when you are an Admin) that should be passed using either the --auth parameter or the NEWRELIC_AUTH environment variable.

Use with CircleCI

You can invoke the adobe/helix-post-deploy orb in your CircleCI config.yml and use the monitoring command as a step in your job, with optional parameters. Note: you will still need to add @adobe/helix-ops as a dependency in your package.json.

helix-ops's Issues

use npx instead of installing package locally

using npm install to install the package locally, and then executing .bin/... seems a bit hacky. it would be better to use npx.

eg:

npx --yes --package=@adobe/helix-ops -- monitoringSetup '{ ....

ps: I tried to change this, but the way it is currently tested makes it quite hard ;-)

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet.
We recommend using:

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

[monitoring] Cannot specify type in PATCH method since type cannot be changed

Description
When attempting to update existing monitors, the following (non-fatal) error is thrown:

Unable to update locations for monitor: 400 - {"errors":[{"error":"Cannot specify type in PATCH method since type cannot be changed."}],"count":1}

This is due to #54 where monitor type was made configurable.

To Reproduce
Steps to reproduce the behavior:

  1. Check the latest CircleCI semantic-release build of any Helix service with helix-ops >= 1.8.0

Expected behavior
Updating existing monitors should not throw an error.

Version:
1.8.0 or higher
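One way to avoid this error would be to drop fields that cannot be changed from the update payload before sending the PATCH request. The sketch below assumes a plain config object; buildMonitorPatch and the field list are hypothetical, and only type is confirmed as immutable by the error message above.

```javascript
// Fields that must not appear in a PATCH body.
// Only `type` is confirmed by the error message; extend with care.
const IMMUTABLE_FIELDS = ['type'];

// Returns a copy of the monitor config that is safe to send in an update.
function buildMonitorPatch(config) {
  const patch = { ...config };
  for (const field of IMMUTABLE_FIELDS) {
    delete patch[field];
  }
  return patch;
}

// Example: the type is kept for creation but stripped for updates.
const monitor = { name: '@adobe/helix-example-service', type: 'SCRIPT_API', frequency: 15 };
console.log(buildMonitorPatch(monitor));
```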

Monitor script triggers alerts too eagerly

There was an incident wave this morning, caused by the fact that we now trigger alerts if there's an error in the activation, or if the monitor fails to get the activation details for the initial request.

We should adjust the script to only throw if the previous request was unsuccessful, or if the activation details reveal a problem.
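The adjusted rule could look roughly like this. It is a sketch under assumptions: the monitor script's real data shapes are not shown in this issue, so statusCode, activation and the response.success field are hypothetical names, and "unsuccessful" is taken to mean a non-2xx status.

```javascript
// `statusCode`: HTTP status of the initial request (assumed).
// `activation`: fetched activation details, or null if they could not be
// retrieved. `activation.response.success` is an assumed field name.
function shouldThrow(statusCode, activation) {
  if (statusCode < 200 || statusCode >= 300) {
    return true; // the request itself was unsuccessful
  }
  if (!activation || typeof activation !== 'object') {
    return false; // missing details alone should not raise an incident
  }
  const { response } = activation;
  return Boolean(response && response.success === false);
}

console.log(shouldThrow(200, null)); // details unavailable, request fine
console.log(shouldThrow(200, { response: { success: false } }));
```

With this rule, a failure to fetch the activation details no longer triggers an alert on its own.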

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

circleci
.circleci/config.yml
  • codecov 4.1.0
github-actions
.github/workflows/codeql.yml
  • actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332
  • github/codeql-action v3
  • github/codeql-action v3
  • github/codeql-action v3
.github/workflows/semantic-release.yaml
  • actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332
  • actions/setup-node v4
.github/workflows/semver-check.yaml
npm
package.json
  • @adobe/fetch 4.1.8
  • diff 5.2.0
  • fs-extra 11.2.0
  • git-log-parser 1.2.1
  • shelljs 0.8.5
  • stream-to-array 2.3.0
  • yargs 17.7.2
  • @adobe/eslint-config-helix 2.0.6
  • @semantic-release/changelog 6.0.3
  • @semantic-release/git 10.0.1
  • c8 10.1.2
  • eslint 8.57.0
  • events 3.3.0
  • husky 9.1.1
  • jsdoc-to-markdown 8.0.2
  • junit-report-builder 3.2.1
  • lint-staged 15.2.7
  • mocha 10.7.0
  • mocha-multi-reporters 1.5.1
  • nock 13.5.4
  • semantic-release 24.0.0
  • sinon 18.0.0
  • yaml 2.4.5


Monitoring: Incubator

Overview

Currently when a brand new service is deployed, it gets directly hooked up to one of our live PagerDuty alert policies, and reports against our public production SLA in Statuspage. It can then very well happen that monitoring fails because something isn't quite right yet, or the service is still under construction, which will result in a PagerDuty incident and tank our SLA.

Details

Obviously this setup has 2 major issues: something going wrong in a still new and most likely irrelevant service has the potential to send our on-call engineer on a wild goose chase. We also shoot ourselves in the foot by artificially decreasing our Delivery (4 nines) or Publishing (3 nines) SLA and thereby eating into our error budget. Manual work is then required to clean up the mess.

Proposed Actions

I think we should add an intermediate safety step between development/testing of a service and hooking it up to our "armed" production monitoring setup. I propose the following workflow:

  1. The default monitoring config has an incubator flag
  2. When a service gets deployed for the first time, it is added to a dedicated Incubator page in Statuspage. This page can be public or require authentication.
  3. In case of an error, New Relic only informs the Statuspage component and the #helix-escalations Slack channel for visibility, but won't trigger any PagerDuty incidents yet.
  4. Once confidence in the service is sufficient, the developer removes the incubator flag from the monitoring config
  5. This moves the service to the configured "armed" alert policy in New Relic and Statuspage component group. Any outages occurring during the incubator time will be erased.
  6. From now on, service failures rightfully trigger PagerDuty and affect our SLA.
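The routing implied by this workflow can be sketched as a small function. This is an illustration of the proposal, not the actual helix-ops code: the input keys mirror the --incubator, --page_id, --incubator_page_id, --group and --group_policy options, while resolveTargets and the shape of its result are hypothetical.

```javascript
// Decide where a service reports to, based on the incubator flag.
function resolveTargets({
  incubator = false, pageId, incubatorPageId, group, groupPolicy,
}) {
  if (incubator) {
    return {
      statuspagePageId: incubatorPageId, // dedicated Incubator page
      statuspageGroup: undefined, // not part of a production group yet
      alertPolicy: undefined, // no "armed" policy, so no PagerDuty incidents
    };
  }
  return {
    statuspagePageId: pageId,
    statuspageGroup: group,
    alertPolicy: groupPolicy,
  };
}

// Placeholder IDs for illustration.
const config = {
  pageId: 'prod-page', incubatorPageId: 'incubator-page', group: 'Delivery', groupPolicy: 'Delivery',
};
console.log(resolveTargets({ ...config, incubator: true }));
console.log(resolveTargets(config));
```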

The automated release is failing 🚨

🚨 The automated release from the master branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you could benefit from your bug fixes and new features.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can resolve this 💪.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


No npm token specified.

An npm token must be created and set in the NPM_TOKEN environment variable on your CI environment.

Please make sure to create an npm token and to set it in the NPM_TOKEN environment variable on your CI environment. The token must allow to publish to the registry https://registry.npmjs.org/.


Good luck with your project ✨

Your semantic-release bot 📦🚀

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.

[AEM‐01] Potential Insufficient Validation in Hosts and URLs

Potentially insufficient checks related to host names and URLs were identified.

Impact Rationale:
The impact depends on how this data is used at a later point. Due to the time constraints, we could not confirm if this could lead to further exploitation.

Likelihood Rationale:
There are different potential scenarios where this could become exploitable, or at least lead to functional issues and unexpected behavior.

Mitigation:
Review the highlighted fragments and enforce stricter comparisons to ensure that only the expected resources, such as URLs or host names, are properly parsed and categorized.

[AEM‐01] Potential Insufficient Validation in Hosts and URLs

Description
Follow up to #383 from remediation.

The updated code checks that the host must end with ‘.amazonaws.com’, as seen below:

const hlx3 = host.endsWith('.amazonaws.com') && pathname.includes('/helix3/');

Attackers could host arbitrary files on their S3 buckets for instance, with a folder named ‘helix3’. Doing so, they would pass the updated check. For instance, consider the following attacker‐controlled bucket:

http://anvilsecure-attacker.s3-website-us-east-1.amazonaws.com/helix3/malicious_file

Therefore, the issue is considered partially fixed.
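A stricter comparison could match the full host name against an explicit allowlist instead of a suffix. The sketch below is one possible approach, not the remediation that was actually shipped: isTrustedHelix3Url is a hypothetical helper, and the allowlisted host is a placeholder because the real bucket host names are not given in the report.

```javascript
// Placeholder allowlist; replace with the exact hosts that are expected.
const ALLOWED_HOSTS = new Set(['example-helix-bucket.s3.us-east-1.amazonaws.com']);

function isTrustedHelix3Url(rawUrl, allowedHosts = ALLOWED_HOSTS) {
  let url;
  try {
    url = new URL(rawUrl); // global WHATWG URL parser
  } catch {
    return false; // not a parseable absolute URL
  }
  return url.protocol === 'https:'
    && allowedHosts.has(url.hostname) // exact host match, no suffix check
    && url.pathname.startsWith('/helix3/'); // path prefix, not "includes"
}

// The attacker-controlled bucket from the report no longer passes.
console.log(isTrustedHelix3Url(
  'http://anvilsecure-attacker.s3-website-us-east-1.amazonaws.com/helix3/malicious_file',
));
```

Matching the exact host and anchoring the path at its start means that neither an arbitrary bucket under amazonaws.com nor a helix3 folder deeper in the path satisfies the check.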

[newrelic.js] make email optional

It should be possible to create a monitoring setup without specifying an email for a notification channel. For example, if you just want to add a monitor to an existing alert policy, and you don't want to send alerts to Statuspage.

Use Major action versions for monitoring scripts

Overview

the current monitoring scripts test the minor versions of the deployed actions,

eg: [email protected]

Helix internally always uses the major version. so we are testing something that is not actually used.

The following problem can arise:

now, we release the next minor version:

  • v3 -> v3.3.0
  • v3.3 -> v3.3.0
  • v3.2 -> v3.2.1

the monitoring scripts get automatically updated during the release and now test @v3.3.

Assume that the new version causes problems, and we need to immediately switch the v3 link back to v3.2.1. Without updating the monitoring scripts, they will continue to test v3.3 (hence v3.3.0) and keep reporting errors.

Proposed Actions

Use the major version only for monitoring scripts.
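As a rough illustration, a monitor URL could be reduced to its major version with a single replacement. toMajorVersion is a hypothetical helper, and the @v<version> segment format is assumed from the example URL in the README.

```javascript
// Rewrite "@v3.3" or "@v3.3.0" in an action URL to "@v3".
function toMajorVersion(url) {
  return url.replace(/@v(\d+)(?:\.\d+)*/, '@v$1');
}

const minorUrl = 'https://adobeioruntime.net/api/v1/web/namespace/package/action@v3.3/_status_check/healthcheck.json';
console.log(toMajorVersion(minorUrl));
```

A monitor that targets the major version follows the v3 link automatically, so a rollback from v3.3.0 to v3.2.1 would be picked up without touching the monitoring script.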

Switch default monitoring to universal runtime

By default, all service monitors should be switched to the universal runtime URL. runtime should be added as an option like aws to add a secondary monitor specifically for that environment. Secondary monitors trigger a lower priority alert policy which informs Slack instead of PD.

@trieloff

Add ability to create scripted browser monitors

Is your feature request related to a problem? Please describe.
In New Relic, we are mainly using 2 types of monitors: scripted API and scripted browser. While the former can be created automatically using the newrelic automation tool, we can't automate the latter yet.

Describe the solution you'd like
Add argument --type allowing users to select api (default) or browser.

Add ability to use a custom monitor script

The current monitor script only works for Helix services using helix-status. There are cases where you would want to automate monitoring for different kinds of actions, and therefore provide a custom monitor script.

The newrelic tool should have an optional --script argument taking a path to the script file as value.

Skip request object when logging activation details in New Relic

Is your feature request related to a problem? Please describe.
The current monitor script logs the entire activation details to the New Relic script log. This includes the base64-encoded Authorization header from the request.

Describe the solution you'd like
We should avoid logging sensitive data. The authorization key being used is stored in New Relic as a secure credential.

Describe alternatives you've considered
We should strip the request object from the activation details before logging them, since it does not provide anything of value.
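Stripping the request object before logging could be as small as the sketch below. It assumes the request sits at the top level of the activation details under the key request, which is not confirmed here; stripRequest and the sample data are hypothetical.

```javascript
// Return a shallow copy of the activation details without the request object.
function stripRequest(activation) {
  const { request, ...safe } = activation; // `request` is dropped on purpose
  return safe;
}

// Sample data with a placeholder header value.
const details = {
  duration: 123,
  request: { headers: { authorization: 'Basic <redacted>' } },
};
console.log(JSON.stringify(stripRequest(details)));
```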

The automated release is failing 🚨

🚨 The automated release from the main branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you could benefit from your bug fixes and new features.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can resolve this 💪.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the main branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Invalid npm token.

The npm token configured in the NPM_TOKEN environment variable must be a valid token allowing to publish to the registry https://registry.npmjs.org/.

If you are using Two Factor Authentication for your account, set its level to "Authorization only" in your account settings. semantic-release cannot publish with the default "Authorization and writes" level.

Please make sure to set the NPM_TOKEN environment variable in your CI with the exact value of the npm token.


Good luck with your project ✨

Your semantic-release bot 📦🚀

[monitoring] Add ability to specify monitor locations and frequency

Is your feature request related to a problem? Please describe.
By default, 15 locations are configured in a New Relic Synthetics monitor created through the monitoring automation, and there is currently no way to customize them. Also, the frequency is hardcoded to 15 minutes.

Describe the solution you'd like

  • Add optional arguments --locations and --frequency to newrelic:
$ node newrelic setup <url> --auth <token> --frequency 5 --locations AW_US_WEST_1, AW_US_EAST_2
  • Add optional command parameters to monitoring command in the helix-post-deploy orb:
  helix-post-deploy/monitoring:
    newrelic_frequency: 5
    newrelic_locations: AW_US_WEST_1, AW_US_EAST_2

Additional context
Retrieve a list of valid locations:

curl -v -X GET -H 'X-Api-Key:<key>' https://synthetics.newrelic.com/synthetics/api/v1/locations

Workaround
Adjust locations manually in the Synthetics UI.
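Because the locations may arrive either as one comma-separated string (orb parameter) or as several command line tokens, some normalization would probably be needed before calling the API. The following is a sketch only; normalizeLocations is a hypothetical helper and the location labels are copied from the example above.

```javascript
// Accept a string or an array of strings and return clean location labels.
function normalizeLocations(input) {
  if (input === undefined || input === null) {
    return [];
  }
  const parts = Array.isArray(input) ? input : [input];
  return parts
    .flatMap((part) => String(part).split(','))
    .map((label) => label.trim())
    .filter((label) => label.length > 0);
}

// Orb-style string and CLI-style tokens give the same result.
console.log(normalizeLocations('AW_US_WEST_1, AW_US_EAST_2'));
console.log(normalizeLocations(['AW_US_WEST_1,', 'AW_US_EAST_2']));
```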

monitoring orb: Add support for AWS

In order to set up monitoring for AWS-deployed actions, it would be nice if the monitoring orb supported it.

current example:

      - helix-post-deploy/monitoring:
          statuspage_name: Microsoft Word Adapter
          statuspage_group: Delivery
          newrelic_group_policy: Delivery Repeated Failure
      - helix-post-deploy/monitoring:
          newrelic_name: "@adobe/helix-word2md.aws"
          newrelic_url: https://$HLX_AWS_API.execute-api.$HLX_AWS_REGION.amazonaws.com/helix-services/word2md/v2/_status_check/healthcheck.json # todo: grab version automatically
          statuspage_name: Microsoft Word Adapter (AWS)
          statuspage_group: Delivery
          incubator: true
          newrelic_group_policy: Delivery Repeated Failure

It would be nice to need only 1 command:

      - helix-post-deploy/monitoring:
          statuspage_name: Microsoft Word Adapter
          statuspage_group: Delivery
          newrelic_group_policy: Delivery Repeated Failure
          aws_api: $HLX_AWS_API
          aws_region: $HLX_AWS_REGION
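With aws_api and aws_region available, the secondary health check URL could be derived instead of being written out by hand. The sketch below follows the URL pattern from the current example; buildAwsStatusUrl and its service and version parameters are hypothetical, and how the version would be determined is left open (see the todo in the example).

```javascript
// Build the AWS health check URL from its parts.
function buildAwsStatusUrl({
  api, region, service, version,
}) {
  return `https://${api}.execute-api.${region}.amazonaws.com/helix-services/${service}/${version}/_status_check/healthcheck.json`;
}

// Placeholder API ID and region for illustration.
const awsUrl = buildAwsStatusUrl({
  api: 'abc123', region: 'us-east-1', service: 'word2md', version: 'v2',
});
console.log(awsUrl);
```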

Automatic orb releases

CircleCI orbs need to be released manually at the moment, using an org admin's token. It would be great to piggy-back on semantic-release and release a new patch, minor or major version of an orb based on the corresponding commit message.

monitoring script can report success even if action failed

With Helix pages, the initial request to a website can return status code 200 mocking success, even if the underlying action has failed with a 500+ status code. In this case, the script should throw instead of swallowing the error.

Monitoring Setup fails

Description
Latest changes to the shell script fail to properly pass strings containing spaces.

To Reproduce

  1. Error in CircleCI (Monitoring Setup step in semantic-release job):
spName="Foo Bar Test"
(...)
# statuspage automation
spEmail=`node ${toolPath}/statuspage setup --silent ${spName:+--name \"${spName}\"} ${spGroup:+--group \"${spGroup}\"}`
(...)
statuspage setup

Create or reuse a Statuspage component

Options:
  --version            Show version number                             [boolean]
  --help               Show help                                       [boolean]
  --auth               Statuspage API Key (or env $STATUSPAGE_AUTH)
                                                             [string] [required]
  --page_id, --pageId  Statuspage Page ID (or env $STATUSPAGE_PAGE_ID)
                                                             [string] [required]
  --name               The name of the component
                                      [string] [default: "@adobe/helix-word2md"]
  --description        The description of the component
     [string] [default: "Helix Service that renders word documents as markdown"]
  --group              The name of an existing component group          [string]
  --silent             Reduce output to automation email only
                                                      [boolean] [default: false]

Unknown arguments: Foo, Test"
Exited with code 1

Expected behavior
Monitoring Setup correctly uses the name from the parameter
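For comparison, if the tool were invoked from Node instead of a shell string, the arguments could be passed as an array, which keeps values with spaces intact because no shell word splitting takes place. This is a sketch, not the fix applied in the orb; buildStatuspageArgs is a hypothetical helper.

```javascript
// Build the argument list for "statuspage setup" as separate array entries.
function buildStatuspageArgs({ name, group, silent = true } = {}) {
  const args = ['setup'];
  if (silent) args.push('--silent');
  if (name) args.push('--name', name); // the value stays one argument
  if (group) args.push('--group', group);
  return args;
}

const args = buildStatuspageArgs({ name: 'Foo Bar Test', group: 'Delivery' });
console.log(args);
// The array could then be handed to child_process.execFile, e.g.
// execFile('node', [`${toolPath}/statuspage`, ...args], callback),
// where toolPath is the same variable as in the shell script above.
```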

Version:
Orb adobe/[email protected]

monitoring script throws runtime errors

Description
The monitoring script expects a certain structure of the activation details:

{
  body: {
    duration: <number>,
    annotations: [ ... ],
    ...
  }
  ...
}

This structure is not guaranteed. body might also be a simple string, which leads to the following error in the script log:

Error storing insights: TypeError: Cannot read property 'filter' of undefined
    at Request.$http.get [as _callback] (eval at JobResource.getScriptFn (/opt/runtimes/4.0.0/modules/synthetics-runner/lib/job-resource/index.js:78:19), <anonymous>:54:36)
    at Request.self.callback (/opt/runtimes/4.0.0/node_modules/request/request.js:185:22)
    at Request.emit (events.js:182:13)
    at Request.<anonymous> (/opt/runtimes/4.0.0/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:182:13)
    at IncomingMessage.<anonymous> (/opt/runtimes/4.0.0/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:273:13)
    at IncomingMessage.emit (events.js:187:15)
    at endReadableNT (_stream_readable.js:1094:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)

Expected behavior
The monitor script should guard against such errors.
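A guard along these lines would avoid the TypeError when body is a string or annotations is missing. It is a minimal sketch based on the structure shown above; getAnnotations is a hypothetical helper, not part of the current monitor script.

```javascript
// Return the annotations array, or an empty array if the structure differs.
function getAnnotations(activation) {
  const body = activation && activation.body;
  if (!body || typeof body !== 'object' || !Array.isArray(body.annotations)) {
    return [];
  }
  return body.annotations;
}

// A string body no longer breaks a later call to .filter().
console.log(getAnnotations({ body: 'Internal Server Error' }).filter(Boolean));
console.log(getAnnotations({ body: { duration: 5, annotations: [{ key: 'path' }] } }));
```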

remove helix-ops dependencies in most projects to reduce dependency crawl.

Is your feature request related to a problem? Please describe.
AFAICS, helix-ops has 2 main parts: the orb needed in all CircleCI configs, and the CLI needed to update the monitoring after a release.

Whenever a dependency update causes a new release of helix-ops, all our 47 repositories receive an update PR. Most of them are approved and merged automatically, but 10-15 are not.

Describe the solution you'd like
I think if the helix-ops dev dependency in the projects is removed and the CircleCI script runs npm install helix-ops-cli on demand, we achieve the same.

@rofe WDYT?
