
Comments (11)

adunkman commented on August 20, 2024

Blocked by #387.

from ef-cms.

adunkman commented on August 20, 2024

Had a quick chat with @julialeague about this to help set my head straight on a path forward — thanks Julia!!

There are a lot of things we could be looking at, but without knowing usage patterns well (#137), we can’t know what unusual looks like. Therefore, I propose:

  • This issue tackles the obvious fires. These are things we know would mean an outage or a problem. They include:
    • The UI is unavailable. Monitored by alerting on uptime/ping testing for dawson.ustaxcourt.gov and app.dawson.ustaxcourt.gov (and the equivalent in other environments).
    • The system health endpoints return red. Monitored by alerting on the system health endpoints (as implemented in flexion#6281).
    • The Elasticsearch cluster has a non-green status. Monitored by alerting on the cluster health status.

At a future point, once we know what "normal" traffic looks like, we can consider monitoring things like:

  • Unexpected traffic volume.
  • Abnormal login or new signup rates.


adunkman commented on August 20, 2024

  • Configured S3 buckets are not publicly available.
  • Elasticsearch cannot be accessed directly.

Speaking to these points: given how often these change, I think they’d result in fragile tests. Because the "pass" state is that a URL is inaccessible, I think we’d quickly end up asserting things that are no longer helpful (for example, that a nonsense URL is inaccessible).

Instead, I think we might want to consider introducing a security scanner for these. GitHub’s super-linter seems to be a good option; it uses tflint for identifying formatting and known linter problems, and terrascan for identifying security risks.

I’ll file a new issue for these, for post-MVP.


adunkman commented on August 20, 2024

The UI is unavailable. Monitored by alerting on uptime/ping testing for dawson.ustaxcourt.gov and app.dawson.ustaxcourt.gov (and the equivalent in other environments).

I believe this can be accomplished with Route53 health checks which would trigger a CloudWatch alarm.
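
A rough sketch of that wiring, assuming boto3 is the client used: the Python below only builds the request parameters that `route53.create_health_check` and `cloudwatch.put_metric_alarm` accept, and makes no AWS calls itself. The helper names, the alarm naming scheme, the thresholds, and the SNS topic ARN are assumptions for illustration, not anything configured in this project.

```python
import uuid


def health_check_request(domain, path="/"):
    """Build kwargs for boto3 route53.create_health_check (HTTPS check)."""
    return {
        "CallerReference": str(uuid.uuid4()),  # must be unique per request
        "HealthCheckConfig": {
            "Type": "HTTPS",
            "FullyQualifiedDomainName": domain,
            "Port": 443,
            "ResourcePath": path,
            "RequestInterval": 30,  # seconds between checks (10 or 30)
            "FailureThreshold": 3,  # consecutive failures before unhealthy
        },
    }


def health_alarm_request(health_check_id, domain, sns_topic_arn):
    """Build kwargs for boto3 cloudwatch.put_metric_alarm.

    Route53 publishes HealthCheckStatus (1 = healthy, 0 = unhealthy) in the
    AWS/Route53 namespace; those metrics live in us-east-1.
    """
    return {
        "AlarmName": f"{domain}-unavailable",  # hypothetical naming scheme
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",  # alarm when status < 1
        "AlarmActions": [sns_topic_arn],
    }
```

With credentials in place, these dictionaries would be passed along the lines of `boto3.client("route53").create_health_check(**health_check_request("dawson.ustaxcourt.gov"))`, with the alarm created through a CloudWatch client in us-east-1.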

The system health endpoints return red. Monitored by alerting on the system health endpoints (as implemented in flexion#6281).

This can be achieved by a Route53 health check as well, hitting the health check endpoint on the public API. Unfortunately, we no longer have a set URL for the public API — it will be either https://public-api-green.dawson.ustaxcourt.gov/public-api/health or https://public-api-blue.dawson.ustaxcourt.gov/public-api/health. I’ll need to file an issue (closely related to flexion#6864) and have it fixed before I can fully implement this health check.
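
To make "returns red" concrete, here is a minimal sketch of how a health response body could be judged. I'm assuming the endpoint returns JSON whose leaf values are booleans and that any `false` means red; the real response shape may differ, and both function names are hypothetical.

```python
import json


def all_green(status):
    """Return True only if every leaf value in the status tree is True."""
    if isinstance(status, dict):
        return all(all_green(value) for value in status.values())
    if isinstance(status, list):
        return all(all_green(value) for value in status)
    return status is True


def summarize(body):
    """Parse a health response body and report 'green' or 'red'."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "red"  # an unparseable response is treated as a failure
    return "green" if all_green(payload) else "red"
```

Note that an empty object counts as green under this rule, so a real check would probably also want to require the expected keys.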

The Elasticsearch cluster has a non-green status. Monitored by alerting on the cluster health status.

Already handled by a CloudWatch metric, and we can add an alarm.
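
As a sketch of what the alarm could look like, assuming boto3 again: the function below builds `cloudwatch.put_metric_alarm` parameters for the `ClusterStatus.red` and `ClusterStatus.yellow` metrics that the Elasticsearch service publishes in the `AWS/ES` namespace. The function name, alarm names, period, and topic ARN are assumptions for illustration.

```python
def es_status_alarm_requests(domain_name, account_id, sns_topic_arn):
    """Build put_metric_alarm kwargs for any non-green cluster status."""
    alarms = []
    for color in ("red", "yellow"):
        alarms.append({
            "AlarmName": f"{domain_name}-cluster-{color}",  # hypothetical name
            "Namespace": "AWS/ES",
            "MetricName": f"ClusterStatus.{color}",
            # AWS/ES metrics are keyed by domain name and AWS account ID.
            "Dimensions": [
                {"Name": "DomainName", "Value": domain_name},
                {"Name": "ClientId", "Value": account_id},
            ],
            "Statistic": "Maximum",
            "Period": 60,
            "EvaluationPeriods": 1,
            "Threshold": 1,
            # The metric is 1 while the cluster is in that state, else 0.
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "AlarmActions": [sns_topic_arn],
        })
    return alarms
```

Each returned dictionary would be passed to `boto3.client("cloudwatch").put_metric_alarm(**alarm)`.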


adunkman commented on August 20, 2024

CloudWatch Alarms looks like a natural clearing house for status, and it uses SNS for notifications.


adunkman commented on August 20, 2024

Speaking with @mmarcotte on direction here — we’ll use a simple SNS configuration for now and go with the Route53 approach.

We know we have a blind spot: if AWS has a catastrophic outage, we will not be notified, because the notification system may also be offline. I’ll file an issue to consider using an external notification service like Opsgenie for post-MVP.


adunkman commented on August 20, 2024

Reported flexion#6903 to get a single API endpoint for the system health JSON.


adunkman commented on August 20, 2024

Speaking with Mike, he’d like the API endpoint covered by health alerts for the first pass as well. I misunderstood, so I’m updating the description!


adunkman commented on August 20, 2024

The latest Elasticsearch alarms are in https://github.com/ustaxcourt/ef-cms/compare/add-es-alarms. They are blocked on running the account-specific Terraform steps; communication about that is happening in Slack.


adunkman commented on August 20, 2024

flexion#6903 is completed by flexion#7177, awaiting PR to the Court.


adunkman commented on August 20, 2024

Awaiting #608.

