License: MIT License


hellper's Introduction


Hellper - Your best friend in times of crisis

Hellper bot aims to orchestrate the process and resolution of incidents, reducing the time spent with manual tasks and ensuring that the necessary steps are fulfilled in the right order. Also, it facilitates the measurement of impact and response rate through metrics.

A chance to help explore and develop a bot written in Go, integrated with multiple external platforms and tools.

Help us expand incident processes and understand the needs of other companies that may benefit from Hellper bot.

You’re just one PR away from joining the Hellper development team! Contribute

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

  1. Docker Compose
  2. Slack Account
  3. G Suite Account

Installing

  1. Clone this repo
git clone git@github.com:ResultadosDigitais/hellper.git
  2. Configure Slack
  3. Configure Google
  4. Make a copy of the configuration example
cp development.env.example development.env

Variables explanation

| Variable | Explanation | Default value |
| --- | --- | --- |
| HELLPER_BIND_ADDRESS | Hellper local bind address | :8080 |
| HELLPER_DATABASE | Database provider (supported values: postgres) | postgres |
| HELLPER_DSN | Your Data Source Name | --- |
| HELLPER_ENVIRONMENT | Current environment (supported values: production, staging) | --- |
| HELLPER_GOOGLE_CREDENTIALS | Google Credentials | --- |
| HELLPER_GOOGLE_DRIVE_TOKEN | Google Drive Token | |
| HELLPER_GOOGLE_DRIVE_FILE_ID | Google Drive FileId of your post-mortem template | --- |
| HELLPER_GOOGLE_CALENDAR_TOKEN | Google Calendar Token | |
| HELLPER_GOOGLE_CALENDAR_ID | Google Calendar Id used to schedule your post-mortem | |
| HELLPER_POSTMORTEM_GAP_DAYS | Gap in days between the incident resolution and the postmortem event; defaults to 5 days when the variable is not set | 5 |
| HELLPER_MATRIX_HOST | Matrix URL host | --- |
| HELLPER_PRODUCT_CHANNEL_ID | The Product channel id used to notify new incidents | --- |
| HELLPER_NOTIFY_ON_RESOLVE | Notify the Product channel when the incident is resolved | true |
| HELLPER_NOTIFY_ON_CLOSE | Notify the Product channel when the incident is closed | true |
| HELLPER_NOTIFY_ON_CANCEL | Notify the Product channel when the incident is canceled | true |
| HELLPER_SUPPORT_TEAM | Support team identifier to notify | --- |
| HELLPER_PRODUCT_LIST | List of all products, separated by semicolons | Product A;Product B;Product C;Product D |
| HELLPER_REMINDER_OPEN_STATUS_SECONDS | Time in seconds before the status reminder is triggered for open incidents; defaults to 2 hours when the variable is not set | 7200 |
| HELLPER_REMINDER_RESOLVED_STATUS_SECONDS | Time in seconds before the status reminder is triggered for resolved incidents; defaults to 24 hours when the variable is not set | 86400 |
| HELLPER_REMINDER_OPEN_NOTIFY_MSG | Notify message when status is open | Incident Status: Open - Update the status of this incident, just pin a message with status on the channel. |
| HELLPER_REMINDER_RESOLVED_NOTIFY_MSG | Notify message when status is resolved | Incident Status: Resolved - Update the status of this incident, just pin a message with status on the channel. |
| HELLPER_OAUTH_TOKEN | Slack token to execute bot user actions | --- |
| HELLPER_SLACK_SIGNING_SECRET | Slack token to verify external requests | --- |
| FILE_STORAGE | Hellper file storage for the postmortem document | google_drive |
| TIMEZONE | Timezone for the Post Mortem Meeting | America/Sao_Paulo |
| HELLPER_SLA_HOURS_TO_CLOSE | Number of hours between the incident resolution and the Hellper reminder to close the incident | 168 |
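
As an illustration of how the defaults above behave, here is a minimal Go sketch that reads two of the variables and falls back to the documented default when a variable is missing. The helper names (envOr, envSeconds, splitProducts) are hypothetical and are not taken from the Hellper codebase; only the variable names and default values come from the table.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// envOr returns the value of the environment variable key,
// or fallback when the variable is unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// envSeconds reads a whole number of seconds from key and uses
// fallback when the value is missing, negative, or not a number.
func envSeconds(key string, fallback int) time.Duration {
	n, err := strconv.Atoi(envOr(key, strconv.Itoa(fallback)))
	if err != nil || n < 0 {
		n = fallback
	}
	return time.Duration(n) * time.Second
}

// splitProducts turns "Product A;Product B" into a slice,
// trimming spaces and skipping empty entries.
func splitProducts(raw string) []string {
	var products []string
	for _, p := range strings.Split(raw, ";") {
		if p = strings.TrimSpace(p); p != "" {
			products = append(products, p)
		}
	}
	return products
}

func main() {
	openReminder := envSeconds("HELLPER_REMINDER_OPEN_STATUS_SECONDS", 7200)
	products := splitProducts(envOr("HELLPER_PRODUCT_LIST", "Product A;Product B;Product C;Product D"))
	fmt.Println("open reminder:", openReminder)
	fmt.Println("products:", products)
}
```

Falling back on invalid input, instead of failing, is a design choice of this sketch; a stricter loader might prefer to stop at startup when a value cannot be parsed.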

Running the Tests

  1. make test

Running the application

  1. make run

Deployment

Setup database

  • Run this command and copy the address:

heroku config:get DATABASE_URL

  • Run this command, pasting the address in place of YOUR_DATABASE_URL:

heroku config:set HELLPER_DSN=YOUR_DATABASE_URL

  • Import the schema, replacing YOUR_HEROKU_APP_NAME with your application name:

heroku pg:psql --app YOUR_HEROKU_APP_NAME < internal/model/sql/postgres/schema/hellper.sql

Optional Setup

Ngrok (To receive events from Slack)

Golang

OR

  1. Install gvm
  2. Follow gvm post install instructions
  3. Install go 1.14 as default

Database

psql $HELLPER_DSN -f "./internal/model/sql/postgres/schema/hellper.sql"

How to use

Commands

After configuring Slack, you can use the commands that were created. The commands are as follows:

| Command | Short Description |
| --- | --- |
| /hellper_incident | Starts Incident |
| /hellper_status | Show all pinned messages |
| /hellper_close | Closes Incident |
| /hellper_resolve | Resolves Incident |
| /hellper_cancel | Cancels Incident |
| /hellper_pause_notify | Pauses incident notification |
| /hellper_update_dates | Updates the dates for an incident |

The first command, /hellper_incident, can be used in any channel or conversation on Slack. It opens a pop-up where the user sets up and starts an Incident, which creates the channel, the meeting room link and the post-mortem doc.

The remaining commands must be used only on the Incident's channel since they act on the specific incident that is open.

Metrics

These metrics come from the metrics view table and are calculated with the following formulas:

| Metric | Description | Formula |
| --- | --- | --- |
| start_ts | Date and time when the incident is started | Date and time in UTC from db |
| identification_ts | Date and time when the incident is identified | Date and time in UTC from db |
| end_ts | Date and time when the incident is resolved | Date and time in UTC from db |
| acknowledgetime | Time To Acknowledge | identification_ts - start_ts |
| solutiontime | Time To Solution | end_ts - identification_ts |
| downtime | Time in an incident | end_ts - start_ts |
| MTTA | Mean Time To Acknowledge | total acknowledgetime / total incidents |
| MTTS | Mean Time To Solution | total solutiontime / total incidents |
| MTTR | Mean Time To Recovery | total downtime / total incidents |
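
To make the formulas concrete, the following Go sketch computes the per-incident durations and the three means for a small, made-up set of incidents. The Incident type, the mean helper and the sample timestamps are assumptions for illustration; in Hellper itself these values come from the metrics view in the database.

```go
package main

import (
	"fmt"
	"time"
)

// Incident holds the three timestamps from the metrics table (all in UTC).
type Incident struct {
	StartTS          time.Time
	IdentificationTS time.Time
	EndTS            time.Time
}

// AcknowledgeTime is identification_ts - start_ts.
func (i Incident) AcknowledgeTime() time.Duration { return i.IdentificationTS.Sub(i.StartTS) }

// SolutionTime is end_ts - identification_ts.
func (i Incident) SolutionTime() time.Duration { return i.EndTS.Sub(i.IdentificationTS) }

// Downtime is end_ts - start_ts.
func (i Incident) Downtime() time.Duration { return i.EndTS.Sub(i.StartTS) }

// mean divides the summed metric by the number of incidents.
// It returns 0 for an empty list to avoid dividing by zero.
func mean(incidents []Incident, metric func(Incident) time.Duration) time.Duration {
	if len(incidents) == 0 {
		return 0
	}
	var total time.Duration
	for _, inc := range incidents {
		total += metric(inc)
	}
	return total / time.Duration(len(incidents))
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Sample data, invented for this example.
	incidents := []Incident{
		{StartTS: mustParse("2020-06-24T14:00:00Z"), IdentificationTS: mustParse("2020-06-24T14:10:00Z"), EndTS: mustParse("2020-06-24T15:00:00Z")},
		{StartTS: mustParse("2020-06-25T09:00:00Z"), IdentificationTS: mustParse("2020-06-25T09:20:00Z"), EndTS: mustParse("2020-06-25T09:50:00Z")},
	}
	fmt.Println("MTTA:", mean(incidents, Incident.AcknowledgeTime))
	fmt.Println("MTTS:", mean(incidents, Incident.SolutionTime))
	fmt.Println("MTTR:", mean(incidents, Incident.Downtime))
}
```

With these two sample incidents, the acknowledge times are 10 and 20 minutes, so the MTTA is 15 minutes; the same reasoning gives an MTTS of 40 minutes and an MTTR of 55 minutes.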

Alerts

Alerts are useful for notifying the status of incidents. Notifications can be used in different situations, such as requesting an update of the incident status or finalizing the post-mortem. For this, you can run the notify CLI from your cron job service.

To use the CLI you need to build the binary file:

go build -o notify cmd/notify/main.go

Example of use

SHELL=/bin/bash
BASH_ENV=/app/.env

# At 16:30 on every week-day, from Monday through Friday, it sends a report to a selected channel with all incidents not closed
30 16 * * 1-5 root /app/notify --type=report --to=YOUR_SLACK_CHANNEL_ID --status=all

# Every 30 minutes, it sends a status update request alert for all open incidents
*/30 * * * * root /app/notify --type=channels --status=open

# At 13:30 on every week-day, from Monday through Friday, sends a post-mortem request alert for all resolved incidents
30 13 * * 1-5 root /app/notify --type=channels --status=resolved

Contributing

Thanks for being interested in contributing! We’re so glad you want to help! Please take a little bit of your time and look at our contributing guidelines. All types of contributions are welcome, such as bug fixes, issues or feature requests.

Code of Conduct

Everyone interacting in the Hellper project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Need help?

If you need help with Hellper, feel free to open an issue with a description of the problem you're facing.

License

Hellper is available as open source under the terms of the MIT License.

hellper's People

Contributors

angeliski, dependabot-preview[bot], dependabot[bot], fvbaltar, glofonseca, heylizzie, leonardo-denardi, lucasfalm, paulassis, rd-systems, soniaismad, tiagoemsi, tiagotor

hellper's Issues

Last Pin Error

When an incident channel has no pinned messages, the reminder looks for the last pinned message, fails to find one, logs a LastPin error, and does not ask for the incident status.

jsonPayload: {
  app_name: "hellper"   
  channelID: "C015Q8L61A8"   
  error: "Lista está vazia"   
  level: "error"   
  log_type: "stdout"   
  message: "command/reminder.requestStatus LastPin error"   
  pod_name: "hellper-7f4cc474d7-89pvh"   
  raw: "{"level":"error","time":"2020-06-24T14:34:03.990Z","message":"command/reminder.requestStatus LastPin error","channelID":"C015Q8L61A8","error":"Lista está vazia","stack":"hellper/internal/log/zap.(*zapLoggerDelegate).Check\n\t/app/internal/log/zap/zap.go:32\nhellper/internal/log/zap.logger.log\n\t/app/internal/log/zap/zap.go:79\nhellper/internal/log/zap.logger.Error\n\t/app/internal/log/zap/zap.go:98\nhellper/internal/commands.requestStatus.func1\n\t/app/internal/commands/reminder.go:66\nhellper/internal/job.NewTrackable.func1\n\t/app/internal/job/job.go:26"}

Cleanup

What should be cleaned up or changed:
This error should not break the execution of the reminder; the reminder should keep asking for a status message.
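
One way to get this behaviour, sketched in Go under assumptions: treat "no pinned messages" as a known, non-fatal case and still produce a status request. The names errNoPins, lastPin and statusPrompt are hypothetical and do not mirror the actual code in internal/commands/reminder.go.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPins is a hypothetical sentinel error for a channel without pinned messages.
var errNoPins = errors.New("no pinned messages")

// lastPin returns the most recent pinned message, or errNoPins for an empty list.
func lastPin(pins []string) (string, error) {
	if len(pins) == 0 {
		return "", errNoPins
	}
	return pins[len(pins)-1], nil
}

// statusPrompt builds the reminder text. A channel without pins still gets
// a prompt instead of aborting; any other error is returned to the caller.
func statusPrompt(pins []string) (string, error) {
	pin, err := lastPin(pins)
	switch {
	case errors.Is(err, errNoPins):
		return "No status has been pinned yet. Please pin a message with the current status.", nil
	case err != nil:
		return "", err
	}
	return fmt.Sprintf("Last pinned status: %q. Please pin an update.", pin), nil
}

func main() {
	msg, err := statusPrompt(nil)
	fmt.Println(msg, err)
}
```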

Provide any links for context:

Google Docs configs documentation

Documentation Feedback

We need to improve the Google Docs configuration documentation so that anyone can set it up from scratch.

Affected documentation page: docs/CONFIGURING-GOOGLE.md

Slack commands documentation

Documentation Feedback

We need to document the use of our Slack commands and also improve the configuration documentation. Maybe draw the entire flow, showing where each command fits.

Affected documentation page: docs/CONFIGURING-SLACK.md

Fix the incident impact on create

Subject of the issue

When an incident is opened, the impact is filled with a 0 in the database.

Steps to reproduce

  1. Open a new incident
  2. Check the incident impact in Postgres

Expected behaviour

The impact should not be filled with anything, so after the incident is created the impact in the database should be null.

Actual behaviour

The incident impact is 0 in Postgres when the incident is created.

Possible Solution
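
One possible direction, as a sketch only: pass the impact to the database through a nullable type, so that a missing impact is written as NULL instead of 0. The helper impactParam is hypothetical, and the sketch assumes the impact column is an integer that accepts NULL; sql.NullInt64 is the standard database/sql type for this.

```go
package main

import (
	"database/sql"
	"fmt"
)

// impactParam converts an optional impact into a query parameter that
// database/sql writes as NULL when the impact is not known yet.
func impactParam(impact *int64) sql.NullInt64 {
	if impact == nil {
		return sql.NullInt64{} // Valid is false, so the driver sends NULL
	}
	return sql.NullInt64{Int64: *impact, Valid: true}
}

func main() {
	// On creation the impact is unknown, so the parameter is NULL.
	fmt.Println("valid on create:", impactParam(nil).Valid)

	// Later, once the impact has been measured, a real value is stored.
	measured := int64(3)
	fmt.Println("valid after update:", impactParam(&measured).Valid)
}
```

The value returned by impactParam can be passed directly as an argument to an insert or update statement executed with database/sql.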

Hellper does not ask for an Incident Title in the opening form

Subject of the issue

When an incident is opened, we insert the channel name in the title column in Postgres.

This is wrong: the Incident Title and the Incident Channel Name are two different things.

We need the Incident Title in the database when an incident is closed, so that we can create the PostMortem doc with the proper title.

Steps to reproduce

  1. Open an incident with /hellper_incident
  2. Check the postmortem document
  3. Check the incident on Postgres

Expected behaviour

  • The opening form should ask for an Incident Title, not only a Channel Name.
  • The title column should be filled with this Title
  • The title in the postmortem document should have this Title, not the Channel Name.

Actual behaviour

We use the Channel Name from the opening form for everything: the title column, the channel_name column and the postmortem title.

Possible Solution

Create a new field called Title on the opening form and save its value in the title column. Then use this column to fill in the postmortem title.

Same postmortem for two incidents

Currently we have no process for when the same incident occurs again or when two incidents have the same root cause.

The main question is: do we write a single postmortem for both? How do we track and measure this using Hellper?

The bot owner is the user who is opening the channel

Subject of the issue

The owner of the bot, the one who reinstalled it, is the one that is opening the incident channel in Slack.

Steps to reproduce

Using the command /hellper_incident in any incident.

Expected behaviour

We have two options, that should be discussed:

  • The bot user opens the channel
    OR
  • The user that called the command opens the channel

Actual behaviour

The owner of the bot is opening the channel.

Possible Solution

We need to revise the OAuth scopes of our Slack App; one of them should permit the bot to open a channel as itself or as the user that called it.

The bot owner is the user who is archiving the channel

Subject of the issue

The owner of the bot, the one who reinstalled it, is the one that is archiving the incident channel in Slack.

Steps to reproduce

Using the command /hellper_close in any incident.

Expected behaviour

We have two options, that should be discussed:

  • The bot user archives the channel
    OR
  • The user that called the close command archives the channel

Actual behaviour

The owner of the bot is archiving the channel.

Possible Solution

We need to revise the OAuth scopes of our Slack App; one of them should permit the bot to archive a channel as itself or as the user that called it.

The usability of the Update Dates Dialog is not clear

Cleanup

What should be cleaned up or changed:
How the timezone change works in the update dates dialog is not clear to the user.

The first field (timezone) does not dynamically update the other fields.

Provide any links for context:

Only the owner of the bot is able to receive the return of private messages

Subject of the issue

We identified that when a user other than the bot owner sends a command privately to the bot, the bot cannot reply and reports that the channel does not exist.

This problem occurs because of the OAuth token used in the operation. A Slack bot has two types of OAuth token, and both are needed for the bot to work correctly. We need to identify the correct mechanism for using these tokens.

Remove JoinConversation call

When an incident is opened we call the slack-go.JoinConversation, but this is no longer necessary.

Cleanup

What should be cleaned up or changed:
Remove JoinConversation call on open_incident command (internal/commands/open.go)

Provide any links for context:

Change the root cause field to a list of options

Feature Description

The Root Cause field is currently free text, so it is not possible to extract reliable metrics from it. Changing it to a list of root cause options would make that possible.

Problem

Today we cannot extract reliable metrics related to the incident's Root Cause.

Expected behaviour

As Root Cause is an important characteristic of an incident, we want to extract metrics of them, to better focus our efforts to reduce the impact of those root causes.

Alternatives

Alternatives are welcome.

Migrate to Slack Conversation API

Cleanup

CRITICAL ISSUE: This issue blocks any new Hellper app on Slack from working!

What should be cleaned up or changed:

Some methods and permissions that we use, both directly and through the slack-go package, have been deprecated since January 7.

On June 10 new apps will not be allowed to use them, so new Hellper applications won't work.

On November 25 these methods will stop working.

Update: The new date on which these methods stop working is February 24th, 2021.

Provide any links for context:
https://api.slack.com/changelog/2020-01-deprecating-antecedents-to-the-conversations-api?utm_medium=email&utm_source=newsletter&utm_campaign=fy21-Q205-changelog

/hellper_status mentions the user group when the pinned message has a mention of this kind

Subject of the issue

/hellper_status mentions the user group again when a pinned message contains a user group mention.

Steps to reproduce

  1. Send a message mentioning a user group on it
  2. Pin this message
  3. Call the Hellper command /hellper_status

Expected behavior

Hellper should print the message, but without the mention.

Actual behavior

Hellper mentions the group again in the channel.

Possible Solution

Opening communication not showing the incident product

Subject of the issue

The product selected in the opening form is not being shown in the incident opening communication.

Steps to reproduce

  1. Open an incident with the command /hellper_incident
  2. Check opening communication on the created channel

Expected behaviour

The opening communication should state which product is affected by the incident.

Actual behaviour

Nothing is shown under Product in the opening communication.

Insert the Incident ID in all communications

Cleanup

What should be cleaned up or changed:
The communications, both in the incident's channel and in the HELLPER_PRODUCT_CHANNEL_ID channel, don't include the Incident ID (except for the opening communication).

[Images: incident opening, resolution and closing communications]

Provide any links for context:

We don't have the Incident title in the communications

Subject of the issue

We have the Incident Title field, but it does not appear in any communication in the channels.

Steps to reproduce

  1. Create a new incident
  2. Enter an Incident Title

Expected behaviour

When the incident is opened, the incident title should appear in the incident channel communication.

Actual behaviour

The field is saved in the database but never used.

Possible Solution

Put the Incident Title in every communication.

Open a silent incident

Feature Description

Add an option in the modal of the /hellper_incident command so that an incident is opened silently.

A silent incident is characterized by having a private channel and by not notifying the main incident channel when it opens.

Problem

Some incidents, such as security incidents, should not be known to the entire organization. In such cases, the entire process of opening the incident currently has to be carried out manually, including its insertion in the databases.

[WIP] /hellper_status does not work when a pinned message was sent by a deleted user

Subject of the issue

Describe your issue here.

Steps to reproduce

Tell us how to reproduce this issue. Please provide a working and simplified example.

Expected behaviour

What should happen?

Actual behaviour

What happens instead?

Possible Solution

What are the alternative solutions? Please describe what else you have considered?

Create automated tests for bot commands

Cleanup

What should be cleaned up or changed:
We need to write tests for the Hellper commands. Available commands:

 close     Close the incident
 help      Show this help
 list      List all available commands
 list-all  List all active incidents
 ping      Test bot connectivity
 start     Create a new incident
 state     Get incident state and timeline

Provide any links for context:

Hellper Language

Cleanup

What should be cleaned up or changed:
Hellper's language is inconsistent. Most of the communication is in English, but some things, such as the env, reference "pt-br".

The main language for the application must be English.

Add essential people to high-severity incidents

Feature Description

When an incident with a severity of 1 or 0 occurs, we need to add some directors to the channel manually.
We could do this automatically through Hellper.

Problem

Some directors have to be added manually to some incidents.

Expected behaviour

Everyone who should always be in the incident, whatever its type, could be added to the channel automatically.

Alternatives

Alternatives are welcome!

The timezone of the updated dates is hard-coded

Cleanup

What should be cleaned up or changed:
When we update the incident's dates with a timezone other than UTC, we use a hard-coded timezone named Custom/Location.
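
As a sketch of an alternative, the dates could be parsed in a real IANA timezone, for example the one configured in the TIMEZONE variable, and then converted to UTC. The function parseInZone and the "2006-01-02 15:04" input layout are assumptions for illustration, not Hellper's actual dialog format. The blank import of time/tzdata requires Go 1.15 or later.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embeds the IANA timezone database (Go 1.15+)
)

// parseInZone interprets value as local time in the named IANA zone
// and returns the same instant in UTC.
func parseInZone(value, zone string) (time.Time, error) {
	loc, err := time.LoadLocation(zone)
	if err != nil {
		return time.Time{}, fmt.Errorf("load location %q: %w", zone, err)
	}
	t, err := time.ParseInLocation("2006-01-02 15:04", value, loc)
	if err != nil {
		return time.Time{}, fmt.Errorf("parse %q: %w", value, err)
	}
	return t.UTC(), nil
}

func main() {
	utc, err := parseInZone("2020-06-24 12:00", "America/Sao_Paulo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(utc.Format(time.RFC3339))
}
```

Because the zone name is validated by time.LoadLocation, a name that is not in the timezone database, such as Custom/Location, is rejected with an error instead of being silently accepted.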

Provide any links for context:

Create more alerts for Slack

Feature Description

The Hellper bot could have more alerts to make sure that dates are filled in, incidents are closed, postmortems are done, and so on. If needed, it could even send an @channel.

Problem

Some incidents were not closed properly, had missing dates values or had the postmortem done many days after the incident occurred.

Expected behaviour

We need a way to remind the incident commander to fill in the incident's dates, close the incident and also do the postmortem as soon as possible.

Upgrade Slack Dialog to Slack Modal

Cleanup

What should be cleaned up or changed:
The mechanism that we use to get user info (Slack Dialog) is now outdated.
We should upgrade to Slack Modal.

Provide any links for context: Slack Modal

/hellper_status sends a @here when a pinned message has it

Subject of the issue

Our timeline is made of the pinned messages in the incident channel, so when a pinned message contains @here or @channel, Hellper sends that mention again when it posts the timeline.


Steps to reproduce

  1. Send a message containing @here
  2. Pin this message
  3. Call the Hellper command /hellper_status

Expected behaviour

Hellper should print the message without the @here, or with the mention escaped.

Actual behaviour

Hellper sends an @here in the channel.

Possible Solution
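
A possible approach, sketched in Go: rewrite the mention markup before the pinned text is sent again. In Slack message text, broadcast mentions are encoded as <!here>, <!channel> and <!everyone>, and user group mentions as <!subteam^ID> with an optional label after a | character. The function neutralizeMentions and the bracketed replacement format are assumptions made for this example and are not part of Hellper.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// broadcastRe matches <!here>, <!channel> and <!everyone>, with an optional label.
	broadcastRe = regexp.MustCompile(`<!(here|channel|everyone)(?:\|[^>]*)?>`)
	// subteamRe matches user group mentions such as <!subteam^S0123ABCD|@sre-team>.
	subteamRe = regexp.MustCompile(`<!subteam\^[A-Z0-9]+(?:\|@?([^>]*))?>`)
)

// neutralizeMentions replaces broadcast and user group mentions with plain
// bracketed labels, so that re-posting the text does not notify anyone.
func neutralizeMentions(text string) string {
	text = broadcastRe.ReplaceAllString(text, "[$1]")
	return subteamRe.ReplaceAllStringFunc(text, func(match string) string {
		if sub := subteamRe.FindStringSubmatch(match); sub != nil && sub[1] != "" {
			return "[" + sub[1] + "]"
		}
		return "[user group]"
	})
}

func main() {
	fmt.Println(neutralizeMentions("<!here> deploy rolled back"))
	fmt.Println(neutralizeMentions("paging <!subteam^S0123ABCD|@sre-team> now"))
}
```

The same function also covers the user group case reported for /hellper_status, since both kinds of mention use the <!...> markup.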

What is missing for Open Source?

Hellper is without a doubt a fantastic tool, and it helps us (a lot) in the worst of moments: incidents.

The idea here is to discuss what is missing (features, coupling with RD, documentation, other) for us to release this tool as open source.

A good example of such a tool is Netflix's Dispatch.
The outcome of this issue is a roadmap for our open source V1.
