sensu-plugins / sensu-plugins-pagerduty

This plugin provides a Sensu handler for PagerDuty.

Home Page: http://sensu-plugins.io

License: MIT License

Ruby 100.00%
sensu-plugins sensu-handler sensu-mutator

sensu-plugins-pagerduty's Issues

pager_team variable with array of teams in check-definition

I am specifying the following in my check-definition in sensu:

      "interval": 30,
      "handle": true,
      "occurrences": 5,
      "refresh": 500,
      "pager_team": [
        "devtools_sensu",
        "devcloud_support_sensu"
      ],
      "notification": "Github predix dev2 snmp used storage percent"
    }
  }
}

Is pager_team per check a 1:1 or can we specify multiple pager_teams per above? It seems to only be alerting the first team specified (devtools_sensu).

Thanks,
Gary

How to run rspec for this repo

Hi,

I've tried to run RSpec for this repo, but it keeps running forever. The command I use is rake spec:

 G/s  pagerduty   master  rake spec
/Users/cduong/.rubies/ruby-2.1.5/bin/ruby -I/Users/cduong/.gem/ruby/2.1.5/gems/rspec-support-3.5.0/lib:/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/exe/rspec test/bin/handler-pagerduty_spec.rb test/bin/mutator-pagerduty-priority-override_spec.rb

^C
RSpec is shutting down and will print the summary report... Interrupt again to force quit.
rake aborted!
Interrupt:
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:79:in `system'
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:79:in `run_task'
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:96:in `block (2 levels) in define'
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:94:in `block in define'
/Users/cduong/.gem/ruby/2.1.5/gems/rake-11.2.2/exe/rake:27:in `<top (required)>'
Tasks: TOP => spec
(See full trace by running task with --trace)
error reading event: Input/output error @ io_fread - <STDIN>
/Users/cduong/.rubies/ruby-2.1.5/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- pagerduty (LoadError)
from /Users/cduong/.rubies/ruby-2.1.5/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Users/cduong/GitHub/sensu-plugins/pagerduty/bin/handler-pagerduty.rb:21:in `<top (required)>'
from /Users/cduong/GitHub/sensu-plugins/pagerduty/test/bin/handler-pagerduty_spec.rb:3:in `require_relative'
from /Users/cduong/GitHub/sensu-plugins/pagerduty/test/bin/handler-pagerduty_spec.rb:3:in `<top (required)>'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1435:in `load'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1435:in `block in load_spec_files'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1433:in `each'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1433:in `load_spec_files'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:100:in `setup'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:86:in `run'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:71:in `run'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:45:in `invoke'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/exe/rspec:4:in `<main>'

Thanks for reading.

Release Cycle

Hey, just wondering what the release cycle is like for the new Sensu plugin repos? We want to maintain parity between the repo and the released gems, but we will need the Sensu client override before we roll out Sensu in our org (to replace Nagios).

(This is copied from a merged PR, as I'm not sure whether you are notified of comments on closed items.)

How are client keepalives handled?

I see that you can set a team at the host level and per check, but is there any way to handle the client keepalives? Ideally, I would like to have the keepalive alerts sent to a separate team.

Incidents not resolving

I can't yet reproduce this 100% of the time, but I often find that after I trigger an incident and the triggering condition clears, handler-pagerduty.rb reports that it resolved the incident, yet the incident stays open in PagerDuty. I'm running sensu-plugins-pagerduty version 0.0.9, installed as a gem.

An example resolution issued here:

{"timestamp":"2016-02-27T18:23:33.733153+0000","level":"info","message":"handler output","handler":{"type":"pipe","command":"handler-pagerduty.rb","mutator":"pagerduty","name":"pagerduty"},"output":["/var/lib/gems/2.3.0/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:134:in `block in filter_silenced': Object#timeout is deprecated, use Timeout.timeout instead.\n/var/lib/gems/2.3.0/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:134:in `block in filter_silenced': Object#timeout is deprecated, use Timeout.timeout instead.\n/var/lib/gems/2.3.0/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:134:in `block in filter_silenced': Object#timeout is deprecated, use Timeout.timeout instead.\n/var/lib/gems/2.3.0/gems/sensu-plugins-pagerduty-0.0.9/bin/handler-pagerduty.rb:57:in `handle': Object#timeout is deprecated, use Timeout.timeout instead.\npagerduty -- Resolved incident -- routing3-valhalla_dev_us-east/routing3_cpu\n"]}

Left me with this issue still in pagerduty:

Details
Description
routing3-valhalla_dev_us-east/routing3_cpu : CheckCPU TOTAL WARNING: total=79.71 user=24.64 nice=0.0 system=4.93 idle=20.29 iowait=48.12 irq=0.0 s...
Details
timestamp     1456597053
action        create
occurrences   10
check
  total_state_change    5
  history       ["0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","1"]
  status        1
  output        CheckCPU TOTAL WARNING: total=79.71 user=24.64 nice=0.0 system=4.93 idle=20.29 iowait=48.12 irq=0.0 softirq=1.74 steal=0.29 guest=0.0
  duration      2.069
  executed      1456597050
  issued        1456597050
  name          routing3_cpu
  refresh       3600
  occurrences   10
  interval      60
  standalone    true
  handlers      ["default"]
  command       check-cpu.rb --sleep 2 -w 70 -c 90
client
  timestamp     1456597052
  version       0.22.0
  keepalive
    handle        true
    refresh       1800
    handlers      ["default"]
    thresholds
      critical      600
      warning       300
  pd_override
    baseline
      warning       routing_low
  pager_team    routing_low
  address       192.168.2.94
  name          routing3-valhalla_dev_us-east
id            024d99b8-431f-4d68-a09d-fa3918b07180

Any thoughts before I start delving into this more?

add retry support

There are times when the PD API just doesn't do what it's meant to, for whatever reason. Perhaps your network is screwed, who knows?

Anyway, I think this handler should have the option to retry creating and resolving an incident X number of times. Perhaps we should put it in the JSON config?
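
As an illustration of the retry idea, here is a minimal sketch in plain Ruby. The helper name with_retries and its attempts, wait, and errors options are hypothetical and do not exist in the handler; the block stands in for whatever call triggers or resolves the incident.

```ruby
# Hypothetical helper (not part of handler-pagerduty.rb): run the block,
# retrying up to `attempts` times and pausing `wait` seconds between tries.
def with_retries(attempts: 3, wait: 1, errors: [StandardError])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors => e
    # Give up and surface the last error once the attempts are used up.
    raise e if tries >= attempts
    sleep wait
    retry
  end
end

# Simulated flaky API call: fails twice, succeeds on the third try.
calls = 0
result = with_retries(attempts: 3, wait: 0) do |try|
  calls += 1
  raise IOError, 'simulated PagerDuty timeout' if try < 3
  :resolved
end
```

The attempt count and wait time could be read from the JSON config as suggested above; the key names for that would need to be agreed on.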

Documentation update

It'd be helpful to document that you can add pager_team to checks and not just at the client level. I had to dig into the code to get my pages to route properly on a per-metric basis. I also feel this is a much easier solution to implement than pd_override, especially when defining checks in something like Puppet.

At first I had client level pd_override settings on a per metric basis (and also in dedup) and that didn't work properly. Adding pager_team to the check worked however.

enable bidirectional functionality

When working with PagerDuty and Sensu, it would be cool to have Sensu also pull status from PagerDuty automatically, because you don't want to look in different places to schedule downtime, ACK alerts, and so on.
For example, we might want to create a downtime in Sensu when we find that an alert has been ACK'ed in PagerDuty.

Unused dependency

This gem depends on pagerduty and redphone gems. But only redphone is used. So please consider removing the dependency on pagerduty.

Missing config[:json_config]

@mattyjones where is config[:json_config] set up? When using sensu-plugins-pagerduty (0.0.1), I get the following error:

bin/handler-pagerduty.rb:34:in `handle': undefined method `[]' for nil:NilClass (NoMethodError)

See #1

Pagerduty handler is not honoring the "occurrences" property of checks, and is additionally being triggered when checks are silenced.

We have the pagerduty handler set up in parallel with our email handler. We notice that on checks where "occurrences" : 5 is being set, email notifications are properly suppressed until the 5th occurrence, however a pagerduty incident is created immediately upon the first failed check.

Additionally, when silencing hosts within uchiwa in anticipation of a maintenance window, email handler properly suppresses notification during the maintenance window, however pagerduty handler begins spamming the operations team despite checks being silenced within the dashboard.

Proxy support

So it doesn't seem that there's any option available to allow this plugin to work behind a firewall. Unless I'm missing something, this is a feature that needs to be added.

I'm not a ruby programmer, but if you can point me in the right direction, I can probably figure out how to put together a pull request.

Thanks.

--Greg Chavez
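
For anyone looking for a starting point, here is a rough sketch of how Ruby's standard library sends HTTP traffic through a proxy. It is not how the plugin talks to PagerDuty today (that goes through a client gem), and the proxy host and port below are placeholders; a real change would need the client library to accept similar settings.

```ruby
require 'net/http'

# Placeholder proxy settings (assumptions for this sketch). In a handler
# these might come from the JSON config, but no such keys exist in the plugin.
proxy_host = 'proxy.example.com'
proxy_port = 3128

# Net::HTTP.new takes the proxy address and port as its 3rd and 4th arguments.
# No connection is opened until a request is actually made.
http = Net::HTTP.new('events.pagerduty.com', 443, proxy_host, proxy_port)
http.use_ssl = true

puts "using proxy: #{http.proxy?} (#{http.proxy_address}:#{http.proxy_port})"
```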

Dependabot can't resolve your Ruby dependency files

Dependabot can't resolve your Ruby dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:

Bundler::VersionConflict with message: Bundler could not find compatible versions for gem "bundler":
  In Gemfile:
    bundler (~> 1.7)

  Current Bundler version:
    bundler (2.2.15)

Your bundle requires a different version of Bundler than the one you're running.
Install the necessary version with `gem install bundler:1.17.3` and rerun bundler using `run.rb _1.17.3_`

If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.


An issue that was a warning that goes to critical doesn't send the resolution state

I think this is semi by design, but any views appreciated.

We've come across this during testing. Using CPU use as an example;

If we set a warning threshold of 50%, an email gets sent. If it then goes to critical at 70%, the alert in PagerDuty is triggered. The problem for us is that if the CPU drops back below the critical threshold but stays above the warning threshold, the resolution isn't sent to PagerDuty: the incident remains live even though the check is no longer in a critical state.

Any thoughts or workarounds for this?

Thanks,
Dave.

Resolution sometimes does not trigger pagerduty

I've seen this occur a number of times: when incidents are triggered in bulk (10 or more at a time) and are then resolved in Sensu, they are not resolved in PagerDuty.

It could be a network issue, perhaps a bulk request problem, or maybe even an API bug or limitation at PagerDuty, but I thought I'd ask here whether others have seen this happen on their own infrastructure.

I have not started debugging and have not re-created the issue. Just noticed it a number of times when we have a large outage, and intermittently at other times. v0.23.3

Please release new gem with contexts support!

hey there,

The new support for contexts is awesome. Thanks @zroger! I was wondering if you'd consider bumping the version and uploading a new gem with this version. As it is, it is not very easy to install a version of this plugin with support for contexts.

add proxy support

Hi,

We need proxy support in this plugin. Unfortunately, it looks like we cannot implement this feature until the jaxxstorm/redphone#9 pull request is merged into the redphone library.

When did the severity filter get dropped?

Once upon a time, my configuration worked:

"pagerduty": {
  "command": "handler-pagerduty.rb",
  "filters": [ "occurrences" ],
  "severities": [ "critical" ],
  "type": "pipe"
}

I just upgraded to sensu-plugins-pagerduty 3.0.1 and am seeing this in my Sensu master's log:

"message":"handler does not handle event severity","handler":{"command":"handler-pagerduty.rb"...

'severities' is a generic handler attribute, so I'm wondering why I'm seeing this message in my logs. Assuming the handler no longer respects severities, how would I be able to route only critical events to my pagerduty handler?

API Request issues, missing api.json??

I'm setting up the pagerduty plugin in my new Sensu install and getting this error:

10/4/2016 3:19:54 PM
{
  "timestamp": "2016-10-04T22:19:54.234034+0000",
  "level": "info",
  "message": "handler output",
  "handler": {
    "type": "pipe",
    "command": "\/etc\/sensu\/handlers\/handler-pagerduty.rb",
    "name": "pagerduty"
  },
  "output": [
    "\/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:98:in `api_request': api.json settings not found. (RuntimeError)\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:145:in `stash_exists?'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:157:in `block (2 levels) in filter_silenced'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:91:in `block in timeout'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:33:in `block in catch'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:33:in `catch'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:33:in `catch'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:106:in `timeout'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:156:in `block in filter_silenced'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:154:in `each'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:154:in `filter_silenced'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:33:in `filter'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:56:in `block in <class:Handler>'\n"
  ]
}

The config I'm using:

{
  "handlers": {
    "pagerduty": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/handler-pagerduty.rb"
    }
  },
  "pagerduty": {
    "api_key": "REDACTED"
  }
}

Method misused.

In bin/handler-pagerduty.rb, a method called resolve is invoked to resolve an incident.

pagerduty.get_incident([incident_key_prefix, incident_key].compact.join('')).resolve(
            description: [description_prefix, event_summary].compact.join(' '), details: @event)

The resolve method is defined in lib pagerduty like this:

def resolve(description = nil, details = nil)
  modify_incident("resolve", description, details)
end

Notice that the resolve method is not defined with Ruby keyword parameters, but it is invoked with keyword arguments. As a result, description and details are not treated as two parameters; they are passed together as a single Hash in the first parameter.

This issue causes PagerDuty incidents to stay unresolved when the length of the description is greater than 1024. PagerDuty also does not return an error when description is passed as a Hash (it is converted to a JSON object when sent to PagerDuty as the payload).
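
The argument binding can be checked in isolation. The sketch below uses a stand-in method with the same signature as the resolve quoted above; it returns its arguments instead of calling modify_incident, so it only demonstrates how Ruby binds the arguments, not what the pagerduty gem does with them.

```ruby
# Stand-in with the same positional signature as the gem's `resolve`.
# It returns what it received so the binding can be inspected.
def resolve(description = nil, details = nil)
  [description, details]
end

# Keyword-style call, as in the handler: both pairs are collected into
# one Hash, which lands in the first positional parameter.
kw_description, kw_details = resolve(description: 'disk full', details: { 'id' => 1 })

# Positional call: each value reaches the parameter it was meant for.
pos_description, pos_details = resolve('disk full', { 'id' => 1 })

puts kw_description.class  # Hash
puts kw_details.inspect    # nil
puts pos_description       # disk full
```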

multiple teams

Is there any way to have a check alert multiple teams? I know I can set up two alerts, but that seems like overkill, especially for some checks. I would rather be able to do

pager_team: "dbas,sysadmins"

or something similar.
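
One way this could work, sketched under assumptions: normalize whatever pager_team holds (a single name, a comma-separated string, or an array) into a list, then notify each team in turn. The helper name pager_teams is hypothetical, and the handler does not currently do this.

```ruby
# Hypothetical helper: turn a pager_team value into a clean list of names.
# Accepts nil, a single name, a comma-separated string, or an array.
def pager_teams(value)
  Array(value)
    .flat_map { |entry| entry.to_s.split(',') }
    .map(&:strip)
    .reject(&:empty?)
    .uniq
end

from_string = pager_teams('dbas, sysadmins')
from_array  = pager_teams(%w[devtools_sensu devcloud_support_sensu])
from_nil    = pager_teams(nil)

# A handler could then loop over the result and trigger one incident per team.
from_string.each { |team| puts "would notify #{team}" }
```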

Consider re-ordering setting overrides for client/check/json_config

Currently a client-level pager_team overrides all check-level pager_team settings.

However, it would make more sense (to me at least) to have check level settings override client level settings. This allows the client level setting to be the "default" for all checks, but allows various checks to be able to define teams. This comes in especially handy when we have applications doing checking and routing to their responsible teams where system level checks will typically go to our system engineering team.
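
The proposed precedence could look roughly like the sketch below. The method name resolve_pager_team and the default_team argument are made up for illustration, and the event is assumed to be a Hash with string keys 'check' and 'client', as in the event data quoted elsewhere in these issues.

```ruby
# Hypothetical lookup implementing the proposed order:
# check-level pager_team first, then client-level, then a default.
def resolve_pager_team(event, default_team = nil)
  check_team  = event.fetch('check', {})['pager_team']
  client_team = event.fetch('client', {})['pager_team']
  check_team || client_team || default_team
end

# Application check that names its own team: the check setting wins.
app_event = {
  'client' => { 'name' => 'web01', 'pager_team' => 'sysadmins' },
  'check'  => { 'name' => 'app_health', 'pager_team' => 'app_team' }
}

# System check with no team of its own: falls back to the client setting.
sys_event = {
  'client' => { 'name' => 'web01', 'pager_team' => 'sysadmins' },
  'check'  => { 'name' => 'check_disk' }
}

puts resolve_pager_team(app_event)  # app_team
puts resolve_pager_team(sys_event)  # sysadmins
```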

Pagerduty alert text is not updated by handler

The alert text in Pagerduty does not change even when the output from the check has changed.
This means alerts can get suddenly worse, without me being aware of it.

Example:

Suppose I have a disk usage check which runs every ten minutes.

The disk is filling up so the check begins alerting (as seen in Uchiwa):
CheckDisk WARNING: / 71%

I get a Pagerduty alert with text:
CheckDisk WARNING: / 71%

Ten minutes later, the disk is filling up really fast so the alert changes (again as seen in Uchiwa):
CheckDisk CRITICAL: / 95%

But the alert in Pagerduty still says:
CheckDisk WARNING: / 71%
