sensu-plugins / sensu-plugins-pagerduty
This plugin provides a Sensu handler for PagerDuty.
Home Page: http://sensu-plugins.io
License: MIT License
I am specifying the following in my check-definition in sensu:
"interval": 30,
"handle": true,
"occurrences": 5,
"refresh": 500,
"pager_team": [
"devtools_sensu",
"devcloud_support_sensu"
],
"notification": "Github predix dev2 snmp used storage percent"
}
}
}
Is pager_team strictly 1:1 per check, or can we specify multiple pager teams as shown above? It seems to alert only the first team specified (devtools_sensu).
Thanks,
Gary
Hi,
I've tried to run RSpec for this repo, but it keeps running forever. The command I use is rake spec:
G/s pagerduty master rake spec
/Users/cduong/.rubies/ruby-2.1.5/bin/ruby -I/Users/cduong/.gem/ruby/2.1.5/gems/rspec-support-3.5.0/lib:/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/exe/rspec test/bin/handler-pagerduty_spec.rb test/bin/mutator-pagerduty-priority-override_spec.rb
^C
RSpec is shutting down and will print the summary report... Interrupt again to force quit.
rake aborted!
Interrupt:
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:79:in `system'
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:79:in `run_task'
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:96:in `block (2 levels) in define'
/Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/rake_task.rb:94:in `block in define'
/Users/cduong/.gem/ruby/2.1.5/gems/rake-11.2.2/exe/rake:27:in `<top (required)>'
Tasks: TOP => spec
(See full trace by running task with --trace)
error reading event: Input/output error @ io_fread - <STDIN>
/Users/cduong/.rubies/ruby-2.1.5/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- pagerduty (LoadError)
from /Users/cduong/.rubies/ruby-2.1.5/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Users/cduong/GitHub/sensu-plugins/pagerduty/bin/handler-pagerduty.rb:21:in `<top (required)>'
from /Users/cduong/GitHub/sensu-plugins/pagerduty/test/bin/handler-pagerduty_spec.rb:3:in `require_relative'
from /Users/cduong/GitHub/sensu-plugins/pagerduty/test/bin/handler-pagerduty_spec.rb:3:in `<top (required)>'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1435:in `load'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1435:in `block in load_spec_files'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1433:in `each'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/configuration.rb:1433:in `load_spec_files'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:100:in `setup'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:86:in `run'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:71:in `run'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/lib/rspec/core/runner.rb:45:in `invoke'
from /Users/cduong/.gem/ruby/2.1.5/gems/rspec-core-3.5.1/exe/rspec:4:in `
Thanks for reading.
It looks to me like redphone won't be supported by @portertech in the near future
https://twitter.com/portertech/status/654188507921956864
I think it's time we moved this plugin to use another library that's actively maintained and is specifically used for pagerduty.
This looks like a good candidate
Hey, just wondering what the release cycle is like for the new Sensu plugin repos? We want to maintain parity between the repo and the released gems, but we will need the Sensu client override before we roll out Sensu in our org (to replace Nagios).
(This is copied from a merged PR, as I'm not sure whether you are notified of comments on closed items.)
Does this support the PagerDuty v2 API?
I see that you can set a team at the host level and per check, but is there any way to handle the client keepalives? Ideally, I would like to have the keepalive alerts sent to a separate team.
I can't yet replicate this 100% of the time, but I often find that after I trigger an incident and the condition that triggered it clears, handler-pagerduty.rb says it resolved the incident when in fact the incident stays open. I'm running sensu-plugins-pagerduty version 0.0.9, installed as a gem.
An example resolution issued here:
{"timestamp":"2016-02-27T18:23:33.733153+0000","level":"info","message":"handler output","handler":{"type":"pipe","command":"handler-pagerduty.rb","mutator":"pagerduty","name":"pagerduty"},"output":["/var/lib/gems/2.3.0/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:134:in `block in filter_silenced': Object#timeout is deprecated, use Timeout.timeout instead.\n/var/lib/gems/2.3.0/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:134:in `block in filter_silenced': Object#timeout is deprecated, use Timeout.timeout instead.\n/var/lib/gems/2.3.0/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:134:in `block in filter_silenced': Object#timeout is deprecated, use Timeout.timeout instead.\n/var/lib/gems/2.3.0/gems/sensu-plugins-pagerduty-0.0.9/bin/handler-pagerduty.rb:57:in `handle': Object#timeout is deprecated, use Timeout.timeout instead.\npagerduty -- Resolved incident -- routing3-valhalla_dev_us-east/routing3_cpu\n"]}
This left me with the following incident still open in PagerDuty:
Details
Description
routing3-valhalla_dev_us-east/routing3_cpu : CheckCPU TOTAL WARNING: total=79.71 user=24.64 nice=0.0 system=4.93 idle=20.29 iowait=48.12 irq=0.0 s...
Details
timestamp 1456597053
action create
occurrences 10
check
  total_state_change 5
  history ["0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","1"]
  status 1
  output CheckCPU TOTAL WARNING: total=79.71 user=24.64 nice=0.0 system=4.93 idle=20.29 iowait=48.12 irq=0.0 softirq=1.74 steal=0.29 guest=0.0
  duration 2.069
  executed 1456597050
  issued 1456597050
  name routing3_cpu
  refresh 3600
  occurrences 10
  interval 60
  standalone true
  handlers ["default"]
  command check-cpu.rb --sleep 2 -w 70 -c 90
client
  timestamp 1456597052
  version 0.22.0
  keepalive
    handle true
    refresh 1800
    handlers ["default"]
    thresholds
      critical 600
      warning 300
  pd_override
    baseline
      warning routing_low
  pager_team routing_low
  address 192.168.2.94
  name routing3-valhalla_dev_us-east
id 024d99b8-431f-4d68-a09d-fa3918b07180
Any thoughts before I start delving into this more?
There are times when the PD API just doesn't do what it's meant to, for whatever reason. Perhaps your network is flaky, who knows?
Anyway, I think this handler should have an option to retry creating and resolving an incident a configurable number of times. Perhaps we should put it in the JSON config?
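A minimal sketch of what such a retry wrapper might look like, assuming a fixed number of attempts and a fixed pause between them. The helper name with_retries and its attempts and delay parameters are hypothetical and not part of handler-pagerduty.rb; in a real handler the attempt count would presumably be read from the JSON config, and the block would wrap the actual trigger or resolve call. The API failure here is simulated.

```ruby
# Hypothetical helper (not part of handler-pagerduty.rb): run the block,
# retrying on any StandardError until `attempts` tries have been made.
def with_retries(attempts: 3, delay: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    # Give up and re-raise once the attempt budget is exhausted.
    raise e if tries >= attempts
    sleep delay
    retry
  end
end

# Simulated API call that fails on the first two tries and then succeeds.
calls = 0
result = with_retries(attempts: 3) do |try|
  calls += 1
  raise IOError, 'simulated PagerDuty API failure' if try < 3
  :resolved
end

puts "#{result} after #{calls} calls"
```

Re-raising after the last attempt keeps the failure visible in the Sensu handler log instead of silently dropping it.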
It'd be helpful to document that you can add pager_team to checks and not just at the client level. I had to dig into the code to get my pages to route properly on a per-metric basis. I also feel this is a much easier solution to implement than pd_override, especially when defining checks in something like Puppet.
At first I had client-level pd_override settings on a per-metric basis (and also in dedup), and that didn't work properly. Adding pager_team to the check worked, however.
When working with PagerDuty and Sensu, it would be cool to have Sensu automatically pull status from PagerDuty as well, because you don't want to look in different places to schedule downtime, ACK alerts and so on.
So, for example, we might want to create a downtime in Sensu if we find that an alert has been ACK'ed in PagerDuty.
This gem depends on the pagerduty and redphone gems, but only redphone is used. So please consider removing the dependency on pagerduty.
@mattyjones where is config[:json_config] set up? When using sensu-plugins-pagerduty (0.0.1), I get the following error:
bin/handler-pagerduty.rb:34:in `handle': undefined method `[]' for nil:NilClass (NoMethodError)
See #1
We have the pagerduty handler set up in parallel with our email handler. We notice that on checks where "occurrences": 5 is set, email notifications are properly suppressed until the 5th occurrence, but a PagerDuty incident is created immediately upon the first failed check.
Additionally, when silencing hosts within uchiwa in anticipation of a maintenance window, email handler properly suppresses notification during the maintenance window, however pagerduty handler begins spamming the operations team despite checks being silenced within the dashboard.
So it doesn't seem that there's any option available to allow this plugin to work behind a firewall. Unless I'm missing something, this is a feature that needs to be added.
I'm not a ruby programmer, but if you can point me in the right direction, I can probably figure out how to put together a pull request.
Thanks.
--Greg Chavez
Dependabot can't resolve your Ruby dependency files.
As a result, Dependabot couldn't update your dependencies.
The error Dependabot encountered was:
Bundler::VersionConflict with message: Bundler could not find compatible versions for gem "bundler":
In Gemfile:
bundler (~> 1.7)
Current Bundler version:
bundler (2.2.15)
Your bundle requires a different version of Bundler than the one you're running.
Install the necessary version with `gem install bundler:1.17.3` and rerun bundler using `run.rb _1.17.3_`
If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.
I think this is semi by design, but any views appreciated.
We've come across this during testing. Using CPU use as an example;
If we set a warning threshold of 50%, an email gets sent. If it then goes to critical at 70%, the alert in PagerDuty is triggered. The problem for us is that if the CPU drops back below the critical threshold but stays above the warning threshold, no resolution is sent to PagerDuty, so the incident stays live even though the check is no longer in a critical state.
Any thoughts or workarounds for this?
Thanks,
Dave.
I've seen this occur a number of times: when incidents are triggered in bulk (10 or more at a time) and then resolved in Sensu, they are not resolved at PD.
It could be a network issue, perhaps a bulk request problem, maybe even an API bug or limitation at PD. But I thought I'd ask here whether others have seen this happen on their own infrastructure.
I have not started debugging and have not re-created the issue. I've just noticed it a number of times when we have a large outage, and intermittently at other times. v0.23.3
hey there,
The new support for contexts is awesome. Thanks @zroger! I was wondering if you'd consider bumping the version and uploading a new gem with this version. As it is, it is not very easy to install a version of this plugin with support for contexts.
With tests covering both scripts (pending #15) I think this repo will be in stable enough condition to update the meta data to be production ready.
Hi,
We need proxy support in this plugin. Unfortunately, it looks like we cannot implement such a feature until the jaxxstorm/redphone#9 pull request is merged into the redphone library.
Sensu can generate events with action flapping, and the handler should be able to create an incident when events are flapping.
Once upon a time, my configuration worked:
"pagerduty": {
  "command": "handler-pagerduty.rb",
  "filters": [
    "occurrences"
  ],
  "severities": [
    "critical"
  ],
  "type": "pipe"
}
I just upgraded to sensu-plugins-pagerduty 3.0.1 and am seeing this in my Sensu master's log:
"message":"handler does not handle event severity","handler":{"command":"handler-pagerduty.rb"...
'severities' is a generic handler attribute, so I'm wondering why I'm seeing this message in my logs. Assuming the handler no longer respects severities, how would I be able to route only critical events to my pagerduty handler?
I'm setting up the PagerDuty plugin in my new Sensu install and getting this error:
10/4/2016 3:19:54 PM
{
"timestamp": "2016-10-04T22:19:54.234034+0000",
"level": "info",
"message": "handler output",
"handler": {
"type": "pipe",
"command": "\/etc\/sensu\/handlers\/handler-pagerduty.rb",
"name": "pagerduty"
},
"output": [
"\/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:98:in `api_request': api.json settings not found. (RuntimeError)\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:145:in `stash_exists?'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:157:in `block (2 levels) in filter_silenced'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:91:in `block in timeout'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:33:in `block in catch'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:33:in `catch'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:33:in `catch'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/2.3.0\/timeout.rb:106:in `timeout'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:156:in `block in filter_silenced'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:154:in `each'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:154:in `filter_silenced'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:33:in `filter'\n\tfrom \/opt\/sensu\/embedded\/lib\/ruby\/gems\/2.3.0\/gems\/sensu-plugin-1.3.0\/lib\/sensu-handler.rb:56:in `block in <class:Handler>'\n"
]
}
The config I'm using:
{
  "handlers": {
    "pagerduty": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/handler-pagerduty.rb"
    }
  },
  "pagerduty": {
    "api_key": "REDACTED"
  }
}
In bin/handler-pagerduty.rb, a method called resolve is called to resolve an incident:
pagerduty.get_incident([incident_key_prefix, incident_key].compact.join('')).resolve(
  description: [description_prefix, event_summary].compact.join(' '), details: @event)
The resolve method is defined in the pagerduty library like this:
def resolve(description = nil, details = nil)
  modify_incident("resolve", description, details)
end
Notice that resolve is not defined with Ruby keyword parameters, yet it is invoked with keyword-style arguments. As a result, description and details are not treated as two parameters; they are passed together as a single Hash, which binds to the first parameter.
Because of this, PagerDuty incidents are not resolved when the length of the description is greater than 1024. PagerDuty also does not return an error when description is passed as a Hash (it is converted to a JSON object when sent to PagerDuty as the payload).
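The argument binding described above can be reproduced in isolation. This is a small sketch with a stand-in resolve that has the same signature as the one quoted from the library but simply returns its arguments (the real method calls modify_incident); the sample values are made up.

```ruby
# Stand-in with the same signature as the library's resolve; it returns
# its arguments so we can see what each parameter actually received.
def resolve(description = nil, details = nil)
  [description, details]
end

# Keyword-style call, as in the handler: the whole Hash lands in the
# first positional parameter and `details` keeps its default of nil.
as_keywords = resolve(description: 'disk full', details: { 'id' => 1 })

# Positional call: each value lands in its own parameter.
as_positional = resolve('disk full', { 'id' => 1 })

p as_keywords
p as_positional
```

In the keyword-style call, description receives a Hash containing both keys and details stays nil, which matches the behavior reported in this issue. Passing the two values positionally is one way to get them bound as intended.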
As per here, event filtering in sensu-plugin is deprecated. Upgrade sensu-plugin version to 2.0.0
Is there any way to have a check alert multiple teams? I know I can set up 2 alerts, but that seems overkill, especially for some checks. I would rather be able to do
pager_team: "dbas,sysadmins"
or something similar.
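One way such a feature could work is to normalize pager_team into a list before looking up keys. The sketch below is only an illustration of the request, not existing handler behavior: the pager_teams helper, the per-team settings layout and the key values are all assumptions.

```ruby
# Hypothetical helper: accept a comma-separated String, an Array, or nil
# and return a de-duplicated list of team names.
def pager_teams(value)
  Array(value)
    .flat_map { |v| v.to_s.split(',') }
    .map(&:strip)
    .reject(&:empty?)
    .uniq
end

# Assumed settings layout: one API key per team under "pagerduty".
settings = {
  'pagerduty' => {
    'dbas' => { 'api_key' => 'DBA_KEY' },
    'sysadmins' => { 'api_key' => 'SYSADMIN_KEY' }
  }
}

check = { 'pager_team' => 'dbas, sysadmins' }

# One key per team; a handler could then trigger one incident per key.
keys = pager_teams(check['pager_team']).map do |team|
  settings['pagerduty'].fetch(team).fetch('api_key')
end

p keys
```

Using fetch makes a misspelled team name fail loudly with a KeyError rather than silently paging nobody.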
sensu-plugin 2.7 has new logic to help mutators and handlers map Sensu 2 event data into Sensu 1.x event data.
[root@sensuserver handlers]# gem cert --add <(curl -Ls https://raw.githubusercontent.com/sensu-plugins/sensu-plugins-pagerduty/master/certs/sensu-plugins.pem)
Added '/CN=mattjones/DC=yieldbot/DC=com'
[root@sensuserver handlers]# gem install sensu-plugins-pagerduty -P LowSecurity
ERROR: While executing gem ... (Gem::Security::Exception)
certificate /CN=mattjones/DC=yieldbot/DC=com not valid after 2016-01-28 21:02:51 UTC
Currently a client-level pager_team overrides all check-level pager_team settings.
However, it would make more sense (to me at least) to have check-level settings override client-level settings. This lets the client-level setting act as the "default" for all checks while still allowing individual checks to define their own teams. This comes in especially handy when application checks should route to the teams responsible for those applications, while system-level checks typically go to our system engineering team.
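The proposed precedence can be sketched as a small lookup. This illustrates the suggestion rather than the handler's current code; effective_pager_team and the sample event values are hypothetical.

```ruby
# Hypothetical lookup implementing the proposed order: the check's
# pager_team wins, the client's is the fallback, and nil means "no team",
# in which case a handler would fall back to its default API key.
def effective_pager_team(event)
  event.dig('check', 'pager_team') || event.dig('client', 'pager_team')
end

event = {
  'client' => { 'name' => 'web01', 'pager_team' => 'syseng' },
  'check'  => { 'name' => 'app_health', 'pager_team' => 'app_team' }
}

team = effective_pager_team(event)
puts team
```

With this ordering, a check without its own pager_team still inherits the client-level team, so the client setting behaves as a default.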
The alert text in Pagerduty does not change even when the output from the check has changed.
This means alerts can get suddenly worse, without me being aware of it.
Example:
Suppose I have a disk usage check which runs every ten minutes.
The disk is filling up so the check begins alerting (as seen in Uchiwa):
CheckDisk WARNING: / 71%
I get a Pagerduty alert with text:
CheckDisk WARNING: / 71%
Ten minutes later, the disk is filling up really fast so the alert changes (again as seen in Uchiwa):
CheckDisk CRITICAL: / 95%
But the alert in Pagerduty still says:
CheckDisk WARNING: / 71%