elarm's Introduction

Elarm

Elarm is an Alarm Manager for Erlang. It is designed to be easy to include in an Erlang based system. Most functions are implemented through plugins so it is easy to change the behaviour if necessary.

Quick Start

Elarm is designed to be part of an Erlang system.

But it can also be started quickly from an Erlang shell:

$ git clone git@github.com:esl/elarm.git
[...]
$ cd elarm
$ make
[...]
$ erl -pa ebin
> application:start(elarm).
ok
> elarm:raise(partition_full, "/dev/hda2", [{level,90}]).
ok
> elarm:get_alarms().
{ok,[{alarm,partition_full,undefined,"/dev/hda2",
            {{2014,5,12},{10,46,45}},
            {1399,891605,536270},
            indeterminate,<<>>,<<>>,<<>>,
            [{level,90}],
            [],[],undefined,undefined,new,undefined}]}
> elarm:clear(partition_full, "/dev/hda2").
ok
> elarm:get_alarms().
{ok,[]}
>

Alarms vs Events

Alarms and Events are often mixed up, but there are some important differences.

An Event is stateless. It just says "Something happened", e.g. "failed to open file" or "invoice created", and that is all. Logging tools like lager do event logging.

An Alarm, on the other hand, has state. It is an indicator that something is wrong in the system. Once an alarm is raised, it remains active until the error condition connected to the alarm no longer applies. When the error condition is removed, the alarm is cleared.

Functions

Alarm List

Elarm keeps a list of all currently active alarms. It is possible for a Manager application to subscribe to all changes in the alarm list.

A user can acknowledge an alarm, that is, tell Elarm that they are aware of it. It is also possible to add comments to an alarm. Finally, it is possible to clear an alarm manually. Normally an alarm is cleared by the application that raised it, but in some cases that does not happen, e.g. if the alarms are received as SNMP traps from another system.

Alarm Log

All changes to an alarm are logged in an alarm log.

Alarm Configuration

An application raising an alarm should not need to know much about alarm handling. When it raises an alarm, it only has to supply the name of the alarm, the identity of the entity the alarm applies to, and optionally some additional information. For example, the name could be 'partition_full', the entity "/dev/hda1", and the additional information "90%".

An operator handling alarms benefits from some additional information: "severity" (how serious is the alarm?), "probable_cause" (what is the likely reason for the alarm?), and "proposed_repair_action" (how can the problem be fixed?). In Elarm, this information can be attached to alarms by adding configuration data for an alarm. If no configuration data is found when an alarm is raised, default data is used. The default data is defined in the elarm.app file and can be overridden by data in sys.config.

To show which alarms need configuration data, Elarm records every alarm that has been raised without a configuration. The list of alarms that are missing configuration can be queried using elarm:get_unconfigured/0,1. Alarm configuration can be added via elarm:add_configuration/2.

Using

Starting

When Elarm is started, it starts one instance of the alarm manager, named elarm_server. It is possible to run several alarm managers at once: if the system configuration file sets an application environment variable named servers to a list of {ServerName, Opts} tuples, Elarm starts one alarm manager for each tuple. A new alarm manager can also be started manually using elarm:start_server(Name, Opts).
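
As a rough sketch of the servers setting described above, a sys.config entry might look like the following. The server name site_alarms is hypothetical, and the empty option lists are an assumption standing in for per-server options, which this section does not describe.

```erlang
%% sys.config (sketch): ask Elarm to start two alarm managers.
%% `site_alarms` is a hypothetical server name.
%% The empty lists are placeholders for Opts (assumed, not documented here).
[
 {elarm, [
   {servers, [
     {elarm_server, []},
     {site_alarms, []}
   ]}
 ]}
].
```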

Application

An application that wants to raise an alarm only has to call

ok = elarm:raise(partition_full, "/dev/hda2", [{level,90}])

and to clear it

ok = elarm:clear(partition_full, "/dev/hda2")

So Name and Entity uniquely identify an alarm.
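
To make the role of the {Name, Entity} pair concrete, here is a small illustrative model. It is not Elarm's implementation; the module and function names are made up for this sketch. Active alarms are kept in a map keyed by {Name, Entity}, so raising the same pair twice still yields one alarm, and clearing removes it.

```erlang
-module(alarm_key_sketch).
-export([new/0, raise/4, clear/3, active/1]).

%% Illustrative model only, not Elarm's implementation.
%% Active alarms are stored in a map keyed by {Name, Entity}.

%% Start with no active alarms.
new() -> #{}.

%% Raising an alarm that is already active leaves the stored entry as it is.
raise(Name, Entity, AddInfo, Alarms) ->
    Key = {Name, Entity},
    case maps:is_key(Key, Alarms) of
        true  -> Alarms;
        false -> maps:put(Key, AddInfo, Alarms)
    end.

%% Clearing removes the entry; clearing an unknown alarm is a no-op.
clear(Name, Entity, Alarms) ->
    maps:remove({Name, Entity}, Alarms).

%% Sorted list of the {Name, Entity} keys that are currently active.
active(Alarms) -> lists:sort(maps:keys(Alarms)).
```

In this model the same alarm name on two different entities gives two independent alarms, while a repeated raise for the same pair changes nothing.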

Manager

Get all alarms

A Management application can request a list of all currently active alarms by calling elarm:get_alarms/1.

Subscriptions

Alarm List

To subscribe to all alarm events, use elarm:subscribe(Server, Filter). This returns {Ref, AlarmList}, where Ref is a reference that is included in all received messages and is also used to cancel the subscription. AlarmList is a list of all currently active alarms that match the filter. A message is received for every change to an alarm that matches the filter.

The filter is a list of filter elements; the filter matches if any one element matches:

FilterElement = all | {type, alarm_type()} | {src, alarm_src()}
  • all, matches all alarms
  • {type, Type}, matches alarms with alarm_type == Type
  • {src, Src}, matches alarms with alarm_src == Src

The type and src filter elements may appear several times. In that case each one is tried, and if any one matches then the filter matches.
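
The matching rule can be modelled in a few lines. This is an illustrative sketch, not Elarm's own code: the module name is hypothetical, and alarms are represented as maps with type and src keys purely for the example.

```erlang
-module(filter_sketch).
-export([matches/2]).

%% Illustrative model of the filter rule, not Elarm's own code.
%% An alarm is represented here as a map with `type` and `src` keys
%% (an assumption made for this sketch).

%% A filter matches if at least one of its elements matches.
matches(Filter, Alarm) ->
    lists:any(fun(Element) -> element_matches(Element, Alarm) end, Filter).

%% `all` matches every alarm; `type` and `src` compare one field each.
element_matches(all, _Alarm) -> true;
element_matches({type, Type}, #{type := T}) -> T == Type;
element_matches({src, Src}, #{src := S}) -> S == Src;
element_matches(_Element, _Alarm) -> false.
```

With this model, a filter such as [{type, power}, {src, "/dev/hda2"}] matches an alarm whose src is "/dev/hda2" even if its type differs, because one matching element is enough.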

The message formats are:

  • new alarm:

     {elarm, Ref, alarm()}
  • acknowledged alarm:

     {elarm, Ref, {ack, alarm_id(), alarm_src(), event_id(), ack_info()}}
  • unacknowledged alarm:

     {elarm, Ref, {unack, alarm_id(), alarm_src(), event_id(), ack_info()}}
  • cleared alarm:

     {elarm, Ref, {clear, alarm_id(), alarm_src(), event_id(), Reason :: ok | source_gone | term()}}
  • manually cleared alarm:

     {elarm, Ref, {manual_clear, alarm_id(), alarm_src(), event_id(), {manual, user_id()}}}
  • comment added:

     {elarm, Ref, {add_comment, alarm_id(), alarm_src(), event_id(), comment()}}
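
A subscriber typically pattern-matches on these message shapes. The sketch below classifies a received message using only the formats listed above; the module name, function name and returned atoms are made up for illustration. It assumes an alarm() value is a record tuple tagged alarm, as in the shell output in the Quick Start.

```erlang
-module(elarm_msg_sketch).
-export([classify/2]).

%% Classify a subscription message by the shapes listed above.
%% Ref is the reference returned when the subscription was created.
%% Returns {Kind, Payload}, or `ignored` for anything else.
classify(Ref, {elarm, Ref, {ack, AlarmId, Src, _EventId, _AckInfo}}) ->
    {acknowledged, {AlarmId, Src}};
classify(Ref, {elarm, Ref, {unack, AlarmId, Src, _EventId, _AckInfo}}) ->
    {unacknowledged, {AlarmId, Src}};
classify(Ref, {elarm, Ref, {clear, AlarmId, Src, _EventId, _Reason}}) ->
    {cleared, {AlarmId, Src}};
classify(Ref, {elarm, Ref, {manual_clear, AlarmId, Src, _EventId, _Who}}) ->
    {manually_cleared, {AlarmId, Src}};
classify(Ref, {elarm, Ref, {add_comment, AlarmId, Src, _EventId, _Comment}}) ->
    {comment_added, {AlarmId, Src}};
%% Assumption: a new alarm arrives as a record tuple tagged `alarm`.
classify(Ref, {elarm, Ref, Alarm})
  when is_tuple(Alarm), element(1, Alarm) =:= alarm ->
    {new_alarm, Alarm};
%% Messages with a different Ref, or of any other shape, are not ours.
classify(_Ref, _Other) ->
    ignored.
```

A subscriber process could call classify/2 from its receive loop or handle_info callback, passing the Ref it got from the subscription so that messages from other subscriptions are ignored.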

Alarm Summary

Alarm Summary gives a summary of the presence or absence of unacknowledged and acknowledged alarms of the various severities. This is useful, for example, for showing status on maps or other overview user interfaces.

To start a subscription use elarm:subscribe_summary(Server, Filter).

Filter is the same as for alarm list subscriptions. Every time the summary changes a message will be received.

The format of the messages is:

{elarm, Ref, #alarm_summary{}}

Acknowledge Alarms

An alarm has two states. Initially it is new, which shows the user that the alarm has not been seen before. Once the user has seen the alarm, they can acknowledge it using elarm:acknowledge/3. This changes the alarm state to acknowledged, and a timestamp and the UserId are added to the alarm.

Comment Alarms

It is possible to add comments to alarms, e.g. to add notes of troubleshooting or corrective actions taken. Each comment will be stored with a timestamp and the UserId of the user.

Clear Alarm

Normally alarms are cleared automatically by the application calling elarm:clear, but in some cases it may be necessary to clear an alarm manually. This can be done using elarm:manual_clear/3.

Plugins

elarm's People

Contributors

dszoboszlay, hcs42, jonasrichard, lastres, luos, nygge, olikasg, surik, zsoltdudas-esl

elarm's Issues

Remove dependency on gproc

Why does this application need gproc? Why not let users choose how they want to register stuff? Couldn't a simple local name be given to the elarm_summary_sup process?

Retrieving Alarm Log

I am unable to get the alarm log using elarm:read_log/1 with the filter 'all'.

Can anyone help me with an example of retrieving the alarm log?

Receiving clear events while using filters

I am unable to receive clear events when using the filters {type, Type} or {src, Src}; I only receive the alarm raise info.
With these filters, Elarm sends the raise info to the subscribed process, but it does not send the clear info.
When I use the filter "all", Elarm sends both raise and clear notifications.

I have attached source code below.
subscribed_process.txt
monitor_process.txt

Can anyone suggest how to receive both alarm and clear events when using the "type" or "src" filter?

Hex Package

Make sure the repo is rebar3-compatible and its deps are also hex.pm packages, then publish it on hex.pm.

How often should clear be called? In after

Is this the pattern to use? Always call clear if things went well, regardless of whether there was a raise before?

try
    %% ... code working against /dev/hda2 ...
    ok = elarm:clear(partition_full, "/dev/hda2")
catch
    _:_ ->
        ok = elarm:raise(partition_full, "/dev/hda2", [{level,90}])
end

elarm functions could return whether the alarm already existed

Richard and I had an idea: elarm:raise and elarm:clear could return whether they actually raised/cleared an alarm or did nothing because the alarm already existed/did not exist.

This way we could write code like this, which logs the problems only once:

case elarm:raise(...) of
    ok ->
        lager:error(...);
    existing ->
        ok
end.

@nygge, what do you think?

Update additional info when reraising an alarm

When raising an alarm twice, the additional info is not updated.

% There is not much space left on hda2
> ok = elarm:raise(partition_full, "/dev/hda2", [{level,90}])

% There is even less space left on hda2
> ok = elarm:raise(partition_full, "/dev/hda2", [{level,92}])

Afterwards, I would expect level to be 92, but it is 90:

> elarm:get_alarms().
{ok,[{alarm,partition_full,undefined,"/dev/hda2",
            {{2014,6,20},{9,53,59}},
            {1403,258039,196826},
            indeterminate,<<>>,<<>>,<<>>,
            [{level,90}],
            [],[],undefined,undefined,new,undefined}]}

I suggest changing elarm so that it will update the additional info with each raise.

The question arises whether we need to send notifications about the additional info being changed. In the longer term, probably yes, but I suggest not implementing that yet, because another missing Elarm feature is changing the severity of an existing alarm, and these may affect each other.

Use of Thresholds

How is the threshold field in the alarm configuration used? Can you provide a use case?

Thanks in advance.
