shamil / graphout

Graphout lets you query Graphite or Prometheus, then forward the results to different external services

Home Page: http://shamil.github.io/graphout

License: MIT License

Topics: graphite, prometheus, statuspage, cloudwatch, zabbix, metrics

graphout's Introduction


What is Graphout

Graphout lets you query Graphite or Prometheus, then forward the results to external services such as Zabbix or StatusPage.io custom metrics.

The project is considered BETA, although everything should work. Submit issues and suggestions; pull requests are always welcome.

Why?

Graphite collects metrics, which is very cool, but how can you make use of those metrics beyond visualizing them? What if you have a central monitoring system like Zabbix that is responsible for sending alerts, and you want to alert based on Graphite data? Or what if you want to do AWS Auto Scaling based on Graphite data? How do you get that data into CloudWatch? I'm sure you have your own reasons to get data out of Graphite and into some external tool or service.

So I decided I needed something that could answer the questions above. That is how Graphout started. 💪

Features

  • Accepts any query supported by the Graphite render API
  • HTTPS and HTTP basic authentication
  • Average, maximum and minimum calculation (per output)
  • Query filtering (per output)
  • Log, Zabbix, CloudWatch and StatusPage.io outputs
  • New output modules are easy to write
  • New: support for Prometheus as a query source
  • Docker image available on Docker Hub

Future work

  • Allow setting the interval per query
  • Write unit tests (help needed)
  • Create Upstart and systemd service scripts
  • Nice to have: a Puppet module

Quick start guide

Install

# npm install --global graphout

Usage

# graphout --help
usage: graphout --config <config-path> --pid <pid-path> [-v]

Run

  1. Download the example configuration and save it to /etc/graphout/graphout.json
  2. Adjust the configuration to match your Graphite settings
  3. Make sure the example query works in your environment; if not, change it
  4. Now you can run Graphout:

graphout --pid /tmp/graphout.pid --config /etc/graphout/graphout.json

Result

If all is well, you should see data being written to a log file (/tmp/logoutput.log) by the logoutput module. If not, try setting log_level to debug in the configuration, or post your issue(s) and I'll try to help you get started.

Configuration

Configuration is one or more typical JSON files, with one addition: comments are allowed. You can also include configuration files from the master config; see the include option. The configuration files are validated using a JSON schema, and invalid configuration properties will cause Graphout to exit immediately. Read the schema for the accepted configuration format.

Query engines

Starting from Graphout version 0.4.0, there is support for query engines, which allow using a query source other than Graphite. Currently, the prometheus query engine is supported in addition to graphite.

Graphout allows a single query engine per configuration, which means you can't use graphite and prometheus together; you must specify which query engine you want to use.
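For example, switching the engine could look like this. This is a minimal sketch: the Prometheus URL is a placeholder, and the query name is illustrative.

```json
{
    "query_engine": "prometheus",
    "prometheus_url": "http://prometheus.example.com:9090",

    "queries":
    {
        "node_cpu.5m.avg":
        {
            "query": "sum(irate(node_cpu{mode!='idle'}[5m])) * 100"
        }
    },

    "outputs":
    {
        "logfile":
        {
            "output": "./logoutput",
            "params": { "path": "/tmp/logoutput.log" }
        }
    }
}
```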

Minimal configuration

Note: by default, Graphout uses the graphite query engine.

  • graphite_url is mandatory
  • at least one query must be configured
  • at least one output must be configured

Example

{
    "graphite_url": "http://graphite.example.com:8080",

    "queries":
    {
        "go-carbon.updateOperations":
        {
            "query": "sumSeries(carbon.agents.*.persister.updateOperations)",
            "from": "-1min",
            "until": "now"
        }
    },

    "outputs":
    {
        "logfile":
        {
            "output": "./logoutput",
            "params": {
                "path": "/tmp/logoutput.log"
            }
        }
    }
}

Available configuration options

query_engine

Which query engine to use when executing queries; one of graphite or prometheus. Default is graphite.

graphite_url/prometheus_url

URL of the graphite-web or Prometheus API server. The option must conform to the URI format.

graphite_auth/prometheus_auth

HTTP basic authentication credentials in <username>:<password> format. Optional.

interval

Query interval in seconds. Default is 60 seconds.

log_file

Full path to the log file, default is /var/log/graphout/graphout.log. Set this to /dev/stdout to print to console.

log_level

Minimal log level that will be printed, default is info. Available levels are: error, warn, info and debug.

splay

Delays each query by a consistent random number of seconds. If enabled, the delay is between 1 second and the query interval. Default is false.

include

The include option is a list of configuration files to load. The files are loaded and merged in the specified order. Each include element can contain glob-based wildcards.

Example:

"include": ["/etc/graphout/conf.d/*.json", "/etc/graphout/example.json"]

queries

Query objects, for graphite or prometheus.

For graphite, the format is:

// Alphanumeric unique query name, with dots and hyphens allowed.
"go-carbon.updateOperations":
{
    // the graphite target
    "query": "sumSeries(carbon.agents.*.persister.updateOperations)",

    // relative or absolute time period
    "from": "-1min",

    // relative or absolute time period
    "until": "now",
}

For more information about the query (target), from and until options, read the Graphite Render URL API manual.

Note that Graphout uses the maxDataPoints API option to return at most 60 consolidated data points. The maxDataPoints option has been available since Graphite 0.9.13, so it's best to run a recent version of graphite-web.
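Combining the query options above, the render API request presumably looks roughly like the URL built below. This is a sketch only: the host is the example one from the minimal configuration, and the exact parameter handling in Graphout may differ.

```javascript
// Sketch of the render API URL implied by a query's options.
// URLSearchParams is a Node.js global (Node 10+).
const params = new URLSearchParams({
    target: 'sumSeries(carbon.agents.*.persister.updateOperations)',
    from: '-1min',
    until: 'now',
    format: 'json',
    maxDataPoints: '60'   // cap on consolidated data points
});
const url = `http://graphite.example.com:8080/render?${params}`;
```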

For prometheus, the format is:

// Alphanumeric unique query name, with dots and hyphens allowed.
"prometheus_cpu.5m.avg":
{
    // the prometheus instant-query
    "query": "sum(irate(node_cpu{role='prometheus', mode!='idle'}[5m])) * 100",

    // time=<rfc3339 | unix_timestamp>: evaluation timestamp, optional.
    "time": ""
}

For more information about the query (instant-query) and time options, read the Prometheus HTTP API manual. Currently, Graphout supports only vector result types; open a feature request if you need the matrix type as well.
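For reference, a vector result from the Prometheus instant-query API has this general shape (a trimmed sample; the labels, timestamp and value are illustrative):

```json
{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": { "instance": "localhost:9100" },
                "value": [ 1539620504.123, "42.5" ]
            }
        ]
    }
}
```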

outputs

Output objects. The format is:

// Alphanumeric unique output name
"logfile":
{
// output module name, Graphout will use "require" to load the module
    "output": "./logoutput",

    // filter can be used to process only matched queries (using regular expression)
    // default: all queries are processed by the outputs.
    "filter": ".*",

    // the calculation method of the values received from query_engine
    // available methods: "avg", "min", "max"
    // default: "avg"
    "calculation": "avg"

    // "params" properties are specific to the "output" module
    "params": {
        "path": "/tmp/logoutput.log"
    }
}
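The three calculation methods behave as their names suggest; the sketch below illustrates them (assuming, per the values event described further down, that nulls have already been omitted before calculation):

```javascript
// Illustration of the "avg", "min" and "max" calculation methods.
const values = [10, 20, 30];

const calc = {
    avg: v => v.reduce((a, b) => a + b, 0) / v.length,
    min: v => Math.min(...v),
    max: v => Math.max(...v)
};
```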

Outputs configuration

Each output is a Node.js module. The only exception is the built-in logoutput output, which is part of this project. The other currently available outputs, CloudWatch, StatusPage.io and Zabbix, are separate packages. Those outputs are dependencies of this project, so they're installed automatically when you install Graphout.

logoutput

The only param for this output is path, the log file where all query results will be written.

Documentation of supported outputs:

Custom outputs

Custom outputs are easy to write: you write a function that accepts three arguments, listen for incoming events inside it, and process them as you like. Take a look at the logoutput output as an example.

Function arguments

  • events (EventEmitter): where all the events are sent.
  • logger: the logger you can send your logs to.
  • params: the output parameters, i.e. everything passed to the output module (see output params above).

Available events

raw

The very first event; it includes exactly the same data as was retrieved from the query_engine, as a JavaScript object. Two arguments are passed to the event: first the raw data, second the query options object.

values

The values array of the query, before any calculation has been applied (nulls are omitted). Two arguments are passed to the event: first the values array, second the query options object.

result

The calculated result after applying avg, min or max, depending on what was requested in the query options. Two arguments are passed to the event: first the result value, second the query options object.

Internal architecture

diagram

License

Licensed under the MIT License. See the LICENSE file for details.

graphout's People

Contributors

dependabot-preview[bot], dependabot[bot], shamil


graphout's Issues

error with prometheus api

I get an error while trying graphout with prometheus v2.

2018-10-15 18:21:44 severity="info" interval="60000"
2018-10-15 18:21:44 severity="info" log_level="debug"
2018-10-15 18:21:44 severity="info" query_engine="prometheus"
2018-10-15 18:21:44 severity="info" splay="false"
2018-10-15 18:21:44 message="loading query engine" severity="info" engine="prometheus"
2018-10-15 18:21:44 message="loading output" severity="info" output="logfile" module="graphout-output-statuspage-io"
2018-10-15 18:21:44 message="executing query" severity="debug" query="prometheus_cpu.5m.avg" request="{'_pd':null,'protocol':'https:','hostname':XXX','port':null,'path':'/api/v1/query?_=123456&query=up&time=rfc3339','method':'GET','headers':{'Accept':'application/json, text/javascript'}}"
2018-10-15 18:21:44 message="query failed" severity="error" query="prometheus_cpu.5m.avg" error="bad HTTP status (400)"

A simple curl against Prometheus retrieves the metrics without the _=124356 parameter. What is the purpose of this specific parameter?

CloudWatch

Hi

Thanks for this useful tool, which seems to work well. I have tested reading from Prometheus and writing to CloudWatch, and the basic functionality works.

I have two small issues though, that I would like to report.

  1. Decimals are rounded to the nearest value. I use this query to get the memory pressure of my Kubernetes cluster, and would like to get the result into CloudWatch with one or two decimals. A configurable rounding option would be nice:
    // queries section
    "queries":
    {
		"k8sprod2.WorkerMemoryUtilization":
		{
			// the prometheus instant-query
			"query": "sum(container_memory_working_set_bytes{id='/',kube_aws_coreos_com_role='worker'}) / sum(machine_memory_bytes{kube_aws_coreos_com_role='worker'}) * 100",

			// time=<rfc3339 | unix_timestamp>: evaluation timestamp, optional.
			"time": ""
		}
    },
  2. There is no dimension assigned to the metric

graphite metric values passed by graphout are different from those provided by graphite

It seems that graphout logs a value that differs from the one returned by graphite.

From Graphite:
[{"target": "netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy", "datapoints": [[25.33446831696655, 1461842400], [25.7734740653523, 1461842520], [24.3570468278944, 1461842640], [23.711759390607803, 1461842760], [20.9118678938947, 1461842880], [19.24948126962095, 1461843000], [17.3139677711872, 1461843120], [16.9410851940636, 1461843240], [18.42542053910585, 1461843360], [17.364125610583102, 1461843480], [17.9149330001911, 1461843600], [20.77903744127045, 1461843720], [24.8942426859944, 1461843840], [24.571219714625748, 1461843960], [24.243233581547052, 1461844080], [25.46816505205415, 1461844200], [39.16704786828345, 1461844320], [39.40435667720285, 1461844440], [36.336761017686854, 1461844560], [22.27124974191695, 1461844680], [18.248599238215952, 1461844800], [16.91937757664115, 1461844920], [17.973630250759697, 1461845040], [18.342401167242002, 1461845160], [18.70834590831095, 1461845280], [18.0758846361824, 1461845400], [18.6187341950368, 1461845520], [31.877802993933052, 1461845640], [28.141148342675102, 1461845760], [32.63457091455205, 1461845880], [24.5021316754224, 1461846000], [17.8243685593147, 1461846120], [16.870662293467298, 1461846240], [16.56807217429585, 1461846360], [18.128912716322553, 1461846480], [17.182529652129297, 1461846600], [19.72172020432835, 1461846720], [18.298302881740298, 1461846840], [25.06567155973265, 1461846960], [37.972886343796546, 1461847080], [36.64555787314655, 1461847200], [23.7312807510069, 1461847320], [22.4088348458965, 1461847440], [23.09373883044265, 1461847560], [16.236113348037648, 1461847680], [16.26398928346125, 1461847800], [17.06447199255425, 1461847920], [16.961864672827247, 1461848040], [19.62282370327315, 1461848160], [20.3817087678584, 1461848280]]}]

From /tmp/logoutput.log:

2016-04-28 20:17:47 severity="info" result="25.5372" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:18:47 severity="info" result="25.5465" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:19:47 severity="info" result="25.5524" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:20:47 severity="info" result="25.5496" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:21:47 severity="info" result="25.5442" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:22:47 severity="info" result="25.5406" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:23:47 severity="info" result="25.5363" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:24:48 severity="info" result="25.5319" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:25:48 severity="info" result="25.5261" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:26:48 severity="info" result="25.5212" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:27:48 severity="info" result="25.519" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:28:48 severity="info" result="25.5176" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:29:48 severity="info" result="25.5154" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:30:48 severity="info" result="25.5133" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:31:48 severity="info" result="25.5123" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:32:48 severity="info" result="25.5123" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:33:48 severity="info" result="25.5112" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:34:48 severity="info" result="25.5098" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:35:48 severity="info" result="25.5085" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:36:48 severity="info" result="25.5146" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:37:48 severity="info" result="25.5271" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:38:43 severity="info" result="25.5378" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:43:47 severity="info" result="25.5603" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:45:53 severity="info" result="25.5607" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-1min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:46:18 severity="info" result="25.5659" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:47:18 severity="info" result="25.5654" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:48:18 severity="info" result="25.5656" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:49:18 severity="info" result="25.5641" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:50:06 severity="info" result="25.5592" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:51:06 severity="info" result="25.5538" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:52:06 severity="info" result="25.5468" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:53:06 severity="info" result="25.5413" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:54:32 severity="info" result="25.5352" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"
2016-04-28 20:55:42 severity="info" result="25.5266" query="netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy" from="-10min" until="now" name="sdt-cdot1-01.avg_processor_busy"

my "/etc/graphout/graphout.json" file:

/**
 * example Graphout configuration file
 */
{
    // graphite-web options
    "graphite_url": "http://sdt-graphite.nltestlab.hq.netapp.com:81",

    // log file options
    "log_file": "/dev/stdout",
    "log_level": "debug",

    // query interval (in seconds)
    "interval": 60,

    // delay each query by consistent random of seconds
    // if enabled, delay between 1 second and the query interval
    "splay": false,

    // queries section
    "queries":
    {
        "sdt-cdot1-01.avg_processor_busy":
        {
            "query": "netapp.perf.nl.sdt-cdot1.node.sdt-cdot1-01.processor.avg_processor_busy",
            "from": "-1min",
            "until": "now"
        }
    },

    // outputs section
    "outputs":
    {
        "logfile":
        {
            "output": "./logoutput",
            "params": {
                "path": "/tmp/logoutput.log"
            }
        },
        "zabbix":
        {
                "output": "graphout-output-zabbix",
                "params":
                {
                        "host": "localhost",
                        "port": 10051,
                        "target": "monitor",
                        "namespace": "graphout"
                }
        }
    }
}

Am I doing something wrong here or is there a bug?
