criteo / cassandra_exporter

Apache Cassandra® metrics exporter for Prometheus

License: Apache License 2.0

Java 90.17% Shell 3.19% Python 4.17% Dockerfile 1.72% Jinja 0.75%
cassandra prometheus prometheus-exporter

cassandra_exporter's Introduction

Cassandra Exporter


Description

Cassandra Exporter is a standalone application that exports Apache Cassandra® metrics through a Prometheus-friendly endpoint. The project started as a fork of JMX exporter but aims at easier integration with Apache Cassandra®.

Specifically, this project brings:

  • Exporting of EstimatedHistogram metrics specific to Apache Cassandra®
  • Filtering on MBean attributes
  • Metric naming that respects the MBean hierarchy
  • A comprehensive config file

An essential design choice the project makes is to not let Prometheus drive the scraping frequency. Many Apache Cassandra® metrics are expensive to scrape and can hinder the performance of the node, and we don't want that to happen in production, so the scrape frequency is restricted via the Cassandra Exporter configuration.


Design explanation

The project has two focuses: safety and maintainability.

Every time a tradeoff had to be made, the solution that prioritized one of those two points won.

Why not provide the exporter as an agent for Cassandra?
  • Safety: The agent shares the same JVM as Cassandra itself, and I don't want metrics calls to be able to hammer down Cassandra nodes.
  • Safety: If there is a bug/leak in the exporter itself, it should not impact Cassandra.
  • Maintainability: Upgrading the exporter should not require restarting the Cassandra cluster.
Why cache metric results? This is not the Prometheus way.
  • Safety: JMX is a heavyweight RPC mechanism and some Cassandra metrics calls are expensive to scrape (e.g. snapshot sizes) as they trigger heavy operations in Cassandra. Not caching results means you can bring down your nodes just by requesting the metrics page.
Why not make more use of labels and be more Prometheus-like?
  • Maintainability: I want the exporter to support multiple versions of Cassandra (2.2.X/3.X/4.X) without having to hand-tune the metric labels for each version of Cassandra. Metric paths change between Cassandra versions and I want to avoid the hassle of maintaining the mapping.
Why is this exporter slower than jmx_exporter?
  • Maintainability: When your cluster grows in number of nodes, the cardinality of metrics starts to put too much pressure on Prometheus itself. A lot of this cardinality comes from metrics of limited usefulness, such as 999thpercentile and friends. This exporter lets you choose not to export them, which is not possible with jmx_exporter, at the cost of a small runtime penalty to discover them. So this exporter lets you reach a bigger scale before you have to rely on metric aggregation in order to scale further.

Unless you have hundreds of tables, the scrape time will stay below 10 seconds.

Why is the exporter not written in Go?
  • Cassandra metrics are only available through JMX, which in turn is only accessible from Java.

How to use

To start the application

java -jar cassandra_exporter.jar config.yml

The Cassandra exporter needs to run on every Cassandra node to get all the information regarding the whole cluster.

You can have a look at a full configuration file here. The two main parts are:

  1. blacklist
  2. maxScrapFrequencyInSec

In the blacklist block, you specify the metrics you don't want the exporter to scrape. This is important as JMX is an RPC mechanism and you don't want to trigger some of those RPCs. For example, MBean endpoints from org:apache:cassandra:db:.* do not expose any metrics but are used to trigger actions on Cassandra nodes.

In maxScrapFrequencyInSec, you specify which metrics you want scraped and at which frequency. Starting from the set of all MBeans, the blacklist is applied first to filter this set, and then maxScrapFrequencyInSec is applied as a whitelist to filter the resulting set.

As an example, if we take the metrics {a, b, c} as the input set and the config file is

blacklist:
  - a
maxScrapFrequencyInSec:
  50:
    - .*
  3600:
    - b

Cassandra Exporter will have the following behavior:

  1. Metrics matching the blacklist entries will never be scraped; here the metric a won't be available
  2. In reverse order of frequency, the metrics matching maxScrapFrequencyInSec will be scraped
    1. Metric b will be scraped every hour
    2. Remaining metrics will be scraped every 50s, here only c

Resulting in:

Metric    Scrape frequency
a         never
b         every hour
c         every 50 seconds

Once started, the Prometheus endpoint will be available at localhost:listenPort/ or localhost:listenPort/metrics and the metrics format will look like the one below:

cassandra_stats{name="org:apache:cassandra:metrics:table:biggraphite:datapoints_5760p_3600s_aggr:writelatency:50thpercentile",} 35.425000000000004
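
Since the exporter caches results and enforces its own scraping frequency, the Prometheus side only needs a plain scrape job pointed at the exporter on each node. A minimal sketch, assuming the default listenPort of 8080 (the job name, target hostname, and interval are placeholders to adapt to your setup):

scrape_configs:
  - job_name: 'cassandra'
    # Prometheus only reads the cached values here; the real scraping
    # frequency is governed by maxScrapFrequencyInSec in config.yml
    scrape_interval: 30s
    static_configs:
      - targets: ['cassandra-node-1:8080']  # listenAddress:listenPort of the exporter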

How to debug

Run the program with the following options:

java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot

You will get the time it took to scrape each individual MBean, which is useful to understand which metrics are expensive to scrape.

Good sources of information to understand what the MBeans are doing and to build your dashboards are:

  1. https://cassandra.apache.org/doc/latest/operating/metrics.html
  2. https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
  3. http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html
  4. https://www.youtube.com/watch?v=Q9AAR4UQzMk

Config file example

host: localhost:7199
ssl: False
user:
password:
listenAddress: 0.0.0.0
listenPort: 8080
# Regular expression to match environment variables that will be added
# as labels to all data points. The name of the label will be either
# $1 from the regex below, or the entire environment variable name if no match groups are defined
#
# Example:
# additionalLabelsFromEnvvars: "^ADDL_(.*)$"
additionalLabelsFromEnvvars:
blacklist:
   # Inaccessible metrics (not enough privilege)
   - java:lang:memorypool:.*usagethreshold.*

   # Leaf attributes not interesting for us but present in many paths (reduces the cardinality of metrics)
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min

   # Paths present in many metrics but uninteresting
   - .*:viewlockacquiretime:.*
   - .*:viewreadtime:.*
   - .*:cas[a-z]+latency:.*
   - .*:colupdatetimedeltahistogram:.*

   # Mostly for RPC, do not scrape them
   - org:apache:cassandra:db:.*

   # columnfamily is an alias for Table metrics in cassandra 3.x
   # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
   - org:apache:cassandra:metrics:columnfamily:.*

   # Should we export metrics for system keyspaces/tables ?
   - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*

   # Don't scrape us
   - com:criteo:nosql:cassandra:exporter:.*

maxScrapFrequencyInSec:
  50:
    - .*

  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*
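
As an illustration of additionalLabelsFromEnvvars (the variable name and value below are made up for the example): with the regex "^ADDL_(.*)$" from the comment above, an environment variable exported before starting the exporter should end up as an extra label on every data point.

export ADDL_rack=rack1
java -jar cassandra_exporter.jar config.yml
# every exported series should then carry the extra label, roughly:
# cassandra_stats{rack="rack1",name="...",} 42.0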

Docker

You can pull an image directly from Dockerhub:

docker pull criteord/cassandra_exporter:latest

Run Docker in read-only mode (/tmp must be mounted as tmpfs so that sed can rewrite config.yml when the dedicated environment variables are used):

docker run -e CASSANDRA_EXPORTER_CONFIG_host=localhost:7198 --read-only --tmpfs=/tmp criteord/cassandra_exporter:latest
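
If you prefer to manage the configuration file yourself, you can mount it into the container instead; the /etc/cassandra_exporter/config.yml path below matches the location read by the image's run.sh (as seen in the issues further down), but double-check it against the image version you use.

docker run -v $(pwd)/config.yml:/etc/cassandra_exporter/config.yml criteord/cassandra_exporter:latest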

Kubernetes

To get an idea of how to integrate Cassandra Exporter in Kubernetes, you can look at this Helm chart.

Grafana

Dedicated dashboards can be found here

cassandra_exporter's People

Contributors

antonmatsiuk, ashangit, deimosfr, dmitriysafronov, erebe, geobeau, javsalgar, krolser, marianschmotzer, mpereira, part-time-githubber, pgoron, podile, porridge, rjshrjndrn, seleznev, therb1, tsl-karlp, vdemonchy


cassandra_exporter's Issues

Metrics in different gauges and labels such as keyspace and table names

The current implementation is such that all metrics are labels on a single gauge named "cassandra_stats".

I think a better design would be similar to what the JMX Prometheus exporter does: gauges are separated out, and labels such as keyspace name or table name can be configurable.

Any thoughts?

Query issues for "large" clusters

Hello,

We have a ~540-node Cassandra cluster whose nodes export ~1500 metrics each. We're sending over 800k time series in the cassandra_stats metric namespace. This is causing a lot of issues when querying Prometheus since the index gets hit so hard. Recording rules are definitely an option, but we don't always know in advance when something should have a recording rule to perform aggregation.

Is there a workaround for this in the current code base? If not, would you be open to exploring a change with us?

No Pattern Input

rules:

  - pattern: 'org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value: (\d+)'
    name: cassandra_$1_$2
    value: $3
    valueFactor: 0.001
    labels: {}
    help: "Cassandra metric $1 $2"
    type: GAUGE
    attrNameSnakeCase: false

prometheus yaml configuration missing

Do we have a prometheus.yml for this? I am able to see the metrics, but I believe Grafana is only configured on the Prometheus server. How can I access the Prometheus UI in this case?

How can I start the exporter?

to start the application
java -jar cassandra_exporter.jar config.yml

git clone https://github.com/criteo/cassandra_exporter.git
cd cassandra && java -jar cassandra_exporter.jar config.ym
java -jar cassandra_exporter.jar config.ym
Error: Unable to access jarfile cassandra_exporter.jar

find / -name "cassandra_exporter.jar" 2> /devnull
and the result is nothing
How can I start the exporter?

Skip metrics by data type

Hi, Thanks for this awesome project.

It looks like scraping non-numeric metrics is adding time to the whole run.

Debug log:
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=OperatingSystem as it as an unknown type java.lang.String with value Linux

In our case, the whole scraping run took ~10000ms for ~850 metrics, but none of those are expensive scrapes.

If possible, excluding non-numeric metrics from scraping/parsing could improve performance.

Definition for $datacenter

The definition for $datacenter is getting the wrong value for the variable. Could you please let me know the correct values?
Screen Shot 2019-03-14 at 10 29 34 AM

[Help needed] Minimal metrics setup to monitor node health

I'm very new to Cassandra and need to health-check a cluster of 5 Cassandra nodes with ~1000 keyspaces created.

With the default setup, I'm getting a huge number of metrics in my Prometheus, which makes it impossible to query a time range of more than 1 hour before it gets killed by OOM. And all those metrics aren't informative to me.

So, for the sake of future users who face the same problem, I wonder: is there a minimal config for the metrics to scrape, so an average administrator could take a look and see if there is (or is going to be) something wrong with Cassandra?

Thanks in advance for any kind of help and excuse me for my barbarian English.

Can't set configs on docker run

I am trying to set some configs like user and password on my cassandra_exporter; however, every time I try to mount the config in the container, I get the error mentioned below.

I tried to set those configs using -e "VARKey value" but it also didn't work.

As I see there are many people using it, I just assume I'm doing something wrong, but I couldn't figure out how to use it properly.

The config is currently at /tmp/config.yml, but it was in different directories before; I was just moving it around to check permissions.

docker run --privileged --rm -ti -v /tmp/config.yml:/etc/cassandra_exporter/config.yml --name cassandra-exporter criteord/cassandra_exporter

Starting Cassandra exporter
JVM_OPTS: 
CASSANDRA_EXPORTER_CONFIG_user 
sed: cannot rename /etc/cassandra_exporter/sedjzhwca: Device or resource busy

Streaming related metrics are missing

Hi,

I'm using cassandra_exporter-2.2.1 pre-built library for exporting cassandra metrics and found that streaming related metrics are missing.

Are you guys aware of this problem, or am I missing something here?

Thanks.

Is it possible to actually make use of labels?

There is a lot of good stuff here, but I hate that it lumps everything under cassandra_stats with the name label looking like a full Graphite path that is ':'-separated. This stuff should be broken into multiple metrics with multiple labels to really leverage Prometheus. Does the exporter support the JMX exporter's ability to change the exported metric format?

Grafana dashboard

Hi, thanks for the project!

Do you have a Grafana dashboard for Cassandra metrics?

Missing table and keyspace flags in cassandra_stats

Hello there,

I am in the process of adding the Cassandra exporter to my Cassandra cluster to measure the number of tombstones, but unfortunately I am not getting any table and keyspace attributes along with my statistics inside the cassandra_stats object. I do see these stats show up in multiple examples in the documentation. Am I missing something, or is this a bug in the later versions of Apache Cassandra?

Environment details
Cassandra version: 3.11.1
Cassandra exporter version: 2.2.0

Configuration

   host: localhost:7199
    ssl: False
    user:
    password:
    listenAddress: 0.0.0.0
    listenPort: 8080
    blacklist:
       # To profile the duration of jmx call you can start the program with the following options
       # > java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot
       #
       # To get intuition of what is done by cassandra when something is called you can look in cassandra
       # https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
       # Please avoid to scrape frequently those calls that are iterating over all sstables
       # Unaccessible metrics (not enough privilege)
       - java:lang:memorypool:.*usagethreshold.*
       # Leaf attributes not interesting for us but that are presents in many path
       - .*:999thpercentile
       - .*:95thpercentile
       - .*:fifteenminuterate
       - .*:fiveminuterate
       - .*:durationunit
       - .*:rateunit
       - .*:stddev
       - .*:meanrate
       - .*:mean
       - .*:min
       # Path present in many metrics but uninterresting
       - .*:viewlockacquiretime:.*
       - .*:viewreadtime:.*
       - .*:cas[a-z]+latency:.*
       - .*:colupdatetimedeltahistogram:.*
       # Mostly for RPC, do not scrap them
       - org:apache:cassandra:db:.*
       # columnfamily is an alias for Table metrics
       # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
       - org:apache:cassandra:metrics:columnfamily:.*
       # Should we export metrics for system keyspaces/tables ?
       - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*
       # Don't scrap us
       - com:criteo:nosql:cassandra:exporter:.*
    maxScrapFrequencyInSec:
      50:
        - .*
      # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
      3600:
        - .*:snapshotssize:.*
        - .*:estimated.*
        - .*:totaldiskspaceused:.*

exporter fails with 4.0/trunk

I haven't had time to dig into the root of this exception yet, but wanted to bring it up. Running the exporter with 4.0 throws an exception:

java -jar ./build/libs/cassandra_exporter-2.2.1-all.jar config.yml
[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from config.yml
[main] ERROR com.criteo.nosql.cassandra.exporter.Main - Scrapper stopped due to uncaught exception
java.lang.ClassCastException: java.util.ArrayList cannot be cast to [J
	at com.criteo.nosql.cassandra.exporter.JmxScraper.updateMetric(JmxScraper.java:300)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.lambda$run$7(JmxScraper.java:164)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:164)
	at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:36)

Embed Cassandra Exporter into an existing agent

We would like to embed the Cassandra exporter into our existing Cassandra agent process, which does a lot of other things besides capturing metrics. Please let me know if the feature is available; otherwise I will start work on it.

Error while running generate.py

While running generate.py, there is an error showing "Error during export dashboard: cassandra_default" and "Error during export dashboard: cassandra_kubernetes".

Cannot retrieve the datacenter name error

Hi. I'm seeing the following error on 2.2.1 (Cassandra 2.0.11.93):

ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node

Full output:

$ java -jar cassandra_exporter-2.2.1-all.jar config.yml
[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from config.yml
[main] ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node
javax.management.AttributeNotFoundException: No such attribute: HostIdToEndpoint
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:81)
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
	at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:276)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:253)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:162)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:903)
	at com.criteo.nosql.cassandra.exporter.JmxScraper$NodeInfo.getNodeInfo(JmxScraper.java:375)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:155)
	at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:36)

config.yml

host: localhost:10144
ssl: False
listenAddress: 0.0.0.0
listenPort: 9198
blacklist:
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min
maxScrapFrequencyInSec:
  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*

I'm able to query JMX successfully with jmxterm on localhost:10144.

Thanks!

Run with Cassandra in Container?

How is this meant to be run if you already have Cassandra in a Docker container, given that it can't be a javaagent like jmx_exporter?

Nodes up/down status

Hi,
I couldn't find any metric that gives the node status.
The closest is the "up" metric, but this only indicates whether the exporter is up or down.

Question on configuration variable passing

I think it's possibly a bug or a configuration mistake on my side.

I tried setting host to a remote IP address:

$ head -1 /app/config.yml
host: 10.42.10.34:7199

On starting, it still tries to connect to localhost; here are the logs:

$ java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter-1.0.1-all.jar /app/config.yml
[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from /app/config.yml
[main] TRACE com.criteo.nosql.cassandra.exporter.Config - com.criteo.nosql.cassandra.exporter.Config@42dafa95
[main] ERROR com.criteo.nosql.cassandra.exporter.Main - Scrapper stopped due to uncaught exception
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
	java.net.ConnectException: Connection refused (Connection refused)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
	at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
	at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2430)
	at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:308)
	at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:104)
	at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:36)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:211)
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
	at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:148)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)

prefix

Is there any way to add a prefix or the hostname to the metrics?

Metrics for nodes status (up/down)

Hello,

nodetool status reports the status of every node in the cluster (from the current node's point of view), which can help to detect network partitions and other weird issues like this one (the fourth node thinks that all is OK, but the other nodes disagree):

root@cassandra-0:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.48 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
DN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

root@cassandra-1:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.44 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
DN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

root@cassandra-2:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.44 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
DN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

root@cassandra-3:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.44 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
UN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

This information is available via JMX, but the corresponding MBean attribute has the java.util.List type (which is unsupported by the exporter):

# java -jar jmxterm-1.0.0-uber.jar --url localhost:7199
Welcome to JMX terminal. Type "help" for available commands.
$>bean org.apache.cassandra.db:type=StorageService
#bean is set to org.apache.cassandra.db:type=StorageService
$>get LiveNodes
#mbean = org.apache.cassandra.db:type=StorageService:
LiveNodes = ( 10.140.80.4, 10.140.81.4, 10.140.82.4, 10.140.83.2 );

$>get UnreachableNodes
#mbean = org.apache.cassandra.db:type=StorageService:
UnreachableNodes = (  );

Is there a good way to detect issues like this?


Also, I looked at criteo/casspoke, but if I understand it right, it can detect nodes that are unavailable to clients, not internal communication issues. Looks like it can't cover this case. :(

Wrong java path in docker image

I'm afraid the latest release of the docker image does not work:

Starting Cassandra exporter
JVM_OPTS: 
[dumb-init] /usr/bin/java: No such file or directory

Looks like Java is now installed in a different way and location, but run.sh hardcodes the old path:

$ docker run -t -i criteord/cassandra_exporter:2.3.2 grep java /run.sh
/sbin/dumb-init /usr/bin/java ${JVM_OPTS} -jar /opt/cassandra_exporter/cassandra_exporter.jar /etc/cassandra_exporter/config.yml
$ docker run -t -i criteord/cassandra_exporter:2.3.2 which java
/usr/local/openjdk-11/bin/java
$ docker run -t -i criteord/cassandra_exporter:2.3.2 ls -l /usr/bin/java
ls: cannot access '/usr/bin/java': No such file or directory
$ 

Seeing only `cassandra_stats` in output

Hi there!
Thanks for the exporter; the original Prometheus JMX exporter is somewhat unstable in our environment.

Before I start heavy digging, I'd like to ask why I can only see cassandra_stats.
There is lots of stuff to collect, and it seems things like clientrequest and columnfamily are not shown.

Am I missing something obvious here?

Config:

---
host: localhost:7199
ssl: False
listenPort: 4067
blacklist:
   # Unaccessible metrics (not enough privilege)
   - java:lang:memorypool:.*usagethreshold.*
   # Leaf attributes not interesting for us but that are presents in many path (reduce cardinality of metrics)
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min
   # Path present in many metrics but uninterresting
   - .*:viewlockacquiretime:.*
   - .*:viewreadtime:.*
   - .*:cas[a-z]+latency:.*
   - .*:colupdatetimedeltahistogram:.*
   # Mostly for RPC, do not scrap them
   - org:apache:cassandra:db:.*
   # columnfamily is an alias for Table metrics in cassandra 3.x
   # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
   - org:apache:cassandra:metrics:columnfamily:.*
   # Should we export metrics for system keyspaces/tables ?
   - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*
   # Don't scrape us
   - com:criteo:nosql:cassandra:exporter:.*
maxScrapFrequencyInSec:
  50:
    - .*
  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*

JMX is fine

Thanks a lot in advance!

P.S. Sample output | head -n 10:

# TYPE cassandra_stats gauge
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:clientrequest:rangeslice:unavailables:count",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indextable:someenv:newmessages:newmessages_deleted_idx:rangelatency:99thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indextable:someenv:usersessions:usersessions_deleted_idx:writelatency:count",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indextable:someenv:newchatusers:newchatusers_chat_type_unencr_idx:coordinatorreadlatency:99thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="someenv",table="stickers",name="org:apache:cassandra:metrics:table:someenv:stickers:readtotallatency:count",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="someenv",table="drafts",name="org:apache:cassandra:metrics:table:someenv:drafts:readlatency:98thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indexcolumnfamily:someenv:newuserchats:newuserchats_chat_type_unencr_idx:tombstonescannedhistogram:50thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="channels",table="channels",name="org:apache:cassandra:metrics:table:channels:channels:readlatency:50thpercentile",} 0.0

Separated statistics per node

Hi,

thanks for your effort putting this together! Is there a way to get statistics per node in the cluster as well? Sometimes single nodes misbehave, and a dashboard helps to quickly identify the faulty node.

Kind regards,
Christian

Support listen on specific address

There is no way to configure cassandra_exporter to listen on a specific address in the current version.

I use Ansible to control my cluster running on AWS, and I want the exporter to listen on the EC2 instance's secondary private IP address to simplify my Ansible settings.

Thank you for this great project.

Option to scrape only whitelisted metrics

Hi. Thanks a lot for the project!
While trying to optimize the amount of metrics I store and process in Prometheus, I was wondering: is there a way in cassandra-exporter to scrape a list of metrics or metric families that we know we want, instead of blacklisting the rest?
A whitelist is much easier to make and maintain, in my opinion.

Thanks a lot.

Cannot retrieve the datacenter name information for the node

Hi Team,

When I run the command to start the exporter (2.3.4), I get the error below.

[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from config.yml
[main] TRACE com.criteo.nosql.cassandra.exporter.Config - com.criteo.nosql.cassandra.exporter.Config@887af79
[main] ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node
javax.management.AttributeNotFoundException: No such attribute: HostIdToEndpoint
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:81)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:276)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:253)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:162)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:903)
at com.criteo.nosql.cassandra.exporter.JmxScraper$NodeInfo.getNodeInfo(JmxScraper.java:413)
at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:175)
at com.criteo.nosql.cassandra.exporter.Main.start(Main.java:38)
at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:30)

Map Cassandra metrics with non-numeric value into metric name or label

Hi,

I'm particularly interested in the metric:

org.apache.cassandra.metrics.compaction.pendingtasksbytablename

The value is a map like {columnfamily1:number, columnfamily2:number,...}.

Is there a way to tell cassandra-exporter to map this metric into something compatible with prometheus, like using prometheus labels, or changing the metric name?
E.g.:
cassandra_stats{name="org.aparche.cassandra.metrics.compaction.pendingtasksbytablename:<name_of_cf>"}

That way we can store in Prometheus the pending compactions with column-family granularity and detect if a particular CF is suffering particularly from compaction.

Thanks, regards,
Miguel

entrypoint override

Greetings,

I'm having a small implementation issue with this exporter.

I have created my own Dockerfile based on the "criteord/cassandra_exporter" image. The Dockerfile overwrites the entrypoint of the exporter image with a script that puts the exporter on hold and waits until Cassandra is up and its port is listening. After the DB is up, the exporter's CMD is launched via the script. This was the procedure that worked with other DBs whose images had a simple /bin/exporter entrypoint and/or a connection string (to connect to the DB).

I couldn't make it work with this exporter and would like to know if you could offer some suggestions on how to implement it via the method above, or other methods that work.

Thank you in advance.

Cant retrieve Datacenter name ERROR

Hi,

When I am trying to run the exporter (v2.0.3) I receive:
ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node
javax.management.AttributeNotFoundException: No such attribute: Datacenter

In the config.yml file we tried to blacklist org:apache:cassandra:db:.* but it didn't help.

It seems that the error is related to the code at JmxScraper.java:373. We tried an older version (v1.0.1), which does not parse the datacenter name, and it worked fine.

We are using Cassandra 3.0.6,

Cheers

It does not look like you can have multiple metrics in a query for arithmetic evaluation?

This seems to be due to the single-gauge design: cassandra_stats. Any way to make this work?
Example with calculating percent:
this one works (used/used):
(cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:used"} / cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:used"}) * 100
this one does not (used/max):
(cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:used"} / cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:max"}) * 100

MBeanInfo parsing errors: wrong metrics type detection for C* 2.2.8

Hello,

there are numerous problems with the MBeanInfo parser for Cassandra 2.2.8.
Both the exporter and Cassandra runtime logs are attached.

Could you confirm this behaviour with the 2.x branch?
Thanks!

[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=MemoryPool,name=Compressed Class Space as it as an unknown type java.lang.String with value NON_HEAP
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=MemoryPool,name=Compressed Class Space as it as an unknown type javax.management.ObjectName with value java.lang:type=MemoryPool,name=Compressed Class Space
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse org.apache.cassandra.metrics:type=Cache,scope=RowCache,name=HitRate as it as an unknown type java.lang.Object with value NaN
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=MemoryManager,name=Metaspace Manager as it as an unknown type [Ljava.lang.String; with value [Metaspace, Compressed Class Space]

cassandra.system.log

cassandra_exporter.debug.log

I can't build the project

Hi,
I downloaded this project and tried to build it with Gradle, like
"gradle build"

But there are some problems; the error shows as:

Could not resolve all artifacts for configuration ':classpath'.
Could not download shadow.jar (com.github.jengelman.gradle.plugins:shadow:2.0.1)
> Could not get resource 'https://jcenter.bintray.com/com/github/jengelman/gradle/plugins/shadow/2.0.1/shadow-2.0.1.jar'.
> Could not GET 'https://jcenter.bintray.com/com/github/jengelman/gradle/plugins/shadow/2.0.1/shadow-2.0.1.jar'.
> Connect to d29vzk4ow07wi7.cloudfront.net:443 [d29vzk4ow07wi7.cloudfront.net/52.222.217.47, d29vzk4ow07wi7.cloudfront.net/52.222.217.83, d29vzk4ow07wi7.cloudfront.net/52.222.217.198, d29vzk4ow07wi7.cloudfront.net/52.222.217.210] failed: Read timed out

It seems shadow-2.0.1.jar needs some dependencies, but I can download this jar directly using the URL "https://jcenter.bintray.com/com/github/jengelman/gradle/plugins/shadow/2.0.1/shadow-2.0.1.jar".

So I don't know how to fix this. Can you tell me how to build it successfully?

Thanks a lot.

Prometheus way to filtering metrics

Hi Team,

I am able to run cassandra_exporter and to see the metrics in the UI. Usually we write rules in an exporter.yml to filter out metrics; how can we pass a rules yml file here to do the filtering?

Or can we directly import the metrics into Grafana and filter them there? We have a large number of Cassandra servers, so I'm checking on this.

Question about Datacenter metrics aggregation

Hi, I'm fairly new to the exporter/Prometheus world, looking for an alternative to OpsCenter.
In OpsCenter we can get metrics per IP, or per datacenter, consolidating all the IPs of that DC.
Is something like that possible with this exporter?
