criteo / cassandra_exporter

Apache Cassandra® metrics exporter for Prometheus

License: Apache License 2.0

Java 90.17% Shell 3.19% Python 4.17% Dockerfile 1.72% Jinja 0.75%
cassandra prometheus prometheus-exporter

cassandra_exporter's People

Contributors

antonmatsiuk, ashangit, deimosfr, dmitriysafronov, erebe, geobeau, javsalgar, krolser, marianschmotzer, mpereira, pankajmt, pgoron, podile, porridge, rjshrjndrn, seleznev, therb1, tsl-karlp, vdemonchy

cassandra_exporter's Issues

Can't set configs on docker run

I am trying to set some configs like user and password on my cassandra_exporter; however, every time I try to mount the config in the container, I get the error shown below.

I tried to set those configs using -e "VARKey value" but it also didn't work.

As I see there are many people using it, I just assume I'm doing it wrong, but I didn't figure out how to properly use it.

The configs are currently on /tmp/config.yml, but it was in different directories before, I was just trying to move around to check permissions.

docker run --privileged --rm -ti -v /tmp/config.yml:/etc/cassandra_exporter/config.yml --name cassandra-exporter criteord/cassandra_exporter

Starting Cassandra exporter
JVM_OPTS: 
CASSANDRA_EXPORTER_CONFIG_user 
sed: cannot rename /etc/cassandra_exporter/sedjzhwca: Device or resource busy
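
The `sed: cannot rename ... Device or resource busy` line is consistent with an in-place edit (`sed -i` writes a temporary file and renames it over the target) hitting a file that is itself a bind mount, which cannot be replaced by a rename. Below is a minimal sketch of an alternative that rewrites the file's content without renaming it. It assumes, from the log line only, that overrides arrive as `CASSANDRA_EXPORTER_CONFIG_<key>` environment variables, that the mount is writable, and that the config uses flat top-level `key: value` lines. The function names are hypothetical and this is not the image's actual run.sh.

```python
import os
import re

PREFIX = "CASSANDRA_EXPORTER_CONFIG_"  # prefix seen in the log above

def apply_overrides(config_text, env):
    """Return config_text with top-level `key: value` lines set from env."""
    lines = config_text.splitlines()
    for name, value in env.items():
        if not name.startswith(PREFIX):
            continue
        key = name[len(PREFIX):]
        pattern = re.compile(r"^%s:.*$" % re.escape(key))
        replaced = False
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = "%s: %s" % (key, value)
                replaced = True
        if not replaced:
            # Key not present yet: append it as a new top-level entry.
            lines.append("%s: %s" % (key, value))
    return "\n".join(lines) + "\n"

def rewrite_in_place(path, env=os.environ):
    # Open-truncate-write keeps the same file, so it also works when the
    # path is a bind mount, unlike a write-temp-then-rename strategy.
    with open(path) as f:
        new_text = apply_overrides(f.read(), env)
    with open(path, "w") as f:
        f.write(new_text)
```

Another option under the same assumptions is to mount the config at a different path and copy it to /etc/cassandra_exporter/config.yml before the exporter starts, so that nothing has to be renamed over a mount point.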

Cant retrieve Datacenter name ERROR

Hi,

When I am trying to run the exporter (v2.0.3) I receive:
ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node
javax.management.AttributeNotFoundException: No such attribute: Datacenter

In the config.yml file we tried adding - org:apache:cassandra:db:.* to the blacklist, but it didn't help.

It seems the error is related to the code at JmxScraper.java:373. We tried an older version (v1.0.1), which does not parse the Datacenter name, and it worked fine.

We are using Cassandra 3.0.6.

Cheers

How can i start exporter?

to start the application
java -jar cassandra_exporter.jar config.yml

git clone https://github.com/criteo/cassandra_exporter.git
cd cassandra && java -jar cassandra_exporter.jar config.ym
java -jar cassandra_exporter.jar config.ym
Error: Unable to access jarfile cassandra_exporter.jar

find / -name "cassandra_exporter.jar" 2> /dev/null
The result is empty.
How can I start the exporter?

exporter fails with 4.0/trunk

I haven't had time to dig into the root of this exception yet, but wanted to bring it up. Running the exporter with 4.0 throws an exception:

java -jar ./build/libs/cassandra_exporter-2.2.1-all.jar config.yml
[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from config.yml
[main] ERROR com.criteo.nosql.cassandra.exporter.Main - Scrapper stopped due to uncaught exception
java.lang.ClassCastException: java.util.ArrayList cannot be cast to [J
	at com.criteo.nosql.cassandra.exporter.JmxScraper.updateMetric(JmxScraper.java:300)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.lambda$run$7(JmxScraper.java:164)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:164)
	at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:36)

Seeing only `cassandra_stats` in output

Hi there!
Thanks for the exporter; the original Prometheus JMX exporter is somewhat unstable in our environment.

Before I start digging deeper, I'd like to ask why I can only see cassandra_stats.
There is lots of stuff to collect, and it seems things like clientrequest and columnfamily are not shown.

Am I missing something obvious here?

Config:

---
host: localhost:7199
ssl: False
listenPort: 4067
blacklist:
   # Unaccessible metrics (not enough privilege)
   - java:lang:memorypool:.*usagethreshold.*
   # Leaf attributes not interesting for us but that are presents in many path (reduce cardinality of metrics)
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min
   # Path present in many metrics but uninterresting
   - .*:viewlockacquiretime:.*
   - .*:viewreadtime:.*
   - .*:cas[a-z]+latency:.*
   - .*:colupdatetimedeltahistogram:.*
   # Mostly for RPC, do not scrap them
   - org:apache:cassandra:db:.*
   # columnfamily is an alias for Table metrics in cassandra 3.x
   # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
   - org:apache:cassandra:metrics:columnfamily:.*
   # Should we export metrics for system keyspaces/tables ?
   - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*
   # Don't scrape us
   - com:criteo:nosql:cassandra:exporter:.*
maxScrapFrequencyInSec:
  50:
    - .*
  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*

JMX is fine

Thanks a lot in advance!

P.S. Sample output | head -n 10:

# TYPE cassandra_stats gauge
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:clientrequest:rangeslice:unavailables:count",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indextable:someenv:newmessages:newmessages_deleted_idx:rangelatency:99thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indextable:someenv:usersessions:usersessions_deleted_idx:writelatency:count",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indextable:someenv:newchatusers:newchatusers_chat_type_unencr_idx:coordinatorreadlatency:99thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="someenv",table="stickers",name="org:apache:cassandra:metrics:table:someenv:stickers:readtotallatency:count",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="someenv",table="drafts",name="org:apache:cassandra:metrics:table:someenv:drafts:readlatency:98thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="",table="",name="org:apache:cassandra:metrics:indexcolumnfamily:someenv:newuserchats:newuserchats_chat_type_unencr_idx:tombstonescannedhistogram:50thpercentile",} 0.0
cassandra_stats{cluster="clustername",datacenter="datacenter1",keyspace="channels",table="channels",name="org:apache:cassandra:metrics:table:channels:channels:readlatency:50thpercentile",} 0.0
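
As the sample shows, everything is exported under the single `cassandra_stats` metric, and the JMX path (clientrequest, table, and so on) ends up in the `name` label as a colon-separated string. The sketch below illustrates what can be recovered from that label on the consumer side. The layout `org:apache:cassandra:metrics:<type>:...:<metric>:<attribute>` and the position of keyspace and table are inferred from the sample lines only, and `split_name` is a hypothetical helper, not part of the exporter.

```python
def split_name(name):
    """Split a cassandra_stats `name` label into its colon-separated parts."""
    parts = name.split(":")
    prefix = ["org", "apache", "cassandra", "metrics"]
    if parts[:4] != prefix or len(parts) < 6:
        return None  # not a Cassandra metrics path this sketch understands
    info = {"type": parts[4], "metric": parts[-2], "attribute": parts[-1]}
    # Assumption from the sample lines: table metrics carry keyspace and
    # table right after the type segment.
    if parts[4] == "table" and len(parts) == 9:
        info["keyspace"], info["table"] = parts[5], parts[6]
    return info

example = split_name(
    "org:apache:cassandra:metrics:clientrequest:rangeslice:unavailables:count"
)
# example["type"] is "clientrequest", so client request metrics are present,
# just under the shared cassandra_stats metric name.
```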

Wrong java path in docker image

I'm afraid the latest release of the docker image does not work:

Starting Cassandra exporter
JVM_OPTS: 
[dumb-init] /usr/bin/java: No such file or directory

Looks like Java is now installed in a different way and location, but run.sh hardcodes the old path:

$ docker run -t -i criteord/cassandra_exporter:2.3.2 grep java /run.sh
/sbin/dumb-init /usr/bin/java ${JVM_OPTS} -jar /opt/cassandra_exporter/cassandra_exporter.jar /etc/cassandra_exporter/config.yml
$ docker run -t -i criteord/cassandra_exporter:2.3.2 which java
/usr/local/openjdk-11/bin/java
$ docker run -t -i criteord/cassandra_exporter:2.3.2 ls -l /usr/bin/java
ls: cannot access '/usr/bin/java': No such file or directory
$ 

Option to scrap only whitelisted metrics

Hi. Thanks a lot for the project!
While trying to optimize the amount of metrics I store and process in Prometheus, I was wondering: is there a way in cassandra-exporter to scrape only a list of metrics or metric families that we know we want, instead of blacklisting the rest?
A whitelist is much easier to write and maintain, in my opinion.
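
To make the request concrete, here is a small sketch contrasting the two filtering modes on metric names in the exporter's colon-separated form. It assumes each pattern must match the whole name (patterns such as `.*:mean` in the configs on this page suggest that, but it is an assumption), and `whitelist` is a hypothetical option, not an existing config key.

```python
import re

def keep_with_blacklist(names, blacklist):
    """Keep every name that matches none of the blacklist patterns."""
    compiled = [re.compile(p) for p in blacklist]
    return [n for n in names if not any(c.fullmatch(n) for c in compiled)]

def keep_with_whitelist(names, whitelist):
    """Keep only names that match at least one whitelist pattern."""
    compiled = [re.compile(p) for p in whitelist]
    return [n for n in names if any(c.fullmatch(n) for c in compiled)]

# Illustrative names only; the first two are made up for this example.
names = [
    "org:apache:cassandra:metrics:clientrequest:read:latency:count",
    "org:apache:cassandra:metrics:clientrequest:read:latency:mean",
    "java:lang:memory:heapmemoryusage:used",
]
```

With a blacklist, any metric that is not explicitly listed is still exported; with a whitelist, the default flips to dropping it, which is why the whitelist tends to stay short.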

Thanks a lot.

Streaming related metrics are missing

Hi,

I'm using cassandra_exporter-2.2.1 pre-built library for exporting cassandra metrics and found that streaming related metrics are missing.

Are you aware of this problem, or am I missing something here?

Thanks.

prefix

Is there any way to add a prefix or hostname to the metrics?

Is it possible to actually make use of labels?

There is a lot of good stuff here, but I hate that it lumps everything under cassandra_stats, with the name label looking like a full Graphite path that is ':' separated. This should be broken into multiple metrics with multiple labels to really leverage Prometheus. Does the exporter support the JMX exporter's ability to change the exported metric format?

Grafana dashboard

Hi, thanks for the project!

Do you have a Grafana dashboard for Cassandra metrics?

entrypoint override

Greetings,

I'm having a small implementation issue with this exporter.

I have created my own Dockerfile based on the image "criteord/cassandra_exporter". The Dockerfile overrides the entrypoint of the exporter image with a script that puts the exporter on hold until Cassandra is up and its port is listening. Once the DB is up, the script launches the exporter's CMD. This procedure worked for other DBs whose exporter images had a simple /bin/exporter entrypoint, with or without a connection string for the DB.

I couldn't make it work with this exporter and would like to know if you could suggest how to implement it with the method above, or with another method that works.
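
A minimal sketch of the wait-then-launch approach described above, assuming a Python interpreter is available in the derived image (not confirmed for this image; the same logic could be written in shell). The script name, argument order, and timeouts are hypothetical.

```python
import os
import socket
import sys
import time

def wait_for_port(host, port, timeout_s=300.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)  # not listening yet, retry after a pause
    return False

def main(argv):
    # Hypothetical usage: wait_then_exec.py HOST PORT COMMAND [ARGS...]
    host, port, command = argv[1], int(argv[2]), argv[3:]
    if not wait_for_port(host, port):
        sys.exit("timed out waiting for %s:%d" % (host, port))
    # Replace this process with the exporter so signals reach it directly.
    os.execvp(command[0], command)
```

A wrapper entrypoint would call `main(sys.argv)` with the Cassandra host, the JMX port (7199 in the configs shown elsewhere on this page), and then the same java command line that the image's /run.sh uses.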

Thank you in advance.

Question about Datacenter metrics aggregation

Hi, I'm fairly new to the exporter/Prometheus world and am looking for an alternative to OpsCenter.
In OpsCenter we can get metrics per IP or per datacenter, consolidating all the IPs of that DC.
Is something like that possible with this exporter?

Map Cassandra metrics with non-numeric value into metric name or label

Hi,

I'm particularly interested in the metric:

org.apache.cassandra.metrics.compaction.pendingtasksbytablename

The value is a map like {columnfamily1:number, columnfamily2:number,...}.

Is there a way to tell cassandra-exporter to map this metric into something compatible with prometheus, like using prometheus labels, or changing the metric name?
E.g.:
cassandra_stats{name="org.apache.cassandra.metrics.compaction.pendingtasksbytablename:<name_of_cf>"}

That way we can store the pending compactions in Prometheus with column family granularity and detect whether a particular CF is suffering especially from compaction.
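
A sketch of the mapping proposed above: expand a map-valued metric into one numeric sample per key, with the key appended to the name. It assumes a flat `{column_family: number}` map as described in this issue (the real attribute may be nested differently) and uses the exporter's colon-separated name form. `flatten_map_metric` is a hypothetical helper, not exporter code.

```python
def flatten_map_metric(name, value):
    """Expand a {key: number} metric value into (sample_name, number) pairs."""
    samples = []
    for key, number in sorted(value.items()):
        if isinstance(number, bool) or not isinstance(number, (int, float)):
            continue  # skip anything that cannot be stored as a number
        samples.append(("%s:%s" % (name, key.lower()), float(number)))
    return samples

pending = flatten_map_metric(
    "org:apache:cassandra:metrics:compaction:pendingtasksbytablename",
    {"columnfamily1": 4, "columnfamily2": 0},
)
```

Each resulting pair could then be exposed as its own `cassandra_stats` sample, or the key could go into a dedicated label instead of the name.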

Thanks, regards,
Miguel

Skip metrics by data type

Hi, Thanks for this awesome project.

It looks like scraping non-numeric metrics adds time to the whole run.

Debug log:
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=OperatingSystem as it as an unknown type java.lang.String with value Linux

In our case, a whole scraping run took ~10000ms for ~850 metrics, even though none of them are expensive to scrape.

If possible, please skip scraping/parsing non-numeric metrics, which could improve performance.

Nodes up/down status

Hi,
I couldn't find any metric which gives the node status.
The closest metric is using "up" but this only indicates if the exporter is up/down.

[Help needed] Minimal metrics setup to monitor node health

I'm very new to Cassandra and need to health-check a cluster of 5 Cassandra nodes with ~1000 keyspaces created.

With the default setup, I'm getting a huge number of metrics in my Prometheus, which makes it impossible to query a time range of more than 1 hour: Prometheus then gets killed by OOM. And those metrics aren't informative to me.

So, for the sake of future users facing the same problem, I wonder whether there is a minimal config for the metrics to scrape, so that an average administrator could take a look and see whether something is (or is going to be) wrong with Cassandra?

Thanks in advance for any kind of help and excuse me for my barbarian English.

Metrics in different gauges and labels such as keyspace and table names

In the current implementation, all metrics are exposed as labels on a single gauge named "cassandra_stats".

I think a better design would be similar to what the JMX Prometheus exporter does: gauges are separated out, and labels such as keyspace name or table name are configurable.

Any thoughts?

Error while running generate.py

While running generate.py, the following errors are shown:
Error during export dashboard: cassandra_default
Error during export dashboard: cassandra_kubernetes

No Pattern Input

rules:
  - pattern: 'org.apache.cassandra.metrics<type=(\w+), name=(\w+)><>Value: (\d+)'
    name: cassandra_$1_$2
    value: $3
    valueFactor: 0.001
    labels: {}
    help: "Cassandra metric $1 $2"
    type: GAUGE
    attrNameSnakeCase: false

Cannot retrieve the datacenter name information for the node

Hi Team,

When I run the command to start the exporter (2.3.4), I get the error below.

[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from config.yml
[main] TRACE com.criteo.nosql.cassandra.exporter.Config - com.criteo.nosql.cassandra.exporter.Config@887af79
[main] ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node
javax.management.AttributeNotFoundException: No such attribute: HostIdToEndpoint
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:81)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:276)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:253)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:162)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:903)
at com.criteo.nosql.cassandra.exporter.JmxScraper$NodeInfo.getNodeInfo(JmxScraper.java:413)
at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:175)
at com.criteo.nosql.cassandra.exporter.Main.start(Main.java:38)
at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:30)

Definition for $datacenter

The definition for $datacenter is getting the wrong value for the variable. Could you please let me know the correct values?

Support listen on specific address

There is no way to configure cassandra_exporter to listen on a specific address in the current version.

I use Ansible to manage my cluster running on AWS, and I want the exporter to listen on the EC2 instance's secondary private IP address to simplify my Ansible settings.

Thank you for this great project.

Embed Cassandra Exporter into an existing agent

We would like to embed the Cassandra exporter into our existing Cassandra agent process, which does a lot of things besides capturing metrics. Please let me know if this feature is available; otherwise I will start working on it.

Missing table and keyspace flags in cassandra_stats

Hello there,

I am in the process of adding the Cassandra exporter to my Cassandra cluster to measure the number of tombstones, but unfortunately I am not getting any table and keyspace attributes along with my statistics inside the cassandra_stats object. I do see these attributes show up in multiple examples in the documentation. Am I missing something, or is this a bug with later versions of Apache Cassandra?

Environment details
Cassandra version: 3.11.1
Cassandra exporter version: 2.2.0

Configuration

    host: localhost:7199
    ssl: False
    user:
    password:
    listenAddress: 0.0.0.0
    listenPort: 8080
    blacklist:
       # To profile the duration of jmx call you can start the program with the following options
       # > java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot
       #
       # To get intuition of what is done by cassandra when something is called you can look in cassandra
       # https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
       # Please avoid to scrape frequently those calls that are iterating over all sstables
       # Unaccessible metrics (not enough privilege)
       - java:lang:memorypool:.*usagethreshold.*
       # Leaf attributes not interesting for us but that are presents in many path
       - .*:999thpercentile
       - .*:95thpercentile
       - .*:fifteenminuterate
       - .*:fiveminuterate
       - .*:durationunit
       - .*:rateunit
       - .*:stddev
       - .*:meanrate
       - .*:mean
       - .*:min
       # Path present in many metrics but uninterresting
       - .*:viewlockacquiretime:.*
       - .*:viewreadtime:.*
       - .*:cas[a-z]+latency:.*
       - .*:colupdatetimedeltahistogram:.*
       # Mostly for RPC, do not scrap them
       - org:apache:cassandra:db:.*
       # columnfamily is an alias for Table metrics
       # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
       - org:apache:cassandra:metrics:columnfamily:.*
       # Should we export metrics for system keyspaces/tables ?
       - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*
       # Don't scrap us
       - com:criteo:nosql:cassandra:exporter:.*
    maxScrapFrequencyInSec:
      50:
        - .*
      # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
      3600:
        - .*:snapshotssize:.*
        - .*:estimated.*
        - .*:totaldiskspaceused:.*

Cannot retrieve the datacenter name error

Hi. I'm seeing the following error on 2.2.1 (Cassandra 2.0.11.93):

ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node

Full output:

$ java -jar cassandra_exporter-2.2.1-all.jar config.yml
[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from config.yml
[main] ERROR com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot retrieve the datacenter name information for the node
javax.management.AttributeNotFoundException: No such attribute: HostIdToEndpoint
	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:81)
	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
	at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:276)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:253)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:162)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:903)
	at com.criteo.nosql.cassandra.exporter.JmxScraper$NodeInfo.getNodeInfo(JmxScraper.java:375)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:155)
	at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:36)

config.yml

host: localhost:10144
ssl: False
listenAddress: 0.0.0.0
listenPort: 9198
blacklist:
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min
maxScrapFrequencyInSec:
  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*

I'm able to query JMX successfully with jmxterm on localhost:10144.

Thanks!

Query issues for "large" clusters

Hello,

We have a ~540-node Cassandra cluster in which each node exports ~1500 metrics. We're sending over 800k time series in the cassandra_stats metric namespace. This is causing a lot of issues when querying Prometheus, since the index gets hit so hard. Recording rules are definitely an option, but we don't always know in advance when something should have a recording rule to perform an aggregation.
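
The "over 800k" figure follows directly from the two approximate numbers above. A quick check, where the per-node reduction in the second calculation is purely a made-up illustration of how blacklisting scales across the cluster:

```python
nodes = 540             # approximate node count from the issue
metrics_per_node = 1500  # approximate metrics exported per node

series = nodes * metrics_per_node
print(series)  # 810000

# Hypothetical what-if: blacklisting 500 metrics per node.
series_after_blacklist = nodes * (metrics_per_node - 500)
print(series_after_blacklist)  # 540000
```

Because all of these series share the `cassandra_stats` name, any query on it has to select among the full set by label.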

Is there a workaround for this in the current code base? If not, would you be open to exploring a change with us?

It does not look like you can use multiple metrics in a query for arithmetic evaluation?

This seems to be due to the single-metric design: cassandra_stats. Is there any way to make this work?
Example of calculating a percentage:
this one works (used/used):
(cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:used"} / cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:used"}) * 100
this one does not (used/max):
(cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:used"} / cassandra_stats{cluster="$cluster",datacenter="$datacenter",instance="$instance",job="cassandra",name="java:lang:memory:heapmemoryusage:max"}) * 100
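
In PromQL, a binary operator between two instant vectors pairs samples whose label sets are identical. In used/used both sides carry the same `name` label, so they pair up; in used/max the `name` labels differ, so nothing matches and the result is empty. PromQL's `ignoring(name)` modifier (written as `... / ignoring(name) ...`) excludes that label from matching. The sketch below simulates this matching rule in plain Python; the sample values and the reduced label set are invented for illustration.

```python
def divide(left, right, ignoring=()):
    """Pair samples with equal label sets (minus `ignoring`) and divide them."""
    def key(labels):
        return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

    right_by_key = {key(labels): value for labels, value in right}
    return [
        value / right_by_key[key(labels)]
        for labels, value in left
        if key(labels) in right_by_key
    ]

# Invented samples: same instance, different `name` label.
used = [({"instance": "node1", "name": "java:lang:memory:heapmemoryusage:used"}, 2.0)]
maxi = [({"instance": "node1", "name": "java:lang:memory:heapmemoryusage:max"}, 8.0)]

print(divide(used, used))                       # [1.0]
print(divide(used, maxi))                       # []
print(divide(used, maxi, ignoring=("name",)))   # [0.25]
```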

prometheus yaml configuration missing

Do we have a Prometheus YAML configuration for this? I am able to see the metrics, but I believe Grafana is only configured against a Prometheus server. How can I access the Prometheus UI in this case?

MBeanInfo parsing errors: wrong metrics type detection for C* 2.2.8

Hello,

there are numerous problems with MBeanInfo parser for Cassandra 2.2.8.
Both exporter and cassandra runtime logs are attached.

Could you confirm this behaviour with 2.x branch?
Thanks!

[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=MemoryPool,name=Compressed Class Space as it as an unknown type java.lang.String with value NON_HEAP
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=MemoryPool,name=Compressed Class Space as it as an unknown type javax.management.ObjectName with value java.lang:type=MemoryPool,name=Compressed Class Space
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse org.apache.cassandra.metrics:type=Cache,scope=RowCache,name=HitRate as it as an unknown type java.lang.Object with value NaN
[main] DEBUG com.criteo.nosql.cassandra.exporter.JmxScraper - Cannot parse java.lang:type=MemoryManager,name=Metaspace Manager as it as an unknown type [Ljava.lang.String; with value [Metaspace, Compressed Class Space]

cassandra.system.log

cassandra_exporter.debug.log

I Can't build project

Hi,
I downloaded this project and tried to build it with Gradle:
"gradle build"

But the build fails with the following error:

Could not resolve all artifacts for configuration ':classpath'.
Could not download shadow.jar (com.github.jengelman.gradle.plugins:shadow:2.0.1)
> Could not get resource 'https://jcenter.bintray.com/com/github/jengelman/gradle/plugins/shadow/2.0.1/shadow-2.0.1.jar'.
> Could not GET 'https://jcenter.bintray.com/com/github/jengelman/gradle/plugins/shadow/2.0.1/shadow-2.0.1.jar'.
> Connect to d29vzk4ow07wi7.cloudfront.net:443 [d29vzk4ow07wi7.cloudfront.net/52.222.217.47, d29vzk4ow07wi7.cloudfront.net/52.222.217.83, d29vzk4ow07wi7.cloudfront.net/52.222.217.198, d29vzk4ow07wi7.cloudfront.net/52.222.217.210] failed: Read timed out

It seems that shadow-2.0.1.jar needs some dependencies. But I can download this jar directly from the URL "https://jcenter.bintray.com/com/github/jengelman/gradle/plugins/shadow/2.0.1/shadow-2.0.1.jar".

So I don't know how to fix this. Can you tell me how to build the project successfully?

Thanks a lot.

Prometheus way to filtering metrics

Hi Team,

I am able to run the cassandra_exporter and see the metrics in the UI. Usually we write rules in exporter.yml to filter the metrics; how can we pass a rules YAML file to this exporter to filter them?

Or can we import the metrics directly into Grafana and filter there? We are asking because we have a large number of Cassandra servers.

Question on configuration variable passing

I think this is possibly a bug, or a mistake in my configuration.

I tried setting host to a remote IP address:

$ head -1 /app/config.yml
host: 10.42.10.34:7199

On starting it still tries to connect to localhost, here are the logs:

$ java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter-1.0.1-all.jar /app/config.yml
[main] INFO com.criteo.nosql.cassandra.exporter.Config - Loading yaml config from /app/config.yml
[main] TRACE com.criteo.nosql.cassandra.exporter.Config - com.criteo.nosql.cassandra.exporter.Config@42dafa95
[main] ERROR com.criteo.nosql.cassandra.exporter.Main - Scrapper stopped due to uncaught exception
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
	java.net.ConnectException: Connection refused (Connection refused)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
	at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
	at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2430)
	at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:308)
	at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
	at com.criteo.nosql.cassandra.exporter.JmxScraper.run(JmxScraper.java:104)
	at com.criteo.nosql.cassandra.exporter.Main.main(Main.java:36)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at java.net.Socket.connect(Socket.java:538)
	at java.net.Socket.<init>(Socket.java:434)
	at java.net.Socket.<init>(Socket.java:211)
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
	at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:148)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
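
One possible reading of this trace: the failure happens in RMIServerImpl_Stub.newClient, that is, after the RMI registry lookup. With JMX over RMI, the client then connects to the address embedded in the stub returned by the server, and that address is governed by the java.rmi.server.hostname system property on the Cassandra JVM, not by the exporter's host setting. The sketch below only shows how a host:port value maps to a standard JMX service URL. The class and method names are hypothetical, and it is an assumption that the exporter builds its URL in this form.

```java
import javax.management.remote.JMXServiceURL;
import java.net.MalformedURLException;

final class JmxUrlSketch {
    // Hypothetical helper: builds the conventional JMX-over-RMI URL for a "host:port" value.
    // Assumption: the exporter's "host" config entry is used in this way.
    static String jmxUrlFor(String hostPort) {
        return "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi";
    }

    public static void main(String[] args) throws MalformedURLException {
        // Parsing validates the URL syntax; it does not open a connection.
        JMXServiceURL url = new JMXServiceURL(jmxUrlFor("10.42.10.34:7199"));
        System.out.println(url);
        // The address in this URL is only used for the registry lookup. The stub
        // returned by that lookup carries the address the server advertises, so a
        // server advertising 127.0.0.1 would explain the error above.
    }
}
```

If this reading is right, checking the -Djava.rmi.server.hostname value in the Cassandra JVM options on 10.42.10.34 may be worth trying before treating it as an exporter bug.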

Separated statistics per node

Hi,

Thanks for your effort in putting this together! Is there a way to get per-node statistics for the cluster as well? Sometimes a single node misbehaves, and a dashboard would help to quickly identify the faulty node.

Kind regards,
Christian

Metrics for nodes status (up/down)

Hello,

nodetool status reports the status of every node in the cluster (from the current node's point of view), which can help to detect network partitions and other odd issues like this one (the fourth node thinks everything is fine, but the other nodes disagree):

root@cassandra-0:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.48 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
DN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

root@cassandra-1:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.44 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
DN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

root@cassandra-2:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.44 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
DN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

root@cassandra-3:/# nodetool status
Datacenter: staging
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.140.80.4  947.44 GiB  96           ?       1be67d6c-5ec3-4352-874a-2cfa7b56966d  rack1
UN  10.140.81.4  1.04 TiB   96           ?       8dcbee81-6a73-4bcb-b95e-0833790394ac  rack1
UN  10.140.82.4  1.01 TiB   96           ?       4846412a-ab09-472c-a6da-10fb6834865e  rack1
UN  10.140.83.2  1.05 TiB   96           ?       72a13bd8-ae4b-4c20-833a-774b9688b264  rack1

This information is available via JMX, but the corresponding MBean attribute has type java.util.List (which is not supported by the exporter):

# java -jar jmxterm-1.0.0-uber.jar --url localhost:7199
Welcome to JMX terminal. Type "help" for available commands.
$>bean org.apache.cassandra.db:type=StorageService
#bean is set to org.apache.cassandra.db:type=StorageService
$>get LiveNodes
#mbean = org.apache.cassandra.db:type=StorageService:
LiveNodes = ( 10.140.80.4, 10.140.81.4, 10.140.82.4, 10.140.83.2 );

$>get UnreachableNodes
#mbean = org.apache.cassandra.db:type=StorageService:
UnreachableNodes = (  );

Is there a good way to detect issues like this?


Also, I looked at criteo/casspoke but, if I understand it correctly, it can detect nodes that are unavailable to clients, not inter-node communication issues. It looks like it can't cover this case. :(
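
To illustrate what I have in mind, here is a minimal sketch of turning the two list attributes into one gauge per node, as seen from the node being scraped. It is not something the exporter does today. The class name, the method name and the metric name cassandra_node_up are all hypothetical, and the lists are passed in directly instead of being read over JMX.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NodeStatusSketch {
    // Hypothetical helper: one Prometheus-style sample per endpoint, 1 = live, 0 = unreachable.
    // In a real scraper the two lists could come from
    // MBeanServerConnection.getAttribute(new ObjectName("org.apache.cassandra.db:type=StorageService"), "LiveNodes")
    // and the same call with "UnreachableNodes".
    static List<String> toGaugeLines(List<String> liveNodes, List<String> unreachableNodes) {
        Map<String, Integer> status = new TreeMap<>(); // sorted for stable output
        for (String node : liveNodes) {
            status.put(node, 1);
        }
        for (String node : unreachableNodes) {
            status.put(node, 0);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : status.entrySet()) {
            // "cassandra_node_up" is an illustrative metric name, not an existing one.
            lines.add("cassandra_node_up{endpoint=\"" + e.getKey() + "\"} " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // View from cassandra-0 in the nodetool output above.
        List<String> live = Arrays.asList("10.140.80.4", "10.140.81.4", "10.140.82.4");
        List<String> unreachable = Arrays.asList("10.140.83.2");
        toGaugeLines(live, unreachable).forEach(System.out::println);
    }
}
```

Since every node exports its own view, a disagreement like the one above would show up as the same endpoint having the value 0 on some scraped instances and 1 on others.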

Run with Cassandra in Container?

How is this meant to be run if Cassandra is already running in a Docker container, given that it can't be loaded as a javaagent like jmx_exporter?
