GangliaExport

This lightweight package provides an alternative way to monitor Databricks cluster utilization using the Ganglia web service on the driver node of each cluster. The Ganglia metrics can be exported to any Spark datasource format, which can then be used to analyze cluster usage and avoid costly idle compute.

  • Metrics can be saved to cloud storage in any Spark-supported format, including Delta, and monitored with tools like Databricks SQL
  • No other monitoring dependencies, which reduces complexity
  • Ideal for optimization monitoring of automated job clusters

See this link for info on the Ganglia API service and XML schema.
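The report is nested XML: CLUSTER elements contain HOST elements, which in turn contain METRIC elements whose attributes map onto the exported columns. Below is a minimal sketch of parsing that shape; the sample document is hand-written from the standard Ganglia schema (not captured from a real cluster), and it assumes the scala-xml module is available, as it typically is on Spark distributions:

import scala.xml.XML

// Hand-written sample in the standard Ganglia report schema; an
// illustration, not output captured from a live cluster.
val sample = XML.loadString(
  """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
    |  <CLUSTER NAME="exporttest" LOCALTIME="1594000000">
    |    <HOST NAME="driver" IP="10.0.0.1" REPORTED="1594000000">
    |      <METRIC NAME="cpu_idle" VAL="92.3" TYPE="float" UNITS="%"/>
    |    </HOST>
    |  </CLUSTER>
    |</GANGLIA_XML>""".stripMargin)

// The HOST and METRIC attributes correspond to the hostName, metricName,
// metricVal, and metricUnits columns in the exported schema.
for {
  host   <- sample \\ "HOST"
  metric <- host \ "METRIC"
} println(s"${(host \ "@NAME").text}: ${(metric \ "@NAME").text} = ${(metric \ "@VAL").text} ${(metric \ "@UNITS").text}")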

Other options to monitor cluster utilization:

  • Monitor Ganglia while a job is running or look at Ganglia snapshots in each cluster manually
  • Install init scripts to send metrics to Datadog or Log Analytics as noted in Manage Clusters

Note:

  • UC clusters on DBR 11.3+ hit a Missing Credential Scope error when the export runs on another thread via scheduled executors. (Fix/TODO)
  • Ganglia is not available on DBR 13+; it is replaced by system tables via cluster metrics.

Support

Spark Version   Scala Version   Supported
Spark 2.x       2.11            yes
Spark 3.x       2.12            yes

See the build file for specific Spark versions. Python support is TBD.

Build

Package

sbt +package

Uber jar

sbt +assembly

Test

sbt +test

Usage

Load the GangliaExport assembly library onto your all-purpose cluster, or add it as a dependent library in your automated job.

Collect Metrics

Add the following code to your Spark application or notebook:

import com.databricks.gangliaexport.GangliaExport

// Poll Ganglia every 30 seconds and append the metrics in Delta format.
val f = GangliaExport.exportMetrics(spark, "delta", Map.empty[String, String], "/tmp/[email protected]/gangliametricstest", 30)
//...

// Stop the scheduled export when done.
f.cancel(true)
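Judging from this example, exportMetrics takes the SparkSession, an output format, writer options, an output path, and a polling interval in seconds, and returns a handle to the scheduled background task, so f.cancel(true) stops the periodic export.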

Or collect metrics as a DataFrame and save them yourself:

import com.databricks.gangliaexport.{GangliaClient, GangliaExport}

// Take a single snapshot of the current Ganglia metrics.
val gc = new GangliaClient()
val df = GangliaExport.collectMetrics(spark, gc)

// Append the snapshot, partitioned by day.
df.write.format("csv").partitionBy("reportingDate").mode("append").save("/tmp/testexport")
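collectMetrics takes a single snapshot, so schedule it yourself (or use exportMetrics above) for a continuous time series. Partitioning by reportingDate matches the date column in the schema below and keeps time-range queries cheap.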

Read Metrics

val metrics = spark.read.format("delta").load("/tmp/[email protected]/gangliametricstest")

Schema:

root
 |-- clusterName: string (nullable = true)
 |-- clusterId: string (nullable = true)
 |-- hostName: string (nullable = true)
 |-- hostIp: string (nullable = true)
 |-- reportingTime: long (nullable = true)
 |-- metricName: string (nullable = true)
 |-- metricType: string (nullable = true)
 |-- metricUnits: string (nullable = true)
 |-- metricVal: double (nullable = true)
 |-- reportingDate: date (nullable = true)
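Because the metrics are plain rows, you can also aggregate them directly in Scala rather than SQL. A minimal sketch, assuming the cpu_idle metric name used in the sample queries below:

import org.apache.spark.sql.functions.avg

// Average idle CPU per cluster; persistently high values point to idle compute.
val avgIdle = metrics
  .filter(metrics("metricName") === "cpu_idle")
  .groupBy("clusterName")
  .agg(avg("metricVal").as("avg_cpu_idle"))

avgIdle.show()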

Use a sample notebook to try this.

Sample Queries

Create a temporary view

val metrics = spark.read.format("delta").load("/tmp/[email protected]/gangliametricstest")
metrics.createOrReplaceTempView("metrics")

CPU Load

%sql
select
  from_unixtime(reportingTime) as reportingTime,
  metricName,
  (100 - metricVal) as cpupercent
from
  metrics
where
  metricName = "cpu_idle"
  and clusterName = 'exporttest'
order by
  reportingTime

CPU Usage - Chart using series grouping by metricName and avg(metricVal)

%sql
select
  from_unixtime(reportingTime) as reportingTime,
  clusterName,
  metricName,
  metricVal
from
  metrics
where
  metricName like 'cpu%'
  and metricUnits = '%'
  and clusterName = 'exporttest'
  and reportingDate BETWEEN DATE '2020-01-01'
  AND DATE '2021-01-31'
order by
  reportingTime,
  clusterName,
  metricName

CPU Usage - Chart using pivot (metricVal rows to columns)

%sql
select
  *
from
  (
    select
      from_unixtime(reportingTime) as reportingTime,
      clusterName,
      metricName,
      metricVal
    from
      metrics
    where
      metricName like 'cpu%'
      and metricUnits = '%'
      and clusterName = 'exporttest'
      and reportingDate BETWEEN DATE '2020-01-01' AND DATE '2021-01-31'
  ) pivot (
    avg(metricVal) for metricName in (
      'cpu_idle' cpu_idle,
      'cpu_system' cpu_system,
      'cpu_user' cpu_user,
      'cpu_nice' cpu_nice
    )
  )
order by
  reportingTime

Memory Usage - Chart using pivot

%sql
select
  *
from
  (
    select
      from_unixtime(reportingTime) as reportingTime,
      clusterName,
      metricName,
      metricVal
    from
      metrics
    where
      metricName like 'mem%'
      and clusterName = 'exporttest'
      and reportingDate BETWEEN DATE '2020-01-01' AND DATE '2021-01-31'
  ) pivot (
    avg(metricVal) for metricName in (
      'mem_free' mem_free,
      'mem_cached' mem_cached,
      'mem_buffers' mem_buffers,
      'mem_shared' mem_shared
    )
  )
order by
  reportingTime
