kenniy / pnp-databricks-monitoring

This project forked from santiagxf/pnp-databricks-monitoring

This sample shows how to stream Databricks metrics to Azure Monitor (log analytics) workspace

Shell 1.86% Java 79.16% Scala 18.98%

pnp-databricks-monitoring

Azure Databricks is based on Apache Spark, and both use log4j as the standard library for logging. In addition to the default logging provided by Apache Spark, this pattern and practice sends logs and metrics to Azure Log Analytics. To achieve that, we need to deploy custom handlers for the logging events. While the Apache Spark logger messages are strings, Azure Log Analytics requires log messages to be formatted as JSON. The com.microsoft.pnp.log4j.LogAnalyticsAppender class transforms these messages to JSON.

Referenced architecture: https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-databricks

Configuration:

You need the Log Analytics workspace ID and primary key. The workspace ID is the workspaceId value from the Log Analytics resource in Azure. The primary key is the secret the resource provides for interacting with the service.

To configure log4j logging, open log4j.properties. Edit the following two values and save the file. We will use it later.

log4j.appender.A1.workspaceId=[Log Analytics workspace ID]
log4j.appender.A1.secret=[Log Analytics primary key]

To configure custom logging, open metrics.properties. Edit the following two values and save the file. We will use it later.

*.sink.loganalytics.workspaceId=[Log Analytics workspace ID]
*.sink.loganalytics.secret=[Log Analytics primary key]
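
If you would rather script these edits than make them by hand, the following is a minimal sketch. The helper name set_property is hypothetical and not part of this repository. It assumes each entry is written as key=value with no spaces around the equals sign, as in the lines above.

```shell
# set_property FILE KEY VALUE
# Hypothetical helper: replaces the value of KEY in a Java properties file.
# Assumes entries look like "key=value" with no spaces around "=".
set_property() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  # Pass the key and value through the environment so awk applies no
  # escape processing to them, and compare the key as a literal string
  # so that "*" and "." are not treated as pattern characters.
  PROP_KEY=$key PROP_VALUE=$value awk '
    BEGIN { FS = "=" }
    $1 == ENVIRON["PROP_KEY"] { print ENVIRON["PROP_KEY"] "=" ENVIRON["PROP_VALUE"]; next }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, set_property metrics.properties '*.sink.loganalytics.secret' "$PRIMARY_KEY" would fill in the secret, where PRIMARY_KEY is a variable you define yourself. Lines whose key does not match are written back unchanged.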

Build the .jar files for the Databricks job and Databricks monitoring

Azure Log Analytics expects logs in a different format from the one log4j produces, so we use a JAR module to perform the conversion. Use your Java IDE to import the Maven project file named pom.xml, located in the root directory, and perform a clean build. The build produces a file named azure-databricks-monitoring-0.9.jar. If you want to skip this step, a prebuilt JAR is available in the built directory, which I created for your convenience. This build used JRE 1.8.0_191 with Maven 3.6.0.
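
If you prefer the command line to an IDE, a clean build can be sketched as below. This is an assumption rather than the documented procedure: the function name build_monitoring_jar is hypothetical, it assumes Maven (mvn) is on your PATH, and it assumes the standard clean and package phases are enough for this project.

```shell
# build_monitoring_jar [PROJECT_DIR]
# Hypothetical helper: runs a clean Maven build of the pom.xml found in
# PROJECT_DIR (defaults to the current directory).
build_monitoring_jar() {
  project_dir=${1:-.}
  # Fail early with a clear message if the project file is missing.
  if [ ! -f "$project_dir/pom.xml" ]; then
    echo "pom.xml not found in $project_dir" >&2
    return 1
  fi
  # Fail early if Maven is not installed.
  if ! command -v mvn >/dev/null 2>&1; then
    echo "mvn not found on PATH" >&2
    return 1
  fi
  mvn -f "$project_dir/pom.xml" clean package
}
```

Run it from the repository root as build_monitoring_jar, then look for azure-databricks-monitoring-0.9.jar in the build output.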

Configure custom logging for the Databricks job

Copy the azure-databricks-monitoring-0.9.jar file to the Databricks file system by entering the following command in the Databricks CLI:

databricks fs cp --overwrite azure-databricks-monitoring-0.9.jar dbfs:/azure-databricks-job/azure-databricks-monitoring-0.9.jar

Copy the custom logging properties from metrics.properties to the Databricks file system by entering the following command:

databricks fs cp --overwrite metrics.properties dbfs:/azure-databricks-job/metrics.properties

Copy the initialization script spark-metrics.sh to the Databricks file system by entering the following command:

databricks fs cp --overwrite spark-metrics.sh dbfs:/databricks/init/[cluster-name]/spark-metrics.sh
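
The three copy commands above can be gathered into one function so that the cluster name is supplied once. This is a sketch under assumptions: the function name upload_monitoring_files and the DATABRICKS_CLI override variable are hypothetical, and the three files are assumed to be in the current directory.

```shell
# upload_monitoring_files CLUSTER_NAME
# Hypothetical helper: copies the JAR, metrics.properties, and the init
# script to DBFS using the same commands listed above.
upload_monitoring_files() {
  cluster_name=$1
  if [ -z "$cluster_name" ]; then
    echo "usage: upload_monitoring_files CLUSTER_NAME" >&2
    return 1
  fi
  # DATABRICKS_CLI is a hypothetical override; set it to "echo" for a
  # dry run that prints the arguments instead of copying anything.
  cli=${DATABRICKS_CLI:-databricks}
  # Stop at the first failed copy.
  "$cli" fs cp --overwrite azure-databricks-monitoring-0.9.jar \
    dbfs:/azure-databricks-job/azure-databricks-monitoring-0.9.jar &&
  "$cli" fs cp --overwrite metrics.properties \
    dbfs:/azure-databricks-job/metrics.properties &&
  "$cli" fs cp --overwrite spark-metrics.sh \
    "dbfs:/databricks/init/$cluster_name/spark-metrics.sh"
}
```

Call it as upload_monitoring_files followed by your cluster name. The name you pass must match the one used for [cluster-name] in the init script path.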

Create a Databricks cluster

Below the Auto Termination dialog box, click Init Scripts. Enter dbfs:/databricks/init/[cluster-name]/spark-metrics.sh, substituting the name of the cluster you created for [cluster-name].

Contributors: santiagxf
