datadog / datadog-kafka-connect-logs

A plugin for Kafka Connect to send Kafka records as logs to Datadog.

Home Page: https://www.confluent.io/hub/datadog/kafka-connect-logs

License: Apache License 2.0


datadog-kafka-connect-logs's Introduction

Datadog Kafka Connect Logs

datadog-kafka-connect-logs is a Kafka Connector for sending records from Kafka as logs to the Datadog Logs Intake API.

It is a plugin meant to be installed on a Kafka Connect Cluster running alongside a Kafka Broker.

Requirements

  1. Kafka version 1.0.0 and above.
  2. Java 8 and above.
  3. Confluent Platform 4.0.x and above (optional).

To install the plugin, you must have a working instance of Kafka Connect connected to a Kafka Broker. See Confluent's documentation for an easy way to set this up.

Installation and Setup

Install from Confluent Hub

See Confluent's documentation and the connector's page on Confluent Hub.

Download from Github

Download the latest version from the GitHub releases page. Also see Confluent's documentation on installing community connectors.

Build from Source

  1. Clone the repo from https://github.com/DataDog/datadog-kafka-connect-logs
  2. Verify that a Java 8 JRE or JDK is installed.
  3. Run mvn clean compile package. This will build the jar in the /target directory. The name will be datadog-kafka-connect-logs-[VERSION].jar.
  4. The zip file for use on Confluent Hub can be found in target/components/packages.
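
For reference, the build steps above as a command sequence:

git clone https://github.com/DataDog/datadog-kafka-connect-logs
cd datadog-kafka-connect-logs
mvn clean compile package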

Quick Start

  1. To install the plugin, place the plugin's jar file (see the previous section on how to download or build it) in or under the location specified in plugin.path. If you use Confluent Platform, simply run confluent-hub install target/components/packages/<connector-zip-file>.
  2. Restart your Kafka Connect instance.
  3. Run the following command to manually create connector tasks. Adjust topics to configure the Kafka topic to be ingested and set your Datadog api_key.
  curl localhost:8083/connectors -X POST -H "Content-Type: application/json" -d '{
    "name": "datadog-kafka-connect-logs",
    "config": {
      "connector.class": "com.datadoghq.connect.logs.DatadogLogsSinkConnector",
      "datadog.api_key": "<YOUR_API_KEY>",
      "tasks.max": "3",
      "topics":"<YOUR_TOPIC>",
    }
  }'    
  4. You can verify that data is ingested into the Datadog platform by searching for source:kafka-connect in the Log Explorer tab.
  5. Use the following commands to check status and manage connectors and tasks:
    # List active connectors
    curl http://localhost:8083/connectors

    # Get datadog-kafka-connect-logs connector info
    curl http://localhost:8083/connectors/datadog-kafka-connect-logs

    # Get datadog-kafka-connect-logs connector config info
    curl http://localhost:8083/connectors/datadog-kafka-connect-logs/config

    # Delete datadog-kafka-connect-logs connector
    curl http://localhost:8083/connectors/datadog-kafka-connect-logs -X DELETE

    # Get datadog-kafka-connect-logs connector task info
    curl http://localhost:8083/connectors/datadog-kafka-connect-logs/tasks

See the Confluent documentation for additional REST examples.

Configuration

After Kafka Connect is brought up on every host, all of the Kafka Connect instances will form a cluster automatically. A REST call can be executed against one of the cluster instances, and the configuration will automatically propagate to all instances in the cluster.

Parameters

Required Parameters

Name | Description | Default Value
name | Connector name. A consumer group with this name will be created with tasks to be distributed evenly across the connector cluster nodes. |
connector.class | The Java class used to perform connector jobs. Keep the default unless you modify the connector. | com.datadoghq.connect.logs.DatadogLogsSinkConnector
tasks.max | The number of tasks generated to handle data collection jobs in parallel. The tasks will be spread evenly across all Datadog Kafka Connector nodes. |
topics | Comma separated list of Kafka topics for Datadog to consume. | prod-topic1,prod-topic2,prod-topic3
datadog.api_key | The API key of your Datadog platform. |

General Optional Parameters

Name | Description | Default Value
datadog.site | The site of the Datadog intake to send logs to (for example 'datadoghq.eu' to send data to the EU site). | datadoghq.com
datadog.url | Custom Datadog URL endpoint where your logs will be sent. datadog.url takes precedence over datadog.site. Example: http-intake.logs.datadoghq.com:443 |
datadog.tags | Tags associated with your logs in a comma separated tag:value format. |
datadog.service | The name of the application or service generating the log events. |
datadog.hostname | The name of the originating host of the log. |
datadog.proxy.url | Proxy endpoint when logs are not directly forwarded to Datadog. |
datadog.proxy.port | Proxy port when logs are not directly forwarded to Datadog. |
datadog.retry.max | The number of retries before the output plugin stops. | 5
datadog.retry.backoff_ms | The time in milliseconds to wait following an error before a retry attempt is made. | 3000
datadog.add_published_date | Valid settings are true or false. When set to true, the timestamp is retrieved from the Kafka record and passed to Datadog as published_date. |
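
For example, an illustrative sink configuration combining several of these optional parameters (all values below are placeholders, not recommendations):

connector.class=com.datadoghq.connect.logs.DatadogLogsSinkConnector
topics=<YOUR_TOPIC>
datadog.api_key=<YOUR_API_KEY>
datadog.site=datadoghq.eu
datadog.tags=env:prod,team:platform
datadog.service=my-service
datadog.retry.max=5
datadog.retry.backoff_ms=3000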

Troubleshooting performance

To improve performance of the connector, you can try the following options:

  • Update the number of records fetched per poll by setting consumer.override.max.poll.records in the plugin configuration (see the example after this list). This plugin sends batches of records synchronously with each poll, so a low number of records per poll will reduce throughput. Consider setting this to 500 or 1000.
  • Increase the number of parallel tasks by adjusting the tasks.max parameter. Only do this if the hardware is underutilized, such as low CPU, low memory usage, and low data ingestion throughput. Do not set more tasks than partitions.
  • Increase hardware resources on cluster nodes in case of resource exhaustion, such as high CPU, or high memory usage.
  • Increase the number of Kafka Connect nodes.
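
One illustrative combination of the first two settings (example values, not recommendations):

consumer.override.max.poll.records=500
tasks.max=3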

Single Message Transforms

Kafka Connect supports Single Message Transforms that let you change the structure or content of a message. To experiment with this feature, try adding these lines to your sink connector configuration:

transforms=addExtraField
transforms.addExtraField.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addExtraField.static.field=extraField
transforms.addExtraField.static.value=extraValue

Now if you restart the sink connector and send some more test messages, each new record should have an extraField field with the value extraValue. For a more in-depth video, see Confluent's documentation.
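
If you prefer to apply this through the Connect REST API, you can resubmit the whole configuration (including the transform) to the existing connector. A sketch, with the connector name, topic, and API key as placeholders:

curl localhost:8083/connectors/datadog-kafka-connect-logs/config -X PUT -H "Content-Type: application/json" -d '{
  "connector.class": "com.datadoghq.connect.logs.DatadogLogsSinkConnector",
  "datadog.api_key": "<YOUR_API_KEY>",
  "topics": "<YOUR_TOPIC>",
  "transforms": "addExtraField",
  "transforms.addExtraField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
  "transforms.addExtraField.static.field": "extraField",
  "transforms.addExtraField.static.value": "extraValue"
}'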

Testing

Unit Tests

To run the supplied unit tests, run mvn test from the root of the project.

System Tests

We use Confluent Platform for a batteries-included Kafka environment for local testing. Follow the guide here to install the Confluent Platform.

Then, install the Confluent Kafka Datagen Connector to create sample data of arbitrary types. Install this Datadog Logs Connector by running confluent-hub install target/components/packages/<connector-zip-file>.

In the /test directory there are some .json configuration files to make it easy to create connectors. There are configurations for both the Datagen Connector with various data types and the Datadog Logs Connector. For the latter, you will need to add a valid Datadog API key before uploading the .json to Confluent Platform.
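
For illustration, a Datadog Logs Connector configuration in the style of those files might look like the following (the file contents, connector name, and topic here are hypothetical; check the actual files in /test):

{
  "name": "datadog-logs-test",
  "config": {
    "connector.class": "com.datadoghq.connect.logs.DatadogLogsSinkConnector",
    "tasks.max": "1",
    "topics": "<DATAGEN_TOPIC>",
    "datadog.api_key": "<YOUR_API_KEY>"
  }
}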

Once your connectors are set up, you will be able to test them and see relevant data. For performance tests, you can also use the following command packaged with the Confluent Platform:

kafka-producer-perf-test --topic perf-test --num-records 2000000 --record-size 100 --throughput 25000 --producer-props bootstrap.servers=localhost:9092 --print-metrics true

License

Datadog Kafka Connect Logs is licensed under the Apache License 2.0. Details can be found in the file LICENSE.


datadog-kafka-connect-logs's Issues

bytes conversion to decimal type

We have Kafka messages defined by an Avro schema. Some fields are declared as bytes but have Decimal as the logical type, like this:

{"name": "myField", "type": ["null", {"type": "bytes", "logicalType": "decimal", "precision": 11, "scale": 2}], "default": null, "doc": "BigDecimal with precision 11 scale 2"},

When messages are imported into Datadog as logs, it appears that myField is set as bytes instead of its logical type, Decimal.

Am I missing some configuration, or is this currently not supported?

Truncate messages if compressed size exceeds 5MB

The Logs Intake API has a limit on the total size of a payload (5 MB). We should figure out logic to truncate these messages while retaining their JSON structure so they can be ingested properly.

Example of structure:

[
  {
    "ddsource": "kafka-connect",
    "message": "hello"
  },
  {
    "ddsource": "kafka-connect",
    "message": "world"
  }
]
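
One possible splitting strategy, sketched below, packs entries greedily into sub-batches that stay under the size limit. This is only an illustration, not the connector's current implementation; the class and method names are made up, and uncompressed size is used as a conservative stand-in for compressed size.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // 5 MB payload limit from the Logs Intake API.
    private static final int MAX_PAYLOAD_BYTES = 5 * 1024 * 1024;

    // Greedily packs individual JSON log entries into batches whose serialized
    // JSON-array form stays under MAX_PAYLOAD_BYTES. Entries larger than the
    // limit on their own are skipped here; a real implementation would instead
    // truncate their "message" field while keeping the JSON structure intact.
    public static List<List<String>> split(List<String> jsonEntries) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 2; // account for the surrounding "[" and "]"
        for (String entry : jsonEntries) {
            int entryBytes = entry.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for ","
            if (entryBytes > MAX_PAYLOAD_BYTES) {
                continue; // placeholder: truncate instead of dropping
            }
            if (currentBytes + entryBytes > MAX_PAYLOAD_BYTES && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 2;
            }
            current.add(entry);
            currentBytes += entryBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}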

Change HTTP Connection to open at startup and close on shutdown

Currently, the connector opens and closes an HTTP connection for every batch it sends. To reduce network load, we can change the logic to open a connection once and keep it alive until either:

a) The task stops
b) The connect worker stops

Special care needs to be given so that the connection is closed even in error states to avoid resource leaks.
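
A minimal sketch of that lifecycle, assuming Apache HttpClient as the HTTP library (the actual connector may use a different client; everything other than the SinkTask methods is illustrative):

import java.io.IOException;
import java.util.Collection;
import java.util.Map;

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class ReusableClientSinkTask extends SinkTask {
    private CloseableHttpClient httpClient;

    @Override
    public void start(Map<String, String> props) {
        // Open the client once when the task starts; its connection pool
        // keeps connections alive between batches.
        httpClient = HttpClients.createDefault();
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Reuse httpClient to POST each batch instead of opening a new
        // connection per batch. (POST logic omitted in this sketch.)
    }

    @Override
    public void stop() {
        // Close the client even on error paths to avoid leaking sockets.
        if (httpClient != null) {
            try {
                httpClient.close();
            } catch (IOException e) {
                // log and continue; the task is shutting down anyway
            }
        }
    }

    @Override
    public String version() {
        return "sketch";
    }
}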

Implement exactly-once support

Kafka Connect allows sink connectors to optionally implement exactly once behavior by tracking in the external system the offsets for each topic partition. Most users would prefer exactly once if given the choice, so where possible and where feasible sink connectors should implement this behavior.

To handle exactly-once semantics for message delivery, the Source Connector must correctly map the committed offsets to the Kafka cluster with some analog within the source data system, and then handle the necessary rewinding should messages need to be re-delivered. For example, consider a trivial Source connector that publishes the lines from an input file to a Kafka topic one line at a time ... prefixed by the line number. The commit* methods for that connector would save the line number of the posted record ... and then pick up at that location upon a restart.
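
For a sink connector like this one, the nearest hook in the Connect API is SinkTask.preCommit, which controls which consumer offsets get committed. A minimal sketch under that assumption (the class and helper names are illustrative; on its own this gives precise at-least-once delivery, and true exactly-once would additionally require deduplication on the Datadog side):

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class OffsetTrackingSinkTask extends SinkTask {
    // Offsets of records that have been acknowledged by the Logs Intake API.
    private final Map<TopicPartition, OffsetAndMetadata> delivered = new HashMap<>();

    // Called by the batch-sending code after a successful delivery.
    protected void markDelivered(TopicPartition tp, long offset) {
        delivered.put(tp, new OffsetAndMetadata(offset + 1));
    }

    @Override
    public Map<TopicPartition, OffsetAndMetadata> preCommit(
            Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        // Only let Connect commit offsets that were actually delivered, so a
        // restart replays at most the undelivered records.
        return new HashMap<>(delivered);
    }
}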

Implement specialized logic for different HTTP error codes

The Logs Intake API returns different error codes for different situations:

413: For messages exceeding the 5MB limit
408: If the connection times out.

We should check whether a simple retry is the correct response for each error code, or whether specialized logic would handle some of them better.
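
A sketch of what such a policy could look like; only 413 and 408 come from this issue, and the other codes and actions are assumptions:

public class IntakeErrorPolicy {
    enum Action { SUCCESS, SPLIT_OR_TRUNCATE, RETRY_WITH_BACKOFF, FAIL_BATCH }

    // Maps an HTTP status code from the Logs Intake API to a follow-up action.
    static Action handleStatus(int status) {
        if (status >= 200 && status < 300) {
            return Action.SUCCESS;
        }
        switch (status) {
            case 413: // payload over the 5 MB limit: retrying the same batch cannot succeed
                return Action.SPLIT_OR_TRUNCATE;
            case 408: // request timeout: safe to retry with backoff
            case 429: // assumed rate limiting
                return Action.RETRY_WITH_BACKOFF;
            default:
                return status >= 500 ? Action.RETRY_WITH_BACKOFF : Action.FAIL_BATCH;
        }
    }
}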

Hardcoded URL doesn't allow sending logs to a different endpoint such as the EU site

Describe what happened:
I am using this Datadog Kafka connector, but my Datadog system is operating in the EU.

Describe what you expected:
To be able to configure the URL through parameters in case a URL other than the US one needs to be used.

Steps to reproduce the issue:
Try to send logs to the EU Datadog endpoint.

Additional environment details (Operating System, Cloud provider, etc):
