Comments (13)
Any updates on this issue?
from kafka-connect-elasticsearch.
Is there any update on this enhancement?
I'm also looking for this support, as it's going to dramatically improve the Elasticsearch queries we are currently running.
Why would passing the pipeline as a parameter be the desired approach rather than setting index.default_pipeline on the ES side of things?
You might need to apply a different pipeline depending on the type of log you're trying to index.
Say I have a topic "es-logs" that the connector uses to index documents into the index "es-prod-logs", and it contains both system and Apache logs. I'd like the Apache logs to be parsed by the default Filebeat Apache pipeline, while the system logs are parsed by the default Filebeat syslog pipeline.
That would be useful because Filebeat has a metadata field that specifies the ingest pipeline to use (https://www.elastic.co/guide/en/logstash/current/use-ingest-pipelines.html).
As a workaround, you could have Filebeat conditionally send logs to different topics.
If the logs come from the Apache log folder, send them to the "elk_apache_logs" topic, which then indexes them into the "elk_apache_logs" index, whose "index.default_pipeline" setting points at the Filebeat Apache pipeline.
The same would apply to system logs, and so on.
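The workaround above can be sketched with Filebeat's Kafka output, which supports conditional topic selection. The topic names, broker address, and log paths below are illustrative, not taken from this thread:

```yaml
# filebeat.yml (sketch): route events to a per-source topic so each
# Elasticsearch index can keep its own index.default_pipeline setting.
output.kafka:
  hosts: ["kafka:9092"]
  topic: "elk_system_logs"          # fallback topic
  topics:
    - topic: "elk_apache_logs"
      when.contains:
        log.file.path: "/var/log/apache2"
    - topic: "elk_system_logs"
      when.contains:
        log.file.path: "/var/log/syslog"
```

Each topic then feeds a connector sink whose target index has "index.default_pipeline" set to the matching Filebeat pipeline, so no connector-side pipeline option is needed.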
Why would passing the pipeline as a parameter be the desired approach rather than setting index.default_pipeline on the ES side of things?
Given that starting point, I can see the use case.
I think the blocker is mostly the Jest client for now, as it does not support that.
Hello, we also need this feature. Since this Elasticsearch connector is based on the Bulk API (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), maybe it's possible to add an option to the configuration file to pass user-defined URL parameters?
For example, when writing data to Elasticsearch, the connector would call http://<elasticsearch_url>/_bulk?pipeline=some_pipeline, where the parameters come from an option defined in the properties file: bulk.url.parameters=pipeline=some_pipeline. By default, this option would be empty.
That's the basic idea; the parameters might be better defined as a map of key-value pairs.
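The proposal above could be sketched as a small helper that joins a key-value map (parsed from a hypothetical "bulk.url.parameters" option) into the query string of the _bulk endpoint. The class and method names are illustrative; nothing here exists in the connector:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class BulkUrlBuilder {

    // Hypothetical helper: appends user-defined query parameters
    // (e.g. {"pipeline": "some_pipeline"}) to the _bulk endpoint URL.
    static String bulkUrl(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl + "/_bulk";
        }
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return baseUrl + "/_bulk?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("pipeline", "some_pipeline");
        System.out.println(bulkUrl("http://localhost:9200", params));
        // prints http://localhost:9200/_bulk?pipeline=some_pipeline
    }
}
```

An empty option would simply leave the URL untouched, keeping today's behavior as the default.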
Is there any update on this feature? We need it in our production environment too, thanks.
I see a few commits on this issue, and also code that provides a pipeline config parameter. Does it work? I'm trying to specify a pipeline in my sink config file, but nothing happens.
We also require this option in production. The only other viable workaround is to ditch this connector and use Logstash to consume messages from Apache Kafka, adding another potential point of failure and making our Elasticsearch deployment more complicated at scale.
Pipelines are an integral part of processing data with Elasticsearch; please consider merging the branches above.
I see a few commits on this issue, and also code that provides a pipeline config parameter. Does it work? I'm trying to specify a pipeline in my sink config file, but nothing happens.
There was a merge request (almost a year ago), but master doesn't seem to contain any related changes in the config code:
ElasticsearchSinkConnectorConfig.java
That's strange, because this feature has been requested by many users for a long time.
The lack of this feature is a deciding factor in choosing Logstash over kafka-connect-elasticsearch.
As agi0rgi mentioned, being able to apply a pipeline at runtime based on the topic, key, or message attributes is a common need for ingest processes.