Comments (14)
@ewencp I'd be happy to take a stab at this. I've got the code already and it is running well so far in our prototype connect deployment. I'll just productionalize it a bit and put up a PR for discussion.
from kafka-connect-elasticsearch.
We're using secure Elasticsearch in PROD, and now we need to sink some topics in IT using Kafka Connect (we've been doing it with Spark Streaming).
@zzbennett I think this feature is needed.
How can I help? @ewencp @thomasdziedzic @jdsiddon @zzbennett so we can move forward with this PR.
Linking for anyone else who comes across this: it looks like there's now a PR for this, #330.
The complication we ran into when trying to use Elasticsearch on AWS with the IP range restriction suggested above is that it also limits requests to the Kibana instance that AWS gives you out of the box. It might not be a big problem depending on your use case, but it's worth noting.
This would be really useful. I'm wondering how to implement this without adding dependencies on a bunch of AWS libraries that most people will not need or want on their classpath.
Perhaps with a new property that specifies the class name of a request interceptor? This property could then be populated with the class name of an AWS request interceptor (like this one: https://github.com/inreachventures/aws-signing-request-interceptor), which adds the required AWS authentication to the ES requests. Then, if you are using AWS's ES, you can drop the required jars into your classpath and specify the request interceptor config in your ES connector config. It's a little cumbersome; I'm open to other ideas.
I have forked this repo in order to add AWS request signing, but I would like to contribute a solution upstream so I don't need to maintain a separate fork just for AWS's auth stuff.
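A minimal sketch of the interceptor-by-class-name idea, assuming a hypothetical `RequestInterceptor` interface and a hypothetical `elastic.request.interceptor.class` config key (neither exists in the connector today). The connector would instantiate the named class via reflection and let it adjust every outgoing request, so AWS jars only need to be on the classpath when someone actually configures an AWS interceptor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical extension point; kafka-connect-elasticsearch does not define this interface.
interface RequestInterceptor {
    // Called once per outgoing request; may add or modify headers (e.g. auth headers).
    void intercept(String method, String path, Map<String, String> headers);
}

// Example of a user-supplied interceptor that would live in a separate jar.
class StaticHeaderInterceptor implements RequestInterceptor {
    @Override
    public void intercept(String method, String path, Map<String, String> headers) {
        headers.put("X-Example-Auth", "token");
    }
}

class InterceptorLoader {
    // Hypothetical config key naming the interceptor implementation.
    static final String INTERCEPTOR_CLASS_CONFIG = "elastic.request.interceptor.class";

    // Instantiates the configured interceptor via reflection; returns null if none is set.
    static RequestInterceptor load(Map<String, String> config) {
        String className = config.get(INTERCEPTOR_CLASS_CONFIG);
        if (className == null || className.isEmpty()) {
            return null;
        }
        try {
            return Class.forName(className)
                    .asSubclass(RequestInterceptor.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load interceptor " + className, e);
        }
    }

    // Builds the headers for a request and lets the interceptor (if any) adjust them.
    static Map<String, String> prepareHeaders(RequestInterceptor interceptor, String method, String path) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Content-Type", "application/json");
        if (interceptor != null) {
            interceptor.intercept(method, path, headers);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(INTERCEPTOR_CLASS_CONFIG, StaticHeaderInterceptor.class.getName());
        RequestInterceptor interceptor = load(config);
        System.out.println(prepareHeaders(interceptor, "POST", "/_bulk"));
    }
}
```

Returning `null` when the key is unset keeps the default path free of any extra classes; an AWS signing interceptor would compute its headers from the method, path, and body instead of adding a static value.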
@thomasdziedzic @zzbennett Definitely seems like a good idea -- I think this will be a matter of exposing a few more configs that are specific to AWS and then wiring up the auth pieces. There's an example of how to do the auth steps in this Jest issue and #77 is working on adding basic authentication support. If anyone is interested in taking a stab, I'd be happy to guide development and review a PR!
Okay, so I'm back to working on the ES connector. I've been mulling this over and although the modifications involved for supporting the AWS authentication are simple, implementing them in a "pluggable" way is somewhat trickier.
Inspired by the pluggable partitioners and formatters in the S3/HDFS connector, this is a possible solution:
Abstract the ES client logic. Currently the connector depends directly on the JestClient and the JestClientFactory. Rather than depending directly on the JestClient for executing ES requests, we could add an ESClient interface and a default implementation that will use the current JestClient logic. A config would be added containing the classname of the ESClient implementation, which would get instantiated using reflection. Most people would use the default for this config, but for people needing the AWS auth (or any kind of special logic around querying ES), they could plop an implementation of the ESClient on their classpath that provides the AWS authentication and change the ESClient classname config. The downsides are it requires a new config that most people won't need to touch, and handling pluggability this way can get a bit unwieldy. It does give users complete control over how the connector queries ES, which could be useful, like if they are doing something fancy like routing to different ES clusters.
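A rough sketch of the proposed abstraction, assuming a hypothetical `ESClient` interface, a `DefaultESClient` stand-in (which in the real connector would wrap the existing Jest logic), and a hypothetical `elastic.client.class` config key. It only shows the reflective "instantiate the configured class, else the default" wiring, not real indexing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction from the proposal; the connector currently calls JestClient directly.
interface ESClient extends AutoCloseable {
    void index(String index, String id, String jsonDocument);

    @Override
    void close();
}

// Stand-in for the default implementation, which would wrap the existing Jest logic.
// It only records calls here so the sketch runs without a cluster.
class DefaultESClient implements ESClient {
    final List<String> indexed = new ArrayList<>();

    @Override
    public void index(String index, String id, String jsonDocument) {
        indexed.add(index + "/" + id);
    }

    @Override
    public void close() {
        // Nothing to release in this stand-in.
    }
}

class ESClientFactory {
    // Hypothetical config key; most users would leave it at the default.
    static final String CLIENT_CLASS_CONFIG = "elastic.client.class";
    static final String DEFAULT_CLIENT_CLASS = DefaultESClient.class.getName();

    // Reflectively instantiates the configured ESClient, falling back to the default.
    static ESClient create(Map<String, String> config) {
        String className = config.getOrDefault(CLIENT_CLASS_CONFIG, DEFAULT_CLIENT_CLASS);
        try {
            return Class.forName(className)
                    .asSubclass(ESClient.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot instantiate ESClient " + className, e);
        }
    }

    public static void main(String[] args) {
        try (ESClient client = create(new HashMap<>())) {
            client.index("logs", "1", "{\"msg\":\"hello\"}");
            System.out.println(client.getClass().getSimpleName()); // prints "DefaultESClient"
        }
    }
}
```

An AWS-signing implementation would ship as a separate jar and be selected by setting the class-name config, so users who never touch that setting never load AWS classes.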
Honestly though, for this particular issue it might make more sense to stand up a reverse proxy that will handle the authentication. AWS's ES can do IP based access control, so you could just set up a vanilla nginx reverse proxy and whitelist its IP. Or you could set the proxy up with this.
I guess it boils down to whether it is worth it to abstract the ESClient or not. If the ESClient abstraction makes sense for purposes besides AWS authentication, then handling authentication that way could be easier; otherwise, the reverse proxy is probably the way to go.
Has there been any updates on this issue?
Since this issue was created, AWS has released VPC-based Elasticsearch clusters, which don't require signing requests, so there isn't as much of a need for this feature anymore.
@elarib What exactly do you mean by secure Elasticsearch? Is your Elasticsearch cluster deployed in AWS? And if so, is it deployed in a VPC, or do you access it over the public internet?
@zzbennett Yesterday I created a pull request with a description of this use case: #185
There are use cases for securing ES to get multitenancy, using ES X-Pack or Search Guard.
#216 implements basic auth via the JEST client. Does that satisfy this request? If so, we can close this issue.
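For comparison with the AWS case: HTTP basic auth, which #216 wires up through Jest, amounts to sending a static `Authorization` header containing base64 of `username:password` (RFC 7617). A stdlib-only sketch of that header value (illustrative, not Jest's or the connector's code); because the credential is static rather than a per-request signature, it does not by itself cover the SigV4 requirement discussed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {
    // Value of an HTTP "Authorization" header for Basic auth: "Basic " + base64("user:pass").
    // Only safe over HTTPS, since base64 is an encoding, not encryption.
    static String value(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(value("Aladdin", "open sesame")); // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```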
Anyone working on a PR for this? Planning to do so myself if not...
We have a company policy that requires signing as per https://docs.aws.amazon.com/general/latest/gr/signature-v4-examples.html
Perhaps a fork specific to AWS Elasticsearch, to avoid adding AWS dependencies to this connector in general? Seems a bit heavyweight either way.
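The core of the Signature Version 4 process on the linked AWS page is deriving a signing key from the secret access key through chained HMAC-SHA256 over the date, region, service, and the literal `aws4_request`, then HMAC-ing the "string to sign" with that key. A stdlib-only sketch of that step; the class and method names are illustrative, and production code would more likely delegate to the AWS SDK's signer. For Amazon ES the service name in the scope is `es`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SigV4Signer {
    // HMAC-SHA256 of UTF-8 data under the given key.
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // SigV4 signing key: chained HMACs over the date (yyyyMMdd), region, service, and "aws4_request".
    static byte[] signingKey(String secretAccessKey, String dateStamp, String region, String service) {
        byte[] kDate = hmacSha256(("AWS4" + secretAccessKey).getBytes(StandardCharsets.UTF_8), dateStamp);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    // The request signature is the hex-encoded HMAC of the string to sign under the signing key.
    static String signature(byte[] signingKey, String stringToSign) {
        return toHex(hmacSha256(signingKey, stringToSign));
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // AWS's documented example secret key, not a real credential.
        byte[] key = signingKey("wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "20150830", "us-east-1", "es");
        System.out.println(signature(key, "example string to sign"));
    }
}
```

The signing key only depends on the date, region, and service, so a client can cache it for the day and reuse it across requests.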
Hi,
About adding AWS-specific security support, I agree with @joncourt's approach here. AWS has a lot of options (including the necessary dependencies) that are very specific to AWS.
The important bit, as @joncourt already pointed out, is that you have to issue Signature Version 4 signed requests, which basically means wrapping all of your interaction with the search engine. This signing is of no benefit to any other Elasticsearch installation.
Access control is done with IAM policies, which basically allow or deny HTTP verbs against resources. These policies let you authorise based on identity, but also on source, etc. This is where the signature and the policies together do the work of authorisation, at least to my understanding.
From their blog:
A note about authentication, which applies to both types of policies: you can use two strategies to authenticate Amazon ES requests. The first is based on the originating IP address. You can omit the Principal from your policy and specify an IP Condition. In this case, and barring a conflicting policy, any call from that IP address will be allowed access or be denied access to the resource in question. The second strategy is based on the originating Principal. In this case, you are required to include information that AWS can use to authenticate the requestor as part of every request to your Amazon ES endpoint, which you accomplish by signing the request using Signature Version 4. Later in this post, I provide an example of how you can sign a simple request against Amazon ES using Signature Version 4.
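The "information AWS can use to authenticate the requestor" in the quote is built in two steps before any HMAC: a canonical form of the request, then a string to sign that embeds the credential scope and a hash of that canonical request. A simplified stdlib-only sketch, assuming the path and query are already URI-encoded and header values need only trimming; the class name and the endpoint in `main` are hypothetical. The resulting signature ends up in an `Authorization: AWS4-HMAC-SHA256 Credential=..., SignedHeaders=..., Signature=...` header.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SigV4Request {
    // Lowercase hex SHA-256 of UTF-8 data.
    static String sha256Hex(String data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Header names lowercased and sorted, values trimmed, as the canonical form requires.
    static TreeMap<String, String> normalize(Map<String, String> headers) {
        TreeMap<String, String> sorted = new TreeMap<>();
        headers.forEach((name, value) -> sorted.put(name.toLowerCase(Locale.ROOT), value.trim()));
        return sorted;
    }

    // Semicolon-separated list of the signed header names, e.g. "host;x-amz-date".
    static String signedHeaders(Map<String, String> headers) {
        return String.join(";", normalize(headers).keySet());
    }

    // Simplified canonical request: assumes path and query are already URI-encoded and
    // header values contain no runs of internal whitespace that would need collapsing.
    static String canonicalRequest(String method, String path, String query,
                                   Map<String, String> headers, String payload) {
        StringBuilder canonicalHeaders = new StringBuilder();
        normalize(headers).forEach((name, value) ->
                canonicalHeaders.append(name).append(':').append(value).append('\n'));
        return method + "\n" + path + "\n" + query + "\n"
                + canonicalHeaders + "\n"
                + signedHeaders(headers) + "\n"
                + sha256Hex(payload);
    }

    // String to sign: algorithm, timestamp, credential scope, and the hash of the canonical request.
    static String stringToSign(String amzDate, String dateStamp, String region, String service,
                               String canonicalRequest) {
        String scope = dateStamp + "/" + region + "/" + service + "/aws4_request";
        return "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n" + sha256Hex(canonicalRequest);
    }

    public static void main(String[] args) {
        // Hypothetical domain endpoint; "es" is the service name used in the scope for Amazon ES.
        Map<String, String> headers = Map.of(
                "Host", "search-mydomain.us-east-1.es.amazonaws.com",
                "X-Amz-Date", "20150830T123600Z");
        String canonical = canonicalRequest("GET", "/", "", headers, "");
        System.out.println(stringToSign("20150830T123600Z", "20150830", "us-east-1", "es", canonical));
    }
}
```

Because the payload hash is part of the canonical request, every bulk request the connector sends would need to be hashed and signed individually, which is why the thread talks about wrapping all interaction with the cluster.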
I would recommend doing it in a way where people not using AWS don't have to carry a heavy load of AWS dependencies, for example by using a fork.
We should also not forget that Elasticsearch supports security through X-Pack, which is another way of adding security on top of it. Beyond that, a smaller number of people use https://search-guard.com/ as a security solution for Elasticsearch.
To me, all of this calls for a portable solution that lets people use their own module for security and auth.
I hope that makes sense.
Hello All,
A bit curious: I am trying to pull data out of AWS MSK via the connector into AWS ES. Can anyone shed some light on how I can configure the signer, or any other way to index into AWS ES?
PS: I am able to connect to AWS MSK; I just want some help with indexing into ES.