databrickslabs / splunk-integration
Databricks Add-on for Splunk
Home Page: https://splunkbase.splunk.com/app/5416/
License: Other
Hello team.
Since Splunk DB Connect doesn't support SQL warehouse endpoints, the next option is the Databricks Add-on for Splunk: https://splunkbase.splunk.com/app/5416. However, with the add-on the SQL warehouse must be running every time I need to fetch data. Is there a way to trigger a start action, the way Splunk DB Connect does (https://splunkbase.splunk.com/app/2686)?
The customer is not willing to use the add-on if they have to start the SQL warehouse manually every single time. Is there any way to mimic how Splunk DB Connect works with all-purpose clusters, so that SQL warehouse endpoints can be used the same way?
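One option while waiting for built-in support: the SQL Warehouses REST API exposes a start endpoint, so an automation step could start the warehouse before querying. A minimal sketch using only the standard library (the host, warehouse ID, and PAT are placeholders you must supply; error handling is omitted):

```python
import urllib.request

def warehouse_start_url(host: str, warehouse_id: str) -> str:
    # Build the SQL Warehouses API "start" endpoint URL.
    return f"{host.rstrip('/')}/api/2.0/sql/warehouses/{warehouse_id}/start"

def start_warehouse(host: str, warehouse_id: str, token: str) -> int:
    # POST with an empty body; a personal access token authenticates the call.
    req = urllib.request.Request(
        warehouse_start_url(host, warehouse_id),
        data=b"",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 on success; starting is asynchronous
```

Note that starting is asynchronous, so a caller would still need to poll the warehouse state (via `GET /api/2.0/sql/warehouses/{id}`) until it reports RUNNING before issuing queries.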
According to the known issues section of the documentation, the add-on logs to var/log/splunk/ta_databricks.log and var/log/TA-Databricks/<command_name>command.log. This is inconsistent with standard Splunk apps/add-ons, which should log under /var/log/splunk with a filename that indicates the source (i.e., ta_databricks) and any subcomponent as required (as an example, ta_databricks_.log).
The logging format should also match that of standard Splunk logs so that entries are automatically ingested and parsed correctly. The documentation also states that indistinct/unclear error messages may be displayed in the UI, which is not helpful to the analysts who encounter them. A useful error message should always be shown in the UI to aid troubleshooting, rather than forcing users to inspect the logs after every failure.
We updated our add-on and the databricksquery command no longer works (with either PAT or Azure service principal authentication). I can't find any clues in _internal related to the Python error; the search log only shows:
07-28-2023 10:24:26.355 INFO ServerConfig [1401872 searchOrchestrator] - Will add app jailing prefix /opt/splunk/bin/nsjail-wrapper for TA-Databricks
07-28-2023 10:24:26.355 INFO ChunkedExternProcessor [1401872 searchOrchestrator] - Running process: /opt/splunk/bin/nsjail-wrapper /opt/splunk/bin/python3.7 /opt/splunk/etc/apps/TA-Databricks/bin/databricksquery.py
07-28-2023 10:24:27.238 INFO ChunkedExternProcessor [1401872 searchOrchestrator] - Custom search command is a generating command.
07-28-2023 10:24:27.238 WARN ChunkedExternProcessor [1401872 searchOrchestrator] - Error adding inspector message: invalid level or message already exists
.......
07-28-2023 10:24:27.329 ERROR ChunkedExternProcessor [1401944 phase_1] - Error in 'databricksquery' command: External search command exited unexpectedly with non-zero error code 1.
Also, version 1.2 does not appear to have been committed to this repo.
https://github.com/databrickslabs/splunk-integration/blob/8389c72498825c9bb9306e2b20fe33bfee209e35/app/app.manifest#L8C21-L8C21
I see this integration addon allows users to query Databricks from Splunk. Does it support Databricks SQL endpoints as well as interactive clusters?
Python 2 has reached end of life and no longer receives security updates. Removing it will make the code more readable and maintainable by allowing the use of Python 3 features.
Following the Splunk DB Connect Guide for Databricks, when creating a connection to Databricks and testing it in Data Lab >> SQL Explorer, I get the error "Cannot get schemas".
This is the combination used (I tried others as well, but the error thrown is always the same):
DB connect: 3.12.2
JDBC driver: 2.6
JAVA: JRE 11
The connection fails with the following errors in splunk/var/log/splunk/splunk_app_db_connect_server.log:
ERROR c.s.d.s.a.s.d.impl.DatabaseMetadataServiceImpl - Unable to get schemas metadata
[...]
.ErrorPropagationThriftHandler:runSafely:ErrorPropagationThriftHandler.scala:119], errorCode:0, errorMessage:Configuration CONNECTION_TYPE is not available.).
At present, a PAT token must be used for authentication, which raises security concerns about its usage (and potential misuse). Being able to use service principals (SPNs) would improve security and traceability while simplifying the situation from a compliance perspective.
According to the setup documentation, it reads as if only a single cluster can be defined within the platform at any given time. Since multiple clusters can exist, and there may be a desire to search across them by specifying the 'cluster=' parameter shown in the screenshots, this functionality should be added. If it already exists, the documentation should be updated to clearly state that multiple clusters can be added and used at the same time, with screenshots for reference (to avoid any confusion).
The instructions are only valid for JDBC driver version 2.6.22 and earlier. In later versions the driver class paths have changed, so db_connection_types.conf needs to reflect those changes.
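For illustration, a db_connection_types.conf stanza for the newer drivers might look like the sketch below. This is an assumption-laden example, not the official fix: the stanza name, `serviceClass`, and URL format should be checked against the DB Connect guide; the key change is that recent Databricks JDBC drivers use the `com.databricks.client.jdbc.Driver` class (and `jdbc:databricks://` URLs) in place of the older Simba `com.simba.spark.jdbc.Driver` class.

```
[databricks]
displayName = Databricks
serviceClass = com.splunk.dbx2.DefaultDBX2JDBC
# Older drivers (<= 2.6.22) used com.simba.spark.jdbc.Driver
jdbcDriverClass = com.databricks.client.jdbc.Driver
# Hypothetical URL template; fill in host/port/httpPath per your workspace
jdbcUrlFormat = jdbc:databricks://<host>:<port>/default
port = 443
```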
According to the custom commands section of the documentation, a user requires either the 'admin_all_objects' or the 'list_storage_passwords' capability to use the add-on. From a security perspective neither is viable: the first grants the user full admin privileges on the platform, while the second lets the user see all stored passwords for the apps/add-ons they can access.
This requirement prevents the app from being used in the majority of environments, and it really needs to be rewritten to use proper access control that doesn't reveal credentials to non-admins. While an admin should be able to see (and change) the configuration of any defined cluster, a normal user should only have access to clusters that share the same role (i.e., databricks_cluster_xxxxxx), similar to the functionality DB Connect provides.
Databricks now supports:
There doesn't appear to be a documented way to retrieve, via any of the custom commands, the allowed list of notebooks (and their parameters) or any job IDs from the Databricks platform. From an integration perspective, being able to query these would allow concise dashboards (and field validation, for example) to be provided to the end user.
Without them, significant manual effort is required to provide a usable front end within Splunk, especially if there are frequent changes on the Databricks side. If they don't already exist, new custom commands to retrieve details of notebooks/jobs would be beneficial to this end.
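Until such commands exist, the same information can be pulled directly from the Databricks REST API (the Workspace API lists notebooks under a path, and the Jobs 2.1 API lists jobs). A minimal standard-library sketch, with host and token as placeholders:

```python
import json
import urllib.parse
import urllib.request

def api_url(host: str, endpoint: str, **params) -> str:
    # Build a GET URL for a Databricks REST endpoint with query parameters.
    query = f"?{urllib.parse.urlencode(params)}" if params else ""
    return f"{host.rstrip('/')}{endpoint}{query}"

def get_json(host: str, endpoint: str, token: str, **params) -> dict:
    req = urllib.request.Request(
        api_url(host, endpoint, **params),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def list_notebooks(host: str, token: str, path: str = "/") -> list:
    # Workspace API: objects under `path`; NOTEBOOK entries carry their path.
    objs = get_json(host, "/api/2.0/workspace/list", token, path=path)
    return [o["path"] for o in objs.get("objects", [])
            if o.get("object_type") == "NOTEBOOK"]

def list_jobs(host: str, token: str) -> list:
    # Jobs API 2.1: each entry has a job_id and a settings.name.
    jobs = get_json(host, "/api/2.1/jobs/list", token).get("jobs", [])
    return [(j["job_id"], j["settings"]["name"]) for j in jobs]
```

A custom command wrapping these two calls could feed dropdowns in a dashboard, so end users never type notebook paths or job IDs by hand.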
An Azure Databricks customer requested the following correction in line 75 of https://github.com/databrickslabs/splunk-integration/blob/master/notebooks/source/pull_from_splunk.py:
Instead of "pushing data to splunk" it should read "pulling data from splunk".
If you confirm the issue, please proceed with the requested correction.
hey @metrocavich @alexott
Looking at the guide Configuring Splunk DB Connect App For Databricks, it appears the integration only supports configuring an identity with a username/password.
From within Splunk DB Connect, navigate to the Configuration > Databases > Identities tab and click New Identity.
Fill in the appropriate details:
Identity Name: A unique name for the identity.
Username: Enter the Databricks email/username you use for the Databricks instance that you want to connect to.
Note: Ensure that the database user has sufficient access to the data you want to search. For example, you might create a database user account whose access is limited to the data you want Splunk Enterprise to consume.
Password: Enter the password for the user you entered in the Username field.
Is there an option to connect using a "Token Identity" instead of a "Basic Identity" (username/password)?
I have encountered environments that only allow token-based authentication, as opposed to traditional username and password credentials.
As shown in the integration screenshot (databricksquery.png), a user can specify the command_timeout parameter to override how long a search can run. Since this can negatively impact performance on both platforms, a maximum allowable value should be a configuration option to prevent users from setting it too high. If a user tries to exceed this limit, the maximum value should be used instead.
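The requested behavior is a simple server-side clamp. A sketch of the logic, with hypothetical default and ceiling values standing in for whatever the admin would configure:

```python
DEFAULT_TIMEOUT = 300       # seconds; hypothetical default when unset/invalid
MAX_COMMAND_TIMEOUT = 900   # seconds; hypothetical admin-configured ceiling

def effective_timeout(requested, max_allowed=MAX_COMMAND_TIMEOUT):
    # Clamp a user-supplied command_timeout to the configured maximum,
    # falling back to the default for missing or non-positive values.
    if requested is None or requested <= 0:
        return DEFAULT_TIMEOUT
    return min(requested, max_allowed)
```

With this in place a user passing `command_timeout=100000` would silently get the 900-second ceiling instead of an unbounded search.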
According to the limitations section of the documentation, the databricksquery custom command has a limit on the number of results returned (though the limit doesn't appear to be defined). This limitation (stated as being part of the API) inhibits adoption, because results cannot be relied upon: queries that return a larger number of results may be truncated.
If this data is used for security purposes, truncated results could create blind spots in detections. Any query should return either the full result set or a number limited by a defined configuration parameter (to prevent, for example, billions of results being returned).
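One way the add-on could avoid truncation is to page through results rather than taking a single response. Databricks' SQL Statement Execution API returns results in chunks that link to the next chunk. The sketch below assumes a statement has already been submitted and its first result chunk (a dict parsed from the API's JSON) is in hand; endpoint paths follow the public API but should be verified against current docs:

```python
import json
import urllib.request

def chunk_url(host: str, statement_id: str, chunk_index: int) -> str:
    # Result-chunk endpoint of the SQL Statement Execution API.
    return (f"{host.rstrip('/')}/api/2.0/sql/statements/"
            f"{statement_id}/result/chunks/{chunk_index}")

def fetch_all_rows(host, token, statement_id, first_chunk):
    # Follow next_chunk_index links until the result set is exhausted.
    rows, chunk = [], first_chunk
    while chunk is not None:
        rows.extend(chunk.get("data_array", []))
        nxt = chunk.get("next_chunk_index")
        if nxt is None:
            return rows
        req = urllib.request.Request(
            chunk_url(host, statement_id, nxt),
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            chunk = json.load(resp)
    return rows
```

A configurable row cap could still be applied on top of this loop, so large result sets are bounded deliberately rather than truncated silently.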
Currently, we can run jobs with the Databricks Add-on for Splunk using notebooks, and pass parameters using the databricksjob command.
It doesn't support SQL analytics features such as queries, alerts, and dashboards that can be used in Databricks workflows. It would be great to add those to the add-on.
As of mid-October (I don't have the exact date), the databricksquery command stopped working in Splunk Cloud.
Errors from the search job inspector are below.
Could this be related to #9 and Splunk's removal of Python 2?
11-02-2021 16:56:27.691 ERROR ChunkedExternProcessor [35699 searchOrchestrator] - Error in 'databricksquery' command: External search command exited unexpectedly with non-zero error code 1.
11-02-2021 16:56:27.691 INFO ScopedTimer [35699 searchOrchestrator] - search.optimize 0.656035955
11-02-2021 16:56:27.691 INFO SearchPhaseGenerator [35699 searchOrchestrator] - Failed to create phases using AST:Error in 'databricksquery' command: External search command exited unexpectedly with non-zero error code 1.. Falling back to 2 phase mode.
11-02-2021 16:56:27.691 INFO SearchPhaseGenerator [35699 searchOrchestrator] - Executing two phase fallback for the search=| databricksquery query="SELECT * FROM silver.ProcessRollup2 LIMIT 1"
11-02-2021 16:56:27.691 INFO SearchParser [35699 searchOrchestrator] - PARSING: | databricksquery query="SELECT * FROM silver.ProcessRollup2 LIMIT 1"
11-02-2021 16:56:27.691 INFO ServerConfig [35699 searchOrchestrator] - Will add app jailing prefix /opt/splunk/bin/nsjail-wrapper for TA-Databricks
11-02-2021 16:56:27.691 INFO ChunkedExternProcessor [35699 searchOrchestrator] - Running process: /opt/splunk/bin/nsjail-wrapper /opt/splunk/bin/python3.7 /opt/splunk/etc/apps/TA-Databricks/bin/databricksquery.py
11-02-2021 16:56:28.268 ERROR ChunkedExternProcessor [29051 ChunkedExternProcessorStderrLogger] - stderr: Traceback (most recent call last):
11-02-2021 16:56:28.268 ERROR ChunkedExternProcessor [29051 ChunkedExternProcessorStderrLogger] - stderr: File "/opt/splunk/etc/apps/TA-Databricks/bin/databricksquery.py", line 6, in <module>
11-02-2021 16:56:28.268 ERROR ChunkedExternProcessor [29051 ChunkedExternProcessorStderrLogger] - stderr: import databricks_com as com
11-02-2021 16:56:28.268 ERROR ChunkedExternProcessor [29051 ChunkedExternProcessorStderrLogger] - stderr: File "/opt/splunk/etc/apps/TA-Databricks/bin/databricks_com.py", line 7, in <module>
11-02-2021 16:56:28.268 ERROR ChunkedExternProcessor [29051 ChunkedExternProcessorStderrLogger] - stderr: import databricks_common_utils as utils
11-02-2021 16:56:28.268 ERROR ChunkedExternProcessor [29051 ChunkedExternProcessorStderrLogger] - stderr: File "/opt/splunk/etc/apps/TA-Databricks/bin/databricks_common_utils.py", line 13, in <module>