
databricks-cli's People

Contributors

a-cong, aakash-db, aarondav, adamcain-db, alexott, alinxie, andrewmchen, andyl-db, areese, arulajmani, bogdanghita-db, deka108, ericwang-db, fjakobs, granturing, koningjasper, kunmin-db, mengxr, mukulmurthy, nordp, pietern, pradeepgv-db, rgovind3, shreyas-goenka, smurching, stormwindy, sweisdb, tomasatdatabricks, wchau, weifeng-db


databricks-cli's Issues

Check credentials on "configure"

While debugging the "JSONDecodeError" message (see #102), I figured out I had misconfigured the credentials (host + token).

I now see what I did wrong and everything works fine.

However, the debugging would have been avoided if the "databricks configure" command had checked the credentials instead of silently exiting.

Better error handling for invalid hosts

This config doesn't work and produces a confusing error message.

[DEFAULT]
host = https://demo.cloud.databricks.com/m
token = 4b403146adec90ced8f37349bdd9[REDACTED]
$ databricks workspace ls
Error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

databricks configure - multiple deployments/hosts

Hi,

Is it possible to have multiple credential 'profiles' for different Databricks deployments? We have a dev, test, and prod account (i.e. 3 different hostnames); each environment is a separate Databricks subscription.

Is it possible to have these separate profiles configured and used in a similar way to the AWS CLI's --profile?

Thanks
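
For reference, a multi-profile layout modeled on the AWS CLI convention might look like the sketch below; the [DEFAULT]/[prod] section names and the --profile usage here are illustrative, not a statement of what the CLI currently supports.

# ~/.databrickscfg (hypothetical multi-profile layout)
[DEFAULT]
host = https://dev.cloud.databricks.com
token = <dev-token>

[prod]
host = https://prod.cloud.databricks.com
token = <prod-token>

# hypothetical usage, mirroring `aws --profile`
$ databricks workspace ls --profile prod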

Rename _get_api_client to get_api_client

_get_api_client looks like a private method that should not be imported, since names starting with _ are excluded from wildcard imports such as from pyspark import *. Can _get_api_client be renamed to get_api_client?
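
For context, a minimal illustration of why the leading underscore matters for wildcard imports (the module and function bodies here are made up):

# mymodule.py (hypothetical)
def get_api_client():
    return "client"

def _get_api_client():
    return "private client"

# consumer code: without __all__, `import *` skips underscore-prefixed names
from mymodule import *
get_api_client()     # available
# _get_api_client()  # NameError: not bound by the wildcard import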

databricks workspace export has misleading stdout information

$ databricks workspace export_dir /Users/[email protected]/example ./example
/Users/[email protected]/example/Usage Logs ETL already exists locally as ./example/Usage Logs ETL.scala. Skip.
/Users/[email protected]/example/Usage Logs ETL -> ./example/Usage Logs ETL.scala
/Users/[email protected]/example/Databricks CLI Usage Analysis already exists locally as ./example/Databricks CLI Usage Analysis.py. Skip.
/Users/[email protected]/example/Databricks CLI Usage Analysis -> ./example/Databricks CLI Usage Analysis.py
/Users/[email protected]/example/Adhoc SQL already exists locally as ./example/Adhoc SQL.sql. Skip.
/Users/[email protected]/example/Adhoc SQL -> ./example/Adhoc SQL.sql
/Users/[email protected]/example/Plotting Notebook already exists locally as ./example/Plotting Notebook.r. Skip.
/Users/[email protected]/example/Plotting Notebook -> ./example/Plotting Notebook.r

We should only print the Skip line.

Roadmap

Hi!

Awesome new repo, very exciting: the CLI will be super useful for CI/CD and automation! Can't wait for full API support.

Is there a roadmap for API coverage?

As a side note, in the absence of the full CLI, does databricks offer some accelerator shell scripts for interacting with their various APIs?

Cheers

databricks fs cp -r should have a progress message.

We should make STDOUT for the recursive copy for dbfs resemble databricks workspace import_dir. In databricks workspace import_dir we show which files are being moved to which paths.

This progress message should also show which files failed to be moved because the file already exists.

Jobs API gives an error when there are no jobs

What happens:

CLI

$ databricks  jobs list
Error: KeyError: 'jobs'

SDK

>>> jobs_api.list_jobs()
{}

What is expected

CLI

$ databricks  jobs list

SDK

>>> jobs_api.list_jobs()
{'jobs': []}
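
Until the API/CLI behavior is fixed, a minimal defensive workaround on the SDK side, assuming jobs_api is the JobsApi instance used above:

# treat a missing 'jobs' key as an empty list instead of raising KeyError
jobs = jobs_api.list_jobs().get('jobs', [])
for job in jobs:
    print(job)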

Error when running on Python 3.6

I tried pip install databricks-cli (v0.6.0) on Python 3.6.4, and when I try any command (say, databricks clusters list) I get the following error
Error: TypeError: a bytes-like object is required, not 'str'

Does the CLI support Python 3.6?
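
For illustration only (not a confirmed diagnosis of this bug), a common source of this TypeError when Python 2 code runs on Python 3 is base64-encoding a credentials string, which requires bytes on Python 3:

import base64

creds = "user:token"
# Python 2 accepts a str here; Python 3 raises
# "TypeError: a bytes-like object is required, not 'str'"
# base64.b64encode(creds)
header = base64.b64encode(creds.encode("utf-8")).decode("ascii")  # works on both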

Support more formats in import_dir

We should consider adding the JUPYTER format to import_dir. The tricky part about doing this is making sure that export_dir can handle JUPYTER notebooks in a symmetric way.

Unexpected dbfs cp behaviour

dbfs cp -r testSrc testDst/ (where testSrc is a directory) seems to behave like dbfs cp -r testSrc/* testDst/. To be consistent with cp it should probably behave like dbfs cp -r testSrc testDst/testSrc.

Add --json-file for --notebook-params under jobs run-now

I'm trying to run a job from CLI and I'm able to use a command like this:

databricks jobs run-now --job-id 1 --notebook-params '{"widget1": "widget1value","widget2": "widget2value"}'

This gets to be very long and cumbersome; I'm using one data source that requires 5 parameters. As I continue making more complex notebooks, I expect to use more and more notebook parameters.

However, I'd like to instead provide a JSON file, similar to how cluster creation works at the CLI. Something like this:

databricks jobs run-now --job-id 1 --json-file parameters.json

Is that possible?
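
As an interim workaround (a shell-level sketch, not a CLI feature), the JSON can be read from a file with command substitution:

$ databricks jobs run-now --job-id 1 --notebook-params "$(cat parameters.json)"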

Specify config file location

Allow pointing the CLI at a different config file, such as ~/.databricks/.databrickscfg, via both a CLI flag and an environment variable.

Import_dir importing hidden folders and files

When I use import_dir to import many local files and folders, any hidden folders are imported as well. This happens with the ".git" folder, for example.
Is this working as expected? Is there a way to avoid importing hidden folders and files?
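
Until the CLI filters these itself, one workaround is to stage the notebooks into a clean directory first and run import_dir on that; a rough sketch (the function name and paths are made up):

import os
import shutil

def copy_without_hidden(src, dst):
    """Copy a tree, skipping dot-files and dot-directories such as .git."""
    for root, dirs, files in os.walk(src):
        dirs[:] = [d for d in dirs if not d.startswith('.')]  # prune hidden dirs in place
        for name in files:
            if name.startswith('.'):
                continue
            rel = os.path.relpath(os.path.join(root, name), src)
            target = os.path.join(dst, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(root, name), target)

# copy_without_hidden('my_notebooks', 'staging')
# then: databricks workspace import_dir staging /Users/someone/notebooks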

Add global --debug flag

It'd be useful to add a global flag called --debug which prints out the stack trace when an error is raised.
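
A rough sketch of what such a flag could look like with click (illustrative wiring, not the CLI's actual implementation):

import traceback
import click

@click.group()
@click.option('--debug', is_flag=True, help='Print the full stack trace on errors.')
@click.pass_context
def cli(ctx, debug):
    ctx.obj = {'debug': debug}

@cli.command()
@click.pass_context
def example(ctx):
    try:
        raise RuntimeError('something went wrong')
    except Exception as exc:
        if ctx.obj['debug']:
            traceback.print_exc()  # full stack trace only when --debug is given
        raise click.ClickException(str(exc))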

-f and -l incompatibilities

Requiring -l is silly (and, in fact, ignored, as expected) in this use case:

databricks workspace import -f DBC -l SCALA /path/to/some.dbc

The entire DBC is loaded, without respect to the language(s) of the contained notebooks (which, of course, is what I want to happen). But I'm still required to specify -l <language>.

S3 Mount via cli or API

Hey, guys!

We need to have a programmatic (described in code) configuration for s3 bucket mounts in clusters.

Are there any plans for an API/CLI command that will allow mounting S3 buckets? What is a possible workaround? How can dbutils.fs.mount be called externally?
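
Until a dedicated API/CLI exists, the usual workaround is to perform the mount from a notebook or a one-off job running on the cluster; a sketch of the notebook code (the bucket name and mount point are placeholders, and credentials handling is omitted):

# Runs inside a Databricks notebook or job, where `dbutils` is available.
dbutils.fs.mount(
    source="s3a://my-example-bucket",
    mount_point="/mnt/my-example-bucket",
)
display(dbutils.fs.ls("/mnt/my-example-bucket"))  # verify the mount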

SSLError when running any command

I just started having this problem a few days ago. I'm pretty sure I haven't installed any OSX updates that would have interfered.

I tried reinstalling my entire Python/brew environment and the result is the same. I'm able to hit API endpoints in the browser just fine, so maybe something is broken in the certificate chain used by the Python code?

Here's some sample output:

Error: SSLError: HTTPSConnectionPool(host='<mycluster>.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/jobs/list (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))

Using Python 2.7.13 and databricks-cli (0.5.0).

As suggested on some other Python projects for similar errors, I tried installing requests[security], but it did not help.

Issue with 'Six' package during installation

Cannot uninstall 'six'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

The CLI is unable to install due to the above error.
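
A commonly suggested workaround (use with care; it leaves the distutils-installed copy's files behind rather than uninstalling them):

$ pip install --ignore-installed six
$ pip install databricks-cli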

get_api_client used as library

It would be good if there were an option to pass the host, username, and password/token to the get_api_client call, falling back to the config file when they are not passed. Currently get_api_client reads directly from the config on disk.
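
For reference, a sketch of constructing the client directly with explicit credentials instead of going through get_api_client; this assumes the ApiClient class under databricks_cli.sdk accepts host/token keyword arguments, so treat the exact import path and signature as an assumption to check against your installed version:

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.workspace.api import WorkspaceApi

# build the client from explicit credentials rather than ~/.databrickscfg
client = ApiClient(host="https://example.cloud.databricks.com", token="dapi-...")
workspace = WorkspaceApi(client)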

Passwords with %

The Databricks CLI fails with a string interpolation error if the password contains a %.

doug:~$ databricks workspace list /Users
Error: InterpolationSyntaxError: '%' must be followed by '%' or '(', found: v'%pXXXXXXXXXXX'
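
This error comes from Python's ConfigParser applying %-interpolation to the stored value; two illustrative fixes, shown outside the CLI codebase (the Python 3 configparser API is used here):

# Option 1: escape the literal percent sign in ~/.databrickscfg
#   password = my%%secret      (%% is read back as a single %)

# Option 2: read the file with interpolation disabled
import configparser

parser = configparser.ConfigParser(interpolation=None)
parser.read('/path/to/.databrickscfg')
print(parser['DEFAULT']['password'])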

dbfs cp doesn't work

I have a Windows OS and I am trying to import a file from my local machine into Databricks.
When I run the following command:

dbfs cp "train_dta.npy" dbfs:"/FileStore/train_data.npy"

I get this error

SyntaxError: invalid syntax
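
The SyntaxError suggests the command was typed at a Python prompt rather than a system shell. Run it from a terminal (or Command Prompt), and note that the dbfs: scheme goes in front of the path; a sketch of the intended invocation:

$ dbfs cp train_dta.npy dbfs:/FileStore/train_data.npy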

Bad error message when credentials are invalid

Used the Databricks CLI to import a workspace with bad credentials (an invalid token).
Received the following error message:

$ databricks workspace import_dir -o dummy/ /Shared/dummy
Error: ValueError: No JSON object could be decoded

Add -o flag to `dbfs`.

$ databricks workspace -h
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks workspace. Workspace paths must
  be absolute and be prefixed with `/`.

Options:
  -v, --version  0.4.0
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks workspace. rm and delete
              are synonyms.
  export      Exports a file from the Databricks workspace.
  export_dir  Recursively exports a directory from the Databricks workspace.
  import      Imports a file from local to the Databricks workspace.
  import_dir  Recursively imports a directory to the Databricks workspace.
  list        List objects in the Databricks Workspace. ls and list are
              synonyms.
  ls          List objects in the Databricks Workspace. ls and list are
              synonyms.
  mkdirs      Make directories in the Databricks Workspace.
  rm          Deletes objects from the Databricks workspace. rm and delete
              are synonyms.

$ databricks workspace import_dir -h 
Usage: databricks workspace import_dir [OPTIONS] SOURCE_PATH TARGET_PATH

  Recursively imports a directory from local to the Databricks workspace.

  Only directories and files with the extensions .scala, .py, .sql, .r, .R
  are imported. When imported, these extensions will be stripped off the
  name of the notebook.

Options:
  -o, --overwrite
  -h, --help       Show this message and exit.

Note that one has -o for overwrite while the other doesn't.

Should support `--profile` in different positions

Users expect all of the following to work:

  • databricks --profile XXX fs ls
  • databricks fs --profile XXX ls
  • databricks fs ls --profile XXX

However, only the last one works with the current implementation.
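
A sketch of how a group-level option can be accepted at any position with click, by recording it on the context at each level (illustrative wiring, not the CLI's actual code):

import click

@click.group()
@click.option('--profile', default='DEFAULT')
@click.pass_context
def databricks(ctx, profile):
    ctx.obj = {'profile': profile}             # databricks --profile XXX fs ls

@databricks.group()
@click.option('--profile', default=None)
@click.pass_context
def fs(ctx, profile):
    if profile is not None:                    # databricks fs --profile XXX ls
        ctx.obj['profile'] = profile

@fs.command()
@click.option('--profile', default=None)
@click.pass_context
def ls(ctx, profile):
    effective = profile or ctx.obj['profile']  # databricks fs ls --profile XXX
    click.echo('using profile: %s' % effective)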

CLI does not work with python 3

I get the following error

Traceback (most recent call last):
  File "/usr/local/bin/dbfs", line 7, in <module>
    from databricks_cli.dbfs.cli import dbfs_group
  File "/usr/local/lib/python3.6/site-packages/databricks_cli/dbfs/cli.py", line 31, in <module>
    from databricks_cli.configure.cli import configure_cli
  File "/usr/local/lib/python3.6/site-packages/databricks_cli/configure/cli.py", line 29, in <module>
    from databricks_cli.configure.config import DatabricksConfig
  File "/usr/local/lib/python3.6/site-packages/databricks_cli/configure/config.py", line 24, in <module>
    import ConfigParser
ModuleNotFoundError: No module named 'ConfigParser'
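
The ConfigParser module was renamed to configparser in Python 3; a compatibility import along these lines is the usual fix:

try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2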

Docker image won't build: ./lint.sh: line 8: prospector: command not found

The Dockerfile in master fails to build the image with the error ./lint.sh: line 8: prospector: command not found

$ docker build -t databricks-cli .
Sending build context to Docker daemon    834kB
Step 1/5 : FROM python:2.7
 ---> 17c0fe4e76a5
Step 2/5 : WORKDIR /usr/src/databricks-cli
 ---> Using cache
 ---> b5504b00245b
Step 3/5 : COPY . .
 ---> Using cache
 ---> a11d261149fd
Step 4/5 : RUN pip install --upgrade pip &&     pip install -r dev-requirements.txt &&     pip list &&     ./lint.sh &&     pip install . &&     pytest tests
 ---> Running in b6cca86af096
Requirement already up-to-date: pip in /usr/local/lib/python2.7/site-packages (18.0)
Collecting tox==2.9.1 (from -r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/1d/4e/20c679f8c5948f7c48591fde33d442e716af66a31a88f5791850a75041eb/tox-2.9.1-py2.py3-none-any.whl (73kB)
Requirement already satisfied: virtualenv>=1.11.2; python_version != "3.2" in /usr/local/lib/python2.7/site-packages (from tox==2.9.1->-r dev-requirements.txt (line 1)) (16.0.0)
Collecting py>=1.4.17 (from tox==2.9.1->-r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f3/bd/83369ff2dee18f22f27d16b78dd651e8939825af5f8b0b83c38729069962/py-1.5.4-py2.py3-none-any.whl (83kB)
Collecting six (from tox==2.9.1->-r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting pluggy<1.0,>=0.3.0 (from tox==2.9.1->-r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f5/f1/5a93c118663896d83f7bcbfb7f657ce1d0c0d617e6b4a443a53abcc658ca/pluggy-0.7.1-py2.py3-none-any.whl
Installing collected packages: py, six, pluggy, tox
Successfully installed pluggy-0.7.1 py-1.5.4 six-1.11.0 tox-2.9.1
Package    Version
---------- -------
pip        18.0
pluggy     0.7.1
py         1.5.4
setuptools 40.0.0
six        1.11.0
tox        2.9.1
virtualenv 16.0.0
wheel      0.31.1
./lint.sh: line 8: prospector: command not found
The command '/bin/sh -c pip install --upgrade pip &&     pip install -r dev-requirements.txt &&     pip list &&     ./lint.sh &&     pip install . &&     pytest tests' returned a non-zero code: 127
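
The failure suggests prospector is simply not listed in dev-requirements.txt; one possible fix is to install it before the lint step, for example:

$ pip install prospector   # or add a pinned prospector entry to dev-requirements.txt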

databricks workspace --help is truncated

$ databricks workspace --help
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks Workspace. Workspace paths must be
  absolute and be prefixed with `/`.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks...
  export      Exports a file from the Databricks workspace...
  export_dir  Recursively exports a directory from the...
  import      Imports a file from local to the Databricks...
  import_dir  Recursively imports a directory from local to...
  list        List objects in the Databricks Workspace
  ls          List objects in the Databricks Workspace
  mkdirs      Make directories in the Databricks Workspace.
  rm          Deletes objects from the Databricks...

We should use the approach discussed in https://github.com/pallets/click/issues/486.
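
For reference, click builds that command list from a truncated form of each command's docstring; supplying an explicit short_help avoids the truncation (a sketch, not the actual databricks-cli code):

import click

@click.command(short_help='Deletes objects from the Databricks workspace. rm and delete are synonyms.')
def delete():
    """Deletes objects from the Databricks workspace. rm and delete are synonyms."""
    pass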

Add "workspace export_any" command

I propose to add an additional option to workspace so that the user doesn't need to know which type of object exists at the given path. When executing a "workspace ls" it is difficult to tell programmatically which type of object it is (without spoofing a color console and parsing the color codes).
It looks like "workspace export" will fail on a directory and "workspace export_dir" will fail on a non-directory, so neither command is a surefire way to export a given path.
I propose to add "export_any" which will call one of the two above functions as appropriate, given the path.
