
databricks-cli's People

Contributors

a-cong, aakash-db, aarondav, adamcain-db, alexott, alinxie, andrewmchen, andyl-db, areese, arulajmani, bogdanghita-db, deka108, ericwang-db, fjakobs, granturing, koningjasper, kunmin-db, mengxr, mukulmurthy, nordp, pietern, pradeepgv-db, rgovind3, shreyas-goenka, smurching, stormwindy, sweisdb, tomasatdatabricks, wchau, weifeng-db


databricks-cli's Issues

Check credentials on "configure"

While debugging the "JSONDecodeError" message (see #102), I figured out I had misconfigured the credentials (host + token).

I now see what I did wrong and everything works fine.

However, the debugging would have been avoided if the "databricks configure" command had checked the credentials instead of silently exiting.

Better error handling for invalid hosts

This config doesn't work and produces a confusing error message.

[DEFAULT]
host = https://demo.cloud.databricks.com/m
token = 4b403146adec90ced8f37349bdd9[REDACTED]
$ databricks workspace ls
Error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

databricks configure - multiple deployments/hosts

Hi,

Is it possible to have multiple credential 'profiles' for different Databricks deployments? We have a dev, test, and prod account (i.e. 3 different hostnames); each environment is a separate Databricks subscription.

Is it possible to have these separate profiles configured and used in a similar way to the AWS CLI's --profile?

Thanks
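
For reference, a multi-profile layout modeled on the AWS CLI convention might look like the sketch below; the [DEFAULT]/[prod] section names and the --profile usage here are illustrative, not a statement of what the CLI currently supports.

# ~/.databrickscfg (hypothetical multi-profile layout)
[DEFAULT]
host = https://dev.cloud.databricks.com
token = <dev-token>

[prod]
host = https://prod.cloud.databricks.com
token = <prod-token>

# hypothetical usage, mirroring `aws --profile`
$ databricks workspace ls --profile prod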

Rename _get_api_client to get_api_client

_get_api_client looks like a private method that should not be imported, since names starting with _ are excluded from wildcard imports such as from pyspark import *. Can _get_api_client be renamed to get_api_client?
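
For context, a minimal illustration of why the leading underscore matters for wildcard imports (the module and function bodies here are made up):

# mymodule.py (hypothetical)
def get_api_client():
    return "client"

def _get_api_client():
    return "private client"

# consumer code: without __all__, `import *` skips underscore-prefixed names
from mymodule import *
get_api_client()     # available
# _get_api_client()  # NameError: not bound by the wildcard import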

databricks workspace export has misleading stdout information

$ databricks workspace export_dir /Users/[email protected]/example ./example
/Users/[email protected]/example/Usage Logs ETL already exists locally as ./example/Usage Logs ETL.scala. Skip.
/Users/[email protected]/example/Usage Logs ETL -> ./example/Usage Logs ETL.scala
/Users/[email protected]/example/Databricks CLI Usage Analysis already exists locally as ./example/Databricks CLI Usage Analysis.py. Skip.
/Users/[email protected]/example/Databricks CLI Usage Analysis -> ./example/Databricks CLI Usage Analysis.py
/Users/[email protected]/example/Adhoc SQL already exists locally as ./example/Adhoc SQL.sql. Skip.
/Users/[email protected]/example/Adhoc SQL -> ./example/Adhoc SQL.sql
/Users/[email protected]/example/Plotting Notebook already exists locally as ./example/Plotting Notebook.r. Skip.
/Users/[email protected]/example/Plotting Notebook -> ./example/Plotting Notebook.r

We should only print the Skip line.

Roadmap

Hi!

Awesome new repo, very exciting: the CLI will be super useful for CI/CD and automation! Can't wait for full API support.

Is there a roadmap for API coverage?

As a side note, in the absence of the full CLI, does databricks offer some accelerator shell scripts for interacting with their various APIs?

Cheers

databricks fs cp -r should have a progress message.

We should make STDOUT for the recursive copy for dbfs resemble databricks workspace import_dir. In databricks workspace import_dir we show which files are being moved to which paths.

This progress message should also show which files failed to be moved because the file already exists.

Jobs API gives an error when there are no jobs

What happens:

CLI

$ databricks  jobs list
Error: KeyError: 'jobs'

SDK

>>> jobs_api.list_jobs()
{}

What is expected

CLI

$ databricks  jobs list

SDK

>>> jobs_api.list_jobs()
{'jobs': []}
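
Until the API/CLI behavior is fixed, a minimal defensive workaround on the SDK side, assuming jobs_api is the JobsApi instance used above:

# treat a missing 'jobs' key as an empty list instead of raising KeyError
jobs = jobs_api.list_jobs().get('jobs', [])
for job in jobs:
    print(job)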

Error when running on Python 3.6

I tried pip install databricks-cli (v0.6.0) on Python 3.6.4, and when I try any command (say, databricks clusters list) I get the following error
Error: TypeError: a bytes-like object is required, not 'str'

Does the CLI support Python 3.6?
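
For illustration only (not a confirmed diagnosis of this bug), a common source of this TypeError when Python 2 code runs on Python 3 is base64-encoding a credentials string, which requires bytes on Python 3:

import base64

creds = "user:token"
# Python 2 accepts a str here; Python 3 raises
# "TypeError: a bytes-like object is required, not 'str'"
# base64.b64encode(creds)
header = base64.b64encode(creds.encode("utf-8")).decode("ascii")  # works on both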

Support more formats in import_dir

We should consider adding the JUPYTER format to import_dir. The tricky part about doing this is making sure that export_dir can handle JUPYTER notebooks in a symmetric way.

Unexpected dbfs cp behaviour

dbfs cp -r testSrc testDst/ (where testSrc is a directory) seems to behave like dbfs cp -r testSrc/* testDst/. To be consistent with cp it should probably behave like dbfs cp -r testSrc testDst/testSrc.

Add --json-file for --notebook-params under jobs run-now

I'm trying to run a job from CLI and I'm able to use a command like this:

databricks jobs run-now --job-id 1 --notebook-params '{"widget1": "widget1value","widget2": "widget2value"}'

This gets to be very long and cumbersome; I'm using one data source that requires 5 parameters. As I continue making more complex notebooks, I expect to use more and more notebook parameters.

However, I'd like to instead provide a JSON file, similar to how cluster creation works at the CLI. Something like this:

databricks jobs run-now --job-id 1 --json-file parameters.json

Is that possible?
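
As an interim workaround (a shell-level sketch, not a CLI feature), the JSON can be read from a file with command substitution:

$ databricks jobs run-now --job-id 1 --notebook-params "$(cat parameters.json)"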

Specify config file location

Allow pointing the CLI at a different config file, such as ~/.databricks/.databrickscfg, via both a CLI flag and an environment variable.

Import_dir importing hidden folders and files

When I use import_dir to import many local files and folders, any hidden folders are imported as well. This happens with the ".git" folder, for example.
Is this working as expected? Is there a way to avoid importing hidden folders and files?
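
Until the CLI filters these itself, one workaround is to stage the notebooks into a clean directory first and run import_dir on that; a rough sketch (the function name and paths are made up):

import os
import shutil

def copy_without_hidden(src, dst):
    """Copy a tree, skipping dot-files and dot-directories such as .git."""
    for root, dirs, files in os.walk(src):
        dirs[:] = [d for d in dirs if not d.startswith('.')]  # prune hidden dirs in place
        for name in files:
            if name.startswith('.'):
                continue
            rel = os.path.relpath(os.path.join(root, name), src)
            target = os.path.join(dst, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(root, name), target)

# copy_without_hidden('my_notebooks', 'staging')
# then: databricks workspace import_dir staging /Users/someone/notebooks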

Add global --debug flag

It'd be useful to add a global flag called --debug which prints out the stack trace when an error is raised.
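
A rough sketch of what such a flag could look like with click (illustrative wiring, not the CLI's actual implementation):

import traceback
import click

@click.group()
@click.option('--debug', is_flag=True, help='Print the full stack trace on errors.')
@click.pass_context
def cli(ctx, debug):
    ctx.obj = {'debug': debug}

@cli.command()
@click.pass_context
def example(ctx):
    try:
        raise RuntimeError('something went wrong')
    except Exception as exc:
        if ctx.obj['debug']:
            traceback.print_exc()  # full stack trace only when --debug is given
        raise click.ClickException(str(exc))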

-f and -l incompatibilities

Requiring -l is silly (and, in fact, ignored, as expected) in this use case:

databricks workspace import -f DBC -l SCALA /path/to/some.dbc

The entire DBC is loaded, without respect to the language(s) of the contained notebooks (which, of course, is what I want to happen). But I'm still required to specify -l <language>.

S3 Mount via cli or API

Hey, guys!

We need to have a programmatic (described in code) configuration for s3 bucket mounts in clusters.

Are there any plans for an API/CLI command that will allow mounting S3 buckets? What is a possible workaround? How can dbutils.fs.mount be called externally?
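
Until a dedicated API/CLI exists, the usual workaround is to perform the mount from a notebook or a one-off job running on the cluster; a sketch of the notebook code (the bucket name and mount point are placeholders, and credentials handling is omitted):

# Runs inside a Databricks notebook or job, where `dbutils` is available.
dbutils.fs.mount(
    source="s3a://my-example-bucket",
    mount_point="/mnt/my-example-bucket",
)
display(dbutils.fs.ls("/mnt/my-example-bucket"))  # verify the mount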

SSLError when running any command

I just started having this problem a few days ago. I'm pretty sure I haven't installed any OSX updates that would have interfered.

I tried reinstalling my entire Python/brew environment and the result is the same. I'm able to hit API endpoints in the browser just fine, so maybe something is broken in the certificate chain used by the Python code?

Here's some sample output:

Error: SSLError: HTTPSConnectionPool(host='<mycluster>.cloud.databricks.com', port=443): Max retries exceeded with url: /api/2.0/jobs/list (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))

Using Python 2.7.13 and databricks-cli (0.5.0).

As suggested on some other Python projects for similar errors, I tried installing requests[security], but it did not help.

Issue with 'Six' package during installation

Cannot uninstall 'six'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

The CLI is unable to install due to the above error.
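
A commonly suggested workaround (use with care; it leaves the distutils-installed copy's files behind rather than uninstalling them):

$ pip install --ignore-installed six
$ pip install databricks-cli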

get_api_client used as library

It would be good if there were an option to pass the host, username, and password/token to the get_api_client call, falling back to the config file when they are not passed. Currently get_api_client reads directly from the config on disk.
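
For reference, a sketch of constructing the client directly with explicit credentials instead of going through get_api_client; this assumes the ApiClient class under databricks_cli.sdk accepts host/token keyword arguments, so treat the exact import path and signature as an assumption to check against your installed version:

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.workspace.api import WorkspaceApi

# build the client from explicit credentials rather than ~/.databrickscfg
client = ApiClient(host="https://example.cloud.databricks.com", token="dapi-...")
workspace = WorkspaceApi(client)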

Passwords with %

The Databricks CLI fails with a string interpolation error if the password contains a %.

doug:~$ databricks workspace list /Users
Error: InterpolationSyntaxError: '%' must be followed by '%' or '(', found: v'%pXXXXXXXXXXX'
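
This error comes from Python's ConfigParser applying %-interpolation to the stored value; two illustrative fixes, shown outside the CLI codebase (the Python 3 configparser API is used here):

# Option 1: escape the literal percent sign in ~/.databrickscfg
#   password = my%%secret      (%% is read back as a single %)

# Option 2: read the file with interpolation disabled
import configparser

parser = configparser.ConfigParser(interpolation=None)
parser.read('/path/to/.databrickscfg')
print(parser['DEFAULT']['password'])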

dbfs cp doesn't work

I have a Windows OS and I am trying to import a file from my local machine into Databricks.
When I run the following command:

dbfs cp "train_dta.npy" dbfs:"/FileStore/train_data.npy"

I get this error

SyntaxError: invalid syntax
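
The SyntaxError suggests the command was typed at a Python prompt rather than a system shell. Run it from a terminal (or Command Prompt), and note that the dbfs: scheme goes in front of the path; a sketch of the intended invocation:

$ dbfs cp train_dta.npy dbfs:/FileStore/train_data.npy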

Bad error message when credentials are invalid

Used the Databricks CLI to import a workspace with bad credentials (an invalid token).
Received the following error message:

$ databricks workspace import_dir -o dummy/ /Shared/dummy
Error: ValueError: No JSON object could be decoded

Add -o flag to `dbfs`.

$ databricks workspace -h
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks workspace. Workspace paths must
  be absolute and be prefixed with `/`.

Options:
  -v, --version  0.4.0
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks workspace. rm and delete
              are synonyms.
  export      Exports a file from the Databricks workspace.
  export_dir  Recursively exports a directory from the Databricks workspace.
  import      Imports a file from local to the Databricks workspace.
  import_dir  Recursively imports a directory to the Databricks workspace.
  list        List objects in the Databricks Workspace. ls and list are
              synonyms.
  ls          List objects in the Databricks Workspace. ls and list are
              synonyms.
  mkdirs      Make directories in the Databricks Workspace.
  rm          Deletes objects from the Databricks workspace. rm and delete
              are synonyms.

$ databricks workspace import_dir -h 
Usage: databricks workspace import_dir [OPTIONS] SOURCE_PATH TARGET_PATH

  Recursively imports a directory from local to the Databricks workspace.

  Only directories and files with the extensions .scala, .py, .sql, .r, .R
  are imported. When imported, these extensions will be stripped off the
  name of the notebook.

Options:
  -o, --overwrite
  -h, --help       Show this message and exit.

Note that one has -o for overwrite while the other doesn't.

Should support `--profile` in different positions

Users expect all of the following to work:

  • databricks --profile XXX fs ls
  • databricks fs --profile XXX ls
  • databricks fs ls --profile XXX

However, only the last one works with the current implementation.
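
A sketch of how a group-level option can be accepted at any position with click, by recording it on the context at each level (illustrative wiring, not the CLI's actual code):

import click

@click.group()
@click.option('--profile', default='DEFAULT')
@click.pass_context
def databricks(ctx, profile):
    ctx.obj = {'profile': profile}             # databricks --profile XXX fs ls

@databricks.group()
@click.option('--profile', default=None)
@click.pass_context
def fs(ctx, profile):
    if profile is not None:                    # databricks fs --profile XXX ls
        ctx.obj['profile'] = profile

@fs.command()
@click.option('--profile', default=None)
@click.pass_context
def ls(ctx, profile):
    effective = profile or ctx.obj['profile']  # databricks fs ls --profile XXX
    click.echo('using profile: %s' % effective)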

CLI does not work with python 3

I get the following error

Traceback (most recent call last):
  File "/usr/local/bin/dbfs", line 7, in <module>
    from databricks_cli.dbfs.cli import dbfs_group
  File "/usr/local/lib/python3.6/site-packages/databricks_cli/dbfs/cli.py", line 31, in <module>
    from databricks_cli.configure.cli import configure_cli
  File "/usr/local/lib/python3.6/site-packages/databricks_cli/configure/cli.py", line 29, in <module>
    from databricks_cli.configure.config import DatabricksConfig
  File "/usr/local/lib/python3.6/site-packages/databricks_cli/configure/config.py", line 24, in <module>
    import ConfigParser
ModuleNotFoundError: No module named 'ConfigParser'
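
The ConfigParser module was renamed to configparser in Python 3; a compatibility import along these lines is the usual fix:

try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2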

Docker image won't build: ./lint.sh: line 8: prospector: command not found

The Dockerfile in master fails to build the image with the error ./lint.sh: line 8: prospector: command not found

$ docker build -t databricks-cli .
Sending build context to Docker daemon    834kB
Step 1/5 : FROM python:2.7
 ---> 17c0fe4e76a5
Step 2/5 : WORKDIR /usr/src/databricks-cli
 ---> Using cache
 ---> b5504b00245b
Step 3/5 : COPY . .
 ---> Using cache
 ---> a11d261149fd
Step 4/5 : RUN pip install --upgrade pip &&     pip install -r dev-requirements.txt &&     pip list &&     ./lint.sh &&     pip install . &&     pytest tests
 ---> Running in b6cca86af096
Requirement already up-to-date: pip in /usr/local/lib/python2.7/site-packages (18.0)
Collecting tox==2.9.1 (from -r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/1d/4e/20c679f8c5948f7c48591fde33d442e716af66a31a88f5791850a75041eb/tox-2.9.1-py2.py3-none-any.whl (73kB)
Requirement already satisfied: virtualenv>=1.11.2; python_version != "3.2" in /usr/local/lib/python2.7/site-packages (from tox==2.9.1->-r dev-requirements.txt (line 1)) (16.0.0)
Collecting py>=1.4.17 (from tox==2.9.1->-r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f3/bd/83369ff2dee18f22f27d16b78dd651e8939825af5f8b0b83c38729069962/py-1.5.4-py2.py3-none-any.whl (83kB)
Collecting six (from tox==2.9.1->-r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting pluggy<1.0,>=0.3.0 (from tox==2.9.1->-r dev-requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f5/f1/5a93c118663896d83f7bcbfb7f657ce1d0c0d617e6b4a443a53abcc658ca/pluggy-0.7.1-py2.py3-none-any.whl
Installing collected packages: py, six, pluggy, tox
Successfully installed pluggy-0.7.1 py-1.5.4 six-1.11.0 tox-2.9.1
Package    Version
---------- -------
pip        18.0
pluggy     0.7.1
py         1.5.4
setuptools 40.0.0
six        1.11.0
tox        2.9.1
virtualenv 16.0.0
wheel      0.31.1
./lint.sh: line 8: prospector: command not found
The command '/bin/sh -c pip install --upgrade pip &&     pip install -r dev-requirements.txt &&     pip list &&     ./lint.sh &&     pip install . &&     pytest tests' returned a non-zero code: 127
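
The failure suggests prospector is simply not listed in dev-requirements.txt; one possible fix is to install it before the lint step, for example:

$ pip install prospector   # or add a pinned prospector entry to dev-requirements.txt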

databricks workspace --help is truncated

$ databricks workspace --help
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks Workspace. Workspace paths must be
  absolute and be prefixed with `/`.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks...
  export      Exports a file from the Databricks workspace...
  export_dir  Recursively exports a directory from the...
  import      Imports a file from local to the Databricks...
  import_dir  Recursively imports a directory from local to...
  list        List objects in the Databricks Workspace
  ls          List objects in the Databricks Workspace
  mkdirs      Make directories in the Databricks Workspace.
  rm          Deletes objects from the Databricks...

We should use the approach discussed in https://github.com/pallets/click/issues/486.
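
For reference, click builds that command list from a truncated form of each command's docstring; supplying an explicit short_help avoids the truncation (a sketch, not the actual databricks-cli code):

import click

@click.command(short_help='Deletes objects from the Databricks workspace. rm and delete are synonyms.')
def delete():
    """Deletes objects from the Databricks workspace. rm and delete are synonyms."""
    pass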

Add "workspace export_any" command

I propose to add an additional option to workspace so that the user doesn't need to know which type of object exists at the given path. When executing a "workspace ls" it is difficult to tell programmatically which type of object it is (without spoofing a color console and parsing the color codes).
It looks like "workspace export" will fail on a directory and "workspace export_dir" will fail on a non-directory, so neither command is a surefire way to export a given path.
I propose to add "export_any" which will call one of the two above functions as appropriate, given the path.
