globus / globus-cli
A command line interface to Globus
Home Page: https://docs.globus.org/cli
License: Apache License 2.0
The --help documentation for async-transfer has already started to run up against the limits of simple inline help.
The helpdoc is really long, and it's hard to document the --batch behavior in there.
I didn't expect to come up against this so soon, but I think we already need man pages.
Likely this will be Sphinx autodoc fed back into the CLI as a set of commands.
My personal tendency at present is to add --man as a global option that behaves much like the aws help subcommands.
This is a discussion that we've had on and off pretty much since day one.
When we originally started work on this project, I imitated the awscli's decision to use required options without giving much consideration to whether or not that's correct.
awscli is a good model for us to follow because (1) Amazon has many services and we are looking at supporting several services of our own, (2) Amazon employs at least as many Smart People as we do, probably more, and (3) we have experience using this CLI in administrative contexts and can vouch for the cohesion in its design.
However, this particular design decision may not sit well with our CLI, so we should evaluate it within our own context. I have my own bias, but will try to enumerate all of the arguments that I'm aware of.
Favoring Positional Arguments:
- a usage like async-transfer SourceID SourcePath DestID DestPath maintains readability
- precedent: git, the traditional UNIX toolchain (no other known modern tools)

I don't mean to denigrate that first point. It's a matter of good interactive "ergonomics", as we're fond of calling it, and it is important.
Favoring Required Options:
- aws --region ... and the region config opt
- precedent: awscli, knife-ec2, no other known tools

There is no CLI command for creating a shared endpoint.
I don't even quite know where this should go in the CLI command structure.
globus transfer endpoint share? Or globus transfer endpoint create --shared?
If it's a behavior of endpoint create, that makes argument handling there much more complex, but I think it might have better-feeling semantics once you figure out what command you want.
If there's a reasonable way to do it, we could pull the latest published version from pypi, otherwise we could go by GitHub releases.
Example:
jasons-mbp:src jtw$ vagrant version
Installed Version: 1.8.6
Latest Version: 1.8.6
You're running an up-to-date version of Vagrant!
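If we go the PyPI route, the check could be sketched like this. PyPI's JSON API endpoint is real; the function names and messaging are illustrative, not actual globus-cli code:

```python
import json
from urllib.request import urlopen

# PyPI's per-project JSON API; the project name here is just the obvious guess.
PYPI_URL = "https://pypi.org/pypi/globus-cli/json"

def parse_version(version):
    """Split a dotted version string like "1.8.6" into a comparable int tuple."""
    return tuple(int(part) for part in version.split("."))

def latest_pypi_version(url=PYPI_URL):
    """Fetch the latest published version string from PyPI's JSON API."""
    with urlopen(url) as resp:
        return json.load(resp)["info"]["version"]

def version_message(installed, latest):
    """Mimic the vagrant-style up-to-date messaging shown above."""
    if parse_version(installed) >= parse_version(latest):
        return "You're running an up-to-date version!"
    return "A newer version ({}) is available.".format(latest)
```

Comparing parsed tuples rather than raw strings matters here: "1.10.0" sorts before "1.9.9" as a string but after it as a version.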
The textual output from the task submission commands doesn't include the Task ID -- just the "message" field from the API.
This is a usability problem: immediately after submission, starting something like a globus transfer task wait requires doing a task listing first.
The JSON output includes all of the fields, so this isn't a problem for scripting, but rather for interactive use.
The text output format should probably be something like this:
$ globus transfer async-transfer ...
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: aaaaa-bbbbb-ccccc-ddddd
That's just for consistency with the various other parts of the CLI that output multi-field text info in a non-columnar format.
The alternative is to ditch that format and go with
$ globus transfer async-transfer ...
The transfer has been accepted and a task has been created and queued for execution
Task ID: aaaaa-bbbbb-ccccc-ddddd
To me, those two are indistinguishable for a user, so we should just go with the one that matches other existing output.
--heartbeat behavior is what is expected... if we are still directory scanning. Not sure if the REST API exposes that info (nr_directory_expansions_pending).
(... returns 1, existing CLI returns 0)

ACLs are secondary resources on endpoints (S3 and Shared), so they should be namespaced inside of globus transfer endpoint acl.
Karl suggested this, and I pushed back slightly, thinking they should be in globus transfer share acl.
When he informed me that S3 eps can have ACLs, that convinced me.
No one else seems to have a strong opinion.
details -t to show a task's subtasks.

I think this might just be something that we hook into GLOBUS_CLI_DEBUGMODE.
We should suppress warnings during normal operations -- they're chatty and confusing for end users.
However, being able to see warnings is a useful feature for developers working on the project, so being able to expose them seems good.
The globus login command writes values to the config file, but doesn't set a umask. On a lot of systems, we'll be protected by ordinary home-directory permissions, but the file will be world-readable.
Because the file contains refresh tokens, it should start with perms of 0600.
There are a number of commands which I took shortcuts on before I had complete and working tooling for table output.
To get things working without fiddling with output too much, I used the JSON output for both output formats.
In some cases, this may ultimately be the most sensible approach (probably not).
Regardless, we should list all of these as items to address before considering the CLI "done".
server add
and server delete
are two management commands for physical endpoints supported by the current CLI, but missing here.
Additionally, we should have server show
, for completeness, even though the old CLI doesn't include it.
Additionally, promote server-list
from a command into a command group server
with a subcommand list
, a sibling of these other new commands.
Provide a way for a user to see what identity-context they're operating under. Something like globus whoami
that shows the primary/effective identity that the current set of tokens was issued to.
See also #38 for more "CLI session lifecycle" material.
Right now, all of execution is wrapped in a handler for broken pipes designed to handle the possibility that python IO is being truncated by a consumer like head.
This is somewhat dangerous, and it would be better to wrap the IO sites with this handling.
This can be a context manager, but a special-purpose globus_cli.safeio.write(..., file=sys.stdout) might be desirable.
Needs a bit more thought, and some investigation of click.echo -- maybe that does what we want, and just needs to be wrapped in our own package so that we can slot something there later if we replace Click.
Current behavior shows all files by default. Desired behavior is to not show dotfiles by default but provide a flag (e.g. -a).
Allowing sub-second polling may be desirable -- a strong case can be made for writing a script which does
# start transfer
transfer_out="$(globus transfer async-transfer --format JSON ...)"
# parse & extract task ID
task_id="$(echo "$transfer_out" | jq ...)"
# wait for task to complete
echo "waiting on task $task_id"
globus transfer task wait --task-id "$task_id" --polling-interval 0.1 --heartbeat
# do more stuff...
...
I don't think that forcing someone running this kind of script to wait up to 1 second for a task to complete is entirely fair. If they're running many small tasks in succession, that cost will escalate to significant delays.
However, there is really no case to justify
globus transfer task wait --task-id "$task_id" --polling-interval 0.001 --heartbeat
If you make a call like this, you've grossly misunderstood the performance characteristics of the CLI and are unnecessarily hammering the API.
I think that making 0.1 the minimum allowed would be reasonable -- it means that going into hundredths-digit precision is a clear-cut sign that you're probably doing it wrong, without being too limiting.
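That floor could be enforced as a simple input check. This is a sketch with illustrative names, not real globus-cli code; in a Click-based CLI the same check could live in the option's type or callback instead:

```python
MIN_POLLING_INTERVAL = 0.1  # the proposed floor

def validate_polling_interval(value):
    """Reject sub-0.1s polling intervals at parse time (sketch)."""
    interval = float(value)
    if interval < MIN_POLLING_INTERVAL:
        raise ValueError(
            "--polling-interval must be at least {}".format(MIN_POLLING_INTERVAL))
    return interval
```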
The driving use case is this:
A user with a large number of shares wants to change the local user associated with those shares.
In order to do so, the user needs to create new shares under the new local user and then migrate a large number of ACLs from the old shares to the new ones.
This changes the share IDs, but their attributes can be replicated with a little bit of scripting, so search-based lookups will continue to work for users of the shares.
The key to making this use case work smoothly is making globus transfer endpoint acl add-rule (or a new, similar command) capable of consuming the JSON output of globus transfer endpoint acl list.
I think we can change the behavior of add-rule
based on a flag, so we get this:
$ oldshare="aabb"
$ newshare="xxyy" # not 48, XXYY Syndrome, just a fake ID
$ globus transfer endpoint acl list --endpoint-id "$oldshare" --format JSON | \
globus transfer endpoint acl add-rule --from-list --endpoint-id "$newshare"
This is highly desirable for a very small number of users.
Potentially offering a solution purely through the SDK is more desirable, but this seems like a clear-cut case for "added value" on top of core SDK features. We shouldn't pollute the SDK with rarefied usage like this.
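A sketch of what --from-list consumption might do internally, assuming the standard Transfer access-document fields (principal_type, principal, permissions, path) and the "DATA" container key shown in the JSON examples elsewhere in this document:

```python
import json

def rules_from_acl_list(acl_list_json):
    """Turn `acl list --format JSON` output into add-rule payloads (sketch).

    Each item in the list's "DATA" array is copied into a fresh access
    document suitable for POSTing against the new share, dropping the old
    rule IDs so the service assigns new ones.
    """
    doc = json.loads(acl_list_json)
    payloads = []
    for rule in doc["DATA"]:
        payloads.append({
            "DATA_TYPE": "access",
            "principal_type": rule["principal_type"],
            "principal": rule["principal"],
            "permissions": rule["permissions"],
            "path": rule["path"],
        })
    return payloads
```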
The case we're concerned about is the pathological case of hundreds of nested directories expanding to huge numbers of files (millions or billions).
Per notes from Karl, we'd need to switch traversal to be breadth-first and unbuffered output to handle that case at all -- makes perfect sense, though unbuffered JSON output will require careful handling and some testing to ensure that the results are valid JSON.
To resolve, we have three paths forward (there may be more?):
- remove recursive ls from the CLI
- limit recursive ls to some "small" but reasonable number -- e.g. 50
- add a recursive option to the SDK ls

This is only worth discussing as an SDK addition (and not purely as a CLI feature) because we've made it very easy to build your own version of recursive ls.
Unless we make it easier to use the "correct" version than to build your own, we haven't actually solved the problem.
Removing it from the CLI is viable but may be worse than adding a recursive=True option to the SDK ls -- I just don't trust that people won't go building this themselves. It therefore seems a poor way of protecting the service.
Given the choice between a hard limit and a sleep, the sleep has much better semantics.
It doesn't mess with error modes at all -- an SDK LimitExceeded error will be confusing to people, and is likely to break scripts unexpectedly as their inputs scale up.
The sleep behavior also sets the tone correctly for a future universe in which that kind of rate limiting is enforced service side in addition to being done client-side.
There's already been some discussion of these, but I'm moving it into issues to better track status and decisions.
I'd appreciate input from @karlito-go and @ranantha on this one.
This was a feature request I got from Steve -- he seemed to feel pretty strongly about it being a necessary intro point.
My instinct is that the semantics should be somewhere between what knife configure and git config offer.
So, globus config init (knife-like) should be prompt-oriented and allow users to enter info line by line.
For example,
$ globus config init
Please enter your auth token: XXXXX
Please enter your transfer token: XXXXX
...
$ globus config init --auth-token XXXXX
Please enter your transfer token: XXXXX
However, it might also be useful to have git-like setter/getter functionality. Maybe something more like this:
$ globus config show general.auth_token # echoes the value
...
$ globus config add general.auth_token XXXXX
Of course, anything without a dot should be treated as part of the general config section, so these are equivalent to
$ globus config show auth_token
...
$ globus config add auth_token XXXXX
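The dotted-key rule described above can be sketched as a small helper (the name is illustrative):

```python
def split_config_key(key, default_section="general"):
    """Resolve a config key into (section, option).

    "general.auth_token" -> ("general", "auth_token"); a bare "auth_token"
    falls back to the general section, matching the equivalence above.
    """
    section, sep, option = key.partition(".")
    if not sep:
        return default_section, key
    return section, option
```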
and of course we'd want globus config remove.
Questions I still have:
- A --system flag, like git-config, to operate on /etc/globus.cfg?
- A globus config edit which does $EDITOR ~/.globus.cfg?
- Should the config command know that certain values (like tokens) are secret, and show redacted versions of them?
- An --overwrite flag?

@bd4 I'm concerned that this needs direct access to the SDK's GlobusConfigParser. It's a very unusual internal use-case, and I don't think we want to support it for external groups.
I'm happy to from globus_sdk.config import GlobusConfigParser, but I'll need a new function attached to flush its state to a file (the internal SafeConfigParser should support this for us), and I'll need to update the GlobusConfigParser to support reading only one file. Otherwise, it would load system and default config, and flush all of that back out when it's done, which would be bad.
There are some alternatives:
- A 0.3.0 of the SDK in which the config parser insists that there is a [general] section heading, like a typical ConfigParser? Then this would not need the specialized GlobusConfigParser at all. That breaks compatibility with existing config, which I don't like. We'd need clear messaging for anyone who has started using it.

It can be based loosely on the SDK guide: https://github.com/globus/globus-sdk-python/blob/master/CONTRIBUTING.md
However, it should have a different emphasis, particularly for reporting bugs that may look arcane to non-pythonistas.
globus transfer async-transfer and globus transfer async-delete currently hide away the submission ID generation. That means that the CLI doesn't support reliable, safe-to-retry task submission.
To resolve, we should expose this behavior with a new command and a couple of flags.
Namely, we want globus transfer task generate-submission-id to produce the submission ID in raw text, and globus transfer async-transfer --submission-id to consume it.
The same option should exist for async-delete, of course.
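The reason submission IDs make retries safe can be sketched like this. Here submit is a stand-in for the actual API call, and uuid4 stands in for generate-submission-id; the service side deduplicates on the ID, so re-sending after a network failure cannot create a duplicate task:

```python
import uuid

def submit_with_retry(submit, max_attempts=3):
    """Retry a task submission safely by reusing one submission ID (sketch)."""
    submission_id = str(uuid.uuid4())  # stand-in for generate-submission-id
    last_err = None
    for _ in range(max_attempts):
        try:
            return submit(submission_id)
        except ConnectionError as err:
            last_err = err  # safe to retry: same submission_id is re-sent
    raise last_err
```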
Per globus/globus-sdk-python#51, we're trying to get off of the dependency on six because OSX makes things difficult.
This is largely a matter of supporting the naive developers who don't realize that there's a strong reason to use virtualenv.
At present, we're using:
- six.string_types (easy to sub with some version dispatch)
- six.reraise

The reraise functionality is hard to imitate, so we'll need to take another look at where and how that's happening. I don't think we can do the current exception handling logic without a reraise, but maybe we don't need to preserve context (I think that's just paranoia about how click might behave).
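For reference, a Python 3 stand-in for six.reraise looks roughly like this (a sketch following the general shape of six's own implementation; the Python 2 side needs the three-argument raise statement, which is a syntax error on Python 3, hence the need for a shim at all):

```python
import sys

def reraise(exc_type, exc_value, traceback):
    """Re-raise an exception with an explicit traceback (Python 3 flavor)."""
    if exc_value is None:
        exc_value = exc_type()
    if exc_value.__traceback__ is not traceback:
        raise exc_value.with_traceback(traceback)
    raise exc_value
```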
Right now, the error output is not prescriptive, and just notes that an endpoint needs to be activated.
Example from Steve:
$ globus transfer ls --endpoint-id 9d6d994a-6d04-11e5-ba46-22000b92c6ec
Globus CLI Error: A Transfer API Error Occurred.
HTTP status: 400
request_id: 9sHmaySoX
code: ClientError.ActivationRequired
message: The endpoint '9d6d994a-6d04-11e5-ba46-22000b92c6ec' is not activated
This is actually the ls call returning the ActivationRequired error, not a processed activation requirements doc.
What we really need to do is this: autoactivate any endpoint that we see, and capture any activation requirements document that we get back.
If the response has a success code, proceed as usual.
However, if we get NotActivated, we need to parse and present the ActivationRequirements.
I think in that case, we should exit with status 1 -- the command failed, after all.
As for the error presentation, it depends a bit on the required activation type.
For web activation, we just need to show the activation link, that's pretty easy.
For CLI activation, we need some text that clarifies that we mean the hosted CLI.
Possibly something like this:
$ globus transfer ls --endpoint-id 9d6d994a-6d04-11e5-ba46-22000b92c6ec
Globus CLI Error: A Globus Endpoint Requires Activation
code: ClientError.ActivationRequired
message: The endpoint '9d6d994a-6d04-11e5-ba46-22000b92c6ec' is not activated
To activate 9d6d994a-6d04-11e5-ba46-22000b92c6ec , you must use the Hosted Globus CLI.
Follow this guide to connect <link>
Then run
endpoint-activate 9d6d994a-6d04-11e5-ba46-22000b92c6ec
Note that I omit some of the HTTP detail (it's a success there, after all), but still want the ClientError.ActivationRequired code noted.
This may have some unforeseen impact on #21, so we need to think about that.
Ideally, the SDK would handle parsing the activation requirements document a bit, and we could inspect an activation_type property on it to see CLI vs. Web activation.
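A sketch of dispatching on that hypothetical activation_type property; the message text follows the mockup above and is illustrative, not final:

```python
def activation_error_message(endpoint_id, activation_type, web_link=None):
    """Build the error text for an unactivated endpoint (sketch).

    activation_type is the hypothetical SDK property discussed above,
    assumed here to be "web" or "cli".
    """
    lines = [
        "Globus CLI Error: A Globus Endpoint Requires Activation",
        "code:    ClientError.ActivationRequired",
        "message: The endpoint '{}' is not activated".format(endpoint_id),
        "",
    ]
    if activation_type == "web":
        lines.append("To activate {}, visit: {}".format(endpoint_id, web_link))
    else:
        lines.append(
            "To activate {}, you must use the Hosted Globus CLI.".format(endpoint_id))
        lines.append("Then run: endpoint-activate {}".format(endpoint_id))
    return "\n".join(lines)
```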
For some reason, an early version of the CLI used sys.excepthook
to do global error handling.
The goal was to hide any exceptions which occur from the end user, and present a clean "minimal" error message that wouldn't be too scary.
This is all well and good, but it should just be a try-except block in the main command method.
Any exceptions thrown and handled by parsing are presented nicely already anyway.
This also has implications for #17.
When I started using GLOBUS_CLI_DEBUGMODE, I didn't have the HiddenOption class written yet, and there was no CommandState location to store that value anyway.
This should be fairly easy to add to the common_options option set.
globus transfer async-delete should have a --batch mode, just like globus transfer async-transfer.
An option on ls mimicking ls -R behavior would be helpful. Maximum-depth options and a global policy are probably prudent. In JSON output mode, we'd need behavior such that the path (at least relative to the root path given on the command line) is displayed.
subscription_id, network_use, location, and disable-verify.
Karl suggested this a while back, and I've had it in my queue for a while.
I definitely want this before we consider the CLI done, but I'm not sure when we'll get to it.
Several commands have a set of non-200 HTTP responses that might still be considered part of normal operations. For example, a command like globus transfer endpoint delete ... could be considered successful on a 200 or a 404.
It may be desirable for us to hand that information -- the 404 response vs. a 403 or other error -- to the caller, but doing so with some fixed mapping is burdensome for scripting (you need to know our statuses table), and not very extensible if we omit codes that we start using later.
The generic solution is to just let the invoking script specify what it would like us to translate various HTTP statuses into.
That lets them get meaningful status codes, avoids a magical lookup table, and makes those scripts more readable because they specify what the various error codes that they're dispatching on mean / come from.
So, the basic form would be something like this:
globus transfer endpoint delete --map-http-status "404=54" ...
and we can even allow crazy things like this check for nonexistence:
globus transfer endpoint show --map-http-status "404=0" --map-http-status "200=1"
In the original version of this proposal the idea was to restrict the caller to using the range from 50 to 99 (reserving other exit codes for our own purposes into the future).
However, I think there's a case for allowing clients to say "allow 404s" by mapping to 0.
Whether or not we allow mapping to 1 is a little bit more tenuous, but I think still acceptable, since it just means "generic error".
There's also a question of whether or not it should be
globus transfer endpoint delete --map-http-status "404=54" --map-http-status "403=53"
or
globus transfer endpoint delete --map-http-status "404=54,403=53"
That second is a little bit nicer, but also a little harder to implement.
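A sketch of parsing both forms, with the proposed exit-code restrictions (0, 1, and the reserved 50-99 range); names are illustrative:

```python
def parse_status_map(values):
    """Parse repeated --map-http-status values into {http_status: exit_code}.

    Supports both "404=54" and the comma-delimited "404=54,403=53" form.
    Exit codes outside 0, 1, and 50-99 are rejected, per the proposal above.
    """
    mapping = {}
    for value in values:
        for pair in value.split(","):
            status_str, _, exit_str = pair.partition("=")
            status, exit_code = int(status_str), int(exit_str)
            if exit_code not in (0, 1) and not 50 <= exit_code <= 99:
                raise ValueError(
                    "exit code {} not in 0, 1, or 50-99".format(exit_code))
            mapping[status] = exit_code
    return mapping
```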
The partial commands:
globus transfer task
globus transfer
globus
all give:
AttributeError: 'Namespace' object has no attribute 'fun'
Instead of having the globus transfer ls command default to /~/, don't send any path to the Transfer service and let the Transfer service figure out what the default path is supposed to be.
This fixes S3 endpoints (which don't support /~/), shared endpoints (which should usually default to /), and any kind of endpoint that has had a default_directory configured.
Dan mentioned this to me.
Specifically, trying to autoactivate an activated endpoint which does not support autoactivate is confusing:
$ globus transfer endpoint autoactivate --endpoint-id danielpowers#prod01
{
"code": "AutoActivationFailed",
"resource": "/endpoint/danielpowers%23prod01/autoactivate",
...
"message": "The endpoint could not be auto activated, fill in the returned activation_requirements and POST them back to /activate to perform manual activation.",
...
"oauth_server": null,
"subject": null
}
Really, the autoactivate call in this case is unhelpful / unimportant, as the activation requirements document only applies if the expires_in field you get back is unacceptable.
Autoactivate calls on autoactivate-capable endpoints work fine and have acceptable (but noisy) output.
Additionally, this call is generally unnecessary because we automatically attempt to autoactivate any endpoint which we see, so trying an ls or similar operation will result in an implicit and less scary-looking autoactivation.
If we keep autoactivate as an explicit operation, we need to figure out better (textmode) output for it.
If we make it only magically autoactivate, this will feed into #30 as well.
An --all option that will cancel all ACTIVE and INACTIVE tasks.

@bd4, @corpulentcoffee: I'm particularly interested in your opinions on this, as we've had a number of discussions about the CLI design/layout.
Thoughts and opinions from everyone and anyone welcome, of course.
There's a potential feature that I'd like to think about and ask questions about.
For now, I'm calling it --grep because I think it's the most intuitive name for it, but I'm not necessarily thinking of doing full regex matching.
--grep is probably a bad name for this if I add it, since it promises too much, but I don't have a good name yet. Maybe --cli-filter?
Right now, a lot of commands have grep-friendly text output.
I'm going to use bookmark list for a bunch of examples because it has very simple output.
$ globus transfer bookmark list -Ftext
Name | Endpoint ID | Bookmark ID | Path
-------------------------------- | ------------------------------------ | ------------------------------------ | ----
Crummy Bookmark | d1763b75-6d04-11e5-ba46-22000b92c6ec | 6c0a7d14-f796-11e5-a6f9-22000bf2d559 | /abc/123/
EP1 GoData | ddb59aef-6d04-11e5-ba46-22000b92c6ec | ec207bf4-eb91-11e5-9829-22000b9da45e | /share/godata/
Test Bookmark Creation | d1763b75-6d04-11e5-ba46-22000b92c6ec | 46e4d1b8-f78e-11e5-a6f9-22000bf2d559 | /abc/123/
In the case of commands like this, it's natural to want to be able to filter on columns in some way. My beloved awk can do great things here with | as its delimiter, and plain grep without column awareness is pretty useful too.
$ globus transfer bookmark list -Ftext | grep 'ec207bf4-eb91-11e5-9829-22000b9da45e'
EP1 GoData | ddb59aef-6d04-11e5-ba46-22000b92c6ec | ec207bf4-eb91-11e5-9829-22000b9da45e | /share/godata/
is a decent example of what we'll see people doing.
This really doesn't work well on the JSON output, however, for obvious reasons:
$ globus transfer bookmark list -Fjson | grep 'ec207bf4-eb91-11e5-9829-22000b9da45e'
"id": "ec207bf4-eb91-11e5-9829-22000b9da45e"
Now, what if that filtering were done inside of the CLI code itself, so instead of globus ... | grep <expr>, we had globus ... --grep <expr>?
We could apply that expression to text and JSON output uniformly -- quick and easy filtering for your results.
$ globus transfer bookmark list -Ftext --grep 'EP1'
EP1 GoData | ddb59aef-6d04-11e5-ba46-22000b92c6ec | ec207bf4-eb91-11e5-9829-22000b9da45e | /share/godata/
and importantly
$ globus transfer bookmark list -Fjson --grep 'EP1'
{
"DATA": [
{
"name": "EP1 GoData",
"path": "/share/godata/",
"endpoint_id": "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
"DATA_TYPE": "bookmark",
"id": "ec207bf4-eb91-11e5-9829-22000b9da45e"
}
]
}
Bookmark list is not a very compelling case for this because it's small.
Endpoint Search obviously has searching and filtering options available in the API.
The three really strong cases for this feature, I think, are Task List, Event List, and ACL List.
Lots of people will want to do something like this:
$ globus transfer task list | grep 'Globus Tutorial Endpoint 1'
c44056b6-f075-11e5-9833-22000b9da45e | SUCCEEDED | DELETE | Globus Tutorial Endpoint 1 | None | globus-cli delete
b8a53d98-ed29-11e5-982b-22000b9da45e | SUCCEEDED | TRANSFER | Globus Tutorial Endpoint 1 | Globus Tutorial Endpoint 1 | None
or this
$ globus transfer acl list --endpoint-id d1763b75-6d04-11e5-ba46-22000b92c6ec | grep 'identity'
02ed6f8c-d2aa-11e5-9759-22000b9da45e | identity | c8aad43e-d274-11e5-bf98-8b02896cf782 | rw | /otherusers/globusteam/
02ed6f85-d2aa-11e5-9759-22000b9da45e | identity | c8aad43e-d274-11e5-bf98-8b02896cf782 | r | /
01bbeadd-d2aa-11e5-9759-22000b9da45e | identity | ae2f7f60-d274-11e5-b879-afc598dd59d4 | rw | /otherusers/ballen/
None | identity | c8aad43e-d274-11e5-bf98-8b02896cf782 | rw | /
The easy way to start on this is a raw substring match on the SDK's str(GlobusResponse).
Then we can maybe do regex, maybe match only on fields inside of a JSON structure, and maybe apply only to visible columns in text output.
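The cheap starting point could be sketched like this: filter the parsed response before output formatting, so text and JSON rendering see the same filtered data. The "DATA" container key follows the JSON examples above, and substring matching on each serialized item stands in for matching on str(GlobusResponse):

```python
import json

def grep_response(doc, expr):
    """Keep items in the response's "DATA" array whose serialized form
    contains `expr` as a raw substring (sketch of the proposed --grep)."""
    filtered = [item for item in doc.get("DATA", [])
                if expr in json.dumps(item)]
    return dict(doc, DATA=filtered)
```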
I know that there's a case for saying that "Oh, no! We don't want people using Display Name in scripts!", but I don't think that's the only consideration here.
I recognize that there's a danger in allowing someone to do globus transfer task list --grep <display_name> --format JSON > somefile.json, but I don't think that's particularly hard to do on your own regardless.
The question, in my mind, is whether or not some kind of basic filtering -- insufficient for all needs but useful in small doses -- would be a useful thing to have in the CLI.
If we just do a substring match on str(GlobusResponse), the implementation and maintenance cost is very cheap.
Is substring match not good enough? Should we do full regexes?
Is this entire idea an unnecessary feature?
We can always tell people to use grep or jq. They'll certainly have the former, but not everyone has a command-line JSON filtering tool at the ready.
Although this might be useful for text output, I'm more interested in supporting simple filtering on JSON output. OTOH, maybe just instructing people in the use of jq (my personal favorite) is good enough?
@bd4 I want us to discuss this a little bit.
I think it makes a lot of sense that globus transfer ls --format json --endpoint-id 'badepid' would produce a JSON error message.
However, there are a number of conditions under which we do not receive JSON from the other end of the connection.
For example, the web server may respond with a raw body of 500 Internal Server Error.
In those cases, should we wrap this in a standard JSON container, like {"message": "500 Internal Server Error"}?
To me, the target use case is stuff like nonexistent endpoints, which the caller may want to parse and inspect.
They should be able to, relatively safely, hand stderr to something that consumes JSON.
I think that would be totally acceptable, but I'm not sure I've thought through all of the options.
There are errors that we can't handle so easily in this way, namely Python stacktraces.
I don't want to be JSONifying every exception ever thrown, just to slavishly obey the request for --format JSON.
Where do we draw the line? Do we say "We will convert any error responses from the APIs to JSON."?
If we can't crystallize this into a simple statement, we probably don't have clear enough thinking about it yet.
It would be nice to have something similar to the old CLI wait command.
The idea is to be a blocking CLI command which returns with status 0 once the Task terminates.
Importantly, it should exit with status 0 even if the Task itself fails.
The exit code is necessary to distinguish network failure (likely on a long-running command) from successful termination.
Should let you write something like this:
#!/bin/sh
...
rc=1
while [ $rc -ne 0 ];
do
    # wait 30m at a time
    globus transfer task wait --task-id '...' --timeout 1800
    rc=$?
    echo 'still alive and polling' >&2
done
Some thoughts on what we might want for this, interface-wise:
globus transfer task wait. It's a task-oriented feature, so it makes sense as a subcommand of globus transfer task.
--timeout
This is not strictly necessary, but a waiting process probably wants to heartbeat periodically to show that it's still alive. After N seconds, exit with a nonzero status if the task is still pending, so that callers (like the script above) can loop.
--polling-interval
Check the status every N seconds; defaults to 1.
If you plan to poll many tasks simultaneously, you want to be able to reduce network traffic by lengthening the interval between queries.
If you want to watch a task at very high resolution, you'll still be limited by RTT for a given request.
We could do higher resolution polling by running multiple parallel requests, but that seems excessive.
--fail-on-fault-types
This is the most tenuous potential flag -- seems somewhat fragile.
If there are fault events of a particular type in the Task events, it may be desirable to consider it failed. Particularly, I'm thinking of Permission Denied or nonexistent dest dirs.
This would be an enum of string options, passed as a comma delimited list, defaulting to the empty list.
The awscli has fairly comprehensive completion support, which may serve as a guide:
http://docs.aws.amazon.com/cli/latest/userguide/cli-command-completion.html
It should be possible, given the structure of click commands, to do a completer as a python function or a hidden option to the CLI.
If we want completers to be pure shell functions, we can add a hidden option to generate a shell script as output, and then ship that result as part of the package data.
This was something found in #32 and fixed there.
However, in case that work is cancelled, delayed, or significantly altered, we need to be sure to fix this issue.
When the custom excepthook was replaced with a high-level dispatch on exception types, I did not correctly add explicit exit(1) calls as I should have.
This is a bit of a design question with fairly broad implications for a bunch of commands.
There's some clunkiness regarding shared endpoint management because they're sorta endpoints and they're sorta their own thing.
That they are valid logical endpoints, good as Task and FS op targets, is good and useful, but only if we strip back the meaning of "endpoint" to be exactly that.
Shared Endpoints don't/can't support the full set of Endpoint operations, which makes managing them as a type of endpoint very weird.
That leads to real weirdness in the API as well -- like all endpoints having a nullable "host_path" field.
At the very least, it would be nice to have endpoint subtypes added, so that we're not deducing the endpoint type from which fields are null.
Ideally, in my mind, non-shared endpoints would be "demoted" to be another specialized type of endpoint, something like "physical endpoints" or "gridftp server endpoints" (which distinguishes them from S3), but that's a bigger issue.
For now, focusing on the CLI.
I started looking at #3 again, and realized that there's actually a bit of a model problem posed here.
Regarding shared endpoint creation in particular, I think we have three options.
1. globus transfer endpoint create --shared --host-endpoint-id ...
2. globus transfer endpoint share --host-endpoint-id ...
3. globus transfer share create --host-endpoint-id ...
If I had to distill down this issue to a TL;DR format, it would probably be a choice between those three options, phrased as
Conceptually, which of the following is a Share?
How we think of Shared Endpoints / Shares and "what they are" will ultimately drive how we present them and manage them in the CLI and potentially other interfaces.
Right now, the API and Web App share the opinion that a Shared Endpoint is a special kind of endpoint. That works really well when you're looking at endpoints that are shared with you, but very badly when you're trying to manage your own shared endpoints.
Should this be divided into two categories?
I would be most inclined towards:
That duality could be confusing if poorly presented, but I think the other split-brained view of the world -- in which managing Shared Endpoints is a confusing special case of managing an endpoint definition -- is more confusing.
Perhaps the Web UI will continue with its current layout and format, but the CLI, SDK, and other developer tools will present things in a different form? I worry that inconsistency will confuse developers by not mapping nicely to the UI, but right now we'll be confusing developers anyway, so it's not exactly a net loss.
I can make whatever decision I want for now, but this needs to be resolved properly before the first "production" version.
We're currently using the Click default behavior when parsing options with `multiple=False`. That parses all instances of an option and returns the last one given. So `globus transfer ls --endpoint-id 'abc123' --endpoint-id 'def456'` is entirely valid, and results in `endpoint_id="def456"`.
This is fine/safe for options where allowing an override doesn't change the service-side semantics of commands, like `--format`, but for IDs and paths, it could be the result of user confusion and not produce the desired action. For example, a `globus transfer async-transfer` with two sources, two destinations, and two paths is inherently ambiguous, and likely means that the user has wholly misunderstood usage of that command.
To correctly implement a change in this behavior, I think we need to add a new class of option. Such opts can internally be specified with `multiple=True` and then assert that `len(value) <= 1`, where `value` is the resulting tuple. I'd suggest implementing this as a custom callback, but I worry about future opts that need both a custom callback and this behavior. A custom option class doesn't add much more complexity, and properly handles that case.
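A minimal sketch of that option class, assuming click's standard `Option.process_value` hook (the name `OneUseOption` is mine, not the CLI's):

```python
import click

class OneUseOption(click.Option):
    """Hypothetical option class: collect values with multiple=True under
    the hood, then reject repeated uses instead of silently keeping the
    last value."""

    def __init__(self, *args, **kwargs):
        kwargs["multiple"] = True
        super().__init__(*args, **kwargs)

    def process_value(self, ctx, value):
        # super() yields the tuple of all occurrences of the option
        value = super().process_value(ctx, value)
        if len(value) > 1:
            raise click.UsageError(
                "--{} can only be given once".format(self.name.replace("_", "-"))
            )
        # unwrap back to a single value (or None) for the command function
        return value[0] if value else None

@click.command()
@click.option("--endpoint-id", cls=OneUseOption)
def ls(endpoint_id):
    click.echo(endpoint_id or "<no endpoint>")
```

Repeating the option then fails at parse time with a usage error, rather than reaching the service with the wrong ID.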
The inverse of `globus login`. This can't touch the consent in Globus Auth, but it should invalidate all of your tokens by revoking them, and scrub them from the config file.
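The config-scrubbing half might look like this sketch, assuming an INI-style config file and hypothetical key names (revoking the tokens against Globus Auth is a separate, prior step, or they remain usable):

```python
import configparser

# hypothetical key names -- the real config layout may differ
TOKEN_OPTIONS = ("transfer_token", "auth_token", "refresh_token")

def scrub_tokens(config_path):
    """Remove any saved token values from an INI-style config file,
    leaving non-credential settings intact."""
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    for section in cfg.sections():
        for opt in TOKEN_OPTIONS:
            cfg.remove_option(section, opt)
    with open(config_path, "w") as f:
        cfg.write(f)
```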
The login command could be improved with some additional explanation of what happened after the Authorization Code is entered. Below is a sample suggestion.

    globus login
    Please login to Globus here:
    https://auth.globus.org/v2/oauth2/authorize?code_challenge=<snip>
    Enter the resulting Authorization Code here: <snip>
    A credential (use appropriate term) for using the CLI was saved for you in (path). It is valid until <date/time>
    > (back at command prompt)
Right now, the way that we're using `click` is dangerous to the long-term health of this project. It provides a lot of value with the way that it takes care of parsing and dispatch, but unless we wrap it carefully, it will be really difficult to move off of it if we ever need to. We've already gotten into some interesting customization directions -- like the `HiddenOption` type.
And there are inevitably going to be some places where we find the parsing provided by click to be confusing or even downright wrong (at least there's pallets/click#619 ).
I don't think we need to obscure the basic usage behind `globus_cli_command` and `globus_cli_option` wrappers everywhere -- that's unnecessary and doesn't buy us anything -- but collecting all of the interesting customizations in one place will likely make it much easier to transition if we ever need to.
There will be general-purpose support for public clients in Globus Auth in the near future. At that time, we want `globus login` to use a public client registration to start a 3-legged flow with the following basic usage pattern:

1. `globus login` displays a link to copy-paste into a browser window
2. The user authenticates in the browser and receives an authorization code
3. `globus login` (still open) consumes that authorization code as an input and exchanges it for refresh tokens, which are saved

This requires that the SDK change its token handling to be ready for refresh tokens, and that the support in Globus Auth is completed.
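As background for the flow above (not the CLI's actual implementation), a 3-legged flow for a public client typically protects the authorization code with PKCE; a sketch of generating the verifier/challenge pair per RFC 7636's S256 method:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a (code_verifier, code_challenge) pair per RFC 7636.
    The challenge goes into the authorize URL; the verifier is sent
    later when exchanging the authorization code for tokens."""
    # code_verifier: high-entropy random string (section 4.1)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # code_challenge: BASE64URL(SHA256(verifier)), the "S256" method
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```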
This is a bit of an open question. Right now, if you attempt an authenticated action like `globus transfer ls ...`, you get a 401 error back. Ideally, for any call that cannot be made unauthenticated, the relevant commands would give you a "no auth" error of some kind and direct you to `globus login`.
The big question here is where and how this gets enforced.
Some calls are available unauthenticated, but I'm not sure that it's worth the extra burden to support that mode of usage.
If all calls which talk to the service are forced to be made with credentials, then the problem is solved neatly and easily.
However, if we do want to support this case, then we need to figure out where credentials are and are not required.
When HTTPS endpoints enter the ecosystem with possible "public read" ACLs, the desire for unauthenticated calls to be supported may (not necessarily will) increase.
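One possible enforcement point is a decorator applied to every command that must talk to the service with credentials; this sketch uses a stubbed-out, hypothetical `load_tokens` lookup:

```python
import functools

import click

def load_tokens():
    """Hypothetical lookup of saved credentials; returns None if the
    user has never run `globus login` (stubbed out here)."""
    return None

def requires_login(func):
    """Sketch: wrap authenticated commands so a missing credential
    produces a friendly 'no auth' error pointing at `globus login`,
    instead of letting the service return a bare 401."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if load_tokens() is None:
            raise click.UsageError(
                "No credentials found. Run `globus login` and retry."
            )
        return func(*args, **kwargs)
    return wrapper
```

Commands that are safe to run unauthenticated would simply not carry the decorator, which keeps the "where are credentials required" decision explicit and in one place.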
Do we want to hide certain fields from the output? Example: `event_link` and `subtask_link`?
For the recent rewrite, I assumed it would be okay for command names to stay under 16 chars. That's pretty good, but we have a violator:
    === globus transfer endpoint ===
    search           Search for Globus Endpoints
    show             Display a detailed Endpoint definition
    deactivate       Deactivate an Endpoint
    create           Create a new Endpoint
    update           Update attributes of an Endpoint
    my-shared-endpoint-listList all Shared Endpoints on an Endpoint by...
    server-list      List all servers belonging to an Endpoint
    autoactivate     Activate an Endpoint via autoactivation
    delete           Delete a given Endpoint
`my-shared-endpoint-list` is a bit of a long name, and probably should be fixed, but that doesn't mean it should break `list-commands` so easily.
Probable solution is to do a line break and indent the short-help, like so:
    === globus transfer endpoint ===
    search           Search for Globus Endpoints
    show             Display a detailed Endpoint definition
    deactivate       Deactivate an Endpoint
    create           Create a new Endpoint
    update           Update attributes of an Endpoint
    my-shared-endpoint-list
                     List all Shared Endpoints on an Endpoint by...
    server-list      List all servers belonging to an Endpoint
    autoactivate     Activate an Endpoint via autoactivation
    delete           Delete a given Endpoint
We can do that for any command that's going to come too close to the right-hand column (within 2 chars?).
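The wrapping rule above can be sketched as a small formatter; the column widths here are illustrative, not the CLI's actual values:

```python
def format_command_list(commands, help_col=18, margin=2):
    """Render (name, short_help) pairs as a two-column listing, but
    break onto an indented continuation line whenever a name lands
    within `margin` chars of the help column."""
    lines = []
    for name, short_help in commands:
        if len(name) > help_col - margin:
            # name too close to (or past) the help column: wrap
            lines.append(name)
            lines.append(" " * help_col + short_help)
        else:
            lines.append(name.ljust(help_col) + short_help)
    return "\n".join(lines)
```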