
Globus Pilot


A command-line tool for managing data in Globus Search and transferring the corresponding data to and from a Globus Endpoint.

Installation

Pilot requires Python 3.6+. You can install it with the following:

pip install globus-pilot

See the Read the Docs page for more options.

Quick Start

For a full walkthrough, see the User Guide. Administrators can also view the Admin Guide.

A quick walkthrough is below.

First, login using Globus:

pilot login

Set your Search Index:

pilot index set <myindex>

Then choose your project. Run pilot project info to see details about any listed project:

pilot project
pilot project set <myproject>

You can use list to get a high-level overview of the data:

pilot list

If you want more detail about a specific search record, you can use describe:

pilot describe dose_response/rescaled_combined_single_drug_growth

You can also download the data associated with the search record:

pilot download dose_response/rescaled_combined_single_drug_growth

When you want to add more data to the collection, you can use the upload command. This will upload the data in addition to creating a record in Globus Search to track it.

touch my_data.tsv
pilot upload my_data.tsv test_dir --dry-run --verbose -j my_metadata.json

The '--dry-run' and '--verbose' flags are optional but handy for testing. '-j my_metadata.json' supplies any extra metadata the pilot tool can't determine automatically. Here is an example of the metadata:

{
    "title": "Drug Identifiers",
    "description": "Drug identifiers, including InChIKey, SMILES, and PubChem.",
    "data_type": "Drug Response",
    "dataframe_type": "List",
    "source": [
        "InChIKey",
        "SMILES",
        "PubChem"
    ]
}

Running Tests

Ensure packages in test-requirements.txt are installed, then run:

pytest

And for coverage:

pytest --cov pilot


globus-pilot's Issues

Field metadata dependent on datatype

In the prototype ingest script, there's some ugly code that checks the type of each field and then plucks the metadata out:

        for d in schema_dict['fields']:
            c = pandas_col_metadata.get(d['name'])
            d['count'] = int(c['count'])
            if d['type'] == 'string':
                if not numpy.isnan(c['unique']):
                    d['unique'] = int(c['unique'])
                d['top'] = c['top']
                if not numpy.isnan(c['freq']):
                    d['frequency'] = int(c['freq'])
            else:
                d['type'] = str(c.dtype)
                for k in ("25", "50", "75"):
                    d[k] = float(c[k+'%'])
                for k in ('mean', "std", "min", "max"):
                    d[k] = float(c[k])

whereas in analysis.py this logic is missing:

    metadata = {
        'name': field_name,
        'type': 'string' if str(pmeta.dtype) == 'object' else str(pmeta.dtype),

        # numerical statistics
        '25': pmeta['25%'],
        '50': pmeta['50%'],
        '75': pmeta['75%'],
        'mean': pmeta['mean'],
        'std': pmeta['std'],
        'min': pmeta['min'],
        'max': pmeta['max'],

        # string/object  statistics
        'count': int(pmeta['count']),
        'unique': pmeta['unique'],
        'top': pmeta['top'],
        'frequency': pmeta['freq'],
    }

And it fails when certain keys are missing.
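
A minimal sketch of how analysis.py could branch on the column type so missing keys are simply skipped (assuming pmeta is one column of a pandas describe(include='all') result; the helper name is illustrative):

    import pandas as pd


    def field_metadata(field_name, pmeta):
        """Build per-field metadata from one column of DataFrame.describe(include='all').

        Only the statistics pandas actually produced for this column's dtype are
        included, so string columns no longer fail on missing numeric keys.
        """
        metadata = {'name': field_name, 'count': int(pmeta['count'])}
        if str(pmeta.dtype) == 'object':
            # string/object statistics (the numeric stats are NaN for these columns)
            metadata['type'] = 'string'
            for src, dest in (('unique', 'unique'), ('top', 'top'), ('freq', 'frequency')):
                if src in pmeta and not pd.isnull(pmeta[src]):
                    metadata[dest] = pmeta[src]
        else:
            # numerical statistics
            metadata['type'] = str(pmeta.dtype)
            for src, dest in (('25%', '25'), ('50%', '50'), ('75%', '75'),
                              ('mean', 'mean'), ('std', 'std'),
                              ('min', 'min'), ('max', 'max')):
                if src in pmeta and not pd.isnull(pmeta[src]):
                    metadata[dest] = float(pmeta[src])
        return metadata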

Bad (old) error message raised for invalid project metadata

If the 'project_metadata' key is not present, pilot will complain about a few required fields (which are not actually required!). In fact, project_metadata does not need any fields; it only needs to be present. We could even relax that requirement.

Note: The Pilot Client doesn't really suffer from this bug, since it automatically generates valid metadata. This only happens when calling into the SDK directly (as I was).

Review Documentation, Check for Each Command

Ensure that all of the pilot commands are exercised and demonstrated.

User Guide

Demonstrate the user commands in the following order:

  • help
  • version
  • login
  • whoami
  • profile
  • profile -i
  • profile --local-endpoint
  • project
  • project info
  • project set
  • project update
  • list
  • describe
  • download
  • status
  • logout

Project Admin Guide

  • project
  • project add
  • project push
  • project delete
  • project edit
  • upload
  • mkdir
  • delete

Identity and other features

  • Fix delete command

    • Delete command now allows deleting by entry, or deleting the whole subject
  • Add '--delete-data' flag to allow deleting data in addition to search subjects/entries

  • Change output of field metadata in pilot describe to match the field order of the portal

  • Fix upload command re-uploading data even if the file hasn't changed

  • Change pilot upload --dry-run --verbose foo bar to format output like the describe command

  • Fix upload command starting on version 2

  • Add update subcommand to update search metadata associated with the file without requiring access to the file itself.

  • Add a default project configuration including:

    • Search index
    • default visible_to group
    • Base endpoint and path
    • Default identity provider
    • Include prod and test for each project
  • Pull the creator identity from the logged in user based on the project IdP

    • Allow to be overridden by user provided metadata
  • Try to determine mime type automatically, override via user metadata

  • Support using Globus transfer for download command

Later

  • Add multi-file handling (pilot upload -j <metadata.json> <file1> <file2> ... <fileN> <dir>)
  • May need to switch to resource descriptions to enable field metadata on a per-file basis.

login triggers error on local_endpoint

Login itself succeeds, but then setting the local endpoint attribute fails.

I had deleted the config file and GCP was not running. The error also happens when GCP is running.

Logging in with the config file in place and the local endpoint configured does not cause the error.

(base) Magik:~ rpwagner$ pilot login
You have been logged in.
Traceback (most recent call last):
  File "/Users/rpwagner/anaconda3/bin/pilot", line 10, in <module>
    sys.exit(cli())
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/commands/auth/auth_commands.py", line 44, in login
    globus_sdk.LocalGlobusConnectPersonal().endpoint_id
AttributeError: can't set attribute
(base) Magik:~ rpwagner$ 

Metadata dropped on subsequent uploads

When doing the following:

  • pilot upload -j metadata.json foo.txt /
  • pilot upload foo.txt /

'metadata.json' will be lost for the foo.txt record. It should instead be carried over for each update.
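
A minimal sketch of the carry-over: pull the record's existing metadata and merge newly supplied values over it (names are illustrative, not pilot's API):

    def merge_metadata(previous, user_supplied):
        """Carry previously ingested metadata forward on re-upload (sketch only).

        previous: the record's existing metadata pulled from Globus Search.
        user_supplied: metadata passed via -j on this upload (may be empty).
        User-supplied values win; everything else is preserved rather than dropped.
        """
        merged = dict(previous or {})
        for key, value in (user_supplied or {}).items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = {**merged[key], **value}  # shallow-merge nested blocks like 'dc'
            else:
                merged[key] = value
        return merged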

Delist the `pilot push` command

Pushing is now automatic on project creation. This is still useful as a maintainer command. For example, if a project gets into an unstable state we can manually tinker with the config file and do a pilot push to fix the master manifest. I don't think regular users will see much use for it, though.

Add `--wipe-data-only` to `pilot project delete` command.

For XPCS, we needed to wipe all data on the index, but we didn't really want to delete the whole project and all the settings/descriptions associated with it. A flag to keep the project but wipe all the data would be very convenient.

Proposed: pilot project delete myproject --wipe-data-only
Deletes data and wipes search records within the project, but does not delete or change the project itself.
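
A minimal click sketch of how the flag could work (the helper functions are hypothetical stand-ins, not pilot's real internals):

    import click


    def wipe_project_data(project):
        # Stand-in for deleting the project's files and search records.
        click.echo(f'Wiping data and search records for {project}')


    def delete_project(project):
        # Stand-in for removing the project definition itself.
        click.echo(f'Removing project {project}')


    @click.command()
    @click.argument('project')
    @click.option('--wipe-data-only', is_flag=True,
                  help='Delete data and search records but keep the project itself.')
    def delete(project, wipe_data_only):
        """Sketch of the proposed flag on `pilot project delete`."""
        wipe_project_data(project)
        if not wipe_data_only:
            delete_project(project)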

Spec change for user provided metadata

Currently, as described in the docs here: https://github.com/globusonline/pilot1-tools/blob/master/docs/reference.rst, users provide all metadata as a flat list of fields. This is different from how pilot internally stores data, which uses a few nested keys: 'dc', 'project_metadata' and 'files'.

Pilot should require users to supply metadata in the same form it uses to construct documents, so that updating existing documents is easier.

Example for how users should structure metadata:

{
    "dc": {
        "creators": [
            {
                "creatorName": "Saint, Nickolaus"
            }
        ],
        "dates": [
            {
                "date": "2019-10-09T17:39:08.181483Z",
                "dateType": "Created"
            },
            {
                "date": "2019-10-09T17:39:08.183487Z",
                "dateType": "Updated"
            }
        ],
        "formats": [
            "text/plain"
        ],
        "publicationYear": "2019",
        "publisher": "Globus",
        "resourceType": {
            "resourceType": "Dataset",
            "resourceTypeGeneral": "Dataset"
        },
        "subjects": [
            {
                "subject": "machine learning"
            },
            {
                "subject": "genomics"
            }
        ],
        "titles": [
            {
                "title": "t.txt"
            }
        ],
        "version": "2"
    },
    "files": [
        {
            "filename": "t.txt",
            "length": 20,
            "md5": "29511287b8ffc8b314e91f9de91229b8",
            "mime_type": "text/plain",
            "sha256": "a43ba50a586df444e17b53a1f2e83fd7c1f0506e1183825877dcb3ab5eba1d49",
            "url": "https://ebf55996-33bf-11e9-9fa4-0a06afd4a22e.e.globus.org/projects/nick-testing/t.txt"
        }
    ],
    "project_metadata": {
        "project-slug": "nick-testing"
    }
}

Requested feature list for Nov 8th

Here's a list of changes we discussed from the last call.

  • Change pilot list to show directories in the project's folder
  • For pilot upload, show warning when Globus connect personal is not running
    • Attempt to autoactivate the endpoint when starting a transfer
  • After pilot upload, change 'success' message to tell the user they should run pilot status to check the status of transfer. (Currently, it implies everything succeeded and simply posts a link to the portal, which may not be the case if the files are still transferring, or are failing to start for some reason).
  • For pilot status - get the 'nice' status instead of the 'basic' status, which does not capture transfer errors during an active transfer, such as when GCP isn't running.
  • For pilot upload --help
    • Link to docs for how to upload metadata (which fields are supported, how to format them, etc.)
  • pilot describe should show the number of files if the user uploads multiple files in a directory
    • show each file's type in parentheses (text/plain) when listing
    • show the total number of files per mimetype (when listing mimetypes)
    • limit the number of files shown to 10

Make select commands support 'n' args

Several commands that currently accept only a single argument could accept several. For example, pilot mkdir foo bar baz could create all three folders, whereas currently they must be created one at a time.

List of commands that could support nargs:

  • mkdir
  • delete
  • download
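
For example, click already supports variadic arguments; a minimal mkdir sketch (the remote-directory helper is a hypothetical stand-in for whatever pilot uses today):

    import click


    def make_remote_directory(path):
        # Stand-in for pilot's single-directory remote mkdir call.
        click.echo(f'would create {path}')


    @click.command()
    @click.argument('paths', nargs=-1, required=True)
    def mkdir(paths):
        """Create one or more directories under the project base path."""
        for path in paths:
            make_remote_directory(path)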

project delete requires a project to be set

Doesn't have to be the project being deleted.

(base) Magik:type-class rpwagner$ pilot project delete pilot-tutorial-2
Traceback (most recent call last):
  File "/Users/rpwagner/anaconda3/bin/pilot", line 10, in <module>
    sys.exit(cli())
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/commands/project/project.py", line 194, in delete
    results = search.search_commands.search_by_project(project=project)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/commands/search/search_commands.py", line 35, in search_by_project
    return sc.post_search(pc.get_index(), search_data).data
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/client.py", line 82, in get_index
    return self.project.get_info(project)['search_index']
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/project.py", line 88, in get_info
    raise PilotInvalidProject(f'No project exists {project}')
pilot.exc.PilotInvalidProject: No project exists None
(base) Magik:type-class rpwagner$ pilot project set ncipilot1
Current project set to ncipilot1
(base) Magik:type-class rpwagner$ pilot project delete pilot-tutorial-2

////////////////////////////////////////////////////////////////////////////////
DANGER ZONE
////////////////////////////////////////////////////////////////////////////////
This will delete all data and search results in yourproject.
0 datasets will be deleted for pilot-tutorial-2
////////////////////////////////////////////////////////////////////////////////
DANGER ZONE
////////////////////////////////////////////////////////////////////////////////
Please type the name (pilot-tutorial-2) of your project to delete it> pilot-tutorial-2
Deleting Data...
Deleting Search Records...
Removing project...
Project pilot-tutorial-2 has been deleted successfully.
(base) Magik:type-class rpwagner$ 

pilot upload needs to handle arbitrary file types

We can't assume files are TSV. We need to try extracting column metadata, fall back to a different approach if that fails, and finally settle for minimal file metadata.

pilot upload GDC_metadata.txt /
Traceback (most recent call last):
  File "/Users/rpwagner/anaconda3/bin/pilot", line 10, in <module>
    sys.exit(cli())
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/commands/transfer/transfer_commands.py", line 81, in upload
    skip_analysis=no_analyze)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/search.py", line 122, in scrape_metadata
    skip_analysis=skip_analysis),
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/search.py", line 285, in gen_remote_file_manifest
    metadata = analyze_dataframe(filepath, fkeys) if not skip_analysis else {}
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/analysis.py", line 71, in analyze_dataframe
    ts_info = tableschema.Schema(tableschema.infer(filename)).descriptor
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/tableschema/infer.py", line 25, in infer
    descriptor = table.infer(limit=limit, confidence=confidence)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 148, in infer
    with self.__stream as stream:
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/tabulator/stream.py", line 162, in __enter__
    self.open()
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/tabulator/stream.py", line 250, in open
    raise exceptions.FormatError(message)
tabulator.exceptions.FormatError: Format "txt" is not supported
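
A minimal sketch of the fallback, reusing the tableschema call from the traceback above; the dict keys and the mimetypes fallback are illustrative, not pilot's exact schema:

    import mimetypes
    import os

    import tableschema
    from tabulator.exceptions import FormatError


    def gen_file_metadata(filepath):
        """Collect whatever metadata we can for an arbitrary file (sketch only)."""
        metadata = {
            'filename': os.path.basename(filepath),
            'length': os.path.getsize(filepath),
            'mime_type': mimetypes.guess_type(filepath)[0] or 'application/octet-stream',
        }
        try:
            # Tabular column analysis, as pilot already does for TSV files.
            metadata['field_metadata'] = tableschema.Schema(
                tableschema.infer(filepath)).descriptor
        except FormatError:
            # e.g. 'Format "txt" is not supported' -- not a dataframe, so keep
            # only the minimal file metadata above.
            pass
        return metadata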

Add support for 'resolvables' instead of only 'shortpaths'

Currently, Pilot encourages the use of shortpaths for most operations. For example, to download a file 'foo', you specify pilot download foo, or pc.download('foo'), and pilot will resolve the project, endpoint, and search index based on the current user settings. 'foo' would resolve to something like https://<endpoint_uuid>.e.globus.org/<base project path>/foo.

For some operations, you don't have the short path, and it would be more convenient to simply download by URL. For example, if a user has the url https://<endpoint_uuid>.e.globus.org/<base project path>/foo, there are two options:

  1. Manually hack away at the URL to find the short path (or, in the case of the SDK, write a hacky script to derive the shortpath)
  2. Dig into the Pilot Client to find a tool that will resolve a full URL. Currently, PilotClient.get_http_client() can resolve full paths, but still derives the hostname, so users need to hack off the https://<endpoint_uuid>.e.globus.org/ part.

Both options require hacking on the URL to figure out a path Pilot will accept. It would be ideal if Pilot supported 'resolvables' instead, which could be a full URL, a full path, or a shortpath. That way, pilot could be used in places where the user doesn't have a shortpath, but does have something that should resolve to a given resource. These should all work:

pilot download foo.txt
pilot download base_project_dir/foo.txt
pilot download https://<endpoint_uuid>.e.globus.org/base_project_dir/foo.txt
pilot download globus://<endpoint_uuid>/base_project_dir/foo.txt

Alternatively, these should also be available via scripting:

pc.download('foo.txt')
pc.download('base_project_dir/foo.txt')
pc.download('https://<endpoint_uuid>.e.globus.org/base_project_dir/foo.txt')
pc.download('globus://<endpoint_uuid>/base_project_dir/foo.txt')

These commands could benefit from using 'resolvables' instead of 'shortpaths':

  • pilot download
  • pilot describe
  • pilot delete
  • pilot mkdir
  • pilot list

Additionally, this would allow implicitly describing/downloading files from a different project or index if the user provides a fully resolvable path. For example, the command below should be able to work regardless of the user's current context or project, since all of the necessary info is in the URL:

pilot download https://<endpoint_uuid>.e.globus.org/base_project_dir/foo.txt
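
A minimal sketch of reducing a 'resolvable' back to a shortpath (the argument names and the base-path stripping are assumptions about how pilot derives its shortpaths):

    from urllib.parse import urlparse


    def resolve_shortpath(resolvable, endpoint_uuid, base_project_path):
        """Turn a full URL, full path, or shortpath into a shortpath (sketch only)."""
        parsed = urlparse(resolvable)
        if parsed.scheme in ('https', 'globus'):
            # e.g. https://<endpoint_uuid>.e.globus.org/<base>/foo or globus://<endpoint_uuid>/<base>/foo
            host_uuid = parsed.netloc.split('.')[0]
            if host_uuid != endpoint_uuid:
                raise ValueError(f'{resolvable} does not live on endpoint {endpoint_uuid}')
            path = parsed.path
        else:
            path = resolvable
        # Strip any leading slash and the project base path to get the shortpath.
        path = path.lstrip('/')
        base = base_project_path.strip('/')
        if path.startswith(base + '/'):
            path = path[len(base) + 1:]
        return path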

HTTP PUT does not fully overwrite files

When doing a PUT on files using the HTTP module, if the new file is shorter than the older file, remnants of the old file will remain at the end of the new file. For example, if you did two PUTs, one after the other with the contents:

  • File1: This is a big long description with lots of words
  • File2: Shorter Description

The result will be: Shorter Descriptiondescription with lots of words
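
A small repro sketch, assuming an authorized HTTPS URL on the endpoint (requests stands in for pilot's HTTP module here):

    import requests


    def reproduce_put_bug(url, headers):
        """PUT a long body, then a shorter one; the second PUT should fully replace the first."""
        requests.put(url, headers=headers,
                     data='This is a big long description with lots of words')
        requests.put(url, headers=headers, data='Shorter Description')
        # Expected: 'Shorter Description'
        # Observed: 'Shorter Descriptiondescription with lots of words'
        return requests.get(url, headers=headers).text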

Add delete

Add pilot delete <subject or path>, with options for deleting both the file and its metadata, or just the metadata.

Add caching to speed up pilot operations

pilot list, pilot describe and other commands which search the remote filesystem can be slow due to needing to check various things before committing operations. This could be sped up drastically if we kept a local cache of search results and directory structure.

The cache should automatically go stale after a certain period of time. We may also want to set options to modify these behaviors, such as pilot project --clear-cache and pilot profile --no-cache to disable caching entirely.

I've implemented something like this before for the petreldata.net portal; however, that only caches the project config, not the directory structure of the remote endpoint.
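
A minimal sketch of a file-backed cache with a staleness window (the path, TTL, and structure are all illustrative):

    import json
    import os
    import time

    CACHE_PATH = os.path.expanduser('~/.pilot-cache.json')  # hypothetical location
    CACHE_TTL = 300  # seconds before cached results are considered stale


    def load_cached(key):
        """Return a cached value for `key`, or None if it is missing or stale."""
        try:
            with open(CACHE_PATH) as f:
                cache = json.load(f)
        except (OSError, ValueError):
            return None
        entry = cache.get(key)
        if entry is None or time.time() - entry['timestamp'] > CACHE_TTL:
            return None
        return entry['value']


    def save_cached(key, value):
        """Store `value` under `key` with a timestamp so it can expire."""
        try:
            with open(CACHE_PATH) as f:
                cache = json.load(f)
        except (OSError, ValueError):
            cache = {}
        cache[key] = {'timestamp': time.time(), 'value': value}
        with open(CACHE_PATH, 'w') as f:
            json.dump(cache, f)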

Set local endpoint

Users should be able to set the local endpoint and path for data to use a different DTN. For example, when running on a machine like Cooley, they would use alcf#dtn_theta and a base path like their home directory.

pilot project add with custom groups

When a project is created with a custom group (i.e. the group is not set to public), pilot upload foo / will fail because the gmeta visible_to field is not formed quite right.
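
For reference, a minimal sketch of the visible_to values a gmeta entry would need, assuming the standard Globus Group principal URN format (not taken from pilot source):

    def visible_to(group_uuid=None):
        """Build the Globus Search visible_to list for a gmeta entry (sketch only).

        'public' makes the entry world-readable; otherwise restrict it to a
        Globus Group via its principal URN.
        """
        if group_uuid is None:
            return ['public']
        return ['urn:globus:groups:id:{}'.format(group_uuid)]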

Remove the `whoami` command

The pilot profile command seems to be a duplicate of the whoami command. Should we simply include the user identity in the pilot profile command and remove the whoami command?

Upload should disregard absolute paths

The upload destination path needs to be sanitized and relative to the project default area. Currently, pilot upload <foo> / will cause data to be written to the root of the endpoint.

Desired behavior:

  • pilot upload <foo> / should write file <foo> to the base project path
  • pilot upload <foo> <moo> and pilot upload <foo> </moo> should behave the same, uploading <foo> to directory </moo>
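
A minimal sketch of the destination sanitization, with illustrative names:

    import posixpath


    def resolve_destination(destination, base_project_path):
        """Map a user-supplied destination onto the project base path (sketch only).

        '/', 'moo' and '/moo' all resolve under base_project_path, so an absolute
        path can never escape to the root of the endpoint.
        """
        relative = posixpath.normpath(destination).lstrip('/')
        if relative.startswith('..'):
            raise ValueError(f'Destination {destination!r} escapes the project area')
        if relative in ('', '.'):
            return base_project_path
        return posixpath.join(base_project_path, relative)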

Current behavior:

(base) Magik:combo rpwagner$ pilot upload lincs1000.tsv 
No Destination Provided. Please select one from the directory:

(base) Magik:combo rpwagner$ pilot upload lincs1000.tsv /
Ingesting record into search...
Success!
Starting Transfer...
The transfer has been accepted and a task has been created and queued for execution. You can check the status below: 
https://app.globus.org/activity/c5ff9ca8-9556-11e9-bf5c-0e4a062367b8/overview
URL will be: https://ebf55996-33bf-11e9-9fa4-0a06afd4a22e.e.globus.org/lincs1000.tsv
(base) Magik:combo rpwagner$ pilot upload lincs1000.tsv datasets
Ingesting record into search...
Success!
Starting Transfer...
The transfer has been accepted and a task has been created and queued for execution. You can check the status below: 
https://app.globus.org/activity/dd489392-9556-11e9-8e6d-029d279f7e24/overview
URL will be: https://ebf55996-33bf-11e9-9fa4-0a06afd4a22e.e.globus.org/projects/Pilot-Tutorial/datasets/lincs1000.tsv
(base) Magik:combo rpwagner$ 

Track the list of available groups in the project manifest.

Currently, the list of groups available to admins when they create a project is hardcoded into the Pilot Client. We should be able to update this list dynamically, so ideally the list of groups should be stored alongside projects in the master project manifest.

whoami command broken

(base) Magik:~ rpwagner$ pilot whoami
Traceback (most recent call last):
  File "/Users/rpwagner/anaconda3/bin/pilot", line 10, in <module>
    sys.exit(cli())
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/rpwagner/anaconda3/lib/python3.7/site-packages/pilot/commands/auth/auth_commands.py", line 65, in whoami
    info = config.get_user_info()
AttributeError: 'Config' object has no attribute 'get_user_info'

Consider Siegfried as base mimetype detector

Let's compare Siegfried to puremagic for mimetype detection. It also does hashes, so we could overload it a bit. Alternatively, we might just have to get good at working with signature files and extend puremagic. Eventually we'll want to get into custom mimetype detection for specialized file types.

$ sf -hash sha256 xpcs/A001_Aerogel_1mm_att3_Lq0_001_0001-1000.hdf 
---
siegfried   : 1.7.13
scandate    : 2019-11-06T23:47:25-08:00
signature   : default.sig
created     : 2019-08-18T15:32:39+02:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V95.xml; container-signature-20180917.xml'
---
filename : 'xpcs/A001_Aerogel_1mm_att3_Lq0_001_0001-1000.hdf'
filesize : 17052754
modified : 2019-10-09T19:52:49-07:00
errors   : 
sha256   : fad48c8eea6a2e72fd43fa79f5e9f80c66d641d488aa38d2eaf816f8c4d5a604
matches  :
  - ns      : 'pronom'
    id      : 'fmt/807'
    format  : 'HDF5'
    version : '0'
    mime    : 
    basis   : 'extension match hdf; byte match at 0, 9'
    warning : 
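
For comparison, a minimal puremagic sketch (assuming puremagic's from_file API; the default fallback is illustrative):

    import puremagic


    def detect_mimetype(filepath, default='application/octet-stream'):
        """Best-effort mimetype via puremagic, for comparison against Siegfried."""
        try:
            return puremagic.from_file(filepath, mime=True) or default
        except puremagic.PureError:
            # No magic-number match (e.g. plain text); fall back to a default.
            return default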

Bad state possible if user revokes consent with refresh tokens

If a user logs in with refresh tokens, then revokes user consent to make them inert, native login will get stuck in a bad state. It will attempt to refresh the access token with the bad refresh token, but won't handle the invalid refresh token grant. This can also cause logout to not work properly if the caller is checking whether tokens are active by attempting to load them, due to the wrong exception being thrown (globus_sdk.exc.AuthAPIError instead of TokensExpired).

I'm not sure if this is a problem with how pilot is calling into the native client, or if native client needs to be smarter in not assuming refresh tokens are valid. Probably both. I'll open an issue in both places just in case.

(exalearn) Firefly:scripts nick$ pilot logout
Traceback (most recent call last):
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/fair_research_login/client.py", line 134, in load_tokens
    check_expired(tokens)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/fair_research_login/token_storage/storage_tools.py", line 12, in check_expired
    raise TokensExpired(resource_servers=expired)
fair_research_login.exc.TokensExpired:  auth.globus.org, petrel_https_server, search.api.globus.org, transfer.api.globus.org

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nick/anaconda3/envs/exalearn/bin/pilot", line 11, in <module>
    load_entry_point('pilot1-tools==0.3.1.dev0', 'console_scripts', 'pilot')()
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/click/core.py", line 1134, in invoke
    Command.invoke(self, ctx)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/pilot1_tools-0.3.1.dev0-py3.7.egg/pilot/commands/main.py", line 30, in cli
    if pc.is_logged_in():
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/pilot1_tools-0.3.1.dev0-py3.7.egg/pilot/client.py", line 48, in is_logged_in
    self.load_tokens()
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/fair_research_login/client.py", line 139, in load_tokens
    tokens.update(self.refresh_tokens(expired))
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/fair_research_login/client.py", line 158, in refresh_tokens
    authorizer.check_expiration_time()
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/authorizers/renewing.py", line 170, in check_expiration_time
    self._get_new_access_token()
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/authorizers/renewing.py", line 134, in _get_new_access_token
    res = self._get_token_response()
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/authorizers/refresh_token.py", line 84, in _get_token_response
    return self.auth_client.oauth2_refresh_token(self.refresh_token)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/auth/client_types/native_client.py", line 136, in oauth2_refresh_token
    return self.oauth2_token(form_data)
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/auth/client_types/base.py", line 400, in oauth2_token
    "/v2/oauth2/token", response_class=response_class, text_body=form_data
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/base.py", line 288, in post
    retry_401=retry_401,
  File "/Users/nick/anaconda3/envs/exalearn/lib/python3.7/site-packages/globus_sdk/base.py", line 553, in _request
    raise self.error_class(r)
globus_sdk.exc.AuthAPIError: (400, 'Error', 'invalid_grant')
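
A minimal sketch of the proposed handling on the pilot side, reusing the client and exception names from the traceback above (translating AuthAPIError into a normal logged-out state is the proposal, not current behavior):

    import globus_sdk
    from fair_research_login import NativeClient
    from fair_research_login.exc import TokensExpired


    def is_logged_in(client: NativeClient) -> bool:
        """Treat an inert (revoked) refresh token the same as expired tokens."""
        try:
            client.load_tokens()
            return True
        except TokensExpired:
            return False
        except globus_sdk.exc.AuthAPIError as error:
            # Revoking consent makes the refresh token invalid, so the refresh
            # attempt fails with a 400 'invalid_grant' instead of TokensExpired.
            if error.http_status == 400:
                return False
            raise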

Add '-s' flag for uploading custom subjects

I ran into a problem with Ryan when ingesting records on XPCS: previously, single-file datasets had been uploaded, but they were going to be replaced with folders containing supporting files and images. The directories didn't have the .hdf extension, so pilot insisted on creating new records instead of updating the old ones. This happens any time the filename of the file being uploaded changes, and I think it will be a relatively common case for a file to change, maybe significantly, but need to be updated instead of uploaded anew.

The proposed change would add an -s or --subject flag to the upload and register commands, forcing the short name to be whatever the user wants it to be. An example could look like this:

pilot upload -s bar.txt foo.txt /  # Upload 'foo.txt' as 'bar.txt' to '/'
pilot describe bar.txt  # The new record shortname will now always be `bar.txt`
pilot download bar.txt
pilot upload -u -s bar.txt foo_v2.txt /

Existing metadata overwritten by new uploads

Some data we probably want to keep is overwritten on multiple uploads/registers.

For example, suppose the creator name 'John' is set manually in a metadata.json file and uploaded with pilot upload -j metadata.json foo.txt /. On a second upload without the metadata, the creator name will be overwritten by the pilot user: pilot upload -u foo.txt / will change the creator from 'John' to 'Frank' if 'Frank' is the logged-in pilot user doing the upload.

We probably don't want to overwrite those fields unless they are missing or specifically specified in another metadata.json file.

Local path and Globus path can de-sync on user machine

If a user has configured a custom path through GCP, the Globus path may not correspond to the local path on the user's machine. This is configurable via pilot profile --set-endpoint, but if the paths still diverge it will cause problems for uploading dataframes.

add mkdir

Add a pilot mkdir <foo> command that creates a new directory under the project base path.

Add Github testing integration

Now that this project is public, it might be nice to start running automated unit tests through travis-ci or similar. Coverage would also be nice.
