cernanalysispreservation / cap-client

This repository provides the CAP command-line client, which researchers can use to talk to a CAP server in order to preserve and manage their analyses.

License: GNU General Public License v2.0

Shell 1.22% Python 98.67% Dockerfile 0.11%

cap-client's People

Contributors

annatrz, artemislav, atrisovic, ioannistsanaktsidis, lilykos, pamfilos, papadopan, parths007

cap-client's Issues

cli: get repositories method [2h]

We would like to have a CLI method that returns to the user a list of the repositories attached to their analysis.

Example command:
cap-client repositories get --pid <my-pid> --with-snapshots

By default, we don't want to show all the snapshots for webhooks, just the repository details. The user can ask for snapshots with the --with-snapshots flag.

We need to make a request (in CapAPI class) like:

self._make_request(
    urljoin('deposits/', pid),
    method='get',
    expected_status_code=200,
    headers={'Accept': 'application/repositories+json'},
)

Then we have to filter the results, depending on whether the user asked for full details (with snapshots) or not.
If the user wants to see all the details, return the response directly; if they want it without snapshots, filter the data like this:
for x in response['webhooks']: x.pop('snapshots')

You need to register your new get command under the repositories click group, so that you can call it as cap-client repositories get.

Example how it's done for permissions:

@permissions.command()
@click.option(
    '--pid',
    '-p',
    help='Get permissions of the deposit with given pid',
    default=None,
    required=True,
)
@click.pass_context
def get(ctx, pid):
    """Retrieve analysis user permissions."""
    try:
        response = ctx.obj.cap_api.get_permissions(pid=pid)
        click.echo(json.dumps(response, indent=4))
    except Exception as e:
        logging.error('Unexpected error.')
        logging.debug(str(e))

You need to write something very similar, just add the --with-snapshots flag. Filtering of the results should happen inside your cap_api.get_repositories method.

Write tests to check your command in both cases (with and without snapshots), including error handling.
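
The filtering step could be sketched roughly as below. This is only an illustration: filter_repositories is a hypothetical helper name, and the response shape (a dict with a 'webhooks' list whose items may carry 'snapshots') is assumed from the one-liner above, not taken from the server code.

```python
import copy


def filter_repositories(response, with_snapshots=False):
    """Return repository details, dropping snapshots unless requested.

    Hypothetical helper; the response shape is an assumption.
    """
    if with_snapshots:
        return response

    # Work on a copy so the caller's response is left untouched.
    filtered = copy.deepcopy(response)
    for webhook in filtered.get('webhooks', []):
        # A default of None avoids a KeyError for webhooks without snapshots.
        webhook.pop('snapshots', None)
    return filtered


sample = {'webhooks': [{'id': 1, 'snapshots': [{'tag': 'v1'}]}, {'id': 2}]}
without_snapshots = filter_repositories(sample)
```

Using pop with a default keeps the command from failing on webhooks that have no snapshots yet, which is one of the cases worth covering in the tests.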

cli: add all parameter for `get-shared` command [1h]

By default, cap-client get returns only the drafts created by the user; with the all flag, you can see all the drafts that the user has access to.
Let's add similar behaviour to cap-client get-shared.

You need to pass the all parameter here, and then:

  • if all is False - query /records/?q=&by_me=True
  • if all is True - query /records

NOTE: this method is also used for fetching a specific record (with a given PID), so make sure your changes don't break this functionality.

UPDATE: Sorry, I didn't mention that you need to add/update tests to check the behaviour with your added parameter.
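
A rough sketch of the URL selection, assuming a hypothetical helper shared_records_url. The two list URLs are the ones given above; the '/records/<pid>' form for a single record is an assumption and should be checked against the existing method.

```python
def shared_records_url(pid=None, all_=False):
    """Pick the URL for get-shared (hypothetical helper)."""
    if pid:
        # Assumption: a single record lives under /records/<pid>.
        # The all flag must not affect fetching a specific record.
        return '/records/{}'.format(pid)
    if all_:
        return '/records'
    return '/records/?q=&by_me=True'
```

The parameter is called all_ here only to avoid shadowing the Python built-in all.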

Add cap-client ping

Add CLI skeleton with first command: check status of connection with analysis-preservation server

cli: update get-schema method [2h]

    def get_schema(self, ana_type=None, version='0.0.1'):
        """Retrieve schema according to type of analysis."""
        types = self._get_available_types()

        if ana_type not in types:
            raise UnknownAnalysisType(types)

        response = self._make_request(
            url='schemas/deposits/records/{}-v{}.json'.format(
                ana_type, version))

        schema = {
            k: v
            for k, v in response.get('properties', {}).items()
            if not k.startswith('_')
        }

        return schema
  • remove the default version parameter from the client side (if no version is passed, the server returns the latest version of the given schema)
  • remove the types check (the new endpoint does the check itself)
  • hit https://analysispreservation.cern.ch/api/jsonschemas/{schema_type}/{version}?resolve=True. NOTE: resolve is an important flag that resolves all the $ref entries in the schema; check how a schema like cms-analysis looks with resolve=False and with resolve=True
  • add a flag to return either the deposit or the record schema
  • the result is a dictionary with various fields; depending on the flag above, return either deposit_schema or record_schema to the user
  • write/update tests for your command to check all the cases
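
The points above might translate into something like the following sketch. The helper names schema_url and pick_schema are hypothetical, and the URL used when no version is passed (jsonschemas/<type> without a version segment) is an assumption.

```python
def schema_url(ana_type, version=None):
    """Build the relative jsonschemas URL, always asking for resolved $ref."""
    url = 'jsonschemas/{}'.format(ana_type)
    if version:
        url += '/{}'.format(version)
    # Assumption: with no version segment the server returns the latest one.
    return url + '?resolve=True'


def pick_schema(response, record=False):
    """Return record_schema or deposit_schema from the server response."""
    key = 'record_schema' if record else 'deposit_schema'
    return response[key]
```

For example, schema_url('cms-analysis') gives 'jsonschemas/cms-analysis?resolve=True', and pick_schema(response, record=True) returns the record schema.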

Improve --help documentation to match full documentation

In the last user test iteration, we commonly observed analysts attempting to use cap-client --help to find a command for a desired task. They fell back on --help even though the full documentation was already loaded in the browser. While the full documentation proved helpful for all tasks, this was not the case for the command overview produced by --help. For example, a user who wanted to update an analysis field did not manage to do so with the command-line --help: it only lists metadata with the description "Metadata managing commands." As the full documentation is effective, improving the --help overview by reusing the already existing descriptions should improve usability.

cli: make possible to pass json from command line

For now, commands like create and update can take JSON from a file, using the --file option. Two things should be done:

  • rename this option to --json or --json-file, as it can be confused with the --file option used by commands like update (where you can pass any kind of file)
  • make it possible to pass JSON directly on the command line, e.g.
    cap-client create {} --type lhcb
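
One possible way to accept both forms, sketched with a hypothetical helper load_json: try to parse the value as JSON first and fall back to treating it as a file path.

```python
import json
import os
import tempfile


def load_json(value):
    """Parse value as inline JSON, or as a path to a JSON file."""
    try:
        return json.loads(value)
    except ValueError:
        # Not valid JSON, so assume it is a file path.
        with open(value) as fp:
            return json.load(fp)


# Inline JSON and a JSON file should give the same result.
inline = load_json('{"title": "demo"}')
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as fp:
    fp.write('{"title": "demo"}')
from_file = load_json(fp.name)
os.remove(fp.name)
```

One caveat of this approach: a file whose name happens to be valid JSON (for example a file called 123) would be read as inline JSON, so separate options may be the safer design.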

tests: test all the commands [5h]

Install cap-client locally with current master and try all the commands.
Prepare a note with every command you run and the output you've got.

Example:
cap-client --help to see all the commands

Usage: cap-client [OPTIONS] COMMAND [ARGS]...

  CAP Client for interacting with CAP Server.

Options:
  -v, --verbose                   Verbose output
  -l, --loglevel [error|debug|info]
                                  Sets log level
  -t, --access_token TEXT         Sets users access token
  --help                          Show this message and exit.

Commands:
  clone         Clone analysis with given pid.
  create        Create an analysis.
  delete        Delete analysis with given pid.
  files         Files managing commands.
  get           Retrieve one or all analyses from a user.
  get-schema    Retrieve analysis schema.
  get-shared    Retrieve one or all shared analyses from a user.
  me            Retrieve user info.
  metadata      Metadata managing commands.
  permissions   Permissions managing commands.
  publish       Publish analysis with given pid.
  repositories  Repositories managing commands.
  types         Retrieve all types of analyses.

cap-client metadata --help: pick one group to test and use help to see all the commands in the metadata group

Usage: cap-client metadata [OPTIONS] COMMAND [ARGS]...

  Metadata managing commands.

Options:
  --help  Show this message and exit.

Commands:
  append  Edit analysis field adding a new value to an array.
  get     Retrieve one or more fields in analysis metadata.
  remove  Remove analysis field.
  set     Edit analysis field value.

cap-client metadata get --help: see which parameters the metadata get method accepts

Usage: cap-client metadata get [OPTIONS] [FIELD]

  Retrieve one or more fields in analysis metadata.

Options:
  -p, --pid TEXT  Get metadata of the deposit with given pid  [required]
  --help          Show this message and exit.

So you see it needs a pid option, which is required. This means you have to test three cases:

  • when an existing PID is passed: cap-client metadata get --pid existing-pid
  • when a non-existing PID is passed: cap-client metadata get --pid non-existing-pid
  • when no PID is passed: cap-client metadata get

Write down all the commands you run and output/error you've got.

cli: clean output

By default, the client shouldn't display all the debug information prefixed with [INFO] or [DEBUG]; we just want to see the output.
Make it possible to see more verbose output with the --verbose flag.
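
A minimal sketch of how the log level could follow the flag, using only the standard logging module. The function name configure_logging and the logger name are made up for illustration.

```python
import logging


def configure_logging(verbose=False):
    """Show only errors by default, everything when verbose is set."""
    logger = logging.getLogger('cap_client_sketch')  # hypothetical name
    logger.handlers = []  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('[%(levelname)s] %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if verbose else logging.ERROR)
    return logger
```

The actual command output would still go through click.echo, so it is printed regardless of the log level.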

cli: dir upload to delete leftover tar file

[Screenshot: 2018-07-24 at 12.00.35]

The tar file stays on the system (see fig.), although it should be deleted after the upload.

Description:

When uploading a directory, for example:
cap-client -v files upload newdir -p bf6b8501822c4d2ba46028611354df7e

the client creates a temporary tar file, which remains on the system after the upload is over.
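
One way to guarantee the cleanup is a try/finally around the upload, sketched below. upload_directory and its upload callback are hypothetical stand-ins, not the client's actual functions.

```python
import os
import tarfile
import tempfile


def upload_directory(path, upload):
    """Tar a directory, hand the archive to upload(), then delete it."""
    fd, tar_path = tempfile.mkstemp(suffix='.tar.gz')
    os.close(fd)
    try:
        with tarfile.open(tar_path, 'w:gz') as tar:
            tar.add(path, arcname=os.path.basename(os.path.normpath(path)))
        with open(tar_path, 'rb') as fp:
            return upload(fp)
    finally:
        # Runs on success and on failure, so no tar file is left behind.
        os.remove(tar_path)
```

Because the removal sits in the finally block, the archive is deleted even when the upload raises.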

Use access tokens

Make it possible to authorize users by passing the access token generated within the CERN Analysis Preservation app.
The access token is passed as a CLI parameter or set as an environment variable.
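
A small sketch of the lookup order, with the CLI parameter taking precedence over the environment. The variable name CAP_ACCESS_TOKEN and the helper resolve_access_token are assumptions made for this example.

```python
import os


def resolve_access_token(cli_token=None, environ=None):
    """Prefer the token passed on the command line, else the environment."""
    environ = os.environ if environ is None else environ
    # CAP_ACCESS_TOKEN is an assumed variable name, not confirmed here.
    return cli_token or environ.get('CAP_ACCESS_TOKEN')
```

The function returns None when no token is available, so the caller can show a clear error instead of sending an unauthenticated request.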

cli: pass oauth token in headers [1h]

Currently we pass the auth token in the URL.

Update the CapAPI._make_request method so that, instead of passing access_token in params, it adds it to the headers:
Authorization: OAuth2 <access_token>
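
Building the headers could look roughly like this. auth_headers is a hypothetical helper; how it plugs into _make_request is left open.

```python
def auth_headers(access_token, extra=None):
    """Merge request headers with the OAuth2 Authorization header."""
    headers = dict(extra or {})
    # Set last so that a caller cannot accidentally override the token.
    headers['Authorization'] = 'OAuth2 {}'.format(access_token)
    return headers
```

For example, auth_headers('token', {'Accept': 'application/basic+json'}) yields both the Accept and the Authorization header, and the token no longer appears in the URL.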

cli: add get field method

Example:

$ cap-cli metadata get field_name --pid ana_pid
field_value

Nested fields should be accessed with dots, e.g. basic_info.ana_title
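
The dotted lookup could be sketched as follows. get_field is a hypothetical helper and only handles nested dictionaries; array indexes are left out of this sketch.

```python
def get_field(metadata, path):
    """Walk nested dictionaries following a dotted path."""
    value = metadata
    for key in path.split('.'):
        # Raises KeyError if any part of the path is missing.
        value = value[key]
    return value


metadata = {'basic_info': {'ana_title': 'My analysis'}}
title = get_field(metadata, 'basic_info.ana_title')
```

A missing key raises KeyError, which the command can turn into a readable error message.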

Add validators output in response

For now, when using the API, we validate the passed data against the schema but don't return any information about which fields were incorrect.
We want to return the response from the JSON validators on POST and PUT requests.

Python 3 compatibility

I tried running the client with the Python that ships with my Linux distribution (v3.6.5), but this failed:

$ cap-client --help
Traceback (most recent call last):
  File "/home/apearce/tmp/venv/bin/cap-client", line 11, in <module>
    load_entry_point('cap-client==0.0.2', 'console_scripts', 'cap-client')()
  File "/home/apearce/tmp/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/apearce/tmp/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
    return ep.load()
  File "/home/apearce/tmp/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in load
    return self.resolve()
  File "/home/apearce/tmp/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2330, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/apearce/tmp/venv/lib/python3.6/site-packages/cap_client/cli/__init__.py", line 33, in <module>
    from cap_client.cap_api import CapAPI
  File "/home/apearce/tmp/venv/lib/python3.6/site-packages/cap_client/cap_api.py", line 30, in <module>
    from urlparse import urljoin
ModuleNotFoundError: No module named 'urlparse'

I suspect there are probably other issues with Python 3 compatibility, but I haven't checked.
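
For this particular import, a common fix that works on both Python 2 and Python 3 is a guarded import. This is a sketch of the usual pattern, not necessarily the change that was made in the client.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

url = urljoin('https://analysispreservation.cern.ch/api/', 'deposits/')
# url == 'https://analysispreservation.cern.ch/api/deposits/'
```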

cli: add upload objects method

Add method to upload objects to existing analysis, like:
$ cap-client files upload file_name --file foo.dat --pid analysis_pid

Upload file to the bucket associated with given analysis.

cli: upload repository method [2h]

In CapAPI class, add a method for uploading a repository from URL.

Example of how it looks for publishing:

def publish(self, pid):
    return self._make_request(
        url='deposits/{}/actions/publish'.format(pid),
        expected_status_code=202,
        method='post',
        headers={
            'Content-Type': 'application/json',
            'Accept': 'application/basic+json',
        })

For upload repository we need to make request with:

  • an endpoint that should be called is deposits/{pid}/actions/upload
  • expected status code is 201
  • method post
  • data (needs json.dumps)
    • url
    • webhook (true|false)
    • event_type (release|push)
  • headers
    {'Content-Type': 'application/json', 'Accept': 'application/basic+json'}

Currently, we don't pass the repositories field in the basic serializer (that's the one you're asking for in the headers), so there is no way to validate the response JSON. For now, just check that the status code is 201.
We will decide in a separate thread on creating a serializer for the repositories part.

Write tests to check how your method behaves in cases:

  • 400 returned from the server (e.g. wrong URL like http://nongithubhost.com)
  • just repo upload
  • repo upload with push webhook
  • repo upload with release webhook
  • user doesn't have sufficient permission

Connect the method to the CLI: create a file like cap_client/cli/files_cli.py. It needs to register a click group, like:

@click.group()
def repositories():
    """Repositories managing commands."""

Then register the upload command under this group, so you can call it like:
cap-client repositories upload my_url --webhook push
We can either have separate commands for uploading and for creating a webhook, or use a parameter as above (to be decided).

Remember to add a line at the end of the cap_client/cli/__init__.py file:
cli.add_command(repositories)
Without it, you won't see your repositories command when calling cap-client.

UPDATE: let's do this one after cernanalysispreservation/analysispreservation.cern.ch#1547.
Then you want to make the request in your method with the header ('Accept', 'application/repositories+json'), like:

    def upload_repository(self, pid, url, event_type=None):
        """Your method."""
        return self._make_request(
            url=f'deposits/{pid}/actions/upload',
            data=json.dumps(
                dict(url=url,
                     # Falsy values (e.g. an empty string) are sent as null.
                     event_type=event_type or None,
                     # A webhook is created only when an event type is given.
                     webhook=bool(event_type))),
            method='post',
            headers={
                'Content-Type': 'application/json',
                'Accept': 'application/repositories+json'
            })

Write your tests according to the repositories serializer format.

cli: add set field method

We now have a patch method that lets users patch JSON by passing operations in JSON Patch format, so to set a field we need to use:

[{ "op": "replace", "path": "/field_name", "value": "field_value" }]

It would be simpler to let users run commands like:

cap-cli metadata set field_name field_value --pid ana_pid
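
The set command could wrap the JSON Patch format roughly as below. build_replace_patch is a hypothetical helper, and accepting dotted names for nested fields is an assumption borrowed from the get field issue.

```python
def to_json_pointer(field):
    """Turn a dotted field name into a JSON Pointer path (RFC 6901)."""
    parts = field.split('.')
    # '~' and '/' inside a key must be escaped as '~0' and '~1'.
    return ''.join(
        '/' + part.replace('~', '~0').replace('/', '~1') for part in parts)


def build_replace_patch(field, value):
    """Build a JSON Patch document that replaces one field."""
    return [{'op': 'replace', 'path': to_json_pointer(field), 'value': value}]


patch = build_replace_patch('basic_info.ana_title', 'New title')
```

The resulting list can be serialized with json.dumps and sent through the existing patch method.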

cli: update create method [3h]

    def create(self, json_='', ana_type=None, version='0.0.1'):
        """Create an analysis."""
        types = self._get_available_types()

        if ana_type not in types:
            raise UnknownAnalysisType(types)

        if not json_:
            raise MissingJsonFile()

        try:
            data = json.loads(json_)
        except ValueError:
            with open(json_) as fp:
                data = json.load(fp)

        data['$ana_type'] = ana_type
        json_data = json.dumps(data)

        response = self._make_request(
            url='deposits/',
            method='post',
            data=json_data,
            expected_status_code=201,
            headers={'Content-Type': 'application/json'})

        return self._make_request(url='deposits/{}'.format(
            response.get('metadata', {}).get('_deposit', {}).get('id', '')),
                                  method='put',
                                  data=json.dumps(response.get('metadata',
                                                               {})),
                                  expected_status_code=200,
                                  headers={
                                      'Content-Type': 'application/json',
                                      'Accept': 'application/basic+json'
                                  })
  • json-file should be a required field (also rename it to json, as it can be either a file or JSON passed on the command line)
  • if ana_type is passed, $schema shouldn't be in the JSON (raise an error)
  • if $schema is in the JSON, ana_type shouldn't be passed (raise an error)
  • the version parameter is not supported by the backend, so it should be removed
  • don't make a call to available types; the user can call it from a different command
  • fix an issue when create is called without a valid access token, example:
    Screenshot 2020-03-12 at 16 51 00
  • make the put request to url='deposits/{}'.format(response['id'])
  • write/update tests to check all the cases for your updated method
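
The ana_type / $schema rules above could be checked with something like this sketch. prepare_create_payload is a hypothetical helper, and raising an error when neither ana_type nor $schema is given is an assumption that the list above does not state.

```python
def prepare_create_payload(data, ana_type=None):
    """Validate the ana_type / $schema combination and build the payload."""
    has_schema = '$schema' in data
    if ana_type and has_schema:
        raise ValueError('Pass either ana_type or $schema, not both.')
    if not ana_type and not has_schema:
        # Assumption: one of the two is needed to identify the schema.
        raise ValueError('Pass ana_type or include $schema in the JSON.')

    payload = dict(data)  # leave the caller's dict untouched
    if ana_type:
        payload['$ana_type'] = ana_type
    return payload
```

The client would probably raise its own exception classes rather than ValueError; the built-in is used here only to keep the sketch self-contained.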

cli: refactor structure of cap-client commands

cap-client resources:

  • me
  • metadata [PID]
    • get [PATH]
    • set [PATH] --file <file_to_upload> --ref <url_ref_record_to_point>
    • append [PATH]
    • remove [PATH]
  • files [PID]
    • list
    • upload [FILE]
    • download [file_key_as_on_cap_server] [output_file]
    • remove [file_key_as_on_cap_server]
  • permissions [PID]
    • get [SCOPE (“read”, “update”, “admin”)]
    • add [SCOPE (“read”, “update”, “admin”)] --user [EMAIL]
    • remove [SCOPE (“read”, “update”, “admin”)] --user [EMAIL]
  • get [PID]
  • create [DEPOSIT_TYPE] --json <json_metadata_file>
  • delete [PID]
  • publish [PID]
  • clone [PID]
  • get-shared [PID]
  • types

Anything else we might need? Any changes?

cli: add append method

The idea is to have an option to append an element to an array in the analysis metadata, e.g. to add a new file
(the implementation just wraps the JSON Patch format for adding an element at the end of an array).

cap-cli metadata append field_name field_value --pid ana_pid
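
In JSON Patch (RFC 6902), an add operation whose path ends in "/-" appends to the end of an array, so the wrapper could look like this sketch. build_append_patch is a hypothetical name, and dotted names for nested fields are again an assumption.

```python
def build_append_patch(field, value):
    """Build a JSON Patch document that appends value to an array field."""
    # A trailing '-' refers to the position after the last array element.
    path = '/' + field.replace('.', '/') + '/-'
    return [{'op': 'add', 'path': path, 'value': value}]


patch = build_append_patch('field_name', 'field_value')
```

Here patch is [{'op': 'add', 'path': '/field_name/-', 'value': 'field_value'}].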
