fedcloud-catchall-operations

Operation of fedcloud integration components for selected providers.

Site Configuration

This repository holds the main configuration for the fedcloud catchall operations. Each endpoint is described by a file in the sites directory, in the following format:

gocdb: <name in gocdb of the site>
endpoint: <keystone endpoint of the site>
# optional: use central image sync
images:
  # true: sync images; false: do not sync
  sync: true
  # a list of supported formats of the site can be specified
  # if not available, no conversion will be done, so whatever format
  # is available in AppDB will be used
  formats:
    - qcow2
    - raw
# optionally specify a protocol for the Keystone V3 federation API
protocol: openid | oidc  # default is openid
# optionally specify a region name if using different regions
region: myregion
vos:
  # List of VOs defined as follows
  - name: <vo name>
    auth:
      project_id: <project id supporting the VO vo name at the site>
    # any other optional configuration for cloud-info-provider, e.g:
    # not really used for now
    defaultNetwork: private | public | private_only | public_only
    publicNetwork: <name of the public network>

Docker containers

Components run as Docker containers; images that are not available upstream are built in this repository.

Deployment

Deployment is managed with GitHub Actions. There is one VM for the cloud-info-provider and one VM for the image sync; check the deploy directory for details. Configuration is done with Ansible using a dedicated role:

ansible-playbook -i inventory.yaml --extra-vars "@secrets.yaml" playbook.yaml

where:

  • inventory.yaml contains the ansible inventory with the host to configure
  • secrets.yaml contains the credentials for every configured VO and a valid token for the AMS
  • playbook.yaml is an ansible playbook that just uses the catchall role to configure the host

Contributors

aidaph, alfonpd, andrea-manzi, astalosj, berkas1, catalincondurache, cesga-rdiez, daikema, danielmartinez, dealfonso, dependabot[bot], egi-ilm, enolfc, feyzaeryol, freznicek, hbayindir, jan-krystof-csnt, jiri256, kelaamm, kira2600, marcoverl, marcvs, mariojmdavid, mkszuba, mrorro, otemizsoylu, rosinec, scai-malin, sebastian-luna-valero, wetzel-desy

fedcloud-catchall-operations's Issues

Add some testing

At least some sanity check that the helm is correctly generated

Handle access/refresh tokens

Access tokens are revoked when a newer one is obtained from the same refresh token, so refreshing tokens concurrently does not seem a good idea. Could we have a single component that refreshes tokens independently and makes them accessible to whatever needs them?
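
One possible shape for this, as a minimal sketch only: a single cache owns the refresh flow, refreshes the access token under a lock, and hands the cached value to every consumer. TokenCache and fetch_access_token are hypothetical names, not part of this repository; the callable stands in for whatever actually exchanges the refresh token.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Single owner of the refresh flow; consumers only read the cached token."""

    def __init__(
        self,
        fetch_access_token: Callable[[], Tuple[str, float]],
        margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch_access_token is a hypothetical callable returning
        # (access_token, lifetime_in_seconds).
        self._fetch = fetch_access_token
        self._margin = margin  # refresh this many seconds before expiry
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, refreshing it when it is close to expiry."""
        with self._lock:  # only one caller at a time can trigger a refresh
            if self._token is None or self._clock() >= self._expires_at - self._margin:
                token, lifetime = self._fetch()
                self._token = token
                self._expires_at = self._clock() + lifetime
            return self._token
```

Because every consumer goes through get(), only one refresh is ever in flight within the process. Sharing the token across processes or VMs would need an external store, which this sketch does not cover.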

Deployment issues at NCG-INGRID

Short Description of the issue

VMs are not correctly contextualised and the deployment is failing. The cloud-info is still operational as there is a backup VM in place, but this should be fixed asap to make sure we can keep the infrastructure operational and automatically updated.

Invalid format in INFN-CLOUD-CNAF site

Hi @enolfc,
It seems that there is an error in the format of the INFN-CLOUD-CNAF.yaml site.
It is causing the fedcloud client to fail:

Site config in file https://raw.githubusercontent.com/EGI-Foundation/fedcloud-catchall-operations/main/sites/INFN-CLOUD-CNAF.yaml is in wrong format
Exception: Additional properties are not allowed ('region' was unexpected)

Failed validating 'additionalProperties' in schema:
    {'$id': 'http://fedcloud.egi.eu/catchall-ops.json',
     '$schema': 'http://json-schema.org/draft-07/schema',
     'additionalProperties': False,
     'definitions': {'vodata': {'additionalProperties': True,
                                'properties': {'auth': {'properties': {'project_id': {'description': 'project '
                                                                                                     'supporting '
                                                                                                     'the '
                                                                                                     'VO '
                                                                                                     'at '
                                                                                                     'the '
                                                                                                     'site',
                                                                                      'title': 'project '
                                                                                               'id',
                                                                                      'type': 'string'}},
                                                        'type': 'object'},
                                               'name': {'type': 'string'}},
                                'required': ['auth', 'name'],
                                'type': 'object'}},
     'description': 'site configuration schema',
     'properties': {'endpoint': {'$id': '#/properties/endpoint',
                                 'default': '',
                                 'description': 'The URL of keystone '
                                                'endpoint (should match '
                                                'GOCDB entry).',
                                 'title': 'keystone endpoint',
                                 'type': 'string'},
                    'gocdb': {'$id': '#/properties/gocdb',
                              'default': '',
                              'description': 'The GOCDB site name.',
                              'title': 'GOCDB site name',
                              'type': 'string'},
                    'protocol': {'$id': '#/properties/protocol',
                                 'default': 'openid',
                                 'description': 'The protocol configured '
                                                'in keystone for egi.eu '
                                                'idp.',
                                 'title': 'protocol in Keystone',
                                 'type': 'string'},
                    'vos': {'$id': '#/properties/vos',
                            'description': 'VOs supported at the site.',
                            'items': {'$ref': '#/definitions/vodata'},
                            'title': 'Supported VOs',
                            'type': 'array'}},
     'required': ['gocdb', 'endpoint'],
     'title': 'site specs',
     'type': 'object'}

On instance:
    {'endpoint': 'https://cloud-api-pub.cr.cnaf.infn.it:5000/v3',
     'gocdb': 'INFN-CLOUD-CNAF',
     'region': 'sdds',
     'vos': [{'auth': {'project_id': 'a8af02aad2894e9e8b5d4775c9736b8a'},
              'name': 'fedcloud.egi.eu'},
             {'auth': {'project_id': '8b6a8afe225344dea808ec17a26de56d'},
              'name': 'ops'},
             {'auth': {'project_id': 'a8af02aad2894e9e8b5d4775c9736b8a'},
              'name': 'dteam'}]}
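
The failure comes from the top-level 'additionalProperties': False combined with a properties map that lists only endpoint, gocdb, protocol and vos. The following is a minimal standard-library sketch of that single rule, not the validator that produced the traceback; unexpected_keys is a hypothetical helper. It shows why region is rejected, and that declaring region in the schema would be one possible way to accept it.

```python
from typing import Any, Dict, List


def unexpected_keys(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return top-level keys rejected by 'additionalProperties': False."""
    if schema.get("additionalProperties", True) is not False:
        return []  # extra keys are allowed
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in instance if key not in allowed)


# Top-level property names copied from the schema in the error message.
schema = {
    "additionalProperties": False,
    "properties": {"endpoint": {}, "gocdb": {}, "protocol": {}, "vos": {}},
    "required": ["gocdb", "endpoint"],
}

site = {
    "endpoint": "https://cloud-api-pub.cr.cnaf.infn.it:5000/v3",
    "gocdb": "INFN-CLOUD-CNAF",
    "region": "sdds",
    "vos": [],  # VO entries omitted for brevity
}

print(unexpected_keys(site, schema))  # ['region']

# Assumption: one possible fix is to declare 'region' in the schema.
fixed_schema = dict(
    schema, properties=dict(schema["properties"], region={"type": "string"})
)
print(unexpected_keys(site, fixed_schema))  # []
```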

Enable VALIDATE_CHECKOV in linter

Short Description of the issue

VALIDATE_CHECKOV was disabled so that we could move on with the super-linter update, since it is too much for this repo at the moment.

Error on installing cloudkeeper during last merge

See 4e8db3b and https://github.com/EGI-Federation/fedcloud-catchall-operations/actions/runs/5808942881/job/15746764652:

32.99 Successfully installed zaru-0.3.0
32.99 Successfully installed yell-2.2.2
32.99 Successfully installed tilt-2.2.0
32.99 Successfully installed thor-0.20.3
32.99 Successfully installed settingslogic-2.0.9
32.99 Successfully installed mixlib-shellout-2.4.4
------
Dockerfile:24
--------------------
  22 |     RUN fetch-crl -p 2 -T 30 || exit 0
  23 |     
  24 | >>> RUN gem install cloudkeeper -v 1.7.1
  25 |     
  26 |     COPY entrypoint.sh /entrypoint.sh
--------------------
ERROR: failed to solve: process "/bin/bash -o pipefail -c gem install cloudkeeper -v 1.7.1" did not complete successfully: exit code: 1
Error: buildx failed with: ERROR: failed to solve: process "/bin/bash -o pipefail -c gem install cloudkeeper -v 1.7.1" did not complete successfully: exit code: 1

Document new service account usage

Short Description of the issue

Review the documentation to reflect the new way of operating with the service account, which removes the need for PRs.

Too large messages

Short Description of the issue

The information published by some providers exceeds the AMS message size limit. We need to find a way to avoid this.

Logs, stacktrace, or other symptoms

requests.exceptions.HTTPError: 413 Client Error: Request Entity Too Large for url: https://msg.argo.grnet.gr/v1/projects/egi_cloud_info/topics/SITE_CESGA_ENDPOINT_11548G0:
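
One way to stay under the limit, sketched under stated assumptions, is to split the published records into several messages, each with a JSON encoding below a byte budget. The budget value, the record shape and split_by_size are hypothetical; the actual AMS limit and the cloud-info-provider message format are not stated in this issue.

```python
import json
from typing import Any, Iterator, List


def split_by_size(records: List[Any], max_bytes: int) -> Iterator[List[Any]]:
    """Yield batches whose JSON encoding is at most max_bytes long.

    A single record larger than max_bytes is yielded alone, so the
    caller can decide how to handle it.
    """
    batch: List[Any] = []
    for record in records:
        candidate = batch + [record]
        if batch and len(json.dumps(candidate).encode("utf-8")) > max_bytes:
            # Adding this record would exceed the budget: emit what we have.
            yield batch
            batch = [record]
        else:
            batch = candidate
    if batch:
        yield batch
```

Re-encoding the candidate batch on every step is quadratic in the worst case; that is acceptable for a sketch, but a running size counter would be cheaper for large payloads.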

Add tests everywhere

Short Description of the issue

We lack good tests for the different parts of this repo, which sometimes causes deployments to fail.
See e.g. #275

Use oidc-agent for managing credentials

Short Description of the issue

Avoid the refresh of credentials in the code as we are doing now and move to a better system like oidc-agent

Service accounts

We need a way to have service accounts per VO to make operations simpler and more secure.

Test deployment before removing previous VM

Short Description of the issue

Ideally we should be able to check that the new VM will be capable of running before doing the actual deployment (maybe deploying twice?)

Review how secrets are handled

Short Description of the issue

We have secrets in several places in this code and they are handled in different ways (via a file, via environment variables, in GitHub Actions secrets, ...). We should review how these are managed and move to as simple and secure a way as possible.

Add cloudkeeper

Short Description of the issue

Add cloudkeeper here alongside cloud-info-provider

Add missing VO credentials

Short Description of the issue

Several VOs are not yet configured because their credentials are not available in the secrets of the repo.

Environment variable EGI_SITE not used

The environment variable EGI_SITE is set to a site name, but fedcloud does not use it; instead, it insists on the --site or --all-sites parameter.

Environment

  • Operating System: Ubuntu 22.04
  • Other related components versions: fedcloud version 1.2.15

Steps to reproduce

  • Provision a new VM
  • Install fedcloudclient
  • Set environment variables EGI_VO and EGI_SITE
  • Run command e.g. fedcloud openstack server list

Error message is shown:

Error: Missing one of the required mutually exclusive options from 'Site' option group:
  '--site'
  '--all-sites' / '-a'
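
The behaviour this report expects can be sketched as an option that falls back to the environment when the flag is absent. The example uses the standard-library argparse purely for illustration and does not reflect how fedcloudclient is implemented; parse_site is a hypothetical helper.

```python
import argparse
import os
from typing import List, Mapping, Optional


def parse_site(
    argv: List[str], environ: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Parse --site / --all-sites, taking EGI_SITE as the default for --site."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="example")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--site", default=env.get("EGI_SITE"))
    group.add_argument("--all-sites", "-a", action="store_true")
    args = parser.parse_args(argv)
    if args.all_sites:
        # An explicit --all-sites wins over the EGI_SITE default.
        args.site = None
    elif args.site is None:
        parser.error("one of --site, --all-sites or EGI_SITE is required")
    return args
```

With EGI_SITE set, calling parse_site([]) yields that site name, while an explicit --site or --all-sites on the command line still takes precedence.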

Monitor the deployment

Short Description of the issue

We lack direct monitoring of the deployed VMs, so we only react to issues once users notify us, which can take a few days because AppDB caches the information. We need to ensure that the cloud-info and image sync services keep working regardless of the status of individual sites.
