deployment-recipes's People

Contributors

alexlovelltroy avatar bcfriesen avatar davidallendj avatar njones-lanl avatar synackd avatar travisbcotton avatar

deployment-recipes's Issues

[DEV] Create the end-to-end workflow for bootstrapping credentials for an SI cluster

This task may be mostly documentation, but I anticipate that we'll need some service customization as well. It's possible that we'll need to add an identity provider as either a static file or a dynamic service.

Relevant user stories include:

  1. As a student, I would like instructions that allow me to set up my own instance of ochami and enable my team to collaborate with me using it for HPC system management.
  2. As a team, we would like to know how to keep other students from accessing our ochami instance by accident, or on purpose.
  3. As a student, I would like to be able to create users and grant them access to the ochami deployment through a properly scoped JWT.
  4. As a student, I would like to be able to disable a JWT if I accidentally disclose it.
  5. As a student, I would like to be able to renew my JWT easily in order to continue making calls to the system.

Note: Standard practice for JWTs is to give them short expiration times and have clients renew them as part of normal usage. For example, if the student obtains a token each morning and uses it at least once per hour, they should never see an expired token.
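To make the renewal flow concrete, a client could check the token's exp claim before each call and renew when it is close to expiring. A minimal sketch (no signature verification, purely illustrative — the threshold and helper names are assumptions, not ochami code):

```python
import base64
import json
import time

def jwt_expires_soon(token: str, threshold_s: int = 300) -> bool:
    """Return True if the JWT's exp claim falls within threshold_s seconds."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"] - time.time() < threshold_s

def fake_token(exp: int) -> str:
    """Build an unsigned token for illustration only."""
    seg = lambda d: base64.urlsafe_b64encode(json.dumps(d).encode()).decode().rstrip("=")
    return f'{seg({"alg": "none"})}.{seg({"exp": exp})}.'

print(jwt_expires_soon(fake_token(int(time.time()) + 60)))    # True: renew now
print(jwt_expires_soon(fake_token(int(time.time()) + 7200)))  # False: still fresh
```

A client that runs this check before every request never presents an expired token, matching the renewal pattern described above.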

[DEV] Create a release pipeline for the docker-compose files needed to run an HPC cluster

Our readme instructions in the subdirectory of the repo are good, but we can make it easier for new users by creating a release of just the essential files and scripts for getting started.

This task may be marked complete when we have added:

  • a GitHub Actions workflow to create a tarball of only the files essential for a default installation of OpenCHAMI through docker-compose
  • automation for signing the tarball and generating a GitHub release with changelog(s)
  • instructions in the readme for downloading the most current release and verifying the signature
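As a starting point, a workflow along these lines could cover the first two bullets; the tag trigger, file list, and cosign-based signing are assumptions to be adapted, not an agreed design:

```yaml
# Hypothetical release workflow sketch — file names and signing tool are placeholders
name: release-compose-bundle
on:
  push:
    tags: ['v*']
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build tarball of essential compose files
        run: |
          tar czf openchami-compose.tar.gz \
            ochami-services.yml ochami-krakend-ce.yml
      - name: Sign tarball
        run: |
          cosign sign-blob --yes openchami-compose.tar.gz \
            --output-signature openchami-compose.tar.gz.sig
      - name: Create GitHub release with generated notes
        uses: softprops/action-gh-release@v1
        with:
          files: |
            openchami-compose.tar.gz
            openchami-compose.tar.gz.sig
          generate_release_notes: true
```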

[FEATURE] TLS cert renewal unit tests

We need tests to verify the TLS cert renewal server is functioning correctly.
Krakend currently uses the server to get a cert, and other services might in the future.

A compose file with the tests should be the priority, as it will make them easier to deploy with the rest of the services.

[BUG] docker compose -f ochami-services.yml -f hydra.yml -f opaal.yml -f ochami-krakend-ce.yml up fails port in use

Describe the bug
docker compose -f ochami-services.yml -f hydra.yml -f opaal.yml -f ochami-krakend-ce.yml up fails on

Error response from daemon: driver failed programming external connectivity on endpoint hydra (aa4af1c3010b07aee23f3ea108493764dc7be311c622d24a9d0e632b2aa9962a): Error starting userland proxy: listen tcp4 0.0.0.0:5555: bind: address already in use

To Reproduce
Steps to reproduce the behavior:
run
docker compose -f ochami-services.yml -f hydra.yml -f opaal.yml -f ochami-krakend-ce.yml up

Expected behavior
Services come up completely.

Desktop (please complete the following information):

  • OS: Fedora 39
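The error means something on the host already listens on 0.0.0.0:5555 (Hydra's published port), so the Docker userland proxy cannot bind it; `ss -tlnp` or `lsof -i :5555` will identify the culprit. As an illustration of exactly what the daemon is reporting, the same bind attempt can be reproduced directly:

```python
import socket

def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Attempt the same bind the Docker proxy performs; EADDRINUSE means taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return False
        except OSError:
            return True

if port_in_use(5555):
    print("port 5555 is taken; stop the other listener or remap hydra's port")
```

The fix is either to stop the conflicting process or to change the published port in hydra.yml.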

[FEATURE] Add our own DHCP Server

OpenCHAMI needs a way to run DHCP directly as well as feed an external DHCP server with configuration data. This issue is confined to running a simple DHCP server directly within the docker compose environment.

Options Available

There are several credible, embeddable DHCP servers to consider:

  • ISC DHCP is a common choice even though it is EOL, and there are plenty of libraries that help generate configs for it. It is deprecated, with Kea as its successor.
  • Kea is the successor to ISC DHCP and is included with CSM. It has modern features like database backends, a REST interface, and modularity.
  • dnsmasq is an all-purpose DHCP and DNS server that is commonly included in appliances. In addition to DHCP, it serves DNS, BOOTP, and PXE within the same small footprint. Its most recent release is a year old, and it doesn't support external integrations.
  • coredhcp is a relatively new project that can be used as a standalone DHCP server and extended with plugins. It also supports being embedded as a library with the same configuration and plugins.

Assessment

My recommendation is that we embed coredhcp using the file plugin for now, and add a roadmap item to create a dedicated ochami plugin that interacts with the inventory system and some form of IPAM (either within the coredhcp plugin or within ochami).
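For reference, a file-plugin setup could look roughly like this (interface, addresses, and lease file path are illustrative placeholders patterned on the coredhcp examples, not a tested config):

```yaml
# config.yml for coredhcp — values are placeholders
server4:
  listen: '%eth0'
  plugins:
    - lease_time: 3600s
    - server_id: 10.0.0.1
    - dns: 10.0.0.1
    - router: 10.0.0.254
    - netmask: 255.255.255.0
    # file plugin: static MAC-to-IP assignments, one "<mac> <ip>" per line
    - file: /etc/coredhcp/leases.txt
```

A dedicated ochami plugin would replace the static leases file with lookups against the inventory system.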

[DEV] split Helm values file into multiple files to support multiple Kubernetes environments

Currently there is a single values.yaml file for the Helm charts that deploy OCHAMI services. Those values expect a Google Kubernetes Engine (GKE) environment and have annotations targeting that environment which, in any other Kubernetes environment, will at best be ignored and at worst break the deployment.

Helm supports specifying values files with the -f flag to helm install, and multiple values files can be specified in a single invocation. So let's split the existing values file into one for GKE and one for CSM, which is somewhat closer to a "plain" Kubernetes environment than GKE.

I don't have access to a completely unmodified Kubernetes environment so I am not sure how to write a values file targeting that environment. But hopefully some combination of the GKE and CSM values files will get pretty close.
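When multiple -f files are given, Helm merges later files over earlier ones key-by-key, with nested maps merged recursively, so a common values file plus a small per-environment overlay should suffice. The file names and keys below are hypothetical; the sketch just demonstrates the override semantics:

```python
def merge_values(base: dict, override: dict) -> dict:
    """Mimic Helm's handling of multiple -f files: later files win,
    with nested maps merged recursively."""
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge_values(out[key], val)
        else:
            out[key] = val
    return out

# values-common.yaml and values-gke.yaml stand-ins (hypothetical keys)
common = {"image": {"repository": "ghcr.io/openchami/smd", "tag": "v2.13.5"},
          "service": {"type": "ClusterIP"}}
gke = {"service": {"type": "LoadBalancer",
                   "annotations": {"networking.gke.io/load-balancer-type": "Internal"}}}

merged = merge_values(common, gke)
print(merged["service"]["type"])  # LoadBalancer
print(merged["image"]["tag"])     # v2.13.5 (untouched by the overlay)
```

The equivalent invocation would then be something like `helm install ochami ./chart -f values-common.yaml -f values-gke.yaml`.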

[BUG] Instructions for "To run services with JWT authentication enabled:" lack a verb, and even with up they fail

Describe the bug
The instructions say
"To run the services without JWT authentication enabled:

docker compose -f ochami-services-noauth.yml -f ochami-krakend-ce.yml"

This is incorrect because no action (up, down, ...) is given to docker compose.
Adding up still fails.

To Reproduce
Steps to reproduce the behavior:
Run
docker compose -f ochami-services-noauth.yml -f ochami-krakend-ce.yml

Expected behavior
docker compose would run the services

Screenshots

docker compose -f ochami-services-noauth.yml -f ochami-krakend-ce.yml

Usage: docker compose [OPTIONS] COMMAND

Define and run multi-container applications with Docker

Options:
--ansi string Control when to print ANSI control characters ("never"|"always"|"auto") (default "auto")
--compatibility Run compose in backward compatibility mode
--dry-run Execute command in dry run mode
--env-file stringArray Specify an alternate environment file
-f, --file stringArray Compose configuration files
--parallel int Control max parallelism, -1 for unlimited (default -1)
--profile stringArray Specify a profile to enable
--progress string Set type of progress output (auto, tty, plain, quiet) (default "auto")
--project-directory string Specify an alternate working directory
(default: the path of the, first specified, Compose file)
-p, --project-name string Project name

Commands:
attach Attach local standard input, output, and error streams to a service's running container
build Build or rebuild services
config Parse, resolve and render compose file in canonical format
cp Copy files/folders between a service container and the local filesystem
create Creates containers for a service
down Stop and remove containers, networks
events Receive real time events from containers
exec Execute a command in a running container
images List images used by the created containers
kill Force stop service containers
logs View output from containers
ls List running compose projects
pause Pause services
port Print the public port for a port binding
ps List containers
pull Pull service images
push Push service images
restart Restart service containers
rm Removes stopped service containers
run Run a one-off command on a service
scale Scale services
start Start services
stats Display a live stream of container(s) resource usage statistics
stop Stop services
top Display the running processes
unpause Unpause services
up Create and start containers
version Show the Docker Compose version information
wait Block until the first service container stops
watch Watch build context for service and rebuild/refresh containers when files are updated

Run 'docker compose COMMAND --help' for more information on a command.

If I add up to the end to give it an action:

docker compose -f ochami-services-noauth.yml -f ochami-krakend-ce.yml up
WARN[0000] /mnt/home/jhanson/Canary/deployment-recipes/lanl/docker-compose/ochami-services-noauth.yml: version is obsolete
WARN[0000] /mnt/home/jhanson/Canary/deployment-recipes/lanl/docker-compose/ochami-krakend-ce.yml: version is obsolete
service "step-ca" refers to undefined network internal: invalid compose project

Desktop (please complete the following information):

  • OS: Fedora 39

[BUG] Cannot run Docker compose with `hydra.yml`

Describe the bug
When trying to start the hydra service with Docker compose in the LANL deployment recipes, the service fails to start due to an error. See the steps below to reproduce.

To Reproduce
Steps to reproduce the behavior:

  1. cd to lanl/docker-compose
  2. Stop ochami services docker compose -f ochami-services.yml -f ochami-krakend-ce.yml -f hydra.yml down --volumes
  3. Then restart docker compose -f ochami-services.yml -f ochami-krakend-ce.yml up
  4. Try and run docker compose -f hydra-config/hydra.yml up -d

This will produce the following error:

validating /Users/allend/Desktop/projects/ochami/deployment-recipes/lanl/docker-compose/hydra-config/hydra.yml: secrets.system must be a mapping
  5. Additionally, trying to run docker compose with the other hydra.yml via docker compose -f hydra.yml up -d produces the following error:

service "hydra-migrate" refers to undefined network internal: invalid compose project

Expected behavior
The hydra container should start without any errors.

Desktop (please complete the following information):

  • OS: M2 macOS Sonoma 14.2
  • Docker Compose version v2.23.3-desktop.2
Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.21.3
 Git commit:        afdd53b4e3
 Built:             Thu Oct 26 07:06:42 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.26.1 (131620)
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:08:15 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.25
  GitCommit:        d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
 runc:
  Version:          1.1.10
  GitCommit:        v1.1.10-0-g18a0cb0
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0


[BUG] Authorization requests sometimes returning 401 when running recipes

Describe the bug
When running the deployment recipes, both SMD and BSS will try to fetch a JWKS from the authorization server (Hydra) to verify that incoming JWTs are valid. Hydra generates a new key pair when the request is made if the pair does not already exist. If both micro-services try to fetch the JWKS at roughly the same time, Hydra will try to generate the pair twice. This causes all authorization requests to return a 401 from both micro-services.

To Reproduce
Steps to reproduce the behavior:

  1. Make sure the *_JWKS environment variables are set to make SMD and BSS fetch a JWKS
  2. Run the deployment recipes
  3. Check the Hydra logs to see these lines twice:
2024-03-05 09:37:13 time=2024-03-05T16:37:13Z level=warning msg=JSON Web Key Set "hydra.jwt.access-token" does not exist yet, generating new key pair... audience=application service_name=Ory Hydra service_version=v2.2.0-rc.3
2024-03-05 09:37:13 time=2024-03-05T16:37:13Z level=warning msg=JSON Web Key Set "hydra.openid.id-token" does not exist yet, generating new key pair... audience=application service_name=Ory Hydra service_version=v2.2.0-rc.3
  4. Fetch a token and try to use it with BSS or SMD
  5. Observe the specific error token is unauthorized

Expected behavior
Any normal output expected from the micro-service, rather than token is unauthorized.

Desktop (please complete the following information):

  • OS: M2 MacOS 14.2 Sonoma

Additional context
This problem only occurs sometimes, so you will have to run multiple times if it doesn't happen the first time.
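Until the root cause is fixed in the recipes (e.g. serializing the first JWKS fetch so Hydra generates the key pair exactly once), one service-side mitigation is to retry the fetch with jittered exponential backoff, so the two services stop colliding. A generic sketch — with_retry and the fetch callable are hypothetical, not existing ochami code:

```python
import random
import time

def with_retry(fn, attempts=5, base=0.5, sleep=time.sleep):
    """Call fn(), retrying on failure with jittered exponential backoff.
    The random jitter is what desynchronizes two services that started together."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base * 2 ** i + random.uniform(0, base))

# Usage sketch (hypothetical): jwks = with_retry(lambda: fetch_jwks(HYDRA_JWKS_URL))
```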

Add example Helm charts

There exists already a Docker Compose directory in this repo for deploying a simple arrangement of SMD and BSS. It may also be useful to provide corresponding Helm charts to achieve a similar result. I was able to reproduce this recipe using Helm charts, deploying those services into (ironically) an existing CSM cluster:

root@ncn-m001 2024-01-31 22:22:29 ~ # kubectl get pods -n ochami
NAME                        READY   STATUS      RESTARTS   AGE
bss-66b86457ff-tnvrv        1/1     Running     0          16m
ochami-init-gb4pr           0/1     Completed   0          16m
postgres-7887576bf9-gqlmn   1/1     Running     0          16m
smd-556d7689dd-vw8k8        1/1     Running     0          16m
smd-init-zgk67              0/1     Completed   0          16m
root@ncn-m001 2024-01-31 22:23:09 ~ # kubectl get svc -n ochami
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
bss-svc        LoadBalancer   10.18.43.163    10.92.100.1   27778:30914/TCP   16m
postgres-svc   ClusterIP      10.20.183.224   <none>        5432/TCP          16m
smd-svc        LoadBalancer   10.29.103.48    10.92.100.0   27779:31095/TCP   16m
root@ncn-m001 2024-01-31 22:23:13 ~ # curl -s 10.92.100.0:27779/hsm/v2/service/ready|jq
{
  "code": 0,
  "message": "HSM is healthy"
}
root@ncn-m001 2024-01-31 22:23:22 ~ # curl 10.92.100.1:27778/boot/v1/
Hello World!

I will share my first attempt at such a Helm chart as a PR for others to browse.

[RFD] adding secrets to services in docker compose (without using docker swarm)

Secrets can be added via environment variables or as files on the host.
Files on the host can be plaintext, added as secrets, and used by services in the compose file.
A secret-as-a-file example is below.

compose.yaml

version: '3.8'

services:
  web:
    image: <service-container>
    secrets:
     - secret-file

secrets:
  secret-file:
    file: <path to file>

This can be a complex file and the syntax and structure will be preserved.

We can use this to bring in the ochami-init config file to generate users in the various databases. Then we won't need to generate passwords every time we run the init.

Here is an example of how we might use files to bring in the ochami-init config file

version: '3.8'

services:
  smd-init:
    container_name: smd-init
    image: ghcr.io/openchami/smd:v2.13.5
    environment:
      - SMD_DBHOST=postgres
      - SMD_DBPORT=5432
      - SMD_DBUSER=ochami
      - SMD_DBPASS=${POSTGRES_PASSWORD} # Set in .env file
      - SMD_DBNAME=ochami
      - SMD_DBOPTS=sslmode=disable
      - OCHAMI_CONFIG=/run/secrets/ochami-config # now read from the secret mount
    hostname: smd-init
    depends_on:
      - postgres
      - ochami-init
    networks:
      - internal
    entrypoint:
      - /smd-init
    secrets:
      - ochami-config

secrets:
  ochami-config:
    file: deployment-recipes/lanl/docker-compose/configs/ochami.yaml

There shouldn't be a need to write to that config anymore.
But we will still need a way to generate the initial passwords, and a safe place to keep the ochami.yaml config file.

[RFD] Writing to external configuration in Docker and Docker Compose

Background

When working on Ochami's deployment-recipes repository, I found that, within the ochami-services.yml Docker Compose file, the ochami-init container mounts in the ochami.yaml file included in the deployment-recipes repo, which specifies how to set up the networking and databases for running Ochami services in Docker Compose. BSS and SMD both use PostgreSQL as data storage backends, and ochami-init configures the postgres container by initializing each database specified in ochami.yaml so that SMD and BSS can both use the same Postgres container containing their respective databases.

ochami-init initializes the databases as follows:

  1. Reads ochami.yaml and parses each database name and username.
  2. Generates a password for each user.
  3. Sends the SQL queries to Postgres to execute the configuration.
  4. Writes the final configuration back to ochami.yaml.
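Schematically, the first three steps look like this (field names are guessed from the description, not the real ochami.yaml schema, and run_sql stands in for the Postgres client):

```python
import secrets

def init_databases(config: dict, run_sql) -> dict:
    """Steps 1-3: for each configured database, generate a password and issue
    the SQL; the caller then writes the returned config back out (step 4)."""
    for db in config["databases"]:
        db["password"] = secrets.token_urlsafe(16)
        run_sql(f"CREATE USER {db['user']} WITH PASSWORD '{db['password']}';")
        run_sql(f"CREATE DATABASE {db['name']} OWNER {db['user']};")
    return config

statements = []
cfg = init_databases(
    {"databases": [{"name": "smd", "user": "smd"},
                   {"name": "bss", "user": "bss"}]},
    statements.append,
)
print(len(statements))  # 4 — two statements per database
```

Step 4, writing the finished config back to ochami.yaml, is the part that runs into the permission problem described below.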

Currently, ochami.yaml is mounted into the ochami-init container using the following YAML in ochami-services.yml:

volumes:
 - ./ochami.yaml:/config/ochami.yaml:rw

The Dockerfile that specifies how to build the ochami-init container sets the user to UID 65534 (nobody).

The Problem

ochami.yaml is bind-mounted into the ochami-init container, so it is owned by the UID of the host user who checked out the repo. For example, examining ochami.yaml within the container, we would see that it is owned by my host user (UID 1001) and is not writable by others, while the user running the shell is UID 65534:

$ docker run -v ./ochami.yaml:/config/ochami.yaml:rw -e DB_USER=ochami -e DB_PASSWORD=ochami -e DB_NAME=ochami -e DB_HOST=<postgres-ip> -it openchami/ochami-init:v0.0.18 /bin/sh
/config $ ls -la
total 4
drwxr-xr-x    2 nobody   nobody          60 Jan 22 23:42 .
drwxr-xr-x    1 root     root            40 Jan 22 23:42 ..
-rw-rw-r--    1 1001     1001           543 Jan 22 23:27 ochami.yaml
/config $ id
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody)

So, as we would expect, we get a permission denied error when trying to write to it in the container:

ochami-init  | time="2024-01-19T19:54:48Z" level=fatal msg="open ochami.yaml: permission denied"

Possible Solutions

Solution 1: User Namespace Re-Mapping

Docker provides a way to map a range of UIDs/GIDs in the container to a range of UIDs/GIDs on the host. Using this, we can make the file appear in the container as being owned by UID 65534 (the same UID who is running /ochami-init), while still being owned by 1001 (testuser in the example below) on the host.

/etc/docker/daemon.json:

{
          "userns-remap": "testuser"
}

/etc/subuid and /etc/subgid:

testuser:231072:65536

Testing:

$ docker run --rm -v ./ochami.yaml:/config/ochami.yaml:rw -e DB_USER=ochami -e DB_PASSWORD=ochami -e DB_NAME=ochami -e DB_HOST=<postgres-ip> -it  openchami/ochami-init:v0.0.18 /bin/sh
/config $ ls -l
total 4
-rw-rw-r--    1 nobody   nobody         543 Jan 23 21:15 ochami.yaml
/config $ whoami
nobody

Awesome! The file is owned by the same user that runs /ochami-init. Let's test:

/config $ /ochami-init
INFO[0000] Connected to 172.16.0.10/ochami
[...]
FATA[0000] open ochami.yaml: permission denied

Uh oh. Why didn't it work? It's because we mapped UIDs 0 through 65535 in the container to UIDs 231072 through 296607 (231072 + 65535) on the host, so UID 65534 in the container is mapped to UID 296606 (231072 + 65534) on the host. Therefore, ochami.yaml needs to be owned by UID 296606 on the host for the container to be able to write it, even though the file is mounted rw.
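The arithmetic, for anyone adjusting their own subuid range (the 231072 base comes from the /etc/subuid entry above):

```python
def host_uid(container_uid: int, remap_start: int = 231072) -> int:
    """With userns-remap, container UID n appears as remap_start + n on the host."""
    return remap_start + container_uid

print(host_uid(0))      # 231072: container root
print(host_uid(65534))  # 296606: the owner ochami.yaml would need on the host
```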

Problems with Solution 1

This solution requires changing the ownership of ochami.yaml to a particular UID, which can be inconvenient for editing as a normal user. Preferably, we would not have to modify the file permissions of ochami.yaml at all.

Plus, the docker daemon config needs to be modified, along with the /etc/subuid and /etc/subgid files, just for this container, which can be inconvenient.

Benefits of Solution 1

The Docker Compose file need not be modified, nor the ochami-init code.

Solution 2.A: Setting User via Environment Variables

Alternatively, we could specify which user to run /ochami-init as so that we can run it as our own host user, the same one that owns ochami.yaml.

ochami-services.yml:

services:
  [...]
  ochami-init: # Creates the ochami databases and users
    [...]
    # Disable user namespace remapping
    userns_mode: host
    # Set running user/group to be what we configure in the UID/GID env vars.
    # These will be set in .env to be the host UID/GID (e.g. 1001:1001).
    user: '${UID}:${GID}'

Setting the environment variables in .env:

$ echo "UID=$(id -u)" | tee -a .env
UID=1001
$ echo "GID=$(id -g)" | tee -a .env
GID=1001
$ grep -e UID -e GID .env
UID=1001
GID=1001

Alternatively, one could export UID and GID in their shell autostart, e.g. ~/.bashrc.

Problems with Solution 2.A

Being able to set the user the container entrypoint runs as presents an obvious security concern.

Benefits of Solution 2.A

The user need only add to the existing .env file, which is already specified in the setup instructions, and they don't need to be privileged to do any of that. With minimal overhead, the user can get the containers up and running quickly.

Solution 2.B: Modify ochami-init with Optional Read/Write Config Paths

NOTE: Configuring the running user (Solution 2.A) is a prerequisite for this solution.

Another solution could be modifying the ochami-init code to accept environment variables and/or flags specifying which config file to read and which file to write, creating the latter as necessary. If neither is specified, it could default to what is currently hard-coded: ochami.yaml.

For example:

$ ./ochami-init -h
Usage of ./ochami-init:
  -read-config string
        (OCHAMI_INIT_READ_CONFIG) YAML configuration to read (default "ochami.yaml")
  -write-config string
        (OCHAMI_INIT_WRITE_CONFIG) YAML configuration to write (default "ochami.yaml")

Then, we create config/ within the repo and put ochami.yaml there:

$ mkdir config/
$ cp ochami.yaml config/
$ ls config/
ochami.yaml

We could then specify OCHAMI_INIT_WRITE_CONFIG to be a different file that will be created by ochami-init (e.g. ochami-out.yaml) and mount the config/ directory into the container:

services:
[...]
ochami-init: # Creates the ochami databases and users
    [...]
    environment:
      - OCHAMI_INIT_WRITE_CONFIG=ochami-out.yaml
      [...]
    volumes:
      - ./config:/config:rw

After this runs successfully:

$ ls -la config/
total 12
drwxr-xr-x 2 testuser testuser   48 Jan 23 16:58 .
drwxrwxr-x 4 testuser testuser 4096 Jan 23 16:57 ..
-rw-r--r-- 1 testuser testuser  543 Jan 23 16:58 ochami-out.yaml
-rw-rw-r-- 1 testuser testuser  543 Jan 23 16:58 ochami.yaml

Problems with Solution 2.B

Having to write a separate file shouldn't be required. This solution is also an extra step on top of Solution 2.A, which adds a bit more complexity.

Benefits of Solution 2.B

The flexibility to specify either or both of the config files that ochami-init reads and writes seems useful for testing different configurations.

Weighing the Solutions

Another solution I did not include was scripting the chowning of ochami.yaml to 65534 within the container as root (getting rid of the USER 65534 in the Dockerfile) and then exec su 65534 to drop privileges. However, this does not seem like a desirable solution since it automatically changes the ownership of the config file.

Ultimately, if we don't want to use Docker volumes and we don't want to change the ownership of ochami.yaml or any directory containing it, Solution 2.A seems to be the easiest. Plus, it is the top-voted answer to a similar question on Stack Overflow. If going this route, Solution 2.B might be a good add-on for testability.

What do folks think the solution should be?

[DEV] Identify, expose, and describe the API routes that will be supported for the Supercomputing Institute

Using KrakenD-CE as our API gateway gives us an opportunity to choose which URLs necessary for interacting with an ochami system to expose or hide. We can expose existing routes under new/different URLs, and we can also pass existing URLs directly through to the underlying services.

Our students shouldn't need to know which microservice is involved with which requests and shouldn't be exposed to routes/endpoints that aren't relevant to their use cases.

This issue can be closed once all routes necessary for bringing up and using an ochami system are identified and configured in a krakend-ce configuration file.
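For illustration, a single KrakenD v3 endpoint that exposes a friendlier route while proxying to SMD might look like this (the route name, backend host, and port are placeholders, not the agreed-upon API surface):

```json
{
  "version": 3,
  "endpoints": [
    {
      "endpoint": "/nodes",
      "method": "GET",
      "backend": [
        {
          "host": ["http://smd:27779"],
          "url_pattern": "/hsm/v2/State/Components"
        }
      ]
    }
  ]
}
```

Routes not listed in the endpoints array are simply not exposed, which covers the hiding requirement for free.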

[FEATURE] Krakend unit tests

Nothing currently verifies that the krakend deployment is functioning correctly.
A set of tests run from a compose file is needed to ensure it's working as expected.
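A minimal shape for such a compose-based check might be a one-shot container that curls the gateway's health endpoint (the service name, port, and /__health path are assumptions about this deployment):

```yaml
# Hypothetical test overlay, e.g. docker compose -f ... -f krakend-test.yml up krakend-test
services:
  krakend-test:
    image: curlimages/curl:latest
    depends_on:
      - krakend
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        set -e
        curl -fsS http://krakend:8080/__health
        echo "krakend health check passed"
```

Real route-level tests could follow the same pattern, with one curl per exposed endpoint and a nonzero exit on any failure.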
