bf2fc6cc711aee1a0c2a / ffm-project

Repository containing issues and roadmap for the factorized fleet manager

License: Apache License 2.0

ffm-project's People

davidffrench machi1990

ffm-project's Issues

Skip all integration tests that are failing for now and add a note in the automated test guide

The guide is: https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template/blob/main/docs/automated-testing.md#integration-tests

It might catch some people by surprise to see the integration tests failing. The idea of having some integration tests was to be able to show how the setup phase is done. So it actually makes sense to skip the failing ones with a note on why they are failing and what will be needed to make them pass.

Allow CORS option configurability

At the moment the CORS options are hardcoded: https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template/blob/main/pkg/server/api_server.go#L97

It'll be good to allow these to be configurable so that each service could put in their own options via configuration.

/cc @SimonBaeumer
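
A minimal sketch of what configurable CORS options could look like, using only the Go standard library. The CORSOptions type, its field names, and the ParseList helper are hypothetical and not taken from the template, which may well wire CORS through a different library. The idea is only that each service supplies its own lists through configuration instead of editing api_server.go.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// CORSOptions is a hypothetical config struct; the field names are
// illustrative and do not come from the template.
type CORSOptions struct {
	AllowedOrigins []string
	AllowedMethods []string
	AllowedHeaders []string
}

// ParseList splits a comma-separated configuration value and drops blanks.
func ParseList(raw string) []string {
	var out []string
	for _, item := range strings.Split(raw, ",") {
		if item = strings.TrimSpace(item); item != "" {
			out = append(out, item)
		}
	}
	return out
}

// OriginAllowed reports whether origin matches the configured list.
// A "*" entry allows every origin.
func (o CORSOptions) OriginAllowed(origin string) bool {
	for _, allowed := range o.AllowedOrigins {
		if allowed == "*" || allowed == origin {
			return true
		}
	}
	return false
}

// Middleware sets CORS response headers from the configured options.
func (o CORSOptions) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		if origin != "" && o.OriginAllowed(origin) {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Add("Vary", "Origin")
			w.Header().Set("Access-Control-Allow-Methods", strings.Join(o.AllowedMethods, ", "))
			w.Header().Set("Access-Control-Allow-Headers", strings.Join(o.AllowedHeaders, ", "))
		}
		// Answer preflight requests without calling the wrapped handler.
		if r.Method == http.MethodOptions && r.Header.Get("Access-Control-Request-Method") != "" {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// The values below stand in for whatever a service reads from its config.
	opts := CORSOptions{
		AllowedOrigins: ParseList("https://console.example.com, https://dev.example.com"),
		AllowedMethods: ParseList("GET,POST,DELETE"),
		AllowedHeaders: ParseList("Authorization,Content-Type"),
	}
	fmt.Println(opts.OriginAllowed("https://dev.example.com"))
	fmt.Println(opts.OriginAllowed("https://other.example.com"))
}
```

In a real service the three lists would come from flags, environment variables, or a config file, and the middleware would wrap the API router.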

Test roadmap

Backport some changes from kas-fleet-manager to the golang fleet manager template

backport claims configurability bf2fc6cc711aee1a0c2a/kas-fleet-manager#902
backport docker multistage build and cve fixes related to golang version in use: bf2fc6cc711aee1a0c2a/kas-fleet-manager#873 and bf2fc6cc711aee1a0c2a/kas-fleet-manager#831
backport SQL debug enablement bf2fc6cc711aee1a0c2a/kas-fleet-manager#949

Create ADR with proposal for Authorization for factorized fleet Manager

Zulip link on communication page is incorrect

The Zulip link on the project's communication page is incorrect:

https://github.com/bf2fc6cc711aee1a0c2a/ffm-project/wiki/How-to-communicate-in-the-project

The page links to https://bf2/zulipchat.com; it should be https://bf2.zulipchat.com

Revise the keycloak service

remove the custom claims
remove the need for custom roles
update the authentication between fleet-manager and fleetshard sync, which relied on custom roles; this is not the way we do things anymore
clean up the interface and leave only what's needed

Related to #20.
The interface template repo is: https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template
/cc @akoserwal as discussed.

Cluster Registration Service

Context

A mechanism for registering an OpenShift or Kubernetes cluster that can be used to schedule workloads through AppStudio.

User Narrative

Steve wants to streamline his enterprise application lifecycle and operations across any footprint. Steve's company, PizzaPie.inc, has several clusters used for its various environments, including both managed and self-managed OpenShift clusters.

Steve registers PizzaPie.inc's existing clusters against various managed services, which enables his development team to start scheduling services to the existing clusters.

Job Stories

As a cloud administrator, I want to register my existing self-managed or managed OpenShift clusters, so that my development teams can integrate with existing managed services
As a cloud administrator, I want to deregister a cluster, so that it is no longer possible to use it with other managed services
As a cloud administrator, I want to retrieve a list of my registered clusters, so that I can review the registered clusters on demand

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

Offer a first version of the java template

Document the usage of the go template

API Definition for new Terraforming/Infrastructure/Fleet Management service

Context

Terraforming and infrastructure management is a common concern for all fleet managers that use a bin packed data plane deployment topology. A shared service would ensure consistency and reliability with data plane cluster fleet management and scaling.

The high-level idea for the API is to achieve:

Policy endpoint that allows API users to define the structure of their managed service data plane cluster (e.g. what should be included in terraforming: addons, resources, etc.)
Capacity Pool endpoint that would likely allow API users to define restrictions on their capacity pool, such as the minimum and maximum number of clusters

The ideas above are rough sketches of how this might work. As spikes are completed, this Epic should become more concrete.

User Narrative

Joe, a development lead from ProductA, has been tasked with turning it into a managed service running on a managed OpenShift offering. Joe knows that this new managed service will colocate separate customers' service instances on the same managed OpenShift cluster.

Joe hears about the Infrastructure fleet manager, which suits this bin packed deployment topology perfectly. Joe creates a blueprint for his managed service and instantiates a fleet of clusters created using his blueprint in a specific cloud provider and region.

Job Stories

As a SaaS developer, I want to define the blueprint for my SaaS's data plane cluster, so the data plane fleet can be created in a consistent and reliable way.
As a SaaS developer, I would prefer the blueprint to be cloud provider agnostic, so that I only have to define one blueprint for my data plane.
As a SaaS developer, I want to define machinepools and node sizes within my blueprint, so that I can provision services on different node sizes.
As a SaaS developer, I want an observability stack and my custom observability resources included in each data plane cluster, so that the service instances can be managed by an SRE team.
As a SaaS developer, I want the ability to create a pool of my SaaS's data plane clusters from a defined blueprint version, so that I can start provisioning service instances.
As a SaaS developer, I want the ability to update the blueprint for my SaaS's data plane cluster with versioning, so that I can update the blueprint used for cluster pools independently.
As a SaaS developer, I want to retrieve the clusters from my SaaS's data plane cluster pools, so that I can choose where to provision service instances.
When a given data plane cluster's available compute resources have fallen below a threshold, I want the number of nodes scaled in the cluster, so I can continue to provision service instances.
When a given data plane cluster has reached my defined maximum capacity, I want a new data plane cluster added to the cluster pool, so I can continue to provision service instances.
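
The last two job stories describe a scaling rule that can be sketched as a small decision function. This is only an illustration under stated assumptions: the ClusterState and PoolPolicy types, their fields, and the reading of "defined maximum capacity" as a maximum node count per cluster are all hypothetical and not part of any agreed API.

```go
package main

import "fmt"

// Action is the scaling decision for one data plane cluster.
type Action string

const (
	NoAction   Action = "none"
	AddNodes   Action = "add-nodes"
	AddCluster Action = "add-cluster"
)

// ClusterState is a hypothetical snapshot of one data plane cluster.
type ClusterState struct {
	Nodes             int
	AvailableFraction float64 // share of compute still free, 0.0 to 1.0
}

// PoolPolicy is a hypothetical set of limits defined by the SaaS developer.
type PoolPolicy struct {
	MinAvailableFraction float64 // scale once free compute drops below this
	MaxNodesPerCluster   int     // assumed meaning of "maximum capacity"
}

// Decide applies the two job stories in order: do nothing while enough
// compute is free, add nodes while the cluster can still grow, and ask
// for a new cluster once the cluster is at its maximum size.
func Decide(c ClusterState, p PoolPolicy) Action {
	if c.AvailableFraction >= p.MinAvailableFraction {
		return NoAction
	}
	if c.Nodes < p.MaxNodesPerCluster {
		return AddNodes
	}
	return AddCluster
}

func main() {
	policy := PoolPolicy{MinAvailableFraction: 0.2, MaxNodesPerCluster: 12}
	fmt.Println(Decide(ClusterState{Nodes: 6, AvailableFraction: 0.1}, policy))
	fmt.Println(Decide(ClusterState{Nodes: 12, AvailableFraction: 0.1}, policy))
}
```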

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

Investigate and Document FleetShard Operator Patterns

Description

Based on a discussion in Zulip, there are numerous patterns in use for the RHOSAK fleetshard operator which would be useful for other managed services teams looking to build their own.

Some examples:

Cluster status reporting (readiness, capacity, utilisation) to the fleet manager
Reporting service instance status

It is likely that this information is captured in ADRs and other documents. Ideally, it would be made available as links or direct documentation in ffm-fleet-manager-go-template, for now at least. If there is a separate fleetshard template or SDK in the future, it could be moved at that time.

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

First spike of the API for terraforming

From the template code, extract a v0.0.0 of the API to have something concrete to discuss across teams. It will not be the final API, but it helps delineate the bounded contexts and thus our services.

DNS Management Service

Context

DNS management is a common concern for all fleet managers to ensure a consistent format of the service URL exposed for all separate managed service instances. A shared service will also reduce the shared code across managed services.

User Narrative

Samantha is a development lead working for Corporate Inc. She is building a new managed service and needs to follow the company guidelines on URL naming for managed service instance URLs. Samantha onboards with the DNS management service and integrates the new managed service to it, ensuring that there is a record created in the company DNS service Route53 for each new service instance.

Liam is a developer working for Customer Inc and has started using 3 different managed services from Corporate Inc. Liam has noticed that the same domain name is used for each of the managed services, with a different subdomain for each one. This strengthens Liam's professional opinion of Corporate Inc.

Job Stories

As a managed service developer, I want clear onboarding documentation for the DNS management service, so that I know what information is required from my team.
As a managed service developer, I want clear authentication and authorisation documentation for the DNS management service, so that I know what prerequisites I need to complete.
As a managed service developer, I want API documentation for the DNS management service, so that I know how to integrate with the service.

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

Publish the v0 of the template that's using SyncSet to deploy the service

Description

Most teams creating a new service won't have the fleetshard components in the early days. To enable them to deploy an initial version of their managed service that works end to end for a first round of testing, they may need to deploy the CRs directly using the trusted data path approach with SyncSet. It might be useful to have this in our template as the v0 of the template.

This stemmed from the bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template#22 (comment)

Analysis

The template could be created from the existing https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template template with the fleetshard parts stripped out.
In the provisioning_mgr.go reconciler, we could push the CR of a managed service via SyncSet.
In the deleting_mgr.go reconciler, we could delete the SyncSet created in the previous step.

Task List

Dynamic scaling

Description

Add documentation/link to dynamic scaling architectural pattern bf2fc6cc711aee1a0c2a/architecture#58

Add keycloak container setup for CI and local development

This is more or less a backport of bf2fc6cc711aee1a0c2a/kas-fleet-manager#840

The realm name in that PR will change to match the ones in https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template/blob/main/internal/dinosaur/internal/presenters/dinosaur.go#L23

Update the golang template automated tests documentation

The https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template/blob/main/docs/automated-testing.md#adding-new-tests section contains references to a method that does not exist anymore.
RegisterIntegration() has been replaced with other setup functions like NewDinosaurHelperWithHooks(..). Let's update the documentation to mention those instead of the old ones.

Replace "TODO" strings in the template with external variables

Description

A decent portion of the TODOs in the code are static strings: service name, API path, etc.
One way to make a template user's life easier and reduce the amount of non-rebasable code would be to externalize these strings and read them from an external config file, environment variable, and/or config map. That would make life easier for consumers and might not make the template much more complicated.

Analysis

See the description: a decent percentage of the TODO grunt work could be simplified and centralized into an external config file.
How doable this is in Go remains to be analyzed, as I am less than a novice in it.
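
As a rough indication of feasibility, here is a minimal sketch of reading such values from environment variables with defaults, using only the standard library. The TemplateSettings type, the FFM_* variable names, and the default values are hypothetical placeholders, not names used by the template.

```go
package main

import (
	"fmt"
	"os"
)

// TemplateSettings holds values that are static TODO strings today.
// Field names and environment variable names are hypothetical.
type TemplateSettings struct {
	ServiceName string
	APIBasePath string
}

// LoadSettings reads each value through lookup and falls back to a
// default when the variable is unset or empty. Passing the lookup
// function in keeps the loader independent of the process environment.
func LoadSettings(lookup func(string) (string, bool)) TemplateSettings {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	return TemplateSettings{
		ServiceName: get("FFM_SERVICE_NAME", "dinosaur"),
		APIBasePath: get("FFM_API_BASE_PATH", "/api/dinosaur"),
	}
}

func main() {
	s := LoadSettings(os.LookupEnv)
	fmt.Printf("service=%s basePath=%s\n", s.ServiceName, s.APIBasePath)
}
```

The same loader shape would work for a config file or a mounted config map by swapping the lookup function.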

Task List

Authorization for Fleet Managers

Description

When working on phase 2 (pre-flight checks) of the Factorized Fleet Manager initiative, it was identified that pre-flight checks were part of a bigger functionality area: Authorization for fleet managers. This could be considered an area that belongs to phase 3 of the Factorized Fleet Manager initiative.

This issue is about defining, extracting and implementing an Authorization mechanism/service for Fleet Manager.

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

User Facing Metrics Service

Context

Red Hat's managed services should have a consistent user experience for exposing customers' service instance metrics. These should be provided in both JSON and Prometheus text formats. A new user-facing metrics service that exposes a subset of managed service instance metrics will allow customers to monitor their own managed service instances. It will also allow UIs to display instance metrics in user-friendly dashboards.

User Narrative

Peter is the team lead for Innovation Inc, which has purchased several Red Hat managed services including Red Hat OpenShift Streams for Apache Kafka (RHOSAK). Peter is using RHOSAK as part of their own internal system and would like to monitor the Kafka instance and include the metrics in their own internal system dashboards. Thankfully, Peter can easily do this by scraping their Kafka instance metrics from the Red Hat user-facing metrics API directly into their company's Prometheus instance.

Steve from Innovation Inc is also interested in viewing his RHOSAK instance metrics. He is happy to view these from his Kafka instance page on console.redhat.com, where he can see several widgets displaying his instance's metrics. Curious about where these come from, he can see that they are retrieved from the query and query_range endpoints of the Red Hat user-facing metrics API.

Job Stories

As a managed service developer, I want onboarding documentation for the user-facing metrics service, so that I know how I can expose a subset of my tenant metrics that are user-facing
As a managed service developer, I want my user-facing metrics retrieved from Observatorium, so that I can send metrics to a single location
As a managed service developer, I want the authorisation configuration documented, so that I can define which users or organisations are allowed to retrieve their metrics
As an end-user, I want to retrieve metrics in a Prometheus text format, so that I can scrape my service's metrics to my own Prometheus instance
As an end-user, I want to retrieve metrics using a query or query_range endpoint, so that I can accurately read the metrics
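
To make the Prometheus text format story concrete, here is a minimal sketch that renders one gauge sample in the text exposition format (HELP line, TYPE line, then the sample with sorted labels). The Sample type and the metric name are hypothetical. Label escaping relies on Go's %q verb, which matches the Prometheus escaping rules for backslash, double quote, and newline but is a simplification for other special characters. A real service would more likely use an existing Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one gauge value with labels; a hypothetical type.
type Sample struct {
	Name   string
	Help   string
	Labels map[string]string
	Value  float64
}

// FormatGauge renders a sample in the Prometheus text exposition format.
func FormatGauge(s Sample) string {
	// Sort label names so the output is stable across calls.
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.Labels[k]))
	}

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", s.Name, s.Help)
	fmt.Fprintf(&b, "# TYPE %s gauge\n", s.Name)
	if len(pairs) > 0 {
		fmt.Fprintf(&b, "%s{%s} %g\n", s.Name, strings.Join(pairs, ","), s.Value)
	} else {
		fmt.Fprintf(&b, "%s %g\n", s.Name, s.Value)
	}
	return b.String()
}

func main() {
	fmt.Print(FormatGauge(Sample{
		Name:   "example_instance_partitions",
		Help:   "Number of partitions.",
		Labels: map[string]string{"topic": "orders", "instance_id": "abc"},
		Value:  12,
	}))
}
```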

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

backport makefile and cleanup changes from kas-fleet-manager

Description

Backport a few recent and worthwhile changes of the Makefile from the https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager repository

Analysis

Task List

backport bumping to golang 1.19
backport removal of go-bindata
backport removal of GO111MODULE
backport the addition of a .dockerignore file
backport update of several tooling dependencies in the Makefile
etc

Spiking Java fleet manager template as reactive with executor pool for Operator SDK usages

To build the Java factorized fleet manager, we are considering a modern Quarkus approach with the default reactive stack. One of the reasons for the reactive stack is for me to be confident that adding fleet managers will not be resource intensive. We already aim to deploy at least 3 instances (one per AZ). With the idea of deploying tens of services on this technology, we should aim at reducing resource usage.

Today the fleet manager is not an operator, but with Kubernetes Control Plane being a likely target API server in the future, it is very likely that some interaction with the Kube API will be part of the fleet manager's responsibility.

Spike a model where the fleet manager is reactive but uses the executor pool for Operator SDK tasks, and see how simple or complicated that model is.

Create the roadmap view documentation

At the moment we do not have documents on how to create an epic or how to assign an epic to a sprint and milestone.
This issue is about documenting the process. The process, and hence the document, will likely evolve over time, but it is good to capture the initial process so that created epics contain the info needed.

Document/Show a pattern on how to store sensitive data in the database

Description

Some fleet managers may need to store sensitive data. The current template is missing a pattern of how this can be done.
Even though encryption at rest can be turned on for some database vendors, it is good to add another layer of protection at the application level. Some ideas that are currently being discussed:

encrypt and decrypt data using the database's encryption functions, e.g. pgcrypto
use a dedicated service like Vault to store sensitive data and only store the ref/vault key in the database, following the same pattern as https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/tree/main/internal/connector/internal/services/vault
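
The second idea can be sketched as follows: the secret goes to a dedicated store, and the database row keeps only the returned reference. Everything here is hypothetical, including the SecretStore interface, the InstanceRow type, and the in-memory store that stands in for Vault. It does not reproduce the kas-fleet-manager vault service linked above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// SecretStore abstracts a dedicated secret service such as Vault.
type SecretStore interface {
	Put(value string) (ref string, err error)
	Get(ref string) (string, error)
	Delete(ref string) error
}

// memoryStore is an in-memory stand-in used only for this sketch.
type memoryStore struct{ data map[string]string }

func newMemoryStore() *memoryStore {
	return &memoryStore{data: map[string]string{}}
}

// Put stores the value under a random reference and returns the reference.
func (m *memoryStore) Put(value string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	ref := hex.EncodeToString(buf)
	m.data[ref] = value
	return ref, nil
}

func (m *memoryStore) Get(ref string) (string, error) {
	v, ok := m.data[ref]
	if !ok {
		return "", errors.New("secret not found")
	}
	return v, nil
}

func (m *memoryStore) Delete(ref string) error {
	delete(m.data, ref)
	return nil
}

// InstanceRow is what would be persisted in the database: the reference
// to the secret, never the secret itself.
type InstanceRow struct {
	ID          string
	PasswordRef string
}

// SaveInstance writes the secret to the store and builds the row to persist.
func SaveInstance(store SecretStore, id, password string) (InstanceRow, error) {
	ref, err := store.Put(password)
	if err != nil {
		return InstanceRow{}, err
	}
	return InstanceRow{ID: id, PasswordRef: ref}, nil
}

func main() {
	store := newMemoryStore()
	row, err := SaveInstance(store, "instance-1", "s3cr3t")
	if err != nil {
		panic(err)
	}
	secret, _ := store.Get(row.PasswordRef)
	fmt.Println(row.PasswordRef != "s3cr3t", secret == "s3cr3t")
}
```

Keeping the store behind an interface would let the same service code run against a real secret service in production and a stand-in in tests.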

Analysis

We need to investigate the best pattern through further analysis, covering:

perf overhead
some other protection mechanism
etc
