ffm-project's People

Contributors

davidffrench, machi1990

ffm-project's Issues

Skip all integration tests that are failing for now and add a note in the automated test guide

The guide is: https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template/blob/main/docs/automated-testing.md#integration-tests

It might catch some people by surprise to see the integration tests failing. The idea of having some integration tests was to be able to show how the setup phase is done. So it actually makes sense to skip the failing ones with a note on why they are failing and what will be needed to make them pass.
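One way to skip the failing tests while keeping the reason visible is a small helper that each integration test calls first. The sketch below is only an illustration: the `knownFailing` table, the test name and the reason text are hypothetical, and the `skipper` interface is just the subset of `*testing.T` (`Helper` and `Skipf`) that the helper needs.

```go
package main

import "fmt"

// skipper is the subset of *testing.T used by the helper, so the helper can
// also be exercised outside of "go test".
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// knownFailing maps a test name to why it is skipped and what is needed to
// make it pass. The entry below is a hypothetical example.
var knownFailing = map[string]string{
	"TestExampleIntegration": "requires external setup; see docs/automated-testing.md",
}

// SkipIfKnownFailing skips the calling test when it is listed in knownFailing.
func SkipIfKnownFailing(t skipper, name string) {
	t.Helper()
	if reason, ok := knownFailing[name]; ok {
		t.Skipf("skipping %s: %s", name, reason)
	}
}

// recorder is a stand-in for *testing.T, used only by main below.
type recorder struct{ skipped []string }

func (r *recorder) Helper() {}

func (r *recorder) Skipf(format string, args ...any) {
	r.skipped = append(r.skipped, fmt.Sprintf(format, args...))
}

func main() {
	r := &recorder{}
	SkipIfKnownFailing(r, "TestExampleIntegration") // listed, so it is skipped
	SkipIfKnownFailing(r, "TestNotListed")          // not listed, so it runs
	fmt.Println(r.skipped)
}
```

In a real test file the call would be `SkipIfKnownFailing(t, t.Name())` at the top of the test, so the skip reason shows up in the `go test -v` output.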

Cluster Registration Service

Context

A mechanism for registering an OpenShift or Kubernetes cluster that can be used to schedule workloads through AppStudio.

User Narrative

Steve wants to streamline his enterprise application lifecycle and operations across any footprint. Steve's company, PizzaPie.inc, has several clusters used for its various environments, including both managed and self-managed OpenShift clusters.

Steve registers PizzaPie.inc's existing clusters against various managed services, which enables his development team to start scheduling services to the existing clusters.

Job Stories

  • As a cloud administrator, I want to register my existing self-managed or managed OpenShift clusters, so that my development teams can integrate with existing managed services
  • As a cloud administrator, I want to deregister a cluster, so that it can no longer be used with other managed services
  • As a cloud administrator, I want to retrieve a list of my registered clusters, so that I can review the registered clusters on demand
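The three job stories above map onto register, deregister and list operations. The following is a minimal in-memory sketch of that shape, not a proposed API: the `Cluster` fields, the `Registry` type and the error values are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Cluster is a hypothetical record for a registered cluster.
type Cluster struct {
	ID      string
	APIURL  string
	Managed bool // true for managed OpenShift, false for self-managed
}

var (
	ErrAlreadyRegistered = errors.New("cluster already registered")
	ErrNotFound          = errors.New("cluster not registered")
)

// Registry keeps registered clusters in memory, keyed by ID.
type Registry struct {
	mu       sync.RWMutex
	clusters map[string]Cluster
}

func NewRegistry() *Registry {
	return &Registry{clusters: map[string]Cluster{}}
}

// Register adds a cluster and rejects duplicate IDs.
func (r *Registry) Register(c Cluster) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.clusters[c.ID]; ok {
		return ErrAlreadyRegistered
	}
	r.clusters[c.ID] = c
	return nil
}

// Deregister removes a cluster so it can no longer be used.
func (r *Registry) Deregister(id string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.clusters[id]; !ok {
		return ErrNotFound
	}
	delete(r.clusters, id)
	return nil
}

// List returns all registered clusters sorted by ID.
func (r *Registry) List() []Cluster {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]Cluster, 0, len(r.clusters))
	for _, c := range r.clusters {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	r := NewRegistry()
	_ = r.Register(Cluster{ID: "prod", APIURL: "https://api.prod.example.com", Managed: true})
	_ = r.Register(Cluster{ID: "dev", APIURL: "https://api.dev.example.com"})
	fmt.Println(len(r.List()))
	_ = r.Deregister("dev")
	fmt.Println(len(r.List()))
}
```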

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

  • [ ]

API Definition for new Terraforming/Infrastructure/Fleet Management service

Context

Terraforming and infrastructure management is a common concern for all fleet managers that use a bin packed data plane deployment topology. A shared service would ensure consistency and reliability with data plane cluster fleet management and scaling.

The high-level idea for the API is to achieve:

  • A Policy endpoint that lets API users define the structure of their managed service data plane cluster (e.g. what terraforming should include: addons, resources, etc.)
  • A Capacity Pool endpoint that lets API users define restrictions on their capacity pool, such as the minimum and maximum number of clusters

The ideas above are rough sketches of how this might work. As spikes are completed, this Epic should become more concrete.

User Narrative

Joe, a development lead from ProductA, has been tasked with turning it into a managed service running on a managed OpenShift offering. Joe knows that this new managed service will colocate service instances from separate customers on the same managed OpenShift cluster.

Joe hears about the Infrastructure fleet manager, which suits this bin packed deployment topology perfectly. Joe creates a blueprint for his managed service and instantiates a fleet of clusters created using his blueprint in a specific cloud provider and region.

Job Stories

  • As a SaaS developer, I want to define the blueprint for my SaaS's data plane cluster, so the data plane fleet can be created in a consistent and reliable way.
  • As a SaaS developer, I would prefer the blueprint to be cloud provider agnostic, so that I only have to define one blueprint for my data plane.
  • As a SaaS developer, I want to define machinepools and node sizes within my blueprint, so that I can provision services on different node sizes.
  • As a SaaS developer, I want an observability stack and my custom observability resources included in each data plane cluster, so that the service instances can be managed by an SRE team.
  • As a SaaS developer, I want the ability to create a pool of my SaaS's data plane clusters from a defined blueprint version, so that I can start provisioning service instances.
  • As a SaaS developer, I want the ability to update the blueprint for my SaaS's data plane cluster with versioning, so that I can update the blueprint used for cluster pools independently.
  • As a SaaS developer, I want to retrieve the clusters from my SaaS's data plane cluster pools, so that I can choose where to provision service instances.
  • When a given data plane cluster's available compute resources have fallen below a threshold, I want the number of nodes scaled in the cluster, so I can continue to provision service instances.
  • When a given data plane cluster has reached my defined maximum capacity, I want a new data plane cluster added to the cluster pool, so I can continue to provision service instances.
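The last two job stories describe a two-step scaling rule: add nodes while the cluster has headroom, then add a cluster to the pool. A rough sketch of that decision is shown below. All type names, fields and thresholds are assumptions made for illustration, and the `PoolFull` outcome is an addition for the case the job stories do not cover (pool already at its maximum).

```go
package main

import "fmt"

// Action is the scaling step to take for a data plane cluster.
type Action string

const (
	NoAction   Action = "none"
	ScaleNodes Action = "scale-nodes"
	AddCluster Action = "add-cluster"
	PoolFull   Action = "pool-full" // assumption: nothing left to scale
)

// ClusterState is a hypothetical snapshot of one data plane cluster.
type ClusterState struct {
	AvailableCompute float64 // fraction of compute still free, 0.0 to 1.0
	Nodes            int
	MaxNodes         int // defined maximum capacity of the cluster
}

// PoolLimits is a hypothetical capacity pool restriction.
type PoolLimits struct {
	MaxClusters         int
	MinAvailableCompute float64 // threshold below which scaling is triggered
}

// Decide returns the scaling action for one cluster in a pool.
func Decide(c ClusterState, clustersInPool int, l PoolLimits) Action {
	if c.AvailableCompute >= l.MinAvailableCompute {
		return NoAction // enough free compute, nothing to do
	}
	if c.Nodes < c.MaxNodes {
		return ScaleNodes // the cluster can still grow
	}
	if clustersInPool < l.MaxClusters {
		return AddCluster // cluster is at maximum capacity, grow the pool
	}
	return PoolFull
}

func main() {
	limits := PoolLimits{MaxClusters: 2, MinAvailableCompute: 0.2}
	fmt.Println(Decide(ClusterState{AvailableCompute: 0.5, Nodes: 3, MaxNodes: 6}, 1, limits))
	fmt.Println(Decide(ClusterState{AvailableCompute: 0.1, Nodes: 3, MaxNodes: 6}, 1, limits))
	fmt.Println(Decide(ClusterState{AvailableCompute: 0.1, Nodes: 6, MaxNodes: 6}, 1, limits))
}
```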

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

Investigate and Document FleetShard Operator Patterns

Description

Based on a discussion in Zulip, there are numerous patterns in use for the RHOSAK fleetshard operator which would be useful for other managed services teams looking to build their own.

Some examples:

  • Cluster status reporting (readiness, capacity, utilisation) to the fleet manager
  • Reporting service instance status

This information is likely captured in ADRs and other documents. Ideally it would be made available as links or direct documentation in ffm-fleet-manager-go-template, for now at least. If a separate fleetshard template or SDK exists in the future, it could be moved there at that time.

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

First spike of the API for terraforming

From the template code, extract a v0.0.0 of the API to have something concrete to discuss across teams. It will not be the final API, but it helps delineate the bounded contexts and thus our services.

DNS Management Service

Context

DNS management is a common concern for all fleet managers to ensure a consistent format of the service URL exposed for all separate managed service instances. A shared service will also reduce the shared code across managed services.

User Narrative

Samantha is a development lead working for Corporate Inc. She is building a new managed service and needs to follow the company guidelines on URL naming for managed service instance URLs. Samantha onboards with the DNS management service and integrates the new managed service to it, ensuring that there is a record created in the company DNS service Route53 for each new service instance.

Liam is a developer working for Customer Inc and has started using 3 different managed services from Corporate Inc. Liam has noticed that the same domain name is used for each of the managed services, with a different subdomain for each one. This strengthens Liam's professional opinion of Corporate Inc.

Job Stories

  • As a managed service developer, I want clear onboarding documentation for the DNS management service, so that I know what information is required from my team.
  • As a managed service developer, I want clear authentication and authorisation documentation for the DNS management service, so that I know what prerequisites I need to complete.
  • As a managed service developer, I want API documentation for the DNS management service, so that I know how to integrate with the service.

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

  • [ ]

Publish the v0 of the template that's using SyncSet to deploy the service

Description

Most teams creating a new service won't have the fleetshard components in the early days. To enable them to deploy an initial version of their managed service that works end to end for a first round of testing, they may need to deploy the CRs directly using the trusted data path approach with SyncSet. It might be useful to have this in our template as the v0 of the template.

This stemmed from a comment on bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template#22.

Analysis

  • The template could be created from the existing https://github.com/bf2fc6cc711aee1a0c2a/ffm-fleet-manager-go-template template with the fleetshard parts stripped out.
  • In the provisioning_mgr.go reconciler, we could push the CR of a managed service via SyncSet.
  • In the deleting_mgr.go reconciler, we could delete the SyncSet created in the previous step.
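The two reconciler steps can be sketched as a create and delete pair keyed by a deterministic SyncSet name. This sketch deliberately does not use the real SyncSet API types: `SyncSetClient`, `Instance`, `syncSetName` and the in-memory `fakeClient` are hypothetical stand-ins, and the actual reconcilers in the template may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// SyncSetClient is a hypothetical, simplified stand-in for the client the
// fleet manager would use to manage SyncSets. It is not the real API.
type SyncSetClient interface {
	Create(clusterID, name string, manifests []string) error
	Delete(clusterID, name string) error
}

// Instance is a hypothetical managed service instance.
type Instance struct {
	ID        string
	ClusterID string
	CR        string // serialized custom resource for this instance
}

// syncSetName derives a stable name so the delete step can find the SyncSet
// created by the provisioning step.
func syncSetName(instanceID string) string {
	return "managed-service-" + instanceID
}

// Provision mirrors the provisioning reconciler step: push the CR via SyncSet.
func Provision(c SyncSetClient, in Instance) error {
	if in.CR == "" {
		return errors.New("instance has no CR to deploy")
	}
	return c.Create(in.ClusterID, syncSetName(in.ID), []string{in.CR})
}

// Deprovision mirrors the deleting reconciler step: remove that SyncSet.
func Deprovision(c SyncSetClient, in Instance) error {
	return c.Delete(in.ClusterID, syncSetName(in.ID))
}

// fakeClient keeps SyncSets in memory for the demonstration below.
type fakeClient struct{ sets map[string][]string }

func (f *fakeClient) Create(clusterID, name string, manifests []string) error {
	f.sets[clusterID+"/"+name] = manifests
	return nil
}

func (f *fakeClient) Delete(clusterID, name string) error {
	delete(f.sets, clusterID+"/"+name)
	return nil
}

func main() {
	c := &fakeClient{sets: map[string][]string{}}
	in := Instance{ID: "abc", ClusterID: "cluster-1", CR: "kind: Example"}
	fmt.Println(Provision(c, in), len(c.sets))
	fmt.Println(Deprovision(c, in), len(c.sets))
}
```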

Task List

  • [ ]

Replace "TODO" strings in the template with external variables

Description

A decent portion of the TODOs in the code are static strings: service name, API path, etc.
One way to make life easier for template users and reduce the amount of non-rebasable code would be to externalize these strings and read them from an external config file, environment variables, and/or a config map. That should not make the template much more complicated.

Analysis

See the description: a decent percentage of the TODO grunt work could be simplified and centralized into an external config file.
How doable this is in Go remains to be analyzed, as I am less than a novice in it.
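For the environment variable option, Go's standard library is enough. The sketch below is one possible shape, under stated assumptions: the `Config` fields, the variable names `FLEET_SERVICE_NAME` and `FLEET_API_PATH`, and the default values are all hypothetical. The lookup function is passed in so that `os.LookupEnv` can be swapped out in tests.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds the strings that are currently hard-coded TODOs.
// The fields are hypothetical examples.
type Config struct {
	ServiceName string
	APIPath     string
}

// LookupFunc has the same signature as os.LookupEnv.
type LookupFunc func(key string) (string, bool)

// valueOr returns the looked-up value, or fallback when it is unset or empty.
func valueOr(lookup LookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// LoadConfig reads each setting from the environment with a default.
func LoadConfig(lookup LookupFunc) Config {
	return Config{
		ServiceName: valueOr(lookup, "FLEET_SERVICE_NAME", "example-service"),
		APIPath:     valueOr(lookup, "FLEET_API_PATH", "/api/example/v1"),
	}
}

func main() {
	cfg := LoadConfig(os.LookupEnv)
	fmt.Printf("service=%s path=%s\n", cfg.ServiceName, cfg.APIPath)
}
```

A template user would then set the variables in their deployment (for example from a config map) instead of editing the Go source.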

Task List

Authorization for Fleet Managers

Description

When working on phase 2 (pre-flight checks) of the Factorized Fleet Manager initiative, it was identified that pre-flight checks were part of a bigger functionality area: Authorization for fleet managers. This could be considered an area that belongs to phase 3 of the Factorized Fleet Manager initiative.

This issue is about defining, extracting and implementing an Authorization mechanism/service for Fleet Managers.

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

User Facing Metrics Service

Context

Red Hat's managed services should have a consistent user experience for exposing customers' service instance metrics. This should be provided in both a JSON and a Prometheus text format. Creating a new user-facing metrics service that exposes a subset of managed service instance metrics will allow customers to monitor their own managed service instances. It will also allow UIs to display instance metrics in user-friendly dashboards.

User Narrative

Peter is the team lead for Innovation Inc, which has purchased several Red Hat managed services including Red Hat OpenShift Streams for Apache Kafka (RHOSAK). Peter is using RHOSAK as part of their own internal system and would like to monitor the Kafka instance and include the metrics in their own internal system dashboards. Thankfully, Peter can easily do this by scraping the Kafka instance metrics from the Red Hat user-facing metrics API directly into his company's Prometheus instance.

Steve from Innovation Inc is also interested in viewing his RHOSAK instance metrics. He is happy to view these from his Kafka instance page on console.redhat.com, where several widgets display his instance's metrics. Curious about where these are coming from, he can see they are retrieved from the /query and /query_range endpoints of the Red Hat user-facing metrics API.

Job Stories

  • As a managed service developer, I want onboarding documentation for the user-facing metrics service, so that I know how I can expose a subset of my tenant metrics that are user-facing
  • As a managed service developer, I want my user-facing metrics retrieved from Observatorium, so that I can send metrics to a single location
  • As a managed service developer, I want the authorisation configuration documented, so that I can define which users or organisations are allowed to retrieve their metrics
  • As an end-user, I want to retrieve metrics in a Prometheus text format, so that I can scrape my service's metrics to my own Prometheus instance
  • As an end-user, I want to retrieve metrics using a query or query_range endpoint, so that I can accurately read the metrics
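To make the Prometheus text format story concrete, here is a small sketch that renders one gauge sample as `# HELP`, `# TYPE` and a sample line. The metric name, labels and the `Sample` type are hypothetical. Label values are quoted with Go's `%q`, which is a simplification of the format's escaping rules; a real service would more likely rely on a Prometheus client library than hand-roll this.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical single gauge value with labels.
type Sample struct {
	Name   string
	Help   string
	Labels map[string]string
	Value  float64
}

// Render writes the sample in the Prometheus text format:
// a HELP line, a TYPE line, then name{labels} value.
func Render(s Sample) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", s.Name, s.Help)
	fmt.Fprintf(&b, "# TYPE %s gauge\n", s.Name)
	b.WriteString(s.Name)
	if len(s.Labels) > 0 {
		// Sort label names so the output is deterministic.
		keys := make([]string, 0, len(s.Labels))
		for k := range s.Labels {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		pairs := make([]string, 0, len(keys))
		for _, k := range keys {
			pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.Labels[k]))
		}
		b.WriteString("{" + strings.Join(pairs, ",") + "}")
	}
	fmt.Fprintf(&b, " %g\n", s.Value)
	return b.String()
}

func main() {
	fmt.Print(Render(Sample{
		Name:   "example_partitions",
		Help:   "Number of partitions.",
		Labels: map[string]string{"instance_id": "abc"},
		Value:  42,
	}))
}
```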

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

  • [ ]

Spiking Java fleet manager template as reactive with executor pool for Operator SDK usages

To build the Java factorized fleet manager, we are considering a modern Quarkus approach with the default reactive stack. One of the reasons for the reactive stack is that I want to be confident that adding fleet managers will not be resource intensive. We already aim to deploy a minimum of 3 instances (one per AZ). With the idea of deploying tens of services on this technology, we should aim at resource reduction.

Today the fleet manager is not an operator, but with Kubernetes Control Plane being a likely target API server in the future, it is very likely that some interaction with the Kube API will be part of the fleet manager's responsibility.

The spike is to try a model where the fleet manager is reactive but uses the executor pool for Operator SDK tasks, and to see how simple or complicated that model is.

Create the roadmap view documentation

At the moment we do not have documents on how to create an epic or how to assign an epic to a sprint and milestone.
This issue is about documenting the process. The process, and hence the document, will likely evolve over time, but it is good to capture the initial process so that newly created epics contain the info needed.

Document/Show a pattern on how to store sensitive data in the database

Description

Some fleet managers may need to store sensitive data. The current template is missing a pattern for how this can be done.
Even though encryption at rest can be turned on for some database vendors, it is good to add another layer of protection at the application level. Some ideas that are currently being discussed:

Analysis

We need to find the best pattern through further analysis:

  • perf overhead
  • some other protection mechanism
  • etc
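As one candidate for the application-level layer, a value can be encrypted with AES-GCM before it is written to a column and decrypted after it is read. This is a sketch of that technique using Go's standard library, not a decision on the pattern: the `Encrypt` and `Decrypt` helpers are hypothetical, and key management (where the key lives, rotation) is out of scope here and would be part of the analysis.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16, 24 or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns nonce || ciphertext, ready to store in a binary column.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	// A fresh random nonce is required for every encryption with the same key.
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt and fails if the data was modified.
func Decrypt(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(data) < n {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, data[:n], data[n:], nil)
}

func main() {
	key := make([]byte, 32) // in practice loaded from a secret, not generated here
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	stored, err := Encrypt(key, []byte("client-secret"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(key, stored)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

The per-value cost is one AEAD operation on write and one on read, which is the perf overhead the analysis would need to measure.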

Task List
