openshift / enhancements Goto Github PK

View Code? Open in Web Editor NEW

179.0 64.0 463.0 27.21 MB

Enhancements tracking repository for OKD

License: Apache License 2.0

Go 89.37% Makefile 3.12% Shell 7.51%

enhancements's Introduction

Enhancements Tracking and Backlog

Enhancement tracking repository for OpenShift, including OKD and OCP.

Inspired by the Kubernetes enhancement process.

This repository provides a rally point to discuss, debate, and reach consensus for how OpenShift enhancements are introduced. OpenShift combines Kubernetes container orchestration services with a broad set of ecosystem components in order to provide an enterprise ready Kubernetes distribution built for extension. OpenShift assembles innovation across a wide array of repositories and upstream communities. Given the breadth of the distribution, it is useful to have a centralized place to describe OpenShift enhancements via an actionable design proposal.

Enhancements may take multiple releases to ultimately complete and thus provide the basis of a community roadmap. Enhancements may be filed from anyone in the community, but require consensus from domain specific project maintainers in order to implement and accept into the release.

For an overview of the whole project, see the roadmap.

For a quick-start, FAQ, and template references, see the guidelines.

Why are Enhancements Tracked?

As the project evolves, its important that the OpenShift community understands how we build, test, and document our work. Individually it is hard to understand how all parts of the system interact, but as a community we can lean on each other to build the right design and approach before getting too deep into an implementation.

Is My Thing an Enhancement?

A rough heuristic for an enhancement is anything that:

impacts how a cluster is operated including addition or removal of significant capabilities
impacts upgrade/downgrade
needs significant effort to complete
requires consensus/code across multiple domains/repositories
proposes adding a new user-facing component
has phases of maturity (Dev Preview, Tech Preview, GA)
demands formal documentation to utilize

It is unlikely to require an enhancement if it:

fixes a bug
adds more testing
internally refactors a code or component only visible to that components domain
minimal impact to distribution as a whole

If you are not sure if the proposed work requires an enhancement, file an issue and ask!

When to Create a New Enhancement

Enhancements should be related to work to be implemented in the near future. If you have an idea, but aren't planning to implement it right away, the conversation should start somewhere else like the mailing list or Slack.

Create an enhancement here once you:

have circulated your idea to see if there is interest
(optionally) have done a prototype in your own fork
have identified people who agree to work on and maintain the enhancement
- many enhancements will take several releases to complete

Although you should probably not start your new idea's journey by writing an enhancement up front, it's worth perusing the enhancement template to understand the kinds of details that will ultimately be required, so you can keep them in mind as you explore your new idea.

How are Enhancements Reviewed and Approved?

The author of an enhancement is responsible for managing it through the review and approval process, including soliciting feedback on the pull request and in meetings, if necessary.

Each enhancement should have at least one "approver" and several reviewers designated in the header of the document.

The approver assists authors who may not be familiar with the process, the project, or the maintainers. They may provide advice about who should review a specific proposal and point out deadlines or other time-based criteria for completing work. The approver is responsible for recognizing when consensus among reviewers has been reached so that a proposal is ready to be approved, or formally rejected. In cases where consensus is not emerging on its own, the approver may also step in as a mediator. The approver does not need to be a subject-matter expert for the subject of the design, although it can help if they are.

Choosing the appropriate approver depends on the scope of an enhancement. If it is limited in scope to a given team or component, then a peer or lead on that team or pillar is appropriate. If an enhancement captures something more broad in scope, then a member of the OpenShift staff engineers team or someone they delegate would be appropriate. Examples of broad scope are proposals that change the definition of OpenShift in some way, add a new required dependency, or change the way customers are supported. Use your best judgement to determine the level of approval needed. If you’re not sure, ask a staff engineer to help find a good approver by posting in #forum-arch on the RedHat Slack server and tagging @aos-staff-engineers. If you are external to RedHat, you can use the #openshift-users forum on the kubernetes.slack.com instance.

The set of reviewers for an enhancement proposal can be anyone that has an interest in this work or the expertise to provide a useful input/assessment. At a minimum, the reviewers must include a representative of any team that will need to do work for this EP, or whose team will own/support the resulting implementation. Be mindful of the workload of reviewers, however, and the challenge of finding consensus as the group of reviewers grows larger. Clearly indicating what aspect of the EP you expect each reviewer to be concerned with will allow them to focus their reviews.

How Can an Author Help Speed Up the Review Process?

Enhancements should have agreement from all stakeholders prior to being approved and merged. Reviews are not time-boxed (see Life-cycle below). We manage the rate of churn in OpenShift by asking component maintainers to act as reviewers in addition to everything else that they do. If it is not possible to attract the attention of enough of the right maintainers to act as reviewers, that is a signal that the project's rate of change is maxed out. With that said, there are a few things that authors can do to help keep the conversation moving along.

Respond to comments quickly, so that a reviewer can tell you are engaged.
Push update patches, rather than force-pushing a replacement, to make it easier for reviewers to see what you have changed. Use descriptive commit messages on those updates, or plan to use /label tide/merge-method-squash to have them squashed when the pull request merges.
Do not rely solely on the enhancement for visibility of the proposal. For high priority work, or if the conversation stalls out, you can start a thread in #forum-arch on the CoreOS Slack server or bring the enhancement to one of the weekly architecture review meetings for discussion. If you aren't sure which meeting to use, work with a staff engineer to find a good fit.
If the conversation otherwise seems stuck, pinging reviewers on Slack can be used to remind them to look at updates. It's generally appropriate to give people at least a business day or two to respond in the GitHub thread first, before reaching out to them directly on Slack, so that they can manage their work queue and disruptions.

Using Labels

The following labels may be applied to enhancements to help categorize them:

priority/important-soon indicates that the enhancement is related to a top level release priority. These will be highlighted in the this-week newsletters.

Life-cycle

Pull requests to this repository should be short-lived and merged as soon as there is consensus. Therefore, the normal life-cycle timeouts are shorter than for most of our code repositories.

Pull requests being actively discussed will stay open indefinitely. Inactive pull requests will automatically have the life-cycle/stale label applied after 28 days. Removing the life-cycle label will reset the clock. After 7 days, stale pull requests are updated to life-cycle/rotten. After another 7 days, rotten pull requests are closed.

Ideally pull requests with enhancement proposals will be merged before significant coding work begins, since this avoids having to rework the implementation if the design changes as well as arguing in favor of accepting a design simply because it is already implemented.

Template Updates

From time to time the template for enhancement proposals is modified as we refine our processes. When that happens, open pull requests may start failing the linter job that ensures that all documents include the required sections.

If you are working on an enhancement and the linter job fails because of changes to the template (not other issues with the markdown formatting), handle it based on the maturity of the enhancement pull request:

If the only reason to update your pull request is to make the linter job accept it after a template change and there are no substantive content changes needed for approval, override the job to allow the pull request to merge.
If your enhancement is still a draft, and consensus hasn't been reached, modify the pull request so the new enhancement matches the updated template.
If you are updating an existing (merged) document, go ahead and override the job.

enhancements's People

Contributors

Stargazers

Watchers

Forkers

deads2k mfojtik hexfusion derekwaynecarr zshi-redhat aravindhp wking enxebre ericavonb redhat-nfvpe ironcladlou ecordell ravisantoshgudimetla inecas awgreene sanchezl hasbro17 jmrodri nmagnezi jcantrill danehans sallyom pkliczewski tomassedovic pecameron kevinrizza abhinavdahiya s-urbaniak tkashem squeed gabemontero enj benjaminapetersen huffmanca bertinatto damemi abhat larhauga soltysh sdodson adambkaplan sttts smarterclayton bparees rgolangh lilic joelsmith gyliu513 wewang58 iamemilio knobunc lorbuschris mrogers950 jhrozek dinhxuanvu jaormx sadasu marun florinpeter russellb yuqi-zhang csrwng fedosin patrickdillon chancez jstuever tisnik ewolinetz jhixson74 cgwalters miciah jcpowermac cybertron tlwu2013 mgugino-upstream-stage ondrejmular pierreprinetti gnufied jsafrane enarha jacobtanenbaum beekhof stlaz stykmartin schseba lukas-vlcek fabianvf miabbott clyang82 shawn-hurley sbose78 praveenkumar simonpasquier tnozicka l0rd arithx ingvagabund ashishranjan738 dulek joelspeed

enhancements's Issues

authentication/separate oauth-apiserver

image for the oauth-apiserver needs to be created
image for the oauth-apiserver needs to be in the payload (must be referenced from an operator)
authentication operation needs inspection to be sure that proper unioning of status works
oauth-apiserver needs to be manually put into a cluster using the resources that we'll later use in the operator.
oauth-apiserver needs to actually work (no one has checked yet) I think @polynomial could help here.
some controllers should be extracted into an apiserver-controllers package (@stlaz is doing this now)
various helpers and mutators for operator workload manifests need to become congruent and viable for re-use
the exact backoff mechanism for the "stop managing this apiservice" needs to be coded with a unit test synthetically producing an "upgrade, downgrade, upgrade" path. I think @polynomial would be a big help here.

Enhancement necessary for resource pool support on vSphere?

I've submitted a PR to address an existing issue. It adds a feature, but the scope is not huge. Resource pool support already exists in the in-tree vsphere driver and in the machine-api operator. The only change I've made is for the installer to collect a path to an optional existing resource pool and deliver that information to all of the correct places. There may, however, be implications for QE since the underlying resource pool support is probably not currently being tested with OCP.

It was suggested in a slack discussion that this might require an enhancement, or modification to the exinsting vsphere ipi enhancement.

I'm new to contributing in this project, so apologies for my unfamiliarity with the procedures.

Clarify Manila operator's behaviour when Manila is not available in Openstack

The design proposal for Manila operator does not clarify behaviour of operator when Manila is not available in Openstack install - #129

This needs to be clarified some more. cluster-storage-operator will try and sync the Manila operator and if Manila operator exits because Manila service is not available, it might try to resync. is Cluster-Storage-Operator going to downgrade its own status if Manila operator fails to come up?

arm64 installer files for OKD4

Hi,
since there is nw the possibility to install OpenShift 4 on AWS arm64 https://github.com/openshift/ocp-on-arm and there OpenShift installer files available at
https://mirror.openshift.com/pub/openshift-v4/aarch64/clients/ocp-dev-preview/pre-release/

Is there any plan to add the arm64 support also to OKD ?

Thanks

Describe purpose and issues with openshift.io/run-level label on namespaces

In e.g. openshift/machine-api-operator#496 we stopped labeling some namespaces with openshift.io/run-level: "1" because those components "should follow normal admission control rules for a cluster".

Some of the background on the purpose of this label and why it should generally be avoided is described in bz #1805488 and should be captured as a doc here.

stability: point-to-point network check prototype

Implement #289.

Define API
- openshift/api#639
Generate API client
- WIP: openshift/client-go#147
Implement
- WIP: openshift/cluster-kube-apiserver-operator#846
- network check tool
- controller
- controller

OpenShift installer on OpenStack does not provide the userTags field to add custom tags to created virtual machines

Since my bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=1868517) got closed for some reason... and since I've been asked to file an issue:

Description of problem:

Every OpenShift release including 4.5(.5).

Every Cloud Platform (AWS, Azure, GCP) has a customization field "userTags" - for example "platform.aws.userTags" for the install-config.yaml.

OpenStack does not - which makes it really hard to find the VMs that have been created as part of cluster.

Actual results:
Tried adding "platform.openstack.userTags" to the install-config.yaml. Nothing happened.

Expected results:
Tags set on the VMs in OpenStack

Support or information on more cluster configurations

Would like to find out with working examples or documentation if OKD4 supports below configurations:

Using a combination of cloud provider nodes e.g. AWS EC2 Instance and adding user-hosted on-prem local nodes still part of the same cluster?
Using a combination of on-prem nodes in location A and adding a node in location B still part of the same cluster?

Additional information, such as configuring heartbeat to account for introduced network latencies in the above example configurations.

Describe path for conversion of a CVO operator to run under OLM

This topic is being worked on and an enhancement will be posted soon

[report] ignore tool changes

There's no need to include changes to the report tool in the weekly summary.

Configurable Routes

design #577
auto-configure RBAC permissions
validator for user input consistency and validity
observer to watch a route and set status
observer for a stanza to copy to fixed location
observer to set route spec host
authentication observer to properly set serving certificate

[report] take author list from enhancement content

Instead of using the GitHub PR author ID, look at the IDs in the authors list inside the document. Fall back to the GitHub value if we can't parse the enhancement or if the change is not related to an enhancement.

Bare metal installation with multiple IP addresses

The OpenShift documentation describes how to configure bare metal nodes with multiple IP addresses; however, after FCOS installation, openshift seems to select an interface at seemingly random. It would be great if there was a way to tell OpenShift which interface the node should use for internal communication.

This would be extremely helpful in cases of hosts with both a private and a public network connection, where the public address is required for internet access and remote management while the private address is either a private network or a separate, high speed network.

etcd operator

https://github.com/openshift/enhancements/blob/master/enhancements/etcd/cluster-etcd-operator.md

Perf

problems bootstrapping cluster @alaypatel07

Metal

ipv4 works on 3/4 (asked in slack channel)
ipv6 works on 3/4 (asked in slack channel)

IPv6

azure is broken on pivot @deads2k
1. openshift/cluster-etcd-operator#173
azure bootstrap is broken @hexfusion
- openshift/cluster-etcd-operator#154
- openshift/cluster-etcd-operator#175
- stop using invalid bootstrapIP from installer openshift/installer#3175 @deads2k
- use correct bootstrapIP in static pods openshift/cluster-etcd-operator#200 @deads2k

bootstrapping

keep bootstrap kube-apiserver up @deads2k
1. openshift/cluster-etcd-operator#179
real kube-apiserver to only use real etcd @deads2k
1. openshift/cluster-kube-apiserver-operator#772

shutdown clusters don't start back up

need to bypass the member lookup when there is no cluster
- create a golang command so we can follow the logic. @deads2k openshift/etcd#28
- use golang command in our static pods @deads2k openshift/cluster-etcd-operator#185
- update golang command to have a memory. - this isn't needed because we can use presence of data-dir to skip. Thanks @alaypatel07 and @retroflexer
- openshift/cluster-etcd-operator#203 @alaypatel07

blocker bugs

port conflict @p0lyn0mial https://bugzilla.redhat.com/show_bug.cgi?id=1806579 - openshift/cluster-etcd-operator#226
this is 4.2 to 4.3. Not part of this feature. cert problem found by wking @sttts is finding someone
oc rsh doesn't auto-start with work etcdctl. this is needed for maintenance. - @sanchezl https://bugzilla.redhat.com/show_bug.cgi?id=1805981 openshift/cluster-etcd-operator#207
upgrade failures where etcd appears to be unavailable @sanchezl https://bugzilla.redhat.com/show_bug.cgi?id=1807194

Tests we think should work

All nodes being shut off at the same time and restarted. - tested by QE and appears to be working.
IP address change of a single member @hexfusion
debugging and detection when DNS information for one member is lost @sanchezl openshift/cluster-etcd-operator#225. It degrades nicely, but we don't like degrading during upgrade. May end up switching to IPs
Addition of a new member when there is significant etcd data. - verified by @alaypatel07 by adding a fourth member.
Upgrade, downgrade, re-upgrade @alaypatel07

Tests we think should fail

[report] add test code

The tool is complex enough, especially the text parsing and formatting, that having some test code would be useful.

Improve markdown linter; Optimize for human time vs brittle manual markdown formatting.

Hello 👋🏽

I like the fact that this project has a markdown linter that enforces some consistent styling. We do that in projects we maintain e.g Thanos (e.g here). However, I think there is some room to improve. As an author of the proposal or some markdown document, I want to focus on the content, not on formatting. Yet I literally spent ~2h in a total of my time on fixing my proposal #866 formatting (not mentioning context switches). I think that's not the smartest way to spend my paid work time, and I think there are alternatives that will minimize human time but enforce good styling which is aiming to improve maintenance and readability.

Let's enumerate why I had to spend so much time (and my PR at the time of writing is still failing #866):

Linter is giving me only snippets of errors, not all the errors (or I cannot use the output), this means I have to push and check if it passes to learn there is a bunch of errors in another place.
Almost 99% of errors I fixed was arguably useful. Things like:
- Line length does not matter these days. Most editors can wrap lines on certain width and markdown format does not care either.
- Why do spaces inside linked item matters? or code spans?
- Language in code spans having to be specified is also a bit too strict

Anyway, those are just opinions, but there is something that will satisfy everyone. If we lint on some problem, why not.. fixing this automatically?

Proposal

Introduce a proper markdown formatter that fixes all problems and produces formatted markdown or fails if it cannot fix it.
Run it in CI, if produced output by formatter is different then what was committed then fail. If formatter fails, fail.
Allow users easily install and run formatter without extra configuration

We did that in https://github.com/bwplotka/mdox if we want to use it here too. Just run mdox fmt of mdox fmt --check if we want to validate if things are formatted.

We use https://github.com/Kunde21/markdownfmt internally.

Hope that will help future designers and authors of markdown in this project (:

Allow Users to Create Custom Grafana Dashboards

Currently, there's no way for users to create or customize grafana dashboards in the cluster in respect to the default cluster monitoring stack. We let people execute arbitrary prometheus queries, but not create dashboards for those queries.

We should allow user to either:

Create additional dashboards in the default grafana instance, possibly in a namespaced area that prevents them from editing the default dashboards. Custom dashboards will not be offered support if they break on upgrade.
or allow users to deploy a secondary grafana instance pointing at the same prometheus to create these dashboards if 1 is infeasible.

It seems like 1 should be easily achieved: https://grafana.com/docs/grafana/latest/permissions/organization_roles/#editor-role

Just create a folder similar to 'user-added-dashboards' and give them editor permissions on that folder.

OKD machineset with "Highly Available" option in oVirt

Hey there,

is it possible to implement the option as described here:

https://www.ovirt.org/develop/ha-vms.html

so our OKD machines are starting automatically after maintenance or outage scenarios? As we are spawning machines quite dynamically, we actually have to remember setting this option manually every time we scale.

HELM 3 Catalog Enhancement - filter charts based on kubeVersion of cluster

Utilize kubeVersion field from helm index to filter charts which are NOT supported on current cluster.

Excerpt from helm index with 'kubeVersion:' :

apiVersion: v1
entries:
  chart-a:
  - apiVersion: v2
    appVersion: 10.0.0
    created: "2020-02-21T17:06:07.641831254-05:00"
    dependencies:
    - alias: sch
      name: ibm-sch
      repository: '@sch'
      version: 1.2.6
    description: Chart for deploying product a
    digest: 4a71450835d18c2bee431d15a10096ce085f2d024b49067e41ede31bb8893818
    keywords:
    - example
    kubeVersion: '>=1.11.0'
    name: chart-a
    urls:
    - chart-a-3.1.0.tgz
    version: 3.1.0

@sbose78 @siamaksade @pedjak

IPTables Operator

We are in the difficult situation where our server provider only allows limited configuration for their firewall - as things are right now, FCOS also doesn't seem to allow custom IPTables configurations via ignition and besides, ignition isn't truly flexible enough for this I believe since post-deployment modifications may be needed to whitelist machines.

Since OpenShift seems to take over the iptables rules anyways, it would be helpful if we could dynamically configure the node's firewall via CRDs.

CI Operator Visualizer Tool

A tool to enable engineers to more quickly diagnose the causes of failed CI runs. This tool should run automatically after/during artifact gathering and display an easy to consume visualization with important details for each operator.

Long term, it would be nice to present information about operator statuses across CI runs in an OK/NotOK fashion, so a particular team, such as machine-api, can look at all CI runs and see if there was any detected problem with their particular operator.

kube-apiserver/auto-cert-recovery.md

https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/auto-cert-recovery.md

/assign @tnozicka

We want to start running all containers by default with /dev/fuse.

This would allow us to more easily run fuse-overlay as well as other user file systems if the container is running in a user namespace. We have lots of user attempting to run buildah within a container, and this would enable it.

I plan on adding "/dev/fuse" to cri-o.conf in the installation and having the OS automatically load the fuse kernel module.

I don't see this as much of a security risk, since /dev/fuse is already allowed to be used with no capabilties/privileges by non root users.

Allow Users To List Own Groups

As an unprivileged developer building and deploying applications on OpenShift, I need to be able to list the groups I am a member of so that I can associate the correct groups roles with my development tools like ArgoCD/Tekton/etc...

On the clusters I use for development, I am unable to list those groups

❯ oc get groups
Error from server (Forbidden): groups.user.openshift.io is forbidden: User "infosec812" cannot list resource "groups" in API group "user.openshift.io" at the cluster scope

[report] ignore "WIP" markers when stripping the prefix from titles

Look for [WIP] and WIP: and other variations and remove them from the titles when looking for a section prefix.

oc debug enhancement

Summary
As currently implemented, oc debug can accept single command without parameters such as oc debug [node] -- ls
This is quite restrictive and limits possible actions that can be done to the nodes.

I propose that oc debug be enhanced in a way that will allow commands with parameters to be used to perform such actions without having to enter the node/container's shell first, and having to exit later..

Alternatively, implement a new function to allow administrative user to perform actions on the nodes.

Background
A few months ago during my 🔴⛑ internship, I had developed an ansible role for deploying rook-ceph on openshift.

During "teardown", it is required that /var/lib/rook: Path on each host in the cluster where configuration is cached by the ceph mons and osds be deleted.

This was done by using oc debug [NODENAME] followed by
rm -rf /host/var/lib/rook
and exit.

This was tedious and you had to do it for each worker node during troubleshooting between rook-ceph installs/removals. I was able to make the repeating process slightly easier by using a for loop in the shell to iterate through each worker node, but still had to manually press return by hand for each worker node.

Log hostname/actual IP for instances request ignition files on first boot

A recurring problem the MAO has is a machine is provisioned, but never joins the cluster. Invariably, this is blamed on the machine-api due to lack of understanding how ignition and MCS/MCO works.

We need to track the hostname and/or private-ip address of machines that request the first boot ignition file so we can identify where a failure has occurred. The MCS currently logs IP addresses it sees from the http connection level, but this information is usually a VIP or a NAT and not useful. The actual hostname and/or private-ip should be captured in either the http headers or the request payload, and we should track these requests in a way they can be easily associated with a particular machine object. Machine objects know via the cloud providers what the hostname and/or ip addresses of an instance will be, so having the MCS report this information somewhere useful will allow us to build tooling to help diagnose cluster-joining problems easier.

Question: Helm 3, OpenShift specific resources handling (DeploymentConfigs)

Hello,
I found a couple of Helm 3 related proposals and would ask about plans for OpenShift specific resources, such as DeploymentConfigs.

Unfortunately, at the moment Helm doesn't handle that specific resources properly (you can not force helm to wait for successful DeploymentConfig rollout for example)

Question - Would it be possible to handle DeploymentConfigs with Helm charts somehow?

Many thanks and best regards!
Denis

P.S. Some links which might illustrate the issue

Helm part which handles Deployments: https://github.com/helm/helm/blob/master/pkg/kube/wait.go#L69

I also made some experiments around:
https://github.com/helm/helm/compare/master...lobziik:openshift_cleaned?expand=1

enhancements/etcd/storage-migration-for-etcd-encryption.md

storage-migration-for-etcd-encryption.md

Add operator image to 4.4
Enable operator -- openshift/cluster-kube-storage-version-migrator-operator#6
Switch to using migrator
- kube-apiserver-operator: openshift/cluster-kube-apiserver-operator#713
- openshift-apiserver-operator: openshift/cluster-openshift-apiserver-operator#291

Non-blocking:

Upstream: rename package from github.com/kubernetes-sigs/* to sigs.k8s.io/* - kubernetes-sigs/kube-storage-version-migrator#46.
Upstream: fix flaky tests - kubernetes-sigs/kube-storage-version-migrator#51
Remove unused operand image from 4.3.x

Bubble Unreachable Kubelet Alert to the very top

If a kubelet goes unreachable, you get like 57 alerts of all different kinds firing. It's very difficult to get a sense of what's happening. And that is just for one random worker kubelet, not even a master.

There should be a separate section on the alerting dashboard for just these types of alerts.

After just doing an oc delete node, almost all of the alerts cleared (Obviously, this was just to test, don't just oc delete node, use the machine-api to replace a broken node).

allow mounting custom subscription into the built container

In reagards to subscription-content-access.md (#384),

Is it possible to add to it the ability for customers to create custom rhsm secret/config maps in the project and mount them while building containers in a way that subscription is available during build but never saved to a layer? And without the need to modify Dockerfile as is needed ATM.

It is alright to have a cluster global subscription but sometimes developer/QE wants to try a new product with development subscription or need beta channel or something. Or in case of a service provider, the service provider may not want to share subscription data with its customers.

template-lint.sh can't handle committed file renames

I ran into some issues with make lint when trying to relocate an enhancement to a different top level folder.

Here is a trivial reproducer that is entirely a hypothetical example:

mv enhancements/builds/volume-mounted-resources.md enhancements/builds/test-name.md
git add .
git commit -m "test"
make lint

And you are presented with the following error:

+ /workdir/hack/template-lint.sh
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
enhancements/builds/volume-mounted-resources.md missing "## Release Signoff Checklist"
enhancements/builds/volume-mounted-resources.md missing "## Summary"
enhancements/builds/volume-mounted-resources.md missing "## Motivation"
enhancements/builds/volume-mounted-resources.md missing "### Goals"
enhancements/builds/volume-mounted-resources.md missing "### Non-Goals"
enhancements/builds/volume-mounted-resources.md missing "## Proposal"
enhancements/builds/volume-mounted-resources.md missing "### User Stories"
enhancements/builds/volume-mounted-resources.md missing "### Risks and Mitigations"
enhancements/builds/volume-mounted-resources.md missing "## Design Details"
enhancements/builds/volume-mounted-resources.md missing "### Test Plan"
enhancements/builds/volume-mounted-resources.md missing "### Graduation Criteria"
enhancements/builds/volume-mounted-resources.md missing "#### Dev Preview -> Tech Preview"
enhancements/builds/volume-mounted-resources.md missing "#### Tech Preview -> GA"
enhancements/builds/volume-mounted-resources.md missing "#### Removing a deprecated feature"
enhancements/builds/volume-mounted-resources.md missing "### Upgrade / Downgrade Strategy"
enhancements/builds/volume-mounted-resources.md missing "### Version Skew Strategy"
enhancements/builds/volume-mounted-resources.md missing "## Implementation History"
enhancements/builds/volume-mounted-resources.md missing "## Drawbacks"
enhancements/builds/volume-mounted-resources.md missing "## Alternatives"
grep: enhancements/builds/volume-mounted-resources.md: No such file or directory
enhancements/builds/volume-mounted-resources.md is missing a title
make: *** [Makefile:13: lint] Error 1

@russellb @dhellmann is there a way we can improve the template-lint.sh script for this edge case?

Support for SOCA AWS MachinePools to support EC2 Spot Best practices

Until very recently this document was accurate up to date and spot on (pun intended) in regards to the implementation of Spot on OpenShift https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/spot-instances.md

With the recent release of Cluster API Provider AWS (CAPA) now there is experimental support for MachinePool API's on AWS. This would be the desired approach for users of OpenShift running their workloads on EC2 Spot and adhering Spot best practices.

Note that while the MachinePool API is now implemented, There are still dependencies like Cluster Autoscaler support for MachinePools that might be needed. https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/clusterapi

Documentation-First Development

Multiple times per release, we are seeing escalations related to 'bugs' as perceived by users. Frequently, these bugs pertain to specific installation and cloud provider options. For one reason or another, different users have the perception that missing features are actually bugs because some place in the documentation gave them some high-level steps about how to perform some particular action, but not all components are able to handle such a configuration.

In order to better define what we actually support, feature work should point to publicly facing documentation. Chicken and egg problem aside (which can be resolved, hopefully, with some kind of automation), before a feature gets merged, it should have reference to public facing documentation.

If a given setting or option is not explicitly called out in our documentation, it should not be considered supported. No setting or option should make it into documentation without architectural review to ensure that all the different components impacted will be updated or assuring existing capabilities cover documented processes.

This is a recurring issue on UPI installations.

To give a recent example, a shared VPC in GCP for a UPI install. The documentation said it was supported, but there is no relevant section about how to configure the machine-api: https://docs.openshift.com/container-platform/4.5/installing/installing_gcp/installing-gcp-user-infra-vpc.html

Openshift-installer support for IBM Cloud

Re-opening this issue here openshift/installer#3275

I see there is already some on-going enhancements around IBM Cloud support in general (https://github.com/openshift/enhancements/blob/ce4d303db807622687159eb9d3248285a003fabb/enhancements/update/ibm-public-cloud-support.md)

I'm curious as to if the Openshift Installer can have IBM Cloud as a supported platform 😺

CoreOS Encrypted Disks By Default doc is not clear enough for installer changes

The installer team was looking at implementing the https://github.com/openshift/enhancements/blob/a3411e6f3458743ee2f84b013101d584fc272dc8/enhancements/automated-policy-based-disencryption.md#installer-support section, but the section is very brief in details that would allow somebody to implement the requested feature.

Here are some of the high-level questions that probably should be answered..

A) The installer can only provide the configurastion for nodes in form of MachineConfig objects.
Therefore it would be highly useful there were example for MachineConfig objects that would define the encryption setting:

i) default (disable: false, enforce: true)
ii) tpm2 based
iii) tang based, multiple tang servers based
iv) custom user based

B) The specs allow tpm2, tang etc. as source for encryption setup source, but there are no links or definition of valid values for these options.

C) The spec says the default is disable: false, enforce: true

that's not a backward compatible change for install-config.yaml users, because users today expect to have no encryption...?

D) lack of clarity for default on cloud platforms.\

https://github.com/openshift/enhancements/blob/b5e77b5a99dc19de9acfa27fb0758ca42d74f3ee/enhancements/automated-policy-based-disencryption.md#policies

is also not clear on the defaults for cloud like AWS, Azure, GCP..

Service Load Balancers for Bare Metal

Document the various alternatives we could choose to provide service load balancers on bare metal, including a recommended approach.

OKD - Wizard SSH Key Missing

I have been trying OKD on oVirt. Using Wizard. Symptom was cluster kept failing with message... this is NOT 100% from below but more of another step missed to root cause below error:

INFO Pulling debug logs from the bootstrap machine
DEBUG Added /tmp/bootstrap-ssh328666366 to installer's internal agent
ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: failed to use the provided keys for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

OKD Wizard Steps

./openshift-install create install-config --dir=/home/openshift/$cluster --log-level=debug
? Platform ovirt
? oVirt API endpoint URL https://ovirte01.penguinpages.local/ovirt-engine/api
? Is the oVirt CA trusted locally
? Yes
? oVirt certificate bundle -----BEGIN CERTIFICATE----- MIID7TCCAtANYAA== -----END CERTIFICATE-----
? oVirt engine username admin@internal
? oVirt engine password
? oVirt cluster Default_Cluster ? oVirt storage domain data
? oVirt network ovirtmgmt
? Internal API virtual IP 172.16.100.63
? Internal DNS virtual IP 172.16.100.83
? Ingress virtual IP 172.16.100.73
? Base Domain penguinpages.local ? Cluster Name okd
? Pull Secret [? for help] **************************************** INFO Obtaining RHCOS image file from 'https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/32.20200629.3.0/x86_64/fedora-coreos-32.20200629.3.0-openstack.x86_64.qcow2.xz?sha256=dc2450ab1347f20eecc46f860105ddd4b62a677be6a5de648b31ef44bc114581' INFO Creating infrastructure resources...

Notice this vs Wizard post being told "you forgot to run for that user 'ssh-keygen'

[openshift@ansible00 okd]$ ./openshift-install create install-config --dir=/home/openshift/$cluster --log-level=debug
DEBUG OpenShift Installer 4.5.0-0.okd-2020-09-18-202631
DEBUG Built from commit 63200c80c431b8dbaa06c0cc13282d819bd7e5f8
DEBUG Fetching Install Config...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
DEBUG Fetching SSH Key...
DEBUG Generating SSH Key...
? SSH Public Key /home/openshift/.ssh/id_rsa.pub
DEBUG Fetching Base Domain...
DEBUG Fetching Platform...
DEBUG Generating Platform...
? Platform ovirt
? oVirt cluster [Use arrows to move, enter to select, type to filter, ? for more help]

Default_Cluster

I think wizard should prompt for "where is your key".. vs just skipping and not noting this cluster will fail without key.

Rotate/Renew long term CA and API client certs created by installer

When we use the installer to create the cluster, it creates a kubeconfig file for system:admin user which uses client certificates to authenticate to the API server and have a vaild CA (for 2 years). This client certificate is valid for 10 years and there is no way (at least we don’t know) to have it rotated with a new one after the cluster is created. Also the CA is the same for the existing cluster.

UseCase -1
If a company installs OpenShift and an employee that was cluster admin leaves, and now the company wants to revoke that employee's access to OpenShift. They cannot do that today if the employee had access to a client cert with admin access.

UseCase - 2
If we want to use a specific version of CRC on a cloud infra, then any user which downloaded that specific version of CRC has the kubeconfig file with API client certs and CA, able to login to the cluster as soon as they know the endpoints.

$ oc config view --kubeconfig ~/.crc/cache/crc_libvirt_4.5.5/kubeconfig
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://api.crc.testing:6443
  name: crc
contexts:
- context:
    cluster: crc
    user: admin
  name: admin
current-context: admin
kind: Config
preferences: {}
users:
- name: admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

$ oc config view --raw --kubeconfig ~/.crc/cache/crc_libvirt_4.5.5/kubeconfig -o jsonpath={.clusters[0].cluster.certificate-authority-data} | base64 -d - | openssl x509 -noout -dates -in -
notBefore=Aug  6 13:24:37 2020 GMT
notAfter=Aug  6 13:24:38 2022 GMT

$ oc config view --raw --kubeconfig ~/.crc/cache/crc_libvirt_4.5.5/kubeconfig -o jsonpath={.users[0].user.client-certificate-data} | base64 -d - | openssl x509 -noout -dates -in -
notBefore=Aug  6 12:44:49 2020 GMT
notAfter=Aug  4 12:44:50 2030 GMT

cc @bbrowning @gbraad

CRD documentation URL lookup

I would like to propose a documentation enhancement.

OpenShift CRDs are named with an URI in the openshift.io domain:

machine.openshift.io/v1beta1
metering.openshift.io/v1

It would be helpful to find useful information at the corresponding URL.

Ideally, https://machine.openshift.io/v1beta1 contains some (machine-readable?) machine-generated documentation about the resource.

But for a first, humble and (relatively) easy step, pointing my browser to that URL redirects me to the relevant access.redhat.com documentation.

Is this repo the right place to propose such an enhancement?

API counts

Blocker

Future

update stats during shutdown

Default tls termination configuration for user-created routes

New routes do not have tls termination configured by default, if user does not specify it in yaml. Nowadays most environments have tls enabled, so trying to access non-tls route usually gives user some sort of error page.

Defaults could be applied to new routes that do not have tls configuration set.

Setting cluster-wide default tls termination settings for all new routes would make it easier for users to create new routes. The easier the use, the less tickets/requests admins and support teams will get.

etcd operator

https://github.com/openshift/enhancements/blob/master/enhancements/etcd/cluster-etcd-operator.md

Metal

what is here @alaypatel07

IPv6

azure is broken on pivot @deads2k
azure bootstrap is broken @hexfusion

bootstrapping

keep bootstrap kube-apiserver up
real kube-apiserver to only use real etcd

Tests

[report] correct the count of enhancements in each section when some are filtered out of the report

We don't include the this-week PRs in the report, but they still show up in the totals, so sometimes the counts don't match the list of enhancements.

[report] add footer option to config file

add config option for footer to append to the report so we can use our names without hard-coding that

[report] add config option for intro to insert after title

We usually link to the published versions on GitHub.

Automatically create user-data secret for each MCP

Currently, the installer creates the initial user-data (stub ignition file) that is provided to machines on first boot to configure the host. This ignition stub contains a url that specifies the name of a MachineConfigPool. If a user creates a custom MachineConfigPool, the user must create a new stub ignition file manually, which is quite tedious.

We should automatically generate a new user-data secret for each MachineConfigPool created by the MCO. This would allow users to more easily consume custom MCPs via the machine-api by just specifying the corresponding user-data secret.

Describe zero-downtime upgrades

This topic is being worked as a draft google doc and will be converted to an enhancement.

need okd4.x hello-world example and suppport webui

need okd4.x hello-world example and suppport webui for route test

i want a quick start demo to deploy a hello-world app have a webui support test route

Multi-tenancy and Elasticsearch rollover

Re: LOG-545: Cluster logging Elasticsearch rollover data design #108

Will there be an easy way to see how much disk space is consumed per application namespace? On my multi-tenant cluster I've occasionally had one namespace accidentally leave their logging at DEBUG when they went to production. When I got disk full alerts from ES I was able to query ES for all indexes sorted by size and see the culprit quickly and then contact the appropriate person for that namespace to get them to turn their logging back down.

I worry that by merging all the logs into once index it will be harder to find the "noisy neighbour" and have them adjust their settings.