raballew / okd-the-hard-way
Bootstrap an OKD cluster the hard way on user-provisioned infrastructure in a disconnected environment. No scripts.
License: MIT License
The lab docs/22-usage.md needs to be extended to explain how to onboard a tenant in an automated and standardized way.
As of now, the services machine must be configured manually to provide the functionality required to bootstrap and run a cluster. This process is error-prone and hurts reproducibility. A better approach would be to run each service in its own container.
Currently libvirt creates iptables rules on the host system, which are then modified to simulate a disconnected environment: only the services machine is allowed to connect to the internet, and all other traffic to and from the remaining nodes is dropped. As iptables is not recommended, it would be helpful to disable the automatic creation of those rules and replace them with nmcli rules to standardize usage across the labs.
OKD 4.7 has been released. Bump the installation instructions to the newest release.
The current implementation uses IPv4 only, both on the hypervisor and in the OKD overlay network. Even though we are not facing an IP address shortage, switching to IPv6 should be done for academic purposes, as it is increasingly used in the real world. The idea is to define one or multiple IPv6 subnets using libvirt and use a flat network approach to make pods or services directly accessible.
Compatibility with ceph and metallb also needs to be verified. Additionally, many services need to be reworked because they are currently configured for IPv4.
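As a rough planning aid for the "one or multiple IPv6 subnets" idea, the sketch below carves equally sized subnets out of a single prefix using Python's standard ipaddress module. The prefix fd00:200::/48 and the network names are purely hypothetical placeholders, not values taken from the lab; the resulting networks would still have to be written into the libvirt network definitions by hand.

```python
from __future__ import annotations

import ipaddress


def plan_subnets(prefix: str, names: list[str], new_prefix: int = 64) -> dict[str, ipaddress.IPv6Network]:
    """Assign one subnet of length `new_prefix` from `prefix` to each name, in order."""
    network = ipaddress.IPv6Network(prefix)
    subnets = network.subnets(new_prefix=new_prefix)  # lazy generator of child networks
    plan: dict[str, ipaddress.IPv6Network] = {}
    for name in names:
        try:
            plan[name] = next(subnets)
        except (StopIteration, ValueError):
            # Either the prefix is exhausted or new_prefix is shorter than the prefix itself.
            raise ValueError(
                f"{prefix} cannot provide {len(names)} subnets of length /{new_prefix}"
            ) from None
    return plan


if __name__ == "__main__":
    # Hypothetical ULA prefix and network names, for illustration only.
    for name, subnet in plan_subnets("fd00:200::/48", ["machines", "pods", "services"]).items():
        # subnet[1] is the first host address, often used as the gateway.
        print(f"{name}: {subnet} (first host {subnet[1]})")
```

Keeping every network at /64 is a deliberate simplification here; other prefix lengths can be passed via new_prefix if the flat network design calls for them.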
The mirror registry service works fine on machines that are already running, but if the VM is rebooted, the unit fails due to a missing dependency, leaving the mirror registry offline. This behaviour needs to be improved so that the services node can be rebooted without manual intervention afterwards.
As of now, the Docker registry container image is used to host the mirror registry. While the basic functionality is the same, an enterprise-quality registry usually offers support for building, securing and serving container images, which the Docker registry does not cover. To move the lab environment closer to a real-world production environment, add a procedure that describes how to deploy a registry that offers enterprise-grade functionality for proof-of-concept (non-production) purposes.
So instead of using oc adm release mirror to provide the resources for a release, one could simply configure a pull-through or mirror registry that automatically stays in sync.
Once the cluster installation is automated, there are usually tasks that need to be performed frequently. Those tasks should be automated as well to reduce friction.
Describe how to do the following in an automated way
During the bootstrap process, the authentication cluster operator fails to move from progressing to available because the .well-known/oauth-authorization-server endpoint is not reachable.
oc get clusteroperator authentication -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
  creationTimestamp: "2020-09-02T07:56:05Z"
  generation: 1
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:exclude.release.openshift.io/internal-openshift-hosted: {}
      f:spec: {}
      f:status:
        .: {}
        f:extension: {}
        f:versions: {}
    manager: cluster-version-operator
    operation: Update
    time: "2020-09-02T07:56:05Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
        f:relatedObjects: {}
    manager: authentication-operator
    operation: Update
    time: "2020-09-02T08:07:26Z"
  name: authentication
  resourceVersion: "18233"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/authentication
  uid: e85a7dd9-f8f8-4fda-bc0a-d3dce0e61323
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-09-02T08:07:23Z"
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-09-02T08:07:26Z"
    message: 'Progressing: got ''404 Not Found'' status while trying to GET the OAuth
      well-known https://192.168.200.31:6443/.well-known/oauth-authorization-server
      endpoint data'
    reason: _WellKnownNotReady
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-09-02T08:07:26Z"
    status: "False"
    type: Available
  - lastTransitionTime: "2020-09-02T07:58:50Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: authentications
  - group: config.openshift.io
    name: cluster
    resource: authentications
  - group: config.openshift.io
    name: cluster
    resource: infrastructures
  - group: config.openshift.io
    name: cluster
    resource: oauths
  - group: route.openshift.io
    name: oauth-openshift
    namespace: openshift-authentication
    resource: routes
  - group: ""
    name: oauth-openshift
    namespace: openshift-authentication
    resource: services
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-authentication
    resource: namespaces
  - group: ""
    name: openshift-authentication-operator
    resource: namespaces
  - group: ""
    name: openshift-ingress
    resource: namespaces
As a result, other cluster operators stay in a progressing or degraded state.
oc get clusteroperators
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                             False       True          False      127m
cloud-credential                           4.5.0-0.okd-2020-08-12-020541   True        False         False      138m
cluster-autoscaler                         4.5.0-0.okd-2020-08-12-020541   True        False         False      130m
config-operator                            4.5.0-0.okd-2020-08-12-020541   True        False         False      130m
console                                    4.5.0-0.okd-2020-08-12-020541   False       True          True       127m
csi-snapshot-controller                    4.5.0-0.okd-2020-08-12-020541   True        False         False      73m
dns                                        4.5.0-0.okd-2020-08-12-020541   True        False         False      134m
etcd                                       4.5.0-0.okd-2020-08-12-020541   True        True          True       134m
image-registry                             4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
ingress                                    4.5.0-0.okd-2020-08-12-020541   True        False         False      73m
insights                                   4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
kube-apiserver                             4.5.0-0.okd-2020-08-12-020541   True        True          True       133m
kube-controller-manager                    4.5.0-0.okd-2020-08-12-020541   True        False         False      133m
kube-scheduler                             4.5.0-0.okd-2020-08-12-020541   True        False         False      133m
kube-storage-version-migrator              4.5.0-0.okd-2020-08-12-020541   True        False         False      74m
machine-api                                4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
machine-approver                           4.5.0-0.okd-2020-08-12-020541   True        False         False      134m
machine-config                             4.5.0-0.okd-2020-08-12-020541   True        False         False      133m
marketplace                                4.5.0-0.okd-2020-08-12-020541   True        False         False      130m
monitoring                                 4.5.0-0.okd-2020-08-12-020541   True        False         False      121m
network                                    4.5.0-0.okd-2020-08-12-020541   True        False         False      135m
node-tuning                                4.5.0-0.okd-2020-08-12-020541   True        False         False      135m
openshift-apiserver                        4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
openshift-controller-manager               4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
openshift-samples                          4.5.0-0.okd-2020-08-12-020541   True        False         False      130m
operator-lifecycle-manager                 4.5.0-0.okd-2020-08-12-020541   True        False         False      134m
operator-lifecycle-manager-catalog         4.5.0-0.okd-2020-08-12-020541   True        False         False      134m
operator-lifecycle-manager-packageserver   4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
service-ca                                 4.5.0-0.okd-2020-08-12-020541   True        False         False      135m
storage                                    4.5.0-0.okd-2020-08-12-020541   True        False         False      131m
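Scanning the table above by eye is easy to get wrong, especially since the authentication row has no VERSION value and whitespace-based column parsing would shift its fields. A minimal sketch of an alternative, assuming the JSON form of the same listing (oc get clusteroperators -o json) is piped in: it reports every operator whose Available, Progressing or Degraded condition deviates from True/False/False. The function name and the report layout are illustrative choices, not part of any OKD tooling.

```python
from __future__ import annotations

import json
import sys

# The condition values an operator is expected to report once it has settled.
HEALTHY = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def unhealthy_operators(cluster_operators_json: str) -> dict[str, dict[str, str]]:
    """Map operator name -> {condition type: actual status} for every deviation."""
    report: dict[str, dict[str, str]] = {}
    for item in json.loads(cluster_operators_json)["items"]:
        # Flatten the conditions list into {type: status}.
        conditions = {
            cond["type"]: cond["status"]
            for cond in item.get("status", {}).get("conditions", [])
        }
        deviations = {
            ctype: conditions.get(ctype, "Unknown")
            for ctype, wanted in HEALTHY.items()
            if conditions.get(ctype) != wanted
        }
        if deviations:
            report[item["metadata"]["name"]] = deviations
    return report


if __name__ == "__main__":
    # Intended use: oc get clusteroperators -o json | python3 this_script.py
    for name, deviations in unhealthy_operators(sys.stdin.read()).items():
        print(name, deviations)
```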
The endpoint is reachable from the oc client's machine:
curl -X GET https://192.168.200.31:6443/.well-known/oauth-authorization-server -k
{
  "paths": [
    "/apis",
    "/apis/",
    "/apis/apiextensions.k8s.io",
    "/apis/apiextensions.k8s.io/v1",
    "/apis/apiextensions.k8s.io/v1beta1",
    "/healthz",
    "/healthz/etcd",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/crd-informer-synced",
    "/healthz/poststarthook/generic-apiserver-start-informers",
    "/healthz/poststarthook/priority-and-fairness-config-consumer",
    "/healthz/poststarthook/start-apiextensions-controllers",
    "/healthz/poststarthook/start-apiextensions-informers",
    "/livez",
    "/livez/etcd",
    "/livez/log",
    "/livez/ping",
    "/livez/poststarthook/crd-informer-synced",
    "/livez/poststarthook/generic-apiserver-start-informers",
    "/livez/poststarthook/priority-and-fairness-config-consumer",
    "/livez/poststarthook/start-apiextensions-controllers",
    "/livez/poststarthook/start-apiextensions-informers",
    "/metrics",
    "/openapi/v2",
    "/readyz",
    "/readyz/etcd",
    "/readyz/log",
    "/readyz/ping",
    "/readyz/poststarthook/crd-informer-synced",
    "/readyz/poststarthook/generic-apiserver-start-informers",
    "/readyz/poststarthook/priority-and-fairness-config-consumer",
    "/readyz/poststarthook/start-apiextensions-controllers",
    "/readyz/poststarthook/start-apiextensions-informers",
    "/readyz/shutdown",
    "/version"
  ]
}
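Note that the body returned above is a listing of API paths rather than an OAuth metadata document, which would be consistent with the 404 reported by the operator even though the host itself answers. The following sketch distinguishes the two cases for a saved response body. Treating issuer, authorization_endpoint and token_endpoint as the keys that mark a real metadata document is an assumption based on the OAuth 2.0 Authorization Server Metadata format; the function name and return labels are made up for illustration.

```python
import json

# Keys assumed to be present in a genuine OAuth authorization server metadata document.
REQUIRED_KEYS = ("issuer", "authorization_endpoint", "token_endpoint")


def classify_well_known(body: str) -> str:
    """Classify the body returned by the .well-known/oauth-authorization-server endpoint."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return "not-json"
    if not isinstance(doc, dict):
        return "unexpected"
    if all(key in doc for key in REQUIRED_KEYS):
        return "oauth-metadata"
    if "paths" in doc:
        # The server answered with its list of known paths instead of the document.
        return "path-listing"
    return "unexpected"


if __name__ == "__main__":
    # Shortened version of the response shown in this issue.
    print(classify_well_known('{"paths": ["/apis", "/healthz", "/version"]}'))
```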
By default, FIPS mode is not enabled. If FIPS mode is enabled, the Fedora CoreOS (FCOS) machines that OKD runs on bypass the default Kubernetes cryptography suite and use the cryptography modules that are provided with FCOS instead. This setup is common for production workloads. Currently, the FCOS installation seems to fail if FIPS is enabled. Getting this feature to work would move the configuration closer to what many real-world setups use.
Rebooting up to two nodes of each node type is possible without major issues. However, it has not been verified that the cluster still works after everything is rebooted at once, e.g. when the hypervisor must be updated.
Add a smoke test to verify that the following works as expected:
The section docs/20-deploy.md should cover the following content: