OpenShift Operations Workshops

This repository contains lab instructions and supporting content for a series of administrative-focused workshops centered around OpenShift.

The workshops included in this repo are:

  • Red Hat OpenShift Container Platform 4 for Admins RHDP
  • Modern App Dev Roadshow - Ops Track RHDP / More Info
  • Summit 2023 Hands on with OCP Plus Workshop RHDP

If you are a Red Hat employee with access to RHDP, we recommend deploying using the provided RHDP links above.

Requirements / Prerequisites

Doing these labs on your own requires a few things.

AWS

These labs are designed to run on top of an OpenShift 4 cluster that has been installed entirely by the new installer. You will need access to AWS with sufficient permissions and limits to deploy 3 master nodes, 4-6 regular nodes, and NVMe-equipped nodes for storage.

Check out the documentation for Installing on AWS.

OpenShift 4

At this time an OpenShift 4 cluster can be obtained by visiting https://try.openshift.com -- a free membership in the Red Hat Developer program is required.

Deploying the Lab Guide

Deploying the lab guide takes three steps. First, you will gather information about your cluster. Second, you will build a container image from this repository. Third, you will deploy the lab guide using the information you gathered, so that the proper URLs and references are automatically displayed in the guide.

Required Environment Variables

Most of the information can be found in the output of the installer.

Explanation and examples

  • API_URL - URL to access API of the cluster
    • https://api.cluster-gu1d.sandbox101.opentlc.com:6443
  • MASTER_URL - Master Console URL
    • https://console-openshift-console.apps.cluster-gu1d.sandbox101.opentlc.com
  • KUBEADMIN_PASSWORD - Password for kubeadmin
  • SSH_PASSWORD - password for ssh into bastion
  • ROUTE_SUBDOMAIN - Subdomain that apps will reside on
    • apps.cluster-gu1d.sandbox101.opentlc.com
    • apps.mycluster.company.com
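
If you no longer have the installer output handy, some of these values can be derived from one another: ROUTE_SUBDOMAIN is simply the console host minus its console-openshift-console. prefix. A minimal sketch using plain shell string manipulation (the example URL is the one shown above):

```shell
# Derive ROUTE_SUBDOMAIN from MASTER_URL (sketch; example values from above).
MASTER_URL=http://console-openshift-console.apps.cluster-gu1d.sandbox101.opentlc.com

host=${MASTER_URL#*://}                             # strip the scheme
ROUTE_SUBDOMAIN=${host#console-openshift-console.}  # strip the console route's host prefix

echo "$ROUTE_SUBDOMAIN"   # apps.cluster-gu1d.sandbox101.opentlc.com
```

If you already have a logged-in oc session, oc whoami --show-server and oc whoami --show-console print the API and console URLs respectively.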

Specific to Red Hat internal systems

  • GUID - GUID
    • gu1d
  • BASTION_FQDN - Bastion Domain Name
    • bastion.gu1d.sandbox101.opentlc.com

Create a file called workshop-settings.sh using the values from your environment. Here is an example:

โš ๏ธ For export ensure special characters are escaped (ie. use \! in place of !).

API_URL=https://api.openshift4.example.com:6443
MASTER_URL=https://console-openshift-console.apps.openshift4.example.com
KUBEADMIN_PASSWORD=IqJK7-o3hYR-ZTr6c-7sztN
SSH_USERNAME=lab-user
SSH_PASSWORD=apassword
BASTION_FQDN=foo.bar.com
GUID=XXX
ROUTE_SUBDOMAIN=apps.openshift4.example.com
HOME_PATH=/opt/app-root/src

Deploy the Lab Guide

Now that you have the workshop-settings.sh file with the various required variables, you can deploy the lab guide into your cluster.

First, clone the repo:

NOTE Remember to check out the branch you want to test against.

git clone https://github.com/openshiftdemos/openshift-ops-workshops

Next, build a container image using the repo/branch you checked out:

cd openshift-ops-workshops
export QUAY_USER=myusername
export BRANCH=$(git branch --show-current)
podman build -t quay.io/${QUAY_USER}/lab-sample-workshop:${BRANCH} .

Now, log in to quay.io (it's free to sign up) or another registry your cluster has access to:

podman login quay.io

Next, push the container image to your repository:

podman push quay.io/${QUAY_USER}/lab-sample-workshop:${BRANCH}

You will use this image to deploy the lab. The following command will log you in as kubeadmin on systems with the oc client installed:

oc login -u kubeadmin -p $KUBEADMIN_PASSWORD

oc new-project lab-ocp-cns

# This part is needed if you're running on a "local" or "self-provisioned" cluster
oc adm policy add-role-to-user admin kube:admin -n lab-ocp-cns

# Create deployment.
oc new-app -n lab-ocp-cns https://raw.githubusercontent.com/redhat-cop/agnosticd/development/ansible/roles/ocp4-workload-workshop-admin-storage/files/production-cluster-admin.json \
--param TERMINAL_IMAGE="quay.io/${QUAY_USER}/lab-sample-workshop:${BRANCH}" --param PROJECT_NAME="lab-ocp-cns" \
--param WORKSHOP_ENVVARS="$(cat ./workshop-settings.sh)"

# Wait for deployment to finish.

oc rollout status dc/dashboard -n lab-ocp-cns

If you made changes to the container image and want to refresh your deployed Homeroom quickly, execute this:

oc import-image -n lab-ocp-cns dashboard

Doing the Labs

Your lab guide should deploy in a few moments. To find its URL, execute:

oc get route dashboard -n lab-ocp-cns

You should be able to visit that URL and see the lab guide. From here you can follow the instructions in the lab guide.

Notes and Warnings

Remember, this experience is designed for a provisioning system internal to Red Hat. Your lab guide will be mostly accurate, but some details will be slightly off.

  • You aren't likely using lab-user
  • You will probably not need to actively use your GUID
  • You will see lots of output that references your GUID or other environment-specific details
  • Your MachineSets are different depending on the EC2 region you chose

But, generally, everything should work. Just don't be alarmed if something looks slightly different from the lab guide.

Also note that the first lab, where you SSH into the bastion host, is not relevant to you -- you are likely already doing the exercises on the host from which you installed OpenShift.

Troubleshooting

Make sure you are logged in as kubeadmin when creating the project.

If you get a "too many redirects" error, clear your cookies and log in again as kubeadmin. This usually happens if you are using RHPDS and have stopped and started a cluster.

Cleaning up

To delete the deployment, run:

oc delete all,serviceaccount,rolebinding,configmap -l app=admin -n lab-ocp-cns

License

This repository and everything within it are licensed under the GNU General Public License (GPL) v3.0

Contributors

ahsen-shah, aravindhp, ashtondavis, christianh814, cooktheryan, dlbewley, dmesser, dobbymoodge, ianpurdy, ikke-t, jalvarez-rh, jamesfalkner, jchraibi, jewzaam, jmferrer, jnewsome97, kaovilai, kmurudi, mdstjean, mfosterrox, mulbc, mwoodson, netzzer, paddy667, stencell, steven-ellis, stevenbarre, techjw, thoraxe, twiest

Issues

remove prompts in code blocks / examples

Instead of presenting the user with the prompt:

[cloud-user@{{MASTER_HOSTNAME}} ~]$ heketi-cli node list | grep ca777ae0285ef6d8cd7237c862bd591c

Please just have the command:

heketi-cli node list | grep ca777ae0285ef6d8cd7237c862bd591c

This makes copy/paste out of the lab guide much easier, especially for blocks of commands.

installation lab verification fails / checks for wrong node names

- name: Checking status of all the nodes to be 'Ready'
  command: oc get -o jsonpath='{.status.conditions[?(@.reason=="KubeletReady")].type}' node {{ item }}
  with_items:
    - "{{ groups.nodes }}"
  register: status_of_node
  failed_when: "'Ready' not in status_of_node.stdout"

@kmurudi we cannot rely on the hostnames from the ansible inventory. They are different to the hostnames used by OpenShift. I suggest we use openshift_facts.yml supplied with openshift-ansible to determine the hostname.

host names not displayed as needed

After installation is complete, the command # oc get nodes shows IP addresses as node names instead of the given hostnames/instance names.

ldap group sync automation fails

groupsync.yaml gets generated at the cloud-init stage but has the LDAP bindDN and baseDN, as well as the IdM URL, hard-coded.

It would be best to deploy it in generic form with placeholders and then replace them as part of a post-deploy playbook that runs at the cloud-init stage on the master. This is similarly done with the LDAP URLs in /etc/ansible/hosts and the inventory_ldap_auth.yml playbook.

Ansible playbook for OCP installation

For the user to be able to use ansible-playbook and access the config.yml playbook from the openshift-ansible git repository, it should be present in the master host. The inventory file is present but the ansible playbook needed to run the advanced installation is not.

Also, two instances of node01 are present in the list of EC2 instances.

Introduce a WaitCondition handle

We should introduce a WaitConditionHandle in the CFN template to signal CREATE_COMPLETE only when all resources are provisioned and stood up, which includes:

  • all nodes are online and reachable via SSH
  • IdM's LDAP service is reachable on port 389
  • IdM's setup routine has produced ca.crt
  • lab guide URL is reachable

cns-management_automation.yml failure

Nodes node04-node06 are not added to the cluster, so the play applying labels fails.

TASK [label storage nodes] ****************************************************************************************************************************************************************************************
Wednesday 09 August 2017  20:29:09 +0000 (0:00:00.683)       0:03:28.568 ****** 
failed: [localhost] (item=node04.internal.aws.testdrive.openshift.com) => {
    "changed": true, 
    "cmd": "oc label node/node04.internal.aws.testdrive.openshift.com storagenode=glusterfs", 
    "delta": "0:00:00.221782", 
    "end": "2017-08-09 20:29:09.802963", 
    "failed": true, 
    "item": "node04.internal.aws.testdrive.openshift.com", 
    "rc": 1, 
    "start": "2017-08-09 20:29:09.581181"
}

STDERR:

Error from server (NotFound): nodes "node04.internal.aws.testdrive.openshift.com" not found

Change to using a bind user instead of the admin; create a bind user

The following ldif file should be dropped onto the IDM server during environment provisioning:

dn: uid=system,cn=sysaccounts,cn=etc,dc=auth,dc=internal,dc=aws,dc=testdrive,dc=openshift,dc=com
changetype: add
objectclass: account
objectclass: simplesecurityobject
uid: system
userPassword: bindingpassword

Just after provisioning IDM, and before doing any of the user creation, we should execute the following command:

ldapmodify -x -D 'cn=Directory Manager' -w ldapadmin -f /path/to/sysaccount.ldif

Then, we need to change the /etc/ansible/hosts file that gets deployed to use the above DN and password in place of the existing one.

This will at least prevent us from being locked out entirely when the "too many failed logins" error occurs.

We also need to update /home/cloud-user/groupsync.yaml as well, as it appears to contain auth information, too.

default hosts template should have ldap for auth

If we don't do the installation with LDAP for auth, it means the admin has to re-run the installer (albeit with -t master) later. This is a little bit... awkward.

I am thinking we may wish to install with LDAP auth out of the box.

  • it won't affect the system:admin special user
  • verification of installation can then include a simple "oc login" as a user
  • module 2 can become LDAP setup and group manipulation
  • module 3 can become CNS installation and configuration

Thoughts?

@cooktheryan
@dmesser

ldap group sync validation fails

@kmurudi

TASK [Checking if all the groups have been created by 'oc adm groups sync'] *********************************************
Tuesday 25 July 2017  11:13:28 -0400 (0:00:00.547)       0:00:00.637 **********
failed: [master.unset.ocp-admin.aws.openshifttestdrive.com] (item=ose-users) => {
    "changed": true,
    "cmd": [
        "oc",
        "get",
        "group",
        "ose-users"
    ],
    "delta": "0:00:00.201054",
    "end": "2017-07-25 11:13:29.162421",
    "failed": true,
    "item": "ose-users",
    "rc": 1,
    "start": "2017-07-25 11:13:28.961367"
}

STDERR:

Error from server (NotFound): groups "ose-users" not found

Happens after successful execution of ldap_automation.yml

Add master public IP address into /etc/sysconfig/workshopper

Currently we have:

                MASTER_EXTERNAL_FQDN="master.${AWS::AccountId}.${PublicHostedZone}"
                MASTER_INTERNAL_FQDN="master.internal.${PublicHostedZone}"

We probably should also add the master public IP address, since that's what we are going to tell people to SSH into. This also means that, once they find the lab guide, they don't have to worry about going back to the Qwiklab interface to find the IP address.
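
A sketch of what that could look like next to the FQDN variables above (the instance's logical ID MasterInstance and the variable name are assumptions, not taken from the actual template; AWS::EC2::Instance does expose a PublicIp attribute for substitution):

```
                MASTER_PUBLIC_IP="${MasterInstance.PublicIp}"
```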

LDAP auth fails in OpenShift: Unwilling to perform: too many failed logins.

In a fresh, successfully finished deployment I cannot login as an IdM user. It yields "Internal error occurred: unexpected response: 500". In the system logs I can see: "logging error output: "Error: LDAP Result Code 53 Unwilling To Perform: Too many failed logins.".
This behavior is not consistently reproducible but appears every 10-20 deployments. Thoughts?

move all littered content to support folder

We have content littered in several locations. If a script, file, etc. needs to be used during the exercises, it should go into the support folder in this repo. For generated files that will be used (like the groupsync config), the write_files section of cloud-init should be relocated so that this repo is cloned first, and then the files are written out.

Environment specific tests - how to?

When doing lab automation and verification we need access to environment-specific variables: e.g. the device name of the CNS bricks, the default routing suffix for OCP, or the name of the project that we create for CNS. Some of this info could be hard-coded in the lab guide and the automation, but we may want to externalize it for easy updates later.

For the lab guide we are writing this info into /etc/sysconfig/workshopper on the guide node. But tests will likely need to run from the master node and use /etc/ansible/hosts as a second source of info.

What is the best way to get environment-specific information?
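
One low-friction option (a sketch, not an agreed approach): have tests source the same file the lab guide uses, with an override so they can run on hosts where that file lives elsewhere. WORKSHOPPER_ENV is a hypothetical variable name:

```shell
# Sketch: load environment-specific values from the workshopper config file.
# WORKSHOPPER_ENV is a hypothetical override so tests can point elsewhere.
WORKSHOPPER_ENV=${WORKSHOPPER_ENV:-/etc/sysconfig/workshopper}

set -a                    # auto-export everything the file defines
. "$WORKSHOPPER_ENV"
set +a

echo "master: ${MASTER_EXTERNAL_FQDN:-unset}"
```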

app management should come before cns

Since deploying CNS requires a basic understanding of some OpenShift components (services, pods, routes, etc) it makes sense to have the app management lab come before the cns-deploy lab.

create_failed on cloud formation

The classes 'OpenShift Test Drive for Administrators' and 'ocp-admin-testdrive-master-branch' are not being built successfully. When opening the AWS console, the CloudFormation service shows a 'create_failed' error.
