
A Kubernetes operator to provision the OpenShift Operator Certification Pipeline.

License: Apache License 2.0


operator-certification-operator's Introduction

operator-certification-operator

A Kubernetes operator to provision resources for the operator certification pipeline. The operator is installed in all namespaces, which supports multi-tenant scenarios. Note: This operator should only be used by Red Hat partners attempting to certify their operator(s).

Requirements

The certification operator requires that you have the following tools installed, functional, and in your path.

  • Install oc, the OpenShift CLI tool (tested with version 4.7.13)
  • Install tkn, the Tekton CLI tool (tested with version 0.19.1)
  • Install git, the Git CLI tool (tested with 2.32.0)
  • The certification pipeline expects you to have the source files of your Operator bundle

Pre-Installation

The below steps exist for simplicity and to make the installation clearer. The operator watches all namespaces, so if secrets/configs/etc already exist in another namespace, feel free to use the existing namespace when following the operator installation steps.

Create a new namespace where the following secrets will be applied.

oc new-project oco

Add the kubeconfig secret which will be used to deploy the operator under test and run the certification checks.

  • Open a terminal window
  • Set the KUBECONFIG environment variable
export KUBECONFIG=/path/to/your/cluster/kubeconfig

This kubeconfig will be used to deploy the Operator under test and run the certification checks.

oc create secret generic kubeconfig --from-file=kubeconfig=$KUBECONFIG

Configuration steps for submitting the results

  • Add the GitHub API token for the repo where the PR will be created
oc create secret generic github-api-token --from-literal GITHUB_TOKEN=<github token>
  • Add the Red Hat Container API access key

    This API access key is specifically related to your unique partner account for the Red Hat Connect portal. Instructions to obtain your API key can be found here.

oc create secret generic pyxis-api-secret --from-literal pyxis_api_key=< API KEY >
  • Optional pipeline configurations can be found here

Installation Steps

Since this operator isn't in OperatorHub, the process to get it into a cluster is manual at this point. Follow the steps from here

Execute the Pipeline (Development Iterations)

A pre-requisite to running a pipeline is that a workspace-template.yaml exists in the directory you want to execute the tkn commands from.

To create a workspace-template.yaml

cat <<EOF > workspace-template.yaml
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF

There are multiple ways to execute the Pipeline which can be found here

operator-certification-operator's People

Contributors

acmenezes, acornett21, bcrochet, dcurran90, itroyano, jomkz, madorn, samira-barouti, skattoju, spabba17, yashoza19


operator-certification-operator's Issues

Conditionally Apply Various Pipelines via the CR

Is your feature request related to a problem? Please describe.

Right now we apply all of the pipelines (ci/hosted/release). This could be confusing to partners, since they only care about the ci pipeline.

Describe the solution you'd like.

Update the CRD to have a boolean for each pipeline type, something like the below.

  • applyCiPipeline: true - true by default in our sample
  • applyHostedPipeline: false - false by default in our sample
  • applyReleasePipeline: false - false by default in our sample

Based on what the user provides when they apply the CR, we would only apply the Tekton Pipeline resource(s) that the user requested. To keep things simple for the operator, we can still apply all the tasks, so we do not have to worry about looking in the pipeline yaml files to distinguish which task(s) go with which pipeline(s).

We would then check for these booleans in the applyManifests code found here, as sketched below.
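
For illustration, a minimal sketch of the spec booleans and the selection logic; field names, manifest file names, and the helper are assumptions, not the repository's actual code.

package controllers

// OperatorPipelineSpec sketch: one boolean per pipeline type. Field names are
// illustrative, not the repository's actual spec.
type OperatorPipelineSpec struct {
	ApplyCIPipeline      bool `json:"applyCiPipeline,omitempty"`
	ApplyHostedPipeline  bool `json:"applyHostedPipeline,omitempty"`
	ApplyReleasePipeline bool `json:"applyReleasePipeline,omitempty"`
}

// selectedPipelineManifests returns only the pipeline manifests the user asked
// for; task manifests would still be applied unconditionally.
func selectedPipelineManifests(spec OperatorPipelineSpec) []string {
	var manifests []string
	if spec.ApplyCIPipeline {
		manifests = append(manifests, "ci-pipeline.yml")
	}
	if spec.ApplyHostedPipeline {
		manifests = append(manifests, "hosted-pipeline.yml")
	}
	if spec.ApplyReleasePipeline {
		manifests = append(manifests, "release-pipeline.yml")
	}
	return manifests
}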

Describe alternatives you've considered.

Open to other ideas, but these are my initial thoughts.

Additional context.

N/A

Bundle in main does not have all the changes from the code base

Bug Description

The bundle folder is out of sync with the code/files in other directories

Version and Command Invocation

main

Steps to Reproduce:

  1. Run make bundle
  2. See files change other than the ones intended to be changed

Expected Result

Expect that the bundle folder in main has all of the previous changes applied to other files/code in the project

Actual Result

There were more changes present than there should be

Additional Context

N/A

Suppress `already-exists` exception for `operator-ci-pipeline`

Bug Description

When reconcile happens for a second time, the below error occurs and causes the operator to not reconcile further.

2021-11-11T10:15:14.370-0700    ERROR   controller_operatorpipeline     Couldn't iterate over operator-pipelines yaml manifest files    {"error": "pipelines.tekton.dev \"operator-ci-pipeline\" already exists"}

Version and Command Invocation

sha of code under test b587f2a87e0328c1d910c9cc3fa685c984d76590

Steps to Reproduce:

  1. Run the application
  2. Wait for the reconcile to happen a second time and see the failure

Expected Result

I expect this operator to suppress already-exists exceptions so they are not output to the logs/console.
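
A minimal sketch of that suppression, assuming the manifests end up as controller-runtime client.Object values created with Create; the helper name is hypothetical.

package controllers

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyObject creates obj and treats an AlreadyExists error as success, so a
// second reconcile neither logs it nor propagates it. Sketch only; the
// repository's manifest-apply code path is not reproduced here.
func applyObject(ctx context.Context, c client.Client, obj client.Object) error {
	if err := c.Create(ctx, obj); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}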

Actual Result

This error is logged and thrown back up the stack, thus stopping the reconciliation process

Additional Context

N/A

Remove duplicate `Get` code in `ensureSecret` method

Bug Description

There is duplicate code where we are calling out to the cluster twice to find a given resource.
Code can be found here.

Version and Command Invocation

Image is image: quay.io/opdev/operator-certification-operator-index:latest
which points to the latest bundle image.

Steps to Reproduce:

  1. Run the code
  2. See the call to the cluster happening twice for the same resource.

Expected Result

That we only call the cluster once to get a given resource.
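
For illustration, a single-Get pattern could look like the sketch below; the helper name and signature are hypothetical and differ from the repository's ensureSecret.

package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// secretExists issues a single Get and interprets the result, rather than
// calling the cluster twice for the same resource.
func secretExists(ctx context.Context, c client.Client, key types.NamespacedName) (bool, error) {
	secret := &corev1.Secret{}
	if err := c.Get(ctx, key, secret); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}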

Actual Result

We call for a given secret resource twice.

Additional Context

N/A

Create the operator bundle

  1. run make bundle
  2. do the changes needed in manifests and kustomize
  3. uncomment the build steps in GH Action

Add operator-sdk install to Makefile

Is your feature request related to a problem? Please describe.

Since every developer working on this operator could potentially have a different version of operator-sdk installed locally, when make bundle is run, a new version of the SDK is listed in various artifacts. This makes reviews problematic, but could also cause unintended issues if someone has an extremely old version of the SDK.

Describe the solution you'd like.

Pull down the SDK like other binaries, i.e. controller-gen. The only issue I see with this approach is that the binaries in the /bin folder in the app do not have a version on them when running controller-gen version, since go is pulling and building them manually. I'm not sure of the right approach to make sure we get a version on the operator-sdk binary.

Logs for context.

╭─acornett at acornett-mac in ~/go/src/github.com/acornett21/operator-certification-operator on set_condition_pryxis_secret✘✘✘
╰─± ./bin/operator-sdk version
operator-sdk version: "unknown", commit: "unknown", kubernetes version: "unknown", go version: "go1.17.3", GOOS: "darwin", GOARCH: "amd64"
╭─acornett at acornett-mac in ~/go/src/github.com/acornett21/operator-certification-operator on set_condition_pryxis_secret✘✘✘
╰─± ./bin/kustomize version
{Version:unknown GitCommit:$Format:%H$ BuildDate:1970-01-01T00:00:00Z GoOs:darwin GoArch:amd64}

Describe alternatives you've considered.

N/A

Additional context.

Below are some quick changes I put together, but as mentioned above, this approach leads to generation with a version appearing as unknown

OPERATOR_SDK = $(shell pwd)/bin/operator-sdk
operator-sdk: ## Download operator-sdk locally if necessary.
	$(call go-get-tool,$(OPERATOR_SDK),github.com/operator-framework/operator-sdk/cmd/[email protected])
	
bundle: operator-sdk manifests kustomize ## Generate bundle manifests and metadata, then validate generated files.
	./bin/operator-sdk generate kustomize manifests -q
	cd config/manager && $(KUSTOMIZE) edit set image controller=$(IMG)
	$(KUSTOMIZE) build config/manifests | ./bin/operator-sdk generate bundle -q --overwrite --version $(RELEASE_TAG) $(BUNDLE_METADATA_OPTS)
	./bin/operator-sdk bundle validate ./bundle

Pipeline must fail the build if the CRD has not been updated

Bug Description

Currently, the pipeline does not check if the CRD has been updated if a new field was added to the go type.

Version and Command Invocation

Steps to Reproduce:

Expected Result

The pipeline must block the PR if the CRD has not been updated.

Actual Result

Additional Context

Update CSV with real information

Is your feature request related to a problem? Please describe.

We definitely need an appropriate icon and description.

Describe the solution you'd like.

We do not have the icon or description at the moment, so part of this effort will be to lock them down.
The preflight run may also surface more required updates, so watch those results if they are available, or we can just create new issues when those results become available.

Add Openshift-Pipelines as a certification-operator dependency

Is your feature request related to a problem? Please describe.

Yes. The operator fails in the absence of the Tekton CRDs or when openshift-pipelines is not ready.

Describe the solution you'd like.

Register the openshift-pipelines operator as a dependency for certification operator.

Describe alternatives you've considered.

We've considered creating the subscription to openshift-pipelines directly from the certification operator itself, but the controller doesn't build when owning the tekton resources, since the CRD isn't in the cluster yet.

Additional context.

Tests were made in brand-new clusters, and as it stands the operator breaks if openshift-pipelines is not installed.

Linked issues describing it are #58 and #59

Some constants are publicly available

Bug Description

Some constants in the codebase are publicly available. These constants need to be private to limit the risk of exposure.

Version and Command Invocation

Steps to Reproduce:

Expected Result

Actual Result

Additional Context

Operator Fails to Install Due to Service Name Length

Bug Description

When trying to deploy the operator, it fails since the length of the generated service name (derived from the operator's name) exceeds 63 characters.

Version and Command Invocation

main sha 54555f37e4e63884a7ce2c713758528d9e1fb95e

Steps to Reproduce:

  1. run make deploy

Expected Result

Expected the operator to be deployed to a cluster

Actual Result

The operator failed to deploy with the below error

The Service "operator-certification-operator-controller-manager-metrics-service" is invalid: metadata.name: Invalid value: "operator-certification-operator-controller-manager-metrics-service": must be no more than 63 characters

Additional Context

Suggested fix is to update the namespace and namePrefix in the config/default/kustomization.yaml file with the following:
certification-operator-

Pyxis secret check always fails, since it is looking for the wrong key in the Secret

Bug Description

Even when the Pyxis secret is in the cluster, the current logic fails to find it and throws an exception.

Version and Command Invocation

sha of code under test b587f2a87e0328c1d910c9cc3fa685c984d76590

Steps to Reproduce:

  1. Add a Pyxis secret to the cluster: oc create secret generic pyxis-api-secret --from-literal pyxis_api_key=< API KEY >
  2. Run the operator

Expected Result

I expect the operator not to throw an exception for the above and to finish processing successfully.

Actual Result

The operator still throws an exception because it can't find the proper key in the secret.

Additional Context

The below code is incorrect
https://github.com/redhat-openshift-ecosystem/operator-certification-operator/blob/main/controllers/secret.go#L176
the key to check in the map should be PYXIS_API_KEY

We should also update the log message to say pyxis and not kubeconfig, so the log is clear when there is an issue.
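
For illustration, the check could simply look for that entry in the secret's data map; this is a sketch with a hypothetical helper name, not the repository's actual code.

package controllers

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// checkPyxisSecret verifies the secret carries the key the pipeline expects.
// "PYXIS_API_KEY" is the data key named in this issue.
func checkPyxisSecret(secret *corev1.Secret) error {
	if _, ok := secret.Data["PYXIS_API_KEY"]; !ok {
		return fmt.Errorf("secret %s/%s is missing key PYXIS_API_KEY", secret.Namespace, secret.Name)
	}
	return nil
}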

In a net-new cluster the operator crash-loops until Red Hat OpenShift Pipelines is installed

Bug Description

If the operator is installed on a cluster that has never had Red Hat OpenShift Pipelines installed, the cert operator manager pod will crash-loop until the RH OS Pipelines operator is installed. Once the RH OS Pipelines operator is installed, the cert operator will come up fine and can be installed/uninstalled without RH OS Pipelines needing to be present.

Version and Command Invocation

index image from latest

Steps to Reproduce:

  1. Spin up a net new cluster
  2. Create a catalog source for the operator
  3. Install the operator and try to tail the manager's logs, or watch in Installed Operators as the status flips from Success -> Failed -> Installing over and over

Expected Result

I'd expect that we would be able to install this operator in a net new cluster and not have to install the RH OS Pipelines operator first.

Actual Result

The controller manager crash-looped and the operator install was never successful

Additional Context

Logs from the controller manager

I1118 23:30:21.298373       1 request.go:668] Waited for 1.046372344s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/batch/v1?timeout=32s
2021-11-18T23:30:22.554Z	ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Task.tekton.dev", "error": "no matches for kind \"Task\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:127
2021-11-18T23:30:26.354Z	ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Pipeline.tekton.dev", "error": "no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:127
2021-11-18T23:30:26.354Z	ERROR	controller-runtime.manager.controller.operatorpipeline	Could not wait for Cache to sync	{"reconciler group": "certification.redhat.com", "reconciler kind": "OperatorPipeline", "error": "failed to wait for operatorpipeline caches to sync: no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:195
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:221
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:696
2021-11-18T23:30:26.354Z	ERROR	controller-runtime.manager	error received after stop sequence was engaged	{"error": "leader election lost"}
2021-11-18T23:30:26.354Z	ERROR	setup	problem running manager	{"error": "failed to wait for operatorpipeline caches to sync: no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
main.main
	/workspace/main.go:117
runtime.main
	/usr/local/go/src/runtime/proc.go:225

Operator should wait after subscription for the openshift-pipelines operator to get ready and reach the Succeeded phase.

Bug Description

After creating the subscription to the openshift-pipelines operator, the certification operator doesn't wait for that operator to get ready. An error can occur without being noticed, and/or the install may take more time than expected and break subsequent tasks.

Steps to Reproduce:

It's an intermittent problem and may not always be reproducible.

Expected Result

Expect the operator to make sure that openshift-pipelines is fully installed and ready to be used.

Actual Result

Although intermittent, the following error appears:

ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Pipeline.tekton.dev", "error": "no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:127
2021-11-18T20:42:53.084Z	ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Task.tekton.dev", "error": "no matches for kind \"Task\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:127
2021-11-18T20:42:53.084Z	ERROR	controller-runtime.manager.controller.operatorpipeline	Could not wait for Cache to sync	{"reconciler group": "certification.redhat.com", "reconciler kind": "OperatorPipeline", "error": "failed to wait for operatorpipeline caches to sync: no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
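
A rough sketch of the kind of wait that could follow the subscription, polling until the installed CSV reports the Succeeded phase. It assumes the operator-framework/api types are available to the controller; the helper name, interval, and timeout are placeholders.

package controllers

import (
	"context"
	"time"

	operatorsv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForCSVSucceeded polls until the named ClusterServiceVersion reports the
// Succeeded phase, so later steps do not race the pipelines install.
func waitForCSVSucceeded(ctx context.Context, c client.Client, key types.NamespacedName) error {
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		csv := &operatorsv1alpha1.ClusterServiceVersion{}
		if err := c.Get(ctx, key, csv); err != nil {
			return false, nil // not installed yet (or transient error); keep polling
		}
		return csv.Status.Phase == operatorsv1alpha1.CSVPhaseSucceeded, nil
	})
}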

Developer Documentation

Add a docs directory for developer documentation and start a developer readme.
We stick pretty closely to standard operator-sdk practices right now, but it should still be useful to have this documented, especially if/when things start to diverge.

Operator Fails to Deploy via CatalogSource

Bug Description

The operator fails to deploy via a catalog source. I believe the issue is that the image tag in the bundle does not point to a valid container image, nor does the image referenced by the code checked into the main branch of this repo.

Version and Command Invocation

sha of bundle in quay 5c3f130d9a450b2b43720df594f203b058cc9554

Steps to Reproduce:

  1. Create a CatalogSource with quay.io/opdev/operator-certification-operator-bundle:5c3f130d9a450b2b43720df594f203b058cc9554 as the image
  2. run oc apply
  3. tail logs kubectl logs -f -n openshift-operator-lifecycle-manager olm-operator-<whateveridofrunningpod>

Expected Result

I expect to be able to run this operator via a catalog source so that it shows up in operator hub.

Actual Result

Operator does not appear in the catalog.

Additional Context

CatalogSource

oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operator-certification-operator
  namespace: openshift-marketplace
spec:
  displayName: Operator Certification Operator
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/opdev/operator-certification-operator-bundle:5c3f130d9a450b2b43720df594f203b058cc9554
  priority: -200
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
EOF

Error


{
  "level": "error",
  "ts": 1637185074.8848817,
  "logger": "controller-runtime.manager.controller.operatorcondition",
  "msg": "Reconciler error",
  "reconciler group": "operators.coreos.com",
  "reconciler kind": "OperatorCondition",
  "name": "operator-certification-operator.v0.0.0-alpha",
  "namespace": "oco",
  "error": "OperatorCondition.operators.coreos.com \"operator-certification-operator.v0.0.0-alpha\" not found",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"
}

The image I believe is being pulled, based on the code in main:
quay.io/opdev/operator-certification-operator:09f54622a8707ebc3e94aaef9a8781cf9bf6cc6d

podman pull of the above

podman pull quay.io/opdev/operator-certification-operator:09f54622a8707ebc3e94aaef9a8781cf9bf6cc6d
Trying to pull quay.io/opdev/operator-certification-operator:09f54622a8707ebc3e94aaef9a8781cf9bf6cc6d...
Error: initializing source docker://quay.io/opdev/operator-certification-operator:09f54622a8707ebc3e94aaef9a8781cf9bf6cc6d: reading manifest 09f54622a8707ebc3e94aaef9a8781cf9bf6cc6d in quay.io/opdev/operator-certification-operator: manifest unknown: manifest unknown

Run preflight on operator-certification-operator

Is your feature request related to a problem? Please describe.

Run preflight on the operator-certification-operator to get initial report so we know what needs to be done for certification

Describe the solution you'd like.

Result of this issue should be new issues to resolve any problems that preflight finds

Rename the Operator

Is your feature request related to a problem? Please describe.

The current name for the operator, Operator Certification Operator, feels a bit redundant and doesn't exactly roll off the tongue. In addition, the length of the name has caused problems with several child resource names being too long and resulting in those resources not being created without a workaround. Several community members have expressed frustration with the name and would like it changed.

Describe the solution you'd like.

Ideally, this should be done before the operator begins to see use by partners.

I propose the following candidates for renaming the operator, in the order presented, but I would love alternatives to these as well.

  1. Certification Operator
  2. OpenShift Certification Operator
  3. Red Hat Certification Operator

Describe alternatives you've considered.

Open to alternatives, as each candidate in the list above has an issue of some sort. For example, Certification Operator could be a bit too vague, and the other two do not really solve the length problem, although that has already been solved in the code, so it is not a major concern.

Additional context.

Naming software projects is strangely one of the hardest parts of creating good software.

Status is not updated unless all succeed

Status needs to be updated as things succeed. Currently, if anything fails within the status reconciler, it just returns, rather than recording the status and then returning.
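
For illustration, a sketch of recording a condition per step before returning; the condition type/reason strings and helper name are assumptions.

package controllers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// recordStep records the outcome of one reconcile step on the CR's status
// conditions before the caller returns, instead of returning early on failure
// with nothing recorded. The caller would still push the change with
// r.Status().Update.
func recordStep(conditions *[]metav1.Condition, condType string, err error) {
	cond := metav1.Condition{Type: condType, Status: metav1.ConditionTrue, Reason: "ReconcileSucceeded"}
	if err != nil {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "ReconcileFailed"
		cond.Message = err.Error()
	}
	meta.SetStatusCondition(conditions, cond)
}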

Add in the Ability for the Operator to Update OperatorPipelineStatus

Is your feature request related to a problem? Please describe.

Currently the operator only logs errors, and unless a cluster admin is tailing the controller manager logs, the OperatorPipelineStatus isn't reflected properly in the OperatorPipeline CR.

Describe the solution you'd like.

It would be great if the operator could implement OperatorPipelineStatus for any information (secrets/kubeconfigs/etc) that the cluster admin has not provided, so OperatorPipeline status is clearly defined.

Describe alternatives you've considered.

N/A

Additional context.

N/A

Controller Randomly Throws `panic`

Bug Description

Depending on the order in which Kubernetes processed an edit or a delete, we can sometimes get a panic, since we are not always doing a Get for the OperatorPipeline before we try to update the CR.

Version and Command Invocation

main branch

Steps to Reproduce:

I was only able to reproduce this once; here were the steps:

  1. kubectl apply -f config/samples/certification_v1alpha1_operatorpipeline.yaml
  2. kubectl edit operatorpipelines.certification.redhat.com operatorpipeline-sample and edit an attribute on the CR
  3. immediately after the edit kubectl delete -f config/samples/certification_v1alpha1_operatorpipeline.yaml

Expected Result

I'd expect the controller to process both the edit and delete successfully.

Actual Result

The delete was processed successfully, then the edit threw a panic, since the CR no longer existed in the cluster. The main code that I see being the issue is updateStatusCondition. We probably need to do a Get, i.e. r.Client.Get(ctx, req.NamespacedName, pipeline), before we try to do any other actions in the code, and short-circuit the method if the error returned is an errors.IsNotFound(err) error.
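
A sketch of that guard, written as a standalone helper with hypothetical names; in the real Reconcile, obj would be the OperatorPipeline CR for req.NamespacedName.

package controllers

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// fetchOrFinish gets the reconciled object up front and tells the caller to
// short-circuit when it no longer exists, so an edit racing a delete cannot
// panic while updating status.
func fetchOrFinish(ctx context.Context, c client.Client, req ctrl.Request, obj client.Object) (done bool, err error) {
	if getErr := c.Get(ctx, req.NamespacedName, obj); getErr != nil {
		if apierrors.IsNotFound(getErr) {
			return true, nil // CR already deleted; nothing left to update
		}
		return true, getErr
	}
	return false, nil
}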

Additional Context

n/a

Image Streams Status Condition does not Update Properly

Bug Description

If either image stream has already been added to the cluster by the operator, the status condition gets stuck on Unknown, because when we find the image stream we immediately return nil.
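
For illustration, the early-return path could record the condition before returning; this sketch uses a hypothetical helper and the condition strings shown in the expected output below.

package controllers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markImageStreamFound would be called on the "image stream already exists"
// path, which currently returns nil without touching status and leaves the
// condition Unknown.
func markImageStreamFound(conditions *[]metav1.Condition, condType string) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:   condType, // e.g. CertifiedImageStreamAvailable
		Status: metav1.ConditionTrue,
		Reason: "ReconcileSucceeded",
	})
}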

Version and Command Invocation

Image is image: quay.io/opdev/operator-certification-operator-index:latest
which points to the latest bundle image.

Steps to Reproduce:

  1. Install the operator in a cluster
  2. run kubectl get operatorpipeline operatorpipeline-sample -o yaml -w
  3. Create a CR
  4. Delete a CR
  5. Create a CR
  6. See that the status is not correct

Expected Result

Expect the below in the output of the CR's yaml

  - lastTransitionTime: "2021-12-08T16:30:28Z"
    message: ""
    reason: ReconcileSucceeded
    status: "True"
    type: CertifiedImageStreamAvailable
  - lastTransitionTime: "2021-12-08T16:30:28Z"
    message: ""
    reason: ReconcileSucceeded
    status: "True"
    type: MarketplaceImageStreamAvailable

Actual Result

Below is the output of the CR's yaml

  - lastTransitionTime: "2021-12-08T16:30:28Z"
    message: ""
    reason: ReconcileUnknown
    status: Unknown
    type: CertifiedImageStreamAvailable
  - lastTransitionTime: "2021-12-08T16:30:28Z"
    message: ""
    reason: ReconcileUnknown
    status: Unknown
    type: MarketplaceImageStreamAvailable

Additional Context

N/A

Add support for workspace-template.yaml so the user does not have to clone the operator-pipeline repo

Is your feature request related to a problem? Please describe.

Right now the operator only applies tasks and pipelines to the cluster. This means a user still has to clone the repo in order to pass configurations into the tkn command for the pipeline to be triggered and executed properly.

Describe the solution you'd like.

Since the operator is already cloning the repo, could it create the volumeClaimTemplate so the user doesn't have to? Then the user would not have to clone the repo or pass the volumeClaimTemplate argument to the tkn command.

Describe alternatives you've considered.

As a workaround, we just document that the user still needs to clone the repo.

Additional context.

CMD that the user executes

tkn pipeline start operator-ci-pipeline \
  --param git_repo_url=$GIT_REPO_URL \
  --param git_branch=main \
  --param bundle_path=$BUNDLE_PATH \
  --param env=prod \
  --workspace name=pipeline,volumeClaimTemplateFile=templates/workspace-template.yml \
  --workspace name=kubeconfig,secret=kubeconfig \
  --showlog

This link on volumeClaimTemplate might be helpful.

We also might need to support other files that live in the /templates/ folder in the operator-pipeline repo.

Manager does not always pull controller image

Bug Description

Since we are in a pre-release where our bundle/images aren't versioned, the manager does not pull the latest container code if the latest tag is already present in the cluster.

Version and Command Invocation

bundle:latest

Steps to Reproduce:

  1. Apply a CatalogSource with latest index image
  2. See that the manager does not pull the controller container

Expected Result

I expect the manager to always get the latest container code.

Actual Result

No pull happens

Additional Context

N/A

Update the Dockerfile and/or Project Dependencies Such that the `as builder` Image is not GBs of Data

Is your feature request related to a problem? Please describe.

Currently, building the container image locally takes a very long time and a lot of space (see additional context below). This means that not all developers are able to build this image and test their work in an iterative way.

Describe the solution you'd like.

I would like the as builder image to be as small as possible so that it can be built with a standard podman VM. I'm not sure if this is due to the Dockerfile and build process, or the dependencies that the project is pulling in. Further investigation is needed.

Describe alternatives you've considered.

N/A

Additional context.

Below are the image sizes from the entire build process; the image marked as <none> is the as builder image.

╭─acornett at acornett-mac in ~/go/src/github.com/acornett21/operator-certification-operator on fix_weird_build_issue✘✘✘
╰─± podman images
REPOSITORY                                     TAG         IMAGE ID      CREATED        SIZE
quay.io/opdev/operator-certification-operator  latest      b91df0adf025  2 minutes ago  163 MB
<none>                                         <none>      839a40940b8c  3 minutes ago  3.59 GB
docker.io/library/golang                       1.17        e5993351274b  4 hours ago    963 MB
registry.access.redhat.com/ubi8/ubi-minimal    latest      cc54f67dc341  4 weeks ago    107 MB

Poll for Required CRD's On Operator Installation

Is your feature request related to a problem? Please describe.

Since we are now dependent on Red Hat OpenShift Pipelines, it takes a good amount of time for OLM to pull in all of the CRDs that we require, which can cause our controller pod to crash-loop a few times before our install is successful.

Describe the solution you'd like.

Before the call to start (linked here), add a method to poll for the below GVKs (a sketch follows the list).

  • {"kind": "Task.tekton.dev", "error": "no matches for kind \"Task\" in version \"tekton.dev/v1beta1\""}
  • {"kind": "Pipeline.tekton.dev", "error": "no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}

Describe alternatives you've considered.

This is the best I can think of right now; otherwise, a cheap thread.sleep somewhere.

Additional context.

Stack trace with error

2021-12-07T18:56:35.576Z	ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Task.tekton.dev", "error": "no matches for kind \"Task\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:127
2021-12-07T18:56:39.274Z	ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Pipeline.tekton.dev", "error": "no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:127
2021-12-07T18:56:40.882Z	ERROR	controller-runtime.manager.controller.operatorpipeline	Could not wait for Cache to sync	{"reconciler group": "certification.redhat.com", "reconciler kind": "OperatorPipeline", "error": "failed to wait for operatorpipeline caches to sync: no matches for kind \"Pipeline\" in version \"tekton.dev/v1beta1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:195
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:221
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:696

Improve usage of context

Make better use of context than the current placeholder context.TODO(). This should help further clean up the code.
The context should be set up in the main reconciler and passed into sub-reconcilers.
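
For illustration, a sketch of the intended wiring with a hypothetical sub-reconciler interface.

package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// subReconciler is an illustrative interface: each piece of reconciliation
// receives the caller's ctx rather than creating context.TODO() locally.
type subReconciler interface {
	Reconcile(ctx context.Context, req ctrl.Request) error
}

// runSubReconcilers threads the single ctx from the main reconciler through
// every sub-reconciler. Sketch only; the repository is wired differently today.
func runSubReconcilers(ctx context.Context, req ctrl.Request, subs ...subReconciler) error {
	for _, s := range subs {
		if err := s.Reconcile(ctx, req); err != nil {
			return err
		}
	}
	return nil
}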

Add operator-sdk install to Makefile

To have consistency when running make bundle, it should be run against a version that is dictated by the project, not by what a user has installed on their system. The issue I see with this is that go-get-tool does not retain the version of the binary when placing something in the bin folder. That is something we would need to overcome.

kubeconfig secret needs valid kubeconfig

Currently the kubeconfig secret that is being created is empty. This secret should contain a valid kubeconfig that the pipeline can use to provision resources.

Resources created by the operator do not get cleaned up when CR is deleted

Bug Description

The operator creates various resources (pipelines, subscriptions), but not all of them are owned by the custom resource, so they do not get removed when the OperatorPipeline CR is deleted.

Steps to Reproduce:

  1. Create the OperatorPipeline CR: oc apply -f config/samples/certification_v1alpha1_operatorpipeline.yaml
  2. Wait for operator to resolve
  3. Delete OperatorPipeline CR oc delete -f config/samples/certification_v1alpha1_operatorpipeline.yaml
  4. View pipelines (or any other created resources) in the UI and see that they still exist

Expected Result

All resources directly created by the operator get removed

Actual Result

Some resources (pipelines specifically) stick around

Additional Context

This mostly has to do with ownership of the new resources not being set to the OperatorPipeline resource. It will likely not be a standard case, because we run yaml from another repo to create most (if not all) of the resources in question.
https://github.com/redhat-openshift-ecosystem/operator-certification-operator/blob/main/controllers/dependencies.go#L24-L76
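
For illustration, a sketch of setting the owner reference before creating each resource; the helper is hypothetical and the decoded-yaml plumbing is omitted.

package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// createOwned sets the OperatorPipeline CR as the controller owner before
// creating a resource, so deleting the CR garbage-collects it. Owner and obj
// must live in the same namespace for a namespaced owner.
func createOwned(ctx context.Context, c client.Client, scheme *runtime.Scheme, owner, obj client.Object) error {
	if err := controllerutil.SetControllerReference(owner, obj, scheme); err != nil {
		return err
	}
	return c.Create(ctx, obj)
}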

Name and Version in cluster do not match

Bug Description

The latest bundle does not show the proper Name in the cluster.

Version and Command Invocation

latest

Steps to Reproduce:

  1. Run the below CMD
oc get csv -n openshift-operators
NAME                                     DISPLAY                           VERSION   REPLACES                            PHASE
openshift-pipelines-operator-rh.v1.6.2   Red Hat OpenShift Pipelines       1.6.2     redhat-openshift-pipelines.v1.5.2   Succeeded
operator-certification-operator.v0.0.0   Operator Certification Operator   1.0.0                                         Succeeded

Expected Result

I would expect the version number to match in both the name and the version field; this mismatch leads to confusion for partners trying to use this operator.

Actual Result

There is a mismatch between display/version

Additional Context

N/A

update readme/install docs and csv to be more accurate

What is the URL of the document?

Readme and /doc/installation

Which section(s) is the issue in?

We no longer need to reference the installation of the pipelines operator anywhere in our docs

What needs fixing?

Remove unneeded section(s)

Additional context

We also might want to think about how to say 'we can watch any NS that our CR is created in', so it is clearer to a user where to put the secrets.

Suppress Errors for Resources Which are Optional

Is your feature request related to a problem? Please describe.

When creating a CR, seeing errors for optional resources is confusing.

Describe the solution you'd like.

I'd like all errors for optional resources to be suppressed, for now this would be

  • registry-dockerconfig-secret
  • github-ssh-credentials

I think we should only check for the secret in the cluster if the user has provided a name in the CR, i.e. this code should be changed, as well as any other optional checks.

	if operatorPipeline.Spec.GithubSSHSecretName != "" {
		if err := r.ensureSecret(ctx, operatorPipeline.Spec.GithubSSHSecretName, defaultGithubSSHSecretKeyName, meta); err != nil {
			return err
		}
	}

	return nil

We could then get rid of the default constants.

Describe alternatives you've considered.

N/A

Additional context.

Confusing stack trace.

2021-12-08T15:06:46.701Z	ERROR	controller_operatorpipeline	could not find existing secret oco/registry-dockerconfig-secret	{"error": "could not find existing secret"}
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).ensureSecret
	/workspace/controllers/secret.go:193
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).ensureDockerRegistrySecret
	/workspace/controllers/secret.go:171
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).reconcileResources
	/workspace/controllers/resource.go:67
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).Reconcile
	/workspace/controllers/operatorpipeline_controller.go:66
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214
2021-12-08T15:06:46.714Z	INFO	controller_operatorpipeline	Docker Registry Secret not present or correct. Create it to use a private repository.
2021-12-08T15:06:46.714Z	ERROR	controller_operatorpipeline	could not find existing secret oco/github-ssh-credentials	{"error": "could not find existing secret"}
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).ensureSecret
	/workspace/controllers/secret.go:193
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).ensureGithubSSHSecret
	/workspace/controllers/secret.go:185
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).reconcileResources
	/workspace/controllers/resource.go:71
github.com/redhat-openshift-ecosystem/operator-certification-operator/controllers.(*OperatorPipelineReconciler).Reconcile
	/workspace/controllers/operatorpipeline_controller.go:66
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214
2021-12-08T15:06:46.714Z	INFO	controller_operatorpipeline	Github SSH Secret not present or correct. Create it to enable digest pinning.

Bundle Dockerfile shouldn't run Make

Bug Description

As discussed in the Certification Operator Check-In meeting on Nov 29th, there are two main reasons for running make bundle:

  1. Align any CRD spec, RBAC and Sample CR changes into the CSV and bundle manifests.
  2. Sync the Deployment (manager) of the Bundle with the appropriate Operator image tag, for testing / Preflight.

It was decided that,

1 - actions for updating the bundle spec are up to the developers working on the code/CRD.
Instead of automating this into bundle generation as part of the build, the developer needs to be notified to make the appropriate changes and align the manifests.

2 - see #67

Changes to Dockerfile to pass container checks on preflight

Is your feature request related to a problem? Please describe.

Addressing issue #42 and running the preflight container check on the operator-certification-operator controller image.

Describe the solution you'd like.

To pass the container check, modifications are required to the Dockerfile - add labels, change the base image to ubi, add the license file.

Add Note for Non-Partners to Not Use This Operator

What is the URL of the document?

README and CSV description

Which section(s) is the issue in?

main

What needs fixing?

We need to add a note, in both the README and the CSV description displayed in OperatorHub, that this operator should not be used if you are not a Red Hat partner attempting to certify an operator.
