cloudfoundry / capi-k8s-release
The CF API parts of cloudfoundry/cf-for-k8s
License: Apache License 2.0
It is confusing that the Cloud Controller deployments (API, generic worker, clock, and deployment updater) are prefixed with capi- instead of with cloud-controller- or something similar, as Cloud Controller is the component that they run. These names also do not match the corresponding job names in the CAPI BOSH release (cloud_controller_ng, cloud_controller_worker, cloud_controller_clock, and cc_deployment_updater). Additionally, "CAPI" in a Kubernetes context now typically means the Cluster API, so including capi in the names of these deployments is likely to confuse those already familiar with Kubernetes projects.
Finally, these names should be changed sooner rather than later, before consumers such as cf-for-k8s begin to provide backwards compatibility or high reliability to their users.
Repro steps:
cf enable-feature-flag diego_docker
cf push console -o splatform/stratos:stable
kubectl describe service -n cf-workloads [app service name]
Stratos app exposes port 5443: https://github.com/cloudfoundry/stratos/blob/7519e6ab5570fc75b471ce6679a75df7d262b3c1/deploy/Dockerfile.all-in-one#L42
Expected Behavior: TargetPort is 5443, per EXPOSE directive in Dockerfile
Actual Behavior: TargetPort is 8080.
Tim says: "the route and destination is created before the docker app is staged. at that point there is no execution metadata so it creates it with the default 8080"
https://app.slack.com/client/T02FL4A1X/threads/thread/C017LDM6KTQ-1603233260.197100
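To make the mismatch concrete, here is a minimal sketch of the relevant part of the generated Service spec; only the port numbers come from the observed behavior above, and everything else is omitted or assumed:
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080   # actual: defaults to 8080 because no execution metadata exists when the destination is created
      # expected: targetPort: 5443, per the EXPOSE directive in the Stratos Dockerfile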
We are seeing smoke tests fail with the following error message:
[2020-09-24 17:29:51.27 (UTC)]> cf push cf-for-k8s-smoke-1-app-99029be0dfaba098 -p assets/test-node-app --no-route
...
...
Waiting for API to complete processing files...
Job (0f30b04d-672f-42b7-abc9-a98c44cff879) failed: An unknown error occurred.
FAILED
The cf-api-server pod contains the following error message (among a few others):
cf-api-server-699d49df87-zqwh4 > package-image-uploader | 2020/09/24 17:29:55 Error from uploadFunc(/tmp/packages/registry_bits_packer20200924-1-1qu1y1d/copied_app_package.zip, harbor.pig.cf-app.com/ci-app-workloads-pr/40b6d190-a263-404b-9865-86836c5b7e5d): Get "https://harbor.pig.cf-app.com/v2/": x509: certificate signed by unknown authority
ccdb-migrate-n6m6k > istio-proxy | 2020-09-24T17:27:10.546624Z info sds resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
It looks like the issue might be due to capi not having the app registry CA cert configured. But it also looks like we don’t have a way to do that. Can y'all expose a property to configure the app registry CA please?
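For illustration, a hypothetical sketch of the kind of property we are asking for; the app_registry block mirrors the existing registry settings, but the ca_cert key does not exist today and its shape is purely an assumption:
#@data/values
---
app_registry:
  hostname: harbor.pig.cf-app.com
  ca_cert: |                         #! hypothetical property: CA bundle to trust for the app registry
    -----BEGIN CERTIFICATE-----
    (placeholder PEM contents)
    -----END CERTIFICATE-----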
As discussed with @piyalibanerjee and @jspawar
We noticed a failed build of smoke tests on cf-for-k8s and started digging in.
We observed that the completion container of the staging pod completed but showed ready=false, which we realized is the expected behavior (even for apps that are successfully pushed).
We next looked at the capi-kpack-watcher logs, which were surprisingly short (no details about app staging):
2020/03/30 23:36:51 Watcher initialized. Listening...
2020/03/30 23:36:51 [AddFunc] New Build: &{TypeMeta:{Kind:Build APIVersion:build.pivotal.io/v1alpha1} ObjectMeta:{Name:eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw GenerateName:eec29581-395d-4841-adee-d1c37c146684-build-1- Namespace:cf-workloads-staging SelfLink:/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw UID:0d60fe53-0c1b-4c0f-a005-02ee797e8d95 ResourceVersion:1487666 Generation:1 CreationTimestamp:2020-03-30 23:36:48 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[cloudfoundry.org/app_guid:a4bb542b-bf81-4a26-9ff2-28f3734cfd37 cloudfoundry.org/build_guid:5266c24f-dece-49a6-a2b8-e196724c35b2 cloudfoundry.org/source_type:STG image.build.pivotal.io/buildNumber:1 image.build.pivotal.io/image:eec29581-395d-4841-adee-d1c37c146684] Annotations:map[image.build.pivotal.io/reason:CONFIG sidecar.istio.io/inject:false] OwnerReferences:[{APIVersion:build.pivotal.io/v1alpha1 Kind:Image Name:eec29581-395d-4841-adee-d1c37c146684 UID:e22d0931-859e-4e2b-9ce4-7858f4f3a44d Controller:0xc000400079 BlockOwnerDeletion:0xc000400078}] Initializers:nil Finalizers:[] ClusterName: ManagedFields:[]} Spec:{Tags:[gcr.io/cf-relint-greengrass/cf-workloads/eec29581-395d-4841-adee-d1c37c146684 gcr.io/cf-relint-greengrass/cf-workloads/eec29581-395d-4841-adee-d1c37c146684:b1.20200330.233648] Builder:{Image:index.docker.io/cloudfoundry/cnb@sha256:0a718640a4bde8ff65eb00e891ff7f4f23ffd9a0af44d43f6033cc5809768945 ImagePullSecrets:[]} ServiceAccount:cc-kpack-registry-service-account Source:{Git:<nil> Blob:0xc0003ef860 Registry:<nil> SubPath:} CacheName: Env:[] Resources:{Limits:map[] Requests:map[]} LastBuild:<nil>} Status:{Status:{ObservedGeneration:1 Conditions:[{Type:Succeeded Status:False Severity: LastTransitionTime:{Inner:2020-03-30 23:36:50 +0000 UTC} Reason: Message:pods "eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw-build-pod" already exists}]} BuildMetadata:[] Stack:{RunImage: ID:} LatestImage: PodName:eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw-build-pod StepStates:[] StepsCompleted:[]}}
The content of note from that long log line is Reason: Message:pods "eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw-build-pod" already exists, which we've only observed before when trying to restage an existing app (currently not supported). That is not the case here, though, as this was a new cf push.
Looking more carefully at the capi-kpack-watcher definition, we noticed that it had restarted once. The timestamps show that it came back up at 16:36:51, about 10 seconds after the cf push was executed, and the timestamps in the short kpack-watcher logs above show that the build pod was created a few seconds before the kpack-watcher came back up.
capi-kpack-watcher:
Container ID: docker://941b7763fec87f9fdadcb4c7d60322f8ca6486b16d7cdde2c803c32b888a01b1
Image: cloudfoundry/capi-kpack-watcher:latest
Image ID: docker-pullable://cloudfoundry/capi-kpack-watcher@sha256:6e6a3778953aa2b998e933fda726f18f5da89aa6b5738005efc22abd727a69dc
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 30 Mar 2020 16:36:51 -0700
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 30 Mar 2020 16:33:09 -0700
Finished: Mon, 30 Mar 2020 16:36:50 -0700
Ready: True
Restart Count: 1
As for the kpack-watcher container restart, we've grabbed the full tail of the log in this gist, but here's a good, relevant section:
reason: map[image.build.pivotal.io/reason:CONFIG sidecar.istio.io/inject:false]
E0330 23:36:50.348158 1 runtime.go:73] Observed a panic: runtime.boundsError{x:-1, y:0, signed:true, code:0x0} (runtime error: index out of range [-1])
goroutine 5 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x12102e0, 0xc0000448c0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:69 +0x7b
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:51 +0x82
panic(0x12102e0, 0xc0000448c0)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
capi_kpack_watcher/watcher.(*BuildWatcher).handleFailedBuild(0xc0002ec7e0, 0xc00011e280)
/capi-kpack-watcher/watcher/build_watcher.go:144 +0x3a7
capi_kpack_watcher/watcher.(*BuildWatcher).UpdateFunc(0xc0002ec7e0, 0x12b1820, 0xc0000e9b80, 0x12b1820, 0xc00011e280)
/capi-kpack-watcher/watcher/build_watcher.go:48 +0x2cb
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:202
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc00005c5c8, 0x41720e, 0x7f1db97fc730)
/go/pkg/mod/k8s.io/[email protected]
...
f2f3a405f61d/pkg/util/wait/wait.go:69 +0x62
panic: runtime error: index out of range [-1] [recovered]
panic: runtime error: index out of range [-1]
We noticed a failed build of smoke-tests in our CI recently. We dug into it, thinking that it might be another instance of the issue reported in #27, but found that, in this case, the capi-kpack-watcher had not restarted:
capi-kpack-watcher:
Container ID: docker://8c5ae51161a1722dda5ed0c38c774d401914dd05413c0fbb4b2df1a3caeea90e
Image: cloudfoundry/capi-kpack-watcher:956150dae0a95dcdf3c1f29c23c3bf11db90f7a0@sha256:67125e0d3a4026a23342d80e09aad9284c08ab4f7b3d9a993ae66e403d5d0796
Image ID: docker-pullable://cloudfoundry/capi-kpack-watcher@sha256:67125e0d3a4026a23342d80e09aad9284c08ab4f7b3d9a993ae66e403d5d0796
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 02 Apr 2020 19:16:16 -0700
Ready: True
Restart Count: 0
...
When we dug more into the logs from the capi-kpack-watcher, we saw that it had successfully completed staging the app, but received a 404 when it went to post the update with the image ID back to the API server. When we looked at the logs from the two capi-api-server pods, we saw that one pod had handled the initial POST request (presumably from the CLI) to start staging the app, and the other pod had handled the PATCH request from the watcher to update the app state. Please see this gist for details. To aid with debugging, this test ran with cloudfoundry/cf-for-k8s@6c9f5cd, which uses capi-k8s-release @ 2998092.
Howdy! There are some commercial products that want to package this repo, but cannot until there is an explicit license. It's legally dangerous for anyone to use your code without some form of license on the code. Most CF projects are Apache 2 licensed.
Thanks!
We deployed cf-for-k8s and performed a scale test. We pushed source-code-based apps and reached 800 apps. Under such high load we saw the following error in capi-api-server:
{"timestamp":1587738150.8981497,"message":"Request failed: 404: {\"errors\"=>[{\"detail\"=>\"Droplet not found\", \"title\"=>\"CF-ResourceNotFound\", \"code\"=>10010, \"test_mode_info\"=>{\"detail\"=>\"Droplet not found\", \"title\"=>\"CF-ResourceNotFound\", \"backtrace\"=>[\"/cloud_controller_ng/app/controllers/v3/application_controller.rb:34:in `resource_not_found!'\", \"/cloud_controller_ng/app/controllers/v3/apps_controller.rb:347:in `droplet_not_found!'\", \"/cloud_controller_ng/app/controllers/v3/apps_controller.rb:322:in `current_droplet'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/base.rb:194:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/rendering.rb:30:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:132:in `run_callbacks'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/callbacks.rb:41:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/rescue.rb:22:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `block in instrument'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications/instrumenter.rb:23:in `instrument'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `instrument'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/instrumentation.rb:32:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/params_wrapper.rb:256:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/base.rb:134:in `process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionview-5.2.4.2/lib/action_view/rendering.rb:32:in `process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal.rb:191:in `dispatch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal.rb:252:in `dispatch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:52:in `dispatch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:34:in `serve'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:52:in `block in serve'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:35:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:35:in `serve'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:840:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/tempfile_reaper.rb:15:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/http/content_security_policy.rb:18:in `call'\", 
\"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:98:in `run_callbacks'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/callbacks.rb:26:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/debug_exceptions.rb:61:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/runtime.rb:22:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport- 5.2.4.2/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/executor.rb:14:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/sendfile.rb:110:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:74:in `block in call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `call'\", \"/cloud_controller_ng/middleware/request_logs.rb:22:in `call'\", \"/cloud_controller_ng/middleware/security_context_setter.rb:19:in `call'\", \"/cloud_controller_ng/middleware/vcap_request_id.rb:15:in `call'\", \"/cloud_controller_ng/middleware/cors.rb:49:in `call_app'\", \"/cloud_controller_ng/middleware/cors.rb:14:in `call'\", \"/cloud_controller_ng/middleware/request_metrics.rb:12:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/builder.rb:244:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'\"]}}]}","log_level":"info","source":"cc.api","data":{"request_guid":"3b75e731-1246- 4c94-b2a0-b7300d9359e7::02a69280-4037-4b1a-bec8- aef5a674c325"},"thread_id":47075637335140,"fiber_id":47075631752220,"process_id":1,"file":"/cloud_c ontroller_ng/app/controllers/v3/application_controller.rb","lineno":178,"method":"handle_exception"}
cf-for-k8s Version: https://github.com/cloudfoundry/cf-for-k8s/releases/tag/v0.1.0
shoutout to Adrian for catching this
The unpinned images are here
yeah we should totally do that
- Selzo
We tried to use this helm chart together with quarks.
We noticed that for this use case several configuration values are missing; an example values.yaml sketch follows the template snippets below:
uaa.internal_url in ccng-configmap.yaml:
internal_url: {{ .Values.uaa.internal_url | default "http://uaa.{{ .Release.Namespace }}.svc.cluster.local:8080" }}
blobstore.signature_version in ccng-configmap.yaml:
aws_signature_version: {{ .Values.blobstore.signature_version | default "2" | quote }}
blobstore.region in ccng-configmap.yaml:
region: {{ .Values.blobstore.region | quote }}
nameserver in api_server_deployment.yaml (this is only required to resolve BOSH DNS names like uaa.service.cf.internal; in a Kubernetes-native environment it is not really required):
{{- if .Values.nameserver }}
dnsConfig:
nameservers:
- {{ .Values.nameserver }}
options:
- name: ndots
value: "5"
searches:
- {{ .Release.Namespace }}.svc.cluster.local
- svc.cluster.local
- cluster.local
- service.cf.internal
dnsPolicy: None
{{- end }}
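For reference, a hedged sketch of how these values might be supplied in a values.yaml; the key names are taken from the list above, while the concrete values are placeholders:
uaa:
  internal_url: http://uaa.cf-system.svc.cluster.local:8080   # placeholder URL
blobstore:
  signature_version: "2"
  region: eu-central-1        # placeholder region
nameserver: 10.100.200.10     # placeholder; only needed to resolve BOSH DNS names like uaa.service.cf.internal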
Pivotal provides the GitBot service to synchronize pull requests and/or issues made against public GitHub repos with Pivotal Tracker projects. This service does not track individual commits.
If you are a Pivotal employee, you can configure Gitbot to sync your GitHub repo to your Pivotal Tracker project with a pull request. An ask+rd@ ticket is the fastest way to get write access if you get a 404 from the config repo.
If you do not want to have pull requests and/or issues copied from GitHub to Pivotal Tracker, you do not need to take any action.
If there are any questions, please reach out to [email protected].
System and user components should live in different namespaces to simplify security (for example with Networking policies).
Right now, the staging pod is created in cf-system namespace.
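As an illustration of the kind of policy this separation would enable, here is a minimal NetworkPolicy sketch; it assumes staging pods move into their own namespace, and every name and label in it is hypothetical:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: staging-allow-system-only      # hypothetical policy name
  namespace: cf-workloads-staging      # assumes staging pods live in their own namespace
spec:
  podSelector: {}                      # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              cf-component: system     # hypothetical label on the cf-system namespace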
See Credhub and cf-for-k8s Password Generation.
We need to make an effort to get capi-k8s-release's secrets out of its big global configmap. Part of doing this will be removing unnecessary TLS certificates from capi-k8s-release, but there are more secrets hiding in the configmap:
cf-api-kpack-watcher
I think we'd also need to handle the database encryption keys.
Currently, all pods except cf-api-server have TCP readiness probes. They are not working properly for a few reasons: the servers listen on 127.0.0.1, but Kubernetes reaches a TCP probe via PodIP:port, so a server listening only on 127.0.0.1 won't accept connections from outside the pod (a probe sketch follows the next paragraph).
To deploy cf-for-k8s without Istio, you can deploy the cf-for-k8s#contour-ingress branch in the standard way and ensure that networking.ingress_solution_provider is set to contour (it should be a default in that branch).
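Coming back to the readiness probes: a minimal sketch of the TCP probe shape described above (container name and port are placeholders). The kubelet dials <PodIP>:<port>, so the probe only passes if the server binds to 0.0.0.0 or the pod IP rather than 127.0.0.1:
containers:
  - name: cf-api-worker            # placeholder container name
    readinessProbe:
      tcpSocket:
        port: 8080                 # placeholder; must be reachable on the pod IP, not just 127.0.0.1
      initialDelaySeconds: 5
      periodSeconds: 10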
Steps to reproduce:
$ cf push catnip
$ cf restage catnip
Relevant snippet from a CF_TRACE=1 cf restage catnip:
Staging app and tracing logs...
REQUEST: [2020-03-24T22:54:42-07:00]
POST /v2/apps/cf328831-b2b6-4f8b-a633-22d0b525a77c/restage HTTP/1.1
...
RESPONSE: [2020-03-24T22:54:42-07:00]
HTTP/1.1 500 Internal Server Error
...
{
"code": 10001,
"description": "An unknown error occurred.",
"error_code": "UnknownError"
}
On the API pod, we see this error log:
{"error_code"=>"UnknownError", "description"=>"An unknown error occurred.", "code"=>10001, "test_mode_info"=>{"description"=>"images.build.pivotal.io \"9fa6af81-8366-4369-a69d-7f6d0d596688\" already exists", "error_code"=>"CF-HttpError", "backtrace"=>[...]}}
Formatted backtrace:
[
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:130:in `rescue in handle_exception'",
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:120:in `handle_exception'",
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:381:in `create_entity'",
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:236:in `block (2 levels) in define_entity_methods'",
"/cloud_controller_ng/lib/kubernetes/kpack_client.rb:10:in `create_image'",
"/cloud_controller_ng/lib/cloud_controller/kpack/stager.rb:13:in `stage'",
"/cloud_controller_ng/app/actions/build_create.rb:77:in `create_and_stage'",
"/cloud_controller_ng/app/actions/v2/app_stage.rb:23:in `stage'",
"/cloud_controller_ng/app/controllers/runtime/restages_controller.rb:38:in `restage'",
...
]
The endpoints responsible for droplet upload and download are not functional. At the time of writing, there is no v3 droplet download endpoint.
To reproduce:
cf download-droplet appName --path /tmp/tar.gz
cf push appName --droplet /tmp/tar.gz
One of our (cf-k8s-networking team) goals for GA is for the networking data and configuration planes to operate performantly at a scale of 1,000 routes and 2,000 AIs. To that end we started doing some scaling tests. In doing so we discovered some issues with the capi that we thought we'd bring to your attention.
In a space with 1,000 apps and 1,000 external routes:
cf apps times out after 60 seconds. The CLI makes a single request to /v2/spaces/<guid>/summary that eventually times out with an nginx error.
cf v3-apps hangs seemingly forever. When we use -v we see a constant stream of shorter requests; the ones to /v3/processes/<guid>/stats seem to take a while but not long enough to cause a timeout.
cf app <appname> fails after 3 minutes. It attempts /v3/processes/<guid>/stats 3 times and times out each time.
cf delete works great 😄
cf routes takes 20 seconds.
Some interesting things we found:
There is 0.1 seconds of sleep between each retry. This might add a little bit since it's not working yet (we think).
We also see calls to /v3/processes/<guid>/stats time out. Before the Space Summary request these took about ~1 second but seemed to succeed.
cc @tcdowney @rosenhouse @ndhanushkodi @rodolfo2488
We deployed cf-for-k8s and performed a scalability test. We pushed source-code-based applications and reached 800 apps. During our performance tests we saw a lot of restarts on the capi-kpack-watcher deployment. When streaming the logs, we saw the following error message:
steps: [prepare detect] 2020/04/24 14:26:45 [UpdateFunc] Update to Build: 19ff6d40-dc57-429b-b6f3-a7bf2c4cf77c-build-1-9b4pt status: (v1alpha1.Status) { ObservedGeneration: (int64) 1, Conditions: (v1alpha1.Conditions) (len=1 cap=1) { (v1alpha1.Condition) { Type: (v1alpha1.ConditionType) (len=9) "Succeeded", Status: (v1.ConditionStatus) (len=5) "False", Severity: (v1alpha1.ConditionSeverity) "", LastTransitionTime: (v1alpha1.VolatileTime) { Inner: (v1.Time) 2020-04-24 14:26:45 +0000 UTC }, Reason: (string) "", Message: (string) (len=82) "pods \"19ff6d40-dc57-429b-b6f3-a7bf2c4cf77c-build-1-9b4pt-build-pod\" already exists" } } } steps: [] E0424 14:26:45.169707 1 runtime.go:73] Observed a panic: runtime.boundsError{x:-1, y:0, signed:true, code:0x0} (runtime error: index out of range [-1]) goroutine 20 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic(0x12102e0, 0xc001bb5d00) /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:69 +0x7b k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:51 +0x82 panic(0x12102e0, 0xc001bb5d00) /usr/local/go/src/runtime/panic.go:679 +0x1b2 capi_kpack_watcher/watcher.(*BuildWatcher).handleFailedBuild(0xc0002ce7e0, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:144 +0x3a7 capi_kpack_watcher/watcher.(*BuildWatcher).UpdateFunc(0xc0002ce7e0, 0x12b1820, 0xc0001c6280, 0x12b1820, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:48 +0x29d k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) /go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:202 k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc00006bdc8, 0x41720e, 0x7f1e5c0bcd30) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:605 +0x188 k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc0002fbdd8, 0x0, 0xc00006bde8) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:284 +0x51 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:601 +0x79 k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00006bf40) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002fbf40, 0xdf8475800, 0x0, 0x42d101, 0xc0002ec000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8 k8s.io/apimachinery/pkg/util/wait.Until(...) 
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 k8s.io/client-go/tools/cache.(*processorListener).run(0xc0002bda80) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:599 +0x9b k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0002cc4c0, 0xc0002de000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x59 created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:69 +0x62 panic: runtime error: index out of range [-1] [recovered] panic: runtime error: index out of range [-1] goroutine 20 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:58 +0x105 panic(0x12102e0, 0xc001bb5d00) /usr/local/go/src/runtime/panic.go:679 +0x1b2 capi_kpack_watcher/watcher.(*BuildWatcher).handleFailedBuild(0xc0002ce7e0, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:144 +0x3a7 capi_kpack_watcher/watcher.(*BuildWatcher).UpdateFunc(0xc0002ce7e0, 0x12b1820, 0xc0001c6280, 0x12b1820, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:48 +0x29d k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) /go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:202 k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc00006bdc8, 0x41720e, 0x7f1e5c0bcd30) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:605 +0x188 k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc0002fbdd8, 0x0, 0xc00006bde8) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:284 +0x51 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:601 +0x79 k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00006bf40) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002fbf40, 0xdf8475800, 0x0, 0x42d101, 0xc0002ec000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8 k8s.io/apimachinery/pkg/util/wait.Until(...) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 k8s.io/client-go/tools/cache.(*processorListener).run(0xc0002bda80) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:599 +0x9b k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0002cc4c0, 0xc0002de000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x59 created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:69 +0x62
Steps to reproduce:
Expected:
No strange pods in the staging namespace
Actual:
Leftover pod from staging + leftover build object with "Failed" status
Some buildpacks require environment variables for configuration, but they are not passed to the kpack build. As you can see in the code, no environment variables are passed.
Push a Java application with $BP_JAVA_VERSION=8.
What was the expected result?
The configuration variable is taken into account.
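For illustration, the kpack Build resource carries environment variables in spec.env (the Build dump earlier in this document shows Env:[] as empty). A hedged, abbreviated sketch of what a populated value might look like; the name and image tag are placeholders, and the builder/source sections are omitted:
apiVersion: build.pivotal.io/v1alpha1
kind: Build
metadata:
  name: example-app-build-1                              # placeholder name
  namespace: cf-workloads-staging
spec:
  tags:
    - registry.example.com/cf-workloads/example-app      # placeholder image tag
  serviceAccount: cc-kpack-registry-service-account
  env:
    - name: BP_JAVA_VERSION                              # the buildpack configuration from the repro above
      value: "8"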
As part of our efforts to eliminate the internal certificate from cf-for-k8s, we tried to deploy without providing a value for the uaa.serverCerts.secretName property. While everything deployed successfully, we found that we received an error from the cf create-org command when running our smoke-tests. The logs from the cf-api-server pod showed this stack trace:
Unexpected error while processing request: system lib
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:59:in `add_file'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:59:in `call'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:59:in `block (2 levels) in <class:Store>'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:245:in `add_trust_ca_to_store'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:236:in `add_trust_ca'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:127:in `http_client'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:116:in `fetch_uaa_issuer'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:111:in `block in uaa_issuer'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:133:in `with_request_error_handling'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:110:in `uaa_issuer'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:85:in `decode_token_with_key'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:66:in `block in decode_token_with_asymmetric_key'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:65:in `each'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:65:in `decode_token_with_asymmetric_key'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:29:in `decode_token'
/cloud_controller_ng/lib/cloud_controller/security/security_context_configurer.rb:24:in `decode_token'
/cloud_controller_ng/lib/cloud_controller/security/security_context_configurer.rb:10:in `configure'
/cloud_controller_ng/middleware/security_context_setter.rb:12:in `call'
/cloud_controller_ng/middleware/vcap_request_id.rb:15:in `call'
/cloud_controller_ng/middleware/cors.rb:49:in `call_app'
/cloud_controller_ng/middleware/cors.rb:14:in `call'
/cloud_controller_ng/middleware/request_metrics.rb:12:in `call'
/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/builder.rb:244:in `call'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'
/usr/local/lib/ruby/gems/2.5.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'
From what we can tell, it looks like the UAA server cert is still actually required by the CC code, even though it is optional in the K8s templates. Given that we are already configuring CAPI with a plain-text http URL to talk to UAA, we would like to also remove the need for the internal certificate in the configuration.
Please feel free to reach out if you have any questions.
Regards,
Dave and @acosta11
cf stacks should show proper stack descriptions.
Run cf stacks and see the output:
name          description
cflinuxfs3    test cflinuxfs3 entry
Expect to see the actual stack description, or nothing if none is available.
This might seem a little nitpicky, so I apologize, but I suspect someone will raise it sooner rather than later.
Our networking acceptance tests have been failing since Oct 8th because some cf commands such as cf delete-org fail and we get the following message:
$ cf delete-org o
Really delete the org o, including its spaces, apps, service instances, routes, private domains and space-scoped service brokers? [yN]: y
Deleting org o as admin...
Job (8da85e84-a9db-4faa-9a9b-6614f75a3239) failed: An unknown error occurred.
FAILED
When I look at the cf-api-worker logs I see the following error:
{"timestamp":"2020-10-15T20:40:30.017852337Z","message":"Request failed: 500: {\"error_code\"=>\"UnknownError\", \"description\"=>\"An unknown error occurred.\", \"code\"=>10001, \"test_mode_info\"=>{\"description\"=>\**"Package type must be bits\"**, \"error_code\"=>\"CF-RuntimeError\", \"backtrace\"=>[\"/workspace/app/models/runtime/package_model.rb:46:in `bits_image_reference'\" ...
{"timestamp":"2020-10-15T20:40:30.027996351Z","message":"2020-10-15T20:40:30+0000: [Worker(cf-api-worker-7497f88cd7-t6nlz)] Job organization.delete (id=19) (queue=cc-generic) FAILED (0 prior attempts) with RuntimeError: Package type must be bits","log_level":"error","source":"cc-worker","data":{},"thread_id":47038285276660,"fiber_id":47038326581520,"process_id":1,"file":"/layers/paketo-community_bundle-install/gems/gems/delayed_job-4.1.8/lib/delayed/worker.rb","lineno":285,"method":"say"}
{"timestamp":"2020-10-15T20:40:30.028442210Z","message":"2020-10-15T20:40:30+0000: [Worker(cf-api-worker-7497f88cd7-t6nlz)] Job organization.delete (id=19) (queue=cc-generic) FAILED permanently because of 1 consecutive failures","log_level":"error","source":"cc-worker","data":{},"thread_id":47038285276660,"fiber_id":47038326581520,"process_id":1,"file":"/layers/paketo-community_bundle-install/gems/gems/delayed_job-4.1.8/lib/delayed/worker.rb","lineno":285,"method":"say"}
For context, my org o has Docker apps deployed to it. I then tried creating another org test2 with buildpack apps deployed to it, and cf delete-org works:
$ cf delete-org test2
Really delete the org test2, including its spaces, apps, service instances, routes, private domains and space-scoped service brokers? [yN]: y
Deleting org test2 as admin...
OK
Thanks
We did a scalability test on cf-for-k8s v0.6.0. We started with one replica of cf-api-server and started pushing buildpack-based apps. After a certain number of apps, cf-api-server fails with the following error:
Waiting for app to start...
Unexpected Response
Response code: 503
CC code: 0
CC error code:
Request ID: f4f086f1-0ae3-4f07-9f0d-471c6b8028fa::be866d01-d450-4acb-87f1-f504de7e92e0
Description: {
"description": "Instances information unavailable: No running instances",
"error_code": "CF-InstancesUnavailable",
"code": 220002
}
500s pile up; then, if we increase the replicas, it takes some time to start working again. You can correlate this with the following graph.
Again, after a certain number of app pushes succeeded, it failed with the same error. In the same phase we increased the replicas to 5 and were able to push up to 1600 application instances, beyond which scaling doesn't help.
Just for the record: we used the default configuration for the CAPI deployment.
Today, CF puts all LRPs in the cf-workloads namespace. If we ever wanted to provide any sort of k8s API access to developers or managers, we'll need to have some tenant separation in the k8s API. Namespaces are the mechanism to do that inside of a cluster.
First, Eirini could allow namespace-specific LRP scheduling. Then, capi-k8s-release could add code for managing namespaces via the CF /v3/spaces API.
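A hedged sketch of what a per-space namespace might look like; the naming scheme and label keys are assumptions, not an existing convention:
apiVersion: v1
kind: Namespace
metadata:
  name: cf-space-3d3e3f40                                              # hypothetical: one namespace per CF space
  labels:
    cloudfoundry.org/space_guid: 3d3e3f40-0000-4000-8000-000000000000  # hypothetical label
    cloudfoundry.org/org_guid: 51525354-0000-4000-8000-000000000000    # hypothetical label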
We tried to use this helm chart together with quarks. While running a cf push, it turned out that the nginx configuration is not capable of forwarding requests whose size exceeds 1MB. Therefore it's necessary to add the client_max_body_size configuration value.
location / {
client_max_body_size 100m;
access_log /cloud_controller_ng/nginx-access.log;
...
When pushing a Docker Image that does not have a USER instruction, or specifies root, with cf push app -o <app-image>, the CLI hangs with the following:
cf push nginx -o nginx
Pushing app nginx to org o / space s as admin...
Staging app and tracing logs...
Waiting for app nginx to start...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
The behavior repeats until timeout.
cf logs --recent shows the following:
cf logs nginx --recent
Retrieving logs for app nginx in org o / space s as admin...
2020-10-06T15:44:17.00-0700 [API/0] OUT Creating droplet for app with guid e8cc8497-c414-4233-9cdf-a3e26764501d
2020-10-06T15:44:17.00-0700 [API/0] OUT Updated app with guid e8cc8497-c414-4233-9cdf-a3e26764501d ({:droplet_guid=>"60684061-1dca-47bd-be8d-bfc60ad9dd45"})
2020-10-06T15:44:23.00-0700 [API/0] OUT Process has crashed with type: "web"
2020-10-06T15:44:23.00-0700 [API/0] OUT App instance exited with guid e8cc8497-c414-4233-9cdf-a3e26764501d payload: {"instance"=>"nginx-s-7820289c67-0", "index"=>0, "cell_id"=>"", "reason"=>"CreateContainerConfigError", "exit_description"=>"container has runAsNonRoot and image will run as root", "crash_count"=>0, "crash_timestamp"=>0, "version"=>"a3d1920d-6334-4073-a51c-fc0d02b4d63d"}
I would expect the CLI to provide a message that the container cannot be run, and return.
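For context, the CreateContainerConfigError above is Kubernetes refusing to start a root image when the pod requires a non-root user. A minimal sketch of such a constraint; the pod and container names are placeholders, not the actual Eirini manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-s-example-0          # placeholder app instance pod name
  namespace: cf-workloads
spec:
  containers:
    - name: opi                    # placeholder container name
      image: nginx                 # image without a non-root USER instruction
      securityContext:
        runAsNonRoot: true         # kubelet rejects the container: "container has runAsNonRoot and image will run as root"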
CC: @paulcwarren
The cf create-service-key workflow does not exist. The CLI returns 500, "An unknown error occurred.".
The server logs make it look like the config key cc_service_key_client_name is not present in the cloud-controller-ng-yaml config map. Digging a bit further, the cc_service_key_client_secret key is also not present (which we believe would be required for this workflow).
The full log message:
{
"timestamp": "2020-11-03T18:16:20.960421254Z",
"message": "Request failed: 500: {\"error_code\"=>\"UnknownError\", \"description\"=>\"An unknown error occurred.\", \"code\"=>10001, \"test_mode_info\"=>{\"description\"=>\"\\\"cc_service_key_client_name\\\" is not a valid config key\", \"error_code\"=>\"CF-InvalidConfigPath\", \"backtrace\"=>[\"/workspace/lib/cloud_controller/config.rb:181:in `invalid_config_path!'\", \"/workspace/lib/cloud_controller/config.rb:125:in `block in valid_config_path?'\", \"/workspace/lib/cloud_controller/config.rb:121:in `each'\", \"/workspace/lib/cloud_controller/config.rb:121:in `valid_config_path?'\", \"/workspace/lib/cloud_controller/config.rb:111:in `get'\", \"/workspace/lib/cloud_controller/dependency_locator.rb:310:in `credhub_client'\", \"/workspace/lib/cloud_controller/dependency_locator.rb:256:in `service_key_credential_object_renderer'\", \"/workspace/lib/cloud_controller/controller_factory.rb:46:in `block in fetch_dependencies'\", \"/workspace/lib/cloud_controller/controller_factory.rb:45:in `map'\", \"/workspace/lib/cloud_controller/controller_factory.rb:45:in `fetch_dependencies'\", \"/workspace/lib/cloud_controller/controller_factory.rb:36:in `dependencies_for_class'\", \"/workspace/lib/cloud_controller/controller_factory.rb:15:in `create_controller'\", \"/workspace/lib/cloud_controller/rest_controller/routes.rb:16:in `block in define_route'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1635:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1635:in `block in compile!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:987:in `block (3 levels) in route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1006:in `route_eval'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:987:in `block (2 levels) in route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1035:in `block in process_route'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1033:in `catch'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1033:in `process_route'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:985:in `block in route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:984:in `each'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:984:in `route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1098:in `block in dispatch!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `block in invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `catch'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1095:in `dispatch!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:919:in `block in call!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `block in invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `catch'\", 
\"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:919:in `call!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:908:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/xss_header.rb:18:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/path_traversal.rb:16:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/json_csrf.rb:26:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/base.rb:50:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/base.rb:50:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/frame_options.rb:31:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/null_logger.rb:11:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:194:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1951:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/urlmap.rb:74:in `block in call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/urlmap.rb:58:in `each'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/urlmap.rb:58:in `call'\", \"/workspace/middleware/request_logs.rb:38:in `call'\", \"/workspace/middleware/security_context_setter.rb:19:in `call'\", \"/workspace/middleware/vcap_request_id.rb:15:in `call'\", \"/workspace/middleware/cors.rb:49:in `call_app'\", \"/workspace/middleware/cors.rb:14:in `call'\", \"/workspace/middleware/request_metrics.rb:12:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/builder.rb:244:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'\"]}}",
"log_level": "error",
"source": "cc.api",
"data": {
"request_guid": "159af437-3a09-4a90-b2b7-c65ba4dae0b8::b47a3efd-10f4-40c4-97f5-6674457ad0e7"
},
"thread_id": 47124571135600,
"fiber_id": 47124564557040,
"process_id": 1,
"file": "/workspace/lib/sinatra/vcap.rb",
"lineno": 45,
"method": "block in registered"
}
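A hedged sketch of the kind of addition the config would need; only the two key names come from the error above, and the values are placeholders (the secret should ultimately come from a Secret rather than the config map):
# excerpt of the cloud_controller_ng config rendered into the cloud-controller-ng-yaml config map
cc_service_key_client_name: cc_service_key_client       # placeholder client name
cc_service_key_client_secret: "<placeholder-secret>"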
cc @emmjohnson
Greetings CAPI Friends! We bring glad tidings from the Release Integration team.
While merging cloudfoundry/cf-for-k8s#63, we noted a few items worth talking through but not so crucial as to block this PR.
The 0.0.55-bionic tag is used as a stopgap; please remove that overlay when you pin to the appropriate digest.
In config/values.yml, move the kpack key underneath the capi key, since the capi-k8s-release component owns the kpack dependency.
README.md.
Cheers!
cc: @cloudfoundry/cf-release-integration
In order to transparently use an external identity provider, we would like to set
login:
defaultIdentityProvider: myIdP
Would you accept a PR that exposes this property?
Feature Name: Interacting with OCI Image Registries from cloud_controller_ng
Type: Feature
Author: CAKE team
Related components: cloud_controller_ng, capi-k8s-release
In cf-for-k8s we are now using an OCI image registry to store package uploads instead of a blobstore (see: cloudfoundry/cf-for-k8s#409). Currently only package uploads are supported, and other workflows such as deleting a package and copying a package (https://v3-apidocs.cloudfoundry.org/version/3.89.0/index.html#copy-a-package) are unsupported.
Our initial goal is to support package deletes, but it would be nice if our solution is general enough to support other operations. At the minimum it should also support copying packages, but we suspect we might need to also support operations on droplets (e.g. deleting a droplet should delete the image in the registry).
The package-image-uploader is a micro-service that uses the go-containerregistry package[1] to interact with the OCI registry. Currently, it’s deployed as a container in the cf-api-server pods and is used for converting an uploaded zip file into a container image that is uploaded to the registry.
We propose that we enhance (and rename) the package-image-uploader to support other operations by adding additional endpoints for deleting, copying, etc.
We propose the following means of deploying this new flavor of the package-image-uploader:
Keep it as a container in the cf-api-server pod and expose its port to other pods.
We could write our own client for the OCI registry that does what we need to do.
Benefits:
Drawbacks:
We could use a gem like https://github.com/deitch/docker_registry2 to interact with the registry from within the CCNG codebase.
Benefits:
Drawbacks:
We write a Golang utility that leverages the https://github.com/google/go-containerregistry library[1]. Cloud Controller can shell out to this binary when interacting with the registry in places that it would interact with the blobstore.
Benefits:
Drawbacks:
Using cf-for-k8s develop with capi-k8s-release commit d84e4bf.
Trying to push stratos as a docker app on cf-for-k8s often results in this staging error:
$ cat manifest.yml
applications:
- name: console
memory: 1512M
disk_quota: 1024M
host: console
timeout: 180
docker_image: nwmac/stratos:eirini
health-check-type: port
$ cf push -f manifest.yml
...
Waiting for API to complete processing files...
Staging app and tracing logs...
3 of 4 buildpacks participating
paketo-buildpacks/node-engine 0.1.1
paketo-buildpacks/npm-install 0.2.0
paketo-buildpacks/npm-start 0.0.2
Previous image with name "gcr.io/cf-relint-greengrass/cf-workloads/d8c9059f-1ec5-4598-a2b2-9ed4ce4d1237" not found
StagerError - Stager error: Kpack build failed during container execution: Step failure reason: 'Error', message: ''.
FAILED
https://capi.ci.cf-app.com/teams/main/pipelines/capi/jobs/samus-cf-for-k8s/builds/889
grepping the logs of the controller for the image guid that got stuck, we see:
2020-07-24T18:16:15.327Z DEBUG controllers.Build Build create event received {"requestLink": "/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
2020-07-24T18:16:15.714Z DEBUG controllers.Build Build update event received {"requestLink": "/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
2020-07-24T18:16:15.998Z DEBUG controllers.Build Build update event received {"requestLink": "/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
2020-07-24T18:16:15.998Z DEBUG controllers.Build Build is not complete, took no action {"buildName": "cf-workloads-staging/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb", "cloudfoundry.org/build_guid": "edcfc220-872e-423c-866f-d53e665906af", "status": {"observedGeneration":1,"conditions":[{"type":"Succeeded","status":"False","lastTransitionTime":"2020-07-24T18:16:15Z","message":"pods \"8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb-build-pod\" already exists"}],"stack":{},"podName":"8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb-build-pod"}}
2020-07-24T18:16:15.998Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "build", "request": "cf-workloads-staging/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
This indicates that the build controller sometimes fails to detect builds that Failed due to reasons that aren't listed in their container states. In this case, it seems like we sometimes observe a transient state involving a duplicate build pod. The BARA logs show this build succeeding, so this would probably work out fine if we logged the error and requeued the update event.
Is your feature request related to a problem? Please describe.
We noticed that currently the skip_cert_verify property is hardcoded to true. See https://github.com/cloudfoundry/cf-for-k8s/blob/eb0e1b1e39900870d54dc3f1d47cf08049cf64fc/config/capi/_ytt_lib/capi-k8s-release/templates/ccng-config.lib.yml#L287. Our component would like to consume this property to toggle SSL validation.
Describe the solution you'd like
This property would be exposed and configurable by operators, either through CCNG values or some kind of top-level/global property in the larger cf-for-k8s context, i.e. #@ data.values.ssl.skip_cert_verify
Thanks,
@belinda-liu && @weymanf
We had to revert cloudfoundry/cf-for-k8s#253
Most of the pipeline, which uses kind, had no problem. But some of the pipeline tests, and standard manual testing, use GKE, and this code was running into permission problems there. In a nutshell, the kpack builder image can be stored in the gcr registry as a public object, but it can't be retrieved, and we get a failure in the cf-workloads-staging pod. See https://www.pivotaltracker.com/story/show/173470211/comments/215536752 for sample output.
Original issue on cf-for-k8s: cloudfoundry/cf-for-k8s#287
Based on associated slack threads (https://cloudfoundry.slack.com/archives/CH9LF6V1P/p1594916021423300 and https://cloudfoundry.slack.com/archives/CH9LF6V1P/p1594928317432200), CC's behavior in cf-for-k8s seems to be the cause, so I'm opening an issue on the capi-k8s-release repo for the CAPI team to track.
We performed a test with the new routecontroller implementation in cf-for-k8s by pushing 10 apps concurrently. During the course of the test we observed the following error from cf-api-server:
Failed to create/update/delete Route resource with guid 'f4cc6bd0-4242-4fa1-bc88-57ea8814049c' on Kubernetes\", \"error_code\"=>\"CF-KubernetesRouteResourceError
{"timestamp":1592899641.142837,"message":"Failed to Update Route CRD: HTTP status code 409, Operation cannot be fulfilled on routes.networking.cloudfoundry.org \"c0e42e0d-e6cc-4cce-b008-b8e976be6dea\": the object has been modified; please apply your changes to the latest version and try again for PUT https://kubernetes.default/apis/networking.cloudfoundry.org/v1alpha1/namespaces/cf-workloads/routes/c0e42e0d-e6cc-4cce-b008-b8e976be6dea","log_level":"info","source":"cc.action.route_update","data":{"request_guid":"257c79aa-76f1-4a88-a3a7-7f91c6fdc1f2::8e837f91-8cb2-45ed-ab06-54c40e3221a0"},"thread_id":47384772400140,"fiber_id":47384772089500,"process_id":1,"file":"/cloud_controller_ng/lib/kubernetes/route_crd_client.rb","lineno":53,"method":"rescue in update_destinations"} {"timestamp":1592899641.1436174,"message":"Request failed: 422: {\"description\"=>\"Failed to create/update/delete Route resource with guid 'c0e42e0d-e6cc-4cce-b008-b8e976be6dea' on Kubernetes\", \"error_code\"=>\"CF-KubernetesRouteResourceError\", \"code\"=>400001, \"test_mode_info\"=>{\"description\"=>\"Failed to create/update/delete Route resource with guid 'c0e42e0d-e6cc-4cce-b008-b8e976be6dea' on Kubernetes\", \"error_code\"=>\"CF-KubernetesRouteResourceError\", \"backtrace\"=>[\"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:130:in `rescue in handle_exception'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:120:in `handle_exception'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:391:in `update_entity'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:240:in `block (2 levels) in define_entity_methods'\", \"/cloud_controller_ng/lib/kubernetes/route_crd_client.rb:50:in `update_destinations'\", \"/cloud_controller_ng/app/actions/v2/route_mapping_create.rb:52:in `add'\", \"/cloud_controller_ng/app/controllers/runtime/routes_controller.rb:262:in `add_app'\", \"/cloud_controller_ng/app/controllers/base/base_controller.rb:84:in `dispatch'\", \"/cloud_controller_ng/lib/cloud_controller/rest_controller/routes.rb:16:in `block in define_route'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1634:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1634:in `block in compile!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:992:in `block (3 levels) in route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1011:in `route_eval'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:992:in `block (2 levels) in route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1040:in `block in process_route'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1038:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1038:in `process_route'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:990:in `block in route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:989:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:989:in `route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1097:in `block in dispatch!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `block in invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in 
`catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1094:in `dispatch!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:924:in `block in call!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `block in invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:924:in `call!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:913:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/xss_header.rb:18:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/path_traversal.rb:16:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/json_csrf.rb:26:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/base.rb:50:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/base.rb:50:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/frame_options.rb:31:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/null_logger.rb:11:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/head.rb:12:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:194:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1957:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:74:in `block in call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `call'\", \"/cloud_controller_ng/middleware/request_logs.rb:38:in `call'\", \"/cloud_controller_ng/middleware/security_context_setter.rb:19:in `call'\", \"/cloud_controller_ng/middleware/vcap_request_id.rb:15:in `call'\", \"/cloud_controller_ng/middleware/cors.rb:49:in `call_app'\", \"/cloud_controller_ng/middleware/cors.rb:14:in `call'\", \"/cloud_controller_ng/middleware/request_metrics.rb:12:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/builder.rb:244:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'\"]}}","log_level":"info","source":"cc.api","data":{"request_guid":"257c79aa-76f1-4a88-a3a7-7f91c6fdc1f2::8e837f91-8cb2-45ed-ab06-54c40e3221a0"},"thread_id":47384772400140,"fiber_id":47384772089500,"process_id":1,"file":"/cloud_controller_ng/lib/sinatra/vcap.rb","lineno":44,"method":"block in registered"}
I expect cf push with a buildpack to work against the dockerhub.com registry, since it is a publicly trusted registry.
cf push fails when using dockerhub.com as the cf-app registry.
The configuration works for gcr.io but fails for dockerhub.com. See the following thread for details, steps, and possible solutions.
Since upgrading Istio in cf-for-k8s to 1.7 and closing cloudfoundry/cf-for-k8s#189, app containers should not start while the sidecar container is initializing, so there is no need for custom code to handle that.
Also, with the current plan of supporting both Istio and Contour (without Istio), ccdb-migrate should not rely at all on a sidecar being injected.
The BARA "setting_process_commands manifest and Procfile/detected buildpack command interactions prioritizes the manifest command over the Procfile and can be reset via the API" fails when it checks that the process command has been set correctly.
It returns just the first argument in the command, in this case, "bundle" instead of "bundle exec rackup..." For multi-argument commands it should return the entire command.
The app works fine, so we're reasonably sure the problem is a display problem and the underlying process command is set correctly.
Take the dora app with a Procfile that defines some processes. I used the following:
web: bundle exec rackup config.ru -p $PORT
worker: sleep 10000
foo: bundle exec rackup config.ru -p 8080
cf push the app
cf curl /v3/droplets/<guid>
You will see in the process_types field that only the first part of the start command was captured.
Example output:
{
"guid": "5efc4402-ca92-4387-b0d0-91a6a21d2daf",
"created_at": "2020-10-21T00:05:22Z",
"updated_at": "2020-10-21T00:05:59Z",
"state": "STAGED",
"error": null,
"lifecycle": {
"type": "kpack",
"data": {}
},
"checksum": null,
"buildpacks": null,
"stack": null,
"image": "gcr.io/cf-capi-arya/cf-workloads/8eb4ed9a-3804-4a66-90b6-6db0fade796c@sha256:fe502f3bbc3efcd0bf3ae647c4c733e127f2dd3625c2aaabc07f1c13b5e6d25f",
"execution_metadata": null,
"process_types": {
"foo": "bundle",
"web": "bundle",
"worker": "sleep"
},
"relationships": {
"app": {
"data": {
"guid": "8eb4ed9a-3804-4a66-90b6-6db0fade796c"
}
}
},
"metadata": {
"labels": {},
"annotations": {}
},
"links": {
"self": {
"href": "https://api.tim.k8s.capi.land/v3/droplets/5efc4402-ca92-4387-b0d0-91a6a21d2daf"
},
"app": {
"href": "https://api.tim.k8s.capi.land/v3/apps/8eb4ed9a-3804-4a66-90b6-6db0fade796c"
},
"assign_current_droplet": {
"href": "https://api.tim.k8s.capi.land/v3/apps/8eb4ed9a-3804-4a66-90b6-6db0fade796c/relationships/current_droplet",
"method": "PATCH"
},
"package": {
"href": "https://api.tim.k8s.capi.land/v3/packages/e0bf1462-ba4b-48cb-aa78-9ac2f91900da"
}
}
}
Assuming everything is all wired up correctly, this will result in apps failing to start since they only get the first piece of the start command. So for the web process it just runs bundle and installs its gems again! 🤯
k -n cf-workloads logs multi-dora-proc-s-622cecd9c7-0 -c opi -f
Using bundler 2.1.4
Using diff-lcs 1.4.2
Using json 2.3.0
Using ruby2_keywords 0.0.2
Using mustermann 1.1.1
Using rack 2.2.3
Using rack-protection 2.0.8.1
Using rack-test 1.1.0
Using rspec-support 3.9.3
Using rspec-core 3.9.2
Using rspec-expectations 3.9.2
Using rspec-mocks 3.9.1
Using rspec 3.9.0
Using tilt 2.0.10
Using sinatra 2.0.8.1
Updating files in vendor/cache
Bundle complete! 4 Gemfile dependencies, 15 gems now installed.
Bundled gems are installed into `/layers/paketo-buildpacks_bundle-install/gems`
It looks like we're only copying over the Command for a Process and ignoring the Args.
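For illustration, here is a minimal Go sketch (a hypothetical helper, not the actual controller code) of how a container's Command and Args would need to be combined to recover the full start command:

package main

import (
	"fmt"
	"strings"
)

// processCommand joins a container's Command (entrypoint) and Args into the
// full start command. Taking only Command[0] is what yields "bundle" instead
// of "bundle exec rackup config.ru -p $PORT".
func processCommand(command, args []string) string {
	return strings.TrimSpace(strings.Join(append(append([]string{}, command...), args...), " "))
}

func main() {
	// Hypothetical values for the "web" process from the Procfile above.
	cmd := []string{"bundle"}
	args := []string{"exec", "rackup", "config.ru", "-p", "$PORT"}
	fmt.Println(processCommand(cmd, args)) // bundle exec rackup config.ru -p $PORT
}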
Reported by cf-for-k8s team cc:@jamespollard8
Install is now failing frequently on AKS (2/3 attempts so far)
see https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-for-k8s/jobs/validate-azure/builds/14 and https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-for-k8s/jobs/validate-azure/builds/16
8:45:54PM: ---- waiting on 6 changes [239/245 done] ----
8:45:58PM: ongoing: reconcile deployment/capi-api-server (apps/v1) namespace: cf-system
8:45:58PM: ^ Waiting for 1 unavailable replicas
8:45:58PM: L ok: waiting on replicaset/capi-api-server-5cc896b848 (apps/v1) namespace: cf-system
8:45:58PM: L ok: waiting on podmetrics/capi-api-server-5cc896b848-p25vl (metrics.k8s.io/v1beta1) namespace: cf-system
8:45:58PM: L ongoing: waiting on pod/capi-api-server-5cc896b848-p25vl (v1) namespace: cf-system
8:45:58PM: ^ Condition Ready is not True (False)
8:45:58PM: L ok: waiting on pod/capi-api-server-5cc896b848-mz7lt (v1) namespace: cf-system
8:45:58PM: fail: reconcile job/ccdb-migrate (batch/v1) namespace: cf-system
kapp: Error: waiting on reconcile job/ccdb-migrate (batch/v1) namespace: cf-system: finished unsuccessfully (Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit)
Background: The cc-db migration job has to wait until the database (postgres) is ready. As a workaround for this issue we added a retry policy which tries 3 times to run the migrations. It looks like that is failing on AKS, so we should either increase the number of retries or adopt a more robust policy.
Related story: https://www.pivotaltracker.com/story/show/172414069
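As a sketch of what a more robust policy could look like, the snippet below (hypothetical, not part of the release) blocks until the database endpoint accepts TCP connections, using capped exponential backoff instead of a fixed retry count; the address and time limits are illustrative assumptions:

package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// waitForDB blocks until the database accepts TCP connections, using capped
// exponential backoff rather than a fixed number of retries.
func waitForDB(addr string, maxWait time.Duration) error {
	deadline := time.Now().Add(maxWait)
	backoff := time.Second
	for {
		conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("database %s not reachable after %s: %w", addr, maxWait, err)
		}
		time.Sleep(backoff)
		if backoff < 30*time.Second {
			backoff *= 2
		}
	}
}

func main() {
	// Illustrative address; the real host/port would come from the ccdb configuration.
	if err := waitForDB("cf-db-postgresql.cf-db.svc.cluster.local:5432", 5*time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// ...then run the migrations.
}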
When STRICT mode is enabled for the mTLS Istio mesh policy, all traffic between Pods covered by the policy must use mTLS. This means that the Istio sidecars (the istio-proxy Envoy container) must be ready before Pods can successfully communicate with each other over the network. See the Istio Security docs for more information on all of this.
The issue with Cloud Controller right now is that it is using an init container to run database migrations. This init container runs before the regular istio-proxy container on the Pod is ready, so there is no client sidecar in place to mediate the mTLS connection with the sidecar running on the database Pod1.
Currently there is no workaround on the Istio side. Their stance2 is:
There is no easy solution. Your init container will run before the sidecar starts. If your container runs before Istio’s init container it will not be secure. If your container runs after Istio’s it will not have network access.
If you can avoid doing network I/O in your init containers you should. If you must use init containers that expect connectivity, you’ll need a work-around.
Can Cloud Controller run these migrations in some other way? Maybe in a Job?
Thanks!
Tim && @ndhanushkodi
1 this actually probably doesn't matter for off-cluster databases 🤔💭
2 https://discuss.istio.io/t/k8s-istio-sidecar-injection-with-other-init-containers/845
As a follow-up to the scalability tests, we performed a max concurrent push test.
During the tests we observed strange behaviour: when we perform 20 concurrent pushes, we see huge latency between the staging pod completion time in the cf-workloads-staging namespace and the creation of the StatefulSet in the cf-workloads namespace.
We already ensured that cf-api-server has enough resources to handle the load.
The graph below shows the latency, with the X-axis showing the number of apps deployed and the Y-axis showing the latency in seconds (time difference between staging pod completion time and StatefulSet creation time).
Steps to reproduce the behavior:
# App GUID taken from the staging pod's labels
guid=$(kubectl get pods -n cf-workloads-staging $app -o json | jq '.metadata.labels | to_entries | .[] | select(.key=="cloudfoundry.org/app_guid") | .value ' | cut -d'"' -f 2)
# Time at which the staging pod completed
stage_completion_time=$(kubectl get pods -n cf-workloads-staging $app -o json | jq '.status.conditions[0].lastTransitionTime' | cut -d'"' -f 2)
# Creation time of the corresponding StatefulSet in cf-workloads
sts_creation_time=$(kubectl get sts -n cf-workloads -l cloudfoundry.org/app_guid=$guid -o json | jq '.items[0].metadata.creationTimestamp' | cut -d'"' -f 2)
Let us know if the issue is from Eirini; we can open the issue over there.
Feature Name: Scheduling workloads across multiple k8s Namespaces
Type: feature
Author: Connor Braa
Related components: cloud_controller_ng, capi-k8s-release, eirini
We plan to introduce a /v3/placements API. A single placement resource describes a namespace or namespaces where CF API workloads may be scheduled. Upon introduction, there will be an initial default placement for existing workloads. /v3/placements will likely be persisted in the k8s API as a Placement CRD. The placements API will provide future flexibility so that we may eventually support:
/v3/space kubernetes Namespace creation and deletion
Currently, the CF API schedules all of its runtime workloads into a 1-per-installation "cf-workloads" namespace via Eirini's OPI LRP API and the cf-k8s-networking Route CRD. All build-time workloads are scheduled into a different, 1-per-installation "cf-workloads-staging" namespace via Kpack's Image CRD.
Recently, the Eirini team has added the capability to schedule LRP workloads across different namespaces. This is available both in OPI and k8s-natively in the new, as-of-yet unused LRP CRD.
cf-for-k8s' first deployers and tire kickers have made explicit requests that apps don't all share one global k8s "cf-workloads" namespace. That global namespace is a bit of a security threat and could present a significant hurdle in the future when we want to give users more k8s API access.
Some amount of initial thought has been given to how the CF API might benefit users that need to schedule workloads across k8s clusters. I'd like to propose that we use namespaces as a way of exploring API modeling around workload placement broadly with the hopes that we'll later extend these same API resources to work across k8s clusters.
At a high level, I'd like to propose that we solve this problem through flexible, configurable data modeling.
To give platform engineers the ability to place workloads in different Namespaces, one approach is to begin by creating a representation of the current state of workload placements. What follows are example objects that would represent the current way Apps, Routes, and Kpack Images are spread across Namespaces in cf-for-k8s.
resources: [
{
"name": "existing-cf-for-k8s-placement",
"guid": "placement-x",
"targets": [
{
"namespace": "cf-workloads"
}
],
"global_runtime_default": true,
"global_staging_default": false,
"metadata": { "labels": {}, "annotations": {}}
},
{
"name": "existing-cf-for-k8s-staging-placement",
"guid": "placement-y",
"targets": [
{
"namespace": "cf-workloads-staging"
}
],
"global_runtime_default": false,
"global_staging_default": true,
"metadata": { "labels": {}, "annotations": {}}
}
]
And the associated CRs that platform engineers could configure at deploy-time:
---
apiVersion: apps.cloudfoundry.org/v1alpha1
kind: Placement
metadata:
  labels:
    apps.cloudfoundry.org/placement_guid: placement-x
  annotations: {}
  name: existing-cf-for-k8s-placement
  namespace: cf-system
spec:
  targets:
  - namespace: cf-workloads
  global_runtime_default: true
  global_staging_default: false
---
apiVersion: apps.cloudfoundry.org/v1alpha1
kind: Placement
metadata:
  labels:
    apps.cloudfoundry.org/placement_guid: placement-y
  annotations: {}
  name: existing-cf-for-k8s-staging-placement
  namespace: cf-system
spec:
  targets:
  - namespace: cf-workloads-staging
  global_runtime_default: false
  global_staging_default: true
A single placement describes a sort of multiplexer for CF Apps. In the initial, single-target case this multiplexer passes its inputs straight through to the underlying namespace. In the future, we'd want to support multi-target placements where the multiplexer would create copies of compute and networking resources across its targets. A single-process, routable web app in a space with a 2-target placement would have 2 "web" processes that map to two StatefulSets across 2 namespaces. Likewise, the app's associated Route would need to span those two namespaces or have copies in both.
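To make the shape of the proposed resource concrete, here is a minimal Go sketch of the placement data model from the examples above; the type and field names are illustrative only, not an agreed-upon schema:

package main

import (
	"encoding/json"
	"fmt"
)

// Target identifies a namespace that a placement schedules workloads into.
type Target struct {
	Namespace string `json:"namespace"`
}

// Placement mirrors the example /v3/placements resource shown above.
type Placement struct {
	Name                 string         `json:"name"`
	GUID                 string         `json:"guid"`
	Targets              []Target       `json:"targets"`
	GlobalRuntimeDefault bool           `json:"global_runtime_default"`
	GlobalStagingDefault bool           `json:"global_staging_default"`
	Metadata             map[string]any `json:"metadata"`
}

func main() {
	// The initial default placement for existing cf-for-k8s runtime workloads.
	p := Placement{
		Name:                 "existing-cf-for-k8s-placement",
		GUID:                 "placement-x",
		Targets:              []Target{{Namespace: "cf-workloads"}},
		GlobalRuntimeDefault: true,
		Metadata:             map[string]any{"labels": map[string]string{}, "annotations": map[string]string{}},
	}
	out, _ := json.MarshalIndent(p, "", "  ")
	fmt.Println(string(out))
}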
Once these API resources exist, we can add features to them. Examples that fit pretty naturally into this modeling include:
/v3/space
cf-system Namespace
It's possible that modeling namespaces as part of (multicluster) workload placement is putting the cart before the horse and coupling together two concepts that don't necessarily need to be related.
Features that might be a bit hairier to build with this modeling include:
letting users cf scale appName -i 5 their app and understand where those instances will be placed
Placements are not entirely necessary to get users' apps into separate namespaces. Instead, we could have the CF API and associated controllers create a Namespace for each new space and immediately start scheduling new app workloads into them.
Rather than migrating or modeling placements independently of existing tenancy constructs, spaces could have a "namespace" field. This field would be seeded to "cf-workloads" for existing spaces, configurable upon space creation, and dynamically allocated when omitted.
This option offers configurability similar to placements and backwards compatibility with the existing structure, but is flexible for future enhancements.
To move them without user intervention, we'd need to find a way to safely move StatefulSets and Routes across Kubernetes namespaces during a cf-for-k8s deployment, or the upgrade would incur app downtime or even breakage.
This approach creates a situation similar to isolation segments, and would perhaps alleviate the need for dynamic namespace allocation. There's a similar solution space to be explored about migrating or not migrating existing apps.
TBD
It would be nice to have quick-start instructions for using capi-k8s-release with a real cluster instead of minikube.
It's not immediately obvious to new users how to use our chart.
Raising by request of @selzoc
As it is currently implemented, the kpack integration uses a naming strategy whereby multiple pushes of a single application will each result in unique images being created in the image repository, rather than creating separate tags of the same image.
For example, pushing an app twice might currently create:
library/78f0dcd6-2d8d-464e-8094-e95926dabac3:b1.20200316.233450
library/2ce2458d-0ef0-43f0-929a-b9dd2c106e54:b1.20200316.232455
instead of:
library/2ce2458d-0ef0-43f0-929a-b9dd2c106e54:b1.20200316.232455
library/2ce2458d-0ef0-43f0-929a-b9dd2c106e54:b2.20200316.233450
The strategy taken here should be chosen very deliberately, as there could be consequences related to broader tooling, for example:
This is not to say the current approach is wrong, but that it should be carefully weighed against alternatives.
In normal app developer operation, it's possible for a kpack build to fail to create a Build pod due to a lack of resources. It's also possible for builds to fail without having container status failures. We've done some work to get those errors correctly propagated through cf-api-controllers to the API, but it's possible for users to miss them.
Ideally, it'd be great to see any build error in the output of cf logs appName.
The number of DB connections is limited to 500, because of which we are unable to scale capi-api-server.
The default max connections is set to 25 here.
I tried to scale capi-api-server to 20 replicas (20 replicas × 25 connections each can require the full 500-connection limit) and got the following error in the pre-start init container:
Caused by: PG::ConnectionBad: FATAL: remaining connection slots are reserved for non-replication superuser connections
Next we tried changing max_connections to 100 and were able to scale only 5 replicas of capi-api-server.
Another thing we observed is that even if one replica of api-server is down, the API is not accessible; it returns a 500 status code:
Error Code: 500
Raw Response: {}
FAILED
Feature Name: Supporting service bindings used during staging in cf-for-k8s
Type: feature
Author: CAPI team
Related components: cloud_controller_ng, capi-k8s-release
We create a Service Binding Controller to manage Service Bindings.
We currently provide service bindings to apps at build-time via the VCAP_SERVICES
environment variable. This mechanism is not supported by the Paketo Cloud Native Buildpacks.
We need to continue to support build-time service bindings so that the platform can continue to automatically configure APM integrations on behalf of apps. This will require providing the service binding to kpack as a Kubernetes Service Binding. This type of service binding takes the form of a structured directory, mounted to the build container.
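As a rough illustration of consuming such a binding, the Go sketch below assumes the binding is projected as a flat directory with one file per key; that layout is an assumption for illustration, not a statement of the spec:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadBinding reads one binding projected as a directory of files, one file
// per key (for example: type, provider, username, password). Hypothetical
// helper for illustration only.
func loadBinding(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	binding := map[string]string{}
	for _, e := range entries {
		// Skip subdirectories and the hidden bookkeeping entries that
		// Kubernetes projected volumes create.
		if e.IsDir() || strings.HasPrefix(e.Name(), ".") {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		binding[e.Name()] = strings.TrimSpace(string(data))
	}
	return binding, nil
}

func main() {
	// Illustrative mount path; the real path would be provided to the build container.
	b, err := loadBinding("/bindings/my-db")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(b["type"], b["provider"])
}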
When the service binding /create endpoint is hit, CAPI will create a Service Binding resource. The controller will call back to CAPI with the state of that service binding whenever it changes. CAPI will include a reference to that resource when it requests new builds from kpack, and when it requests long-running processes (LRPs) from Eirini.
This may require some work from Eirini to actually mount the secret to the running image.
This adds another codebase and component that will need to be managed.
There's currently no endpoint for updating a service binding. We don't know why.
We could imperatively create service bindings. In that case CAPI would need to monitor and manage the status of the service binding resources and synchronize its data state accordingly.
We're not sure what the failure modes are for service binding creation/management, and probably need to investigate that further.
As of this writing (August 18 2020) kpack does not support the Kubernetes service binding spec, it supports the CNB service binding spec. We expect it to support the Kubernetes spec soon, since the CNB spec is already deprecated.
Placeholder tracker stories
[PLACEHOLDER] mount service bindings to runtime container
[PLACEHOLDER] convert VCAP_SERVICES to kpack bindings at build time
We've noticed this attempting to deploy postfacto on cf-for-k8s v0.6.0.
Steps to reproduce:
Modify package/cf/manifest.yml to remove the buildpacks and services lists
Remove ruby '2.6.3' from package/assets/Gemfile
cd package/cf
./deploy.sh app-name
Staging completes successfully, but the app fails to start. Logs on the app container show /bin/sh: 1: bundle: not found.
If you remove command from the package/cf/manifest.yml, and instead create package/assets/Procfile with contents web: bundle exec rake db:migrate && bundle exec rails s -p \$PORT -e development, the app will start (although that also requires configuring the app to use mysql and redis correctly; without that, at least you'll see that bundle is found).
Slack thread: https://cloudfoundry.slack.com/archives/CH9LF6V1P/p1600788157043700
cc: @cloudfoundry/eirini