cloudfoundry / capi-k8s-release
The CF API parts of cloudfoundry/cf-for-k8s
License: Apache License 2.0
It is confusing that the Cloud Controller deployments (API, generic worker, clock, and deployment updater) are prefixed with capi- instead of with cloud-controller- or something similar, as Cloud Controller is the component that they run. These names also do not match the corresponding job names in the CAPI BOSH release (cloud_controller_ng, cloud_controller_worker, cloud_controller_clock, and cc_deployment_updater). Additionally, "CAPI" in a Kubernetes context now typically means the Cluster API, so including capi in the names of these deployments is likely to confuse those already familiar with Kubernetes projects.
Finally, these names should be changed sooner rather than later, before consumers such as cf-for-k8s begin to provide backwards compatibility or high reliability to their users.
Repro steps:
cf enable-feature-flag diego_docker
cf push console -o splatform/stratos:stable
kubectl describe service -n cf-workloads [app service name]
Stratos app exposes port 5443: https://github.com/cloudfoundry/stratos/blob/7519e6ab5570fc75b471ce6679a75df7d262b3c1/deploy/Dockerfile.all-in-one#L42
Expected Behavior: TargetPort is 5443, per EXPOSE directive in Dockerfile
Actual Behavior: TargetPort is 8080.
Tim says: "the route and destination is created before the docker app is staged. at that point there is no execution metadata so it creates it with the default 8080"
https://app.slack.com/client/T02FL4A1X/threads/thread/C017LDM6KTQ-1603233260.197100
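To make the mismatch concrete, here is a minimal sketch of the relevant part of the generated Service spec; only the port numbers come from the observed behavior above, and everything else is omitted or assumed:
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080   # actual: defaults to 8080 because no execution metadata exists when the destination is created
      # expected: targetPort: 5443, per the EXPOSE directive in the Stratos Dockerfile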
We are seeing smoke tests fail with the following error message:
[2020-09-24 17:29:51.27 (UTC)]> cf push cf-for-k8s-smoke-1-app-99029be0dfaba098 -p assets/test-node-app --no-route
...
...
Waiting for API to complete processing files...
Job (0f30b04d-672f-42b7-abc9-a98c44cff879) failed: An unknown error occurred.
FAILED
The cf-api-server pod contains the following error message (among a few others):
cf-api-server-699d49df87-zqwh4 > package-image-uploader | 2020/09/24 17:29:55 Error from uploadFunc(/tmp/packages/registry_bits_packer20200924-1-1qu1y1d/copied_app_package.zip, harbor.pig.cf-app.com/ci-app-workloads-pr/40b6d190-a263-404b-9865-86836c5b7e5d): Get "https://harbor.pig.cf-app.com/v2/": x509: certificate signed by unknown authority
ccdb-migrate-n6m6k > istio-proxy | 2020-09-24T17:27:10.546624Z info sds resource:ROOTCA connection is terminated: rpc error: code = Canceled desc = context canceled
It looks like the issue might be due to capi not having the app registry CA cert configured. But it also looks like we don’t have a way to do that. Can y'all expose a property to configure the app registry CA please?
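For illustration, a hypothetical sketch of the kind of property we are asking for; the app_registry block mirrors the existing registry settings, but the ca_cert key does not exist today and its shape is purely an assumption:
#@data/values
---
app_registry:
  hostname: harbor.pig.cf-app.com
  ca_cert: |                         #! hypothetical property: CA bundle to trust for the app registry
    -----BEGIN CERTIFICATE-----
    (placeholder PEM contents)
    -----END CERTIFICATE-----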
As discussed with @piyalibanerjee and @jspawar
We noticed a failed build of smoke tests on cf-for-k8s and started digging in.
We observed that the completion container of the staging pod completed but showed ready=false, which we realized is the expected behavior (even for apps that are successfully pushed).
We next looked at the capi-kpack-watcher logs, which were surprisingly short (no details about app staging):
2020/03/30 23:36:51 Watcher initialized. Listening...
2020/03/30 23:36:51 [AddFunc] New Build: &{TypeMeta:{Kind:Build APIVersion:build.pivotal.io/v1alpha1} ObjectMeta:{Name:eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw GenerateName:eec29581-395d-4841-adee-d1c37c146684-build-1- Namespace:cf-workloads-staging SelfLink:/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw UID:0d60fe53-0c1b-4c0f-a005-02ee797e8d95 ResourceVersion:1487666 Generation:1 CreationTimestamp:2020-03-30 23:36:48 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[cloudfoundry.org/app_guid:a4bb542b-bf81-4a26-9ff2-28f3734cfd37 cloudfoundry.org/build_guid:5266c24f-dece-49a6-a2b8-e196724c35b2 cloudfoundry.org/source_type:STG image.build.pivotal.io/buildNumber:1 image.build.pivotal.io/image:eec29581-395d-4841-adee-d1c37c146684] Annotations:map[image.build.pivotal.io/reason:CONFIG sidecar.istio.io/inject:false] OwnerReferences:[{APIVersion:build.pivotal.io/v1alpha1 Kind:Image Name:eec29581-395d-4841-adee-d1c37c146684 UID:e22d0931-859e-4e2b-9ce4-7858f4f3a44d Controller:0xc000400079 BlockOwnerDeletion:0xc000400078}] Initializers:nil Finalizers:[] ClusterName: ManagedFields:[]} Spec:{Tags:[gcr.io/cf-relint-greengrass/cf-workloads/eec29581-395d-4841-adee-d1c37c146684 gcr.io/cf-relint-greengrass/cf-workloads/eec29581-395d-4841-adee-d1c37c146684:b1.20200330.233648] Builder:{Image:index.docker.io/cloudfoundry/cnb@sha256:0a718640a4bde8ff65eb00e891ff7f4f23ffd9a0af44d43f6033cc5809768945 ImagePullSecrets:[]} ServiceAccount:cc-kpack-registry-service-account Source:{Git:<nil> Blob:0xc0003ef860 Registry:<nil> SubPath:} CacheName: Env:[] Resources:{Limits:map[] Requests:map[]} LastBuild:<nil>} Status:{Status:{ObservedGeneration:1 Conditions:[{Type:Succeeded Status:False Severity: LastTransitionTime:{Inner:2020-03-30 23:36:50 +0000 UTC} Reason: Message:pods "eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw-build-pod" already exists}]} BuildMetadata:[] Stack:{RunImage: ID:} LatestImage: PodName:eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw-build-pod StepStates:[] StepsCompleted:[]}}
The content of note from that long log line is Reason: Message:pods "eec29581-395d-4841-adee-d1c37c146684-build-1-8lzkw-build-pod" already exists, which we've only observed before when trying to restage an existing app (currently not supported). That is not the case here, though, as this was a new cf push.
Looking more carefully at the capi-kpack-watcher definition, we noticed that it had restarted once. The timestamps show that it came back up at 16:36:51, about 10 seconds after the cf push was executed, and the timestamps in the short kpack-watcher logs above show that the build pod was created a few seconds before the kpack-watcher came back up.
capi-kpack-watcher:
Container ID: docker://941b7763fec87f9fdadcb4c7d60322f8ca6486b16d7cdde2c803c32b888a01b1
Image: cloudfoundry/capi-kpack-watcher:latest
Image ID: docker-pullable://cloudfoundry/capi-kpack-watcher@sha256:6e6a3778953aa2b998e933fda726f18f5da89aa6b5738005efc22abd727a69dc
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 30 Mar 2020 16:36:51 -0700
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 30 Mar 2020 16:33:09 -0700
Finished: Mon, 30 Mar 2020 16:36:50 -0700
Ready: True
Restart Count: 1
As for the kpack-watcher container restart, we've grabbed the full tail of the log in this gist, but here's a good, relevant section:
reason: map[image.build.pivotal.io/reason:CONFIG sidecar.istio.io/inject:false]
E0330 23:36:50.348158 1 runtime.go:73] Observed a panic: runtime.boundsError{x:-1, y:0, signed:true, code:0x0} (runtime error: index out of range [-1])
goroutine 5 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x12102e0, 0xc0000448c0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:69 +0x7b
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:51 +0x82
panic(0x12102e0, 0xc0000448c0)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
capi_kpack_watcher/watcher.(*BuildWatcher).handleFailedBuild(0xc0002ec7e0, 0xc00011e280)
/capi-kpack-watcher/watcher/build_watcher.go:144 +0x3a7
capi_kpack_watcher/watcher.(*BuildWatcher).UpdateFunc(0xc0002ec7e0, 0x12b1820, 0xc0000e9b80, 0x12b1820, 0xc00011e280)
/capi-kpack-watcher/watcher/build_watcher.go:48 +0x2cb
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:202
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc00005c5c8, 0x41720e, 0x7f1db97fc730)
/go/pkg/mod/k8s.io/[email protected]
...
f2f3a405f61d/pkg/util/wait/wait.go:69 +0x62
panic: runtime error: index out of range [-1] [recovered]
panic: runtime error: index out of range [-1]
We noticed a failed build of smoke-tests in our CI recently. We dug into it, thinking that it might be another instance of the issue reported in #27, but found that, in this case, the capi-kpack-watcher had not restarted:
capi-kpack-watcher:
Container ID: docker://8c5ae51161a1722dda5ed0c38c774d401914dd05413c0fbb4b2df1a3caeea90e
Image: cloudfoundry/capi-kpack-watcher:956150dae0a95dcdf3c1f29c23c3bf11db90f7a0@sha256:67125e0d3a4026a23342d80e09aad9284c08ab4f7b3d9a993ae66e403d5d0796
Image ID: docker-pullable://cloudfoundry/capi-kpack-watcher@sha256:67125e0d3a4026a23342d80e09aad9284c08ab4f7b3d9a993ae66e403d5d0796
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 02 Apr 2020 19:16:16 -0700
Ready: True
Restart Count: 0
...
When we dug more into the logs from the capi-kpack-watcher, we saw that it had successfully completed staging the app, but received a 404 when it went to post the update with the image ID back to the API server. When we looked at the logs from the two capi-api-server pods, we saw that one pod had handled the initial POST request (presumably from the CLI) to start staging the app, and the other pod had handled the PATCH request from the watcher to update the app state. Please see this gist for details. To aid with debugging, this test ran with cloudfoundry/cf-for-k8s@6c9f5cd, which uses capi-k8s-release @ 2998092.
Howdy! There are some commercial products that want to package this repo, but cannot until there is an explicit license. It's legally dangerous for anyone to use your code without some form of license on the code. Most CF projects are Apache 2 licensed.
Thanks!
We deployed cf-for-k8s and performed a scale test. We pushed source-code-based apps and reached 800 apps. Under such high load we saw the following error in capi-api-server:
{"timestamp":1587738150.8981497,"message":"Request failed: 404: {\"errors\"=>[{\"detail\"=>\"Droplet not found\", \"title\"=>\"CF-ResourceNotFound\", \"code\"=>10010, \"test_mode_info\"=>{\"detail\"=>\"Droplet not found\", \"title\"=>\"CF-ResourceNotFound\", \"backtrace\"=>[\"/cloud_controller_ng/app/controllers/v3/application_controller.rb:34:in `resource_not_found!'\", \"/cloud_controller_ng/app/controllers/v3/apps_controller.rb:347:in `droplet_not_found!'\", \"/cloud_controller_ng/app/controllers/v3/apps_controller.rb:322:in `current_droplet'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/basic_implicit_render.rb:6:in `send_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/base.rb:194:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/rendering.rb:30:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/callbacks.rb:42:in `block in process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:132:in `run_callbacks'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/callbacks.rb:41:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/rescue.rb:22:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/instrumentation.rb:34:in `block in process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `block in instrument'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications/instrumenter.rb:23:in `instrument'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/notifications.rb:168:in `instrument'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/instrumentation.rb:32:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal/params_wrapper.rb:256:in `process_action'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/abstract_controller/base.rb:134:in `process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionview-5.2.4.2/lib/action_view/rendering.rb:32:in `process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal.rb:191:in `dispatch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_controller/metal.rb:252:in `dispatch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:52:in `dispatch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:34:in `serve'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:52:in `block in serve'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:35:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/journey/router.rb:35:in `serve'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/routing/route_set.rb:840:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/tempfile_reaper.rb:15:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/http/content_security_policy.rb:18:in `call'\", 
\"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.4.2/lib/active_support/callbacks.rb:98:in `run_callbacks'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/callbacks.rb:26:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/debug_exceptions.rb:61:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/runtime.rb:22:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/activesupport- 5.2.4.2/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/actionpack-5.2.4.2/lib/action_dispatch/middleware/executor.rb:14:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/sendfile.rb:110:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:74:in `block in call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `call'\", \"/cloud_controller_ng/middleware/request_logs.rb:22:in `call'\", \"/cloud_controller_ng/middleware/security_context_setter.rb:19:in `call'\", \"/cloud_controller_ng/middleware/vcap_request_id.rb:15:in `call'\", \"/cloud_controller_ng/middleware/cors.rb:49:in `call_app'\", \"/cloud_controller_ng/middleware/cors.rb:14:in `call'\", \"/cloud_controller_ng/middleware/request_metrics.rb:12:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/builder.rb:244:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'\"]}}]}","log_level":"info","source":"cc.api","data":{"request_guid":"3b75e731-1246- 4c94-b2a0-b7300d9359e7::02a69280-4037-4b1a-bec8- aef5a674c325"},"thread_id":47075637335140,"fiber_id":47075631752220,"process_id":1,"file":"/cloud_c ontroller_ng/app/controllers/v3/application_controller.rb","lineno":178,"method":"handle_exception"}
cf-for-k8s Version: https://github.com/cloudfoundry/cf-for-k8s/releases/tag/v0.1.0
shoutout to Adrian for catching this
The unpinned images are here
yeah we should totally do that
- Selzo
We tried to use this helm chart together with quarks.
We noticed that for this use case several configuration values are missing; an example values.yaml sketch follows the template snippets below:
uaa.internal_url in ccng-configmap.yaml:
internal_url: {{ .Values.uaa.internal_url | default "http://uaa.{{ .Release.Namespace }}.svc.cluster.local:8080" }}
blobstore.signature_version in ccng-configmap.yaml:
aws_signature_version: {{ .Values.blobstore.signature_version | default "2" | quote }}
blobstore.region in ccng-configmap.yaml:
region: {{ .Values.blobstore.region | quote }}
nameserver in api_server_deployment.yaml (this is only required to resolve BOSH DNS names like uaa.service.cf.internal; in a Kubernetes-native environment it is not really required):
{{- if .Values.nameserver }}
dnsConfig:
nameservers:
- {{ .Values.nameserver }}
options:
- name: ndots
value: "5"
searches:
- {{ .Release.Namespace }}.svc.cluster.local
- svc.cluster.local
- cluster.local
- service.cf.internal
dnsPolicy: None
{{- end }}
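For reference, a hedged sketch of how these values might be supplied in a values.yaml; the key names are taken from the list above, while the concrete values are placeholders:
uaa:
  internal_url: http://uaa.cf-system.svc.cluster.local:8080   # placeholder URL
blobstore:
  signature_version: "2"
  region: eu-central-1        # placeholder region
nameserver: 10.100.200.10     # placeholder; only needed to resolve BOSH DNS names like uaa.service.cf.internal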
Pivotal provides the GitBot service to synchronize pull requests and/or issues made against public GitHub repos with Pivotal Tracker projects. This service does not track individual commits.
If you are a Pivotal employee, you can configure Gitbot to sync your GitHub repo to your Pivotal Tracker project with a pull request. An ask+rd@ ticket is the fastest way to get write access if you get a 404 from the config repo.
If you do not want to have pull requests and/or issues copied from GitHub to Pivotal Tracker, you do not need to take any action.
If there are any questions, please reach out to [email protected].
System and user components should live in different namespaces to simplify security (for example with Networking policies).
Right now, the staging pod is created in cf-system namespace.
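As an illustration of the kind of policy this separation would enable, here is a minimal NetworkPolicy sketch; it assumes staging pods move into their own namespace, and every name and label in it is hypothetical:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: staging-allow-system-only      # hypothetical policy name
  namespace: cf-workloads-staging      # assumes staging pods live in their own namespace
spec:
  podSelector: {}                      # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              cf-component: system     # hypothetical label on the cf-system namespace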
See Credhub and cf-for-k8s Password Generation.
We need to make an effort to get capi-k8s-release's secrets out of its big global configmap. Part of doing this will be removing unnecessary TLS certificates from capi-k8s-release, but there are more secrets hiding in the configmap:
cf-api-kpack-watcher
I think we'd also need to handle the database encryption keys.
Currently, all pods except cf-api-server have TCP readiness probes. They are not working properly for a few reasons: the servers listen on 127.0.0.1, but Kubernetes reaches a TCP probe via PodIP:port, so a server listening only on 127.0.0.1 won't accept connections from outside the pod (a probe sketch follows the next paragraph).
To deploy cf-for-k8s without Istio, you can deploy the cf-for-k8s#contour-ingress branch in the standard way and ensure that networking.ingress_solution_provider is set to contour (it should be a default in that branch).
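Coming back to the readiness probes: a minimal sketch of the TCP probe shape described above (container name and port are placeholders). The kubelet dials <PodIP>:<port>, so the probe only passes if the server binds to 0.0.0.0 or the pod IP rather than 127.0.0.1:
containers:
  - name: cf-api-worker            # placeholder container name
    readinessProbe:
      tcpSocket:
        port: 8080                 # placeholder; must be reachable on the pod IP, not just 127.0.0.1
      initialDelaySeconds: 5
      periodSeconds: 10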
Steps to reproduce:
$ cf push catnip
$ cf restage catnip
Relevant snippet from a CF_TRACE=1 cf restage catnip:
Staging app and tracing logs...
REQUEST: [2020-03-24T22:54:42-07:00]
POST /v2/apps/cf328831-b2b6-4f8b-a633-22d0b525a77c/restage HTTP/1.1
...
RESPONSE: [2020-03-24T22:54:42-07:00]
HTTP/1.1 500 Internal Server Error
...
{
"code": 10001,
"description": "An unknown error occurred.",
"error_code": "UnknownError"
}
On the API pod, we see this error log:
{"error_code"=>"UnknownError", "description"=>"An unknown error occurred.", "code"=>10001, "test_mode_info"=>{"description"=>"images.build.pivotal.io \"9fa6af81-8366-4369-a69d-7f6d0d596688\" already exists", "error_code"=>"CF-HttpError", "backtrace"=>[...]}}
Formatted backtrace:
[
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:130:in `rescue in handle_exception'",
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:120:in `handle_exception'",
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:381:in `create_entity'",
"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:236:in `block (2 levels) in define_entity_methods'",
"/cloud_controller_ng/lib/kubernetes/kpack_client.rb:10:in `create_image'",
"/cloud_controller_ng/lib/cloud_controller/kpack/stager.rb:13:in `stage'",
"/cloud_controller_ng/app/actions/build_create.rb:77:in `create_and_stage'",
"/cloud_controller_ng/app/actions/v2/app_stage.rb:23:in `stage'",
"/cloud_controller_ng/app/controllers/runtime/restages_controller.rb:38:in `restage'",
...
]
The endpoints responsible for droplet upload and download are not functional. At the time of writing, there is no v3 droplet download endpoint.
To reproduce:
cf download-droplet appName --path /tmp/tar.gz
cf push appName --droplet /tmp/tar.gz
One of our (cf-k8s-networking team) goals for GA is for the networking data and configuration planes to operate performantly at a scale of 1,000 routes and 2,000 AIs. To that end we started doing some scaling tests. In doing so we discovered some issues with the capi that we thought we'd bring to your attention.
In a space with 1,000 apps and 1,000 external routes:
cf apps times out after 60 seconds. The CLI makes a single request to /v2/spaces/<guid>/summary that eventually times out with an nginx error.
cf v3-apps hangs seemingly forever. When we use -v we see a constant stream of shorter requests; the ones to /v3/processes/<guid>/stats seem to take a while but not long enough to cause a timeout.
cf app <appname> fails after 3 minutes. It attempts /v3/processes/<guid>/stats 3 times and times out each time.
cf delete works great 😄
cf routes takes 20 seconds.
Some interesting things we found:
There is 0.1 seconds of sleep between each retry. This might add a little bit since it's not working yet (we think).
We also see calls to /v3/processes/<guid>/stats time out. Before the Space Summary request these took about ~1 second but seemed to succeed.
cc @tcdowney @rosenhouse @ndhanushkodi @rodolfo2488
We deployed cf-for-k8s and performed a scalability test. We pushed source-code-based applications and reached 800 apps. During our performance tests we saw a lot of restarts on the capi-kpack-watcher deployment. When streaming the logs, we saw the following error message:
steps: [prepare detect] 2020/04/24 14:26:45 [UpdateFunc] Update to Build: 19ff6d40-dc57-429b-b6f3-a7bf2c4cf77c-build-1-9b4pt status: (v1alpha1.Status) { ObservedGeneration: (int64) 1, Conditions: (v1alpha1.Conditions) (len=1 cap=1) { (v1alpha1.Condition) { Type: (v1alpha1.ConditionType) (len=9) "Succeeded", Status: (v1.ConditionStatus) (len=5) "False", Severity: (v1alpha1.ConditionSeverity) "", LastTransitionTime: (v1alpha1.VolatileTime) { Inner: (v1.Time) 2020-04-24 14:26:45 +0000 UTC }, Reason: (string) "", Message: (string) (len=82) "pods \"19ff6d40-dc57-429b-b6f3-a7bf2c4cf77c-build-1-9b4pt-build-pod\" already exists" } } } steps: [] E0424 14:26:45.169707 1 runtime.go:73] Observed a panic: runtime.boundsError{x:-1, y:0, signed:true, code:0x0} (runtime error: index out of range [-1]) goroutine 20 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic(0x12102e0, 0xc001bb5d00) /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:69 +0x7b k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:51 +0x82 panic(0x12102e0, 0xc001bb5d00) /usr/local/go/src/runtime/panic.go:679 +0x1b2 capi_kpack_watcher/watcher.(*BuildWatcher).handleFailedBuild(0xc0002ce7e0, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:144 +0x3a7 capi_kpack_watcher/watcher.(*BuildWatcher).UpdateFunc(0xc0002ce7e0, 0x12b1820, 0xc0001c6280, 0x12b1820, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:48 +0x29d k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) /go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:202 k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc00006bdc8, 0x41720e, 0x7f1e5c0bcd30) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:605 +0x188 k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc0002fbdd8, 0x0, 0xc00006bde8) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:284 +0x51 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:601 +0x79 k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00006bf40) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002fbf40, 0xdf8475800, 0x0, 0x42d101, 0xc0002ec000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8 k8s.io/apimachinery/pkg/util/wait.Until(...) 
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 k8s.io/client-go/tools/cache.(*processorListener).run(0xc0002bda80) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:599 +0x9b k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0002cc4c0, 0xc0002de000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x59 created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:69 +0x62 panic: runtime error: index out of range [-1] [recovered] panic: runtime error: index out of range [-1] goroutine 20 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:58 +0x105 panic(0x12102e0, 0xc001bb5d00) /usr/local/go/src/runtime/panic.go:679 +0x1b2 capi_kpack_watcher/watcher.(*BuildWatcher).handleFailedBuild(0xc0002ce7e0, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:144 +0x3a7 capi_kpack_watcher/watcher.(*BuildWatcher).UpdateFunc(0xc0002ce7e0, 0x12b1820, 0xc0001c6280, 0x12b1820, 0xc00011c280) /capi-kpack-watcher/watcher/build_watcher.go:48 +0x29d k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) /go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:202 k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc00006bdc8, 0x41720e, 0x7f1e5c0bcd30) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:605 +0x188 k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc0002fbdd8, 0x0, 0xc00006bde8) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:284 +0x51 k8s.io/client-go/tools/cache.(*processorListener).run.func1() /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:601 +0x79 k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00006bf40) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002fbf40, 0xdf8475800, 0x0, 0x42d101, 0xc0002ec000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8 k8s.io/apimachinery/pkg/util/wait.Until(...) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 k8s.io/client-go/tools/cache.(*processorListener).run(0xc0002bda80) /go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:599 +0x9b k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0002cc4c0, 0xc0002de000) /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x59 created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:69 +0x62
Steps to reproduce:
Expected:
No strange pods in the staging namespace
Actual:
Leftover pod from staging + leftover build object with "Failed" status
Some buildpacks require environment variables for configuration, but they are not passed to the kpack build. As you can see in the code, no environment variables are passed.
Push a Java application with $BP_JAVA_VERSION=8.
What was the expected result?
The configuration variable is taken into account.
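For illustration, the kpack Build resource carries environment variables in spec.env (the Build dump earlier in this document shows Env:[] as empty). A hedged, abbreviated sketch of what a populated value might look like; the name and image tag are placeholders, and the builder/source sections are omitted:
apiVersion: build.pivotal.io/v1alpha1
kind: Build
metadata:
  name: example-app-build-1                              # placeholder name
  namespace: cf-workloads-staging
spec:
  tags:
    - registry.example.com/cf-workloads/example-app      # placeholder image tag
  serviceAccount: cc-kpack-registry-service-account
  env:
    - name: BP_JAVA_VERSION                              # the buildpack configuration from the repro above
      value: "8"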
As part of our efforts to eliminate the internal certificate from cf-for-k8s, we tried to deploy without providing a value for the uaa.serverCerts.secretName property. While everything deployed successfully, we found that we received an error from the cf create-org command when running our smoke-tests. The logs from the cf-api-server pod showed this stack trace:
Unexpected error while processing request: system lib
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:59:in `add_file'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:59:in `call'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:59:in `block (2 levels) in <class:Store>'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:245:in `add_trust_ca_to_store'
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.8.3/lib/httpclient/ssl_config.rb:236:in `add_trust_ca'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:127:in `http_client'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:116:in `fetch_uaa_issuer'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:111:in `block in uaa_issuer'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:133:in `with_request_error_handling'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:110:in `uaa_issuer'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:85:in `decode_token_with_key'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:66:in `block in decode_token_with_asymmetric_key'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:65:in `each'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:65:in `decode_token_with_asymmetric_key'
/cloud_controller_ng/lib/cloud_controller/uaa/uaa_token_decoder.rb:29:in `decode_token'
/cloud_controller_ng/lib/cloud_controller/security/security_context_configurer.rb:24:in `decode_token'
/cloud_controller_ng/lib/cloud_controller/security/security_context_configurer.rb:10:in `configure'
/cloud_controller_ng/middleware/security_context_setter.rb:12:in `call'
/cloud_controller_ng/middleware/vcap_request_id.rb:15:in `call'
/cloud_controller_ng/middleware/cors.rb:49:in `call_app'
/cloud_controller_ng/middleware/cors.rb:14:in `call'
/cloud_controller_ng/middleware/request_metrics.rb:12:in `call'
/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/builder.rb:244:in `call'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'
/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'
/usr/local/lib/ruby/gems/2.5.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'
From what we can tell, it looks like the UAA server cert is still actually required by the CC code, even though it is optional in the K8s templates. Given that we are already configuring CAPI with a plain-text http URL to talk to UAA, we would like to also remove the need for the internal certificate in the configuration.
Please feel free to reach out if you have any questions.
Regards,
Dave and @acosta11
cf stacks should show proper stack descriptions.
Run cf stacks and see the output:
name          description
cflinuxfs3    test cflinuxfs3 entry
Expect to see the actual stack description, or nothing if none is available.
This might seem a little nitpicky, so I apologize, but I suspect someone will raise it sooner rather than later.
Our networking acceptance tests have been failing since Oct 8th because some cf commands such as cf delete-org fail and we get the following message:
$ cf delete-org o
Really delete the org o, including its spaces, apps, service instances, routes, private domains and space-scoped service brokers? [yN]: y
Deleting org o as admin...
Job (8da85e84-a9db-4faa-9a9b-6614f75a3239) failed: An unknown error occurred.
FAILED
When I look at the cf-api-worker logs I see the following error:
{"timestamp":"2020-10-15T20:40:30.017852337Z","message":"Request failed: 500: {\"error_code\"=>\"UnknownError\", \"description\"=>\"An unknown error occurred.\", \"code\"=>10001, \"test_mode_info\"=>{\"description\"=>\**"Package type must be bits\"**, \"error_code\"=>\"CF-RuntimeError\", \"backtrace\"=>[\"/workspace/app/models/runtime/package_model.rb:46:in `bits_image_reference'\" ...
{"timestamp":"2020-10-15T20:40:30.027996351Z","message":"2020-10-15T20:40:30+0000: [Worker(cf-api-worker-7497f88cd7-t6nlz)] Job organization.delete (id=19) (queue=cc-generic) FAILED (0 prior attempts) with RuntimeError: Package type must be bits","log_level":"error","source":"cc-worker","data":{},"thread_id":47038285276660,"fiber_id":47038326581520,"process_id":1,"file":"/layers/paketo-community_bundle-install/gems/gems/delayed_job-4.1.8/lib/delayed/worker.rb","lineno":285,"method":"say"}
{"timestamp":"2020-10-15T20:40:30.028442210Z","message":"2020-10-15T20:40:30+0000: [Worker(cf-api-worker-7497f88cd7-t6nlz)] Job organization.delete (id=19) (queue=cc-generic) FAILED permanently because of 1 consecutive failures","log_level":"error","source":"cc-worker","data":{},"thread_id":47038285276660,"fiber_id":47038326581520,"process_id":1,"file":"/layers/paketo-community_bundle-install/gems/gems/delayed_job-4.1.8/lib/delayed/worker.rb","lineno":285,"method":"say"}
For context, my org o has Docker apps deployed to it. I then tried creating another org test2 with buildpack apps deployed to it, and cf delete-org works:
$ cf delete-org test2
Really delete the org test2, including its spaces, apps, service instances, routes, private domains and space-scoped service brokers? [yN]: y
Deleting org test2 as admin...
OK
Thanks
We did a scalability test on cf-for-k8s v0.6.0. We started with one replica of cf-api-server and started pushing buildpack-based apps. After a certain number of apps, cf-api-server fails with the following error:
Waiting for app to start...
Unexpected Response
Response code: 503
CC code: 0
CC error code:
Request ID: f4f086f1-0ae3-4f07-9f0d-471c6b8028fa::be866d01-d450-4acb-87f1-f504de7e92e0
Description: {
"description": "Instances information unavailable: No running instances",
"error_code": "CF-InstancesUnavailable",
"code": 220002
}
500s pile up; then, if we increase the replicas, it takes some time to start working again. You can correlate this with the following graph.
Again, after a certain number of app pushes succeeded, it failed with the same error. In the same phase we increased the replicas to 5 and were able to push up to 1600 application instances, beyond which scaling doesn't help.
Just for the record: we used the default configuration for the CAPI deployment.
Today, CF puts all LRPs in the cf-workloads namespace. If we ever wanted to provide any sort of k8s API access to developers or managers, we'll need to have some tenant separation in the k8s API. Namespaces are the mechanism to do that inside of a cluster.
First, Eirini could allow namespace-specific LRP scheduling. Then, capi-k8s-release could add code for managing namespaces via the CF /v3/spaces API.
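A hedged sketch of what a per-space namespace might look like; the naming scheme and label keys are assumptions, not an existing convention:
apiVersion: v1
kind: Namespace
metadata:
  name: cf-space-3d3e3f40                                              # hypothetical: one namespace per CF space
  labels:
    cloudfoundry.org/space_guid: 3d3e3f40-0000-4000-8000-000000000000  # hypothetical label
    cloudfoundry.org/org_guid: 51525354-0000-4000-8000-000000000000    # hypothetical label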
We tried to use this helm chart together with quarks. While running a cf push, it turned out that the nginx configuration is not capable of forwarding requests whose size exceeds 1MB. Therefore it's necessary to add the client_max_body_size configuration value.
location / {
client_max_body_size 100m;
access_log /cloud_controller_ng/nginx-access.log;
...
When pushing a Docker Image that does not have a USER instruction, or specifies root, with cf push app -o <app-image>, the CLI hangs with the following:
cf push nginx -o nginx
Pushing app nginx to org o / space s as admin...
Staging app and tracing logs...
Waiting for app nginx to start...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
Instances starting...
The behavior repeats until timeout.
cf logs --recent shows the following:
cf logs nginx --recent
Retrieving logs for app nginx in org o / space s as admin...
2020-10-06T15:44:17.00-0700 [API/0] OUT Creating droplet for app with guid e8cc8497-c414-4233-9cdf-a3e26764501d
2020-10-06T15:44:17.00-0700 [API/0] OUT Updated app with guid e8cc8497-c414-4233-9cdf-a3e26764501d ({:droplet_guid=>"60684061-1dca-47bd-be8d-bfc60ad9dd45"})
2020-10-06T15:44:23.00-0700 [API/0] OUT Process has crashed with type: "web"
2020-10-06T15:44:23.00-0700 [API/0] OUT App instance exited with guid e8cc8497-c414-4233-9cdf-a3e26764501d payload: {"instance"=>"nginx-s-7820289c67-0", "index"=>0, "cell_id"=>"", "reason"=>"CreateContainerConfigError", "exit_description"=>"container has runAsNonRoot and image will run as root", "crash_count"=>0, "crash_timestamp"=>0, "version"=>"a3d1920d-6334-4073-a51c-fc0d02b4d63d"}
I would expect the CLI to provide a message that the container cannot be run, and return.
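For context, the CreateContainerConfigError above is Kubernetes refusing to start a root image when the pod requires a non-root user. A minimal sketch of such a constraint; the pod and container names are placeholders, not the actual Eirini manifest:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-s-example-0          # placeholder app instance pod name
  namespace: cf-workloads
spec:
  containers:
    - name: opi                    # placeholder container name
      image: nginx                 # image without a non-root USER instruction
      securityContext:
        runAsNonRoot: true         # kubelet rejects the container: "container has runAsNonRoot and image will run as root"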
CC: @paulcwarren
The cf create-service-key workflow does not exist. The CLI returns 500, "An unknown error occurred.".
The server logs make it look like the config key cc_service_key_client_name is not present in the cloud-controller-ng-yaml config map. Digging a bit further, the cc_service_key_client_secret key is also not present (which we believe would be required for this workflow).
The full log message:
{
"timestamp": "2020-11-03T18:16:20.960421254Z",
"message": "Request failed: 500: {\"error_code\"=>\"UnknownError\", \"description\"=>\"An unknown error occurred.\", \"code\"=>10001, \"test_mode_info\"=>{\"description\"=>\"\\\"cc_service_key_client_name\\\" is not a valid config key\", \"error_code\"=>\"CF-InvalidConfigPath\", \"backtrace\"=>[\"/workspace/lib/cloud_controller/config.rb:181:in `invalid_config_path!'\", \"/workspace/lib/cloud_controller/config.rb:125:in `block in valid_config_path?'\", \"/workspace/lib/cloud_controller/config.rb:121:in `each'\", \"/workspace/lib/cloud_controller/config.rb:121:in `valid_config_path?'\", \"/workspace/lib/cloud_controller/config.rb:111:in `get'\", \"/workspace/lib/cloud_controller/dependency_locator.rb:310:in `credhub_client'\", \"/workspace/lib/cloud_controller/dependency_locator.rb:256:in `service_key_credential_object_renderer'\", \"/workspace/lib/cloud_controller/controller_factory.rb:46:in `block in fetch_dependencies'\", \"/workspace/lib/cloud_controller/controller_factory.rb:45:in `map'\", \"/workspace/lib/cloud_controller/controller_factory.rb:45:in `fetch_dependencies'\", \"/workspace/lib/cloud_controller/controller_factory.rb:36:in `dependencies_for_class'\", \"/workspace/lib/cloud_controller/controller_factory.rb:15:in `create_controller'\", \"/workspace/lib/cloud_controller/rest_controller/routes.rb:16:in `block in define_route'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1635:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1635:in `block in compile!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:987:in `block (3 levels) in route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1006:in `route_eval'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:987:in `block (2 levels) in route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1035:in `block in process_route'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1033:in `catch'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1033:in `process_route'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:985:in `block in route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:984:in `each'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:984:in `route!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1098:in `block in dispatch!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `block in invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `catch'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1095:in `dispatch!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:919:in `block in call!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `block in invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `catch'\", 
\"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1072:in `invoke'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:919:in `call!'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:908:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/xss_header.rb:18:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/path_traversal.rb:16:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/json_csrf.rb:26:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/base.rb:50:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/base.rb:50:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-protection-2.0.8.1/lib/rack/protection/frame_options.rb:31:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/null_logger.rb:11:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/head.rb:12:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:194:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:1951:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/urlmap.rb:74:in `block in call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/urlmap.rb:58:in `each'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/urlmap.rb:58:in `call'\", \"/workspace/middleware/request_logs.rb:38:in `call'\", \"/workspace/middleware/security_context_setter.rb:19:in `call'\", \"/workspace/middleware/vcap_request_id.rb:15:in `call'\", \"/workspace/middleware/cors.rb:49:in `call_app'\", \"/workspace/middleware/cors.rb:14:in `call'\", \"/workspace/middleware/request_metrics.rb:12:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/rack-2.2.3/lib/rack/builder.rb:244:in `call'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'\", \"/layers/paketo-buildpacks_bundle-install/gems/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'\"]}}",
"log_level": "error",
"source": "cc.api",
"data": {
"request_guid": "159af437-3a09-4a90-b2b7-c65ba4dae0b8::b47a3efd-10f4-40c4-97f5-6674457ad0e7"
},
"thread_id": 47124571135600,
"fiber_id": 47124564557040,
"process_id": 1,
"file": "/workspace/lib/sinatra/vcap.rb",
"lineno": 45,
"method": "block in registered"
}
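A hedged sketch of the kind of addition the config would need; only the two key names come from the error above, and the values are placeholders (the secret should ultimately come from a Secret rather than the config map):
# excerpt of the cloud_controller_ng config rendered into the cloud-controller-ng-yaml config map
cc_service_key_client_name: cc_service_key_client       # placeholder client name
cc_service_key_client_secret: "<placeholder-secret>"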
cc @emmjohnson
Greetings CAPI Friends! We bring glad tidings from the Release Integration team.
While merging cloudfoundry/cf-for-k8s#63, we noted a few items worth talking through but not so crucial as to block this PR.
The 0.0.55-bionic tag is used as a stopgap; please remove that overlay when you pin to the appropriate digest.
In config/values.yml, move the kpack key underneath the capi key, since the capi-k8s-release component owns the kpack dependency.
README.md.
Cheers!
cc: @cloudfoundry/cf-release-integration
In order to transparently use an external identity provider, we would like to set
login:
defaultIdentityProvider: myIdP
Would you accept a PR that exposes this property?
Feature Name: Interacting with OCI Image Registries from cloud_controller_ng
Type: Feature
Author: CAKE team
Related components: cloud_controller_ng, capi-k8s-release
In cf-for-k8s we are now using an OCI image registry to store package uploads instead of a blobstore (see: cloudfoundry/cf-for-k8s#409). Currently only package uploads are supported, and other workflows such as deleting a package and copying a package (https://v3-apidocs.cloudfoundry.org/version/3.89.0/index.html#copy-a-package) are unsupported.
Our initial goal is to support package deletes, but it would be nice if our solution is general enough to support other operations. At the minimum it should also support copying packages, but we suspect we might need to also support operations on droplets (e.g. deleting a droplet should delete the image in the registry).
The package-image-uploader is a micro-service that uses the go-containerregistry package[1] to interact with the OCI registry. Currently, it’s deployed as a container in the cf-api-server pods and is used for converting an uploaded zip file into a container image that is uploaded to the registry.
We propose that we enhance (and rename) the package-image-uploader to support other operations by adding additional endpoints for deleting, copying, etc.
We propose the following means of deploying this new flavor of the package-image-uploader:
Keep it as a container in the cf-api-server pod and expose its port to other pods.
We could write our own client for the OCI registry that does what we need to do.
Benefits:
Drawbacks:
We could use a gem like https://github.com/deitch/docker_registry2 to interact with the registry from within the CCNG codebase.
Benefits:
Drawbacks:
We write a Golang utility that leverages the https://github.com/google/go-containerregistry library[1]. Cloud Controller can shell out to this binary when interacting with the registry in places that it would interact with the blobstore.
Benefits:
Drawbacks:
Using cf-for-k8s develop with capi-k8s-release commit d84e4bf.
Trying to push stratos as a docker app on cf-for-k8s often results in this staging error:
$ cat manifest.yml
applications:
- name: console
memory: 1512M
disk_quota: 1024M
host: console
timeout: 180
docker_image: nwmac/stratos:eirini
health-check-type: port
$ cf push -f manifest.yml
...
Waiting for API to complete processing files...
Staging app and tracing logs...
3 of 4 buildpacks participating
paketo-buildpacks/node-engine 0.1.1
paketo-buildpacks/npm-install 0.2.0
paketo-buildpacks/npm-start 0.0.2
Previous image with name "gcr.io/cf-relint-greengrass/cf-workloads/d8c9059f-1ec5-4598-a2b2-9ed4ce4d1237" not found
StagerError - Stager error: Kpack build failed during container execution: Step failure reason: 'Error', message: ''.
FAILED
https://capi.ci.cf-app.com/teams/main/pipelines/capi/jobs/samus-cf-for-k8s/builds/889
grepping the logs of the controller for the image guid that got stuck, we see:
2020-07-24T18:16:15.327Z DEBUG controllers.Build Build create event received {"requestLink": "/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
2020-07-24T18:16:15.714Z DEBUG controllers.Build Build update event received {"requestLink": "/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
2020-07-24T18:16:15.998Z DEBUG controllers.Build Build update event received {"requestLink": "/apis/build.pivotal.io/v1alpha1/namespaces/cf-workloads-staging/builds/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
2020-07-24T18:16:15.998Z DEBUG controllers.Build Build is not complete, took no action {"buildName": "cf-workloads-staging/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb", "cloudfoundry.org/build_guid": "edcfc220-872e-423c-866f-d53e665906af", "status": {"observedGeneration":1,"conditions":[{"type":"Succeeded","status":"False","lastTransitionTime":"2020-07-24T18:16:15Z","message":"pods \"8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb-build-pod\" already exists"}],"stack":{},"podName":"8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb-build-pod"}}
2020-07-24T18:16:15.998Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "build", "request": "cf-workloads-staging/8c5af030-d3ea-4455-9b4e-c95ef93a1fea-build-1-pbvzb"}
This indicates that the build controller sometimes fails to detect builds that Failed due to reasons that aren't listed in their container states. In this case, it seems like we sometimes observe a transient state involving a duplicate build pod. The BARA logs show this build succeeding, so this would probably work out fine if we logged the error and requeued the update event.
Is your feature request related to a problem? Please describe.
We noticed that currently the skip_cert_verify property is hardcoded to true. See https://github.com/cloudfoundry/cf-for-k8s/blob/eb0e1b1e39900870d54dc3f1d47cf08049cf64fc/config/capi/_ytt_lib/capi-k8s-release/templates/ccng-config.lib.yml#L287. Our component would like to consume this property to toggle SSL validation.
Describe the solution you'd like
This property would be exposed and configurable by operators, either through CCNG values or some kind of top-level/global property in the larger cf-for-k8s context, i.e. #@ data.values.ssl.skip_cert_verify
Thanks,
@belinda-liu && @weymanf
We had to revert cloudfoundry/cf-for-k8s#253
Most of the pipeline, which uses kind, had no problem. But some of the pipeline tests, and standard manual testing, use GKE, and this code was running into permission problems there. In a nutshell, the kpack builder image can be stored in the gcr registry as a public object, but it can't be retrieved, and we get a failure in the cf-workloads-staging pod. See https://www.pivotaltracker.com/story/show/173470211/comments/215536752 for sample output.
Original issue on cf-for-k8s: cloudfoundry/cf-for-k8s#287
Based on associated slack threads (https://cloudfoundry.slack.com/archives/CH9LF6V1P/p1594916021423300 and https://cloudfoundry.slack.com/archives/CH9LF6V1P/p1594928317432200), CC's behavior in cf-for-k8s seems to be the cause, so I'm opening an issue on the capi-k8s-release repo for the CAPI team to track.
We performed a test with the new routecontroller implementation in cf-for-k8s by pushing 10 apps concurrently. During the course of the test we observed the following error from cf-api-server:
Failed to create/update/delete Route resource with guid 'f4cc6bd0-4242-4fa1-bc88-57ea8814049c' on Kubernetes\", \"error_code\"=>\"CF-KubernetesRouteResourceError
{"timestamp":1592899641.142837,"message":"Failed to Update Route CRD: HTTP status code 409, Operation cannot be fulfilled on routes.networking.cloudfoundry.org \"c0e42e0d-e6cc-4cce-b008-b8e976be6dea\": the object has been modified; please apply your changes to the latest version and try again for PUT https://kubernetes.default/apis/networking.cloudfoundry.org/v1alpha1/namespaces/cf-workloads/routes/c0e42e0d-e6cc-4cce-b008-b8e976be6dea","log_level":"info","source":"cc.action.route_update","data":{"request_guid":"257c79aa-76f1-4a88-a3a7-7f91c6fdc1f2::8e837f91-8cb2-45ed-ab06-54c40e3221a0"},"thread_id":47384772400140,"fiber_id":47384772089500,"process_id":1,"file":"/cloud_controller_ng/lib/kubernetes/route_crd_client.rb","lineno":53,"method":"rescue in update_destinations"} {"timestamp":1592899641.1436174,"message":"Request failed: 422: {\"description\"=>\"Failed to create/update/delete Route resource with guid 'c0e42e0d-e6cc-4cce-b008-b8e976be6dea' on Kubernetes\", \"error_code\"=>\"CF-KubernetesRouteResourceError\", \"code\"=>400001, \"test_mode_info\"=>{\"description\"=>\"Failed to create/update/delete Route resource with guid 'c0e42e0d-e6cc-4cce-b008-b8e976be6dea' on Kubernetes\", \"error_code\"=>\"CF-KubernetesRouteResourceError\", \"backtrace\"=>[\"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:130:in `rescue in handle_exception'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:120:in `handle_exception'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:391:in `update_entity'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/kubeclient-4.5.0/lib/kubeclient/common.rb:240:in `block (2 levels) in define_entity_methods'\", \"/cloud_controller_ng/lib/kubernetes/route_crd_client.rb:50:in `update_destinations'\", \"/cloud_controller_ng/app/actions/v2/route_mapping_create.rb:52:in `add'\", \"/cloud_controller_ng/app/controllers/runtime/routes_controller.rb:262:in `add_app'\", \"/cloud_controller_ng/app/controllers/base/base_controller.rb:84:in `dispatch'\", \"/cloud_controller_ng/lib/cloud_controller/rest_controller/routes.rb:16:in `block in define_route'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1634:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1634:in `block in compile!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:992:in `block (3 levels) in route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1011:in `route_eval'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:992:in `block (2 levels) in route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1040:in `block in process_route'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1038:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1038:in `process_route'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:990:in `block in route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:989:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:989:in `route!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1097:in `block in dispatch!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `block in invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in 
`catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1094:in `dispatch!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:924:in `block in call!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `block in invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1076:in `invoke'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:924:in `call!'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:913:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/xss_header.rb:18:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/path_traversal.rb:16:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/json_csrf.rb:26:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/base.rb:50:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/base.rb:50:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-protection-2.0.5/lib/rack/protection/frame_options.rb:31:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/null_logger.rb:11:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/head.rb:12:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:194:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/sinatra-2.0.5/lib/sinatra/base.rb:1957:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:74:in `block in call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `each'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/urlmap.rb:58:in `call'\", \"/cloud_controller_ng/middleware/request_logs.rb:38:in `call'\", \"/cloud_controller_ng/middleware/security_context_setter.rb:19:in `call'\", \"/cloud_controller_ng/middleware/vcap_request_id.rb:15:in `call'\", \"/cloud_controller_ng/middleware/cors.rb:49:in `call_app'\", \"/cloud_controller_ng/middleware/cors.rb:14:in `call'\", \"/cloud_controller_ng/middleware/request_metrics.rb:12:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/rack-2.2.2/lib/rack/builder.rb:244:in `call'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:86:in `block in pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `catch'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:84:in `pre_process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/thin-1.7.2/lib/thin/connection.rb:50:in `block in process'\", \"/usr/local/lib/ruby/gems/2.5.0/gems/eventmachine-1.0.9.1/lib/eventmachine.rb:1067:in `block in spawn_threadpool'\"]}}","log_level":"info","source":"cc.api","data":{"request_guid":"257c79aa-76f1-4a88-a3a7-7f91c6fdc1f2::8e837f91-8cb2-45ed-ab06-54c40e3221a0"},"thread_id":47384772400140,"fiber_id":47384772089500,"process_id":1,"file":"/cloud_controller_ng/lib/sinatra/vcap.rb","lineno":44,"method":"block in registered"}
I expect cf push with a buildpack to work against the dockerhub.com registry, since it is a publicly trusted registry.
cf push fails when using dockerhub.com as the cf-app registry.
The configuration works for gcr.io but fails for dockerhub.com. See the following thread for details, steps, and possible solutions.
Since upgrading Istio in cf-for-k8s to 1.7 and closing cloudfoundry/cf-for-k8s#189, app containers should not start while the sidecar container is initializing, so there is no need for custom code to handle that.
Also, with the current plan of supporting both Istio and Contour (without Istio), ccdb-migrate should not rely at all on a sidecar being injected.
The BARA "setting_process_commands manifest and Procfile/detected buildpack command interactions prioritizes the manifest command over the Procfile and can be reset via the API" fails when it checks that the process command has been set correctly.
It returns just the first argument in the command, in this case, "bundle" instead of "bundle exec rackup..." For multi-argument commands it should return the entire command.
The app works fine, so we're reasonably sure the problem is a display problem and the underlying process command is set correctly.
Take the dora app with a Procfile that defines some processes. I used the following:
web: bundle exec rackup config.ru -p $PORT
worker: sleep 10000
foo: bundle exec rackup config.ru -p 8080
cf push the app
cf curl /v3/droplets/<guid>
You will see in the process_types field that only the first part of the start command was captured.
Example output:
{
"guid": "5efc4402-ca92-4387-b0d0-91a6a21d2daf",
"created_at": "2020-10-21T00:05:22Z",
"updated_at": "2020-10-21T00:05:59Z",
"state": "STAGED",
"error": null,
"lifecycle": {
"type": "kpack",
"data": {}
},
"checksum": null,
"buildpacks": null,
"stack": null,
"image": "gcr.io/cf-capi-arya/cf-workloads/8eb4ed9a-3804-4a66-90b6-6db0fade796c@sha256:fe502f3bbc3efcd0bf3ae647c4c733e127f2dd3625c2aaabc07f1c13b5e6d25f",
"execution_metadata": null,
"process_types": {
"foo": "bundle",
"web": "bundle",
"worker": "sleep"
},
"relationships": {
"app": {
"data": {
"guid": "8eb4ed9a-3804-4a66-90b6-6db0fade796c"
}
}
},
"metadata": {
"labels": {},
"annotations": {}
},
"links": {
"self": {
"href": "https://api.tim.k8s.capi.land/v3/droplets/5efc4402-ca92-4387-b0d0-91a6a21d2daf"
},
"app": {
"href": "https://api.tim.k8s.capi.land/v3/apps/8eb4ed9a-3804-4a66-90b6-6db0fade796c"
},
"assign_current_droplet": {
"href": "https://api.tim.k8s.capi.land/v3/apps/8eb4ed9a-3804-4a66-90b6-6db0fade796c/relationships/current_droplet",
"method": "PATCH"
},
"package": {
"href": "https://api.tim.k8s.capi.land/v3/packages/e0bf1462-ba4b-48cb-aa78-9ac2f91900da"
}
}
}
Assuming everything is all wired up correctly, this will result in apps failing to start since they only get the first piece of the start command. So for the web process it just runs bundle and installs its gems again! 🤯
k -n cf-workloads logs multi-dora-proc-s-622cecd9c7-0 -c opi -f
Using bundler 2.1.4
Using diff-lcs 1.4.2
Using json 2.3.0
Using ruby2_keywords 0.0.2
Using mustermann 1.1.1
Using rack 2.2.3
Using rack-protection 2.0.8.1
Using rack-test 1.1.0
Using rspec-support 3.9.3
Using rspec-core 3.9.2
Using rspec-expectations 3.9.2
Using rspec-mocks 3.9.1
Using rspec 3.9.0
Using tilt 2.0.10
Using sinatra 2.0.8.1
Updating files in vendor/cache
Bundle complete! 4 Gemfile dependencies, 15 gems now installed.
Bundled gems are installed into `/layers/paketo-buildpacks_bundle-install/gems`
It looks like we're only copying over the Command for a Process and ignoring the Args.
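For illustration, here is a minimal Go sketch (a hypothetical helper, not the actual controller code) of how a container's Command and Args would need to be combined to recover the full start command:

package main

import (
	"fmt"
	"strings"
)

// processCommand joins a container's Command (entrypoint) and Args into the
// full start command. Taking only Command[0] is what yields "bundle" instead
// of "bundle exec rackup config.ru -p $PORT".
func processCommand(command, args []string) string {
	return strings.TrimSpace(strings.Join(append(append([]string{}, command...), args...), " "))
}

func main() {
	// Hypothetical values for the "web" process from the Procfile above.
	cmd := []string{"bundle"}
	args := []string{"exec", "rackup", "config.ru", "-p", "$PORT"}
	fmt.Println(processCommand(cmd, args)) // bundle exec rackup config.ru -p $PORT
}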
Reported by cf-for-k8s team cc:@jamespollard8
Install is now failing frequently on AKS (2/3 attempts so far)
see https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-for-k8s/jobs/validate-azure/builds/14 and https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-for-k8s/jobs/validate-azure/builds/16
8:45:54PM: ---- waiting on 6 changes [239/245 done] ----
8:45:58PM: ongoing: reconcile deployment/capi-api-server (apps/v1) namespace: cf-system
8:45:58PM: ^ Waiting for 1 unavailable replicas
8:45:58PM: L ok: waiting on replicaset/capi-api-server-5cc896b848 (apps/v1) namespace: cf-system
8:45:58PM: L ok: waiting on podmetrics/capi-api-server-5cc896b848-p25vl (metrics.k8s.io/v1beta1) namespace: cf-system
8:45:58PM: L ongoing: waiting on pod/capi-api-server-5cc896b848-p25vl (v1) namespace: cf-system
8:45:58PM: ^ Condition Ready is not True (False)
8:45:58PM: L ok: waiting on pod/capi-api-server-5cc896b848-mz7lt (v1) namespace: cf-system
8:45:58PM: fail: reconcile job/ccdb-migrate (batch/v1) namespace: cf-system
kapp: Error: waiting on reconcile job/ccdb-migrate (batch/v1) namespace: cf-system: finished unsuccessfully (Failed with reason BackoffLimitExceeded: Job has reached the specified backoff limit)
Background: The cc-db migration job has to wait until the database (postgres) is ready. As a workaround for this issue we added a retry policy which tries 3 times to run the migrations. It looks like that is failing on AKS, so we should either increase the number of retries or adopt a more robust policy.
Related story: https://www.pivotaltracker.com/story/show/172414069
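As a sketch of what a more robust policy could look like, the snippet below (hypothetical, not part of the release) blocks until the database endpoint accepts TCP connections, using capped exponential backoff instead of a fixed retry count; the address and time limits are illustrative assumptions:

package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// waitForDB blocks until the database accepts TCP connections, using capped
// exponential backoff rather than a fixed number of retries.
func waitForDB(addr string, maxWait time.Duration) error {
	deadline := time.Now().Add(maxWait)
	backoff := time.Second
	for {
		conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("database %s not reachable after %s: %w", addr, maxWait, err)
		}
		time.Sleep(backoff)
		if backoff < 30*time.Second {
			backoff *= 2
		}
	}
}

func main() {
	// Illustrative address; the real host/port would come from the ccdb configuration.
	if err := waitForDB("cf-db-postgresql.cf-db.svc.cluster.local:5432", 5*time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// ...then run the migrations.
}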
When STRICT mode is enabled for the mTLS Istio mesh policy, all traffic between Pods covered by the policy must use mTLS. This means that the Istio sidecars (the istio-proxy Envoy container) must be ready before Pods can successfully communicate with each other over the network. See the Istio Security docs for more information on all of this.
The issue with Cloud Controller right now is that it is using an init container to run database migrations. This init container runs before the regular istio-proxy container on the Pod is ready, so there is no client sidecar in place to mediate the mTLS connection with the sidecar running on the database Pod1.
Currently there is no workaround on the Istio side. Their stance2 is:
There is no easy solution. Your init container will run before the sidecar starts. If your container runs before Istio’s init container it will not be secure. If your container runs after Istio’s it will not have network access.
If you can avoid doing network I/O in your init containers you should. If you must use init containers that expect connectivity, you’ll need a work-around.
Can Cloud Controller run these migrations in some other way? Maybe in a Job?
Thanks!
Tim && @ndhanushkodi
1 this actually probably doesn't matter for off-cluster databases 🤔💭
2 https://discuss.istio.io/t/k8s-istio-sidecar-injection-with-other-init-containers/845
As a follow-up to the scalability tests, we performed a max concurrent push test.
During the tests we observed strange behaviour: when we perform 20 concurrent pushes, we see huge latency between the staging pod completion time in the cf-workloads-staging namespace and the creation of the StatefulSet in the cf-workloads namespace.
We already ensured that cf-api-server has enough resources to handle the load.
The graph below shows the latency, with the X-axis showing the number of apps deployed and the Y-axis showing the latency in seconds (time difference between staging pod completion time and StatefulSet creation time).
Steps to reproduce the behavior:
# App GUID taken from the staging pod's labels
guid=$(kubectl get pods -n cf-workloads-staging $app -o json | jq '.metadata.labels | to_entries | .[] | select(.key=="cloudfoundry.org/app_guid") | .value ' | cut -d'"' -f 2)
# Time at which the staging pod completed
stage_completion_time=$(kubectl get pods -n cf-workloads-staging $app -o json | jq '.status.conditions[0].lastTransitionTime' | cut -d'"' -f 2)
# Creation time of the corresponding StatefulSet in cf-workloads
sts_creation_time=$(kubectl get sts -n cf-workloads -l cloudfoundry.org/app_guid=$guid -o json | jq '.items[0].metadata.creationTimestamp' | cut -d'"' -f 2)
Let us know if the issue is from Eirini; we can open the issue over there.
Feature Name: Scheduling workloads across multiple k8s Namespaces
Type: feature
Author: Connor Braa
Related components: cloud_controller_ng, capi-k8s-release, eirini
We plan to introduce a /v3/placements API. A single placement resource describes a namespace or namespaces where CF API workloads may be scheduled. Upon introduction, there will be an initial default placement for existing workloads. /v3/placements will likely be persisted in the k8s API as a Placement CRD. The placements API will provide future flexibility so that we may eventually support:
/v3/space kubernetes Namespace creation and deletion
Currently, the CF API schedules all of its runtime workloads into a 1-per-installation "cf-workloads" namespace via Eirini's OPI LRP API and the cf-k8s-networking Route CRD. All build-time workloads are scheduled into a different, 1-per-installation "cf-workloads-staging" namespace via Kpack's Image CRD.
Recently, the Eirini team has added the capability to schedule LRP workloads across different namespaces. This is available both in OPI and k8s-natively in the new, as-of-yet unused LRP CRD.
cf-for-k8s' first deployers and tire kickers have made explicit requests that apps don't all share one global k8s "cf-workloads" namespace. That global namespace is a bit of a security threat and could present a significant hurdle in the future when we want to give users more k8s API access.
Some amount of initial thought has been given to how the CF API might benefit users that need to schedule workloads across k8s clusters. I'd like to propose that we use namespaces as a way of exploring API modeling around workload placement broadly with the hopes that we'll later extend these same API resources to work across k8s clusters.
At a high level, I'd like to propose that we solve this problem through flexible, configurable data modeling.
To give platform engineers the ability to place workloads in different Namespaces, one approach is to begin by creating a representation of the current state of workload placements. What follows are example objects that would represent the current way Apps, Routes, and Kpack Images are spread across Namespaces in cf-for-k8s.
resources: [
{
"name": "existing-cf-for-k8s-placement",
"guid": "placement-x",
"targets": [
{
"namespace": "cf-workloads"
}
],
"global_runtime_default": true,
"global_staging_default": false,
"metadata": { "labels": {}, "annotations": {}}
},
{
"name": "existing-cf-for-k8s-staging-placement",
"guid": "placement-y",
"targets": [
{
"namespace": "cf-workloads-staging"
}
],
"global_runtime_default": false,
"global_staging_default": true,
"metadata": { "labels": {}, "annotations": {}}
}
]
And the associated CRs that platform engineers could configure at deploy-time:
---
apiVersion: apps.cloudfoundry.org/v1alpha1
kind: Placement
metadata:
  labels:
    apps.cloudfoundry.org/placement_guid: placement-x
  annotations: {}
  name: existing-cf-for-k8s-placement
  namespace: cf-system
spec:
  targets:
  - namespace: cf-workloads
  global_runtime_default: true
  global_staging_default: false
---
apiVersion: apps.cloudfoundry.org/v1alpha1
kind: Placement
metadata:
  labels:
    apps.cloudfoundry.org/placement_guid: placement-y
  annotations: {}
  name: existing-cf-for-k8s-staging-placement
  namespace: cf-system
spec:
  targets:
  - namespace: cf-workloads-staging
  global_runtime_default: false
  global_staging_default: true
A single placement describes a sort of multiplexer for CF Apps. In the initial, single-target case this multiplexer passes its inputs straight through to the underlying namespace. In the future, we'd want to support multi-target placements where the multiplexer would create copies of compute and networking resources across its targets. A single-process, routable web app in a space with a 2-target placement would have 2 "web" processes that map to two StatefulSets across 2 namespaces. Likewise, the app's associated Route would need to span those two namespaces or have copies in both.
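To make the shape of the proposed resource concrete, here is a minimal Go sketch of the placement data model from the examples above; the type and field names are illustrative only, not an agreed-upon schema:

package main

import (
	"encoding/json"
	"fmt"
)

// Target identifies a namespace that a placement schedules workloads into.
type Target struct {
	Namespace string `json:"namespace"`
}

// Placement mirrors the example /v3/placements resource shown above.
type Placement struct {
	Name                 string         `json:"name"`
	GUID                 string         `json:"guid"`
	Targets              []Target       `json:"targets"`
	GlobalRuntimeDefault bool           `json:"global_runtime_default"`
	GlobalStagingDefault bool           `json:"global_staging_default"`
	Metadata             map[string]any `json:"metadata"`
}

func main() {
	// The initial default placement for existing cf-for-k8s runtime workloads.
	p := Placement{
		Name:                 "existing-cf-for-k8s-placement",
		GUID:                 "placement-x",
		Targets:              []Target{{Namespace: "cf-workloads"}},
		GlobalRuntimeDefault: true,
		Metadata:             map[string]any{"labels": map[string]string{}, "annotations": map[string]string{}},
	}
	out, _ := json.MarshalIndent(p, "", "  ")
	fmt.Println(string(out))
}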
Once these API resources exist, we can add features to them. Examples that fit pretty naturally into this modeling include:
/v3/space
cf-system Namespace
It's possible that modeling namespaces as part of (multicluster) workload placement is putting the cart before the horse and coupling together two concepts that don't necessarily need to be related.
Features that might be a bit hairier to build with this modeling include:
letting users cf scale appName -i 5 their app and understand where those instances will be placed
Placements are not entirely necessary to get users' apps into separate namespaces. Instead, we could have the CF API and associated controllers create a Namespace for each new space and immediately start scheduling new app workloads into them.
Rather than migrating or modeling placements independently of existing tenancy constructs, spaces could have a "namespace" field. This field would be seeded to "cf-workloads" for existing spaces, configurable upon space creation, and dynamically allocated when omitted.
This option offers configurability similar to placements and backwards compatibility with the existing structure, but is flexible for future enhancements.
To move them without user intervention, we'd need to find a way to safely move StatefulSets and Routes across Kubernetes namespaces during a cf-for-k8s deployment, or the upgrade would incur app downtime or even breakage.
This approach creates a situation similar to isolation segments, and would perhaps alleviate the need for dynamic namespace allocation. There's a similar solution space to be explored about migrating or not migrating existing apps.
TBD
It would be nice to have quick-start instructions for using capi-k8s-release with a real cluster instead of minikube.
It's not immediately obvious to new users how to use our chart.
Raising by request of @selzoc
As it is currently implemented, the kpack integration uses a naming strategy whereby multiple pushes of a single application will each result in unique images being created in the image repository, rather than creating separate tags of the same image.
For example, pushing an app twice might currently create:
library/78f0dcd6-2d8d-464e-8094-e95926dabac3:b1.20200316.233450
library/2ce2458d-0ef0-43f0-929a-b9dd2c106e54:b1.20200316.232455
instead of:
library/2ce2458d-0ef0-43f0-929a-b9dd2c106e54:b1.20200316.232455
library/2ce2458d-0ef0-43f0-929a-b9dd2c106e54:b2.20200316.233450
The strategy taken here should be chosen very deliberately, as there could be consequences related to broader tooling, for example:
This is not to say the current approach is wrong, but that it should be carefully weighed against alternatives.
In normal app developer operation, it's possible for a kpack build to fail to create a Build pod due to a lack of resources. It's also possible for builds to fail without having container status failures. We've done some work to get those errors correctly propagated through cf-api-controllers to the API, but it's possible for users to miss them.
Ideally, it'd be great to see any build error in the output of cf logs appName.
The number of DB connections is limited to 500, because of which we are unable to scale capi-api-server.
The default max connections is set to 25 here.
I tried to scale capi-api-server to 20 replicas (20 replicas × 25 connections each can require the full 500-connection limit) and got the following error in the pre-start init container:
Caused by: PG::ConnectionBad: FATAL: remaining connection slots are reserved for non-replication superuser connections
Next we tried changing max_connections to 100 and were able to scale only 5 replicas of capi-api-server.
Another thing we observed is that even if one replica of api-server is down, the API is not accessible; it returns a 500 status code:
Error Code: 500
Raw Response: {}
FAILED
Feature Name: Supporting service bindings used during staging in cf-for-k8s
Type: feature
Author: CAPI team
Related components: cloud_controller_ng, capi-k8s-release
We create a Service Binding Controller to manage Service Bindings.
We currently provide service bindings to apps at build-time via the VCAP_SERVICES
environment variable. This mechanism is not supported by the Paketo Cloud Native Buildpacks.
We need to continue to support build-time service bindings so that the platform can continue to automatically configure APM integrations on behalf of apps. This will require providing the service binding to kpack as a Kubernetes Service Binding. This type of service binding takes the form of a structured directory, mounted to the build container.
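As a rough illustration of consuming such a binding, the Go sketch below assumes the binding is projected as a flat directory with one file per key; that layout is an assumption for illustration, not a statement of the spec:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadBinding reads one binding projected as a directory of files, one file
// per key (for example: type, provider, username, password). Hypothetical
// helper for illustration only.
func loadBinding(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	binding := map[string]string{}
	for _, e := range entries {
		// Skip subdirectories and the hidden bookkeeping entries that
		// Kubernetes projected volumes create.
		if e.IsDir() || strings.HasPrefix(e.Name(), ".") {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		binding[e.Name()] = strings.TrimSpace(string(data))
	}
	return binding, nil
}

func main() {
	// Illustrative mount path; the real path would be provided to the build container.
	b, err := loadBinding("/bindings/my-db")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(b["type"], b["provider"])
}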
When the service binding /create endpoint is hit, CAPI will create a Service Binding resource. The controller will call back to CAPI with the state of that service binding whenever it changes. CAPI will include a reference to that resource when it requests new builds from kpack, and when it requests long-running processes (LRPs) from Eirini.
This may require some work from Eirini to actually mount the secret to the running image.
This adds another codebase and component that will need to be managed.
There's currently no endpoint for updating a service binding. We don't know why.
We could imperatively create service bindings. In that case CAPI would need to monitor and manage the status of the service binding resources and synchronize its data state accordingly.
We're not sure what the failure modes are for service binding creation/management, and probably need to investigate that further.
As of this writing (August 18 2020) kpack does not support the Kubernetes service binding spec, it supports the CNB service binding spec. We expect it to support the Kubernetes spec soon, since the CNB spec is already deprecated.
Placeholder tracker stories
[PLACEHOLDER] mount service bindings to runtime container
[PLACEHOLDER] convert VCAP_SERVICES to kpack bindings at build time
We've noticed this attempting to deploy postfacto on cf-for-k8s v0.6.0.
Steps to reproduce:
Modify package/cf/manifest.yml to remove the buildpacks and services lists
Remove ruby '2.6.3' from package/assets/Gemfile
cd package/cf
./deploy.sh app-name
Staging completes successfully, but the app fails to start. Logs on the app container show /bin/sh: 1: bundle: not found.
If you remove command from the package/cf/manifest.yml, and instead create package/assets/Procfile with contents web: bundle exec rake db:migrate && bundle exec rails s -p \$PORT -e development, the app will start (although that also requires configuring the app to use mysql and redis correctly; without that, at least you'll see that bundle is found).
Slack thread: https://cloudfoundry.slack.com/archives/CH9LF6V1P/p1600788157043700
cc: @cloudfoundry/eirini