Comments (14)
Perfect @turkenh! Let's close this issue then in favor of #63. Thank you for your support, and thank you to @idallaserra for his continued troubleshooting as well! 💪
from provider-helm.
Further details on verifying that the connection secret to the GKE cluster seems OK. See the secret referenced by the ProviderConfig:
> k get providerconfig.helm.crossplane.io multik8s-cluster-gcp -o yaml
...
spec:
  credentials:
    secretRef:
      key: kubeconfig
      name: 2b80fbbb-076b-4908-9034-71db73c3cc86-gkecluster
      namespace: upbound-system
    source: Secret
Extracting the data.kubeconfig field from that secret into a file, then using the file to run kubectl get node, worked OK:
> kubectl -n upbound-system get secret 2b80fbbb-076b-4908-9034-71db73c3cc86-gkecluster -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-gke
> k --kubeconfig=kubeconfig-gke get node
NAME                                                  STATUS   ROLES    AGE   VERSION
gke-cluster-gcp-tkbx-cluster-gcp-s59h-366d1ae5-9nxr   Ready    <none>   64m   v1.16.15-gke.7300
gke-cluster-gcp-tkbx-cluster-gcp-s59h-894e22ae-p5c9   Ready    <none>   65m   v1.16.15-gke.7300
gke-cluster-gcp-tkbx-cluster-gcp-s59h-f5400ebc-zp8n   Ready    <none>   65m   v1.16.15-gke.7300
from provider-helm.
It should keep retrying with the default shortWait, which is 30 seconds. Every time a reconcile fails (at the connect step here), the request is re-added to the queue.
Do you see anything in the provider-helm controller logs? Just wondering whether the controller stopped working somehow.
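The retry behavior described above can be pictured with a small simulation. This is only a sketch on a virtual clock, not provider-helm's actual code (the controller is written in Go); `SHORT_WAIT`, `run_queue`, and `make_flaky_connect` are hypothetical names, and only the 30-second value comes from the comment.

```python
import heapq
from typing import Callable, List, Tuple

SHORT_WAIT = 30.0  # seconds; the value quoted in the comment (assumed fixed here)


def run_queue(reconcile: Callable[[str], bool], item: str, max_attempts: int) -> List[float]:
    """Simulate a work queue on a virtual clock.

    Returns the virtual times at which `reconcile` was attempted.
    A failed attempt re-adds the item SHORT_WAIT seconds later;
    a successful attempt leaves the queue empty and ends the loop.
    """
    queue: List[Tuple[float, str]] = [(0.0, item)]
    attempts: List[float] = []
    while queue and len(attempts) < max_attempts:
        now, name = heapq.heappop(queue)
        attempts.append(now)
        if not reconcile(name):
            # Failure: schedule another attempt after the fixed wait.
            heapq.heappush(queue, (now + SHORT_WAIT, name))
    return attempts


def make_flaky_connect(fail_times: int) -> Callable[[str], bool]:
    """Build a reconcile function that fails `fail_times` times, then succeeds."""
    remaining = {"count": fail_times}

    def connect(_name: str) -> bool:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return False
        return True

    return connect
```

If the real controller behaves this way, a Release whose connect step keeps failing should show a new attempt in the logs roughly every 30 seconds; a long silence would suggest the reconcile is blocked rather than failing.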
from provider-helm.
I reproduced this issue again just now and captured the full pod logs in the gist below. There are a lot of messages like this one, but I'm not sure whether that's an issue:
E1231 19:10:49.562709 1 memcache.go:206] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Note that the provider-helm pod has no restarts and appears healthy (full pod details in gist)
> kcs get pod crossplane-provider-helm-4b3c12d3669a-7c46b69d8-mzdww
NAME READY STATUS RESTARTS AGE
crossplane-provider-helm-4b3c12d3669a-7c46b69d8-mzdww 1/1 Running 0 16m
full gist: https://gist.github.com/jbw976/2d13e79147c2aae564c61a827389de57
Anything else I can look into? or get you access to?
from provider-helm.
Could this be a network policy or something that is blocking provider-helm in the hosted crossplane instance in Upbound Cloud from being able to reach the GKE cluster? When I successfully used the kubeconfig from the connection secret, I was doing that from my own laptop.
from provider-helm.
Could this be a network policy or something that is blocking provider-helm in the hosted crossplane instance in Upbound Cloud from being able to reach the GKE cluster? When I successfully used the kubeconfig from the connection secret, I was doing that from my own laptop.
We don't have an egress policy in Upbound Cloud that might block this kind of traffic, so I don't think so.
A couple of things that I am wondering:
- Is it 100% reproducible?
- Does the controller stop reconciling the CR, or does it keep retrying but hit some error that is somehow not reflected in the CR status? Controller logs could help here (thanks for the logs above @jbw976, but I think we would need the --debug flag in the controller to better understand what is going on).
- What happens if we restart the controller?
Otherwise, it would be helpful if I could get access to the environment next time for debugging further.
from provider-helm.
Hmm, there is an interesting point in the logs you shared:
2020/12/31 19:10:47 info: skipping unknown hook: "crd-install"
This indicates that the helm installation started at 2020/12/31 19:10:47, so the helm controller seems to have connected to the remote cluster, but the installation probably didn't complete (yet?) for some reason. (AFAIR, that chart contains a pre-install helm hook, which might be related.) So I would definitely check what is going on on the cluster side, for example:
helm ls --all --all-namespaces
kubectl get jobs -n operators
kubectl get pods -n operators
kubectl get all -n operators
...
from provider-helm.
Hi, I am experiencing exactly the same problem described by @jbw976.
But in my opinion the problem is not related to the helm gcp provider.
In my case the issue is due to the fact that I am testing the GCP part of platform-ref-multi-k8s, and the GKE cluster created by this Crossplane repo is private!
So only images already in the cache work. In fact, trying to install some example charts with plain helm (no provider-helm) gives me pull errors. I fixed this using Cloud NAT and now everything works.
from provider-helm.
Super cool that you got it working for you @idallaserra! Can you explain in a little more detail what "using cloud nat" means? Any docs you could point me to that would help me understand the details?
It seemed to me like 2 different problems though, where my provider-helm couldn't even connect to the newly provisioned GKE cluster at all, even with my changes to enable basic auth in the GKE cluster in upbound/platform-ref-multi-k8s#6 that are now published in v0.0.4 at https://cloud.upbound.io/registry/upbound/platform-ref-multi-k8s.
For you though, your provider-helm seemed to be able to connect to the GKE cluster, but then container pulls from inside the GKE cluster were not working. That felt like a different issue to me, and I'd love to hear more about the "cloud nat" you used to get that working.
@turkenh I'll try to get more debugging details (and get you access to the repro instance) the next time I repro this. Thanks for your help so far! 🙇‍♂️
edit: ah, now i'm seeing that @turkenh mentioned the evidence for my provider-helm connecting to the cluster OK but just not finishing the job. Perhaps @idallaserra has it figured out with the "cloud nat" 💪
from provider-helm.
Regarding Cloud NAT, the problem is well explained here:
https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#docker_hub
in "Can't pull image from public Docker Hub".
In short, the created cluster is private, so the nodes have no internet connectivity, meaning no docker pull.
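One quick way to confirm the missing connectivity from inside the cluster is a plain TCP check against the registry. The sketch below is an assumption-laden illustration, not part of any provider: `can_reach` is a hypothetical helper, and it only tests that a TCP connection opens, not that an image pull would succeed.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    DNS failures, refused connections, and timeouts all count as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from a pod on a node without outbound internet access, `can_reach("registry-1.docker.io")` would be expected to return False, matching the pull errors described here.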
Here is a document, old but still valid, explaining how to do it.
In my opinion, the best solution would be Cloud NAT support in provider-gcp (which it still seems to lack), so that the NAT could be created directly in Crossplane.
Hope this is useful :-)
Ivan
from provider-helm.
Just reproduced the issue and here are my findings:
- The Helm Release status is the same as the one given at the beginning of this issue.
- On the created GKECluster, I see:
- The operators namespace was created by provider-helm.
- The helm release is stuck as pending-install:
  helm ls -a
  NAME                        NAMESPACE  REVISION  UPDATED                                STATUS           CHART                         APP VERSION
  vault1-cluster-p7pfn-mkbc6  operators  1         2021-01-19 14:08:47.1636178 +0000 UTC  pending-install  kube-prometheus-stack-10.1.0  0.42.1
- The pre-install job in the prometheus chart was created, and its pod is stuck in ImagePullBackOff with the following events:
  Events:
    Type     Reason     Age                   From               Message
    ----     ------     ----                  ----               -------
    Normal   Scheduled  11m                   default-scheduler  Successfully assigned operators/vault1-cluster-p7pfn-mkbc6-admission-create-8m2qr to gke-vault1-cluster-p-vault1-cluster-p-d7ca7c52-3465
    Normal   Pulling    8m36s (x4 over 11m)   kubelet            Pulling image "jettech/kube-webhook-certgen:v1.2.1"
    Warning  Failed     8m21s (x4 over 10m)   kubelet            Failed to pull image "jettech/kube-webhook-certgen:v1.2.1": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Warning  Failed     8m21s (x4 over 10m)   kubelet            Error: ErrImagePull
    Normal   BackOff    5m57s (x14 over 10m)  kubelet            Back-off pulling image "jettech/kube-webhook-certgen:v1.2.1"
    Warning  Failed     63s (x34 over 10m)    kubelet            Error: ImagePullBackOff
- I can pull that docker image from my local machine.
- I tried to run two pods with public docker images; one started running, the other failed with ImagePullBackOff.
I think this confirms @idallaserra's observations. This Stack Overflow question describes the same issue and has a well-written answer.
@jbw976, wondering if there is any special reason to enable private cluster nodes in gke resource spec here.
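For anyone hitting a similar stuck release, a small script can surface pods in the state shown in the findings above. This is a rough sketch, assuming the input is the JSON printed by `kubectl get pods -n operators -o json`; `find_image_pull_failures` is a hypothetical helper name, while the fields it reads (`status.containerStatuses[].state.waiting.reason`) are standard Pod status fields.

```python
import json
from typing import Dict, List

# Waiting reasons the kubelet reports when an image cannot be pulled.
PULL_FAILURES = {"ErrImagePull", "ImagePullBackOff"}


def find_image_pull_failures(pod_list_json: str) -> List[Dict[str, str]]:
    """Return one record per container that is waiting on a failed image pull."""
    failures: List[Dict[str, str]] = []
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        # Check both regular and init containers.
        statuses = (status.get("containerStatuses") or []) + (
            status.get("initContainerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in PULL_FAILURES:
                failures.append(
                    {
                        "pod": pod["metadata"]["name"],
                        "container": cs["name"],
                        "image": cs["image"],
                        "reason": waiting["reason"],
                    }
                )
    return failures
```

A non-empty result for the namespace a Release deploys into points at the cluster's image pulls rather than at provider-helm's connection to the cluster.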
from provider-helm.
And, after waiting long enough, the helm release reports the following status back :)
status:
  atProvider:
    releaseDescription: 'Release "vault1-cluster-p7pfn-mkbc6" failed: failed pre-install:
      failed to deploy vault1-cluster-p7pfn-mkbc6-admission-create'
    revision: 1
    state: failed
  conditions:
  - lastTransitionTime: "2021-01-19T14:08:32Z"
    reason: ReconcileSuccess
    status: "True"
    type: Synced
  - lastTransitionTime: "2021-01-19T14:08:17Z"
    reason: Unavailable
    status: "False"
    type: Ready
  failed: 1
  synced: true
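Note that in this status Synced is "True" while Ready is "False", so looking at one condition alone can mislead. A minimal sketch of reading the two together, assuming the status has been loaded into a dict (for example from `kubectl get release <name> -o json`); `summarize_release` is a hypothetical helper, not something provider-helm ships:

```python
from typing import Any, Dict


def summarize_release(status: Dict[str, Any]) -> str:
    """Combine the Synced and Ready conditions with atProvider.state into one line."""
    conditions = {c["type"]: c for c in status.get("conditions", [])}
    synced = conditions.get("Synced", {}).get("status") == "True"
    ready = conditions.get("Ready", {}).get("status") == "True"
    at_provider = status.get("atProvider", {})
    state = at_provider.get("state", "unknown")
    if synced and ready:
        return f"healthy (state={state})"
    if synced:
        # Reconcile succeeded, but the release itself is not available.
        description = at_provider.get("releaseDescription", "no description")
        detail = " ".join(description.split())  # collapse wrapped whitespace
        return f"reconciled but not ready (state={state}): {detail}"
    return f"not synced (state={state})"
```

For the status above this yields "reconciled but not ready (state=failed): ..." followed by the release description, which names the failed pre-install hook.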
from provider-helm.
@jbw976, wondering if there is any special reason to enable private cluster nodes in gke resource spec here.
No I don't think there is @turkenh. Most likely I was following another example and didn't realize the impact of that setting 😨
I'll get this patched in the reference platforms that use GKE (GCP and multi-k8s).
Is there anything to fix here in terms of provider-helm reporting back status earlier, so that people in the future in similar scenarios don't get tricked by the red herring of "connection refused"? That was showing to me as the main/obvious thing that was wrong on the Release object when I first investigated and got led down the wrong trail.
Can that status behavior be improved? 🙏
from provider-helm.
Is there anything to fix here in terms of provider-helm reporting back status earlier, so that people in the future in similar scenarios don't get tricked by the red herring of "connection refused"? That was showing to me as the main/obvious thing that was wrong on the Release object when I first investigated and got led down the wrong trail. Can that status behavior be improved? 🙏
Yes, once we have implemented this, we will be able to report earlier and better instead of being blocked until the helm client returns.
from provider-helm.