We really need a changelog for the cloudhealth-collector
helm chart to have a better understanding of what new releases bring.
Could we add the curl command to the cloudhealth container image?
While troubleshooting a recent deployment of cloudhealth-collector-2.0.2
, I attempted to first validate my apiToken from my local system, and then replicate from the cloudhealth pod.
These steps (Validate Collector Agent Connectivity; pasted below for more convenient reference), instruct users to use the curl
tool to test the cloudhealthtech API. However, the curl command is not available in the cloudhealth container image.
The related support case has been open for nearly a month. Any additional attention you can share to help resolve this is greatly appreciated.
Validate Collector Agent Connectivity
To validate collector agent connectivity to our own collection endpoint, run the following commands:
Use nmap or netcat to check that port 443 is reachable:
nmap -p 443 api.cloudhealthtech.com
nc -zv api.cloudhealthtech.com 443
Run CURL commands against the collection endpoint manually:
curl -v -X GET https://containers-api.edge.cloudhealthtech.com/api/v1/health to request the collection health endpoint.
The expected response: {"status":"healthy","time":"Fri, 29 Jan 2021 22:48:10 GMT"}
curl --header "Content-Type: application/json" --request POST --data '{"auth_token":"INSERT_AUTHENTICATION_TOKEN_HERE","cluster_id":"INSERT_CLUSTER_ID_HERE"}' https://containers-api.edge.cloudhealthtech.com/v1/containers/kubernetes/state to mock the exact request made by the collector agent (except without any k8s data cache payload). Replace the auth_token and the cluster_id as necessary.
The expected response (since we sent no payload): {"result":null}
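Until curl ships in the image, a workaround is to run the same health check from a throwaway pod in the collector's namespace (the pod name, namespace, and curlimages/curl image below are illustrative choices, not part of the chart):

```shell
# Launch a temporary pod that has curl, hit the health endpoint,
# and delete the pod automatically when the command exits.
kubectl run cht-connectivity-test --rm -it --restart=Never \
  --image=curlimages/curl --namespace cloudhealth \
  --command -- curl -sv https://containers-api.edge.cloudhealthtech.com/api/v1/health
```

This exercises the same network path the collector uses without needing curl in the collector image itself.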
Currently only 4 environment variables can be set in the container spec: CHT_API_TOKEN, CHT_CLUSTER_NAME, CHT_INTERVAL, and CHT_JVM_MEM. Is it possible to implement support for additional environment variables? Something like:
(values.yaml)
...
env: []
# Example:
# env:
#   - name: <ENV_VAR_NAME>
#     value: <ENV_VAR_VALUE>
(deployment.yaml)
...
containers:
  ...
    env:
      - name: CHT_API_TOKEN
        valueFrom:
          secretKeyRef:
            name: {{ include "cloudhealth-collector.secretName" . }}
            key: apiToken
      - name: CHT_CLUSTER_NAME
        value: {{ .Values.clusterName | required "A valid clusterName required!" | quote }}
      - name: CHT_INTERVAL
        value: {{ .Values.collectionIntervalSecs | quote }}
      - name: CHT_JVM_MEM
        value: {{ .Values.jvmMemory }}
      {{- toYaml .Values.env | nindent 12 }}
...
Hello,
The apiToken parameter is described as required in values.yaml, but that is not entirely true: if you don't set it, the secret is simply not created. A better description would be something like: "use the apiToken parameter, or create a secret with the name specified by the secretName parameter". Originally I wanted to request a feature to create the secret ourselves, since our systems cannot pass the apiToken parameter safely to the Helm chart, but we can create a secret manifest in another part of our infrastructure provisioning system. Now I have discovered that this feature already exists, it is just not well documented :)
Thank you
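For anyone else who lands here, a minimal sketch of the pre-created-secret approach (the secret name cloudhealth-config and key apiToken below are assumptions about the chart's defaults; verify them against your chart version):

```yaml
# Secret created outside the chart; its name must match the chart's secretName value.
apiVersion: v1
kind: Secret
metadata:
  name: cloudhealth-config
type: Opaque
stringData:
  apiToken: <YOUR_API_TOKEN>  # placeholder; supply via your provisioning system
```

The chart would then be installed with secretName pointing at this secret and the apiToken parameter omitted.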
Setting imagePullSecrets when installing doesn't seem to do anything?
helm template cloudhealth-collector --set apiToken=$CHT_API_TOKEN,clusterName=$CHT_CLUSTER_NAME,image.repository=/cloudhealth/container-collector,image.pullSecrets='{secret}' -n cloudhealth cloudhealth/cloudhealth-collector
The output doesn't show that the secret is set. It also doesn't seem to be referenced anywhere in deployment.yaml?
Did a fresh helm install on EKS
=========================================================================
CHT Containers Collector Environment
CHT_API_TOKEN: ****
CHT_CLUSTER_NAME: dev-us-east-2
JAVA_OPTS:
-XX:+ExitOnOutOfMemoryError -Xms10M -Xmx891M
CHT_INTERVAL: 900
=========================================================================
2024-04-24T22:41:32.215Z [main] WARN FilenoUtil : Native subprocess control requires open access to the JDK IO subsystem
Pass '--add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED' to enable.
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:85: warning: parentheses after method name is interpreted as an argument list, not a decomposed argument
CHT Containers Collector : version 5f1b5a2 starting
D, [2024-04-24T22:41:46.848458 #10] DEBUG -- : Ensuring cache directory is present: /tmp/cache
D, [2024-04-24T22:41:46.862460 #10] DEBUG -- : Ensuring metrics cache directory is present: /tmp/metrics_cache
D, [2024-04-24T22:41:46.935558 #10] DEBUG -- : Fetching collector agent version
D, [2024-04-24T22:41:46.943248 #10] DEBUG -- : Kubernetes collector agent version found : 1398
D, [2024-04-24T22:41:47.028912 #10] DEBUG -- : loaded K8S config from with master @ https://172.22.0.1:443/ with ca certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt with client_cert_file with client key file with trust_certs false with trust store file with proxy username
D, [2024-04-24T22:41:47.635460 #10] DEBUG -- : Fetching Cluster UID
D, [2024-04-24T22:41:47.640977 #10] DEBUG -- : Connecting to URL: https://172.22.0.1:443/api/v1/namespaces/kube-system
E, [2024-04-24T22:41:48.137155 #10] ERROR -- : [Java::JavaIo::IOException]: Server returned HTTP response code: 403 for URL: https://172.22.0.1:443/api/v1/namespaces/kube-system
E, [2024-04-24T22:41:48.138684 #10] ERROR -- : Backtrace:
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(jdk/internal/reflect/NativeConstructorAccessorImpl.java:77)
jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(jdk/internal/reflect/DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstanceWithCaller(java/lang/reflect/Constructor.java:499)
java.lang.reflect.Constructor.newInstance(java/lang/reflect/Constructor.java:480)
sun.net.www.protocol.http.HttpURLConnection$10.run(sun/net/www/protocol/http/HttpURLConnection.java:2057)
sun.net.www.protocol.http.HttpURLConnection$10.run(sun/net/www/protocol/http/HttpURLConnection.java:2052)
java.security.AccessController.doPrivileged(java/security/AccessController.java:569)
sun.net.www.protocol.http.HttpURLConnection.getChainedException(sun/net/www/protocol/http/HttpURLConnection.java:2051)
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(sun/net/www/protocol/http/HttpURLConnection.java:1609)
sun.net.www.protocol.http.HttpURLConnection.getInputStream(sun/net/www/protocol/http/HttpURLConnection.java:1589)
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(sun/net/www/protocol/https/HttpsURLConnectionImpl.java:224)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(jdk/internal/reflect/NativeMethodAccessorImpl.java:77)
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(jdk/internal/reflect/DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:568)
org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:329)
org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:193)
RUBY.get_input_stream(uri:classloader:/mesos-collector/lib/common/fetcher.rb:37)
RUBY.fetch_uid(uri:classloader:/mesos-collector/lib/common/fetcher.rb:50)
RUBY.do_fetch_cluster_uid(uri:classloader:/mesos-collector/lib/common/collection_cycle.rb:104)
RUBY.do_upload(uri:classloader:/mesos-collector/lib/common/collection_cycle.rb:24)
RUBY.upload_k8s_state_v4(uri:classloader:/mesos-collector/bin/container-collector.rb:88)
org.jruby.RubyClass.finvokeWithRefinements(org/jruby/RubyClass.java:522)
org.jruby.RubyBasicObject.send(org/jruby/RubyBasicObject.java:1687)
org.jruby.RubyKernel.send(org/jruby/RubyKernel.java:2315)
org.jruby.RubyKernel$INVOKER$s$send.call(org/jruby/RubyKernel$INVOKER$s$send.gen)
RUBY.delegate_runner_methods(uri:classloader:/gems/boson-1.3.0/lib/boson/runner_library.rb:12)
org.jruby.RubyProc.call(org/jruby/RubyProc.java:378)
org.jruby.RubyMethod.call(org/jruby/RubyMethod.java:144)
org.jruby.RubyMethod$INVOKER$i$call.call(org/jruby/RubyMethod$INVOKER$i$call.gen)
RUBY.redefine_command_block(uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:87)
org.jruby.RubyProc.call(org/jruby/RubyProc.java:333)
org.jruby.RubyProc$INVOKER$i$call.call(org/jruby/RubyProc$INVOKER$i$call.gen)
RUBY.call_original_command(uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:123)
RUBY.during_analyze(uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:115)
RUBY.analyze(uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:107)
RUBY.redefine_command_block(uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:86)
org.jruby.RubyProc.call(org/jruby/RubyProc.java:378)
org.jruby.RubyClass.finvokeWithRefinements(org/jruby/RubyClass.java:522)
org.jruby.RubyBasicObject.send(org/jruby/RubyBasicObject.java:1687)
org.jruby.RubyKernel.send(org/jruby/RubyKernel.java:2315)
org.jruby.RubyKernel$INVOKER$s$send.call(org/jruby/RubyKernel$INVOKER$s$send.gen)
RUBY.full_invoke(uri:classloader:/gems/boson-1.3.0/lib/boson.rb:79)
RUBY.execute_command(uri:classloader:/gems/boson-1.3.0/lib/boson/bare_runner.rb:42)
RUBY.execute_command(uri:classloader:/gems/boson-1.3.0/lib/boson/runner.rb:35)
RUBY.execute(uri:classloader:/gems/boson-1.3.0/lib/boson/runner.rb:31)
RUBY.start(uri:classloader:/gems/boson-1.3.0/lib/boson/runner.rb:26)
RUBY.<main>(uri:classloader:/mesos-collector/bin/container-collector.rb:120)
org.jruby.Ruby.runInterpreter(org/jruby/Ruby.java:1290)
org.jruby.Ruby.loadFile(org/jruby/Ruby.java:2964)
org.jruby.RubyKernel.loadCommon(org/jruby/RubyKernel.java:1121)
org.jruby.RubyKernel.load(org/jruby/RubyKernel.java:1091)
org.jruby.RubyKernel$INVOKER$s$load.call(org/jruby/RubyKernel$INVOKER$s$load.gen)
RUBY.<main>(uri:classloader:/META-INF/main.rb:1)
org.jruby.Ruby.runInterpreter(org/jruby/Ruby.java:1290)
org.jruby.Ruby.loadFile(org/jruby/Ruby.java:2964)
org.jruby.RubyKernel.requireCommon(org/jruby/RubyKernel.java:1064)
org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1057)
org.jruby.RubyKernel$INVOKER$s$1$0$require.call(org/jruby/RubyKernel$INVOKER$s$1$0$require.gen)
RUBY.require(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:85)
RUBY.<main>(<script>:3)
org.jruby.Ruby.runInterpreter(org/jruby/Ruby.java:1312)
org.jruby.Ruby.runInterpreter(org/jruby/Ruby.java:1316)
org.jruby.embed.internal.EmbedEvalUnitImpl.run(org/jruby/embed/internal/EmbedEvalUnitImpl.java:119)
org.jruby.embed.ScriptingContainer.runUnit(org/jruby/embed/ScriptingContainer.java:1296)
org.jruby.embed.ScriptingContainer.runScriptlet(org/jruby/embed/ScriptingContainer.java:1289)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(jdk/internal/reflect/NativeMethodAccessorImpl.java:77)
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(jdk/internal/reflect/DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:568)
JarMain.invokeMethod(JarMain.java:263)
JarMain.invokeMethod(JarMain.java:256)
JarMain.launchJRuby(JarMain.java:141)
JarMain.start(JarMain.java:158)
JarMain.doStart(JarMain.java:233)
JarMain.main(JarMain.java:227)
E, [2024-04-24T22:41:48.139484 #10] ERROR -- : Could not open an input stream to cluster's Kubernetes API: Server returned HTTP response code: 403 for URL: https://172.22.0.1:443/api/v1/namespaces/kube-system
I've validated that the service account and bindings were created as expected by the chart.
Why do we need these changes?
As part of the latest release, cloudhealth-collector-4.0.0, the image tag was not updated to 1360.
Hence the pods were not coming up healthy, reporting:
In container with argument (- upload_k8s_state_v4) not found.
Hello CloudHealth team.
The company where I'm working has a very strict admission controller policy that enforces a set of annotations and labels for deployments.
The current values.yaml file does not provide an option to include those custom annotations. Could you add one? I have opened this PR for your consideration.
Marco.
I applied 2 custom values.yaml files and used the helmfile tool to successfully deploy the collector to a (dev) cluster.
I specified values for the apiToken and clusterName keys, and when I review the Environment details for the running pod, I can see the following env vars:
CHT_API_TOKEN: secretKeyRef(cloudhealth-config.apiToken)
CHT_CLUSTER_NAME: cluster-dev
CHT_INTERVAL: 900
CHT_JVM_MEM: -Xmx891M
However, once the new pod is running, it logs these error messages:
E, [{timestamp} #10] ERROR -- : Could not open an output stream to the collection API: containers-api.edge.cloudhealthtech.com
E, [{timestamp} #10] ERROR -- : ATTENTION: The most common cause of this error is that the [CHT_API_TOKEN] or [CHT_CLUSTER_NAME] environment variables were not configured properly as part of agent deployment. Please check your full configuration, including the API token and the cluster name. After your config has been fixed, please wait 15 minutes for the changes to take effect, and collection to resume.
D, [{timestamp} #10] DEBUG -- : sleeping for 10
I noticed that the value I retrieved from the cloudhealth-config secret did not match the apiToken string I provided in the helm chart. So I manually updated it, but the same errors show in the pod logs, even after restarting the pod.
Would it be possible to use GitHub releases for future helm chart versions? That would let us subscribe to the release feed, know when a new chart version is out, and update accordingly.
Hello, this is similar to issue #23: an option that is documented in the values.yaml file (lines 42-48) but is not included in the deployment.yaml template file.
I have forked this repo and made some modifications to the template to add this capability, and tested it with an uncommented values file. The tests consisted of modifying the deployment.yaml file and uncommenting the securityContext entries in the values.yaml file so that the securityContext values were applied to the running pod.
I will add the PR from the fork and some additional info in the comments here so that you can evaluate and decide whether it is good to implement. Thanks!
Hello,
I'm attempting to install the cloudhealth helm chart into an EKS cluster running Kubernetes 1.23.
I've provided the correct keys and cluster name, as well as ensured that all security groups associated with the cluster and the cluster nodes are open to all traffic for the time being while we test, and I continue to get the same error that is preventing metrics from being collected.
I've attempted to install the helm chart on a number of freshly built clusters, so I know there is nothing installed on the cluster that would prevent a connection. I've attached the node logs below.
Thank you in advance.
logs.txt
default should be changed to {{ .Release.Namespace }} here:
https://github.com/CloudHealth/helm/blob/main/cloudhealth-collector/templates/clusterrolebinding.yaml#L16
as it causes the pod to fail to fetch node information.
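A sketch of the proposed fix in clusterrolebinding.yaml (the serviceAccountName helper name is an assumption; use whatever the chart's _helpers.tpl actually defines):

```yaml
subjects:
  - kind: ServiceAccount
    name: {{ include "cloudhealth-collector.serviceAccountName" . }}
    namespace: {{ .Release.Namespace }}  # instead of the hardcoded "default"
```

With the namespace templated, the binding follows the release wherever it is installed, so the service account token the pod mounts actually carries the granted permissions.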
Root cause for #61 has been determined to stem from DNS errors: the CloudHealth collector container (running on Alpine Linux) cannot resolve the URL for the CloudHealth API. This may vary across Kubernetes platforms; in our environment, it has been a consistent symptom on all (AWS) EKS clusters.
The collector / application log only included the following message:
Could not open an output stream to the collection API: containers-api.edge.cloudhealthtech.com
After the docker image was updated to include curl and dnsutils (see #66), network name resolution was confirmed to be the root cause. After manually updating the Kubernetes manifest to include a dnsConfig statement and an ndots override, DNS resolution of containers-api.edge.cloudhealthtech.com began working and the collector started running as expected.
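For reference, the kind of dnsConfig override described above looks roughly like this in the deployment's pod spec (the ndots value of 1 is what worked in our environment and is not universal):

```yaml
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "1"  # lower than the Kubernetes default of 5, so external
                        # names are resolved as absolute instead of being
                        # appended to the cluster search domains first
```

Lowering ndots avoids the failed search-domain lookups that can trip up resolvers in minimal images such as Alpine.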
I am unable to install cloudhealth-collector releases beyond 2.0.4. It looks like later releases (including 3.2.0, the latest) have not been uploaded to the repository described in the usage steps:
$ helm repo add cloudhealth https://cloudhealth.github.io/helm/
$ helm upgrade --install cloudhealth-collector cloudhealth/cloudhealth-collector --version 3.2.0 --namespace kube-system --values values.yaml
Error: failed to download "cloudhealth/cloudhealth-collector" at version "3.2.0"
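To see which chart versions are actually published in the repository index, the standard helm commands can be used:

```shell
# Refresh the local cache of the repo index, then list all published versions.
helm repo update
helm search repo cloudhealth/cloudhealth-collector --versions
```

If 3.2.0 does not appear in that list, the release was never uploaded to the index, which would explain the download failure.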
It is also difficult to view the changes being made in the chart; it requires inspecting each PR/commit individually.
I have two requests:
Thanks in advance
Hello,
Please add support for a "hostNetwork" parameter in the helm chart.
example:
values.yaml:
# -- Run the controller on the host network
hostNetwork: false
deployment.yaml:
spec:
  template:
    spec:
      hostNetwork: {{ .Values.hostNetwork }}
Hi, the change regarding the cluster name breaks my cluster name in the console on update.
It neglects that the secret contents are base64-decoded before being used as environment variables.
The second cluster name in the screenshot is the first one Base64-encoded:
$> echo 'c2MtZGV2ZWxvcG1lbnQy' | base64 -d -
sc-development2
#27 fixes this
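The double-encoding failure mode is easy to reproduce outside the cluster. This sketch uses Python's base64 module to show that a value which the chart pre-encodes before storing in the Secret's data field survives the single Kubernetes-level decode still encoded:

```python
import base64

cluster_name = "sc-development2"

# Kubernetes stores Secret data base64-encoded and decodes it exactly once
# before exposing it to the container as an environment variable.
encoded_once = base64.b64encode(cluster_name.encode()).decode()
print(encoded_once)  # c2MtZGV2ZWxvcG1lbnQy, matching the value in the screenshot

# If the chart ALSO pre-encodes the value before writing it into data,
# the container sees the still-encoded string, not the real cluster name:
encoded_twice = base64.b64encode(encoded_once.encode()).decode()
seen_by_container = base64.b64decode(encoded_twice).decode()
print(seen_by_container == encoded_once)  # True: one decode is not enough
```

Writing the raw value under stringData (or encoding it exactly once under data) avoids this class of bug.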
Allowing users to set the ServiceAccount name here breaks the ClusterRoleBinding here, due to the way the ServiceAccount name is handled in the _helpers.tpl file here. This results in the Collector not being able to open an input stream to the cluster API, returning a 403 response. Please see the error message below for details:
D, [2022-04-26T15:26:17.197738 #13] DEBUG -- : loaded K8S config from with master @ https://kubernetes.default.svc/ with ca certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt with client_cert_file with client key file with trust_certs false with trust store file with proxy username
D, [2022-04-26T15:26:19.194230 #13] DEBUG -- : Ensuring cache directory is present: /tmp/cache
D, [2022-04-26T15:26:19.488109 #13] DEBUG -- : Fetching state...
D, [2022-04-26T15:26:19.508177 #13] DEBUG -- : Connecting to URL: https://kubernetes.default.svc/api/v1/nodes
E, [2022-04-26T15:26:23.395960 #13] ERROR -- : [IOError]: Could not open an input stream to cluster API: Server returned HTTP response code: 403 for URL: https://kubernetes.default.svc/api/v1/nodes
E, [2022-04-26T15:26:23.405555 #13] ERROR -- : Backtrace:
uri:classloader:/mesos-collector/lib/common/fetcher.rb:28:in `fetch'
uri:classloader:/mesos-collector/bin/container-collector.rb:132:in `block in do_fetch_state'
org/jruby/RubyHash.java:1415:in `each'
uri:classloader:/mesos-collector/bin/container-collector.rb:131:in `do_fetch_state'
uri:classloader:/mesos-collector/bin/container-collector.rb:114:in `block in do_upload'
org/jruby/RubyKernel.java:1442:in `loop'
uri:classloader:/mesos-collector/bin/container-collector.rb:109:in `do_upload'
uri:classloader:/mesos-collector/bin/container-collector.rb:56:in `upload_k8s_state_v2'
uri:classloader:/gems/boson-1.3.0/lib/boson/runner_library.rb:12:in `block in upload_k8s_state_v2'
org/jruby/RubyMethod.java:131:in `call'
uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:87:in `block in upload_k8s_state_v2'
uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:123:in `call_original_command'
uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:115:in `during_analyze'
uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:107:in `analyze'
uri:classloader:/gems/boson-1.3.0/lib/boson/scientist.rb:86:in `block in upload_k8s_state_v2'
uri:classloader:/gems/boson-1.3.0/lib/boson.rb:79:in `full_invoke'
uri:classloader:/gems/boson-1.3.0/lib/boson/bare_runner.rb:42:in `execute_command'
uri:classloader:/gems/boson-1.3.0/lib/boson/runner.rb:35:in `execute_command'
uri:classloader:/gems/boson-1.3.0/lib/boson/runner.rb:31:in `execute'
uri:classloader:/gems/boson-1.3.0/lib/boson/runner.rb:26:in `start'
uri:classloader:/mesos-collector/bin/container-collector.rb:198:in `<main>'
org/jruby/RubyKernel.java:1009:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:974:in `require'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:54:in `require'
<script>:3:in `<main>'
E, [2022-04-26T15:26:23.418309 #13] ERROR -- : Could not open an input stream to cluster API: Server returned HTTP response code: 403 for URL: https://kubernetes.default.svc/api/v1/nodes
E, [2022-04-26T15:26:23.487793 #13] ERROR -- : ATTENTION: The most common cause of this error is that the [CHT_API_TOKEN] or [CHT_CLUSTER_NAME] environment variables were not configured properly as part of agent deployment. Please check your full configuration, including the API token and the cluster name. After your config has been fixed, please wait 15 minutes for the changes to take effect, and collection to resume.
Either the ServiceAccount name should not be settable in the values.yaml file (and should instead be overridden using something like nameOverride or fullnameOverride), or the ClusterRoleBinding should be updated to use the ServiceAccount name specified by the user.
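A sketch of the second option (the .Values.serviceAccount.name path and the fullname helper are assumptions about this chart's layout, not its actual values):

```yaml
# clusterrolebinding.yaml: bind to whatever ServiceAccount name the user chose,
# falling back to the chart's generated name when none is set.
subjects:
  - kind: ServiceAccount
    name: {{ .Values.serviceAccount.name | default (include "cloudhealth-collector.fullname" .) }}
    namespace: {{ .Release.Namespace }}
```

The key point is that the binding and the pod spec must resolve the ServiceAccount name through the same expression, so a user override cannot leave the binding pointing at a ServiceAccount that no pod uses.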