helm-charts's Issues

nodeSelector missing

nodeSelector: {} from the values.yaml has no effect on the statefulset.

Please fix.
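
For reference, charts commonly pass this value through to the pod spec with a block like the one below. This is a minimal sketch of the usual Helm pattern, not the chart's actual template: the file name and the surrounding StatefulSet layout are assumptions, and only the `nodeSelector` key comes from the report above.

```yaml
# templates/statefulset.yaml (hypothetical excerpt; the real file layout is an assumption)
spec:
  template:
    spec:
      # Render nodeSelector only when the user sets a non-empty map in values.yaml.
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```

With a block like this in place, an empty `nodeSelector: {}` renders nothing, and a populated map is copied into the pod spec as-is.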

new eks node naming convention not scraping data

Since we upgraded to K8s 1.30 and switched to the new naming convention, where the node name is the instance ID rather than the IP address, Vantage is no longer able to connect to the metrics endpoints.

{"time":"2024-08-20T11:11:12.383850217Z","level":"ERROR","msg":"failed to scrape node","err":"Get \"https://i-08d8e736c8613ad19.eu-central-1.compute.internal:10250/metrics/resource\": dial tcp: lookup i-08d8e736c8613ad19.eu-central-1.compute.internal on 172.20.0.10:53: no such host","node":"i-08d8e736c8613ad19.eu-central-1.compute.internal"}                                                                                                                                                                                                         
{"time":"2024-08-20T11:11:12.388666313Z","level":"ERROR","msg":"failed to scrape node","err":"Get \"https://i-01ef04da8337513e9.eu-central-1.compute.internal:10250/metrics/resource\": dial tcp: lookup i-01ef04da8337513e9.eu-central-1.compute.internal on 172.20.0.10:53: no such host","node":"i-01ef04da8337513e9.eu-central-1.compute.internal"}                                                                                                                                                                                                         
{"time":"2024-08-20T11:11:12.391516247Z","level":"ERROR","msg":"failed to scrape node","err":"Get \"https://i-088f87d7adca008ef.eu-central-1.compute.internal:10250/metrics/resource\": dial tcp: lookup i-088f87d7adca008ef.eu-central-1.compute.internal on 172.20.0.10:53: no such host","node":"i-088f87d7adca008ef.eu-central-1.compute.internal"}                                                                                                                                                                                                         
{"time":"2024-08-20T11:11:12.397959425Z","level":"ERROR","msg":"failed to scrape node","err":"Get \"https://i-04c8b4cf88d11a2da.eu-central-1.compute.internal:10250/metrics/resource\": dial tcp: lookup i-04c8b4cf88d11a2da.eu-central-1.compute.internal on 172.20.0.10:53: no such host","node":"i-04c8b4cf88d11a2da.eu-central-1.compute.internal"}                                                                                                                                                                                                         
{"time":"2024-08-20T11:11:12.398060478Z","level":"ERROR","msg":"failed to scrape node","err":"Get \"https://i-0c754439af4fd227d.eu-central-1.compute.internal:10250/metrics/resource\": dial tcp: lookup i-0c754439af4fd227d.eu-central-1.compute.internal on 172.20.0.10:53: no such host","node":"i-0c754439af4fd227d.eu-central-1.compute.internal"}                                                                                                                                                                                                         
{"time":"2024-08-20T11:11:12.468072542Z","level":"ERROR","msg":"failed to scrape node","err":"Get \"https://i-0830490e63ad762a5.eu-central-1.compute.internal:10250/metrics/resource\": dial tcp: lookup i-0830490e63ad762a5.eu-central-1.compute.internal on 172.20.0.10:53: no such host","node":"i-0830490e63ad762a5.eu-central-1.compute.internal"}                                                                                                                                                                                                         
{"time":"2024-08-20T11:11:12.468218196Z","level":"INFO","msg":"finished scraping metrics from nodes","success":0,"failure":6,"duration_ms":98}  

Any idea on how to fix that?

pod crashing every now and then

We have multiple clusters, all running the vantage agent. Every now and then a pod on any of them crashes. The helm deployment needs to be destroyed and recreated to fix this issue. We are now missing a week of data for some clusters.

{"time":"2023-12-30T11:30:17.665406711Z","level":"INFO","msg":"restoring pod data from backup"}                                                                                                                       
{"time":"2023-12-30T11:30:18.469158454Z","level":"ERROR","msg":"failed to setup data store","err":"unexpected EOF"}

How can we solve this issue?

- persistS3.bucket: Invalid type. Expected: null, given: string

The schema is misleading.
There's no env option, so it's impossible to use this chart as is with persistS3.

    error: values don't meet the specifications of the schema(s) in the following chart(s):
    vantage-kubernetes-agent:
    - persistS3.bucket: Invalid type. Expected: null, given: string

agent.nodeAddressTypes value should have default explicitly in values.yaml.

It's typically a Helm convention to place the defaults explicitly in values.yaml.

I could easily have made this a PR, but I wanted to open a discussion about what should be the default, since this is a new type of integration.

It seems that Hostname is the implied default. I've had mixed success with this.

  • Didn't work in my GKE clusters
  • Inconsistently worked in my EKS clusters.
    • Not a regional thing.
    • All had the same Corefile
    • Some had the EC2 domain suffix in /etc/resolv.conf; some didn't. One host worked without the domain suffix in /etc/resolv.conf.

I seem to have success with the value set to InternalIP across my entire multi-region, multi-cloud (EKS, GKE) fleet. IMO, this should work universally, and that's my reason for lobbying for it as the default. I've set it in my values.yaml for my parent chart, so I'm good. But I just wanted to see if this proposed change would help others.

I don't think it would affect any clusters with a service mesh or HTTP(S) proxy involved.
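
The override described above would look roughly like this in a values file. It is a sketch only: the key path `agent.nodeAddressTypes` and the value `InternalIP` come from this issue, while the assumption that the chart accepts a single plain string (rather than a list) has not been checked against the chart.

```yaml
# values.yaml override (sketch; value format is an assumption)
agent:
  # Prefer the node's InternalIP over its Hostname when building the scrape address.
  nodeAddressTypes: InternalIP
```

When the chart is consumed as a subchart, the same block would sit under the subchart's name in the parent chart's values.yaml.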

Thanks!

Vantage Helm chart has no way of defining volume mounts for ReadOnlyRootFilesystem

ReadOnlyRootFilesystem: true causes "level":"ERROR","msg":"failed to report","err":"open /tmp/report369083143: read-only file system"}

Typically an emptyDir volume (or another volume type) can be added to work around the read-only file system, but the Vantage Helm chart has no overrides for additional volumes or volumeMounts.

Possible Solutions:

  1. Vantage Agent writes to the PVC it creates instead of a /tmp/ dir.
  2. Vantage Helm Chart provides a way for users to modify volumes and volumeMounts
  3. Vantage team pre-populates volumes and volumeMounts in the Helm template to the specifications of the agent.
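
As an illustration of what options 2 and 3 would need to render, the usual Kubernetes workaround mounts a writable emptyDir over /tmp. The fragment below is generic Kubernetes, not taken from the Vantage chart; the container name `agent` and the volume name `tmp` are hypothetical.

```yaml
# Pod spec fragment (generic Kubernetes sketch; names are hypothetical)
spec:
  containers:
    - name: agent
      securityContext:
        readOnlyRootFilesystem: true   # root file system stays read-only
      volumeMounts:
        - name: tmp
          mountPath: /tmp              # writable scratch space for the report files
  volumes:
    - name: tmp
      emptyDir: {}                     # removed when the pod is deleted
```

The emptyDir only needs to hold temporary files such as /tmp/report369083143 from the error above, so nothing in it has to survive a pod restart.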

Vantage Agent should include best practice securityContext + Run as Non-Root User

As said in the title. The default Vantage install should come with a securityContext with the highest restrictions the Vantage developers believe their app can run with, and Vantage should run as non-root user out of the box.

These are things I can tweak when installing Vantage. But inexperienced Kubernetes maintainers will probably leave the defaults, following the Vantage documentation exactly, and end up with a less-than-ideal configuration. This is especially important for clusters with Restricted requirements.
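
For concreteness, the settings below are the ones the Kubernetes Restricted Pod Security Standard asks for, plus a read-only root file system. This is a sketch under assumptions: the container name and UID are hypothetical, and whether the agent actually runs under all of these settings has not been verified.

```yaml
# Pod spec fragment (generic Kubernetes sketch; name and UID are hypothetical)
spec:
  securityContext:
    runAsNonRoot: true            # refuse to start if the image would run as UID 0
    runAsUser: 10001              # hypothetical non-root UID
    seccompProfile:
      type: RuntimeDefault        # required by the Restricted profile
  containers:
    - name: agent
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true   # not required by Restricted, but a common hardening step
        capabilities:
          drop: ["ALL"]
```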

Vantage Agent is crashing

[image: screenshot of the agent logs]

I installed the Vantage agent using Helm, but I don't know why it's crashing. You can see the logs in the screenshot.