Comments (15)

torumakabe commented on August 23, 2024

@vishiy I have tried the latest image (AKSCBLMariner-V2gen2-202304.20.0) several times. It failed in three out of five attempts, so it is still flaky.

Despite the failure to deploy Node Exporter, provisioning for all node pools has been successful.

az aks show -g my-group -n my-cluster -o json | grep provisioningState
WARNING: The behavior of this command has been altered by the following extension: aks-preview
      "provisioningState": "Succeeded",
      "provisioningState": "Succeeded",
      "provisioningState": "Succeeded",
      "provisioningState": "Succeeded",
  "provisioningState": "Succeeded",
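
The grep above prints every provisioningState value but not which object each one belongs to. A minimal sketch of one way to keep that context: walk the parsed JSON and report the path of each provisioningState key. The sample document below is a hypothetical, heavily trimmed stand-in for the real `az aks show -o json` output, and the function name is made up for illustration.

```python
import json

def find_provisioning_states(node, path="$"):
    """Yield (json_path, value) for every 'provisioningState' key in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "provisioningState":
                yield child, value
            else:
                yield from find_provisioning_states(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_provisioning_states(item, f"{path}[{i}]")

# Hypothetical sample: real output has many more fields and possibly more pools.
sample = json.loads("""
{
  "agentPoolProfiles": [
    {"name": "pool1", "provisioningState": "Succeeded"},
    {"name": "pool2", "provisioningState": "Succeeded"}
  ],
  "provisioningState": "Succeeded"
}
""")

states = list(find_provisioning_states(sample))
# Anything listed here would point at the object that is not healthy.
not_succeeded = [(p, s) for p, s in states if s != "Succeeded"]

for p, s in states:
    print(p, s)
```

To use it on a real cluster, the output of the `az aks show` command could be saved to a file and loaded with `json.load` in place of the inline sample.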

When the deployment of Node Exporter fails, it fails on all nodes. There is no partial success.

Below are listings of /usr/local/bin on a node where the installation succeeded and on a node where it failed.

successful

root [ / ]# ls -asl /usr/local/bin/
total 508780
     4 drwxr-xr-x 2 root root        4096 May  8 04:59 .
     4 drwxr-xr-x 7 root root        4096 Apr  7 16:30 ..
 34556 -rwxr-x--- 1 root root    35384960 Apr 20 15:21 bpftrace
     4 -rwxr-xr-x 1 root root         705 Apr 20 15:19 ci-syslog-watcher.sh
 46508 -rwxr-xr-x 1 root root    47622592 Apr 20 15:20 containerd-shim-slight-v0-3-0-v1
 51008 -rwxr-xr-x 1 root root    52232184 Apr 20 15:20 containerd-shim-slight-v0-5-1-v1
 35172 -rwxr-xr-x 1 root root    36014944 Apr 20 15:20 containerd-shim-spin-v0-3-0-v1
 44276 -rwxr-xr-x 1 root root    45334640 Apr 20 15:20 containerd-shim-spin-v0-5-1-v1
 49136 -rwxr-xr-x 1 1001 docker  50311268 Aug 26  2022 crictl
     4 -r-xr--r-- 1 root root        2462 Apr 20 15:19 health-monitor.sh
 46912 -rwxr-xr-x 1 root root    48037888 Mar 21 00:57 kubectl
118432 -rwxr-xr-x 1 root root   121272408 Mar 21 00:57 kubelet
 64948 -rwxr-xr-x 1 root root    66504080 Nov  9 11:05 local-gadget
     0 lrwxrwxrwx 1 root root          20 Mar 29 23:32 log-counter -> /usr/bin/log-counter
 17804 -rwxr-xr-x 1 root root    18231039 Feb  8  2022 node-exporter
     4 -rwxr-xr-x 1 root root         834 Mar 29 23:31 node-exporter-startup.sh
     0 lrwxrwxrwx 1 root root          30 Mar 29 23:32 node-problem-detector -> /usr/bin/node-problem-detector
     8 -rwxr-xr-x 1 root root        4601 Mar 29 23:32 node-problem-detector-startup.sh

failed

root [ / ]# ls -asl /usr/local/bin/
total 426016
     4 drwxr-xr-x 2 root root        4096 May  8 06:06 .
     4 drwxr-xr-x 7 root root        4096 Apr  7 16:30 ..
 34556 -rwxr-x--- 1 root root    35384960 Apr 20 15:21 bpftrace
     4 -rwxr-xr-x 1 root root         705 Apr 20 15:19 ci-syslog-watcher.sh
 46508 -rwxr-xr-x 1 root root    47622592 Apr 20 15:20 containerd-shim-slight-v0-3-0-v1
 51008 -rwxr-xr-x 1 root root    52232184 Apr 20 15:20 containerd-shim-slight-v0-5-1-v1
 35172 -rwxr-xr-x 1 root root    36014944 Apr 20 15:20 containerd-shim-spin-v0-3-0-v1
 44276 -rwxr-xr-x 1 root root    45334640 Apr 20 15:20 containerd-shim-spin-v0-5-1-v1
 49136 -rwxr-xr-x 1 1001 docker  50311268 Aug 26  2022 crictl
     4 -r-xr--r-- 1 root root        2462 Apr 20 15:19 health-monitor.sh
 46912 -rwxr-xr-x 1 root root    48037888 Mar 21 00:57 kubectl
118432 -rwxr-xr-x 1 root root   121272408 Mar 21 00:57 kubelet
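
The difference between the two listings can also be computed rather than read off by eye. This is a rough sketch that assumes the `ls -asl` column layout shown above (file name in the tenth whitespace-separated field) and file names without spaces. The helper name is hypothetical, and the sample text is a short excerpt of the listings, not the full output.

```python
def listed_names(ls_output):
    """Return the set of entry names found in `ls -asl` style output."""
    names = set()
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skips the prompt line, "total N" and blank lines
        name = fields[9]  # for symlinks this is the link name, before "->"
        if name not in (".", ".."):
            names.add(name)
    return names

# Excerpts copied from the listings above.
successful = """\
total 508780
     4 drwxr-xr-x 2 root root        4096 May  8 04:59 .
118432 -rwxr-xr-x 1 root root   121272408 Mar 21 00:57 kubelet
     0 lrwxrwxrwx 1 root root          20 Mar 29 23:32 log-counter -> /usr/bin/log-counter
 17804 -rwxr-xr-x 1 root root    18231039 Feb  8  2022 node-exporter
"""
failed = """\
total 426016
     4 drwxr-xr-x 2 root root        4096 May  8 06:06 .
118432 -rwxr-xr-x 1 root root   121272408 Mar 21 00:57 kubelet
"""

# Entries present on the successful node but absent on the failed one.
missing = sorted(listed_names(successful) - listed_names(failed))
print(missing)
```

Run against the full listings, the same comparison would report the five entries dated Mar 29 and Feb 8 (log-counter, node-exporter, node-exporter-startup.sh, node-problem-detector, node-problem-detector-startup.sh) as missing on the failed node.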

Is there any possible cause that you can think of? Thanks.

from prometheus-collector.

vishiy commented on August 23, 2024

@torumakabe - Could you please send the log files below and also share the AKS cluster ID?

vishiy commented on August 23, 2024

@torumakabe - thanks for filing the issue. Node exporter is installed through the VM images on AKS nodes (not through prometheus-collector), and the collector just scrapes it. Can you please tell us the VM image version? I am also assuming these are AKS nodes?

torumakabe commented on August 23, 2024

@vishiy Thank you for your comment. All nodes are in AKS. The node image is "AKSCBLMariner-V2gen2-202304.10.0".

github-actions commented on August 23, 2024

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

vishiy commented on August 23, 2024

@torumakabe Apologies for the delay. I am following up with the AKS folks on this.

vishiy commented on August 23, 2024

@torumakabe - Can you please confirm that the corresponding node pools in the cluster are not in a failed provisioning state? Also, is this the case on all nodes or just a few nodes? Thanks for your help.

github-actions commented on August 23, 2024

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions commented on August 23, 2024

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions commented on August 23, 2024

This issue was closed because it has been stalled for 12 days with no activity.

github-actions commented on August 23, 2024

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions commented on August 23, 2024

This issue was closed because it has been stalled for 12 days with no activity.

vishiy commented on August 23, 2024

@torumakabe - Is this issue resolved now? I remember you were following up with AKS.

torumakabe commented on August 23, 2024

@vishiy Thanks for your concern.

I talked to the AKS team and found out why node exporter was not installed. Node exporter is installed by AKS-Operator, which tries to install it several times during cluster creation, but with low priority. Therefore, if a higher-priority task takes a long time, the installation is retried only after a sufficiently long wait.

The wait can be up to 24 hours, but I have confirmed that node exporter does get installed if I wait. I would like the retry interval to be shorter, but I am satisfied for now that I have found the cause of the problem.
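
For anyone who wants to wait for the install from their own tooling, here is a minimal polling sketch. It is not the AKS-Operator retry logic, which is not shown in this thread. It is only a client-side loop that repeats a check until it passes or a timeout expires. The function names and the file-existence check are assumptions for illustration; the path is the one from the listing of the successful node.

```python
import os
import time

def wait_until(check, timeout_s, interval_s, sleep=time.sleep, clock=time.monotonic):
    """Call check() every interval_s seconds until it returns True or the timeout expires.

    Returns (succeeded, attempts). sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return True, attempts
        if clock() + interval_s > deadline:
            return False, attempts  # the next attempt would land past the deadline
        sleep(interval_s)

def node_exporter_installed(path="/usr/local/bin/node-exporter"):
    """Hypothetical check, meant to run on the node itself."""
    return os.path.exists(path)

# Example call (not executed here): poll every 10 minutes for up to 24 hours.
# ok, attempts = wait_until(node_exporter_installed, 24 * 3600, 600)
```

The 24-hour timeout mirrors the upper bound mentioned above. The 10-minute interval is an arbitrary choice.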

vishiy commented on August 23, 2024

OK, thank you. I will close this issue.
