nrel / tracknodes


Tracknodes keeps a history of node state and comment changes. It allows system administrators of HPC systems to determine when nodes were down and discover trends such as recurring issues. Supports Torque, PBSpro and SLURM.

License: GNU General Public License v3.0


tracknodes's Introduction

tracknodes

Description

Tracknodes keeps a history of node state and comment changes. It allows system administrators of HPC systems to determine when nodes were down and discover trends such as recurring issues. Supports Torque, PBSpro and has limited support for SLURM.

Installation

$ pip install tracknodes

or

$ easy_install tracknodes

Usage

Set up a cron job on an admin node. This step is required for node state changes to be tracked.

$ crontab -u root -e
# Track Node State Every Minute
* * * * * (/usr/bin/tracknodes --update >/dev/null 2>&1)

Use the command below to see the history of node changes.

$ tracknodes
History of Nodes
=========
n101 | 2016-11-28 21:30:01 | online | ''
n101 | 2016-11-28 20:30:01 | offline,down | 'Hardware issue bad DIMM'
n092 | 2016-11-27 19:30:01 | online | ''
n092 | 2016-11-27 12:00:01 | offline | 'Hardware issue failed disk'
n021 | 2016-11-27 09:00:01 | online | ''
n021 | 2016-11-26 19:00:01 | offline,down | 'DIMM Configuration Error'
-- --
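
If you want to post-process this report, for example to find nodes that go down repeatedly, the lines can be split on the `|` separator. The following is a minimal sketch, assuming the output format shown above; `NodeEvent` and `parse_history` are hypothetical names, not part of tracknodes.

```python
from typing import List, NamedTuple


class NodeEvent(NamedTuple):
    node: str
    timestamp: str
    state: str
    comment: str


def parse_history(text: str) -> List[NodeEvent]:
    """Parse 'node | timestamp | state | comment' lines from the report."""
    events = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|", 3)]
        if len(parts) != 4:
            continue  # skip the title, separator and footer lines
        node, timestamp, state, comment = parts
        # The report wraps comments in single quotes; drop them if present.
        if len(comment) >= 2 and comment[0] == comment[-1] == "'":
            comment = comment[1:-1]
        events.append(NodeEvent(node, timestamp, state, comment))
    return events


SAMPLE = """History of Nodes
=========
n101 | 2016-11-28 21:30:01 | online | ''
n101 | 2016-11-28 20:30:01 | offline,down | 'Hardware issue bad DIMM'
-- --
"""

events = parse_history(SAMPLE)
for event in events:
    print(event.node, event.state, event.comment)
```

Timestamps are kept as strings here; convert them with `datetime.strptime` only if you need to compute durations.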

You can set up a configuration file for tracknodes to change the database location or the command used to get node status. Use the example below as a starting point.

$ cat /etc/tracknodes.conf
---
dbfile: "/opt/tracknodes.db"
cmd: "/opt/pbsnodes"
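
The example file looks like YAML, so a real YAML parser is the robust way to read it from your own scripts. As a dependency-free illustration, the sketch below handles only flat `key: "value"` lines like the two shown above; `read_flat_config` is a hypothetical helper and not how tracknodes itself loads the file.

```python
def read_flat_config(text):
    """Read flat key: "value" lines; ignores blanks, '---' and comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "---" or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        # Drop surrounding whitespace and the optional double quotes.
        config[key.strip()] = value.strip().strip('"')
    return config


SAMPLE_CONF = '---\ndbfile: "/opt/tracknodes.db"\ncmd: "/opt/pbsnodes"\n'

settings = read_flat_config(SAMPLE_CONF)
print(settings["dbfile"])
print(settings["cmd"])
```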

Tracknodes uses a sqlite database to store the node history. You can determine which database it is using with the -v argument.

$ tracknodes -v
Resource Manager Detected as torque
cmd: /opt/pbsnodes
dbfile: ~/.tracknodes.db
...

For usage information, use --help.

$ tracknodes --help
Usage: tracknodes [options]

Options:
  -h, --help            show this help message and exit
  -U, --update          Update Database From Current Node States
  -f DBFILE, --dbfile=DBFILE
                        Database File
  -c CMD, --cmd=CMD
                        Location of command to show node state, example: /opt/pbsnodes, /opt/sinfo
  -v, --verbose         Verbose Output

License

tracknodes is released under the GPLv3 License.

tracknodes's People

Contributors

thedavidwhiteside

tracknodes's Issues

Add Support for PBSpro

The pbsnodes command for PBSpro does not have the -n option; it behaves by default as if -n were given.

To detect PBSpro:
$ /opt/pbs/bin/pbsnodes --version
pbs_version =

If it's PBSpro:
Use pbsnodes -l instead of pbsnodes -nl.
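
The detection rule above could be sketched as a small function. This assumes that the `--version` output contains the string `pbs_version` on PBSpro and does not on Torque; `pbsnodes_list_args` is a hypothetical name, and the function only inspects text that the caller has already captured.

```python
def pbsnodes_list_args(version_output):
    """Pick pbsnodes flags from the text printed by `pbsnodes --version`.

    Assumption: only PBSpro prints a line containing 'pbs_version'.
    """
    is_pbspro = "pbs_version" in version_output
    # PBSpro has no -n option, so only -l is passed there.
    return ["-l"] if is_pbspro else ["-nl"]


print(pbsnodes_list_args("pbs_version = "))
print(pbsnodes_list_args(""))
```

Keeping the decision separate from the subprocess call makes it easy to test without a resource manager installed.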

Add Support for SLURM

$ sinfo -dR
REASON USER TIMESTAMP NODELIST
broken ram root 2017-01-02T09:09:82 n010
DIMM failed root 2017-01-03T06:03:16 n020
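
A possible way to turn this output into records is sketched below. Because the REASON column can contain spaces, the line is split from the right, which assumes that the user, timestamp and node list never contain whitespace. `parse_sinfo_reasons` is a hypothetical name, and timestamps are kept as plain strings rather than parsed.

```python
def parse_sinfo_reasons(text):
    """Parse `sinfo -dR` style output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.split()[0] == "REASON":
            continue  # skip blank lines and the header row
        # Split from the right so multi-word reasons stay intact.
        fields = line.rsplit(None, 3)
        if len(fields) != 4:
            continue
        reason, user, timestamp, nodelist = fields
        rows.append({
            "reason": reason.strip(),
            "user": user,
            "timestamp": timestamp,
            "nodelist": nodelist,
        })
    return rows


SAMPLE = """REASON USER TIMESTAMP NODELIST
broken ram root 2017-01-02T09:09:82 n010
DIMM failed root 2017-01-03T06:03:16 n020
"""

rows = parse_sinfo_reasons(SAMPLE)
for row in rows:
    print(row["nodelist"], row["reason"])
```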

Comment updates are not being committed to the sqlite db after a run

$ tracknodes -U
History of Nodes

c3 | 2016-12-06 22:07:38 | offline,down | 'offline for testing'
c1 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
c2 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
c3 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
c4 | 2016-12-06 21:56:20 | down | 'node down: communication closed'

$ tracknodes
History of Nodes

c1 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
c2 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
c3 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
c4 | 2016-12-06 21:56:20 | down | 'node down: communication closed'
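
The issue does not state the cause. One thing worth ruling out, purely as an assumption, is a missing commit: with Python's `sqlite3` module, rows inserted but not committed are discarded when the connection closes, which would produce the symptom above. The sketch below demonstrates that behavior with a hypothetical `history` table; it is not the tracknodes schema or code.

```python
import os
import sqlite3
import tempfile


def record_state(dbfile, node, state, comment, commit=True):
    """Insert one row; optionally skip the commit to show the data loss."""
    conn = sqlite3.connect(dbfile)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS history "
            "(node TEXT, state TEXT, comment TEXT)"
        )
        conn.commit()
        conn.execute(
            "INSERT INTO history VALUES (?, ?, ?)", (node, state, comment)
        )
        if commit:
            conn.commit()
    finally:
        conn.close()  # closing does not commit pending changes


def count_rows(dbfile):
    conn = sqlite3.connect(dbfile)
    try:
        return conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
    finally:
        conn.close()


dbfile = os.path.join(tempfile.mkdtemp(), "demo.db")

record_state(dbfile, "c3", "offline,down", "offline for testing", commit=False)
lost = count_rows(dbfile)   # the uncommitted row is gone

record_state(dbfile, "c3", "offline,down", "offline for testing", commit=True)
kept = count_rows(dbfile)   # the committed row persists

print(lost, kept)
```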

python3 support?

Any chance of python3 support? If you need help with that I might be able to assist.
