
ubccr / grendel


Bare Metal Provisioning system for HPC Linux clusters

Home Page: https://grendel.readthedocs.io

License: GNU General Public License v3.0

Topics: hpc, bare-metal, provisioning, pxe-server


Grendel

Grendel - Bare Metal Provisioning for HPC


Grendel is a fast, easy-to-use bare metal provisioning system for High Performance Computing (HPC) Linux clusters. Grendel simplifies the deployment and administration of physical compute clusters, both large and small. It is developed by the University at Buffalo Center for Computational Research (CCR), which has more than 20 years of experience in HPC. Grendel is under active development and currently runs CCR's production HPC clusters, which range from 200 to 1500 nodes.

Key Features

  • DHCP/PXE/TFTP provisioning
  • DNS forward and reverse resolution
  • Automatic host discovery
  • Diskful and stateless (live image) provisioning
  • BMC/iDRAC control via Redfish and IPMI
  • Authorized provisioning using Branca tokens
  • REST API
  • Easy installation (single binary with no dependencies)
  • Heorot web GUI

Project status

Grendel is under heavy development, and its APIs will likely change considerably before a stable release is available. Use at your own risk. Feedback and pull requests are more than welcome!

Quickstart with QEMU/KVM

The following steps show how to PXE boot a Linux virtual machine using QEMU/KVM and install Flatcar Linux using Grendel. A demo of installing and using Grendel can be found here.

Installation

To install Grendel, download a copy of the binary here.

$ tar xvzf grendel-0.x.x-linux-amd64.tar.gz
$ cd grendel-0.x.x-linux-amd64/
$ ./grendel --help

Create a TAP device

$ sudo ip tuntap add name tap0 mode tap user ${LOGNAME}
$ sudo ip addr add 192.168.10.254/24 dev tap0
$ sudo ip link set up dev tap0

For Red Hat/CentOS, add tap0 to the trusted firewalld zone:

$ sudo firewall-cmd --zone=trusted --change-interface=tap0

For Debian/Ubuntu, allow inbound traffic on tap0 with ufw:

$ sudo ufw allow in on tap0

Create a boot image file

Download the Flatcar Linux PXE kernel and initrd image:

$ wget http://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_pxe_image.cpio.gz
$ wget http://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_pxe.vmlinuz

Create the following JSON file image.json:

[{
    "name": "flatcar",
    "kernel": "flatcar_production_pxe.vmlinuz",
    "initrd": [
        "flatcar_production_pxe_image.cpio.gz"
    ],
    "cmdline": "flatcar.autologin"
}]

Create a host file

Create the following JSON file host.json:

[{
    "name": "tux01",
    "provision": true,
    "boot_image": "flatcar",
    "interfaces": [
        {
            "fqdn": "tux01.localhost",
            "ip": "192.168.10.12/24",
            "mac": "DE:AD:BE:EF:12:8C"
        }
    ]
}]

Start Grendel services

$ sudo ./grendel --verbose serve --hosts host.json --images image.json --listen 192.168.10.254

Note: The serve command requires root privileges to bind to privileged ports (below 1024). If you don't want to run it as root, you can grant the Grendel binary the capabilities it needs with the following command:

$ sudo setcap CAP_NET_BIND_SERVICE,CAP_NET_RAW=+eip /path/to/grendel

PXE boot the Linux virtual machine

In another terminal window run the following commands:

$ qemu-system-x86_64 -m 2048 -boot n -device e1000,netdev=net0,mac=DE:AD:BE:EF:12:8C -netdev tap,id=net0,ifname=tap0,script=no

Hacking

Building Grendel requires Go v1.20 or later. Building iPXE requires the packages lzma-sdk-devel, xz-devel, and gcc-aarch64-linux-gnu:

$ git clone --recursive https://github.com/ubccr/grendel
$ cd grendel/firmware
$ make build
$ make bindata
$ cd ..
$ go build .
$ ./grendel help
Bare Metal Provisioning for HPC

Usage:
  grendel [command]

Available Commands:
  bmc         Query BMC devices
  discover    Auto-discover commands
  help        Help about any command
  host        Host commands
  image       Boot Image commands
  serve       Run services

Flags:
  -c, --config string     config file
      --debug             Enable debug messages
      --endpoint string   Grendel API endpoint (default "grendel-api.socket")
  -h, --help              help for grendel
      --verbose           Enable verbose messages

Use "grendel [command] --help" for more information about a command.

Publications

  • Andrew E. Bruno, Salvatore J. Guercio, Doris Sajdak, Tony Kew, and Matthew D. Jones. 2020. Grendel: Bare Metal Provisioning System for High Performance Computing. In Practice and Experience in Advanced Research Computing (PEARC ’20). Association for Computing Machinery, New York, NY, USA, 13–18. DOI: https://doi.org/10.1145/3311790.3396637

  • PEARC ’20 Paper Presentation: video | slides

Acknowledgments

PXE booting is based on Pixiecore by Dave Anderson. The DHCP implementation makes heavy use of this excellent packet library. The DNS implementation uses this library. The TFTP implementation uses this library. The backend database runs on BuntDB. The NodeSet/RangeSet algorithms are ported from ClusterShell.

License

Grendel is released under the GPLv3 license. See the LICENSE file.


Contributors

aebruno, bensallen, holmanb, jafurlan, tonykew, ttrafford, wajchina


Issues

incorrect nodeset conversion

When hitting endpoints such as /host/tag or provision, nodesets are shortened: srv-p24-13,srv-p24-11 becomes srv-p24-[11,13]. If you add another node whose number has a leading 0, such as srv-p24-09, the nodeset drops the 0 and is incorrectly shortened to srv-p24-[9,11,13].
This only occurs when the nodeset includes both a "double digit" and a "single digit" node. For example, srv-p24-09,srv-p24-07 is correctly shortened to srv-p24-[07,09], while srv-p24-10,srv-p24-09 is not: it becomes srv-p24-[9,10].
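The expected behavior can be sketched as a folding routine that keeps the width of each node's numeric suffix instead of reducing it to a bare integer. This is a simplified illustration, not Grendel's code or ClusterShell's exact rules: names are grouped by prefix and by suffix width, so srv-p24-09 and srv-p24-10 fold to srv-p24-[09-10], while node1 and node10 are simply left as separate names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// splitName separates the trailing digits of a node name, for example
// "srv-p24-09" -> ("srv-p24-", "09"). Names without trailing digits return "".
func splitName(name string) (prefix, digits string) {
	i := len(name)
	for i > 0 && name[i-1] >= '0' && name[i-1] <= '9' {
		i--
	}
	return name[:i], name[i:]
}

// groupKey groups names by prefix and by suffix width, so the width (and any
// leading zero) survives the conversion to integers.
type groupKey struct {
	prefix string
	width  int
}

// fold compresses node names into nodeset notation such as "srv-p24-[09-11,13]".
func fold(names []string) string {
	groups := map[groupKey][]int{}
	var out []string
	for _, name := range names {
		prefix, digits := splitName(name)
		v, err := strconv.Atoi(digits)
		if digits == "" || err != nil {
			out = append(out, name) // no usable numeric suffix: keep as-is
			continue
		}
		k := groupKey{prefix, len(digits)}
		groups[k] = append(groups[k], v)
	}

	keys := make([]groupKey, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].prefix != keys[j].prefix {
			return keys[i].prefix < keys[j].prefix
		}
		return keys[i].width < keys[j].width
	})

	for _, k := range keys {
		nums := groups[k]
		sort.Ints(nums)
		pad := func(v int) string { return fmt.Sprintf("%0*d", k.width, v) }
		if nums[0] == nums[len(nums)-1] {
			out = append(out, k.prefix+pad(nums[0])) // a single node
			continue
		}
		var parts []string
		for i := 0; i < len(nums); {
			j := i
			// Extend the run over consecutive (or duplicate) values.
			for j+1 < len(nums) && nums[j+1] <= nums[j]+1 {
				j++
			}
			if nums[i] == nums[j] {
				parts = append(parts, pad(nums[i]))
			} else {
				parts = append(parts, pad(nums[i])+"-"+pad(nums[j]))
			}
			i = j + 1
		}
		out = append(out, k.prefix+"["+strings.Join(parts, ",")+"]")
	}
	return strings.Join(out, ",")
}

func main() {
	fmt.Println(fold([]string{"srv-p24-13", "srv-p24-11", "srv-p24-09"}))
	// prints srv-p24-[09,11,13]
	fmt.Println(fold([]string{"srv-p24-10", "srv-p24-09"}))
	// prints srv-p24-[09-10]
}
```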

HTTP doesn't seem to be working

Hi,
HTTP doesn't seem to be working. When I try to serve a directory using the "repo_dir" variable in the grendel.toml file, the client (compute node, curl, or wget) can't download any file; wget gets a 404 Not Found. I have never been able to serve any files over HTTP, not in 0.0.6, 0.0.7, or 0.0.8. I'm not sure whether it's just me.

Thanks,
RC

feature request: add tags to images

Just as hosts support the "tag" string array, images should too. When you have a lot of images, it can be cumbersome to sort through them all; something like grendel image show --tags blah would help filter them.
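A minimal sketch of the requested filtering in Go, assuming images gained a Tags field like hosts have. The image type and the example image names are hypothetical.

```go
package main

import "fmt"

// image is a hypothetical boot image record with the requested Tags field.
type image struct {
	Name string
	Tags []string
}

// filterByTag returns the images carrying tag, preserving input order.
func filterByTag(images []image, tag string) []image {
	var out []image
	for _, img := range images {
		for _, t := range img.Tags {
			if t == tag {
				out = append(out, img)
				break // avoid adding the same image twice
			}
		}
	}
	return out
}

func main() {
	images := []image{
		{Name: "flatcar", Tags: []string{"stateless", "test"}},
		{Name: "compute", Tags: []string{"diskful"}}, // hypothetical image
	}
	for _, img := range filterByTag(images, "stateless") {
		fmt.Println(img.Name) // prints flatcar
	}
}
```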

Support DHCP listening on multiple interfaces

Currently, the Grendel DHCP server listens on a single interface. It should support DHCP responses across multiple interfaces and craft DHCP packets according to the interface on which they were received. This may also require extra config options.

discover dhcp options are not clear

Hi,
I am testing out new node discovery and was running discover dhcp.

It looks like there is no --nodeset option: it throws the error unknown flag "--nodeset". But if I omit it altogether, it complains with "Please provide a nodeset".

I am running the command mentioned on the "background" page:
$ grendel discover dhcp --subnet 10.64.0.0 --nodeset tux-[01-100]

I would also like to mention that there seems to be no clear documentation about what each option does, such as --firmware string or --hosts string.

It says that the --hosts option adds to a hosts file. Does that mean that if we have a hosts.json file, it will add the new node info to it?

I like this software because it is an all-in-one, single-binary, portable solution, but the documentation is not up to the mark. I can help test each feature and document it.

Thanks,

unknown network kickstart URL $kickstart

Hi, when I use the latest version of Grendel, it reports the following error:

# ./grendel --verbose serve -c /usr/share/grendel/etc/grendel.toml --hosts /usr/share/grendel/etc/hosts.json --images /usr/share/grendel/etc/images.json

[Screenshot of the error output]

But with the old version I used before, this step works normally:

# ./grendel.bak --verbose serve -c /usr/share/grendel/etc/grendel.toml --hosts /usr/share/grendel/etc/hosts.json --images /usr/share/grendel/etc/images.json

[Screenshot of the normal output]

How can I deal with this problem?
Some info:

[root@node01 bin]# md5sum grendel 
e7fb63985e795c5a9d6ccc57d5dfc841  grendel
[root@node01 bin]# md5sum grendel.bak 
14f6727929f1854802b284249e80ffef  grendel.bak

[root@node01 bin]# cat /usr/share/grendel/etc/images.json 
[
 {
    "name": "test_node",
    "kernel": "/var/lib/grendel/repo/isolinux/vmlinuz",
    "initrd": [
        "/var/lib/grendel/repo/isolinux/initrd.img"
    ],
    "provision_template": "test_node.ks.tmpl",
    "cmdline": "ks=$kickstart network ksdevice=bootif ks.device=bootif inst.sshd"
  }
]
