
metal-stack / metal-hammer


metal-hammer is used to boot bare metal servers with iPXE and the metal-stack kernel.

License: GNU Affero General Public License v3.0

Dockerfile 1.74% Makefile 4.01% Go 93.51% Shell 0.74%
bare-metal initrd pxe pxe-boot

metal-hammer's Introduction

Metal Stack Hammer

Hammer is used to boot a bare metal server via PXE together with the Metal Stack kernel. Hammer is an initrd which runs a small Go binary as the init process. It performs the following actions:

  • Ensures all interfaces are up
  • Checks whether the server was booted in UEFI mode; if not, switches the BIOS to UEFI and reboots
  • Wipes all existing disks by either:
    • Running secure erase where possible, using the mechanism built into modern disks; this applies to most SSDs and NVMe disks.
    • If that is not possible, running mkfs.ext4 --discard on the disks.
  • Gathers hardware information and reports it back to metal-api:
    • CPU core count
    • Amount of memory
    • Disks with their size and device path
    • Network adapters with an active uplink, including their interface name, their own MAC address, and the MAC address of the switch chassis each card is connected to. Two distinct switch chassis are required.
    • IPMI interface with its MAC and IP address.
    • Creates a metal user on the IPMI interface with a strong password
  • Sets the BIOS boot order to contain only PXE and Hard Disk as possible options.
  • Waits until a machine create command is issued by metal-api

Local Testing

make clean initrd vagrant-up

Create a PXE boot initrd with u-root

To create an initrd image that can boot a bare metal server and contains the tools required to discover the hardware and install the target OS, we use u-root.

Quickstart

  • Download u-root:
go get -u github.com/u-root/u-root
  • Build the initrd:
make initrd

Check content

cpio -itv < metal-hammer-initrd.img

Start it

make vagrant-up

metal-hammer's People

Contributors

gerrit91, grigoriymikhalkin, kolsa, majst01, muhittink, mwindower, suckowbiz, suryamurugan, ulrichschreiner


metal-hammer's Issues

Creating RAID on s3-large-x86 fails

...
INFO[06-15|13:20:27] create filesystem                        args="[-F -L varlib /dev/md2]" caller=filesystem.go:289                                                                                                                             
mke2fs 1.44.5 (15-Dec-2018)                                                                                                                                                                                                                       
ext2fs_check_if_mount: Can't check if filesystem is mounted due to missing mtab file while determining whether /dev/md2 is mounted.                                                                                                               
Discarding device blocks: done                                                                                                                                                                                                                    
Creating filesystem with 57199456 4k blocks and 14303232 inodes                                                                                                                                                                                   
Filesystem UUID: 266a1cb6-f954-4092-a97b-8a13895366b2                                                                                                                                                                                             
Superblock backups stored on blocks:                                                                                                                                                                                                              
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,                                                                                                                                                                   
        4096000, 7962624, 11239424, 20480000, 23887872                                                                                                                                                                                            
                                                                                                                                                                                                                                                  
Allocating group tables: done                                                                                                                                                                                                                     
Writing inode tables: done                                                                                                                                                                                                                        
Creating journal (262144 blocks): done                                                                                                                                                                                                            
Writing superblocks and filesystem accounting information: done                                                                                                                                                                                   
                                                                                                                                                                                                                                                  
INFO[06-15|13:20:32] mount filesystem                         device=/dev/md1 path=/rootfs format=0xc000393510 opts= caller=filesystem.go:462                                                                                                     
INFO[06-15|13:20:32] mount filesystem                         device=/dev/md0 path=/rootfs/boot/efi format=0xc0003934f0 opts= caller=filesystem.go:462
[  136.833573] FAT-fs (md0): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
INFO[06-15|13:20:32] mount filesystem                         device=/dev/md2 path=/rootfs/var/lib format=0xc000393540 opts= caller=filesystem.go:462                                                                                             
INFO[06-15|13:20:32] mount                                    source=proc target=/rootfs/proc fstype=proc flags=0 data= caller=filesystem.go:393                                                                                                  
INFO[06-15|13:20:32] mount                                    source=sys target=/rootfs/sys fstype=sysfs flags=0 data= caller=filesystem.go:393                                                                                                   
INFO[06-15|13:20:32] mount                                    source=efivarfs target=/rootfs/sys/firmware/efi/efivars fstype=efivarfs flags=0 data= caller=filesystem.go:393                                                                      
INFO[06-15|13:20:32] mount                                    source=tmpfs target=/rootfs/tmp fstype=tmpfs flags=0 data= caller=filesystem.go:393                                                                                                 
INFO[06-15|13:20:32] mount                                    source=/dev target=/rootfs/dev fstype= flags=4096 data= caller=filesystem.go:393                                                                                                    
INFO[06-15|13:20:32] create legacy disk.json                  content="{\n  \"Device\": \"legacy\",\n  \"Partitions\": [\n    {\n      \"Label\": \"root\",\n      \"Filesystem\": \"ext4\",\n      \"Properties\": {\n        \"UUID\": \"644c9fe9-f981-4971-90bb-923d61c9f077\"\n      }\n    },\n    {\n      \"Label\": \"efi\",\n      \"Filesystem\": \"vfat\",\n      \"Properties\": {\n        \"UUID\": \"91AB-5E06\"\n      }\n    },\n    {\n      \"Label\": \"varlib\",\n      \"Filesystem\": \"ext4\",\n      \"Properties\": {\n        \"UUID\": \"266a1cb6-f954-4092-a97b-8a13895366b2\"\n      }\n    }\n  ]\n}" caller=filesystem.go:444
INFO[06-15|13:20:32] pull image                               image=http://images.metal-stack.io/metal-os/master/centos/7/20210606/img.tar.lz4 caller=image.go:25                                                                                 
INFO[06-15|13:20:32] download                                 from=http://images.metal-stack.io/metal-os/master/centos/7/20210606/img.tar.lz4 to=/tmp/os.tgz caller=image.go:134                                                                  
235.48 MiB / 629.10 MiB [------------>____________________] 37.43% 78.46 MiB p/s
DBUG[06-15|13:20:35] lldp                                     detectedNeighbor="Name:nbg-w8101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7f:59 Port:Mac:90:3c:b3:77:7f:5a" caller=lldpclient.go:71
DBUG[06-15|13:20:35] lldp                                     detectedNeighbor="Name:nbg-w8101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7c:59 Port:Mac:90:3c:b3:77:7c:5a" caller=lldpclient.go:71
629.10 MiB / 629.10 MiB [--------------------------------] 100.00% 80.27 MiB p/s                                                                                                                                                                  
INFO[06-15|13:20:40] download                                 from=http://images.metal-stack.io/metal-os/master/centos/7/20210606/img.tar.lz4.md5 to=/tmp/os.tgz.md5 caller=image.go:134                                                          
46 B / 46 B [----------------------------------------------------] 100.00% ? p/s                                                                                                                                                                  
INFO[06-15|13:20:40] check md5                                caller=image.go:37                                                                                                                                                                  
INFO[06-15|13:20:41] check md5                                source md5=7c17699731a7ef2f0d2bc884eaecc828 expected md5=7c17699731a7ef2f0d2bc884eaecc828 caller=image.go:123                                                                       
INFO[06-15|13:20:41] pull image done                          image=http://images.metal-stack.io/metal-os/master/centos/7/20210606/img.tar.lz4 caller=image.go:43                                                                                 
INFO[06-15|13:20:41] burn image                               image=http://images.metal-stack.io/metal-os/master/centos/7/20210606/img.tar.lz4 caller=image.go:49                                                                                 
INFO[06-15|13:20:41] lz4                                      size=0 caller=image.go:69
1.09 GiB / 1.23 GiB [------------------------------->____] 88.75% 357.61 MiB p/s
DBUG[06-15|13:20:46] lldp                                     detectedNeighbor="Name:nbg-w8101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7c:59 Port:Mac:90:3c:b3:77:7c:5a" caller=lldpclient.go:71
DBUG[06-15|13:20:46] lldp                                     detectedNeighbor="Name:nbg-w8101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7f:59 Port:Mac:90:3c:b3:77:7f:5a" caller=lldpclient.go:71
1.18 GiB / 1.23 GiB [---------------------------------->_] 96.01% 346.05 MiB p/s
INFO[06-15|13:20:46] event                                    event=Alive message="still alive at: 2021-06-15 13:20:46.497746098 +0000 UTC m=+120.622047809" caller=event.go:62
POST /machine/00000000-0000-0000-0000-3cecef188fb2/event HTTP/1.1                                                                                                                                                                                 
Host: 10.255.255.6:4242                                                                                                                                                                                                                           
User-Agent: Go-http-client/1.1                                                                                                                                                                                                                    
Content-Length: 103                                                                                                                                                                                                                               
Accept: application/json                                                                                                                                                                                                                          
Content-Type: application/json                                                                                                                                                                                                                    
Accept-Encoding: gzip                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                  
{"event":"Alive","message":"still alive at: 2021-06-15 13:20:46.497746098 +0000 UTC m=+120.622047809"}                                                                                                                                            
                                                                                                                                                                                                                                                  
1.25 GiB / 1.23 GiB [---------------------------------->] 102.07% 346.05 MiB p/s
HTTP/1.1 200 OK
Content-Length: 0                                                                                                                                                                                                                                 
Date: Tue, 15 Jun 2021 13:20:46 GMT                                                                                                                                                                                                               
                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                  
1.45 GiB / 1.23 GiB [-----------------------------------] 118.02% 249.04 MiB p/s                                                                                                                                                                  
INFO[06-15|13:20:48] burn took                                duration=6.163323612s caller=image.go:96                                                                                                                                            
INFO[06-15|13:20:48] install                                  image=http://images.metal-stack.io/metal-os/master/centos/7/20210606/img.tar.lz4 caller=install.go:88                                                                               
INFO[06-15|13:20:48] write installation configuration         caller=install.go:180                                                                                                                                                               
INFO[06-15|13:20:48] install                                  base64 decode of userdata failed, using plain text="illegal base64 data at input byte 0" caller=install.go:171                                                                      
INFO[06-15|13:20:48] running /install.sh on                   prefix=/rootfs caller=install.go:100                                                                                                                                                
UUID="644c9fe9-f981-4971-90bb-923d61c9f077" / ext4 defaults 0 1                                                                                                                                                                                   
UUID="266a1cb6-f954-4092-a97b-8a13895366b2" /var/lib ext4 defaults 0 1                                                                                                                                                                            
UUID="91AB-5E06" /boot/efi vfat defaults 0 2                                                                                                                                                                                                      
tmpfs /tmp tmpfs defaults,noatime,nosuid,nodev,noexec,mode=1777,size=512M 0 0                                                                                                                                                                     
creating user 'metal'                                                                                                                                                                                                                             
set password for metal to 8JHDekbYbHOR7TsC expires after 1 day.                                                                                                                                                                                   
Changing password for user metal.                                                                                                                                                                                                                 
New password: Retype new password: passwd: all authentication tokens updated successfully.                                                                                                                                                        
{"level":"info","ts":1623763248.606868,"caller":"cmd/root.go:81","msg":"running app version: devel (b59f0b82), tags/v0.6.4-0-gb59f0b8, 2021-05-20T06:04:51+00:00, go1.16.4"}                                                                      
{"level":"info","ts":1623763248.6069381,"caller":"netconf/knowledgebase.go:52","msg":"loading: /etc/metal/install.yaml"}                                                                                                                          
{"level":"info","ts":1623763248.607897,"caller":"netconf/configurator.go:217","msg":"rendering networkd/00-lo.network.tpl to /etc/systemd/network/00-lo.network (mode: -rw-r--r--)"}                                                              
{"level":"info","ts":1623763248.6080582,"caller":"netconf/systemd.go:81","msg":"Skipping validation since there is no known way to validate (.network|.link) files in advance."}                                                                  
{"level":"info","ts":1623763248.6082485,"caller":"netconf/configurator.go:217","msg":"rendering networkd/10-lan.link.tpl to /etc/systemd/network/10-lan0.link (mode: -rw-r--r--)"}                                                                
{"level":"info","ts":1623763248.6083636,"caller":"netconf/systemd.go:81","msg":"Skipping validation since there is no known way to validate (.network|.link) files in advance."}                                                                  
{"level":"info","ts":1623763248.608497,"caller":"netconf/configurator.go:217","msg":"rendering networkd/10-lan.network.tpl to /etc/systemd/network/10-lan0.network (mode: -rw-r--r--)"}                                                           
{"level":"info","ts":1623763248.6086195,"caller":"netconf/systemd.go:81","msg":"Skipping validation since there is no known way to validate (.network|.link) files in advance."}                                                                  
{"level":"info","ts":1623763248.6087441,"caller":"netconf/configurator.go:217","msg":"rendering networkd/10-lan.link.tpl to /etc/systemd/network/11-lan1.link (mode: -rw-r--r--)"}                                                                
{"level":"info","ts":1623763248.6088538,"caller":"netconf/systemd.go:81","msg":"Skipping validation since there is no known way to validate (.network|.link) files in advance."}                                                                  
{"level":"info","ts":1623763248.6089616,"caller":"netconf/configurator.go:217","msg":"rendering networkd/10-lan.network.tpl to /etc/systemd/network/11-lan1.network (mode: -rw-r--r--)"}                                                          
{"level":"info","ts":1623763248.6090455,"caller":"netconf/systemd.go:81","msg":"Skipping validation since there is no known way to validate (.network|.link) files in advance."}                                                                  
{"level":"info","ts":1623763248.6091485,"caller":"netconf/configurator.go:217","msg":"rendering hosts.tpl to /etc/hosts (mode: -rw-------)"}                                                                                                      
{"level":"info","ts":1623763248.6093442,"caller":"netconf/configurator.go:217","msg":"rendering hostname.tpl to /etc/hostname (mode: -rw-r--r--)"}                                                                                                
{"level":"info","ts":1623763248.609517,"caller":"netconf/configurator.go:217","msg":"rendering frr.machine.tpl to /etc/frr/frr.conf (mode: -rw-------)"}                                                                                          
{"level":"info","ts":1623763248.6096277,"caller":"netconf/frr.go:119","msg":"running 'vtysh --dryrun --inputfile /etc/metal/networker/frr_677544675' to validate changes.'"}                                                                      
{"level":"info","ts":1623763248.6980484,"caller":"cmd/root.go:97","msg":"completed. Exiting.."}
System was booted with UEFI                                                                                                                                                                                                                       
Generating grub configuration file ...                                                                                                                                                                                                            
Found linux image: /boot/vmlinuz-3.10.0-1160.25.1.el7.x86_64                                                                                                                                                                                      
Found initrd image: /boot/initramfs-3.10.0-1160.25.1.el7.x86_64.img                                                                                                                                                                               
/usr/sbin/grub2-probe: error: disk `md1' not found.                                                                                                                                                                                               
/usr/sbin/grub2-probe: error: disk `md1' not found.                                                                                                                                                                                               
/usr/sbin/grub2-probe: error: disk `md1' not found.                                                                                                                                                                                               
done                                                                                                                                                                                                                                              
Installing for x86_64-efi platform.                                                                                                                                                                                                               
grub2-install: error: disk `md0' not found.                                                                                                                                                                                                       
exit status 1                                                                                                                                                                                                                                     
running install.sh in chroot failed                                                                                                                                                                                                               
github.com/metal-stack/metal-hammer/cmd.(*Hammer).install                                                                                                                                                                                         
        /work/cmd/install.go:117                                                                                                                                                                                                                  
github.com/metal-stack/metal-hammer/cmd.(*Hammer).Install                                                                                                                                                                                         
        /work/cmd/install.go:68                                                                                                                                                                                                                   
github.com/metal-stack/metal-hammer/cmd.(*Hammer).installImage                                                                                                                                                                                    
        /work/cmd/root.go:255                                                                                                                                                                                                                     
github.com/metal-stack/metal-hammer/cmd.Run                                                                                                                                                                                                       
        /work/cmd/root.go:248                                                                                                                                                                                                                     
main.main                                                                                                                                                                                                                                         
        /work/main.go:74                                                                                                                                                                                                                          
runtime.main                                                                                                                                                                                                                                      
        /usr/local/go/src/runtime/proc.go:225                                                                                                                                                                                                     
runtime.goexit                                                                                                                                                                                                                                    
        /usr/local/go/src/runtime/asm_amd64.s:1371                                                                                                                                                                                                
install        
github.com/metal-stack/metal-hammer/cmd.(*Hammer).installImage
        /work/cmd/root.go:259      
github.com/metal-stack/metal-hammer/cmd.Run
        /work/cmd/root.go:248
main.main                                                                                                                                                                                                                                         
        /work/main.go:74
runtime.main                                                                                                                                                                                                                                      
        /usr/local/go/src/runtime/proc.go:225
runtime.goexit                                                                                                                                                                                                                                    
        /usr/local/go/src/runtime/asm_amd64.s:1371                                                                                                                                                                                                
main.main                                                                                                                                                                                                                                         
        /work/main.go:77
runtime.main                                                                                                                                                                                                                                      
        /usr/local/go/src/runtime/proc.go:225                                                                                                                                                                                                     
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371
EROR[06-15|13:20:49] metal-hammer failed                      rebooting in=5s error="install: running install.sh in chroot failed: exit status 1" caller=main.go:79

disable fw-lldp seems not to work anymore on some of the s2 machines

INFO[01-16|16:45:28] lldpd                                    listen on=eth0 caller=server.go:51                                                                                                                   
INFO[01-16|16:45:28] lldpd                                    interface=eth0 interval=5s caller=server.go:71                                                                                                       
INFO[01-16|16:45:28] lldpd                                    listen on=eth1 caller=server.go:51                                                                                                                   
INFO[01-16|16:45:28] lldpd                                    interface=eth1 interval=5s caller=server.go:71                                                                                                       
INFO[01-16|16:45:28] lldpd                                    listen on=eth2 caller=server.go:51                                                                                                                   
INFO[01-16|16:45:28] lldpd                                    interface=eth2 interval=5s caller=server.go:71                                                                                                       
INFO[01-16|16:45:29] lldpd                                    listen on=eth3 caller=server.go:51
[   10.878908] i40e 0000:b7:00.0: Device does not support changing FW LLDP
INFO[01-16|16:45:29] lldpd                                    interface=eth3 interval=5s caller=server.go:71
[   10.896674] i40e 0000:b7:00.1: Device does not support changing FW LLDP
INFO[01-16|16:45:29] ethtool                                  interface=eth4 disable-fw-lldp is set to=off caller=ethtool.go:77                                                                                    
EROR[01-16|16:45:29] ethtool                                  interface=eth4 error disabling fw-lldp try to stop it="exit status 1" caller=ethtool.go:83                                                           
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.0/command caller=ethtool.go:107                                                     
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.1/command caller=ethtool.go:107
[   10.963468] i40e 0000:b7:00.2: Device does not support changing FW LLDP
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.2/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.3/command caller=ethtool.go:107
[   11.003758] i40e 0000:b7:00.3: Device does not support changing FW LLDP
INFO[01-16|16:45:29] lldpd                                    listen on=eth4 caller=server.go:51                                                                                                                   
INFO[01-16|16:45:29] lldpd                                    interface=eth4 interval=5s caller=server.go:71
INFO[01-16|16:45:29] ethtool                                  interface=eth5 disable-fw-lldp is set to=off caller=ethtool.go:77
EROR[01-16|16:45:29] ethtool                                  interface=eth5 error disabling fw-lldp try to stop it="exit status 1" caller=ethtool.go:83
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.0/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.1/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.2/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.3/command caller=ethtool.go:107
INFO[01-16|16:45:29] lldpd                                    listen on=eth5 caller=server.go:51                                                                                                                   
INFO[01-16|16:45:29] lldpd                                    interface=eth5 interval=5s caller=server.go:71
INFO[01-16|16:45:29] ethtool                                  interface=eth6 disable-fw-lldp is set to=off caller=ethtool.go:77
EROR[01-16|16:45:29] ethtool                                  interface=eth6 error disabling fw-lldp try to stop it="exit status 1" caller=ethtool.go:83
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.0/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.1/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.2/command caller=ethtool.go:107
INFO[01-16|16:45:29] ethtool                                  stopFirmwareLLDP found command=/sys/kernel/debug/i40e/0000:b7:00.3/command caller=ethtool.go:107
INFO[01-16|16:45:29] lldpd                                    listen on=eth6 caller=server.go:51
INFO[01-16|16:45:29] lldpd                                    interface=eth6 interval=5s caller=server.go:71
INFO[01-16|16:45:29] ethtool                                  interface=eth7 disable-fw-lldp is set to=off caller=ethtool.go:77
EROR[01-16|16:45:29] ethtool                                  interface=eth7 error disabling fw-lldp try to stop it="exit status 1" caller=ethtool.go:83

machine ID: 00000000-0000-0000-0000-002590b99fba
partition: fel-wps101

machines not entering wait-for-allocation mode

Sometimes, machines emit "waiting for allocation", but are actually not waiting for allocation.

Output:

INFO[12-06|09:32:25] bios                                     message="successfully configured BIOS" caller=bios.go:23
INFO[12-06|09:32:25] event                                    event=Waiting message="waiting for allocation" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 55
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Waiting","message":"waiting for allocation"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:32:25 GMT


DBUG[12-06|09:32:27] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:32] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:37] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
INFO[12-06|09:32:39] event                                    event=Alive message="still alive at: 2021-12-06 09:32:39.587602281 +0000 UTC m=+60.547598084" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 102
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Alive","message":"still alive at: 2021-12-06 09:32:39.587602281 +0000 UTC m=+60.547598084"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:32:39 GMT


DBUG[12-06|09:32:42] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:47] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:52] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:32:57] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:03] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:08] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:13] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:18] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:23] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:28] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:33] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
DBUG[12-06|09:33:38] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
INFO[12-06|09:33:39] event                                    event=Alive message="still alive at: 2021-12-06 09:33:39.587209833 +0000 UTC m=+120.547205638" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 103
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Alive","message":"still alive at: 2021-12-06 09:33:39.587209833 +0000 UTC m=+120.547205638"}

Expected output:

INFO[12-06|09:36:43] bios                                     message="successfully configured BIOS" caller=bios.go:23
INFO[12-06|09:36:43] event                                    event=Waiting message="waiting for allocation" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 55
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Waiting","message":"waiting for allocation"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:36:43 GMT


DBUG[12-06|09:36:44] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
INFO[12-06|09:36:48] wait for allocation...                   machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
DBUG[12-06|09:36:49] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71
INFO[12-06|09:36:53] wait for allocation...                   machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
DBUG[12-06|09:36:55] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf02 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:7d:59 Port:Mac:90:3c:b3:77:7d:b1" caller=lldpclient.go:71
INFO[12-06|09:36:56] event                                    event=Alive message="still alive at: 2021-12-06 09:36:56.753981147 +0000 UTC m=+60.559026527" caller=event.go:62
POST /machine/33cf1200-0e3b-11eb-8000-3cecef47709a/event HTTP/1.1
Host: 10.255.255.5:4242
User-Agent: Go-http-client/1.1
Content-Length: 102
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"event":"Alive","message":"still alive at: 2021-12-06 09:36:56.753981147 +0000 UTC m=+60.559026527"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Mon, 06 Dec 2021 09:36:56 GMT


INFO[12-06|09:36:58] wait for allocation...                   machineID=33cf1200-0e3b-11eb-8000-3cecef47709a caller=wait.go:57
DBUG[12-06|09:37:00] lldp                                     detectedNeighbor="Name:fel-wps101-r02leaf01 Desc:Cumulus Linux version 3.7.13 running on Accton AS7712-32X Chassis:Mac:90:3c:b3:77:77:59 Port:Mac:90:3c:b3:77:77:b1" caller=lldpclient.go:71

Some machines complain about invalid GPT header

This can currently only be seen on machines with this board: X11SDV-8C-TP8F

{"event":"Installing","message":"start installation"}

HTTP/1.1 200 OK
Content-Length: 0
Date: Wed, 08 Sep 2021 10:38:12 GMT

INFO[09-08|10:38:12] sgdisk create partitions                 command="[--zap-all --new=1:0:+500M --change-name=1:efi --typecode=1:ef00 --new=2:0:+5000M --change-name=2:root --typecode=2:8300 --new=3:0:+0M --change-name=3:varlib --typecode=3:8300 /dev/sde]" caller=filesystem.go:119
Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!

Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!

Invalid partition data!
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
EROR[09-08|10:38:13] sgdisk creating partitions failed        error="exit status 2" caller=filesystem.go:122
create partitions failed:unable to create partitions on /dev/sde: exit status 2
install
github.com/metal-stack/metal-hammer/cmd.(*Hammer).installImage
        /work/cmd/root.go:259
github.com/metal-stack/metal-hammer/cmd.Run
        /work/cmd/root.go:248
main.main
        /work/main.go:74
runtime.main
        /usr/local/go/src/runtime/proc.go:225
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371
main.main
        /work/main.go:77
runtime.main
        /usr/local/go/src/runtime/proc.go:225
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371
EROR[09-08|10:38:13] metal-hammer failed                      rebooting in=5s error="install: create partitions failed:unable to create partitions on /dev/sde: exit status 2" caller=main.go:79

@mwennrich

No matching versions for query "latest". Can't compile Go files with gollvm

Hi.

I am experiencing the following issue:

metal-hammer/cmd$ go build -v -x
WORK=/tmp/go-build852402972
go: finding module for package github.com/metal-stack/metal-hammer/metal-core/client/machine
# get https://proxy.golang.org/github.com/@v/list
go: finding module for package github.com/metal-stack/metal-hammer/metal-core/client/certs
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/client/@v/list
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/client/machine/@v/list
# get https://proxy.golang.org/github.com/metal-stack/@v/list
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/client/certs/@v/list
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/@v/list
go: finding module for package github.com/metal-stack/metal-hammer/metal-core/models
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/models/@v/list
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/@v/list: 410 Gone (1.149s)
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/models/@v/list: 410 Gone (0.825s)
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/client/@v/list: 410 Gone (1.150s)
# get https://proxy.golang.org/github.com/@v/list: 410 Gone (1.152s)
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/client/machine/@v/list: 410 Gone (1.150s)
# get https://proxy.golang.org/github.com/metal-stack/metal-hammer/metal-core/client/certs/@v/list: 410 Gone (1.139s)
# get https://proxy.golang.org/github.com/metal-stack/@v/list: 410 Gone (1.150s)
mkdir -p /home/oceanfish81/go/pkg/mod/cache/vcs # git3 https://github.com/metal-stack/metal-hammer
# lock /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1.lock# /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1 for git3 https://github.com/metal-stack/metal-hammer
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git ls-remote -q origin
# get https://github.com/?go-get=1
# get https://github.com/?go-get=1: 200 OK (0.180s)
1.436s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git ls-remote -q origin
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git -c log.showsignature=false log -n1 '--format=format:%H %ct %D' b1688f4 --
0.112s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git -c log.showsignature=false log -n1 '--format=format:%H %ct %D' b1688f4 --
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.075s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.127s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.122s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.083s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.077s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.064s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git -c log.showsignature=false log -n1 '--format=format:%H %ct %D' b1688f4 --
0.132s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git -c log.showsignature=false log -n1 '--format=format:%H %ct %D' b1688f4 --
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.097s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.103s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.078s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
0.067s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git for-each-ref --format %(refname) refs/tags --merged b1688f4
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/client/machine/go.mod
0.043s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/client/machine/go.mod
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/client/go.mod
0.058s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/client/go.mod
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/client/certs/go.mod
0.044s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/client/certs/go.mod
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/models/go.mod
0.062s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/models/go.mod
cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/go.mod
0.110s # cd /home/oceanfish81/go/pkg/mod/cache/vcs/622413d38f6c59bcd46b0a1920eaf0b773f6f582a2351214bc1c1811b11886a1; git cat-file blob b1688f4:metal-core/go.mod
root.go:15:2: no matching versions for query "latest"
event/event.go:8:2: no matching versions for query "latest"
event/event.go:9:2: no matching versions for query "latest"

Something seems to be wrong with the dependency management or module metadata.

Here is the toolchain and environment I am using:

$ go version && go env
go version go1.15.2 gollvm LLVM 12.0.0git linux/amd64
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/oceanfish81/.cache/go-build"
GOENV="/home/oceanfish81/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/oceanfish81/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/oceanfish81/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/oceanfish81/gollvm_dist"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/oceanfish81/gollvm_dist/tools"
GCCGO="/home/oceanfish81/gollvm_dist/bin/llvm-goc"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/home/oceanfish81/metal-hammer/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build094063708=/tmp/go-build -gno-record-gcc-switches -funwind-tables"

update SUM to 2.7.0

Since SUM 2.7.0 we get the following error:

********************************<<<<<ERROR>>>>>*********************************

ExitCode                = 8
Description             = Can not open file
Program Error Code      = 202.2
Error message:
        acpica_bin/acpidump is required for GetCurrentBiosCfg command. 
Instruction:
        It is included in SUM release package.

Ensure kexec does not get executed in any error situation

The installation must never boot the OS kernel if any error occurred beforehand. Otherwise we can end up in situations where, for example:

  • no report was triggered
  • boot order was not modified
  • etc.

One possible location would be here:

metal-hammer/cmd/root.go

Lines 255 to 266 in 390024d

err = rep.ReportInstallation()
if err != nil {
	wait := 10 * time.Second
	log.Error("report installation failed", "reboot in", wait, "error", err)
	time.Sleep(wait)
	if !h.Spec.DevMode {
		err = kernel.Reboot()
		if err != nil {
			log.Error("reboot", "error", err)
		}
	}
}

If kernel.Reboot() returns an error, kexec is still executed.

All other code paths should be analysed for the same problem as well.

@kolsa

Fetch BIOS config with sum before entering wait mode

When a Supermicro machine gets allocated, we need to wait twice for a sum command to complete. This takes quite some time and has ruined our one-minute machine provisioning time mark.

Theoretically, it would be possible to fetch the BIOS config shortly before entering wait mode, so that we only need to run the update command with sum during machine provisioning. I am not sure whether this is dangerous, because the BIOS config needs to reflect the current state of the machine, but I am also unhappy that the sum tool is slowing us down and behaving so strangely. Is there any reason why the fetch must happen at provisioning time?

Ensure BMC root user creation

The first thing metal-hammer should do is create the root user for the BMC.
Otherwise, another error can prevent the user creation, and the machine will then boot in a loop without appearing in the partition overview/issues.
/cc @majst01

Nil pointer dereference

      metal-hammer             .
                               |\
    .--------------.___________) \ 
    |//////////////|___________[ ] 
    '--------------'           ) ( 
            by metal-stack.io  '-'

2000-01-01T17:49:39.568Z	debug	connect	{"vendor": "Vendor:Supermicro Model:X11SDD-8C-F"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "BMC ARP Control", "value": "ARP Responses Enabled, Gratuitous ARP Disabled"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "RMCP+ Cipher Suites", "value": "1,2,3,6,7,8,11,12"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Bad Password Threshold", "value": "3"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Invalid password disable", "value": "yes"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "MAC Address", "value": "3c:ec:ef:da:b1:77"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "802.1q VLAN Priority", "value": "0"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Attempt Count Reset Int.", "value": "300"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Cipher Suite Priv Max", "value": "XaaaXXaaaXXaaXX"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "User Lockout Interval", "value": "300"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "IP Address Source", "value": "DHCP Address"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Subnet Mask", "value": "255.255.255.0"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "IP Header", "value": "TTL=0x00 Flags=0x00 Precedence=0x00 TOS=0x00"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Default Gateway IP", "value": "10.140.11.1"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Backup Gateway MAC", "value": "00:00:00:00:00:00"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "802.1q VLAN ID", "value": "Disabled"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Backup Gateway IP", "value": "0.0.0.0"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Set in Progress", "value": "Set Complete"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Auth Type Support", "value": "NONE MD2 MD5 PASSWORD"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Auth Type Enable", "value": "Callback : MD2 MD5 PASSWORD"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "IP Address", "value": "10.140.11.17"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "SNMP Community String", "value": "public"}
2000-01-01T17:49:39.674Z	debug	output	{"key": "Default Gateway MAC", "value": "00:00:00:00:00:00"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "Product Serial", "value": "E340538X3901084A"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "FRU Device Description", "value": "Builtin FRU Device (ID 0)"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "Chassis Part Number", "value": "CSE-938NH-R1K68BP"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "Board Mfg Date", "value": "Mon Nov 21 13:50:00 2022"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "Board Part Number", "value": "X11SDD-8C-F"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "Board Serial", "value": "WM22BS014604"}
2000-01-01T17:49:39.949Z	debug	output	{"key": "Product Manufacturer", "value": "Supermicro"}
2000-01-01T17:49:39.950Z	debug	output	{"key": "Product Part Number", "value": "SYS-5039MD8-H8TNR"}
2000-01-01T17:49:39.950Z	debug	output	{"key": "Chassis Type", "value": "Other"}
2000-01-01T17:49:39.950Z	debug	output	{"key": "Chassis Serial", "value": "C9380AK34PA0111"}
2000-01-01T17:49:39.950Z	debug	output	{"key": "Board Mfg", "value": "Supermicro"}
2000-01-01T17:49:39.950Z	debug	output	{"key": "Board Product", "value": "X11SDD-8C-F"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Aux Firmware Rev Info", "value": ""}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Manufacturer ID", "value": "10876"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Product Name", "value": "Unknown (0x1B1C)"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Firmware Revision", "value": "3.75"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "IPMI Version", "value": "2.0"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Manufacturer Name", "value": "Supermicro"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Product ID", "value": "6940 (0x1b1c)"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Device Available", "value": "yes"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Provides Device SDRs", "value": "no"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Device ID", "value": "32"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Device Revision", "value": "1"}
2000-01-01T17:49:39.983Z	debug	output	{"key": "Additional Device Support", "value": ""}
2000-01-01T17:49:39.986Z	info	sshd started, connect via ssh -i metal.key [email protected]
2000-01-01T17:49:39.987Z	info	metal-hammer	{"version": "v0.11.6 (c9d5d171), tags/v0.11.6-0-gc9d5d17, 2023-11-16T08:00:01Z, go1.20.5", "hal": "InBand connected to Supermicro"}
2000-01-01T17:49:39.987Z	info	configuration	{"debug": true, "pixieAPIUrl": "http://10.140.112.1:4242/certs", "bgpenabled": false, "cidr": "", "machineUUID": "6432f400-69a3-11ed-8000-3cecefc0fd0c", "ip": "10.140.112.17"}
2000-01-01T17:49:39.988Z	info	metal-hammer run	{"firmware": "efi", "bios": "version:1.4a vendor:American Megatrends Inc. date:07/17/2023"}
2000-01-01T17:49:54.996Z	error	failed to fetch GRPC certificates	{"error": "context deadline exceeded"}
[   75.900769] watchdog: watchdog0: watchdog did not stop!
2000-01-01T17:49:54.996Z	error	metal-hammer failed	{"rebooting in": 5, "error": "context deadline exceeded"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x94cda5]

goroutine 1 [running]:
github.com/metal-stack/metal-hammer/cmd/event.(*EventEmitter).Emit(0x0, {0x100de53, 0x7}, {0xc000694920, 0x19})
	/work/cmd/event/event.go:56 +0x65
main.main()
	/work/main.go:70 +0x6ec
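
The trace shows Emit being called with a nil receiver (the first argument is 0x0) right after fetching the certificates failed, so the emitter was presumably never constructed on this error path. One possible guard is sketched below. The EventEmitter type here is a hypothetical stand-in with an invented field, and the two string parameters are only inferred from the trace; this is not the actual metal-hammer code.

```go
package main

import "fmt"

// EventEmitter is a hypothetical stand-in for the real emitter type.
// The "sent" field is an assumption made for this illustration only.
type EventEmitter struct {
	sent []string
}

// Emit records an event and reports whether it was accepted.
// It tolerates a nil receiver so that early failure paths can call it
// before the emitter has been constructed.
func (e *EventEmitter) Emit(eventType, message string) bool {
	if e == nil {
		fmt.Printf("event not sent (emitter not initialized): %s: %s\n", eventType, message)
		return false
	}
	e.sent = append(e.sent, eventType+": "+message)
	return true
}

func main() {
	var emitter *EventEmitter // nil, as in the trace above
	emitter.Emit("Crashed", "context deadline exceeded")
}
```

Checking for nil at the call site in main.go would work as well; guarding inside the method simply covers every caller at once.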

Consider creating /var/lib/ on the nvme device instead of the sata-dom

When a worker node is under heavy load, with lots of containers being created, the time to get configmaps/secrets mounted increases a lot. This slows down the start of all required containers.

If an nvme device is present, create the /var/lib filesystem on it.

There are several options to do so:

  • create a partition on the first nvme device and simply put a filesystem on it
  • create a namespace on the first nvme device
  • create a namespace on all existing nvme devices and create a raid1 on top (this leaves the same amount of space on both devices for usage by csi-driver-lvm)
  • create 2 namespaces, one on every disk, and mount them as /var/lib and /var/log

Feature request: booting a live image for debugging/maintenance reasons

For debugging/maintenance reasons it's sometimes useful and necessary to boot from a live OS without touching the operating system on the disk.

Something like:

metalctl machine rescue (--ipmi) <MACHINEID>

Because this is a manual operation, it also makes sense to jump directly to the serial console after this command.

Unable to consistently set BMC superuser for s2-xlarge-x86

Only happens for one of our s2-xlarge-x86 machines. Even after upgrading to latest BIOS and BMC firmware (1.5 and 1.73.12) this does not work anymore. :(

INFO[11-11|12:51:21] run ipmitool                             args="[user set name 4 root]" output= error="exit status 1" caller=log15.go:22                                                                       
EROR[11-11|12:51:21] failed to update bmc superuser password  error="failed to create bmc superuser: root failed set username for user root with id 4: : exit status 1" caller=root.go:84                          
EROR[11-11|12:51:21] metal-hammer failed                      rebooting in=5s error="failed to create bmc superuser: root failed set username for user root with id 4: : exit status 1" caller=main.go:76          
INFO[11-11|12:51:21] event                                    event=Crashed message="failed to create bmc superuser: root failed set username for user root with id 4: : exit status 1" caller=event.go:62         
POST /machine/00000000-0000-0000-0000-ac1f6b7d7efa/event HTTP/1.1                                                                                                                                                  
Host: 10.255.255.4:4242                                                                                                                                                                                            
User-Agent: Go-http-client/1.1                                                                                                                                                                                     
Content-Length: 130                                                                                                                                                                                                
Accept: application/json                                                                                                                                                                                           
Content-Type: application/json                                                                                                                                                                                     
Accept-Encoding: gzip                                                                                                                                                                                              
                                                                                                                                                                                                                   
{"event":"Crashed","message":"failed to create bmc superuser: root failed set username for user root with id 4: : exit status 1"}    

Ensure csm-support is disabled

metal-hammer currently ensures UEFI mode via

if firmware != "efi" && !spec.DevMode {
    err = hammer.EnsureUEFI()
}

hammer.EnsureUEFI() additionally disables csm-support, which is required.

We observed a few machines that boot in UEFI mode but still have csm-support enabled. For these machines the check above skips hammer.EnsureUEFI() and therefore never disables csm-support.

Solution: metal-hammer needs to always read the BIOS settings via the sum tool and check not only for UEFI mode but also for disabled csm-support.
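
The proposed condition can be sketched as a small pure function. The biosState type, its fields and needsEnsureUEFI are hypothetical names introduced for illustration; how the CSM setting is actually read (for example by parsing the sum tool output) is not shown and is left as an assumption.

```go
package main

import "fmt"

// biosState holds the two settings discussed above.
// Both the type and its fields are assumptions for this sketch.
type biosState struct {
	Firmware   string // "efi" when booted in UEFI mode
	CSMEnabled bool   // true when csm-support is still switched on
}

// needsEnsureUEFI reports whether the UEFI/CSM fix-up should run.
// Unlike the current check, it also triggers for machines that already
// boot in UEFI mode but still have csm-support enabled.
func needsEnsureUEFI(s biosState, devMode bool) bool {
	if devMode {
		return false
	}
	return s.Firmware != "efi" || s.CSMEnabled
}

func main() {
	s := biosState{Firmware: "efi", CSMEnabled: true}
	fmt.Println(needsEnsureUEFI(s, false))
}
```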

replace github.com/jaypipes/ghw with stdlib and a 4liner

We use this external dependency only to gather CPUCount and Memory. For CPUCount there is:

runtime.NumCPU

and for Memory we can copy the work of https://github.com/pbnjay/memory which gathers memory with a syscall like:

// +build linux

package memory

import "syscall"

func sysTotalMemory() uint64 {
	in := &syscall.Sysinfo_t{}
	err := syscall.Sysinfo(in)
	if err != nil {
		return 0
	}
	// If this is a 32-bit system, then these fields are
	// uint32 instead of uint64.
	// So we always convert to uint64 to match signature.
	return uint64(in.Totalram) * uint64(in.Unit)
}

This should be sufficient for our use case and removes one external dependency.
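
Put together, a replacement could look roughly like the sketch below (Linux only). The function names cpuCount and totalMemory are made up for this example. Note that runtime.NumCPU reports the logical CPUs usable by the current process, which may differ from the physical core count that ghw reports, so the numbers should be compared on real hardware first.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// cpuCount returns the number of logical CPUs usable by this process.
func cpuCount() int {
	return runtime.NumCPU()
}

// totalMemory returns the total RAM in bytes, or 0 if the syscall fails.
func totalMemory() uint64 {
	var in syscall.Sysinfo_t
	if err := syscall.Sysinfo(&in); err != nil {
		return 0
	}
	// Totalram is expressed in units of Unit bytes; convert both to
	// uint64 so the multiplication is the same on 32-bit systems.
	return uint64(in.Totalram) * uint64(in.Unit)
}

func main() {
	fmt.Printf("cpus=%d memory=%d bytes\n", cpuCount(), totalMemory())
}
```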

/run is not present

INFO[09-13|12:43:49] mount                                    source=/run target=/rootfs/run fstype= flags=4096 data= caller=filesystem.go:421
EROR[09-13|12:43:49] metal-hammer failed                      rebooting in=5s error="install mount special filesystems failed:mounting /run to /run failed no such file or directory " caller=main.go:76

Need for a technical bmc user for maintenance purposes inside a partition

We need a technical user for doing maintenance work inside a partition from the local network.
The user name and the strong password should be the same on all BMCs in a partition.
For additional security, it also makes sense to enable "IP Access Control" and restrict access to specific source IPs.

Machines booting with inconsistent UUIDs

We heavily rely on consistent / steady UUIDs of machines as the UUID is the primary key of the machine entity in the metal-api.

For machines, the UUID is not generated in the metal-api but comes from the machine itself: It's emitted in the extended DHCP request of the PXE boot procedure (option 97 as described in RFC4578) and it can later also be read from /sys/class/dmi/id/product_uuid. This information from the DHCP extended broadcast allows us to track the machine lifecycle very early in the game (from the first time it tries to PXE boot).

The problem (at least with some of our Supermicro servers) is that the machine UUID that a machine reports consists of the MAC address from one of the network cards available for the PXE boot. When deactivating certain network cards through BIOS, the UUID may change.

When this happens, it results in an orphaned machine (liveliness dead) in the metal-api and a new machine with a fresh UUID that is directly sent into the crash loop because the metal-core prevents it from retrieving a boot image (see this code fragment).

Different approaches for handling this issue might be:

  • Configure the BIOS to only boot from a single network interface, either through manual configuration or automated in the metal-hammer (the first approach causes high operational overhead, the second approach probably introduces another big burden to integrate vendor support into go-hal)
  • Generate the UUID for a machine in the metal-api and try not to rely on /sys/class/dmi/id/product_uuid (somehow tolerate machines without UUIDs but only with MAC addresses in the initial bootstrap phase --> give them their UUID in the register function)
  • Somehow deal with multiple UUIDs for a single machine in the metal-api (there we know all the machine's MAC addresses, but it's very unclear how this should be done without being extremely confusing)
  • ...<your ideas?>

remote logging

with log15 we can do:

h := log.MultiHandler(
	log.CallerFileHandler(log.StdoutHandler),
	log.Must.NetHandler("udp", "192.168.121.1:4713", log.JsonFormat()),
)

and on the receiver side:

nc -luv 4713

But every log entry must then contain the machineID to be able to filter for a specific machine as the receiver (e.g. metal-core) will receive logs from all machines.

If we switch logger to zerolog for example, we can do remote logging via grpc:
https://github.com/rs/zerolog
https://github.com/cheapRoc/grpc-zerolog
Which is more convenient than udp sockets.

Or more general with netconsole from the metal-hammer-kernel:

https://wiki.archlinux.org/index.php/Netconsole

It is a weird interface, but it should be possible to log to the management server(s). Every metal-core can set a different management server via the kernel cmdline if desired.

The developer of golang netlink has a golang server for the netconsole standard: https://github.com/mdlayher/netconsole and the MAC-address is optional!

But still, we need to add the machineID to the logs somehow to have the context, otherwise it will all be random garbage. pixiecore already knows the systemid from the pxe magic, which is the machineid in our case; @mwindower did this magic to get the pxe booting event reported. Maybe we can use that mechanism to configure console logging with this as prefix?

No, prefixing is not possible from a kernel perspective. It must be done on the receiving side with a lookup on the MAC address.

I added netconsole support to the metal-hammer kernel to have at least a start.

/cc @mwennrich @Gerrit91

Report Power usage and consider deep sleep

In order to save energy we should put the CPUs into a deep sleep state as long as we stay in the long polling loop.

As we only have Intel CPUs with pstate enabled, all relevant documentation is here:

https://www.kernel.org/doc/html/v4.20/admin-guide/pm/intel_pstate.html

We can read power consumption on the machine with:

ipmitool dcmi power reading

    Instantaneous power reading:                    74 Watts
    Minimum during sampling period:                 50 Watts
    Maximum during sampling period:                195 Watts
    Average power reading over sample period:       56 Watts
    IPMI timestamp:                           Mon Jun 24 19:52:40 2019
    Sampling period:                          00278782 Seconds.
    Power reading state is:                   activated

This could probably be reported back to the api in an event to see the current power consumption.
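
Extracting the instantaneous value from that output could look like the sketch below. parseInstantaneousWatts is a hypothetical helper that assumes the output format shown above; running ipmitool itself and sending the event to the api are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// instantaneousRe matches the first line of the dcmi power reading output.
var instantaneousRe = regexp.MustCompile(`Instantaneous power reading:\s+(\d+)\s+Watts`)

// parseInstantaneousWatts extracts the instantaneous power reading in watts.
func parseInstantaneousWatts(output string) (int, error) {
	m := instantaneousRe.FindStringSubmatch(output)
	if m == nil {
		return 0, fmt.Errorf("no instantaneous power reading found")
	}
	return strconv.Atoi(m[1])
}

func main() {
	sample := "    Instantaneous power reading:                    74 Watts\n" +
		"    Minimum during sampling period:                 50 Watts\n"
	watts, err := parseInstantaneousWatts(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("watts:", watts)
}
```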

Filesystem Layouts with LVM do not work for debian:10

[   15.259786] raid6: .... xor() 7916 MB/s, rmw enabled
[   15.264754] raid6: using avx512x2 recovery algorithm
[   15.270918] xor: automatically using best checksumming function   avx       
[   15.278818] async_tx: api initialized (async)
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
[   15.312198]  sda: sda1
[   15.316179] sd 12:0:0:0: [sda] Attached SCSI disk
Begin: Running /scripts/local-premount ... done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.
Begin: Running /scripts/local-block ... mdadm: No arrays found in config file or automatically
done.

works with debian-11 and ubuntu-20.04 though

centos also does not work:

[  202.881743] dracut-initqueue[371]: Warning: dracut-initqueue timeout - starting timeout scripts
[  203.412749] dracut-initqueue[371]: Warning: dracut-initqueue timeout - starting timeout scripts
[  203.423793] dracut-initqueue[371]: Warning: Could not boot.
[  203.431733] dracut-initqueue[371]: Warning: /dev/disk/by-uuid/433a04f8-1b21-4f2a-8806-b813adc7e260 does not exist
         Starting Setup Virtual Console...
[  OK  ] Started Setup Virtual Console.
         Starting Dracut Emergency Shell...
Warning: /dev/disk/by-uuid/433a04f8-1b21-4f2a-8806-b813adc7e260 does not exist

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


dracut:/# 

Wild guess: This only occurs for an lvm based root, probably an lvm module is missing in the initrd.

debian-10 works perfectly fine with lvm filesystem not on /

check health of attached disks

machines should only get to state "Waiting" if all attached disks are usable

maybe a small fdisk and newfs or something (when not in "reinstall" mode)?
