metal-stack / mini-lab

a small, virtual setup to locally run the metal-stack

License: MIT License

Topics: bare-metal, vagrant, cumulus-vx, ansible, metal

mini-lab's Introduction

mini-lab

The mini-lab is a small, virtual setup to locally run the metal-stack. It deploys the metal control plane and a metal-stack partition with two simulated leaf switches. The lab can be used for trying out metal-stack, for demonstrations, or for development.

[figure: overview of the mini-lab components]

ℹ This project can also be used as a template for writing your own metal-stack deployments.

Requirements

  • Linux machine with hardware virtualization support
  • kvm as hypervisor for the VMs (support can be checked with the kvm-ok command)
  • docker >= 20.10.13 (for using kind and our deployment base image)
  • kind == v0.20.0 (for hosting the metal control plane)
  • containerlab >= v0.47.1
  • the lab creates a Docker network on your host machine (172.17.0.1); make sure this does not overlap with other networks you already use
  • (recommended) haveged to have enough random entropy (only needed if the PXE process does not work)

Here is some code that should help you to set up most of the requirements:

# If UFW is enabled, either disable the firewall or
# allow traffic from the Docker network IP range.
sudo ufw status
sudo ufw allow from 172.17.0.0/16

# Install QEMU/KVM and other requirements
sudo apt install -y git curl qemu qemu-kvm haveged

# Install Docker
curl -fsSL https://get.docker.com | sh
# if you want to be on the safe side, follow the original installation
# instructions at https://docs.docker.com/engine/install/ubuntu/

# Ensure that your user is a member of the group "docker"
# (you need to log in again for this change to take effect)
sudo usermod -G docker -a ${USER}

# Install containerlab
bash -c "$(curl -sL https://get.containerlab.dev)"

# Install kind (kubernetes in docker), for more details see https://kind.sigs.k8s.io/docs/user/quick-start/#installation
sudo curl -Lo /usr/local/bin/kind "https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64"
sudo chmod +x /usr/local/bin/kind
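
To verify that the requirements are met, you can check the installed versions afterwards (kvm-ok comes from the cpu-checker package on Ubuntu):

kvm-ok
docker --version        # should be >= 20.10.13
kind --version          # should be == v0.20.0
containerlab version    # should be >= v0.47.1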

The following ports are used statically on your host machine:

Port  Bind Address  Description
6443  0.0.0.0       kube-apiserver of the kind cluster
4443  0.0.0.0       HTTPS ingress
4150  0.0.0.0       nsqd
8080  0.0.0.0       HTTP ingress
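
To quickly check that none of these ports are already in use on your host, a simple sketch using ss from iproute2:

sudo ss -tlnp | grep -E ':(6443|4443|4150|8080)\b' || echo "mini-lab ports are free"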

Known Limitations

  • to keep the demo small there is no EVPN
  • machine restart and destroy do not work because we cannot easily change the boot order via IPMI in the lab (virtual-bmc could do this, but it's buggy)
  • login to the machines is possible with virsh console; login to the firewall is possible with SSH from your local machine

Try it out

git clone https://github.com/metal-stack/mini-lab.git
cd mini-lab

Start the mini-lab. This brings up a kind cluster, a metal-api instance, two containers wrapping the leaf switches, and another container that hosts two user-allocatable machines:

make
# containerlab will ask you for root permissions (https://github.com/srl-labs/containerlab/issues/669)

After the deployment and a short waiting period, two machines in the status PXE Booting become visible through metalctl machine ls:

docker compose run --rm metalctl machine ls

ID                                          LAST EVENT   WHEN     AGE  HOSTNAME  PROJECT  SIZE          IMAGE  PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258        PXE Booting  3s
2294c949-88f6-5390-8154-fa53d93a3313        PXE Booting  5s

Wait until the machines reach the waiting state:

docker compose run --rm metalctl machine ls

ID                                          LAST EVENT   WHEN     AGE  HOSTNAME  PROJECT  SIZE          IMAGE  PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258        Waiting      8s                               v1-small-x86         mini-lab
2294c949-88f6-5390-8154-fa53d93a3313        Waiting      8s                               v1-small-x86         mini-lab

Create a firewall and a machine with:

make firewall
make machine

Alternatively, you may want to issue the metalctl commands on your own:

docker compose run --rm metalctl network allocate \
        --partition mini-lab \
        --project 00000000-0000-0000-0000-000000000000 \
        --name user-private-network
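
# list networks to find the ID of user-private-network (used as <network-ID> below)
docker compose run --rm metalctl network ls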

# lookup the network ID and create a machine
docker compose run --rm metalctl machine create \
        --description test \
        --name machine \
        --hostname machine \
        --project 00000000-0000-0000-0000-000000000000 \
        --partition mini-lab \
        --image ubuntu-20.04 \
        --size v1-small-x86 \
        --networks <network-ID>

# create a firewall that is also connected to the virtual internet-mini-lab network
docker compose run --rm metalctl machine create \
        --description fw \
        --name fw \
        --hostname fw \
        --project 00000000-0000-0000-0000-000000000000 \
        --partition mini-lab \
        --image firewall-ubuntu-2.0 \
        --size v1-small-x86 \
        --networks internet-mini-lab,$(privatenet)
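
# note: $(privatenet) above is resolved by the Makefile; in a plain shell you could capture the ID
# yourself, e.g. (a sketch, the jq filter over metalctl's JSON output is an assumption):
privatenet=$(docker compose run --rm metalctl network ls -o json | jq -r '.[] | select(.name=="user-private-network") | .id')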

See the installation process in action:

make console-machine01  # or console-machine02
...
Ubuntu 20.04 machine ttyS0

machine login:

Two machines are now installed and have the status "Phoned Home":

docker compose run --rm metalctl machine ls
ID                                          LAST EVENT   WHEN   AGE     HOSTNAME  PROJECT                               SIZE          IMAGE                             PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258        Phoned Home  2s     21s     machine   00000000-0000-0000-0000-000000000000  v1-small-x86  Ubuntu 20.04 20200331             mini-lab
2294c949-88f6-5390-8154-fa53d93a3313        Phoned Home  8s     18s     fw        00000000-0000-0000-0000-000000000000  v1-small-x86  Firewall 2 Ubuntu 20200730        mini-lab

Log in with the user name metal and the console password obtained from:

docker compose run --rm metalctl machine consolepassword e0ab02d2-27cd-5a5e-8efc-080ba80cf258

If you want to access the firewall with SSH or have internet connectivity from the firewall and machine, you'll need to have a static route configured that points to the leaf switches:

# Add the route to the network internet-mini-lab 100.255.254.0/24 via leaf01 and leaf02, whose IPs are dynamically allocated. Make sure there's no old route before execution.
make route

# Connect to the firewall
ssh [email protected]
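
If you prefer to set the route by hand instead of using make route, a rough sketch (assuming the leaf containers are named leaf01 and leaf02 and are attached to the default Docker bridge):

LEAF01_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' leaf01)
LEAF02_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' leaf02)
sudo ip route add 100.255.254.0/24 nexthop via ${LEAF01_IP} nexthop via ${LEAF02_IP}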

To remove the kind cluster, the switches and machines, run:

make cleanup

Reinstall machine

Reinstall a machine with

docker compose run --rm metalctl machine reinstall \
        --image ubuntu-20.04 \
        e0ab02d2-27cd-5a5e-8efc-080ba80cf258

Free machine

Free a machine with make free-machine01 or

docker compose run --rm metalctl machine rm e0ab02d2-27cd-5a5e-8efc-080ba80cf258

Flavors

There are a few versions of the mini-lab environment that you can run. We call them flavors. There are three flavors at the moment:

  • default -- runs two machines.
  • cluster-api -- runs three machines. Useful for testing control plane and worker node deployment with the Cluster API provider.
  • sonic -- uses SONiC as the network operating system for the leaf switches.

In order to start a specific flavor, define it as follows:

export MINI_LAB_FLAVOR=cluster-api
make
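
Assuming the Makefile picks the variable up from the environment, the flavor can also be set for a single invocation:

MINI_LAB_FLAVOR=sonic make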

mini-lab's People

Contributors

domi-nik-, donimax, droid42, fhaftmann, gerrit91, grigoriymikhalkin, harmathy, iljarotar, jklippel, kolsa, limkianan, majst01, muhittink, mwennrich, mwindower, robertvolkmann, suryamurugan, thanche1, ulrichschreiner, vknabel


mini-lab's Issues

Machines hang after rebooting

I was trying to reboot individual machines instead of rebooting the whole mini-lab for testing: first by running the make delete-machine0x target, after which the machine goes into the Planned Reboot state, and then by running make reboot-machine0x. After the last target, the machines either hang in the PXE Booting status (with 💀 appearing after some time) or stay in the Planned Reboot state.

OS: Ubuntu 20.04
Vagrant : 2.2.9
Docker:

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:20 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

mini-lab is flaky

Sometimes the mini-lab starts up correctly and everything works nicely but there are situations that lead to this error:

reconfiguration failed        {"app": "metal-core", "error": "could not build switcher config: no vlan mapping could be determined for vxlan interface vniInternet", "errorVerbose": "no vlan mapping could be determined for vxlan interface vniInternet\ncould not build switcher config\ngithub.com/metal-stack/metal-core/internal/event.(*eventHandler).reconfigureSwitch\n\t/work/internal/event/reconfigureSwitch.go:65\ngithub.com/metal-stack/metal-core/internal/event.(*eventHandler).ReconfigureSwitch\n\t/work/internal/event/reconfigureSwitch.go:28\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"

The error can be seen via metalctl switch ls -o wide and causes networking to not work as expected.
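
Using the metalctl wrapper from this repository, that is:

docker compose run --rm metalctl switch ls -o wide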

env target does not work anymore

Since we now resolve all versions from the release vector, the env target in the Makefile, which defines the version of metalctl, does not work anymore.

IP 192.168.121.1 is not created

hi,

I normally use only make control-plane to start up the control plane locally in a cluster. Sadly, this does not work if the IP address 192.168.121.1 does not exist. This IP is created implicitly by using Vagrant to spin up some machines which are not needed by the control plane (and which I don't want to spin up only to get this IP).

My workaround is to run

sudo ip a add 192.168.121.1/24 dev eno1 label eno1:1

where eno1 is the main interface on my machine. It would be great to do this automatically somewhere in the whole machinery.

Periodically failing to restart mini-lab

Some time after the first start of the mini-lab, I fail to restart it. I get a very similar error, but at different stages (so far I have seen errors at deploy-partition | TASK [metal-roles/partition/roles/docker-on-cumulus : ensure dependencies are installed], deploy-partition | TASK [ansible-common/roles/systemd-docker-service : pre-pull docker image] and deploy-partition | TASK [metal-roles/partition/roles/metal-core : wait for metal-core to listen on port]). Here is the last error that I got:

deploy-control-plane | fatal: [localhost]: FAILED! => changed=true 
deploy-control-plane |   cmd:
deploy-control-plane |   - helm
deploy-control-plane |   - upgrade
deploy-control-plane |   - --install
deploy-control-plane |   - --namespace
deploy-control-plane |   - metal-control-plane
deploy-control-plane |   - --debug
deploy-control-plane |   - --set
deploy-control-plane |   - helm_chart.config_hash=7fc19e1bc1a3ee41f622c3de7bc98ee33756844e
deploy-control-plane |   - -f
deploy-control-plane |   - metal-values.j2
deploy-control-plane |   - --repo
deploy-control-plane |   - https://helm.metal-stack.io
deploy-control-plane |   - --version
deploy-control-plane |   - 0.2.1
deploy-control-plane |   - --wait
deploy-control-plane |   - --timeout
deploy-control-plane |   - 600s
deploy-control-plane |   - metal-control-plane
deploy-control-plane |   - metal-control-plane
deploy-control-plane |   delta: '0:10:02.713685'
deploy-control-plane |   end: '2020-12-09 08:47:29.432729'
deploy-control-plane |   msg: non-zero return code
deploy-control-plane |   rc: 1
deploy-control-plane |   start: '2020-12-09 08:37:26.719044'
deploy-control-plane |   stderr: |-
deploy-control-plane |     history.go:53: [debug] getting history for release metal-control-plane
deploy-control-plane |     install.go:172: [debug] Original chart version: "0.2.1"
deploy-control-plane |     install.go:189: [debug] CHART PATH: /root/.cache/helm/repository/metal-control-plane-0.2.1.tgz
deploy-control-plane |   
deploy-control-plane |     client.go:255: [debug] Starting delete for "metal-api-initdb" Job
deploy-control-plane |     client.go:284: [debug] jobs.batch "metal-api-initdb" not found
deploy-control-plane |     client.go:109: [debug] creating 1 resource(s)
deploy-control-plane |     client.go:464: [debug] Watching for changes to Job metal-api-initdb with timeout of 10m0s
deploy-control-plane |     client.go:492: [debug] Add/Modify event for metal-api-initdb: ADDED
deploy-control-plane |     client.go:531: [debug] metal-api-initdb: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
deploy-control-plane |     client.go:492: [debug] Add/Modify event for metal-api-initdb: MODIFIED
deploy-control-plane |     client.go:531: [debug] metal-api-initdb: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
deploy-control-plane |     Error: failed pre-install: timed out waiting for the condition
deploy-control-plane |     helm.go:81: [debug] failed pre-install: timed out waiting for the condition
deploy-control-plane |   stderr_lines: <omitted>
deploy-control-plane |   stdout: Release "metal-control-plane" does not exist. Installing it now.
deploy-control-plane |   stdout_lines: <omitted>
deploy-control-plane | 
deploy-control-plane | PLAY RECAP *********************************************************************
deploy-control-plane | localhost                  : ok=24   changed=11   unreachable=0    failed=1    skipped=8    rescued=0    ignored=0 

I'm using the mini-lab on the master branch with only one change: metal_stack_release_version is set to develop. The only thing that reliably helps is pruning everything (networks, build cache, containers, images) from Docker.

OS: Ubuntu 20.04
Vagrant : 2.2.9
Docker:

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:20 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

cc @Gerrit91, @LimKianAn

Booting mini-lab might hang while setting up the network

Sometimes make up stops proceeding after setting up the first group of wires, though sometimes it proceeds without any issues. I cannot detect any pattern. When I run make down && make, only 1 in 5 calls succeeds. Waiting several minutes doesn't solve this either.


Aborting the process prints ^Cmake: *** [Makefile:80: partition-bake] Error 130.

I already had a quick chat with @Gerrit91 on this. Here is some requested information:

❯ docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED         STATUS         PORTS                                                                                                                                              NAMES
9a951775b316   ghcr.io/metal-stack/mini-lab-vms:latest   "/mini-lab/vms_entry…"   3 minutes ago   Up 3 minutes                                                                                                                                                      vms
3b254f2982cf   grigoriymikh/sandbox:latest               "/usr/local/bin/igni…"   3 minutes ago   Up 3 minutes                                                                                                                                                      ignite-eb8de119eecdaa65
dc03782d709c   kindest/node:v1.24.0                      "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes   0.0.0.0:4150->4150/tcp, 0.0.0.0:4161->4161/tcp, 0.0.0.0:4443->4443/tcp, 0.0.0.0:6443->6443/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:50051->50051/tcp   metal-control-plane-control-plane
❯ make ssh-leaf01
ssh -o StrictHostKeyChecking=no -o "PubkeyAcceptedKeyTypes +ssh-rsa" -i files/ssh/id_rsa root@leaf01
ssh: Could not resolve hostname leaf01: Temporary failure in name resolution
make: *** [Makefile:142: ssh-leaf01] Error 255
❯ cat /etc/hosts
127.0.0.1	localhost
127.0.1.1	yubihill

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

docker logs ignite-eb8de119eecdaa65

Following Tutorial In Readme.md

Specs

  • Ubuntu 20.04.4 LTS (Focal Fossa)
  • Docker 20.10.14, build a224086
  • Docker-Compose 1.29.2, build 5becea4c
  • kind 0.12.0
  • containerlab 0.25.1
  • kvm ✅

Problem

Hi,

Today I tried the tutorial found in the README.md. After several cleanups and restarts I did not get it to work. Every time when creating the metal-core, I got the following error:

deploy-partition | TASK [ansible-common/roles/systemd-docker-service : start service metal-core] ***
deploy-partition | changed: [leaf01]
deploy-partition | changed: [leaf02]
deploy-partition | 
deploy-partition | TASK [ansible-common/roles/systemd-docker-service : ensure service is started] ***
deploy-partition | ok: [leaf02]
deploy-partition | ok: [leaf01]
deploy-partition | 
deploy-partition | TASK [metal-roles/partition/roles/metal-core : wait for metal-core to listen on port] ***
deploy-partition | fatal: [leaf01]: FAILED! => changed=false 
deploy-partition |   elapsed: 300
deploy-partition |   msg: metal-core did not come up
deploy-partition | fatal: [leaf02]: FAILED! => changed=false 
deploy-partition |   elapsed: 300
deploy-partition |   msg: metal-core did not come up
deploy-partition | 
deploy-partition | PLAY RECAP *********************************************************************
deploy-partition | leaf01                     : ok=65   changed=47   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
deploy-partition | leaf02                     : ok=59   changed=43   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
deploy-partition | 
deploy-partition exited with code 2
docker exec vms /mini-lab/manage_vms.py --names machine01,machine02 create
Formatting '/machine01.img', fmt=qcow2 size=5368709120 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/machine02.img', fmt=qcow2 size=5368709120 cluster_size=65536 lazy_refcounts=off refcount_bits=16
QEMU 4.2.1 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64 -name machine01 -uuid e0ab02d2-27cd-5a5e-8efc-080ba80cf258 -m 2G -boot n -drive if=virtio,format=qcow2,file=/machine01.img -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_VARS.fd -serial telnet:127.0.0.1:4000,server,nowait -enable-kvm -nographic -net nic,model=virtio,macaddr=aa:c1:ab:87:4e:82 -net nic,model=virtio,macaddr=aa:c1:ab:c1:29:2c -net tap,fd=30 30<>/dev/tap2 -net tap,fd=40 40<>/dev/tap3 &
qemu-system-x86_64 -name machine02 -uuid 2294c949-88f6-5390-8154-fa53d93a3313 -m 2G -boot n -drive if=virtio,format=qcow2,file=/machine02.img -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,file=/usr/share/OVMF/OVMF_VARS.fd -serial telnet:127.0.0.1:4001,server,nowait -enable-kvm -nographic -net nic,model=virtio,macaddr=aa:c1:ab:90:3a:db -net nic,model=virtio,macaddr=aa:c1:ab:46:52:e4 -net tap,fd=50 50<>/dev/tap4 -net tap,fd=60 60<>/dev/tap5 &
QEMU 4.2.1 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@leaf01 -i files/ssh/id_rsa 'systemctl restart metal-core'
Warning: Permanently added 'leaf01,172.17.0.4' (ECDSA) to the list of known hosts.
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@leaf02 -i files/ssh/id_rsa 'systemctl restart metal-core'
Warning: Permanently added 'leaf02,172.17.0.3' (ECDSA) to the list of known hosts.

The error tells me that the host does not support a requested feature. I have found similar issues with other virtualization software like Podman (see containers/podman#11479).

Is there something I missed during the configuration of my machine or software?
I hope you can help me out here.
Best regards, Julian

deploy-partition | elapsed: 300 deploy-partition | msg: metal-core did not come up

Tried 5 times, mini-lab fails to come up. What's missing?

deploy-partition      | fatal: [leaf02]: FAILED! => changed=false 
deploy-partition      |   elapsed: 300
deploy-partition      |   msg: metal-core did not come up
deploy-partition      | fatal: [leaf01]: FAILED! => changed=false 
deploy-partition      |   elapsed: 300
deploy-partition      |   msg: metal-core did not come up
deploy-partition      | 
deploy-partition      | PLAY RECAP *********************************************************************
deploy-partition      | leaf01                     : ok=79   changed=51   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
deploy-partition      | leaf02                     : ok=63   changed=45   unreachable=0    failed=1    skipped=3    rescued=0    ignored=0   

Internet access for machines

Currently, the provisioned machines in the mini-lab have no internet connection, but we will need that if we want to take the mini-lab to the next level: an integration with k8s orchestrators like Gardener or the Cluster API.

These are the TODOs for that:

  • enable firewall creation
    • needs an underlay network configured at the metal-api
    • needs a virtual internet network configured at the metal-api
  • make the leaf switches a VTEP for the virtual internet network
    • provide VXLAN, SVI and VRF interfaces for the virtual internet network with static parts in /etc/network/interfaces.d/ and parameters for metal-core: ADDITIONAL_BRIDGE_PORTS, ADDITIONAL_BRIDGE_VIDS (see the sketch after this list)
  • configure route leaking btw. the virtual internet network and the mgmt VRF on the leaf switches
    • the default route of the mgmt VRF is leaked to the virtual internet network VRF
    • the virtual internet network VRF is leaked to the mgmt VRF
  • masquerade traffic leaving on eth0 of the leaf switches OR add a static route for the virtual internet network to the eth0 IPs of the leaf switches on the host system
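
A rough sketch of what the static interface parts on a leaf could look like (the VNI, VLAN ID, tunnel IP and interface names below are placeholders for illustration, not the actual mini-lab values):

# /etc/network/interfaces.d/internet
auto vniInternet
iface vniInternet
    vxlan-id 104009
    vxlan-local-tunnelip 10.0.0.11
    bridge-access 1009

auto vlan1009
iface vlan1009
    vlan-id 1009
    vlan-raw-device bridge
    vrf vrfInternet

auto vrfInternet
iface vrfInternet
    vrf-table auto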

Wrapping docker-compose into docker

As we have already started to reduce the dependency stack, I think it would also make sense to wrap docker-compose inside a docker container.
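
A minimal sketch of such a wrapper, assuming the official docker/compose image and mounting the host's Docker socket into the container:

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v "${PWD}:${PWD}" -w "${PWD}" \
    docker/compose:1.29.2 up --remove-orphans control-plane partition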

Use latest released OS images

We keep forgetting to update the OS images, but the latest images may be unstable, so it would be best to always use the most recently released images for the mini-lab.

Problem running example from readme

OS: Ubuntu 20.04.1 LTS
Vagrant: 2.2.9
Docker: 19.03.8
Docker-Compose: 1.27.3, build 4092ae5d

I had a problem running the example from the README. When running make I get the following error, although the script finishes successfully:

deploy-partition | fatal: [leaf01]: UNREACHABLE! => changed=false 
deploy-partition |   msg: 'Failed to connect to the host via ssh: ssh: Could not resolve hostname leaf01: Name or service not known'
deploy-partition |   unreachable: true
deploy-partition | fatal: [leaf02]: UNREACHABLE! => changed=false 
deploy-partition |   msg: 'Failed to connect to the host via ssh: ssh: Could not resolve hostname leaf02: Name or service not known'
deploy-partition |   unreachable: true
deploy-partition | 
deploy-partition | PLAY RECAP *********************************************************************
deploy-partition | leaf01                     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
deploy-partition | leaf02                     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
Full log
vagrant up
Bringing machine 'leaf02' up with 'libvirt' provider...
Bringing machine 'leaf01' up with 'libvirt' provider...
==> leaf02: Checking if box 'CumulusCommunity/cumulus-vx' version '3.7.13' is up to date...
==> leaf01: Checking if box 'CumulusCommunity/cumulus-vx' version '3.7.13' is up to date...
==> leaf02: Creating image (snapshot of base box volume).
==> leaf01: Creating image (snapshot of base box volume).
==> leaf02: Creating domain with the following settings...
==> leaf01: Creating domain with the following settings...
==> leaf02:  -- Name:              metalleaf02
==> leaf02:  -- Domain type:       kvm
==> leaf01:  -- Name:              metalleaf01
==> leaf01:  -- Domain type:       kvm
==> leaf02:  -- Cpus:              1
==> leaf02:  -- Feature:           acpi
==> leaf01:  -- Cpus:              1
==> leaf01:  -- Feature:           acpi
==> leaf02:  -- Feature:           apic
==> leaf02:  -- Feature:           pae
==> leaf01:  -- Feature:           apic
==> leaf01:  -- Feature:           pae
==> leaf02:  -- Memory:            512M
==> leaf01:  -- Memory:            512M
==> leaf02:  -- Management MAC:    
==> leaf01:  -- Management MAC:    
==> leaf01:  -- Loader:            
==> leaf02:  -- Loader:            
==> leaf01:  -- Nvram:             
==> leaf01:  -- Base box:          CumulusCommunity/cumulus-vx
==> leaf02:  -- Nvram:             
==> leaf02:  -- Base box:          CumulusCommunity/cumulus-vx
==> leaf01:  -- Storage pool:      default
==> leaf01:  -- Image:             /var/lib/libvirt/images/metalleaf01.img (6G)
==> leaf02:  -- Storage pool:      default
==> leaf02:  -- Image:             /var/lib/libvirt/images/metalleaf02.img (6G)
==> leaf01:  -- Volume Cache:      default
==> leaf02:  -- Volume Cache:      default
==> leaf01:  -- Kernel:            
==> leaf02:  -- Kernel:            
==> leaf01:  -- Initrd:            
==> leaf02:  -- Initrd:            
==> leaf01:  -- Graphics Type:     vnc
==> leaf01:  -- Graphics Port:     -1
==> leaf02:  -- Graphics Type:     vnc
==> leaf02:  -- Graphics Port:     -1
==> leaf01:  -- Graphics IP:       127.0.0.1
==> leaf02:  -- Graphics IP:       127.0.0.1
==> leaf01:  -- Graphics Password: Not defined
==> leaf02:  -- Graphics Password: Not defined
==> leaf01:  -- Video Type:        cirrus
==> leaf02:  -- Video Type:        cirrus
==> leaf01:  -- Video VRAM:        9216
==> leaf02:  -- Video VRAM:        9216
==> leaf01:  -- Sound Type:	
==> leaf01:  -- Keymap:            de
==> leaf02:  -- Sound Type:	
==> leaf01:  -- TPM Path:          
==> leaf02:  -- Keymap:            de
==> leaf02:  -- TPM Path:          
==> leaf01:  -- INPUT:             type=mouse, bus=ps2
==> leaf01:  -- RNG device model:  random
==> leaf02:  -- INPUT:             type=mouse, bus=ps2
==> leaf02:  -- RNG device model:  random
==> leaf01: Creating shared folders metadata...
==> leaf02: Creating shared folders metadata...
==> leaf01: Starting domain.
==> leaf02: Starting domain.
==> leaf01: Waiting for domain to get an IP address...
==> leaf02: Waiting for domain to get an IP address...
==> leaf01: Waiting for SSH to become available...
==> leaf02: Waiting for SSH to become available...
    leaf01: 
    leaf01: Vagrant insecure key detected. Vagrant will automatically replace
    leaf01: this with a newly generated keypair for better security.
    leaf02: 
    leaf02: Vagrant insecure key detected. Vagrant will automatically replace
    leaf02: this with a newly generated keypair for better security.
    leaf02: 
    leaf02: Inserting generated public key within guest...
    leaf01: 
    leaf01: Inserting generated public key within guest...
    leaf02: Removing insecure key from the guest if it's present...
    leaf01: Removing insecure key from the guest if it's present...
    leaf01: Key inserted! Disconnecting and reconnecting using new SSH key...
    leaf02: Key inserted! Disconnecting and reconnecting using new SSH key...
==> leaf01: Setting hostname...
==> leaf02: Setting hostname...
==> leaf01: Running provisioner: shell...
==> leaf02: Running provisioner: shell...
    leaf01: Running: /tmp/vagrant-shell20201024-51781-7ivrnw.sh
    leaf02: Running: /tmp/vagrant-shell20201024-51781-e8hvaf.sh
    leaf01: #################################
    leaf01:   Running Switch Post Config (config_switch.sh)
    leaf01: #################################
    leaf02: #################################
    leaf02:   Running Switch Post Config (config_switch.sh)
    leaf02: #################################
    leaf01: #################################
    leaf01:    Finished
    leaf01: #################################
    leaf02: #################################
    leaf02:    Finished
    leaf02: #################################
==> leaf01: Running provisioner: shell...
==> leaf02: Running provisioner: shell...
    leaf01: Running: /tmp/vagrant-shell20201024-51781-otw21i.sh
    leaf02: Running: /tmp/vagrant-shell20201024-51781-h3jegd.sh
    leaf01: #### UDEV Rules (/etc/udev/rules.d/70-persistent-net.rules) ####
    leaf01:   INFO: Adding UDEV Rule: Vagrant interface = eth0
    leaf01:   INFO: Adding UDEV Rule: 44:38:39:00:00:1a --> swp1
    leaf01:   INFO: Adding UDEV Rule: 44:38:39:00:00:18 --> swp2
    leaf01: ACTION=="add", SUBSYSTEM=="net", ATTR{ifindex}=="2", NAME="eth0", SUBSYSTEMS=="pci"
    leaf01: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:1a", NAME="swp1", SUBSYSTEMS=="pci"
    leaf01: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:18", NAME="swp2", SUBSYSTEMS=="pci"
==> leaf01: Running provisioner: shell...
    leaf02: #### UDEV Rules (/etc/udev/rules.d/70-persistent-net.rules) ####
    leaf02:   INFO: Adding UDEV Rule: Vagrant interface = eth0
    leaf02:   INFO: Adding UDEV Rule: 44:38:39:00:00:04 --> swp1
    leaf02:   INFO: Adding UDEV Rule: 44:38:39:00:00:19 --> swp2
    leaf02: ACTION=="add", SUBSYSTEM=="net", ATTR{ifindex}=="2", NAME="eth0", SUBSYSTEMS=="pci"
    leaf02: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:04", NAME="swp1", SUBSYSTEMS=="pci"
    leaf02: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:19", NAME="swp2", SUBSYSTEMS=="pci"
==> leaf02: Running provisioner: shell...
    leaf01: Running: /tmp/vagrant-shell20201024-51781-7bdyq2.sh
    leaf02: Running: /tmp/vagrant-shell20201024-51781-32eax6.sh
    leaf01: ### RUNNING CUMULUS EXTRA CONFIG ###
    leaf01:   INFO: Detected a 3.x Based Release (3.7.13)
    leaf01: ### Disabling default remap on Cumulus VX...
    leaf01:   INFO: Detected Cumulus Linux v3.7.13 Release
    leaf01: ### Fixing ONIE DHCP to avoid Vagrant Interface ###
    leaf01:      Note: Installing from ONIE will undo these changes.
    leaf02: ### RUNNING CUMULUS EXTRA CONFIG ###
    leaf02:   INFO: Detected a 3.x Based Release (3.7.13)
    leaf02: ### Disabling default remap on Cumulus VX...
    leaf02:   INFO: Detected Cumulus Linux v3.7.13 Release
    leaf02: ### Fixing ONIE DHCP to avoid Vagrant Interface ###
    leaf02:      Note: Installing from ONIE will undo these changes.
    leaf01: ### Giving Vagrant User Ability to Run NCLU Commands ###
    leaf02: ### Giving Vagrant User Ability to Run NCLU Commands ###
    leaf01: Adding user `vagrant' to group `netedit' ...
    leaf02: Adding user `vagrant' to group `netedit' ...
    leaf01: Adding user vagrant to group netedit
    leaf02: Adding user vagrant to group netedit
    leaf02: Done.
    leaf01: Done.
    leaf02: Adding user `vagrant' to group `netshow' ...
    leaf02: Adding user vagrant to group netshow
    leaf01: Adding user `vagrant' to group `netshow' ...
    leaf01: Adding user vagrant to group netshow
    leaf01: Done.
    leaf01: ### Disabling ZTP service...
    leaf02: Done.
    leaf02: ### Disabling ZTP service...
    leaf01: Removed symlink /etc/systemd/system/multi-user.target.wants/ztp.service.
    leaf02: Removed symlink /etc/systemd/system/multi-user.target.wants/ztp.service.
    leaf01: ### Resetting ZTP to work next boot...
    leaf02: ### Resetting ZTP to work next boot...
    leaf01: Created symlink from /etc/systemd/system/multi-user.target.wants/ztp.service to /lib/systemd/system/ztp.service.
    leaf02: Created symlink from /etc/systemd/system/multi-user.target.wants/ztp.service to /lib/systemd/system/ztp.service.
    leaf01: ### DONE ###
    leaf02: ### DONE ###
./env.sh
docker-compose up --remove-orphans --force-recreate control-plane partition && vagrant up machine01 machine02
Recreating deploy-partition     ... done
Recreating deploy-control-plane ... done
Attaching to deploy-partition, deploy-control-plane
deploy-control-plane | 
deploy-control-plane | PLAY [provide requirements.yaml] ***********************************************
deploy-partition | 
deploy-partition | PLAY [provide requirements.yaml] ***********************************************
deploy-control-plane | 
deploy-control-plane | TASK [download release vector] *************************************************
deploy-partition | 
deploy-partition | TASK [download release vector] *************************************************
deploy-partition | ok: [localhost]
deploy-control-plane | ok: [localhost]
deploy-partition | 
deploy-partition | TASK [write requirements.yaml from release vector] *****************************
deploy-control-plane | 
deploy-control-plane | TASK [write requirements.yaml from release vector] *****************************
deploy-control-plane | ok: [localhost]
deploy-partition | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | PLAY RECAP *********************************************************************
deploy-control-plane | localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
deploy-control-plane | 
deploy-partition | 
deploy-partition | PLAY RECAP *********************************************************************
deploy-partition | localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
deploy-partition | 
deploy-partition | - extracting ansible-common to /root/.ansible/roles/ansible-common
deploy-partition | - ansible-common (v0.5.5) was installed successfully
deploy-control-plane | - extracting ansible-common to /root/.ansible/roles/ansible-common
deploy-control-plane | - ansible-common (v0.5.5) was installed successfully
deploy-partition | - extracting metal-ansible-modules to /root/.ansible/roles/metal-ansible-modules
deploy-partition | - metal-ansible-modules (v0.1.1) was installed successfully
deploy-control-plane | - extracting metal-ansible-modules to /root/.ansible/roles/metal-ansible-modules
deploy-control-plane | - metal-ansible-modules (v0.1.1) was installed successfully
deploy-control-plane | - extracting metal-roles to /root/.ansible/roles/metal-roles
deploy-control-plane | - metal-roles (v0.3.3) was installed successfully
deploy-partition | - extracting metal-roles to /root/.ansible/roles/metal-roles
deploy-partition | - metal-roles (v0.3.3) was installed successfully
deploy-control-plane | 
deploy-control-plane | PLAY [deploy control plane] ****************************************************
deploy-control-plane | 
deploy-control-plane | TASK [ingress-controller : Apply mandatory nginx-ingress definition] ***********
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [ingress-controller : Deploy nginx-ingress service] ***********************
deploy-partition | [WARNING]:  * Failed to parse /root/.ansible/roles/ansible-
deploy-partition | common/inventory/vagrant/vagrant.py with script plugin: Inventory script
deploy-partition | (/root/.ansible/roles/ansible-common/inventory/vagrant/vagrant.py) had an
deploy-partition | execution error: Traceback (most recent call last):   File
deploy-partition | "/root/.ansible/roles/ansible-common/inventory/vagrant/vagrant.py", line 452,
deploy-partition | in <module>     main()   File "/root/.ansible/roles/ansible-
deploy-partition | common/inventory/vagrant/vagrant.py", line 447, in main     hosts, meta_vars =
deploy-partition | list_running_hosts()   File "/root/.ansible/roles/ansible-
deploy-partition | common/inventory/vagrant/vagrant.py", line 414, in list_running_hosts     _,
deploy-partition | host, key, value = line.split(',')[:4] ValueError: not enough values to unpack
deploy-partition | (expected 4, got 1)
deploy-partition | [WARNING]:  * Failed to parse /root/.ansible/roles/ansible-
deploy-partition | common/inventory/vagrant/vagrant.py with ini plugin:
deploy-partition | /root/.ansible/roles/ansible-common/inventory/vagrant/vagrant.py:6: Expected
deploy-partition | key=value host variable assignment, got: re
deploy-partition | [WARNING]: Unable to parse /root/.ansible/roles/ansible-
deploy-partition | common/inventory/vagrant/vagrant.py as an inventory source
deploy-partition | [WARNING]: Unable to parse /root/.ansible/roles/ansible-
deploy-partition | common/inventory/vagrant as an inventory source
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/prepare : Create namespace for metal stack] ***
deploy-partition | 
deploy-partition | PLAY [pre-deployment checks] ***************************************************
deploy-partition | 
deploy-partition | TASK [get vagrant version] *****************************************************
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/nsq : Gather release versions] ***********
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/nsq : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/nsq : Deploy nsq] ************************
deploy-partition | changed: [localhost]
deploy-partition | 
deploy-partition | TASK [check vagrant version] ***************************************************
deploy-partition | skipping: [localhost]
deploy-partition | 
deploy-partition | PLAY [deploy leaves and docker] ************************************************
deploy-partition | 
deploy-partition | TASK [Gathering Facts] *********************************************************
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/nsq : Set services for patching ingress controller service exposal] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/nsq : Patch tcp-services in ingress controller] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/nsq : Expose tcp services in ingress controller] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal-db : Gather release versions] ******
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal-db : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [Deploy metal db] *********************************************************
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/rethinkdb-backup-restore : Gather release versions] ***
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/rethinkdb-backup-restore : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/rethinkdb-backup-restore : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/rethinkdb-backup-restore : Deploy rethinkdb (backup-restore)] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/ipam-db : Gather release versions] *******
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/ipam-db : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [Deploy ipam db] **********************************************************
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/postgres-backup-restore : Gather release versions] ***
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/postgres-backup-restore : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/postgres-backup-restore : Deploy postgres (backup-restore)] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/masterdata-db : Gather release versions] ***
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/masterdata-db : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [Deploy masterdata db] ****************************************************
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/postgres-backup-restore : Gather release versions] ***
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/postgres-backup-restore : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/postgres-backup-restore : Deploy postgres (backup-restore)] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Gather release versions] *********
deploy-control-plane | skipping: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Check mandatory variables for this role are set] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [Deploy metal control plane] **********************************************
deploy-control-plane | 
deploy-control-plane | TASK [ansible-common/roles/helm-chart : Create folder for charts and values] ***
deploy-control-plane | changed: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [ansible-common/roles/helm-chart : Copy over custom helm charts] **********
deploy-control-plane | changed: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [ansible-common/roles/helm-chart : Template helm value file] **************
deploy-control-plane | changed: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [ansible-common/roles/helm-chart : Calculate hash of configuration] *******
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [ansible-common/roles/helm-chart : Deploy helm chart (metal-control-plane)] ***
deploy-partition | fatal: [leaf02]: UNREACHABLE! => changed=false 
deploy-partition |   msg: 'Failed to connect to the host via ssh: ssh: connect to host leaf02 port 22: No route to host'
deploy-partition |   unreachable: true
deploy-partition | fatal: [leaf01]: UNREACHABLE! => changed=false 
deploy-partition |   msg: 'Failed to connect to the host via ssh: ssh: connect to host leaf01 port 22: No route to host'
deploy-partition |   unreachable: true
deploy-partition | 
deploy-partition | PLAY RECAP *********************************************************************
deploy-partition | leaf01                     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
deploy-partition | leaf02                     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
deploy-partition | localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
deploy-partition | 
deploy-partition exited with code 4
deploy-control-plane | changed: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Set services for patching ingress controller service exposal] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Patch tcp-services in ingress controller] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Patch udp-services in ingress controller] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Expose tcp services in ingress controller] ***
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | TASK [metal-roles/control-plane/roles/metal : Wait until api is available] *****
deploy-control-plane | ok: [localhost]
deploy-control-plane | 
deploy-control-plane | PLAY RECAP *********************************************************************
deploy-control-plane | localhost                  : ok=30   changed=4    unreachable=0    failed=0    skipped=7    rescued=0    ignored=0   
deploy-control-plane | 
deploy-control-plane exited with code 0
Bringing machine 'machine01' up with 'libvirt' provider...
Bringing machine 'machine02' up with 'libvirt' provider...
==> machine01: Creating domain with the following settings...
==> machine02: Creating domain with the following settings...
==> machine02:  -- Name:              metalmachine02
==> machine01:  -- Name:              metalmachine01
==> machine02:  -- Forced UUID:       2294c949-88f6-5390-8154-fa53d93a3313
==> machine02:  -- Domain type:       kvm
==> machine01:  -- Forced UUID:       e0ab02d2-27cd-5a5e-8efc-080ba80cf258
==> machine01:  -- Domain type:       kvm
==> machine02:  -- Cpus:              1
==> machine02:  -- Feature:           acpi
==> machine01:  -- Cpus:              1
==> machine02:  -- Feature:           apic
==> machine02:  -- Feature:           pae
==> machine01:  -- Feature:           acpi
==> machine01:  -- Feature:           apic
==> machine02:  -- Memory:            1536M
==> machine02:  -- Management MAC:    
==> machine01:  -- Feature:           pae
==> machine02:  -- Loader:            /usr/share/OVMF/OVMF_CODE.fd
==> machine02:  -- Nvram:             
==> machine01:  -- Memory:            1536M
==> machine01:  -- Management MAC:    
==> machine02:  -- Storage pool:      default
==> machine01:  -- Loader:            /usr/share/OVMF/OVMF_CODE.fd
==> machine01:  -- Nvram:             
==> machine02:  -- Image:              (G)
==> machine01:  -- Storage pool:      default
==> machine01:  -- Image:              (G)
==> machine02:  -- Volume Cache:      default
==> machine02:  -- Kernel:            
==> machine01:  -- Volume Cache:      default
==> machine02:  -- Initrd:            
==> machine01:  -- Kernel:            
==> machine02:  -- Graphics Type:     vnc
==> machine02:  -- Graphics Port:     -1
==> machine01:  -- Initrd:            
==> machine01:  -- Graphics Type:     vnc
==> machine02:  -- Graphics IP:       127.0.0.1
==> machine01:  -- Graphics Port:     -1
==> machine02:  -- Graphics Password: Not defined
==> machine01:  -- Graphics IP:       127.0.0.1
==> machine02:  -- Video Type:        cirrus
==> machine01:  -- Graphics Password: Not defined
==> machine01:  -- Video Type:        cirrus
==> machine02:  -- Video VRAM:        9216
==> machine01:  -- Video VRAM:        9216
==> machine02:  -- Sound Type:	
==> machine01:  -- Sound Type:	
==> machine02:  -- Keymap:            de
==> machine01:  -- Keymap:            de
==> machine02:  -- TPM Path:          
==> machine01:  -- TPM Path:          
==> machine02:  -- Boot device:        network
==> machine01:  -- Boot device:        network
==> machine02:  -- Boot device:        hd
==> machine02:  -- Disks:         sda(qcow2,6000M)
==> machine02:  -- Disk(sda):     /var/lib/libvirt/images/metalmachine02-sda.qcow2
==> machine01:  -- Boot device:        hd
==> machine01:  -- Disks:         sda(qcow2,6000M)
==> machine02:  -- INPUT:             type=mouse, bus=ps2
==> machine02:  -- RNG device model:  random
==> machine01:  -- Disk(sda):     /var/lib/libvirt/images/metalmachine01-sda.qcow2
==> machine01:  -- INPUT:             type=mouse, bus=ps2
==> machine01:  -- RNG device model:  random
==> machine02: Starting domain.
==> machine01: Starting domain.

After waiting for some time, vagrant global-status returns:

id       name      provider state   directory                            
-------------------------------------------------------------------------
4da85f4  leaf01    libvirt running /home/greesha/Data/Projects/mini-lab 
45d4ab1  leaf02    libvirt running /home/greesha/Data/Projects/mini-lab 
12f0ebf  machine02 libvirt running /home/greesha/Data/Projects/mini-lab 
1d95c76  machine01 libvirt running /home/greesha/Data/Projects/mini-lab 

So the machines and switches are running, but docker-compose run metalctl machine ls returns an empty list of machines. I would appreciate any help with this.

Probably cloud-hypervisor is more modern than qemu and is more suitable for our job

In particular, configuring the network is much easier than with qemu: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/networking.md

A simple API is also available:

https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/api.md

curl --unix-socket /tmp/cloud-hypervisor.sock -i \
     -X PUT 'http://localhost/api/v1/vm.create'  \
     -H 'Accept: application/json'               \
     -H 'Content-Type: application/json'         \
     -d '{
         "cpus":{"boot_vcpus": 4, "max_vcpus": 4},
         "kernel":{"path":"/opt/clh/kernel/vmlinux-virtio-fs-virtio-iommu"},
         "cmdline":{"args":"console=ttyS0 console=hvc0 root=/dev/vda1 rw"},
         "disks":[{"path":"/opt/clh/images/focal-server-cloudimg-amd64.raw"}],
         "rng":{"src":"/dev/urandom"},
         "net":[{"ip":"192.168.10.10", "mask":"255.255.255.0", "mac":"12:34:56:78:90:01"}]
         }'

Consider deployment flavor consisting of a more complete switch plane

For integration testing purposes it could be interesting to add another deployment flavor to the mini-lab where we also spin up VMs for spines, exits, mgmt-servers, etc.

This can probably not be called "mini" anymore, but there is a need for something like this. It is useful for integration testing more sophisticated network scenarios, and it accelerates moving the partition deployment roles to metal-roles so that adopters can use them for bootstrapping their partitions.

Try IPMI simulator (virtual BMC) again

This would give us much higher test coverage, as ipmi_sim from OpenIPMI also seems to be pretty much feature-complete.

First steps for trying it out would be:

  • Use QEMU flags like
     -device ipmi-bmc-sim,id=bmc0
     -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 
     -device ipmi-bmc-extern,id=bmc1,chardev=ipmi0 
     -device isa-ipmi-kcs,bmc=bmc1
    
  • Connect ipmi_sim to this device
  • Try commands with ipmitool
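
For the last two steps, a sketch of what this could look like (the LAN address, port and credentials, and the assumption that ipmi_sim is configured with a LAN interface, are placeholders):

ipmitool -I lanplus -H 127.0.0.1 -p 623 -U admin -P admin chassis power status
ipmitool -I lanplus -H 127.0.0.1 -p 623 -U admin -P admin chassis bootdev pxe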

If this works out, we can think about where to deploy the metal-bmc in order to connect the system to the metal-stack. With this, we could start integration tests for go-hal and also refactor go-hal such that we have a working default implementation of the IPMI protocol (wider hardware support).


make fails with error: unknown command "/bin/sh" for "yq"

Steps to reproduce:

  • remove all local images of "yq"
  • make

Result:

Status: Downloaded newer image for mikefarah/yq:latest
Error: unknown command "/bin/sh" for "yq"
Run 'yq --help' for usage.
make: *** [Makefile:131: env] Error 1

Cause:
env.sh does a docker run of "mikefarah/yq", which pulls the latest image. For some time now, the latest image has been version 4.x, which incorporates changes that break compatibility, see https://mikefarah.gitbook.io/yq/upgrading-from-v3:

  • the entrypoint is now /usr/bin/yq, which is why the error above occurs
  • in v3, yq had separate commands for reading/writing/deleting and more; in v4, all of these have been merged into a single expression syntax

So we should simply use mikefarah/yq:3
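
A sketch of how a pinned invocation could look (the file name and path are placeholders, not the actual expression from env.sh; v3 uses the read/r command syntax and its image does not set yq as entrypoint):

docker run --rm -i mikefarah/yq:3 yq r - 'some.path' < release.yaml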
