tinkerbell / osie

An in-memory installation environment for bare metal.

Home Page: https://tinkerbell.org

License: Apache License 2.0

Makefile 3.32% Dockerfile 3.85% Shell 80.20% Python 11.67% Nix 0.22% SaltStack 0.20% Go 0.03% Jinja 0.50%
tinkerbell

osie's Introduction

OSIE

OSIE is the Operating System Installation Environment. It consists of an Alpine Linux based netboot image which fetches a prebuilt Ubuntu 16.04 container that does the actual installation. All of the above is built from this repository using GNU Make.

Deprecation

OSIE has been deprecated in favor of the Hook project. Here is the deprecation schedule:

  • September 28th, 2021: Announcement published
  • November 30th, 2021: Tree closes for feature changes
  • December 30th, 2021: Repository is archived (read-only)

For more details, see the OSIE deprecation proposal.

Cloning OSIE

OSIE uses git-lfs for large files that are part of the build process. If you clone this repo without git-lfs installed and set up, your builds will fail.

Install git-lfs per instructions at https://git-lfs.github.com/ and make sure to run git lfs install afterwards to set it up in your ~/.gitconfig.
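When a repository is cloned without git-lfs set up, the large files are left as small text pointer stubs whose first line names the LFS spec. A minimal sketch for spotting such stubs before a build follows; the function name is_lfs_pointer is hypothetical and not part of the OSIE build.

```shell
#!/bin/sh
# is_lfs_pointer FILE: succeed when FILE looks like an unfetched git-lfs
# pointer stub (its first line is the LFS spec version line).
# Hypothetical helper for illustration, not part of the OSIE Makefile.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    head -n 1 -- "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

If a file under the checkout is reported as a pointer, running git lfs install followed by git lfs pull should replace it with the real content.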

Building OSIE

Ubuntu Based Container

The OSIE Ubuntu-based container is built with Docker for both aarch64 and x86_64. Some packages are rebuilt with different settings (git, using openssl), and some updated upstream sources are built and installed. The containers can be built individually with make build/osie-aarch64.tar.gz or make build/osie-x86_64.tar.gz.

Alpine Based Netboot Image

The OSIE Alpine boot files are built in an Alpine Docker container. All the packages are built at container build time, including the kernel. The built/installed packages are later used at run time to generate initramfs and modloop files.

Note: Skipping Alpine Kernel Builds

Building the Alpine Linux Kernel takes a long time, on account of building just about all of the kernel modules. This is usually not needed as we don't mess with the kernel configuration very often. Unfortunately, make will try to build the kernel unless certain steps are taken (usually only on initial git clone). Skipping these builds can be done by running the installer/alpine/skip-building-alpine-files script, which updates the modified timestamp of the source files so make will not try to rebuild.
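For background, make rebuilds a target when it is missing or older than one of its prerequisites, which is why adjusting modification times can suppress a rebuild. The sketch below illustrates that general rule only; needs_rebuild and the file arguments are hypothetical, and it does not reproduce what skip-building-alpine-files actually touches.

```shell
#!/bin/sh
# needs_rebuild TARGET PREREQ: succeed when a make-style timestamp check
# would consider TARGET out of date with respect to PREREQ.
# Hypothetical helper for illustration only.
needs_rebuild() {
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

# Example (hypothetical paths): after a fresh clone, a prebuilt file may look
# older than its inputs. Touching it gives it the current time, so the check
# no longer reports it as stale.
#   needs_rebuild prebuilt/vmlinuz kernel.config && touch prebuilt/vmlinuz
```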

Build Dependencies

The build dependencies are defined in Makefile and rules.mk.j2, which are the source of truth. The packages listed in shell.nix are a good secondary source. Using nix-shell or lorri along with direnv is highly recommended.

Otherwise, ensure the following tools are installed:

  • bash
  • curl
  • cpio
  • docker
  • git
  • git-lfs
  • gnumake
  • gnused
  • j2cli (for j2)
  • libarchive (for bsdcpio, bsdtar)
  • minio (for mc)
  • pigz (and unpigz)
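A quick way to check the list above is to look each command up on PATH. This is a minimal sketch; missing_tools is a hypothetical helper, and the command names in the usage line assume that gnumake and gnused provide make and sed, and that the other packages provide the commands named in parentheses.

```shell
#!/bin/sh
# missing_tools NAME...: print each command name that is not found on PATH.
# Hypothetical helper for illustration, not part of the OSIE repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (command names are assumptions derived from the package list):
#   missing_tools bash curl cpio docker git git-lfs make sed j2 \
#       bsdcpio bsdtar mc pigz unpigz
```

An empty result means every listed command was found; it says nothing about versions.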

The OSIE build uses Docker's --squash functionality, which is currently locked behind an experimental feature flag. To enable experimental features in Docker, place the following JSON in /etc/docker/daemon.json and $HOME/.docker/config.json:

{
    "experimental": "true"
}

Developing Locally

The quickest way to start running OSIE locally is:

make OSES=ubuntu_20_04 V=1 T=1 test-x86_64

Adding Alpine Packages To initramfs

The Alpine x86_64 initramfs image is fully self-contained. We embed the .apk files and repo metadata into the initramfs for all packages used as part of /init. Alpine packages should be installed in installer/alpine/Dockerfile like so:

RUN apk add --no-scripts --update --upgrade --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing kexec-tools

Those package names then need to be added to installer/alpine/init-x86_64 in this list:

KOPT_pkgs="curl,docker,jq,mdadm,openssh,kexec-tools"
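Because KOPT_pkgs is a plain comma-separated string, adding a package amounts to appending a name that is not already present. A small sketch of that check follows; add_pkg is a hypothetical helper and is not part of init-x86_64.

```shell
#!/bin/sh
# add_pkg LIST PKG: print the comma-separated LIST with PKG appended,
# unless PKG is already one of its entries.
# Hypothetical helper for illustration only.
add_pkg() {
    case ",$1," in
        *",$2,"*) printf '%s\n' "$1" ;;          # already listed
        *)        printf '%s\n' "${1:+$1,}$2" ;; # append, no leading comma
    esac
}

# Usage:
#   KOPT_pkgs=$(add_pkg "$KOPT_pkgs" kexec-tools)
```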

If you need to install packages from a non-standard Alpine repo, the URL must also be listed in installer/alpine/init-x86_64 like so:

ALPINE_REPO="http://dl-cdn.alpinelinux.org/alpine/v3.7/main,http://dl-cdn.alpinelinux.org/alpine/v3.7/community,http://dl-cdn.alpinelinux.org/alpine/edge/testing"
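To see how the comma-separated ALPINE_REPO value lines up with the --repository flag used in the Dockerfile, the list can be expanded into one flag per URL. This is only a sketch: repo_flags is a hypothetical helper, and it assumes apk accepts --repository more than once on a command line.

```shell
#!/bin/sh
# repo_flags LIST: print one "--repository URL" line for each entry of the
# comma-separated LIST. Prints nothing for an empty list.
# Hypothetical helper for illustration only.
repo_flags() {
    [ -n "$1" ] || return 0
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^/--repository /'
}

# Usage:
#   repo_flags "$ALPINE_REPO"
```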

Website

For complete documentation, please visit the Tinkerbell project hosted at tinkerbell.org.

osie's People

Contributors

andy-v-h, dependabot[bot], detiber, dlaube, dustinmiller, gauravgahlot, grahamc, invidian, jacobweinstock, joelrebel, maxpeal, mergify[bot], mikemrm, mmlb, mrmrcoleman, nathangoulding, nshalman, parauliya, rainleander, raj-dharwadkar, scott4000, scottgarman, sfunkhouser, splaspood, thebsdbox, tobert, tstromberg

osie's Issues

conditional syntax not supported in future alpines

The syntax ?// from installer/osie-installer.sh is not supported in jq on Alpine 3.11, and results in the error:

+ jq -S '. + {"password_hash":"'"$pwhash"'", "state": (.state?//"'"$state"'")}' <"$metadata" >"$metadata.tmp"
jq: error: syntax error, unexpected ?// (Unix shell quoting issues?) at <top-level>, line 1:
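One possible workaround, sketched here under assumptions rather than taken from the repository: newer jq releases treat ?// as a single operator token, so separating ? from // with a space avoids the parse error. Passing the values with --arg also removes the nested shell quoting. The function name merge_metadata is hypothetical.

```shell
#!/bin/sh
# merge_metadata PWHASH STATE: read a JSON object on stdin, add
# password_hash, and set state to STATE only when the input has no
# non-null, non-false state of its own. Writes sorted JSON to stdout.
# Hypothetical helper; note the space between "?" and "//".
merge_metadata() {
    jq -S --arg pwhash "$1" --arg state "$2" \
        '. + {password_hash: $pwhash, state: (.state? // $state)}'
}

# Usage (variables as in the failing command above):
#   merge_metadata "$pwhash" "$state" <"$metadata" >"$metadata.tmp"
```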

Debug aarch64 vm based power-on/phone-home tests

The qemu-system-aarch64 user-mode emulation based boot/phone-home tests from #227 don't work. There's no serial output, so it's hard to debug at the moment. Is the test giving up too early? Is something done wrong with qemu? Maybe the aarch64 UEFI can't boot from virtio disks. Need to try the VNC graphical output; maybe there's data there.

Expected Behaviour

test_boot_and_phone_home works for aarch64 VMs

Current Behaviour

test_boot_and_phone_home always fails

Replace osie-build-env with shell.nix

osie-build-env is currently used to set up both a developer/build environment and an environment suitable for running the osie tests. This is done using docker. Instead of (ab)using docker for both scenarios, we should use nix and nix-shell. With nix-shell we get better support for pinning/choosing tool versions and parity with other tinkerbell/ services dev setups.

The test environment should be completely redone using the actual services (either with docker-compose.yml or as drone services). This was much harder back in the day, pre-cacher, but should be fairly easy now.

Unable to create ZFS file system - unsupported by kernel

ZFS file system creation is not supported in the current Alpine version.

Expected Behaviour

ZFS support is needed. It works with kernel 5.4.72; the current version is 5.4.52.

Current Behaviour

Unable to create a ZFS file system on disk instead of ext4.

Possible Solution

Alpine needs to be upgraded to kernel version 5.4.72, after which zfs-lts can be installed.

Steps to Reproduce (for bugs)

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Linux
  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Vagrant and Libvirt
  • Link to your project or a code example to reproduce issue:

Send the worker logs to the boots syslog server?

#51 dropped support for forwarding the logs to a central place but left the worker without any way to troubleshoot the actions (and since the containers are deleted, we cannot use docker logs).

We should have a way to see the actions/worker logs somewhere, maybe forward them to the boots syslog server?

This is somewhat related to #48, but saving the logs to a local file is not enough because when the worker reboots we lose the logs.

osie-runner triggering SIGSEGV and being terminated

We've seen some machines where preinstallation has completed and osie-runner is sitting connected to hegel over grpc where it suddenly terminates with exit code 139:

localhost:~# docker ps -a
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS                     PORTS               NAMES
bcc4cb56c565        osie-runner:x86_64   "/entrypoint.sh pyth…"   5 days ago          Exited (139) 7 hours ago                       quizzical_varahamihira
localhost:~# docker logs quizzical_varahamihira --tail 3
}
2021-08-04T10:40:45.952178Z [info     ] no handler for state           [runner] state=provisionable
2021-08-04T10:40:45.952249Z [info     ] about to monitor               [runner]

Expected Behaviour

osie-runner keeps running indefinitely until it receives new information from hegel

Current Behaviour

osie-runner exits unexpectedly with exit code 139
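For reference, an exit status above 128 conventionally means the process was killed by signal (status - 128), so 139 corresponds to signal 11, SIGSEGV. A minimal sketch that decodes a status this way follows; describe_exit is a hypothetical helper.

```shell
#!/bin/sh
# describe_exit STATUS: explain an exit status, decoding the 128+N
# convention used for processes terminated by signal N.
# Hypothetical helper for illustration only.
describe_exit() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # kill -l N prints the name of signal number N
        printf 'killed by signal %d (%s)\n' "$sig" "$(kill -l "$sig")"
    else
        printf 'exited with status %d\n' "$code"
    fi
}

# Usage:
#   describe_exit 139
```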

Possible Solution

Update to the latest grpc in hopes that some bug in the core library has been fixed.

Steps to Reproduce (for bugs)

Context

Your Environment

Equinix Metal Production

Containerd package gets corrupted sometimes

Expected Behaviour

Provisioning OSIE works every time.

Current Behaviour

Currently, booting a worker with OSIE sometimes ends with no workflow being executed. Upon investigation, I found that the cached containerd package in OSIE is corrupted and apk add is not able to install it, which results in no Docker and no workflows being executed.

@gianarb says it's only happening sometimes.

Steps to Reproduce (for bugs)

  1. Provision sandbox repository several times checking if workflow runs each time.

Context

Discovered while working on tinkerbell/cluster-api-provider-tinkerbell#17

Your Environment

Used OSIE version https://tinkerbell-oss.s3.amazonaws.com/osie-uploads/osie-v0-n=404,c=c35a5f8,b=master.tar.gz

Running in sandbox environment using tinkerbell/playground@be40a7b.

workflow-helper script should not block TTY from spawning

It seems that right now, if the workflow-helper script hits an infinite loop trying to log in to the docker registry, there is no way to debug it, and no logs are written to the TTY either.

If you look at the serial console output, then the login prompt does not show up until workflow-helper script finishes execution.
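One way to keep a login step from looping forever is to bound the number of attempts. The following is a sketch of that idea only, not the script's current behaviour; retry is a hypothetical helper and the docker login arguments in the usage line are placeholders.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between tries. Returns 0 on the
# first success and 1 once the attempts are exhausted.
# Hypothetical helper for illustration only.
retry() {
    attempts=$1 delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (placeholder arguments):
#   retry 5 2 docker login -u "$registry_username" -p "$registry_password" "$docker_registry" \
#       || echo "registry login failed, continuing so a login prompt can appear" >&2
```

Giving up after a fixed number of tries lets the rest of boot continue, at the cost of the workflow not starting until the problem is fixed by hand.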

Change CI away from drone.packet.net

drone.packet.net is going to be shut down in the not very distant future. It's running old code (v0.8, EoL for many years now), is being "managed" by a team of one (me) as fires pop up, and the hardware it's running on is old/crufty and about to be retired.

There are 2 options I'm considering

  1. Transition to GH Actions using our tinkerbell org self-hosted runners to build and test osie (we need the ability to run VMs for the tests).
  2. Transition to Buildkite using a bare metal runner hosted by Equinix.

I'm leaning towards option 2 as a PoC for transitioning all the tinkerbell repos over to Buildkite instead of the self-hosted GHA. I've been planning a PoC that I could use as an example for a formal proposal and was going to do it with boots, but the drone.packet.net shutdown presents a good opportunity.

Write a detailed document about connections and requirement to write your own in mem os

It will be nice to get a detailed document about how to build your in-memory operating system (initrd), what it has to run (for example docker, and start tink-worker).

I am asking for mainly two reasons:

  1. Osie should be only one possible implementation. Companies can have their own OS, and it would be nice to document what they need to write their own.
  2. A lot of the pull requests I see landing these days in Osie (#102, #101, #98, #93) look related to "Packet needs."

My secret hope is to get a minimum "Osie" implementation as part of the Tinkerbell organization, leaving more specialized ones to the end-user; ideally, Osie, as we know it today, can belong to PacketHost.

Something cool @thebsdbox did https://github.com/plunder-app/BOOTy about this topic

Probably this is related to tinkerbell/tink#136
And in some way to #2

CI is broken

github.com/mholt/caddy used to redirect to caddyserver/caddy but now is 404'ing. This is breaking our CI. Going to update to something stable.

Docker images fail to download with 'no space left on device' in OSIE

When trying out the sandbox and using a different workflow than hello world as in the example (I tried https://github.com/alexellis/tinkerbell-ubuntu) the docker images fail to download. I'm posting this here and not in sandbox, because I assume this is rather an issue with the default OSIE (?)

Expected Behaviour

OSIE environment is able to download necessary Docker images

Current Behaviour

download of Docker images fails due to 'no space left on device'

Possible Solution

Steps to Reproduce (for bugs)

  1. Download and run sandbox
  2. Create workflow for https://github.com/alexellis/tinkerbell-ubuntu
  3. run the workflow against your hardware
  4. tink worker fails with "no space left on device"

Context

I noticed the Ubuntu installation fails during the first step (wiping the disk), so I checked the docker logs of tink-worker on the machine I'm trying to set up. The logs say "no space left on device" when trying to extract the downloaded Docker image. I checked the volumes, and the only full volume was /dev/loop0 (/.modloop) with 200MB.
What am I missing?

Your Environment

Using the sandbox docker-compose setup on Proxmox VMs with Ubuntu 18.04 LTS as the OS

Fetching complete repo for single image fileset is inefficient

During server provisioning we see extra ~1 minute delays whilst ~1GB git data is downloaded from https://images.packet.net/packethost/packet-images.git

As this repo grows, we get more and more slow down (this is all before git checkout, thus excludes final image large file download LFS/caching).

Contributing to this:
a) This repo has a lot of stuff beyond the final images for servers booting (idk, maybe build tools?)
b) It fetches all branches (could be single ref)
c) It fetches all history (could be shallow)

Historically fetch by commit (uploadpack.allowReachableSHA1InWant) was not well supported - it is now (including GitHub, I believe), and a shallow single commit fetch is much quicker. (Deploy script could always try direct commit fetch, and fall back to all branches if git service doesn't support it).

I'm not sure of the exact OSIE script running at the moment, but I'm assuming it's close to:

git -C $assetdir fetch origin

Example (Run from Packet SYD2)

gituri=https://github.com/packethost/packet-images.git
image_tag=82dfba29f7aa462651c2e96521ed24bcad726330

#Existing fetch-all
time git -C $assetdir fetch origin
#Receiving objects: 100% (91877/91877), 889.05 MiB | 19.89 MiB/s, done.
#real 0m51.687s
#user 0m23.772s
#sys 0m5.012s

#imageid fetch
time git -C $assetdir fetch --depth 1 origin "${image_tag}"
# remote: Total 9 (delta 0), reused 6 (delta 0), pack-reused 0
#real 0m2.982s
#user 0m0.080s
#sys 0m0.012s

Ticket reference NYDE-2114-IUHD
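The fallback described above (try a direct commit fetch, then fall back to fetching everything) could be sketched as follows. This is an illustration, not the OSIE deploy script; fetch_image_commit is a hypothetical name, and the shallow fetch only succeeds when the server permits fetching that commit by SHA.

```shell
#!/bin/sh
# fetch_image_commit REPO_DIR SHA: try a shallow fetch of the single commit
# SHA from the "origin" remote of REPO_DIR. If the server refuses, fall back
# to a regular fetch of all branches.
# Hypothetical helper for illustration only.
fetch_image_commit() {
    repo=$1 sha=$2
    if git -C "$repo" fetch --depth 1 origin "$sha" 2>/dev/null; then
        return 0
    fi
    git -C "$repo" fetch origin
}

# Usage (variables as in the example above):
#   fetch_image_commit "$assetdir" "$image_tag"
```

Note that a successful --depth 1 fetch leaves the local repository shallow, which is fine for a checkout of one image commit but limits history operations.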

Tinkerbell Uniform Standards: Maintained Repository

Our repositories should be the example from which adjacent, competing, projects look for inspiration.

Each repository should not look entirely different from other repositories in the ecosystem, having a different layout, a different testing model, or a different logging model, for example, without reason or recommendation from the subject matter experts from the community.

We should share our improvements with each ecosystem while seeking and respecting the feedback of these communities.

Whether or not strict guidelines have been provided for the project type, our repositories should ensure that the same components are offered across the board. How these components are provided may vary, based on the conventions of the project type. GitHub provides general guidance on this which they have integrated into their user experience.

Expected Behaviour

We believe this repository is Maintained and therefore needs the following files updated:

If you feel the repository should be experimental or end of life or that you'll need assistance to update these files, please let us know by filing an issue with https://github.com/packethost/standards.

Current Behaviour

n/a

Possible Solution

n/a

Steps to Reproduce (for bugs)

n/a

Context

Packet maintains a number of public repositories that help customers to run various workloads on Packet. These repositories are in various states of completeness and quality, and being public, developers often find them and start using them. This creates problems:

  • Developers using low-quality repositories may infer that Packet generally provides a low quality experience.
  • Many of our repositories are put online with no formal communication with, or training for, customer success. This leads to a below average support experience when things do go wrong.
  • We spend a huge amount of time supporting users through various channels when with better upfront planning, documentation and testing much of this support work could be eliminated.

To that end, we propose three tiers of repositories: Private, Experimental, and Maintained.

As a resource and example of a maintained repository, we've created https://github.com/packethost/standards. This is also where you can file any requests for assistance or modification of scope.

Your Environment

https://github.com/tinkerbell/

OSIE image should use multi stage builds

We build a few packages from source. Some of them may go away with #128, but some are likely to still be needed. Using multi-stage builds would allow making better use of layer caching and would likely help with the final image size (#2).

reduce osie size

We should remove unnecessary packages, perhaps package x86_64/aarch64 architectures independently, and generally work to slim down OSIE substantially.

Uniform Standards: Experimental Repository

Hello!

We believe this repository is Experimental and therefore needs the following files updated:

If you feel the repository should be maintained or end of life or that you'll need assistance to create these files, please let us know by filing an issue with https://github.com/packethost/standards.

Packet maintains a number of public repositories that help customers to run various workloads on Packet. These repositories are in various states of completeness and quality, and being public, developers often find them and start using them. This creates problems:

  • Developers using low-quality repositories may infer that Packet generally provides a low quality experience.
  • Many of our repositories are put online with no formal communication with, or training for, customer success. This leads to a below average support experience when things do go wrong.
  • We spend a huge amount of time supporting users through various channels when with better upfront planning, documentation and testing much of this support work could be eliminated.

To that end, we propose three tiers of repositories: Private, Experimental, and Maintained.

As a resource and example of a maintained repository, we've created https://github.com/packethost/standards. This is also where you can file any requests for assistance or modification of scope.

The Goal

Our repositories should be the example from which adjacent, competing, projects look for inspiration.

Each repository should not look entirely different from other repositories in the ecosystem, having a different layout, a different testing model, or a different logging model, for example, without reason or recommendation from the subject matter experts from the community.

We should share our improvements with each ecosystem while seeking and respecting the feedback of these communities.

Whether or not strict guidelines have been provided for the project type, our repositories should ensure that the same components are offered across the board. How these components are provided may vary, based on the conventions of the project type. GitHub provides general guidance on this which they have integrated into their user experience.

osie.sh based installs fail to boot

Installations done through osie.sh (legacy stuff) fail to boot since it uses Ubuntu's grub not the grub embedded in the image. Ubuntu's grub has been updated to mitigate against BootHole round 2 issues. Ubuntu's grub has a hard coded path to the grub.cfg which is not where the default grub-install puts it and thus we fail to boot on powerup.

Expected Behaviour

Legacy installations have grub properly installed, able to find the configuration file and thus boot from disk.

Current Behaviour

Grub can't find its configuration file, and the boot hangs.

Possible Solution

Going to re-work grub installation to install via chroot, using the image's grub.

workflow-helper.sh doesn't execute 2nd docker run

Hey there,

It looks like there is some sort of race condition in workflow-helper.sh. When workflow-helper.sh is executed as part of init.rd the following command from the mentioned script is not executed:

https://github.com/tinkerbell/osie/blob/master/installer/workflow-helper.sh#L69-L81

docker run --privileged -ti \
	-e "container_uuid=$id" \
	-e "WORKER_ID=$worker_id" \
	-e "DOCKER_REGISTRY=$docker_registry" \
	-e "TINKERBELL_GRPC_AUTHORITY=$grpc_authority" \
	-e "TINKERBELL_CERT_URL=$grpc_cert_url" \
	-e "REGISTRY_USERNAME=$registry_username" \
	-e "REGISTRY_PASSWORD=$registry_password" \
	-v /worker:/worker \
	-v /var/run/docker.sock:/var/run/docker.sock \
	--log-driver=fluentd -t \
	--net host \
	"$docker_registry/tink-worker:latest"

Re-running workflow-helper.sh manually succeeds and executes without problems.

Update OSIE container image to latest Ubuntu LTS

OSIE uses Xenial (16.04) as its base image, yet we build/install newer packages from source. Updating to 20.04 might avoid the need to do so, which should shorten the time taken to build the container images.

18.04 was tried once, pre-open-sourcing and didn't work out for some reason I can't recall :/.
