

alexlarsson commented on August 22, 2024

In the automotive world we often think of containers as one of two things: either they come with the system and are updated atomically with it, or they are installed separately. The way we expect this to work is for the system ones to be installed in a separate image store that is part of the ostree image, while the "regular" containers are stored in /var/lib/containers.

The automotive sig manifests ship a storage.conf that has:

[storage.options]
additionalimagestores = [ "/usr/share/containers/storage" ]

Then we install containers in the image with osbuild like:

      - type: org.osbuild.skopeo
        inputs:
          images:
            type: org.osbuild.containers
            origin: org.osbuild.source
            mpp-resolve-images:
              images:
                - source: registry.gitlab.com/centos/automotive/sample-images/demo/auto-apps
                  tag: latest
                  name: localhost/auto-apps
        options:
          destination:
            type: containers-storage
            storage-path: /usr/share/containers/storage

from bootc.

alexlarsson commented on August 22, 2024

> > Then we install containers in the image with osbuild like:
>
> So IMO this issue is exactly about having bootc install and bootc update handle these images. Because as is today, needing to duplicate the app images in an osbuild manifest is...unfortunate. With this proposal, when osbuild is making a disk image, it'd use bootc install internally to the pipeline, and we wouldn't need to re-specify the child container images out of band of the "source of truth" of the parent image.

I understand that, and I merely pointed out how we currently do it in automotive, not how it would be done with bootc.

Instead, what I propose is essentially:

Dockerfile:

FROM bootc-base
RUN podman --root /usr/lib/containers/my-app pull quay.io/my/app
ADD my-app.container /etc/containers/systemd/

my-app.container:

[Container]
Image=quay.io/my/app
PodManArgs=--storage-opt=overlay.additionalimagestore=/usr/lib/containers/my-app

And then you have an osbuild manifest that just deploys the above image like any normal image.

Of course, instead of open-coding the commands like this, a tool could do the right thing automatically.
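As a rough idea of what such a tool could generate, here is a sketch of a hypothetical helper (`BoundApp` is not an existing tool) that emits both pieces of the proposal for one app: the Containerfile lines that pull into a per-app store and the matching quadlet. The `/usr/lib/containers/<name>` layout and the `overlay.additionalimagestore` option are copied from the example above; the quadlet uses `GlobalArgs` rather than `PodManArgs`, following the note later in this thread that `--storage-opt` is a global podman option.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BoundApp:
    """Hypothetical description of one app image embedded in the bootc image."""
    name: str    # e.g. "my-app"
    image: str   # e.g. "quay.io/my/app"
    store_parent: str = "/usr/lib/containers"  # layout from the example above

    @property
    def store(self) -> str:
        # One additional image store per app, as in the proposal.
        return f"{self.store_parent}/{self.name}"

    def containerfile_lines(self) -> List[str]:
        return [
            f"RUN podman --root {self.store} pull {self.image}",
            f"ADD {self.name}.container /etc/containers/systemd/",
        ]

    def quadlet(self) -> str:
        # --storage-opt is a global podman option, hence GlobalArgs.
        return "\n".join([
            "[Container]",
            f"Image={self.image}",
            f"GlobalArgs=--storage-opt=overlay.additionalimagestore={self.store}",
            "",
        ])

if __name__ == "__main__":
    app = BoundApp("my-app", "quay.io/my/app")
    print("\n".join(app.containerfile_lines()))
    print(app.quadlet())
```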

You might also want the tool to tweak the image name in the quadlet to contain the actual digest so we know that the exact right image version is used every time.
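A minimal sketch of that digest-pinning step, assuming the tool already knows the digest of the image it pulled (for example from the `Digest` field that `skopeo inspect` reports). `pin_image_digest` is a hypothetical helper: it rewrites every `Image=` line, drops any existing tag or digest while keeping a registry port, and does not try to understand unit sections.

```python
import re

DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")

def pin_image_digest(quadlet_text: str, digest: str) -> str:
    """Rewrite Image=<ref> lines to Image=<repo>@<digest>."""
    if not DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    out = []
    for line in quadlet_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Image":
            ref = value.strip().split("@", 1)[0]  # drop an existing digest
            last = ref.rsplit("/", 1)[-1]
            if ":" in last:
                # Drop the tag; a ":" in an earlier component is a registry port.
                ref = ref[: len(ref) - len(last)] + last.split(":", 1)[0]
            line = f"Image={ref}@{digest}"
        out.append(line)
    return "\n".join(out) + ("\n" if quadlet_text.endswith("\n") else "")
```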


alexlarsson commented on August 22, 2024

This was part of the driver for needing composefs to be able to contain overlayfs base dirs (overlay nesting), although that is less important if containers/storage also uses composefs.


rhatdan commented on August 22, 2024

I love the idea of additional stores for this.


vrothberg commented on August 22, 2024

Quadlet supports .image files now which can be directly referenced in .container files. Maybe that's a way to achieve a similar effect.

The .image files don't yet (easily) allow for pulling into an additional store, but this could be a useful feature.

Cc: @ygalblum


cgwalters commented on August 22, 2024

> Then we install containers in the image with osbuild like:

So IMO this issue is exactly about having bootc install and bootc update handle these images. Because as is today, needing to duplicate the app images in an osbuild manifest is...unfortunate. With this proposal, when osbuild is making a disk image, it'd use bootc install internally to the pipeline, and we wouldn't need to re-specify the child container images out of band of the "source of truth" of the parent image.


alexlarsson commented on August 22, 2024

It's also interesting to reflect on the composefs efficiency in a setup like this.

If we use composefs for the final ostree image, we will get perfect content sharing, even if each of the individual additional image stores uses its own composefs objects dir, and even if no effort is made to share object files between image store directories, because all the files will eventually be deduplicated as part of the full ostree composefs image.

In fact, we will even deduplicate files between image stores that use the traditional overlayfs or vfs container store formats.
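To make the deduplication argument concrete, here is a toy model: each additional image store is a mapping of relative path to file contents, and the outer image stores each distinct content exactly once. SHA-256 of the contents stands in for the real object naming, and the store paths and file contents are hypothetical. It ignores file metadata, which a real content-addressed store may also fold into the object key, so identical bytes with different ownership or modes might not collapse in practice.

```python
import hashlib
from typing import Mapping, Tuple

def dedup_stats(stores: Mapping[str, Mapping[str, bytes]]) -> Tuple[int, int]:
    """Return (file count, unique content objects) across all image stores."""
    objects = set()
    total = 0
    for files in stores.values():
        for data in files.values():
            total += 1
            # Content-addressed key: identical bytes map to one object.
            objects.add(hashlib.sha256(data).hexdigest())
    return total, len(objects)

# Hypothetical: one overlay-format store and one vfs-format store,
# both carrying the same libc bytes.
stores = {
    "/usr/lib/containers/app-a/overlay": {"usr/lib64/libc.so.6": b"libc", "app/a": b"A"},
    "/usr/lib/containers/app-b/vfs": {"usr/lib64/libc.so.6": b"libc", "app/b": b"B"},
}
print(dedup_stats(stores))  # (4, 3)
```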


alexlarsson commented on August 22, 2024

In fact, maybe using the vfs backend is the right approach here? It is a highly stable on-disk format, and it's going to be very efficient to start such a container. And we can ignore all the storage inefficiencies, because they are taken care of by the outer composefs image.


ygalblum commented on August 22, 2024

> my-app.container:
>
> [Container]
> Image=quay.io/my/app
> PodManArgs=--storage-opt=overlay.additionalimagestore=/usr/lib/containers/my-app

Just wanted to note that --storage-opt is a global argument. So, the key to use is GlobalArgs instead of PodmanArgs.


alexlarsson commented on August 22, 2024

I wonder if we should tweak the base images to have a standardized /usr location for additional image store images.


rhatdan commented on August 22, 2024

/usr/lib/containers/storage?


alexlarsson commented on August 22, 2024

@rhatdan Yeah, that sounds good to me. Can we perhaps just always add it to our /usr/share/containers/storage.conf file?


rhatdan commented on August 22, 2024

You want that in the default storage.conf in containers/storage?


rhatdan commented on August 22, 2024

If you set up an empty additional store, you need to precreate the directories and lock files. This is what we are doing to set up an empty additional store. We should fix this in containers/storage to create these files and directories if they do not exist.

RUN mkdir -p /var/lib/shared/overlay-images \
             /var/lib/shared/overlay-layers \
             /var/lib/shared/vfs-images \
             /var/lib/shared/vfs-layers && \
    touch /var/lib/shared/overlay-images/images.lock && \
    touch /var/lib/shared/overlay-layers/layers.lock && \
    touch /var/lib/shared/vfs-images/images.lock && \
    touch /var/lib/shared/vfs-layers/layers.lock
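The same layout can be produced from a build script. This Python sketch mirrors the RUN command above (the `<driver>-images/images.lock` and `<driver>-layers/layers.lock` names are taken from it), with the store root passed in; `precreate_additional_store` is a hypothetical helper. Whether pre-creating these files is enough for a store on a read-only filesystem is not established by this sketch.

```python
from pathlib import Path
from typing import Iterable, List

def precreate_additional_store(root: Path,
                               drivers: Iterable[str] = ("overlay", "vfs")) -> List[Path]:
    """Create <driver>-images/images.lock and <driver>-layers/layers.lock under root."""
    created = []
    for driver in drivers:
        for kind in ("images", "layers"):
            directory = root / f"{driver}-{kind}"
            directory.mkdir(parents=True, exist_ok=True)  # like mkdir -p
            lock = directory / f"{kind}.lock"
            lock.touch(exist_ok=True)                      # like touch
            created.append(lock)
    return created

if __name__ == "__main__":
    for path in precreate_additional_store(Path("/var/lib/shared")):
        print(path)
```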


alexlarsson commented on August 22, 2024

@rhatdan Would it maybe be possible instead to have containers/storage fail gracefully when the directory doesn't exist?


rhatdan commented on August 22, 2024

Yes, that is the way it should work. If I have time I will look at it. Basically, ignore the storage if it is empty.


rhatdan commented on August 22, 2024

Actually, I just tried it out: as long as the additional image store directory exists, the store seems to work. No need for those additional files and directories.


rhatdan commented on August 22, 2024

# cat /etc/containers/storage.conf
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
pull_options={enable_partial_images = "true", use_hard_links = "false", ostree_repos=""}
additionalimagestores = [
"/usr/lib/containers/storage",
]

The additional store directory is empty:

# ls -l /usr/lib/containers/storage/
total 0
# podman info
...

So podman will write to the empty directory and create the missing content:

# ls -lR /usr/lib/containers/storage/
/usr/lib/containers/storage/:
total 4
drwx------. 2 root root 4096 Nov 24 07:03 overlay-images

/usr/lib/containers/storage/overlay-images:
total 0
-rw-r--r--. 1 root root 0 Nov 24 07:03 images.lock

If the file system is read-only, it fails:

# podman info
Error: creating lock file directory: mkdir /usr/lib/containers/storage/overlay-images: read-only file system


alexlarsson commented on August 22, 2024

So, I've been thinking about the details around this for a while. In particular about the best storage for these additional image directories. The natural approach would be to use the overlay backend, as we can then use overlay mounts for the actual container, but this has some issues.

First of all, historically, ostree didn't support whiteout files. This has recently been fixed, although even that fix requires adding custom options to ostree. In addition, if ostree is using composefs, there are some issues with encoding both the whiteouts and the overlayfs xattrs in the image. These are solved by the overlay xattr escape support I have added in the most recent kernel, although we don't yet have that backported into the CS9 kernel.

However, I wonder if using overlay directories for the additional image dir is even the right approach. All the files in the additional image dir will be deduplicated by ostree anyway, so maybe it would be better to use an approach more like the vfs backend, where each layer is completely squashed (and we rely on the wrapping ostree to de-duplicate them). Such a layer would be faster to set up and use (since it is shallower), and it would fix all the issues regarding whiteouts and overlay xattrs.

I see two approaches for this:

  1. Use the overlay backend with the composefs format. This moves all the xattrs and whiteouts into the composefs image file, which will work fine in any ostree image.
  2. Teach the overlay containers/storage backend to squash individual layers, and then do this for all the images in the additional image store.
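Squashing boils down to applying an image's layers in order and keeping only the final tree, so no whiteouts survive into what ostree stores. This toy sketch models each layer as a mapping of path to bytes and applies the OCI image-spec whiteout rules: `.wh.<name>` removes `<name>` (and anything under it) from lower layers, and `.wh..wh..opq` hides everything lower layers had in that directory. Whiteouts only affect lower layers, so they are applied before the layer's own files. Permissions, xattrs, symlinks and hardlinks are ignored; it is not the containers/storage implementation.

```python
import posixpath
from typing import Dict, List

WH_PREFIX = ".wh."
OPAQUE = ".wh..wh..opq"

def squash(layers: List[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Flatten layers (lowest first) into a single tree without whiteouts."""
    tree: Dict[str, bytes] = {}
    for layer in layers:
        # 1. Whiteouts hide content from lower layers only.
        for path in layer:
            parent, base = posixpath.split(path)
            if base == OPAQUE:
                prefix = parent + "/" if parent else ""
                for p in [p for p in tree if p.startswith(prefix)]:
                    del tree[p]
            elif base.startswith(WH_PREFIX):
                target = posixpath.join(parent, base[len(WH_PREFIX):])
                for p in [p for p in tree if p == target or p.startswith(target + "/")]:
                    del tree[p]
        # 2. Regular files from this layer override lower content.
        for path, data in layer.items():
            if not posixpath.basename(path).startswith(WH_PREFIX):
                tree[path] = data
    return tree
```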

Opinions?


cgwalters commented on August 22, 2024

So there are two totally different approaches going on here (and the second approach has two sub-approaches):

Physically embed the app images in the base image

In this model, bootc upgrade and bootc rollback will also upgrade/rollback the system images "naturally", the same way as any other files. (There's a lot of discussion above about the interactions with whiteouts/composefs/etc. though)

From the UX point of view, a really key thing is there is one container image - keeping the problem domain of "versioning/mirroring" totally simple.

However...note that this model "squashes" all the layers in the app images into one layer in the base image, so on the network, if e.g. the base image used by an app changes, it will force a re-fetch of the entire app (all its layers), even if some of the app layers didn't change.

I think there's also the converse problem: unless we very carefully ensure that the podman pull or equivalent that generates the layer is fully reproducible (e.g. timestamps), any update to the base image will generate a different squashed app layer, which is also quite problematic (it forces new storage in the registry).
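A back-of-the-envelope model of the first problem: treat each image as a mapping of layer digest to size, and count what a client must download given the layers it already has. The layer names and sizes (in MB) are made up. When only the app's base layer changes, the separate app image costs just that layer, while the squashed layer inside the host image is a brand-new blob.

```python
from typing import Dict

Layers = Dict[str, int]  # placeholder layer digest -> size in MB

def to_fetch(have: Layers, want: Layers) -> int:
    """Size of the layers in `want` that are not already present locally."""
    return sum(size for digest, size in want.items() if digest not in have)

# Separate app image: only the app's changed base layer is new.
app_v1 = {"app-base-v1": 200, "app-deps": 300, "app-code": 20}
app_v2 = {"app-base-v2": 200, "app-deps": 300, "app-code": 20}

# Same app squashed into one layer of the host image: the whole layer is new.
host_v1 = {"host-os": 900, "apps-squashed-v1": 520}
host_v2 = {"host-os": 900, "apps-squashed-v2": 520}

print(to_fetch(app_v1, app_v2))    # 200
print(to_fetch(host_v1, host_v2))  # 520
```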

In other words, IMO this model breaks some of the advantages of the content-addressed storage in OCI by default. We'd need deltas to mitigate.

(For people using ostree-on-the-network for the host today, this is mitigated because ostree always behaves similarly to zstd:chunked and has static deltas; but I think we want to make this work with OCI)

Longer term though, IMO this approach clashes with the direction I think we need to take for e.g. configmaps - we really will need to get into the business of managing more than just one bootable container image, which leads to:

Reference the app images

A common advantage/disadvantage of the below is that the user must manage multiple container images for system installs - e.g. for a disconnected/offline install they must all be mirrored, not just one.

I am sure someone has already invented this, but I think we should support a "rollup" OCI artifact that (much like a manifest list) is just a pointer to a bunch of other container images. A bit like the OCP "release image" except not an executable itself.

Then tools like skopeo copy would know how to recurse into it and mirror all the sub-images, and bootc install could honor this image. bootc would learn about this too, so bootc upgrade would find all the things.
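One way to picture such a rollup is as an OCI image index (media type `application/vnd.oci.image.index.v1+json`), whose `manifests` entries are descriptors (`mediaType`, `digest`, `size`, optional `annotations`) and may themselves point at other indexes. This sketch builds a rollup over stand-in manifests (not complete image manifests) and recurses through it the way a mirroring tool would have to. The annotation key `org.example.rollup.source` and the `quay.io/example/base` reference are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Optional

INDEX_MT = "application/vnd.oci.image.index.v1+json"
MANIFEST_MT = "application/vnd.oci.image.manifest.v1+json"
REF_ANNOTATION = "org.example.rollup.source"  # hypothetical key

def descriptor(doc: dict, blobs: Dict[str, bytes], source: Optional[str] = None) -> dict:
    """Serialize doc, store it by digest in blobs, and return an OCI descriptor."""
    raw = json.dumps(doc, sort_keys=True).encode()
    digest = "sha256:" + hashlib.sha256(raw).hexdigest()
    blobs[digest] = raw
    desc = {"mediaType": doc["mediaType"], "digest": digest, "size": len(raw)}
    if source:
        desc["annotations"] = {REF_ANNOTATION: source}
    return desc

def reachable_manifests(desc: dict, blobs: Dict[str, bytes]) -> List[str]:
    """Recurse through nested indexes and return every image manifest digest."""
    if desc["mediaType"] != INDEX_MT:
        return [desc["digest"]]
    index = json.loads(blobs[desc["digest"]])
    out: List[str] = []
    for child in index["manifests"]:
        out.extend(reachable_manifests(child, blobs))
    return out

def stand_in_manifest(name: str) -> dict:
    # Placeholder only: real image manifests also carry config and layers.
    return {"schemaVersion": 2, "mediaType": MANIFEST_MT, "layers": [],
            "annotations": {"stand-in": name}}
```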

Loose binding

In this model, the app images would only be referenced from the base image as .image files.

We would teach bootc install (i.e. at disk write time) to support "pre-pulling" container images referenced by /usr/share/containers/systemd/*.image files in the tree (using the credentials embedded in the base image) - but physically the container images live in /var in the final installed filesystem.

(There's an interesting sub-question here of whether we do this by default for .image files we find)
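The discovery half of "pre-pulling" could look like this sketch: scan a quadlet directory such as /usr/share/containers/systemd for `*.image` units and collect the `Image=` value from each `[Image]` section, which is what an install step would then pull into /var. `image_refs_from_quadlets` is a hypothetical helper; it ignores drop-ins, line continuations and specifiers, and does not pull anything.

```python
from pathlib import Path
from typing import Dict

def image_refs_from_quadlets(unit_dir: Path) -> Dict[str, str]:
    """Map each *.image unit in unit_dir to the Image= value in its [Image] section."""
    refs: Dict[str, str] = {}
    for unit in sorted(unit_dir.glob("*.image")):
        section = None
        for raw in unit.read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # blank line or comment
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1]
                continue
            key, sep, value = line.partition("=")
            if sep and section == "Image" and key.strip() == "Image":
                refs[unit.name] = value.strip()
    return refs

if __name__ == "__main__":
    print(image_refs_from_quadlets(Path("/usr/share/containers/systemd")))
```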

Anyways though, here these images are disconnected from the base image lifecycle; bootc upgrade/rollback would not affect them. They can be fully uninstalled (though to do so the .image file would need to be masked). Updates to them work by fetching from the registry directly.

A corollary to this is that for e.g. disconnected installs, the user must mirror all the application container images too.

This for example is the model used AFAIK by Fedora Workstation when flatpaks are installed - they are embedded in the ISO, but live in /var.

Strict binding

A key aspect of this would be that like "loose binding", the container images would be fetched separately from a registry. For disconnected installs, the admin would need to mirror them all. But we wouldn't lose all the efficiency bits of OCI.

This is what I was getting at originally; the images would still live in /var/lib/containers (I think), but bootc upgrade would enforce that the referenced .image files in the new root are pre-fetched before the next boot.

Hmm...more generally really I think we may need to drive something into podman where instead of .image files effectively expanding into an imperative invocation of podman pull, things like podman image prune would at least optionally know how to not prune the images. On a bootc system, we'd make sure to wire things up so that podman would avoid pruning images referenced from .image files in both the booted root and the rollback.
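The two rules in strict binding reduce to set arithmetic. In this sketch (hypothetical helpers, image references standing in for resolved images), an upgrade is blocked until every image referenced by the staged root is present locally, and pruning only considers images that no protected root (booted, rollback, and optionally staged) references.

```python
from typing import Set

def images_missing_for_staged(local: Set[str], staged_refs: Set[str]) -> Set[str]:
    """Images the staged root references that are not pulled yet."""
    return staged_refs - local

def prunable_images(local: Set[str], *protected_roots: Set[str]) -> Set[str]:
    """Images not referenced by any protected root (booted, rollback, staged)."""
    keep: Set[str] = set().union(*protected_roots)
    return local - keep

local = {"quay.io/my/app:v1", "quay.io/my/old:v1", "quay.io/x/tmp:latest"}
booted = {"quay.io/my/app:v1"}
rollback = {"quay.io/my/old:v1"}
staged = {"quay.io/my/app:v1", "quay.io/my/new:v1"}

print(images_missing_for_staged(local, staged))   # {'quay.io/my/new:v1'}
print(prunable_images(local, booted, rollback))   # {'quay.io/x/tmp:latest'}
```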

That said, again once we switch to podman storage for bootc then it may just make more sense to physically locate the images in the bootc container storage and have bootc own all updates/GC.


cgwalters commented on August 22, 2024

> I am sure someone has already invented this, but I think we should support a "rollup" OCI artifact that (much like a manifest list) is just a pointer to a bunch of other container images. A bit like the OCP "release image" except not an executable itself.

I saw this go by:
https://opencontainers.org/posts/blog/2023-07-07-summary-of-upcoming-changes-in-oci-image-and-distribution-specs-v-1-1/#2-new-manifest-field-for-establishing-relationships
Although, it seems like it's almost the inverse of what we want here. I guess in the end, maybe things like "super image" are just a special case of manifest lists.


cgwalters commented on August 22, 2024

Some discussion about this on the podman side in containers/podman#22785


vrothberg commented on August 22, 2024

One discussion that intersects with parts of this issue happened in containers/podman#18182 (reply in thread). In short: we discussed how we can mark images to be un-removable.

