Comments (32)

jacobshivers commented on August 11, 2024

I am leaving this for posterity in the event someone is trying to test this and would like a reference:

  • base64 encoding for MachineConfig
$ base64 --wrap=0 <<EOF
> [crio.runtime.runtimes.fuse]
> runtime_path = "/usr/bin/runc"
> runtime_root = "/run/runc"
> runtime_type = "oci"
> allowed_annotations = [
>     "io.kubernetes.cri-o.Devices",
> ]
> EOF
W2NyaW8ucnVudGltZS5ydW50aW1lcy5mdXNlXQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsKICAgICJpby5rdWJlcm5ldGVzLmNyaS1vLkRldmljZXMiLApdCg==
  • MachineConfig that extends crio functionality
$ cat fuse-MachineConfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-crio-fuse
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/crio/crio.conf.d/99-crio-fuse.conf
        overwrite: true
        contents:
          source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5mdXNlXQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsKICAgICJpby5rdWJlcm5ldGVzLmNyaS1vLkRldmljZXMiLApdCg==
  • Create MachineConfig
$ oc create -f fuse-MachineConfig.yaml 
machineconfig.machineconfiguration.openshift.io/99-worker-crio-fuse created
  • RuntimeClass for fuse that will be used by pod
$ cat fuse-RuntimeClass.yaml 
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: fuse
handler: fuse
  • Create RuntimeClass
$ oc create -f fuse-RuntimeClass.yaml
  • Sample pod
$ cat fuse-podSpec.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  containers:
  - name: fuse-test
    image: ubi8/ubi-minimal
    command: ['sh', '-c', 'echo "Hello from user $(id -u)" && sleep infinity']
  runtimeClassName: fuse
  • Create pod
$ oc create -f fuse-podSpec.yaml
  • Check for /dev/fuse within pod
$ oc exec -it fuse-test -- /bin/ls -l /dev/fuse
crw-rw-rw-. 1 root root 10, 229 Feb  4 15:41 /dev/fuse
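
For anyone regenerating the base64 payload after editing the drop-in, the following is a minimal sketch of a round-trip check. It assumes GNU coreutils `base64` (for `--wrap=0`) and uses a temporary file; the variable names `conf`, `encoded`, and `decoded` are only illustrative and are not part of the original steps.

```shell
# Write the CRI-O drop-in to a scratch file (same content as the heredoc above).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[crio.runtime.runtimes.fuse]
runtime_path = "/usr/bin/runc"
runtime_root = "/run/runc"
runtime_type = "oci"
allowed_annotations = [
    "io.kubernetes.cri-o.Devices",
]
EOF

# Encode on a single line, since the MachineConfig data URL must not contain line breaks.
encoded=$(base64 --wrap=0 < "$conf")

# Decode again and compare with the original to confirm nothing was mangled.
decoded=$(printf '%s' "$encoded" | base64 --decode)
if [ "$decoded" = "$(cat "$conf")" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
fi
```

If the check passes, the value of `encoded` is what goes after `base64,` in the `source:` field of the MachineConfig.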

If it is necessary to call mount() within the pod to access a filesystem via /dev/fuse then the SYS_ADMIN capability must be extended to the pod and an appropriate SCC provisioned. Below is an example of such an SCC and pod specification.

$ cat fuse-SCC.yaml
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities:
- SYS_ADMIN
allowedUnsafeSysctls:
- '*'
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- system:cluster-admins
- system:nodes
- system:masters
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: Test scc for fuse mounts.
  name: fuse-scc
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
seccompProfiles:
- '*'
supplementalGroups:
  type: RunAsAny
users:
- system:admin
- system:serviceaccount:builder
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
$ cat fuse-podSpec.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-fuse-scc
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  containers:
  - name: fuse-test-fuse-scc
    image: registry.access.redhat.com/ubi8/ubi:8.5-226
    command: ['sh', '-c', 'echo "Hello from user $(id -u)" && sleep infinity']
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]
  runtimeClassName: fuse
$ oc create -f fuse-SCC.yaml
$ oc create -f fuse-podSpec.yaml
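
To confirm from inside the pod that SYS_ADMIN was actually granted, one option is to inspect the effective capability mask in /proc. This is a rough sketch, not part of the original steps: the helper name `has_sys_admin` is made up for illustration, it relies on CAP_SYS_ADMIN being capability number 21, and it assumes the process runs as root in the container (for a non-root user the effective set can be empty even when the capability was added) and that `awk` is available in the image.

```shell
# Succeed if bit 21 (CAP_SYS_ADMIN) is set in a hex capability mask such as CapEff.
has_sys_admin() {
    mask=$1
    [ $(( (0x$mask >> 21) & 1 )) -eq 1 ]
}

# Read the effective capability mask of the current process, if /proc is available.
if [ -r /proc/self/status ]; then
    cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
    if has_sys_admin "$cap_eff"; then
        echo "CAP_SYS_ADMIN is in the effective set"
    else
        echo "CAP_SYS_ADMIN is not in the effective set"
    fi
fi
```

It could be run with something like `oc exec -it fuse-test-fuse-scc -- sh` and pasting the snippet into the shell.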

from enhancements.

cgwalters commented on August 11, 2024

It sounds like this change was already merged in cri-o/cri-o#3822

My feeling on this though is that changes to the default container "attack surface" and available APIs need a bit more of a formal process. I think it would be worth making this a full enhancement.

cgwalters commented on August 11, 2024

So this is the equivalent of podman run --device /dev/fuse on by default, right? Except because our default SELinux policy denies dynamic kernel module loads, you are proposing to unconditionally load the module on system bootup?

rhatdan commented on August 11, 2024

Yes.

haircommander commented on August 11, 2024

that PR was closed, not merged, because we've been squabbling on the approach 😃

cgwalters commented on August 11, 2024

that PR was closed, not merged, because we've been squabbling on the approach 😃

Ah right, ok. So basically I think some squabbling on this is warranted because while I'd agree it's probably mostly safe it's still a notable change. (For example, if we turn this on by default it becomes another thing different from e.g. upstream Kubernetes + Docker)

rhatdan commented on August 11, 2024

We can not be locked into Docker forever. Docker made some choices that were not correct, and we need to be able to evolve.

Google is not using what Docker did, they are using gvisor containers. Amazon is using Firecracker, Alibaba is using Kata. We need to be able to evolve past what Docker decided 7 years ago.

The important thing is that the OCI images stored at container registries are able to run, and that we are able to pass all of the Kubernetes test suites. If we are able to handle additional workloads that Kubernetes upstream cannot, that is not a bad thing.

cgwalters commented on August 11, 2024

Totally agree! Again I'm not saying we should block on anything, just we should have at least some slightly formal process before we expose new APIs/attack surfaces to containers by default.

rhatdan commented on August 11, 2024

@haircommander Could you reopen your PR. We now have the issue with OpenShift.

haircommander commented on August 11, 2024

it is not clear we've figured out the best route. the options are:

  • have cri-o unconditionally set it if /dev/fuse is mounted
  • add an additional device in MCO
  • add it as an additional device in the RPM

to me, the second option seems most transparent, as well as idiomatic

rhatdan commented on August 11, 2024

I am fine with 2 or 3

rhatdan commented on August 11, 2024

@runcom WDYT?

dustymabe commented on August 11, 2024

I'm a bit late to this party, but just wondering if we could just have containers that need /dev/fuse request it like can be done for /dev/kvm today with device-plugins. See https://kubevirt.io/2018/KVM-Using-Device-Plugins.html

dustymabe commented on August 11, 2024

But maybe that's not desired because it would require the user to specify they wanted that, and we want it to just be there by default?

haircommander commented on August 11, 2024

yeah a user can already ask to have /dev/fuse bind mounted into the container.

The currently proposed approach is we're mixing a user namespace capable RuntimeClass with the needs of a builder pod. We'll be gating the user namespace annotations to a specific RuntimeClass (so it can be gated by scc easily). We can give that RuntimeClass /dev/fuse unconditionally, which will allow builder pods to use it. That won't give it to every pod, but theoretically most of the pods that need it

jskov-jyskebank-dk commented on August 11, 2024

yeah a user can already ask to have /dev/fuse bind mounted into the container

Interesting, @haircommander. How does that work?

I spent a fairly long time trying to get exactly that working on OCP this summer, and had to settle for VFS in my (non-privileged) pods. The saga is here: containers/podman#6667

I would super appreciate it, if you could be very specific, so I (n00b-level) can understand it :)

Cheers!

haircommander commented on August 11, 2024

... it turns out I may have spoken without fully understanding. it was my impression that one could ask /dev/fuse to be mounted into the container and it would all magically work, but I've never actually done it (thus I am on the n00b-level 😄 ). I think my impression came from @rhatdan's comment above describing how a user can pass /dev/fuse to the additional_devices field, which is not allowed in OCP. I'm going to play with it a bit, but feel free to assume I was incorrect--I am doing so

dustymabe commented on August 11, 2024

I spent a fairly long time trying to get exactly that working on OCP this summer, and had to settle for VFS in my (non-privileged) pods. The saga is here: containers/podman#6667

@jskov-jyskebank-dk - were you able to get /dev/fuse to work for privileged pods? I'm trying to mount /dev/fuse/ in via hostPath and I can't get that to work even with privileged. I still get fuse: failed to open /dev/fuse: Operation not permitted.

I do think if I get proper /dev/fuse/ access I might not even need privileged if I execute and unshare first. See these two (1 2) comments for why.

jskov-jyskebank-dk commented on August 11, 2024

@dustymabe I tried, but it did not work.

I think I recall @rhatdan commenting in some other issues, that /dev/fuse should not be mounted into the container, but created.

rhatdan commented on August 11, 2024

Yes, from an SELinux point of view, adding /dev/fuse via the spec file gets it labeled correctly, whereas volume mounting it in gives it the host's label. It also works better with User Namespace.

jskov-jyskebank-dk commented on August 11, 2024

I can now use /dev/fuse in a pod, running privileged.

Not sure if it is because we now run 4.4, or because I made a mistake the last time I tried this.

I tried running a privileged pod while dropping capabilities SYS_ADMIN and others. But it does not seem to matter when running privileged. (maybe because that ultimately means running as the host's user id 0?)

@rhatdan can you think of a way for the admins hosting our OCP instance to tweak the (host) SELinux prefs, so that Pods on OCP can use a hostmounted /dev/fuse? Without running privileged that is.
In other words: can we hack our way to the feature this issue describes?

We switch to a new 4.6 cluster next week or so, if that makes a difference in any way.

haircommander commented on August 11, 2024

@jskov-jyskebank-dk you can add /dev/fuse:/dev/fuse to the default_mounts_file (/etc/containers/mounts.conf) to pass in /dev/fuse to all containers run by CRI-O. In 4.7 we're going a different and finer grained route, but that should work for the time being if you don't mind having all containers have access to /dev/fuse
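
As a rough illustration of that suggestion, the sketch below appends the `/dev/fuse:/dev/fuse` entry to a mounts file only when it is not already there. The helper name `add_mount_entry` is hypothetical, and the demo runs against a scratch file rather than the real /etc/containers/mounts.conf; how the file gets onto OCP nodes (for example through a MachineConfig) is left out here.

```shell
# Append a mount entry (/SRC:/DST, one per line) unless that exact line already exists.
add_mount_entry() {
    file=$1
    entry=$2
    touch "$file"
    if ! grep -qxF -- "$entry" "$file"; then
        printf '%s\n' "$entry" >> "$file"
    fi
}

# Try it on a scratch file; on a node the target would be /etc/containers/mounts.conf.
scratch=$(mktemp)
add_mount_entry "$scratch" "/dev/fuse:/dev/fuse"
add_mount_entry "$scratch" "/dev/fuse:/dev/fuse"   # second call changes nothing
cat "$scratch"
```

Keeping the helper idempotent means it can be re-run without piling up duplicate entries.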

jskov-jyskebank-dk commented on August 11, 2024

Thank you @haircommander !

Looking forward to 4.7, and will probably try this on our current platform.

Cheers!

openshift-bot commented on August 11, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

zvonkok commented on August 11, 2024

Any update on this? I have a use-case for a custom build-strategy with buildah in shipwright. @adambkaplan could we make this (non-privileged builds) also "work" in shipwright on OpenShift?

zvonkok commented on August 11, 2024

/remove-lifecycle stale

haircommander commented on August 11, 2024

we have moved to enable this when a pod specifically requests a user namespace, i.e., gated based on runtime class

haircommander commented on August 11, 2024

I would say this can be closed

jskov-jyskebank-dk commented on August 11, 2024

@haircommander sorry to reach out from a closed issue.

But the OCP enhancement you linked seems to be stale. At least it did not make it in time for OCP 4.7.

Is there a way to make per-Pod /dev/fuse mounts in OCP 4.7 (which we run now)?
Or plans (elsewhere) for that to be possible in a later release?

Thanks!

haircommander commented on August 11, 2024

You'd want to use the allowed_annotations capability of cri-o https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md
Specifically, if you add the file

[crio.runtime.runtimes.fuse]
runtime_path = "/usr/bin/runc"
runtime_root = "/run/runc"
runtime_type = "oci"
allowed_annotations = [
    "io.kubernetes.cri-o.Devices",
]

to /etc/crio/crio.conf.d

you can specify the runtime class as fuse, and add an annotation:
io.kubernetes.cri-o.Devices: "/dev/fuse"

which will instruct cri-o to add it as a device

jskov-jyskebank-dk commented on August 11, 2024

Thanks!

Will it take care of the SELinux labelling?

haircommander commented on August 11, 2024

I believe so, though I haven't personally checked
