Comments (32)
I am leaving this for posterity in the event someone is trying to test this and would like a reference:
- base64 encoding for MachineConfig
$ base64 --wrap=0 <<EOF
> [crio.runtime.runtimes.fuse]
> runtime_path = "/usr/bin/runc"
> runtime_root = "/run/runc"
> runtime_type = "oci"
> allowed_annotations = [
> "io.kubernetes.cri-o.Devices",
> ]
> EOF
W2NyaW8ucnVudGltZS5ydW50aW1lcy5mdXNlXQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsKICAgICJpby5rdWJlcm5ldGVzLmNyaS1vLkRldmljZXMiLApdCg==
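For reference, the payload round-trips: decoding the base64 string reproduces the CRI-O drop-in exactly (a quick sanity check, assuming coreutils `base64`):

```shell
# Decode the base64 payload used in the MachineConfig below and print
# the CRI-O drop-in it encodes.
payload="W2NyaW8ucnVudGltZS5ydW50aW1lcy5mdXNlXQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsKICAgICJpby5rdWJlcm5ldGVzLmNyaS1vLkRldmljZXMiLApdCg=="
echo "$payload" | base64 -d
```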
- MachineConfig that extends crio functionality
$ cat fuse-MachineConfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-crio-fuse
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/crio/crio.conf.d/99-crio-fuse.conf
        overwrite: true
        contents:
          source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5mdXNlXQpydW50aW1lX3BhdGggPSAiL3Vzci9iaW4vcnVuYyIKcnVudGltZV9yb290ID0gIi9ydW4vcnVuYyIKcnVudGltZV90eXBlID0gIm9jaSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsKICAgICJpby5rdWJlcm5ldGVzLmNyaS1vLkRldmljZXMiLApdCg==
- Create MachineConfig
$ oc create -f fuse-MachineConfig.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-crio-fuse created
- RuntimeClass for fuse that will be used by pod
$ cat fuse-RuntimeClass.yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: fuse
handler: fuse
- Create RuntimeClass
$ oc create -f fuse-RuntimeClass.yaml
- Sample pod
$ cat fuse-podSpec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  containers:
  - name: fuse-test
    image: ubi8/ubi-minimal
    command: ['sh', '-c', 'echo "Hello from user $(id -u)" && sleep infinity']
  runtimeClassName: fuse
- Create pod
$ oc create -f fuse-podSpec.yaml
- Check for /dev/fuse within pod
$ oc exec -it fuse-test -- /bin/ls -l /dev/fuse
crw-rw-rw-. 1 root root 10, 229 Feb 4 15:41 /dev/fuse
If it is necessary to call mount() within the pod to access a filesystem via /dev/fuse, then the SYS_ADMIN capability must be extended to the pod and an appropriate SCC provisioned. Below is an example of such an SCC and pod specification.
$ cat fuse-SCC.yaml
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities:
- SYS_ADMIN
allowedUnsafeSysctls:
- '*'
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- system:cluster-admins
- system:nodes
- system:masters
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: Test scc for fuse mounts.
  name: fuse-scc
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
seccompProfiles:
- '*'
supplementalGroups:
  type: RunAsAny
users:
- system:admin
- system:serviceaccount:builder
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
$ cat fuse-podSpec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-fuse-scc
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  containers:
  - name: fuse-test-fuse-scc
    image: registry.access.redhat.com/ubi8/ubi:8.5-226
    command: ['sh', '-c', 'echo "Hello from user $(id -u)" && sleep infinity']
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]
  runtimeClassName: fuse
$ oc create -f fuse-SCC.yaml
$ oc create -f fuse-podSpec.yaml
from enhancements.
It sounds like this change was already merged in cri-o/cri-o#3822
My feeling on this though is that changes to the default container "attack surface" and available APIs need a bit more of a formal process. I think it would be worth making this a full enhancement.
So this is the equivalent of podman run --device /dev/fuse on by default, right? Except because our default SELinux policy denies dynamic kernel module loads, you are proposing to unconditionally load the module on system bootup?
Yes.
that PR was closed, not merged, because we've been squabbling on the approach 😃
Ah right, ok. So basically I think some squabbling on this is warranted because while I'd agree it's probably mostly safe it's still a notable change. (For example, if we turn this on by default it becomes another thing different from e.g. upstream Kubernetes + Docker)
We cannot be locked into Docker forever. Docker made some choices that were not correct, and we need to be able to evolve.
Google is not using what Docker did, they are using gVisor containers. Amazon is using Firecracker, Alibaba is using Kata. We need to be able to evolve past what Docker decided 7 years ago.
The important thing is that the OCI images stored at container registries are able to run, and that we are able to pass all of the Kubernetes test suites. If we are able to handle additional workloads that Kubernetes upstream cannot, that is not a bad thing.
Totally agree! Again I'm not saying we should block on anything, just we should have at least some slightly formal process before we expose new APIs/attack surfaces to containers by default.
@haircommander Could you reopen your PR. We now have the issue with OpenShift.
it is not clear we've figured out the best route. the options are:
- have cri-o unconditionally set it if /dev/fuse is mounted
- add an additional device in MCO
- add it as an additional device in the RPM
to me, the second option seems most transparent, as well as idiomatic
I am fine with 2 or 3
@runcom WDYT?
I'm a bit late to this party, but just wondering if we could just have containers that need /dev/fuse request it, like can be done for /dev/kvm today with device-plugins. See https://kubevirt.io/2018/KVM-Using-Device-Plugins.html
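For comparison, the device-plugin route mentioned above surfaces the device as an extended resource that a pod requests in its resource limits. A sketch of what that would look like, using the `devices.kubevirt.io/kvm` resource name from the linked KubeVirt post (the device plugin itself must already be deployed on the cluster; the pod name and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kvm-consumer          # illustrative name
spec:
  containers:
  - name: kvm-consumer
    image: ubi8/ubi-minimal
    command: ['sh', '-c', 'ls -l /dev/kvm && sleep infinity']
    resources:
      limits:
        # Extended resource advertised by the KubeVirt device plugin;
        # the scheduler only places the pod on nodes exposing it.
        devices.kubevirt.io/kvm: "1"
```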
But maybe that's not desired because it would require the user to specify they wanted that, and we want it to just be there by default?
yeah a user can already ask to have /dev/fuse bind mounted into the container.
The currently proposed approach mixes a user-namespace-capable RuntimeClass with the needs of a builder pod. We'll be gating the user namespace annotations to a specific RuntimeClass (so it can be gated by SCC easily). We can give that RuntimeClass /dev/fuse unconditionally, which will allow builder pods to use it. That won't give it to every pod, but it should cover most of the pods that need it.
yeah a user can already ask to have /dev/fuse bind mounted into the container
Interesting, @haircommander. How does that work?
I spent a fairly long time trying to get exactly that working on OCP this summer, and had to settle for VFS in my (non-privileged) pods. The saga is here: containers/podman#6667
I would super appreciate it, if you could be very specific, so I (n00b-level) can understand it :)
Cheers!
... it turns out I may have spoken without fully understanding. it was my impression that one could ask /dev/fuse to be mounted into the container and it would all magically work, but I've never actually done it (thus I am on the n00b-level 😄 ). I think my impression came from @rhatdan's comment above describing how a user can pass /dev/fuse to the additional_devices field, which is not allowed in OCP. I'm going to play with it a bit, but feel free to assume I was incorrect--I am doing so
I spent a fairly long time trying to get exactly that working on OCP this summer, and had to settle for VFS in my (non-privileged) pods. The saga is here: containers/podman#6667
@jskovjyskebankdk - were you able to get /dev/fuse to work for privileged pods? I'm trying to mount /dev/fuse in via hostPath and I can't get that to work even with privileged. I still get fuse: failed to open /dev/fuse: Operation not permitted.
I do think if I get proper /dev/fuse access I might not even need privileged if I execute and unshare first. See these two (1 2) comments for why.
@dustymabe I tried, but it did not work.
I think I recall @rhatdan commenting in some other issues, that /dev/fuse should not be mounted into the container, but created.
Yes, from an SELinux point of view adding /dev/fuse via the spec file gets it labeled correctly; volume mounting it in gives it the host's label. It also works better with user namespaces.
I can now use /dev/fuse in a pod, running privileged.
Not sure if it is because we now run 4.4, or because I made a mistake the last time I tried this.
I tried running a privileged pod while dropping capabilities SYS_ADMIN and others. But it does not seem to matter when running privileged. (maybe because that ultimately means running as the host's user id 0?)
@rhatdan can you think of a way for the admins hosting our OCP instance to tweak the (host) SELinux prefs, so that Pods on OCP can use a hostmounted /dev/fuse? Without running privileged that is.
In other words: can we hack our way to the feature this issue describes?
We switch to a new 4.6 cluster next week or so, if that makes a difference in any way.
@jskovjyskebankdk you can add /dev/fuse:/dev/fuse to the default_mounts_file (/etc/containers/mounts.conf) to pass /dev/fuse to all containers run by CRI-O. In 4.7 we're going a different and finer grained route, but that should work for the time being if you don't mind having all containers have access to /dev/fuse
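The workaround above amounts to appending one line to the mounts file. A sketch, written against a scratch file so it can be tried safely; on a real node you would edit /etc/containers/mounts.conf in place as root:

```shell
# Append a host:container device mapping to CRI-O's default mounts
# file. Using a temp file here instead of the real
# /etc/containers/mounts.conf, purely for illustration.
MOUNTS_CONF="$(mktemp)"
echo "/dev/fuse:/dev/fuse" >> "$MOUNTS_CONF"
cat "$MOUNTS_CONF"
```

Note that, as the comment warns, on a real node this hands /dev/fuse to every container CRI-O runs.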
Thank you @haircommander !
Looking forward to 4.7, and will probably try this on our current platform.
Cheers!
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Any update on this? I have a use-case for a custom build-strategy with buildah in shipwright. @adambkaplan could we make this (non-privileged builds) also "work" in shipwright on OpenShift?
/remove-lifecycle stale
we have moved to enable this when a pod specifically requests a user namespace, i.e. gated based on runtime class
I would say this can be closed
@haircommander sorry to reach out from a closed issue.
But the OCP enhancement you linked seems to be stale. At least it did not make it in time for OCP 4.7.
Is there a way to make per-Pod /dev/fuse mounts in OCP 4.7 (which we run now)?
Or plans (elsewhere) for that to be possible in a later release?
Thanks!
You'd want to use the allowed_annotations capability of cri-o: https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md
Specifically, if you add the file
[crio.runtime.runtimes.fuse]
runtime_path = "/usr/bin/runc"
runtime_root = "/run/runc"
runtime_type = "oci"
allowed_annotations = [
    "io.kubernetes.cri-o.Devices",
]
to /etc/crio/crio.conf.d, you can specify the runtime class as fuse and add the annotation io.kubernetes.cri-o.Devices: "/dev/fuse", which will instruct cri-o to add it as a device
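Putting those two pieces together, a minimal pod using that runtime class and annotation would look like this (the same shape as the fuse-test pod earlier in the thread; the pod name and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-annotation-demo    # illustrative name
  annotations:
    # cri-o adds this device because the fuse runtime class
    # allows the io.kubernetes.cri-o.Devices annotation.
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  runtimeClassName: fuse
  containers:
  - name: fuse-annotation-demo
    image: ubi8/ubi-minimal
    command: ['sh', '-c', 'ls -l /dev/fuse && sleep infinity']
```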
Thanks!
Will it take care of the SELinux labelling?
I believe so, though I haven't personally checked