Comments (7)
First of all, is your network device meant for the container or for the sandbox? If it is for the sandbox, how is the device passed to Kata? A network device should, in theory, be placed in the Pod's netns by the CNI, together with its network address. If it is for the container, is it passed through the devices field of the OCI spec? And if it is passed as a device, who is responsible for setting its network address?
from kata-containers.
The VFIO network device is for the k8s Pod; the ovn-k8s CNI is supposed to configure the network interfaces through the Sandbox Hotplug API.
Isn't the Sandbox Hotplug API meant for this kind of use case?
from kata-containers.
The VFIO network device is for the k8s Pod; the ovn-k8s CNI is supposed to configure the network interfaces through the Sandbox Hotplug API.
Isn't the Sandbox Hotplug API meant for this kind of use case?
Hi @l8huang
Yes, the sandbox hotplug API is used for device hotplug. But Kata doesn't yet support hotplug for network devices. For other devices, such as block or GPU devices, Kata only needs the device type to choose the proper operation in the guest. For network devices, Kata needs not only the device type but also the network address and other information about the device. Therefore, Kata currently only enters the pod netns to collect all network endpoints when starting a pod, and the type of each endpoint determines how the network device is added to Kata.
Regarding your question, what I want to confirm is how the VFIO network device is passed to the kata runtime. Only once we know how your network device information is transmitted can we decide how to hot-plug the device.
from kata-containers.
@l8huang BTW, do you mean that the network device info is passed to the kata runtime through ENV, as described in #9605?
from kata-containers.
Thanks for your reply.
how the VFIO network device here is passed to kata runtime?
Long story short, the VFIO network device info is in OCI spec config.json when Sandbox.CreateContainer() is called, e.g.:
"linux": {
  "devices": [
    {
      "path": "/dev/vfio/vfio",
      "type": "c",
      "major": 10,
      "minor": 196,
      "fileMode": 438,
      "uid": 0,
      "gid": 0
    },
    {
      "path": "/dev/vfio/130",
      "type": "c",
      "major": 238,
      "minor": 3,
      "fileMode": 384,
      "uid": 0,
      "gid": 0
    }
  ],
Then GetAllVFIODevicesFromIOMMUGroup() is called to get VFIODev when attaching the device. The env PCIDEVICE_<prefix>_<resource-name>_INFO in #9605 is not relevant here.
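To illustrate the first step, here is a minimal Python sketch (not the Kata runtime's Go code) of how the VFIO group nodes could be picked out of the `linux.devices` list quoted above. The function name `vfio_groups_from_spec` is made up for this example, and it assumes that every per-group node has the form `/dev/vfio/<number>`, while `/dev/vfio/vfio` is the shared control node and is skipped.

```python
import re

# Matches per-group VFIO nodes such as /dev/vfio/130; the shared
# /dev/vfio/vfio control node is deliberately not matched.
VFIO_GROUP_RE = re.compile(r"^/dev/vfio/(\d+)$")


def vfio_groups_from_spec(spec):
    """Return IOMMU group numbers of VFIO char devices in an OCI spec dict.

    Hypothetical helper for illustration only.
    """
    groups = []
    for dev in spec.get("linux", {}).get("devices", []):
        match = VFIO_GROUP_RE.match(dev.get("path", ""))
        if dev.get("type") == "c" and match:
            groups.append(match.group(1))
    return groups


# Trimmed copy of the config.json fragment from the comment above.
spec = {
    "linux": {
        "devices": [
            {"path": "/dev/vfio/vfio", "type": "c", "major": 10, "minor": 196},
            {"path": "/dev/vfio/130", "type": "c", "major": 238, "minor": 3},
        ]
    }
}

print(vfio_groups_from_spec(spec))  # ['130']
```

On a Linux host, the PCI functions that belong to such a group are listed under /sys/kernel/iommu_groups/<group>/devices, which is presumably where the lookup continues.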
But for network devices, kata not only needs to obtain the type of device, but also needs to know the network address and other information of the device.
The network address and route config for the VFIO interface are in the Pod's annotations:
k8s.ovn.org/pod-networks: '{
  "default":{
    "ip_addresses":["10.1.2.62/26"],
    "mac_address":"0a:58:0a:c0:02:3e",
    "gateway_ips":["10.1.2.1"],
    "routes":[
      {"dest":"10.1.0.0/16","nextHop":"10.1.2.1"},
      {"dest":"10.2.0.0/16","nextHop":"10.1.2.1"},
      {"dest":"10.3.0.0/16","nextHop":"10.1.2.1"}],
    "mtu":"1500",
    "ip_address":"10.1.2.62/26",
    "gateway_ip":"10.1.2.1"}}'
k8s.v1.cni.cncf.io/network-status: |-
  [{
    "name": "default/ovn-primary-vfio",
    "interface": "eth0",
    "ips": [
      "10.1.2.62"
    ],
    "mac": "0a:0a:0a:c0:02:3e",
    "default": true,
    "dns": {},
    "device-info": {
      "type": "pci",
      "version": "1.0.0",
      "pci": {
        "pci-address": "0000:84:02.1"
      }
    }
  }]
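If the runtime were to interpret these annotations itself, the parsing could look roughly like the Python sketch below. This is only an illustration: `interface_config` and the shape of the dict it returns are invented here, and taking the MAC from k8s.ovn.org/pod-networks rather than from network-status is an arbitrary choice (the two annotations as quoted list different MAC values).

```python
import json


def interface_config(annotations, network="default"):
    """Merge both annotations into one interface description.

    Hypothetical helper; the returned field names are made up for this sketch.
    """
    pod_net = json.loads(annotations["k8s.ovn.org/pod-networks"])[network]
    statuses = json.loads(annotations["k8s.v1.cni.cncf.io/network-status"])
    # Use the entry flagged as the Pod's default network.
    status = next(s for s in statuses if s.get("default"))
    pci = status.get("device-info", {}).get("pci", {})
    return {
        "name": status["interface"],
        "pci_address": pci.get("pci-address"),
        "mac": pod_net["mac_address"],
        "addresses": pod_net["ip_addresses"],
        "mtu": int(pod_net["mtu"]),
        "routes": [(r["dest"], r["nextHop"]) for r in pod_net["routes"]],
    }


# Abbreviated copies of the two annotations quoted above.
annotations = {
    "k8s.ovn.org/pod-networks": json.dumps({
        "default": {
            "ip_addresses": ["10.1.2.62/26"],
            "mac_address": "0a:58:0a:c0:02:3e",
            "routes": [{"dest": "10.1.0.0/16", "nextHop": "10.1.2.1"}],
            "mtu": "1500",
        }
    }),
    "k8s.v1.cni.cncf.io/network-status": json.dumps([{
        "name": "default/ovn-primary-vfio",
        "interface": "eth0",
        "default": True,
        "device-info": {"type": "pci", "pci": {"pci-address": "0000:84:02.1"}},
    }]),
}

cfg = interface_config(annotations)
print(cfg["name"], cfg["pci_address"], cfg["addresses"])
```

The PCI address is what would let the runtime match this network config to the VFIO device found in the OCI spec.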
If the Sandbox Hotplug API is no longer exposed to external controllers, then the kata runtime needs to interpret the above annotations and invoke the corresponding APIs to configure the network in the guest VM. In that case, the problem becomes determining which annotations the Kata runtime should look at to gather the network configuration (k8s.ovn.org/pod-networks + k8s.v1.cni.cncf.io/network-status, the two annotations shown above, in the case of ovn-kubernetes).
Another way is to expose the Sandbox Hotplug API, so that the CNI can call it to configure the network in the guest VM.
Please kindly let me know how to proceed, thanks.
from kata-containers.
Hi @l8huang
I roughly understand your needs.
The CNIs that Kata currently supports create the network during the create-sandbox stage and set it up in the pod netns. Kata then scans the netns for network devices when creating the pause container and cold-plugs the devices it finds into the hypervisor.
What I need to confirm is this: are the VFIO network device and its network address and other information all generated by your CNI when the business container is created, rather than when the sandbox is created?
from kata-containers.
Indeed, the allocation of the VFIO network device (done by kubelet and the sriov network device plugin) and of the corresponding network address (done by the CNI) occurs before containerd creates the sandbox. But the VFIO device info only appears in the first container's OCI spec, when that container is created.
We have a mutation webhook that checks the VFIO device resource requirement based on a Pod's network config. For example, the annotation below sets the Pod's primary network interface to a VFIO device:
v1.multus-cni.io/default-network: default/ovn-primary-vfio
The mutation webhook parses the annotation and patches the first container's resources as below:
resources:
  limits:
    nvidia.com/bf2_vfio: "1"
  requests:
    nvidia.com/bf2_vfio: "1"
At the Pod level, there are no resource settings, so the resources setting for the first container is patched, even though the VFIO network device is intended for the Pod's network interface.
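As a rough illustration of that patching step (not our actual webhook code), the Python sketch below builds an RFC 6902 JSON Patch for the first container. The `NETWORK_TO_RESOURCE` table and the `build_patch` name are hypothetical; a real webhook would have to look up the resource name for the network rather than hard-code it.

```python
import copy

# Hypothetical lookup table, hard-coded for this sketch only.
NETWORK_TO_RESOURCE = {"default/ovn-primary-vfio": "nvidia.com/bf2_vfio"}

DEFAULT_NETWORK_ANNOTATION = "v1.multus-cni.io/default-network"


def build_patch(pod):
    """Return JSON Patch operations that request one VFIO device.

    The request is attached to the first container because the Pod itself
    has no resource settings.
    """
    annotations = pod.get("metadata", {}).get("annotations", {})
    resource = NETWORK_TO_RESOURCE.get(annotations.get(DEFAULT_NETWORK_ANNOTATION))
    if resource is None:
        return []  # not a VFIO-backed primary network, nothing to patch
    # Work on a copy so that the incoming Pod object is left untouched.
    resources = copy.deepcopy(pod["spec"]["containers"][0].get("resources", {}))
    for section in ("limits", "requests"):
        resources.setdefault(section, {})[resource] = "1"
    # "add" on an existing member replaces its value, so one operation
    # covers containers with and without a resources field.
    return [{"op": "add", "path": "/spec/containers/0/resources", "value": resources}]


pod = {
    "metadata": {"annotations": {DEFAULT_NETWORK_ANNOTATION: "default/ovn-primary-vfio"}},
    "spec": {"containers": [{"name": "app"}]},
}
patch = build_patch(pod)
print(patch)
```

Replacing the whole resources object, instead of adding the individual keys, avoids having to escape the "/" in nvidia.com/bf2_vfio inside a JSON Pointer path.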
from kata-containers.