nirmata / kube-netc
A Kubernetes eBPF network monitor
License: Apache License 2.0
Output from a test system:
ubuntu@ip-10-10-129-78:~$ kubectl get pods -A | grep kube-netc
default kube-netc-5hrkd 1/1 Running 25 3d7h
ubuntu@ip-10-10-129-78:~$
ubuntu@ip-10-10-129-78:~$ kubectl describe pod kube-netc-5hrkd
Name: kube-netc-5hrkd
Namespace: default
Priority: 0
Node: ip-10-10-129-78.us-west-1.compute.internal/10.10.128.238
Start Time: Thu, 11 Jun 2020 16:09:21 +0000
Labels: controller-revision-hash=9f5c6789c
name=kube-netc
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.244.0.138
IPs: <none>
Controlled By: DaemonSet/kube-netc
Containers:
kube-netc:
Container ID: docker://2650292745d9226bc5fb185c03cdae315d3a72cfe5551a41d981bc1c696fee33
Image: nirmata/kube-netc
Image ID: docker-pullable://nirmata/kube-netc@sha256:6b754b4a759d7ca992a3b464d96613f1db3851d679fbbd63b63c23ced8714e21
Port: 2112/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 14 Jun 2020 21:07:34 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sun, 14 Jun 2020 19:07:26 +0000
Finished: Sun, 14 Jun 2020 21:07:32 +0000
Ready: True
Restart Count: 25
Environment: <none>
Mounts:
/sys/fs/bpf from bpf (rw)
/sys/fs/cgroup from cgroup (rw)
/sys/kernel/debug from debug (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-6dwlr (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
bpf:
Type: HostPath (bare host directory volume)
Path: /sys/fs/bpf
HostPathType:
cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
debug:
Type: HostPath (bare host directory volume)
Path: /sys/kernel/debug
HostPathType:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-6dwlr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-6dwlr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events: <none>
kube-netc has been using a fork of datadog/datadog-agent's ebpf library because of some mismatched functions that had not yet been committed upstream to iovisor/gobpf; DataDog's ebpf library does work with their fork, datadog/gobpf. Some changes are now needed to fix a bug with ebpf:
2020/06/04 21:27:27 [2020-06-04 21:27:27.10331117 +0000 UTC m=+0.023524112] error: could not enable kprobe(kprobe/tcp_get_info) used for offset guessing: cannot write "p:ptcp_get_info tcp_get_info\n" to kprobe_events: write /sys/kernel/debug/tracing/kprobe_events: file exists
This issue is referenced upstream. I need to bring my fork up to date to pick up these fixes until DataDog's gobpf fork is merged upstream.
The tracker package has never been formally verified to give reasonable results for the stats it calculates. The package needs to be run and tested to make sure that no connection is reporting impossible values, specifically, impossibly large bytes/second.
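One rough way to check this from outside the process, assuming the Prometheus counters shown elsewhere in this document (bytes_recv on :2112): scrape twice, diff the counters, and flag any connection whose rate exceeds a plausible NIC line rate. The sample data is inline here; on a cluster the two files would come from curling the metrics endpoint ten seconds apart.

```shell
# Flag per-connection rates above ~10 Gbit/s (1.25e9 B/s), which would be
# impossible on a typical node. The inline samples stand in for two scrapes
# of :2112/metrics taken dt=10 seconds apart.
printf 'bytes_recv{source_address="10.10.1.210:49082"} 1791227\n' > scrape1.txt
printf 'bytes_recv{source_address="10.10.1.210:49082"} 99791227000\n' > scrape2.txt

awk -v dt=10 '
  NR == FNR { prev[$1] = $2; next }     # remember first-scrape counters
  $1 in prev {
    rate = ($2 - prev[$1]) / dt         # bytes per second over the window
    if (rate > 1.25e9)
      printf "suspect %s: %.0f B/s\n", $1, rate
  }
' scrape1.txt scrape2.txt
```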
Hi! I was trying out kube-netc by following the Getting Started guide. I tried using both minikube and kind clusters, but did not see the kube-netc pod listed after installing the DaemonSet.
I did receive a warning after installing the DaemonSet, which could be the reason for this. I've attached a screenshot of my terminal showing the exact commands I ran. Please let me know if I misconfigured something during setup. Thanks.
add labels like:
We may also want to include standard labels: https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#labels
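For instance, the recommended common labels from the linked doc could look like this on the DaemonSet (the values here are illustrative, not the project's actual ones):

```yaml
metadata:
  labels:
    app.kubernetes.io/name: kube-netc
    app.kubernetes.io/instance: kube-netc
    app.kubernetes.io/component: network-monitor
    app.kubernetes.io/managed-by: kubectl
```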
See proposal at:
https://docs.google.com/document/d/1WL98CLbDcdFY65Sg2Dx0AQB3tguZfw7LZYJlZHr3vm4/edit?pli=1
I installed kube-netc in one of our clusters and checked the metrics. I noticed that bytes_recv is being reported for pods that don't have any ports.
For example, the nirmata-cni-installer-5fkmp pod is part of a DaemonSet and does not have any ports configured. Also, 10.10.1.210:2379 is actually the etcd container running on the node (outside Kubernetes).
Another observation is that there are multiple records between the same source and destination IP. This increases the size of the metrics data, creating scale issues with Prometheus.
bytes_recv{destination_address="10.10.1.210:2379",destination_pod_name="nirmata-cni-installer-5fkmp",source_address="10.10.1.210:49080",source_pod_name="nirmata-cni-installer-5fkmp"} 459
bytes_recv{destination_address="10.10.1.210:2379",destination_pod_name="nirmata-cni-installer-5fkmp",source_address="10.10.1.210:49082",source_pod_name="nirmata-cni-installer-5fkmp"} 1.791227e+06
bytes_recv{destination_address="10.10.1.210:2379",destination_pod_name="nirmata-cni-installer-5fkmp",source_address="10.10.1.210:49084",source_pod_name="nirmata-cni-installer-5fkmp"} 787
bytes_recv{destination_address="10.10.1.210:2379",destination_pod_name="nirmata-cni-installer-5fkmp",source_address="10.10.1.210:49090",source_pod_name="nirmata-cni-installer-5fkmp"} 29955
bytes_recv{destination_address="10.10.1.210:2379",destination_pod_name="nirmata-cni-installer-5fkmp",source_address="10.10.1.210:49092",source_pod_name="nirmata-cni-installer-5fkmp"} 2026
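One mitigation sketch: the five series above differ only in the client's ephemeral source port, so stripping that port before aggregation collapses them into a single series. Plain shell over a metrics dump, with label names taken from the samples above:

```shell
# Collapse per-ephemeral-port series into one by dropping the port from
# source_address, then summing values per remaining label set. The dump
# below reproduces the sample series from the issue.
cat > metrics.txt <<'EOF'
bytes_recv{destination_address="10.10.1.210:2379",source_address="10.10.1.210:49080"} 459
bytes_recv{destination_address="10.10.1.210:2379",source_address="10.10.1.210:49082"} 1.791227e+06
bytes_recv{destination_address="10.10.1.210:2379",source_address="10.10.1.210:49084"} 787
bytes_recv{destination_address="10.10.1.210:2379",source_address="10.10.1.210:49090"} 29955
bytes_recv{destination_address="10.10.1.210:2379",source_address="10.10.1.210:49092"} 2026
EOF

# Keep the destination port (it identifies the service, 2379 = etcd here);
# drop only the client-side ephemeral port.
sed -E 's/(source_address="[0-9.]+):[0-9]+"/\1"/' metrics.txt \
  | awk '{sum[$1] += $2} END {for (s in sum) printf "%s %.0f\n", s, sum[s]}'
```

The five series collapse into one with the summed value, which is what a consumer usually wants anyway and is far cheaper for Prometheus to store.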
DaemonSet spec (partial)
spec:
  containers:
  - image: index.docker.io/nirmata/nirmata-cni-installer:1.10
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - cat
        - /run.sh
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: install-cni
    readinessProbe:
      exec:
        command:
        - cat
        - /run.sh
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsGroup: 1000
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/cni/bin/
      name: cni-bin
  dnsPolicy: ClusterFirst
  hostNetwork: true
  imagePullSecrets:
  - name: default-registry-secret
  initContainers:
  - command:
    - chown
    - -R
    - 1000:1000
    - /opt/cni/bin/
    image: alpine:3.6
    imagePullPolicy: IfNotPresent
    name: take-data-dir-ownership
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/cni/bin/
      name: cni-bin
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  volumes:
  - hostPath:
      path: /opt/cni/bin
      type: ""
    name: cni-bin
updateStrategy:
  type: OnDelete
The quick start guide that we provide is a bit long. To shorten it and provide a good way to demo kube-netc, we should put together a Helm chart that packages kube-netc, Prometheus, and Grafana together for quick deployment.
Some suggested changes:
kube-netc exposes quite a lot of data in its current state. We are thinking about implementing a way to narrow the exposed metrics down to a given namespace. This can already be done on the consumer side, by querying the metrics for our source and destination namespace labels, but the unwanted series still bog down Prometheus.
The proposed feature would stop tracking network stats for connections coming from an IP that is known to be outside a given namespace.
The documentation in the README, while still correct, does not describe the DaemonSet and is not clear enough about kube-netc's capabilities.
It looks like this repo has disappeared and the datadog-agent library depends on it.
This results in the following error when building the binary:
go build -tags="linux_bpf" -o main main.go
go: github.com/DataDog/[email protected] requires
github.com/DataDog/[email protected] requires
github.com/operator-framework/[email protected] requires
github.com/operator-framework/[email protected] requires
bitbucket.org/ww/[email protected]: reading https://api.bitbucket.org/2.0/repositories/ww/goautoneg?fields=scm: 404 Not Found
Makefile:24: recipe for target 'build' failed
make: *** [build] Error 1
Thanks to @mda590 for pointing this out to me.
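A workaround that has unblocked builds in similar cases is a go.mod replace directive pointing at the GitHub mirror of goautoneg. The module path and pseudo-version below are my assumption from memory; verify the commit against the mirror before relying on it:

```
// in go.mod: redirect the vanished Bitbucket module to its GitHub mirror
// (pseudo-version assumed, check it against github.com/munnerz/goautoneg)
replace bitbucket.org/ww/goautoneg => github.com/munnerz/goautoneg v0.0.0-20120707110453-a547fc61f48d
```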
I tried to install kube-netc on docker-desktop for mac, but the pod crashed.
kubectl -n kube-system get pod
kube-netc-pt2bz 0/1 CrashLoopBackOff 4 115s
Collected log:
kubectl -n kube-system logs -f kube-netc-pt2bz
Started clearing old kprobes...
Finished Clearing probes...
./entrypoint.sh: line 5: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 6: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 7: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 8: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 9: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 10: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 11: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 12: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 13: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 14: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 15: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 16: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 17: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 18: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 19: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 20: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 21: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 22: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 23: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
./entrypoint.sh: line 24: can't create /sys/kernel/debug/tracing/kprobe_events: nonexistent directory
[SERVER STARTED ON :2112]
W0619 17:59:09.432190 7 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
[NEW] Pod compose-api-6ffb89dc58-kzv8r added
Labels: map[com.docker.deploy-namespace:docker com.docker.fry:compose.api com.docker.image-tag:v0.4.25-alpha1 pod-template-hash:6ffb89dc58]
IP: 192.168.65.3
Pod Update: 192.168.65.3 -> compose-api-6ffb89dc58-kzv8r
[NEW] Pod coredns-5644d7b6d9-wxnwc added
Labels: map[k8s-app:kube-dns pod-template-hash:5644d7b6d9]
IP: 10.1.1.104
[NEW] Pod etcd-docker-desktop added
Labels: map[component:etcd tier:control-plane]
IP: 192.168.65.3
Pod Update: 10.1.1.104 -> coredns-5644d7b6d9-wxnwc
Pod Update: 192.168.65.3 -> etcd-docker-desktop
[NEW] Pod kube-apiserver-docker-desktop added
Labels: map[component:kube-apiserver tier:control-plane]
IP: 192.168.65.3
Pod Update: 192.168.65.3 -> kube-apiserver-docker-desktop
[NEW] Pod kube-proxy-wrs8l added
Labels: map[controller-revision-hash:759676c746 k8s-app:kube-proxy pod-template-generation:1]
IP: 192.168.65.3
[NEW] Pod metrics-server-64c4b5b584-pvhnb added
Labels: map[k8s-app:metrics-server pod-template-hash:64c4b5b584]
IP: 10.1.1.103
Pod Update: 192.168.65.3 -> kube-proxy-wrs8l
Pod Update: 10.1.1.103 -> metrics-server-64c4b5b584-pvhnb
[NEW] Pod cert-manager-webhook-5994b95c9f-5fgh9 added
Labels: map[app:webhook app.kubernetes.io/component:webhook app.kubernetes.io/instance:cert-manager app.kubernetes.io/managed-by:Helm app.kubernetes.io/name:webhook helm.sh/chart:cert-manager-v0.15.0 pod-template-hash:5994b95c9f]
IP: 10.1.1.97
Pod Update: 10.1.1.97 -> cert-manager-webhook-5994b95c9f-5fgh9
[NEW] Pod vpnkit-controller added
Labels: map[component:vpnkit-controller]
IP: 10.1.1.93
[NEW] Pod cert-manager-cainjector-589d59c486-bvp28 added
Labels: map[app:cainjector app.kubernetes.io/component:cainjector app.kubernetes.io/instance:cert-manager app.kubernetes.io/managed-by:Helm app.kubernetes.io/name:cainjector helm.sh/chart:cert-manager-v0.15.0 pod-template-hash:589d59c486]
IP: 10.1.1.99
Pod Update: 10.1.1.93 -> vpnkit-controller
Pod Update: 10.1.1.99 -> cert-manager-cainjector-589d59c486-bvp28
[NEW] Pod nginx-deployment-65f6d95869-vrjnp added
Labels: map[app:nginx pod-template-hash:65f6d95869]
IP: 10.1.1.91
Pod Update: 10.1.1.91 -> nginx-deployment-65f6d95869-vrjnp
[NEW] Pod compose-78f95d4f8c-7lnh4 added
Labels: map[com.docker.default-service-type: com.docker.deploy-namespace:docker com.docker.fry:compose com.docker.image-tag:v0.4.25-alpha1 pod-template-hash:78f95d4f8c]
IP: 10.1.1.92
[NEW] Pod kube-controller-manager-docker-desktop added
Labels: map[component:kube-controller-manager tier:control-plane]
IP: 192.168.65.3
Pod Update: 10.1.1.92 -> compose-78f95d4f8c-7lnh4
Pod Update: 192.168.65.3 -> kube-controller-manager-docker-desktop
[NEW] Pod nginx-6db489d4b7-xv2fg added
Labels: map[pod-template-hash:6db489d4b7 run:nginx]
IP: 10.1.1.96
Pod Update: 10.1.1.96 -> nginx-6db489d4b7-xv2fg
[NEW] Pod storage-provisioner added
Labels: map[component:storage-provisioner]
IP: 10.1.1.94
[NEW] Pod kube-netc-pt2bz added
Labels: map[controller-revision-hash:596d88f854 name:kube-netc pod-template-generation:1]
IP: 10.1.1.105
Pod Update: 10.1.1.94 -> storage-provisioner
Pod Update: 10.1.1.105 -> kube-netc-pt2bz
[NEW] Pod cert-manager-67489d9b9d-9gstz added
Labels: map[app:cert-manager app.kubernetes.io/component:controller app.kubernetes.io/instance:cert-manager app.kubernetes.io/managed-by:Helm app.kubernetes.io/name:cert-manager helm.sh/chart:cert-manager-v0.15.0 pod-template-hash:67489d9b9d]
IP: 10.1.1.98
[NEW] Pod nginx-695ff68dbd-jht8x added
Labels: map[pod-template-hash:695ff68dbd run:nginx]
IP: 10.1.1.95
Pod Update: 10.1.1.98 -> cert-manager-67489d9b9d-9gstz
Pod Update: 10.1.1.95 -> nginx-695ff68dbd-jht8x
[NEW] Pod nginx-695ff68dbd-8plxb added
Labels: map[allow-deletes:false pod-template-hash:695ff68dbd run:nginx]
IP: 10.1.1.101
Pod Update: 10.1.1.101 -> nginx-695ff68dbd-8plxb
[NEW] Pod coredns-5644d7b6d9-t5dr4 added
Labels: map[k8s-app:kube-dns pod-template-hash:5644d7b6d9]
IP: 10.1.1.102
[NEW] Pod kube-scheduler-docker-desktop added
Labels: map[component:kube-scheduler tier:control-plane]
IP: 192.168.65.3
Pod Update: 10.1.1.102 -> coredns-5644d7b6d9-t5dr4
Pod Update: 192.168.65.3 -> kube-scheduler-docker-desktop
[NEW] Pod kyverno-554dfcb678-d6xzf added
Labels: map[app:kyverno pod-template-hash:554dfcb678]
IP: 10.1.1.100
Pod Update: 10.1.1.100 -> kyverno-554dfcb678-d6xzf
2020/06/19 17:59:09 [2020-06-19 17:59:09.49324911 +0000 UTC m=+0.106880719] error: system-probe unsupported: debugfs is not mounted and is needed for eBPF-based checks, run "sudo mount -t debugfs none /sys/kernel/debug" to mount debugfs
Travis is showing a build error:
The kube-netc container does not currently run on Windows. This is not a major problem at this point in time, but Windows should be supported in the future.
This is the error that is presented when trying to bring the container up on Windows.
docker run --name kube-netc-server --rm -v /sys/kernel/debug:/sys/kernel/debug -v /sys/fs/cgroup:/sys/fs/cgroup -v /sys/fs/bpf:/sys/fs/bpf --privileged kube-netc
2020/04/29 03:47:13 [2020-04-29 03:47:13.822657118 +0000 UTC m=+0.034438890] error: system-probe unsupported: debugfs is not mounted and is needed for eBPF-based checks, run "sudo mount -t debugfs none /sys/kernel/debug" to mount debugfs
The logging provided by kube-netc is unstandardized and quite messy. We want to implement a standardized way of logging stats, warnings, and helpful debugging notes, depending on the user's use case.
The frameworks that are being considered at this time are zap, klog, and potentially glog.
For stability reasons we want to move off of DataDog's main datadog-agent/ebpf package and use Nirmata's fork.
Kube-netc currently reports 5 key metrics that were decided on from the beginning. To expand the use of the project we are thinking about what kind of metrics would be useful to implement next, specifically as per this post from WeaveScope.
There are a few considerations holding us back from simply writing these into the tracker, mainly whether or not our eBPF library can supply the necessary stats.
When installing kube-netc, you get a warning stating that the API version used for the RBAC authorization objects is being deprecated.
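The warning most likely refers to rbac.authorization.k8s.io/v1beta1 being deprecated; if so, the fix is simply to move the RBAC manifests to the v1 API. A sketch, with a placeholder resource name:

```yaml
# was: apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-netc   # placeholder; use the project's actual role name
```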